A static poppler build: the easy way
👋 This page was last updated ~3 years ago. Just so you know.
So! Now our asset processing pipeline is almost complete. But we've just traded dependencies on CLI tools for dependencies on dynamic libraries:
$ ldd ./target/debug/pdftocairo
    linux-vdso.so.1 (0x00007ffd615be000)
    libpoppler-glib.so.8 => /lib64/libpoppler-glib.so.8 (0x00007f2ba1bb4000)
    libgobject-2.0.so.0 => /lib64/libgobject-2.0.so.0 (0x00007f2ba1b59000)
    libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 (0x00007f2ba1a1e000)
    libcairo.so.2 => /lib64/libcairo.so.2 (0x00007f2ba1902000)
    libcairo-gobject.so.2 => /lib64/libcairo-gobject.so.2 (0x00007f2ba18f6000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f2ba18dc000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f2ba17fe000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f2ba15f4000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f2ba216c000)
    libpoppler.so.112 => /lib64/libpoppler.so.112 (0x00007f2ba1288000)
    libfreetype.so.6 => /lib64/libfreetype.so.6 (0x00007f2ba11bd000)
    libgio-2.0.so.0 => /lib64/libgio-2.0.so.0 (0x00007f2ba0fe4000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f2ba0dc5000)
    libffi.so.6 => /lib64/libffi.so.6 (0x00007f2ba0db8000)
    libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f2ba0d40000)
    libpixman-1.so.0 => /lib64/libpixman-1.so.0 (0x00007f2ba0c94000)
    libfontconfig.so.1 => /lib64/libfontconfig.so.1 (0x00007f2ba0c45000)
    libpng16.so.16 => /lib64/libpng16.so.16 (0x00007f2ba0c0c000)
    libxcb-shm.so.0 => /lib64/libxcb-shm.so.0 (0x00007f2ba0c07000)
    libxcb.so.1 => /lib64/libxcb.so.1 (0x00007f2ba0bda000)
    libxcb-render.so.0 => /lib64/libxcb-render.so.0 (0x00007f2ba0bca000)
    libXrender.so.1 => /lib64/libXrender.so.1 (0x00007f2ba0bbd000)
    libX11.so.6 => /lib64/libX11.so.6 (0x00007f2ba0a75000)
    libXext.so.6 => /lib64/libXext.so.6 (0x00007f2ba0a60000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f2ba0a46000)
    libjpeg.so.62 => /lib64/libjpeg.so.62 (0x00007f2ba09c2000)
    libopenjp2.so.7 => /lib64/libopenjp2.so.7 (0x00007f2ba0968000)
    liblcms2.so.2 => /lib64/liblcms2.so.2 (0x00007f2ba0903000)
    libtiff.so.5 => /lib64/libtiff.so.5 (0x00007f2ba087c000)
    libsmime3.so => /lib64/libsmime3.so (0x00007f2ba0850000)
    libnss3.so => /lib64/libnss3.so (0x00007f2ba0712000)
    libplc4.so => /lib64/libplc4.so (0x00007f2ba0709000)
    libnspr4.so => /lib64/libnspr4.so (0x00007f2ba06c6000)
    libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f2ba06b3000)
    libharfbuzz.so.0 => /lib64/libharfbuzz.so.0 (0x00007f2ba05dd000)
    libbrotlidec.so.1 => /lib64/libbrotlidec.so.1 (0x00007f2ba05cf000)
    libgmodule-2.0.so.0 => /lib64/libgmodule-2.0.so.0 (0x00007f2ba05c8000)
    libmount.so.1 => /lib64/libmount.so.1 (0x00007f2ba0581000)
    libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f2ba0556000)
    libxml2.so.2 => /lib64/libxml2.so.2 (0x00007f2ba03cd000)
    libXau.so.6 => /lib64/libXau.so.6 (0x00007f2ba03c7000)
    libwebp.so.7 => /lib64/libwebp.so.7 (0x00007f2ba0358000)
    libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f2ba0260000)
    libjbig.so.2.1 => /lib64/libjbig.so.2.1 (0x00007f2ba0252000)
    libnssutil3.so => /lib64/libnssutil3.so (0x00007f2ba021f000)
    libplds4.so => /lib64/libplds4.so (0x00007f2ba021a000)
    libgraphite2.so.3 => /lib64/libgraphite2.so.3 (0x00007f2ba01f9000)
    libbrotlicommon.so.1 => /lib64/libbrotlicommon.so.1 (0x00007f2ba01d4000)
    libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f2ba019c000)
    libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f2ba0105000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f2ba00d9000)
Whew, that's a LOT of dependencies.
It is. There's a lot of stuff in there we don't strictly need: these are optional dependencies in the poppler dependency tree. For example, we never need cairo to render to X11/xcb, only to an SVG surface.
On Linux, this mostly means installing a lot of different packages just to get our binary running. And those packages are named differently depending on which Linux distribution you're using: I use Fedora, Ubuntu and ArchLinux on a regular basis. Their contents can be different too: again, some package maintainers make Executive Decisions, and that makes it really hard to ship cross-distro packages.
This is where folks who work on AppImage, snapcraft or flatpak would usually come in and pitch their solution to that problem.
These are all fine for various use cases, they each make their own compromises, and I'm familiar with them. I'm making an informed decision not to use them.
Similarly, if you're a Nix aficionado, I'm super happy for you, but go write your own articles! I'll retweet them, too! Today we're just learning how to build / link / distribute software for internal use.
On Windows, I'd have to be careful to distribute a bunch of .dll files alongside the .exe, which can get messy real fast if you have a ~/bin folder where you just throw all your CLI tools.
Anyway, we can address both problems by just making our own build of poppler as a static library.
Before we do, let's record the size of the pdftocairo binary, before and after stripping, when it's linked dynamically against poppler & friends:
$ ls -lhA ./target/debug/pdftocairo
-rwxr-xr-x. 2 amos amos 48M Nov 25 12:12 ./target/debug/pdftocairo
$ objcopy --compress-debug-sections ./target/debug/pdftocairo /tmp/pdftocairo.compressed-debug
$ ls -lhA /tmp/pdftocairo.compressed-debug
-rwxr-xr-x. 1 amos amos 16M Nov 25 13:03 /tmp/pdftocairo.compressed-debug
$ objcopy --strip-all ./target/debug/pdftocairo /tmp/pdftocairo.stripped
$ ls -lhA /tmp/pdftocairo.stripped
-rwxr-xr-x. 1 amos amos 5.3M Nov 25 13:04 /tmp/pdftocairo.stripped
During my tenure as "chief fucking around and finding out officer", I've built software from sources "manually" a ton. If you somehow need to get good at this and have a month of free time, Linux From Scratch might be a nice thing to hyperfocus on.
For poppler, I determined that these were the dependencies I needed to build (in order):
pcre
libffi
zlib
libpng
glib
fontconfig
freetype
pixman
cairo
poppler
One could argue harfbuzz is missing from that list, but things went fine without it. In this case, the input is the result of Chrome's "Print to PDF" feature, so the text is already laid out, and I don't think harfbuzz is actually needed here.
I made a bunch of bash scripts for this, which is one tiny step up from "typing everything into a terminal once".
They all have the same structure. Let's follow the script for pcre as an example.
#!/bin/bash -eux
# ^ set some useful flags:
#   -e  Exits immediately if a command fails. For a pipeline like
#       `curl | sh`, only the last command's status counts, unless
#       `pipefail` (set below) is also on. Super useful.
#   -u  Treats unset variables as errors rather than just an empty string.
#       Also a lifesaver.
#   -x  Prints commands before running them. Useful to follow what's going on,
#       especially for commands that don't print anything.

# Make a pipeline fail if any command in it fails, so that a dead `curl`
# in `curl | tar` is caught too.
set -o pipefail

# Absolute path to this script. /home/user/bin/foo.sh
SCRIPT=$(readlink -f "$0")
# Absolute path this script is in. /home/user/bin
SCRIPTPATH=$(dirname "$SCRIPT")

# This is where headers, libraries, pkg-config/Gir/docs files will be installed
export PREFIX=${SCRIPTPATH}/prefix
mkdir -p "${PREFIX}"

# Some of these servers are _really slow_, it helps to be able to skip the
# download when working on those scripts. Downloading can be skipped by
# exporting `SKIP_DL=1`
#
# The `${FOOBAR:-otherwise}` expression will evaluate to "otherwise" if
# the `FOOBAR` variable is unset or empty, which keeps `-u` from aborting
# the script here.
if [[ "${SKIP_DL:-unset}" == "1" ]]; then
    echo "Skipping download..."
else
    rm -rf sources/pcre
    mkdir -p sources/pcre

    # Pipe curl into tar to extract on the fly (possible with tarballs, not so easy
    # with zips)
    #
    # curl's -f (--fail) flag will fail if it gets an HTTP error status (4xx/5xx).
    # If it gets a redirect (3xx), -L (--location) will follow the redirect.
    #
    # tar's `x` extracts, `j` is for `.bz2`, and `--strip-components` extracts
    # `foobar-1.2.3/blah` as `blah` instead. `-C` "changes directory", essentially
    # specifying the destination.
    curl -f -L https://sourceforge.net/projects/pcre/files/pcre/8.45/pcre-8.45.tar.bz2/download \
        | tar xj -C sources/pcre --strip-components 1
fi

# Always re-build from scratch. Incremental rebuilds are always messy,
# especially when reconfiguring, I'd rather not risk it.
rm -rf builds/pcre
mkdir -p builds/pcre

# Like `cd`, but we can `popd` later to get out of it
pushd builds/pcre

# See later discussion
export CFLAGS="-fPIC"

# Look mom, it's autotools!
#   --prefix specifies "where to install stuff"
#   I'm doing this from Fedora, which uses `${PREFIX}/lib64` rather than
#   Debian's `${PREFIX}/lib/${TRIPLET}`, and for some reason --libdir is needed.
#   --disable-shared says not to build shared/dynamic libraries (.so), and
#   --enable-static says to build static libraries (.a).
../../sources/pcre/configure \
    --prefix="${PREFIX}" \
    --libdir="${PREFIX}/lib64" \
    --disable-shared \
    --enable-static

# Build all that code, using all available processors/cores/hyperthreads
make -j "$(nproc)"

# Install all that
make install

# Kinda unnecessary since all of this is happening in a sub-shell that
# exits immediately after, but the symmetry is nice.
popd
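The `${SKIP_DL:-unset}` trick is easy to get wrong, so here is a minimal sketch of it in isolation. The `download_mode` function is a hypothetical helper, not something from the build scripts; it only shows how the expansion behaves under `set -u` when the variable is unset, empty, or set to `1`.

```shell
#!/bin/sh
set -eu

# Prints "skip" when SKIP_DL is exactly "1", and "download" otherwise.
# "${SKIP_DL:-unset}" expands to the literal word "unset" when the variable
# is unset or empty, so `set -u` does not abort on it.
download_mode() {
    if [ "${SKIP_DL:-unset}" = "1" ]; then
        echo "skip"
    else
        echo "download"
    fi
}

# Each call runs in a subshell so the assignment does not leak.
(unset SKIP_DL; download_mode)
(SKIP_DL=1; download_mode)
(SKIP_DL=; download_mode)
```

Only the middle call prints `skip`; the other two print `download`.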
So that's with autotools! Not too bad, all things considered.
That reminds me of a tweet by Tim Martin:
I saw a book entitled "Die GNU Autotools" and I thought "My feelings exactly". Turns out the book was in German.
The one surprising thing is this line:
export CFLAGS="-fPIC"
And that line exists because... the default Rust toolchain tries to make position-independent executables. We discussed this in Position-independent code, but back then we were hyperfocused on learning about ELF. Here we're just hitting it in the real world!
Position-independent code is generally a good thing, but with most C build systems / compilers, you have to explicitly opt into it, unless you're building shared objects (.so). Since we're building static library archives here (.a), we have to tell the C compiler that we intend for that code to eventually be linked into a position-independent executable (a PIE).
As a result, it'll use a different set of relocations, ones that are compatible with position-independent executables. If we forget that part, we'll just get an error at link time, where the linker will be like "uhh yeah I can't make a PIE with those relocations". It's not as bad as silently building an executable that panics at runtime.
And autotools is relatively hands-off here: it's mostly concerned with making sure it doesn't have to enable a hundred workarounds that are only needed if you're building for HP-UX or Irix or something.
I mean, technically libtool has a thing where it'll generate both PIC and non-PIC objects, and there's the --with-pic option but eh... I don't super trust it.
Other build systems have other ways to specify that you want position-independent code!
Here's a cmake example:
# (most of the file is skipped, it looks a lot like the previous example)

# Configure the build.
#   -S sets the source folder, -B sets the build folder,
#   -D specifies an option. "OFF" and "ON" are for features, "True" and "False"
#   are for booleans. We disable most features.
cmake \
    -S sources/poppler \
    -B builds/poppler \
    -DCMAKE_INSTALL_PREFIX=${PREFIX} \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_POSITION_INDEPENDENT_CODE=True \
    -DBUILD_GTK_TESTS=OFF \
    -DBUILD_QT5_TESTS=OFF \
    -DBUILD_QT6_TESTS=OFF \
    -DBUILD_CPP_TESTS=OFF \
    -DBUILD_MANUAL_TESTS=OFF \
    -DENABLE_BOOST=OFF \
    -DENABLE_UTILS=OFF \
    -DENABLE_CPP=OFF \
    -DENABLE_GLIB=ON \
    -DENABLE_GOBJECT_INTROSPECTION=OFF \
    -DENABLE_GTK_DOC=OFF \
    -DENABLE_QT5=OFF \
    -DENABLE_QT6=OFF \
    -DENABLE_LIBOPENJPEG=none \
    -DENABLE_CMS=none \
    -DENABLE_DCTDECODER=none \
    -DENABLE_LIBCURL=OFF \
    -DENABLE_ZLIB=OFF \
    -DBUILD_SHARED_LIBS=OFF \
    -DRUN_GPERF_IF_PRESENT=OFF \
    ;

# Build using aaaaall available processors/cores/hyperthreads
cmake --build builds/poppler --parallel $(nproc)

# Install
cmake --install builds/poppler
The position-independent flag here is -DCMAKE_POSITION_INDEPENDENT_CODE=True.
Here's an interesting distinction between autotools and CMake: although they both have this "configure" stage, where they generate some more files that will be required to build, CMake actually provides a --build (and an --install) option!
Whereas autotools had us running make by hand. Which is fine on Linux, macOS, or even MinGW on Windows. But on Windows, I want to use MSVC. Some projects ship a .sln (Visual Studio Solution) file directly, but these are not entirely portable across Visual Studio versions, even though they do their best to "upgrade" it when you open it.
So, if you were wondering why in the world people bothered with systems like CMake, that's why: not everything is Linux, not every C compiler is GCC, not every build system is GNU make. It's super useful to have a tool that drives the whole process, and lets you use MSVC as a C compiler, or ninja as a build system, for example.
And Meson is another one of these!
Usage is really similar to CMake, here's my build script for freetype:
# (again, most of the file is not shown because it just downloads and extracts
# the sources)

meson setup \
    builds/freetype \
    sources/freetype \
    --prefix ${PREFIX} \
    --default-library static \
    -D b_staticpic=true \
    -D b_pie=true \
    -D brotli=disabled \
    -D bzip2=disabled \
    -D harfbuzz=disabled \
    -D png=disabled \
    -D tests=disabled \
    -D zlib=enabled \
    ;

meson compile -C builds/freetype
meson install -C builds/freetype
As you can see, there's also a standard way to enable/disable features (like brotli, bzip2, etc.), pick the installation prefix (--prefix), the build target (--default-library), and some built-in options to enable position-independent code (b_staticpic and b_pie: pretty sure the latter is not needed here, but ah well).
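To keep the three spellings of "please give me position-independent code" side by side, here is a small recap as a shell function. It only restates the options used in the scripts above; the `pic_option` name is made up for this sketch.

```shell
#!/bin/sh
set -eu

# Prints the option each build system in this article took to produce
# position-independent static libraries.
pic_option() {
    case $1 in
        autotools) echo 'CFLAGS=-fPIC' ;;
        cmake)     echo '-DCMAKE_POSITION_INDEPENDENT_CODE=True' ;;
        meson)     echo '-D b_staticpic=true' ;;
        *)         echo "unknown build system: $1" >&2; return 1 ;;
    esac
}

for system in autotools cmake meson; do
    printf '%-10s %s\n' "$system" "$(pic_option "$system")"
done
```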
And so, having written all my build scripts, and an all.sh script to call them all in order, I'm ready to make my static build. Here's an asciinema of it: it completes in a little over a minute on a Ryzen 5950X with 128GB of RAM and a decent NVMe SSD (on Fedora 35 inside VMware Workstation Player with a Windows 11 host):
I find it really fun to watch it go! You can see some colors around the Meson and CMake parts. It's so nice!
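The all.sh script itself isn't shown here, so what follows is only a guess at its shape: a loop over the dependency list from earlier, in order, stopping at the first failure thanks to `set -e`. The `build-<name>.sh` file names and the `DRY_RUN` switch are assumptions made for this sketch.

```shell
#!/bin/sh
set -eu

# Build order, taken from the dependency list earlier in the article.
DEPS="pcre libffi zlib libpng glib fontconfig freetype pixman cairo poppler"

# Runs (or, with DRY_RUN=1, only prints) one build script per dependency.
# The `build-<name>.sh` naming is hypothetical.
build_all() {
    for dep in $DEPS; do
        script="./build-$dep.sh"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would run $script"
        else
            "$script"
        fi
    done
}

# Dry run by default here, so the sketch can be tried without the scripts.
DRY_RUN=1
build_all
```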
In the end, we have 150MB's worth of libraries in our prefix:
$ du -hd0 prefix
151M    prefix
$ ls prefix/lib64
glib-2.0             libcairo.la  libfontconfig.a  libgmodule-2.0.a  libpcrecpp.a    libpcreposix.la  libpng16.la  libpoppler-glib.a
libcairo.a           libexpat.a   libfreetype.a    libgobject-2.0.a  libpcrecpp.la   libpixman-1.a    libpng.a     libz.a
libcairo-gobject.a   libffi.a     libgio-2.0.a     libgthread-2.0.a  libpcre.la      libpixman-1.la   libpng.la    pkgconfig
libcairo-gobject.la  libffi.la    libglib-2.0.a    libpcre.a         libpcreposix.a  libpng16.a       libpoppler.a
How do we link against this? Thanks to pkg-config!
The Fedora package for poppler-glib installed /usr/lib64/pkgconfig/poppler-glib.pc, which contains all the libraries and flags required to use it:
prefix=/usr
libdir=/usr/lib64
includedir=/usr/include

Name: poppler-glib
Description: GLib wrapper for poppler
Version: 21.08.0
Requires: glib-2.0 >= 2.56 gobject-2.0 >= 2.56 cairo >= 1.10.0
Requires.private: poppler = 21.08.0

Libs: -L${libdir} -lpoppler-glib
Cflags: -I${includedir}/poppler/glib
And now that we've built and installed poppler-glib into our own prefix, we have a poppler-glib.pc file in /home/amos/bearcove/poppler-build/prefix/lib64/pkgconfig:
prefix=/home/amos/bearcove/poppler-build/prefix
libdir=/home/amos/bearcove/poppler-build/prefix/lib64
includedir=/home/amos/bearcove/poppler-build/prefix/include

Name: poppler-glib
Description: GLib wrapper for poppler
Version: 21.11.0
Requires: glib-2.0 >= 2.56 gobject-2.0 >= 2.56 cairo >= 1.10.0
Requires.private: poppler = 21.11.0

Libs: -L${libdir} -lpoppler-glib
Cflags: -I${includedir}/poppler/glib
(Note that the version we built is more recent than the version Fedora 35 ships!)
pkg-config is relatively straightforward to use "manually". To show it off, I made a sample C program:
#include <glib/poppler.h>
#include <glib/gerror.h>
#include <stdio.h>
#include <stdlib.h> // for exit()

int main() {
    GError *error = NULL;
    PopplerDocument *doc = poppler_document_new_from_file("file:///tmp/export.pdf", NULL, &error);
    if (error) {
        printf("got GError: %s\n", error->message);
        g_clear_error(&error);
        exit(1);
    }
    printf("doc = %p\n", (void *)doc);
}
Trying to compile it with gcc the naive way doesn't work, because include paths are not set:
$ gcc sample.c -o sample
sample.c:1:10: fatal error: glib/poppler.h: No such file or directory
    1 | #include <glib/poppler.h>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
But even setting those include paths isn't enough:
$ gcc -I ~/bearcove/poppler-build/prefix/include/glib-2.0 -I ~/bearcove/poppler-build/prefix/include/poppler -I ~/bearcove/poppler-build/prefix/lib64/glib-2.0/include -I ~/bearcove/poppler-build/prefix/include/cairo sample.c -o sample
/usr/bin/ld: /tmp/cc09zSS1.o: in function `main':
sample.c:(.text+0x22): undefined reference to `poppler_document_new_from_file'
/usr/bin/ld: sample.c:(.text+0x55): undefined reference to `g_clear_error'
collect2: error: ld returned 1 exit status
...because now we're missing the libraries. And this is what pkg-config solves!
$ pkg-config --cflags glib-2.0
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/sysprof-4 -pthread
But this picks up the system / distro-installed .pc files. If we want to use the prefix we just made, we need to set the PKG_CONFIG_PATH environment variable.
$ PKG_CONFIG_PATH=/home/amos/bearcove/poppler-build/prefix/lib64/pkgconfig \
    pkg-config --cflags glib-2.0
-I/home/amos/bearcove/poppler-build/prefix/include/glib-2.0 -pthread -I/home/amos/bearcove/poppler-build/prefix/lib64/glib-2.0/include -I/home/amos/bearcove/poppler-build/prefix/include -DPCRE_STATIC
I've just shown the --cflags flag, which is necessary for compiling C files (it sets defines and include paths, mostly), and then there's --libs, which is necessary to link everything together.
Here's what linking dynamically (with the system packages) would look like:
$ pkg-config --libs glib-2.0
-lglib-2.0
And now statically (with our own prefix):
$ PKG_CONFIG_PATH=/home/amos/bearcove/poppler-build/prefix/lib64/pkgconfig \
    pkg-config --libs glib-2.0
-L/home/amos/bearcove/poppler-build/prefix/lib64 -lglib-2.0 -pthread -lm -lpcre
Because in our case we're doing static linking, we also probably want to use the --static flag, which tells pkg-config to "be more aggressive when computing dependency graph (for static linking)", and because we don't want to accidentally end up dynamically linking against a system library, we can specify --env-only, which tells pkg-config to "look only for package entries in PKG_CONFIG_PATH".
This doesn't actually make a difference for glib-2.0:
$ PKG_CONFIG_PATH=/home/amos/bearcove/poppler-build/prefix/lib64/pkgconfig \
    pkg-config --env-only --static --libs glib-2.0
-L/home/amos/bearcove/poppler-build/prefix/lib64 -lglib-2.0 -pthread -lm -lpcre
But it does for cairo, for example!
$ PKG_CONFIG_PATH=/home/amos/bearcove/poppler-build/prefix/lib64/pkgconfig \
    pkg-config --libs cairo
-L/home/amos/bearcove/poppler-build/prefix/lib64 -lcairo
Without --static, it only bothers specifying -lcairo, I'm guessing because it assumes by default we're linking dynamically, and libcairo.so itself depends on other libraries:
$ ldd /usr/lib64/libcairo.so | grep -E 'glib|pixman|pcre|png|freetype'
    libpixman-1.so.0 => /lib64/libpixman-1.so.0 (0x00007f551e8cb000)
    libfreetype.so.6 => /lib64/libfreetype.so.6 (0x00007f551e7b1000)
    libpng16.so.16 => /lib64/libpng16.so.16 (0x00007f551e778000)
    libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 (0x00007f551dedb000)
    libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f551de1f000)
There's no such concept with .a archives (static libraries). They're just a bunch of .o files:
$ ar tv /home/amos/bearcove/poppler-build/prefix/lib64/libpoppler-glib.a
rw-r--r-- 1000/1000  16912 Nov 25 14:35 2021 poppler-action.cc.o
rw-r--r-- 1000/1000   1992 Nov 25 14:35 2021 poppler-date.cc.o
rw-r--r-- 1000/1000 116744 Nov 25 14:35 2021 poppler-document.cc.o
rw-r--r-- 1000/1000  64688 Nov 25 14:35 2021 poppler-page.cc.o
rw-r--r-- 1000/1000  10496 Nov 25 14:35 2021 poppler-attachment.cc.o
rw-r--r-- 1000/1000  25032 Nov 25 14:35 2021 poppler-form-field.cc.o
rw-r--r-- 1000/1000  65024 Nov 25 14:35 2021 poppler-annot.cc.o
rw-r--r-- 1000/1000   7664 Nov 25 14:35 2021 poppler-layer.cc.o
rw-r--r-- 1000/1000  10024 Nov 25 14:35 2021 poppler-movie.cc.o
rw-r--r-- 1000/1000  11624 Nov 25 14:35 2021 poppler-media.cc.o
rw-r--r-- 1000/1000   3032 Nov 25 14:35 2021 poppler.cc.o
rw-r--r-- 1000/1000   6168 Nov 25 14:35 2021 poppler-cached-file-loader.cc.o
rw-r--r-- 1000/1000  14136 Nov 25 14:35 2021 poppler-input-stream.cc.o
rw-r--r-- 1000/1000  81248 Nov 25 14:35 2021 poppler-structure-element.cc.o
rw-r--r-- 1000/1000  72280 Nov 25 14:35 2021 poppler-enums.c.o
rw-r--r-- 1000/1000  29064 Nov 25 14:35 2021 CairoFontEngine.cc.o
rw-r--r-- 1000/1000 142272 Nov 25 14:35 2021 CairoOutputDev.cc.o
rw-r--r-- 1000/1000   7768 Nov 25 14:35 2021 CairoRescaleBox.cc.o
"GNU ar" is most probably named for "archive", and it's part of binutils. The t flag prints a "lisT" of entries, and v is "Verbose".
So, when we specify --static (and --env-only for good measure), pkg-config lists dependencies explicitly:
$ PKG_CONFIG_PATH=/home/amos/bearcove/poppler-build/prefix/lib64/pkgconfig \
    pkg-config --libs --env-only --static cairo
-L/home/amos/bearcove/poppler-build/prefix/lib64 -lcairo -L/home/amos/bearcove/poppler-build/prefix/lib64 -lgobject-2.0 -L/home/amos/bearcove/poppler-build/prefix/lib64 -L/home/amos/bearcove/poppler-build/prefix/lib64 -L/home/amos/bearcove/poppler-build/prefix/lib64/../lib64 -lffi -lglib-2.0 -pthread -lm -lpcre -lpixman-1 -lfreetype -lz -lpng16 -lm -lm -lz
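Typing `PKG_CONFIG_PATH=... pkg-config --env-only --static` over and over gets old, so a tiny wrapper can help. This is a sketch, not part of the build scripts: the `static_pc` name is made up, honoring a `PKG_CONFIG` override is a convention borrowed from other tools, and `--env-only` is used exactly as in the commands above.

```shell
#!/bin/sh
set -eu

# Location of the custom prefix; the default is the path used in this article.
PREFIX="${PREFIX:-/home/amos/bearcove/poppler-build/prefix}"

# Calls pkg-config (or whatever $PKG_CONFIG names) so that it only sees
# the .pc files in our prefix and reports private dependencies too.
static_pc() {
    PKG_CONFIG_PATH="$PREFIX/lib64/pkgconfig" \
        "${PKG_CONFIG:-pkg-config}" --env-only --static "$@"
}

# Dry run: substitute `echo` for pkg-config to see the exact arguments.
(PKG_CONFIG=echo; static_pc --libs cairo)

# Real use (needs pkg-config and a populated prefix):
#   static_pc --libs cairo
```

The dry run prints `--env-only --static --libs cairo`, which is the argument list the real pkg-config would receive.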
So, to link our sample C program statically against our libraries, all we need to do is grab the output of pkg-config --cflags and pkg-config --libs and use it for the appropriate build stages:
set -eux

export PKG_CONFIG_PATH=/home/amos/bearcove/poppler-build/prefix/lib64/pkgconfig
PKGS="poppler-glib gio-2.0 fontconfig libpng"
CFLAGS=$(pkg-config --static --cflags ${PKGS})
LIBS=$(pkg-config --static --libs ${PKGS})

gcc ${CFLAGS} -c sample.c
g++ sample.o -o sample ${LIBS} -static-libstdc++
The compile step ends up looking like this:
gcc -I${PREFIX}/include/poppler/glib -I${PREFIX}/include/glib-2.0 -I${PREFIX}/lib64/glib-2.0/include -I${PREFIX}/include -I${PREFIX}/include/cairo -I${PREFIX}/include/pixman-1 -I${PREFIX}/include/freetype2 -I${PREFIX}/include/libpng16 -I${PREFIX}/include/poppler -DPCRE_STATIC -pthread -c sample.c
And the link step, like this:
g++ sample.o -o sample -L${PREFIX}/lib64 -lpoppler-glib -lgobject-2.0 -lglib-2.0 -pthread -lm -lpcre -L${PREFIX}/lib64/../lib64 -lcairo -L${PREFIX}/lib64 -L${PREFIX}/lib64 -L${PREFIX}/lib64 -lm -lpixman-1 -lz -lm -L${PREFIX}/lib64 -lpoppler -lgio-2.0 -lgobject-2.0 -lffi -lgmodule-2.0 -lglib-2.0 -lm -lpcre -lfontconfig -pthread -lexpat -lfreetype -lz -lpng16 -lm -lm -lz -static-libstdc++
(With the actual prefix replaced with ${PREFIX} for readability).
Specifying poppler-glib wasn't enough in this case: adding gio-2.0, fontconfig and libpng was still needed. This feels like a bug in the .pc files, but then again, static linking is starting to feel like a lost art: mainstream distributions like Ubuntu and Fedora are definitely focused on dynamic linking above all else.
Here's another cool tip: pkg-config can generate a graphviz (.dot) file to graph dependencies:
$ PKG_CONFIG_DEBUG_SPEW=1 \
    PKG_CONFIG_PATH=/home/amos/bearcove/poppler-build/prefix/lib64/pkgconfig \
    pkg-config --static --env-only --digraph --exists --errors-to-stdout poppler-glib gio-2.0 fontconfig libpng > /tmp/deps.dot
$ dot -Tsvg /tmp/deps.dot > /tmp/deps.svg
It seems cairo has basically no dependencies in that graph, whereas in reality, it definitely has a bunch:
$ ldd /usr/lib64/libcairo.so | grep -E 'fontconfig|freetype|pixman|png'
    libpixman-1.so.0 => /lib64/libpixman-1.so.0 (0x00007f4777ecb000)
    libfontconfig.so.1 => /lib64/libfontconfig.so.1 (0x00007f4777e7c000)
    libfreetype.so.6 => /lib64/libfreetype.so.6 (0x00007f4777db1000)
    libpng16.so.16 => /lib64/libpng16.so.16 (0x00007f4777d78000)
Looking closer, it appears those are listed, but as "private" dependencies:
$ cat /home/amos/bearcove/poppler-build/prefix/lib64/pkgconfig/cairo.pc
prefix=/home/amos/bearcove/poppler-build/prefix
exec_prefix=${prefix}
libdir=/home/amos/bearcove/poppler-build/prefix/lib64
includedir=${prefix}/include

Name: cairo
Description: Multi-platform 2D graphics library
Version: 1.16.0

Requires.private: gobject-2.0 glib-2.0 >= 2.14 pixman-1 >= 0.30.0 freetype2 >= 9.7.3 libpng
Libs: -L${libdir} -lcairo
Libs.private: -lz
Cflags: -I${includedir}/cairo
So maybe it's just the graph generation that's broken. The Requires.private line also explains why we need to specify fontconfig by hand, but not freetype2: freetype2 is listed there, fontconfig isn't.
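To poke at public versus private dependencies without reading every .pc file by eye, a few lines of awk go a long way. This is a rough sketch: `pc_field` is a hypothetical helper, it does no `${variable}` substitution and ignores everything else pkg-config does, and the sample file is a trimmed copy of the cairo.pc shown above.

```shell
#!/bin/sh
set -eu

# Prints the value of one field of a .pc file.
#   $1 = field name, e.g. "Requires" or "Requires.private"
#   $2 = path to the .pc file
pc_field() {
    awk -v key="$1:" '
        index($0, key) == 1 {         # line starts with "<field>:"
            sub(/^[^:]*:[ \t]*/, "")  # drop the field name and padding
            print
        }
    ' "$2"
}

# A trimmed copy of the cairo.pc shown above, written to a temp file.
pc_file=$(mktemp)
cat > "$pc_file" <<'EOF'
Name: cairo
Version: 1.16.0
Requires.private: gobject-2.0 glib-2.0 >= 2.14 pixman-1 >= 0.30.0 freetype2 >= 9.7.3 libpng
Libs: -L${libdir} -lcairo
Libs.private: -lz
EOF

echo "public requires:  $(pc_field Requires "$pc_file")"
echo "private requires: $(pc_field Requires.private "$pc_file")"
echo "public libs:      $(pc_field Libs "$pc_file")"
rm -f "$pc_file"
```

The "public requires" line comes out empty, which matches what the plain `--libs cairo` call reported: without `--static`, only `-lcairo` is visible.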
Anyway! Let's get back on track.
With all that effort, our sample C program has almost no dynamic dependencies:
$ ldd ./sample
    linux-vdso.so.1 (0x00007ffc8e4c7000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fcf437a8000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fcf4378e000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fcf43584000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fcf43897000)
Similar to another randomly-chosen Rust CLI program:
$ ldd $(which sfz)
    linux-vdso.so.1 (0x00007ffceaba7000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb569f8b000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fb569eaf000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fb569ca5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fb56a72c000)
The dependencies there are:
linux-vdso.so: a "virtual" dynamic shared object to make some syscalls faster (like gettimeofday)
libgcc_s.so: some GCC builtins, like _Unwind_Resume, __popcountdi2, etc.
libm.so: math functions, like tan, sqrtf32, etc.
libc.so: the standard C library, in this case, glibc
ld-linux-x86-64.so.2: the dynamic linker/loader, see this whole series
And the good news is: the -sys crates for various gnome/glib-adjacent libraries use pkg-config to know what to link against!
Let's look at what gtk-rs/gir generated for poppler-rs/sys/build.rs for example:
// Generated by gir (https://github.com/gtk-rs/gir @ 8891a2f2c34b)
// from ../../gir-files (@ c6afb5857607)
// from ../gir-files (@ ec3e62ee546b)
// DO NOT EDIT

#[cfg(not(feature = "dox"))]
use std::process;

#[cfg(feature = "dox")]
fn main() {} // prevent linking libraries to avoid documentation failure

#[cfg(not(feature = "dox"))]
fn main() {
    if let Err(s) = system_deps::Config::new().probe() {
        println!("cargo:warning={}", s);
        process::exit(1);
    }
}
This uses the system-deps crate, whose README says it supports pkg-config dependencies by looking at the Cargo.toml.
Do we have anything there?
# in `poppler-rs/sys/Cargo.toml`

[package.metadata.system-deps.poppler_glib]
name = "poppler-glib"
version = "0.70"

[package.metadata.system-deps.poppler_glib.v0_72]
version = "0.72"

[package.metadata.system-deps.poppler_glib.v0_73]
version = "0.73"

# (more versions omitted...)
We do!
And so if we build with no particular options, we can spy on cargo to see what it executes...
$ strace -ff -o /tmp/cargo-build -e 'execve' cargo build
(output omitted)
strace tracks syscalls: in this case we want to track execve calls (executing another program), we want to "follow forks" (-ff), i.e. spy on all processes created by the top-level cargo, such as compiled build scripts, and we want to output the logs, per-PID (process identifier), to /tmp/cargo-build.PID files.
$ rg 'pkg-config' /tmp/cargo-build.*
/tmp/cargo-build.215066
1:execve("/home/amos/.cargo/bin/pkg-config", ["pkg-config", "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cfe93b5010 /* 97 vars */) = -1 ENOENT (No such file or directory)
2:execve("/home/amos/.local/bin/pkg-config", ["pkg-config", "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cfe93b5010 /* 97 vars */) = -1 ENOENT (No such file or directory)
3:execve("/home/amos/.fly/bin/pkg-config", ["pkg-config", "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cfe93b5010 /* 97 vars */) = -1 ENOENT (No such file or directory)
4:execve("/home/amos/.vscode-server/bin/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/bin/pkg-config", ["pkg-config", "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cfe93b5010 /* 97 vars */) = -1 ENOENT (No such file or directory)
5:execve("/home/amos/.local/bin/pkg-config", ["pkg-config", "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cfe93b5010 /* 97 vars */) = -1 ENOENT (No such file or directory)
6:execve("/home/amos/.fly/bin/pkg-config", ["pkg-config", "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cfe93b5010 /* 97 vars */) = -1 ENOENT (No such file or directory)
7:execve("/home/amos/.vscode-server/bin/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/bin/pkg-config", ["pkg-config", "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cfe93b5010 /* 97 vars */) = -1 ENOENT (No such file or directory)
8:execve("/home/amos/.cargo/bin/pkg-config", ["pkg-config", "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cfe93b5010 /* 97 vars */) = -1 ENOENT (No such file or directory)
9:execve("/usr/local/bin/pkg-config", ["pkg-config", "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cfe93b5010 /* 97 vars */) = -1 ENOENT (No such file or directory)
10:execve("/usr/bin/pkg-config", ["pkg-config", "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cfe93b5010 /* 97 vars */) = 0
12:execve("/usr/bin/x86_64-redhat-linux-gnu-pkg-config", ["/usr/bin/x86_64-redhat-linux-gnu"..., "--libs", "--cflags", "cairo", "cairo >= 1.14"], 0x55cc12d26980 /* 96 vars */) = 0
(many others omitted)
rg is ripgrep: it's excellent.
Hilariously, we can see that system-deps ends up trying every possible path for pkg-config, starting with ~/.cargo/bin, then ~/.local/bin, then ~/.fly/bin, etc. It does so for every invocation.
There could probably be benefits in looking it up once and just using that path, if someone feels like doing an easy, low-impact-but-nice PR!
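For the curious, "looking it up once" could look roughly like this. It's sketched in shell rather than Rust to stay with the language of the build scripts, and `find_in_path` is a made-up helper, not anything system-deps provides: it walks a PATH-style list a single time and hands back an absolute path that can be reused for every later call.

```shell
#!/bin/sh
set -eu

# Prints the first executable named $1 found in the colon-separated
# directory list $2 (defaults to $PATH); returns 1 if there is none.
find_in_path() {
    prog=$1
    rest="${2:-$PATH}:"
    while [ -n "$rest" ]; do
        dir=${rest%%:*}   # text before the first colon
        rest=${rest#*:}   # everything after it
        candidate="${dir:-.}/$prog"   # an empty entry means the current directory
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Resolve once, then reuse the absolute path for every later invocation.
SH_BIN=$(find_in_path sh)
echo "resolved sh to $SH_BIN"
```

The same call with `pkg-config` instead of `sh` would give the one path worth caching.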
So, because we know cargo ends up invoking pkg-config (and environment variables are inherited by child processes, unless that behavior is specifically disabled), we can just set PKG_CONFIG_PATH as before, and everything sh-
$ PKG_CONFIG_PATH=~/bearcove/poppler-build/prefix/lib64/pkgconfig cargo build
   Compiling proc-macro2 v1.0.32
   Compiling unicode-xid v0.2.2
   Compiling syn v1.0.81
   Compiling serde v1.0.130
(cut)
   Compiling poppler-rs v0.18.2 (/home/amos/bearcove/poppler-rs/poppler)
error: linking with `cc` failed: exit status: 1
  |
  = note: "cc" "-m64" (cut.)
  = note: /usr/bin/ld: /home/amos/bearcove/poppler-build/prefix/lib64/libpoppler-glib.a(CairoOutputDev.cc.o): in function `CairoOutputDev::~CairoOutputDev()':
          CairoOutputDev.cc:(.text+0x19a): undefined reference to `operator delete(void*, unsigned long)'
          /usr/bin/ld: CairoOutputDev.cc:(.text+0x223): undefined reference to `TextPage::decRefCnt()'
          /usr/bin/ld: CairoOutputDev.cc:(.text+0x237): undefined reference to `ActualText::~ActualText()'
          /usr/bin/ld: CairoOutputDev.cc:(.text+0x244): undefined reference to `operator delete(void*, unsigned long)'
...everything is NOT working, because, remember, a bunch of packages are actually missing!
So we've got a couple of options here. One is to make our own build.rs script for pdftocairo. This is relatively simple to set up:
// in `pdftocairo/build.rs`

fn main() {
    // poppler-glib requires poppler
    println!("cargo:rustc-link-lib=static=poppler");
    // poppler is written in C++
    println!("cargo:rustc-link-lib=static=stdc++");
    // nobody bothers including this in their pkg-config files apparently
    println!("cargo:rustc-link-lib=static=png");
    // cairo needs this
    println!("cargo:rustc-link-lib=static=freetype");
    // cairo/freetype need this?
    // the freetype ChangeLog says the dependency graph looks like:
    // cairo => fontconfig => freetype2 => harfbuzz => cairo
    println!("cargo:rustc-link-lib=static=fontconfig");
    // cairo also needs this
    println!("cargo:rustc-link-lib=static=pixman-1");
    // fontconfig needs this (it's an XML parser)
    println!("cargo:rustc-link-lib=expat");
}
And then the build succeeds!
$ PKG_CONFIG_PATH=~/bearcove/poppler-build/prefix/lib64/pkgconfig cargo build
   Compiling pdftocairo v0.1.0 (/home/amos/bearcove/pdftocairo)
   Compiling glib-sys v0.14.0
   Compiling gobject-sys v0.14.0
   Compiling cairo-sys-rs v0.14.9
   Compiling gio-sys v0.14.0
   Compiling poppler-sys-rs v0.18.0 (/home/amos/bearcove/poppler-rs/sys)
   Compiling glib v0.14.8
   Compiling cairo-rs v0.14.9
   Compiling poppler-rs v0.18.2 (/home/amos/bearcove/poppler-rs/poppler)
    Finished dev [unoptimized + debuginfo] target(s) in 7.70s
As promised, it has "virtually no external dependencies":
$ ldd ./target/debug/pdftocairo
    linux-vdso.so.1 (0x00007ffe81bfe000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f6350a45000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6350a2b000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f6350821000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f6351d44000)
And also, deliciously chonky:
$ ls -lhA ./target/debug/pdftocairo
-rwxr-xr-x. 2 amos amos 73M Nov 26 14:31 ./target/debug/pdftocairo
(It was 48M when using dynamic linking).
Although, we have ways to make it smaller. By compressing debug sections:
$ objcopy --compress-debug-sections ./target/debug/pdftocairo /tmp/pdftocairo.compressed-debug
$ ls -lhA /tmp/pdftocairo.compressed-debug
-rwxr-xr-x. 1 amos amos 35M Nov 26 14:33 /tmp/pdftocairo.compressed-debug
(That variant was 16M with dynamic linking).
Or stripping debug info altogether:
$ objcopy --strip-all ./target/debug/pdftocairo /tmp/pdftocairo.stripped
$ ls -lhA /tmp/pdftocairo.stripped
-rwxr-xr-x. 1 amos amos 19M Nov 26 14:33 /tmp/pdftocairo.stripped
And that was just 5.3M with dynamic linking. So yeah! All that code isn't free, but we have a freaking PDF to SVG converter in a single, relatively self-contained binary now!
Running ./target/debug/pdftocairo yields the same result as before. Yay!
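The hand-written list in build.rs mirrors flags that `pkg-config --static --libs` already knows about, so the translation can be done mechanically. Here is a sketch of that mapping, in shell to match the build scripts; a real build script would print the same `cargo:` lines from Rust. The `flags_to_cargo` name and the policy of keeping libm, libc, libdl and libpthread dynamic are assumptions of this sketch, not something the crates above do.

```shell
#!/bin/sh
set -eu

# Translates linker flags into the directives a Cargo build script prints.
# Assumption: libm, libc, libdl and libpthread stay dynamic; every other
# -l flag is linked statically. Flags that are neither -L nor -l (such as
# -pthread) are ignored. Order and duplicates are left untouched.
flags_to_cargo() {
    for flag in "$@"; do
        case $flag in
            -L*) echo "cargo:rustc-link-search=native=${flag#-L}" ;;
            -lm|-lc|-ldl|-lpthread) echo "cargo:rustc-link-lib=${flag#-l}" ;;
            -l*) echo "cargo:rustc-link-lib=static=${flag#-l}" ;;
            *) ;;
        esac
    done
}

# A shortened version of the flags pkg-config printed for cairo above,
# with a made-up prefix.
flags_to_cargo -L/opt/prefix/lib64 -lcairo -pthread -lm -lpixman-1 -lz
```

Libraries that never show up in the pkg-config output, like stdc++ here, would still have to be added by hand.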
We talked earlier about having a couple options: another one is to patch the .pc files ourselves. But this isn't great either, because system-deps defaults to dynamic linking and patching it the "proper" way would involve upstreaming patches to a bunch of crates.
Once again, the world as a whole has kinda given up on static linking, so we're on our own here - unless we look into AppImage/snapcraft/flatpak.
In the world of C/C++ libraries, as far as build systems go, it's kind of a free-for-all. There are the old-school autoconf users, the great CMake / Meson divide, and then Bazel, Buck, and many others.
We made everything more complicated by really, really wanting a static build, which very few people care about, so we had to jump through more hoops than one would normally have to.
pkg-config is a tool that outputs the required compiler flags for compiling and linking against C/C++ libraries. It ships with Linux distributions and normally looks in a system prefix, like /usr/lib64/pkgconfig. But it also works with "custom prefixes" like we did.
Here too, static linking complicates things a little.