Productionizing our poppler build

This article is part of the Don't shell out! series.

I was a bit anxious about running our poppler meson build in CI, because it's the real test, you know? "Works on my machine" only goes so far, things have a tendency to break once you try to make them reproducible.

And I was right to worry... but not for the reasons I thought. As I tried to get everything to build in CI, there was PyPI maintenance that prevented me from installing meson, and then SourceForge was acting up.

Apart from that it was relatively smooth sailing? Let's run through the .circleci/config.yml file section by section.

```yaml
version: 2.1
```

This opts into the newest (at the time of this writing) version of CircleCI configs, which allows specifying workflows and stuff.

```yaml
orbs:
  win: circleci/windows@2.4.1
  aws-s3: circleci/aws-s3@3.0
```

Because we want to build on Windows, we'll need the circleci/windows orb, which gives us access to Windows executors. We'll also want to store artifacts on S3, and there's an orb for that too.

We've got three jobs: the Linux and Windows build (running in parallel), and finally an upload job:

```yaml
workflows:
  version: 2
  build:
    jobs:
      - x86_64-unknown-linux-gnu:
          context: [aws]
      - x86_64-pc-windows-msvc
      - upload:
          context: [aws]
          requires:
            - x86_64-unknown-linux-gnu
            - x86_64-pc-windows-msvc
```

Let's start with the Linux job, the most straightforward:

```yaml
jobs:
  x86_64-unknown-linux-gnu:
    docker:
      - image:
    steps:
      - checkout
      - run: |
          meson setup build --buildtype release --default-library static --prefix /tmp/poppler-prefix
          meson compile -C build
          meson install -C build
          tar -czf poppler-prefix-x86_64-unknown-linux-gnu.tar.gz -C /tmp poppler-prefix
      - persist_to_workspace:
          root: .
          paths: ["poppler-prefix-x86_64-unknown-linux-gnu.tar.gz"]
```

Passing --buildtype is important (a debug build is almost 3x as large). The prefix is installed in /tmp and we generate a .tar.gz file where everything is prefixed by poppler-prefix/.

That build runs in a Docker container I built specifically for this purpose: mostly it has a recent Python 3, a C/C++ toolchain, Ninja, and meson. It's using the same base I described in My ideal Rust workflow, it's just a separate target:

```dockerfile
##############################################
FROM base AS meson
##############################################

# Install python, C & C++ compiler, and ninja
RUN set -eux; \
    apt update; \
    apt install --yes --no-install-recommends \
        #
        # Python package manager (for latest meson)
        python3-pip \
        #
        # C & C++ compiler
        gcc g++ \
        #
        # Ninja build tool
        ninja-build \
        ;

# Install meson
RUN set -eux; \
    pip install meson \
    ;
```

Now onto the Windows pipeline! That one's fun, because it involves MSVC (aka Visual Studio C++). CircleCI's Windows executors have some version of MSVC installed already, but meson apparently can't find it without a little help.

The best language to use there is probably PowerShell, which we've done a little of already, and the little additional challenge is that... usually to "set up" an MSVC command-line environment, you'd call a batch file (or "source" it?). But this is PowerShell.

Luckily, someone with a much better handle on Batch and PowerShell and MSVC in general has solved that problem before, and so I was able to just steal (well, slightly adjust) their work to get it working. It involves this wonderful bit of PowerShell:

PowerShell session
```powershell
pushd 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build'
cmd /c "vcvarsall.bat x64&set" | foreach {
    if ($_ -match "=") {
        $v = $_.split("=")
        set-item -force -path "ENV:\$($v[0])" -value "$($v[1])"
    }
}
popd
write-host "`nVisual Studio Command Prompt variables set." -ForegroundColor Yellow
```

And here's the job definition itself:

```yaml
x86_64-pc-windows-msvc:
  executor:
    name: win/default
    size: medium
  steps:
    - checkout
    - run: |
        pip install "meson==0.60.2"
        .\.circleci\call-vcvarsall.ps1
        meson setup build --vsenv --buildtype release --default-library static --prefix C:/poppler-prefix
        meson compile -C build
        .\msvc-static-install.ps1 build C:/poppler-prefix
        tar -czf poppler-prefix-x86_64-pc-windows-msvc.tar.gz -C C:/ --exclude etc/fonts ./poppler-prefix
    - persist_to_workspace:
        root: "."
        paths: ["poppler-prefix-x86_64-pc-windows-msvc.tar.gz"]
```

Couple things here: --vsenv forces using MSVC (otherwise, if clang is detected, it will default to it). --buildtype and --default-library, we've already seen. Windows ships with an honest-to-cthulhu tar now, so we don't need to mess with .zip, and that leaves...

...what's with the --exclude?

Yes, that. Well... apparently fontconfig installs symlinks:

Shell session
```
$ tree -ah etc | sed 's/\/home\/amos\/bearcove\/poppler-prefix/PREFIX/'
etc
└── [  32]  fonts
    ├── [ 652]  conf.d
    │   ├── [  85]  10-hinting-slight.conf -> PREFIX/share/fontconfig/conf.avail/10-hinting-slight.conf
    │   ├── [  89]  10-scale-bitmap-fonts.conf -> PREFIX/share/fontconfig/conf.avail/10-scale-bitmap-fonts.conf
    │   ├── [  88]  11-lcdfilter-default.conf -> PREFIX/share/fontconfig/conf.avail/11-lcdfilter-default.conf
    │   ├── [  88]  20-unhint-small-vera.conf -> PREFIX/share/fontconfig/conf.avail/20-unhint-small-vera.conf
    │   ├── [  85]  30-metric-aliases.conf -> PREFIX/share/fontconfig/conf.avail/30-metric-aliases.conf
    │   ├── [  79]  40-nonlatin.conf -> PREFIX/share/fontconfig/conf.avail/40-nonlatin.conf
    │   ├── [  78]  45-generic.conf -> PREFIX/share/fontconfig/conf.avail/45-generic.conf
    │   ├── [  76]  45-latin.conf -> PREFIX/share/fontconfig/conf.avail/45-latin.conf
    │   ├── [  80]  49-sansserif.conf -> PREFIX/share/fontconfig/conf.avail/49-sansserif.conf
    │   ├── [  75]  50-user.conf -> PREFIX/share/fontconfig/conf.avail/50-user.conf
    │   ├── [  76]  51-local.conf -> PREFIX/share/fontconfig/conf.avail/51-local.conf
    │   ├── [  78]  60-generic.conf -> PREFIX/share/fontconfig/conf.avail/60-generic.conf
    │   ├── [  76]  60-latin.conf -> PREFIX/share/fontconfig/conf.avail/60-latin.conf
    │   ├── [  84]  65-fonts-persian.conf -> PREFIX/share/fontconfig/conf.avail/65-fonts-persian.conf
    │   ├── [  79]  65-nonlatin.conf -> PREFIX/share/fontconfig/conf.avail/65-nonlatin.conf
    │   ├── [  78]  69-unifont.conf -> PREFIX/share/fontconfig/conf.avail/69-unifont.conf
    │   ├── [  80]  80-delicious.conf -> PREFIX/share/fontconfig/conf.avail/80-delicious.conf
    │   ├── [  80]  90-synthetic.conf -> PREFIX/share/fontconfig/conf.avail/90-synthetic.conf
    │   └── [1009]  README
    └── [2.7K]  fonts.conf

2 directories, 20 files
```

(They're absolute, too! I guess they don't expect people to copy prefixes across different computers). And it caused problems further down the line for me, which I won't get into right now to avoid spoilers.

So we're just not packing these files! I honestly doubt we're even using those fontconfig files for anything - by the time we deal with PDF files, they don't have any text left in them, just shapes.
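If you ever want to audit a prefix for absolute symlinks before packaging it, a stdlib-only Rust sketch (hypothetical, not part of this pipeline) might look like this:

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Recursively collects symlinks under `root` whose targets are absolute
/// paths: exactly the kind that breaks when a prefix is copied to another
/// machine. `symlink_metadata` doesn't follow links, so links are detected
/// instead of traversed.
fn absolute_symlinks(root: &Path) -> std::io::Result<Vec<(PathBuf, PathBuf)>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        let meta = fs::symlink_metadata(&path)?;
        if meta.file_type().is_symlink() {
            let target = fs::read_link(&path)?;
            if target.is_absolute() {
                found.push((path, target));
            }
        } else if meta.is_dir() {
            found.extend(absolute_symlinks(&path)?);
        }
    }
    Ok(found)
}

fn main() {
    // check whatever directory was passed on the command line,
    // defaulting to the prefix used in this article
    let root = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "/tmp/poppler-prefix".into());
    match absolute_symlinks(Path::new(&root)) {
        Ok(found) => {
            for (link, target) in found {
                println!("{} -> {}", link.display(), target.display());
            }
        }
        Err(e) => eprintln!("could not walk {}: {}", root, e),
    }
}
```

Running it against the prefix above would have flagged every one of those `conf.d` entries.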

Then we have the third job, which is boring. And boring is nice, in this case:

```yaml
upload:
  docker:
    - image: 'cimg/python:3.10'
  steps:
    - attach_workspace:
        at: /tmp/workspace
    - aws-s3/copy:
        from: /tmp/workspace/poppler-prefix-x86_64-unknown-linux-gnu.tar.gz
        to: s3://bearcove-binaries/poppler-prefix/$CIRCLE_SHA1/
    - aws-s3/copy:
        from: /tmp/workspace/poppler-prefix-x86_64-pc-windows-msvc.tar.gz
        to: s3://bearcove-binaries/poppler-prefix/$CIRCLE_SHA1/
```

And... tada!

The resulting archives are a little chonk, but I'm not about to argue for a fistful of megabytes.

So, are we all done? All good?

Oh no we're not...

You mean to say... this wasn't a theoretical exercise? We're actually going to use these?

Yes we are...

Actually using this static poppler build

Well, easy right? Just make sure that before building any crates that depend on poppler, we download and extract them, and export PKG_CONFIG_PATH so that they're found and used. Right?


Oh no

The thing is...

Oh that's never a good sign.

Okay let's cut to the chase: yes, we can cobble CI pipelines together so that "it builds in CI". It's not that hard: we can add whatever we want as a shell script, so we can definitely run a curl/tar/export some vars and boom here we go.

But it's not enough.


No, it's not.


It's not! I want to be able to open those projects in VS Code and have them just build. The whole point of freeing ourselves of these dependencies is to not have to run a bunch of manual steps before we can be productive.


But even then! See, even today, as I had to set up salvage again, which involved installing libavif-tools to provide the avifenc CLI tool and... it was broken! It said something something ABI version mismatch I'm sad you can't have an avif file.

See, when you take on a dependency, it eventually becomes you prob-


Yes... and that was always the plan. But we're talking about SVG today.


And we're so close to the goal! All we need is a small build script and we'll be on our w-

NO. No. We don't "just" need "a small build script". Because there's the problem: our dependency tree looks like this right now, correct?

- salvage
  - poppler-rs
    - poppler-sys-rs
      - pkg-config-rs

(omitted: high-level and `-sys` crates for cairo, glib, etc.)

Well you're omitting a bunch of crates but su-

STAY ON TOPIC. It looks like this. And you know the build order there?

The uh.. the arrows go up?

PRECISELY. It'll build poppler-sys-rs first, which relies on pkg-config-rs to invoke pkg-config to find libpoppler-glib.a and friends.

And if you add a build script to salvage you know when it'll run?

Well it'll... ah. Uh.


It'll run after all the dependencies are already built. So exporting an environment variable from it would do sweet nothing. The build will have already failed, or picked up some dynamic libraries (if we're on Linux and we have development packages installed).

Well we could simply fork, uh, all the-

Oh you want to fork cairo-rs, cairo-sys-rs, glib-sys, glib, glib-macros, gobject-sys, and gio-sys?

When you put it like that... no, I don't really want to.

Right. So stop being a tryhard, cut your losses and QUIT IT WITH THE STATIC BUILDS.


Mhhhhhhhhhhh. Unless...

long sigh

...unless we somehow patch pkg-config-rs.

Imagine we had a pkg-config-hack/Cargo.toml like this...

TOML markup
```toml
[package]
name = "pkg-config"
version = "0.3.24"
edition = "2021"

[dependencies]
async-compression = { version = "0.3.8", features = ["gzip", "tokio"] }
aws-config = "0.3.0"
aws-sdk-s3 = "0.3.0"
aws-types = "0.3.0"
camino = { version = "1.0.5", features = ["serde1"] }
futures = "0.3.17"
tokio = { version = "1.15.0", features = ["full"] }
tokio-tar = "0.3.0"
tokio-util = { version = "0.6.9", features = ["io"] }
walkdir = "2.3.2"
color-eyre = "0.5.11"
named-lock = "0.1.1"
serde = { version = "1.0.132", features = ["derive"] }
serde_json = "1.0.73"

# this allows re-exporting most of the upstream pkg-config. note that we cannot
# use a simple dep, we have to use a git dep because of how we wrap it.
# also, if glib-sys etc. end up bumping their version of pkg-config-rs, that one
# will have to be bumped too.
[dependencies.upstream]
package = "pkg-config"
git = ""
rev = "49a4ac189aafa365167c72e8e503565a7c2697c2"
```

Then, in our top-level Cargo workspace, or package (if we're not using a workspace), we could do something like this...

TOML markup
```toml
# in `salvage/Cargo.toml`

[dependencies]
poppler-rs = "0.18.2"
cairo-rs = { version = "0.14.9", features = ["svg"] }

[patch.crates-io]
pkg-config = { git = "", rev = "4f06bd6" }
```

And then it would replace the pkg-config crate with our own crate.

Fine, sure, okay. Super crimey but okay. Then what?

Well, then we have full control of what's happening. For starters, we need to re-export most of the pkg-config API as-is, except for the Config struct, because it's the one doing the actual lookup:

Rust code
```rust
// in `poppler-meson-crates/pkg-config-hack/src/`
pub mod config;
pub use config::cargo_config;

mod prepare;

pub use upstream::{Error, Library};

pub struct Config {
    inner: upstream::Config,
}

impl Config {
    pub fn new() -> Self {
        Self {
            inner: upstream::Config::new(),
        }
    }

    pub fn atleast_version(&mut self, vers: &str) -> &mut Config {
        self.inner.atleast_version(vers);
        self
    }

    pub fn print_system_libs(&mut self, print: bool) -> &mut Config {
        self.inner.print_system_libs(print);
        self
    }

    pub fn cargo_metadata(&mut self, cargo_metadata: bool) -> &mut Config {
        self.inner.cargo_metadata(cargo_metadata);
        self
    }

    pub fn env_metadata(&mut self, env_metadata: bool) -> &mut Config {
        self.inner.env_metadata(env_metadata);
        self
    }

    pub fn statik(&mut self, statik: bool) -> &mut Config {
        self.inner.statik(statik);
        self
    }

    pub fn probe(&self, name: &str) -> Result<Library, Error> {
        println!("cargo:warning=Probing library {}", name);
        prepare::prepare_pkgconfig_prefix();
        let res = self.inner.probe(name)?;
        Ok(res)
    }
}
```


And well, you can see in Config::probe, before we let the actual pkg-config crate call the pkg-config CLI utility, we call prepare.

And that one's a little complicated, but I'm sure y'all can make sense of it.

Rust code
```rust
// in `poppler-meson-crates/pkg-config-hack/src/`
use async_compression::tokio::bufread::GzipDecoder;
use aws_sdk_s3::Client;
use camino::{Utf8Path, Utf8PathBuf};
use color_eyre::{eyre::eyre, Report};
use futures::TryStreamExt;
use named_lock::NamedLock;
use serde::{Deserialize, Serialize};
use std::{env::temp_dir, io, sync::Once, time::Instant};
use tokio::io::BufReader;
use tokio_util::io::StreamReader;
use walkdir::WalkDir;

static COLOR_EYRE_ONCE: Once = Once::new();

pub fn prepare_pkgconfig_prefix() {
    // Make sure multiple build scripts don't try to prepare the prefix at the same time
    let lock = NamedLock::create("pkg-config-hack-prepare").unwrap();
    let before_lock = Instant::now();
    let _guard = lock.lock().unwrap();
    println!(
        "cargo:warning=Acquired lock after {:?}",
        before_lock.elapsed()
    );

    // Make color-eyre spit out useful errors
    COLOR_EYRE_ONCE.call_once(|| {
        std::env::set_var("RUST_LIB_BACKTRACE", "1");
        color_eyre::install().unwrap();
    });

    let pkg_config_path = download_and_extract_prefix().unwrap();
    println!("cargo:warning=Setting PKG_CONFIG_PATH={}", pkg_config_path);
    std::env::set_var("PKG_CONFIG_PATH", pkg_config_path);
}

const FORMAT_VERSION: u64 = 1;
const POPPLER_MESON_VERSION: &str = "8c4735aba88cfa81bf43a6648f6864862dfa495c";

/// Downloads a prefix containing a static build of poppler and its dependencies
/// (cairo, glib, etc.), and returns the absolute path to a pkg-config path.
#[tokio::main]
async fn download_and_extract_prefix() -> Result<Utf8PathBuf, Report> {
    // (Note: we're using tokio main to start an async executor just for
    // this function, since the AWS S3 SDK requires one.)
    let temp_dir = Utf8PathBuf::try_from(temp_dir()).unwrap();
    let work_dir = temp_dir.join("pkg-config-hack");
    let prefix_dir = work_dir.join("poppler-prefix");
    println!("Will download to prefix path {}", work_dir);

    // Have we already prepared the right version?
    let ticket_path = work_dir.join("ticket.json");
    match Ticket::read(&ticket_path).await {
        Ok(ticket) => {
            if ticket.up_to_date() {
                return Ok(ticket.pkg_config_path.clone());
            }
        }
        Err(e) => {
            println!("cargo:warning=Could not read ticket: {}", e);
        }
    }

    if is_dir(&work_dir).await {
        // This should only fail if multiple processes are messing with
        // the prefix at the same time, which would mean our named lock
        // doesn't work.
        tokio::fs::remove_dir_all(&work_dir).await?;
    }
    tokio::fs::create_dir_all(&work_dir).await?;

    let target = std::env::var("TARGET").unwrap();
    println!("Building for target {}", target);

    let shared_config = aws_config::from_env().load().await;
    let client = Client::new(&shared_config);
    if shared_config.region().is_none() {
        panic!("AWS_REGION (and friends) must be set for pkg-config-hack to work");
    }
    println!("AWS region: {}", shared_config.region().unwrap());

    let key = format!(
        "poppler-prefix/{}/poppler-prefix-{}.tar.gz",
        POPPLER_MESON_VERSION, target
    );
    println!("Fetching ({})", key);
    let resp = client
        .get_object()
        .bucket("bearcove-binaries")
        .key(&key)
        .send()
        .await?;
    println!("Resp = {:?}", resp);

    // AWS error => std::io::Error
    let body = resp
        .body
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e));
    // Stream<Bytes> => AsyncRead
    let body = StreamReader::new(body);
    // AsyncRead => AsyncBufRead
    let body = BufReader::new(body);
    // decompress gzip
    let body = GzipDecoder::new(body);
    // open tar archive
    let mut archive = tokio_tar::Archive::new(body);
    println!("Unpacking all entries...");
    archive.unpack(&work_dir).await?;

    println!("Patching .pc files...");
    for entry in WalkDir::new(&work_dir) {
        let entry = entry?;
        let path: Utf8PathBuf = entry.path().to_path_buf().try_into()?;
        if let Some("pc") = path.extension() {
            println!("Should patch {}", path);
            let old_contents = tokio::fs::read_to_string(&path).await?;
            let new_contents = old_contents
                // for Linux
                .replace("/tmp/poppler-prefix", prefix_dir.as_str())
                // for Windows
                .replace("C:/poppler-prefix", prefix_dir.as_str());
            tokio::fs::write(&path, &new_contents).await?;
        }
    }

    // Okay, prefix should exist now...
    assert!(is_dir(&prefix_dir).await);

    // The Linux build uses lib64 for some reason, even though it's built on
    // Ubuntu, which is a Debian derivative, which is never supposed to have
    // lib64 - if I had to pick some tool to blame, it'd be meson, but I
    // don't have to, thank Cthulhu.
    for libdir in ["lib", "lib64"] {
        let pkg_config_path = prefix_dir.join(libdir).join("pkgconfig");
        if is_dir(&pkg_config_path).await {
            let ticket = Ticket {
                format_version: FORMAT_VERSION,
                poppler_meson_version: POPPLER_MESON_VERSION.to_string(),
                pkg_config_path: pkg_config_path.clone(),
            };
            ticket.write(&ticket_path).await?;
            println!("cargo:warning=Writing ticket: {:?}", ticket);
            return Ok(pkg_config_path);
        }
    }

    Err(eyre!(
        "pkgconfig dir not found, is the prefix even valid? (see file listing above)"
    ))
}

/// Returns true if the path is a directory that we can read. Errors out if
/// it's anything other than a directory, or we couldn't get its metadata (b/c
/// permissions, I/O error, anything else)
async fn is_dir<P: AsRef<Utf8Path>>(path: P) -> bool {
    let path = path.as_ref();
    let res = matches!(tokio::fs::metadata(path).await, Ok(meta) if meta.is_dir());
    println!("is {} a dir? {}", path, res);
    res
}

#[derive(Debug, Serialize, Deserialize)]
struct Ticket {
    format_version: u64,
    poppler_meson_version: String,
    pkg_config_path: Utf8PathBuf,
}

impl Ticket {
    async fn read<P: AsRef<Utf8Path>>(ticket_path: P) -> Result<Self, Report> {
        let serialized = tokio::fs::read(ticket_path.as_ref()).await?;
        Ok(serde_json::from_slice(&serialized[..])?)
    }

    async fn write<P: AsRef<Utf8Path>>(&self, ticket_path: P) -> Result<(), Report> {
        let serialized = serde_json::to_vec_pretty(self)?;
        tokio::fs::write(ticket_path.as_ref(), &serialized[..]).await?;
        Ok(())
    }

    fn up_to_date(&self) -> bool {
        if self.format_version != FORMAT_VERSION {
            return false;
        }
        if self.poppler_meson_version != POPPLER_MESON_VERSION {
            return false;
        }
        true
    }
}
```

That... that could be a series of its own, couldn't it?

Yes it could! But it's not that complicated, basically we:

  1. Make sure we're the only process trying to set up the prefix at any given time, using the named-lock crate.
  2. Determine a "stable" temporary directory, something like /tmp/pkg-config-hack on Linux
  3. Check for any pre-existing "install ticket" that matches what we're trying to install
  4. If there is such a ticket, we're done!
  5. If there isn't, or we can't read it, we use the official AWS Rust SDK to download the .tar.gz, buffer it, decompress it, and unpack it as a tar archive.
  6. While we're at it, we "fix" a bunch of absolute paths in the installed .pc files
  7. And finally we write an install ticket
  8. ...and set PKG_CONFIG_PATH to something like /tmp/pkg-config-hack/poppler-prefix/lib/pkgconfig
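Step 8 only works because of how environments propagate: a variable set in the build-script process is inherited by any child process spawned afterwards, including the pkg-config binary that the upstream crate eventually invokes. Here's a minimal stdlib-only sketch of that mechanism (not from the actual crate):

```rust
use std::process::Command;

/// Sets PKG_CONFIG_PATH in the current process, then asks a child process
/// (plain `sh` here) what it sees. The pkg-config hack relies on exactly
/// this: the upstream crate later spawns the real `pkg-config` binary as a
/// child, so it inherits the variable we set from Rust code.
fn child_sees_after_set_var(value: &str) -> String {
    std::env::set_var("PKG_CONFIG_PATH", value);
    let output = Command::new("sh")
        .args(["-c", "printf '%s' \"$PKG_CONFIG_PATH\""])
        .output()
        .expect("failed to spawn sh");
    String::from_utf8_lossy(&output.stdout).into_owned()
}

fn main() {
    let seen = child_sees_after_set_var("/tmp/pkg-config-hack/poppler-prefix/lib/pkgconfig");
    println!("child saw PKG_CONFIG_PATH={}", seen);
}
```

Note that a build script which merely printed `cargo:` directives couldn't pull this off for sibling crates: each build script runs in its own process, which is the whole reason for hijacking the pkg-config crate itself.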

Easy right?

That is terrifying.

I mean... we have a terrifying number of build dependencies now (maybe bringing in all of tokio, hyper, a Rust gzip and tar implementation, the S3 SDK, etc., is a tiny bit of overkill), but I actually think it's pretty readable!

So now all we have to do is actually use it in salvage, say, maybe we do this:

Rust code
```rust
// in `salvage/src/`
use cairo::{Context, SvgSurface};
use color_eyre::eyre::{self, eyre};
use poppler::{Document, Rectangle};
use std::{fs::File, path::Path};

#[derive(Clone)]
pub struct Poppler {}

impl Poppler {
    pub fn new() -> Self {
        Self {}
    }

    pub fn pdf_to_svg(&self, input: &Path, output: &Path) -> Result<(), eyre::Error> {
        let pdf_bytes = std::fs::read(input)?;
        let doc = Document::from_data(&pdf_bytes[..], None)?;
        // NB: the page-lookup expression was lost in the source; grabbing
        // the first page is an assumption
        let page =
            .ok_or_else(|| eyre!("PDF has no pages"))?;

        let mut bb: Rectangle = Default::default();
        page.get_bounding_box(&mut bb);

        let out = File::create(&output)?;
        let surface = SvgSurface::for_stream(bb.x2 - bb.x1, bb.y2 - bb.y1, out)?;
        let cx = Context::new(&surface)?;
        page.render(&cx);
        surface
            .finish_output_stream()
            .map_err(|e| eyre!("cairo error: {}", e.to_string()))?;

        Ok(())
    }
}
```

And now we... get a bunch of linking errors.

Ah, right, remember how the installed .pc files are not quite flexible enough to allow static linking? Well, that.
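For context, a .pc file is a small, ini-style description of how to link against a library. Here's a simplified, hypothetical poppler-glib.pc (not the real installed file) showing the relevant knobs: pkg-config only emits Libs.private when invoked with --static, and upstream projects don't always list everything a fully static link needs there, which is how we end up adding flags by hand:

```
# hypothetical, simplified poppler-glib.pc
prefix=/tmp/poppler-prefix
libdir=${prefix}/lib
includedir=${prefix}/include

Name: poppler-glib
Description: GLib wrapper for poppler
Version: 21.10.0
Requires: glib-2.0 gobject-2.0 cairo
# always emitted:
Libs: -L${libdir} -lpoppler-glib
# only emitted by `pkg-config --static`:
Libs.private: -lpoppler -lstdc++
Cflags: -I${includedir}/poppler/glib
```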

Well, no worries! We can just expose a config module from pkg-config-hack:

Rust code
```rust
// in `poppler-meson-crates/pkg-config-hack/src/`

/// Prints required flags to build against a static build of poppler.
pub fn cargo_config() {
    let target = std::env::var("TARGET").unwrap();
    let is_msvc = target.contains("msvc");

    // poppler-glib requires poppler
    println!("cargo:rustc-link-lib=static=poppler");

    if is_msvc {
        // on windows-msvc everything is fine, we just need a couple libraries

        // for CommandLineToArgvW, SHGetKnownFolderPath
        println!("cargo:rustc-link-lib=shell32");
        // for CoTaskMemFree
        println!("cargo:rustc-link-lib=ole32");
        // for C++ stuff
        println!("cargo:rustc-link-lib=vcruntime");
    } else {
        // that's where it goes on Fedora I guess ¯\_(ツ)_/¯
        // (doesn't hurt other distros)
        println!("cargo:rustc-link-search=native=/usr/lib/gcc/x86_64-redhat-linux/11/");
        // on linux, we want to link statically with the standard C++ library.
        println!("cargo:rustc-link-lib=static=stdc++");
    }

    // nobody bothers including this in their pkg-config files apparently
    println!("cargo:rustc-link-lib=static=png16");
    // cairo needs this
    println!("cargo:rustc-link-lib=static=freetype");
    // cairo/freetype need this?
    // the freetype ChangeLog says the dependency graph looks like:
    // cairo => fontconfig => freetype2 => harfbuzz => cairo
    println!("cargo:rustc-link-lib=static=fontconfig");
    // cairo also needs this
    println!("cargo:rustc-link-lib=static=pixman-1");
    // fontconfig needs this (it's an XML parser)
    println!("cargo:rustc-link-lib=static=expat");
}
```

And then we just add a build script for salvage that calls that:

Rust code
```rust
// in `salvage/
fn main() {
    pkg_config::config::cargo_config();
}
```

Let's not forget to add pkg-config as a build dependency...

Shell session
```
$ cargo add -B pkg-config
    Updating '' index
      Adding pkg-config v0.3.24 to build-dependencies
```

And BOOM, it builds!

Shell session
```
$ cargo build
   Compiling salvage v1.3.0 (/home/amos/bearcove/salvage)
error: linking with `cc` failed: exit status: 1
  |
  = note: "cc" "-m64" (cut) "-Wl,-Bdynamic" "-lpoppler" "-lpoppler-glib" "-lgobject-2.0" "-lffi" "-lglib-2.0" "-lcairo" "-ldl" "-lpng16" "-lz" "-lfontconfig" "-lfreetype" "-lexpat" "-lpixman-1" "-lm" "-lgio-2.0" "-lresolv" "-lz" "-lgobject-2.0" "-lffi" "-lgmodule-2.0" "-ldl" "-lglib-2.0" "-lm" "-lgobject-2.0" "-lffi" "-lglib-2.0" "-lm" "-lcairo-gobject" "-lgobject-2.0" "-lffi" "-lglib-2.0" "-lcairo" "-ldl" "-lpng16" "-lz" "-lfontconfig" "-lfreetype" "-lexpat" "-lpixman-1" "-lm" "-lgobject-2.0" "-lffi" "-lglib-2.0" "-lm" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-L" "/home/amos/.rustup/toolchains/1.57.0-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/home/amos/bearcove/salvage/target/debug/deps/salvage-622b377b8581b717" "-Wl,--gc-sections" "-pie" "-Wl,-zrelro" "-Wl,-znow" "-nodefaultlibs" "-fuse-ld=lld"
  = note: ld.lld: error: undefined symbol: __res_nquery
          >>> referenced by gthreadedresolver.c
          >>>               gthreadedresolver.c.o:(do_lookup_records) in archive /tmp/pkg-config-hack/poppler-prefix/lib64/libgio-2.0.a
          >>> did you mean: __res_nquery@GLIBC_2.2.5
          >>> defined in: /lib64/

          ld.lld: error: undefined symbol: __dn_expand
          >>> referenced by gthreadedresolver.c
          >>>               gthreadedresolver.c.o:(do_lookup_records) in archive /tmp/pkg-config-hack/poppler-prefix/lib64/libgio-2.0.a
          >>> referenced by gthreadedresolver.c
          >>>               gthreadedresolver.c.o:(do_lookup_records) in archive /tmp/pkg-config-hack/poppler-prefix/lib64/libgio-2.0.a
          >>> referenced by gthreadedresolver.c
          >>>               gthreadedresolver.c.o:(do_lookup_records) in archive /tmp/pkg-config-hack/poppler-prefix/lib64/libgio-2.0.a
          >>> referenced 4 more times
          >>> did you mean: __dn_expand@GLIBC_2.2.5
          >>> defined in: /lib64/

          collect2: error: ld returned 1 exit status

error: could not compile `salvage` due to previous error
```

Bwahahahah no it doesn't.

It doesn't.

And that one's a little nasty...

See, that libgio-2.0.a was built on Ubuntu 20.04. And right now I'm trying to build something against it, from Fedora 35.

And on Ubuntu, that symbol is there:

Shell session
```
$ docker run --rm -it ubuntu:20.04 /bin/bash
root@98e93b395dfa:/# apt update && apt install -y --no-install-recommends binutils
(cut)
root@98e93b395dfa:/# nm -D /usr/lib/x86_64-linux-gnu/ | grep dn_expand
00000000000047f0 T __dn_expand
root@98e93b395dfa:/#
```

But on Fedora... it's named something else:

Shell session
```
$ docker run --rm -it fedora:35 /bin/bash
[root@91a05a86529e /]# nm -D /usr/lib64/ | grep dn_expand
bash: nm: command not found
[root@91a05a86529e /]# dnf provides nm
Fedora 35 - x86_64                        30 MB/s |  79 MB    00:02
Fedora 35 openh264 (From Cisco) - x86_64 2.5 kB/s | 2.5 kB    00:01
Fedora Modular 35 - x86_64               4.8 MB/s | 3.3 MB    00:00
Fedora 35 - x86_64 - Updates              22 MB/s |  17 MB    00:00
Fedora Modular 35 - x86_64 - Updates     4.0 MB/s | 2.8 MB    00:00
binutils-2.37-10.fc35.i686 : A GNU collection of binary utilities
Repo        : fedora
Matched from:
Filename    : /usr/bin/nm

binutils-2.37-10.fc35.x86_64 : A GNU collection of binary utilities
Repo        : fedora
Matched from:
Filename    : /usr/bin/nm

[root@91a05a86529e /]# dnf install binutils
(cut)
[root@91a05a86529e /]# nm -D /usr/lib64/ | grep dn_expand
                 U __libc_dn_expand@GLIBC_PRIVATE
```

In fact, it's not even provided by

Shell session
```
(continued)
[root@91a05a86529e /]# ldd /usr/lib64/ (0x00007ffd3b872000) => /lib64/ (0x00007f8d14459000)
        /lib64/ (0x00007f8d1467b000)
[root@91a05a86529e /]# nm -D /lib64/ | grep __libc_dn_expand
000000000012eae0 T __libc_dn_expand@@GLIBC_PRIVATE
```

...but by

So, what is a bear to do?

Oh, me? I've mentally checked out several pages ago.

Well, we could statically link against libresolv, or patch... gio, I guess? To not call those functions. But neither of these sounds really appealing right now, when we can just commit more code crimes.

Crimes, crimes, crimes!

See, we don't actually need dn_expand - it's DNS-related, and we sure never expect poppler, by way of gio, to access the network. So we don't need DNS.

So we could just... stub it.

C code
```c
// in `salvage/src/screw-libresolv.c`
#include <stdio.h>
#include <stdlib.h>

static void bail(void) {
    printf(
        "The program's about to abort. I'm sure you're wondering why?\n"
        "\n"
        "Well, it's DNS. it's *always* DNS.\n"
        "See, this program links against poppler-glib, which links against glib,\n"
        "which includes gio, which includes a DNS resolver. so it links against\n"
        "libresolv.\n"
        "\n"
        "Who provides libresolv, you ask? Depends who you ask!\n"
        "\n"
        "On some platforms it's ISC, as part of BIND. On others, it's part of libc.\n"
        "It exposes symbols like dn_expand. On Fedora, the _actual_ symbol name\n"
        "(provided by the static or dynamic library) is __libc_dn_expand. On Ubuntu\n"
        "it's __dn_expand. Note that the C function name is just 'dn_expand'.\n"
        "\n"
        "That means if you build gio on Ubuntu, it'll expect the __dn_expand symbol\n"
        "to exist. Then if you try to link against gio on Fedora, it'll fail because\n"
        "the actual symbol name will be __libc_dn_expand (in the dynamic library it's\n"
        "even private (via @GLIBC_PRIVATE)).\n"
        "\n"
        "How do we resolve this? Well, WE DON'T EVEN NEED DNS. At least not from gio.\n"
        "So, if we expose our own dummy symbols... that just abort... it should build,\n"
        "and nobody should ever have to read this!\n"
    );
    abort();
}

void __attribute__((weak)) __dn_expand(void) { bail(); }
void __attribute__((weak)) __res_nquery(void) { bail(); }
```

Ohhhh and because it's a weak symbol it won't mess with an existing one?


So we just build it and link it...

Shell session
```
$ cargo add -B cc
    Updating '' index
      Adding cc v1.0.72 to build-dependencies
```
Rust code
```rust
// in `salvage/
fn main() {
    // 👇 new!
    if std::env::var("TARGET").unwrap().contains("linux-gnu") {
        cc::Build::new()
            .file("src/screw-libresolv.c")
            .warnings(false)
            .compile("screw-libresolv");
    }

    pkg_config::config::cargo_config();
}
```

And just like that, it builds. It clocks in at 20MB, but it builds. (18MB with compressed debug sections, 14MB stripped).

More Windows sadness

It doesn't build on Windows though... Well, it builds! It just doesn't link. And that's because our various -sys crates, which are generated by gtk-rs/gir, have snippets like these!

Rust code
```rust
// in `poppler-sys-rs/src/`
#[link(name = "poppler-glib")]
#[link(name = "poppler")]
extern "C" {
    // a whole bunch of functions
}
```

And on Linux, this is fine! The toolchains there will pick up a .a like they would an .so, and they'll do the right thing.

But on Windows... that's not the case. I'm not sure exactly which tool is responsible, but at some point someone decides the symbol we want isn't poppler_document_new_from_data, it's actually __imp_poppler_document_new_from_data: as if poppler-glib.lib was an import library for poppler-glib.dll, and not a static library.

If we want static linking to succeed on Windows, we need to change it to something like this:

Rust code
```rust
// in `poppler-sys-rs/src/`
#[link(name = "poppler-glib", kind = "static")]
#[link(name = "poppler", kind = "static")]
extern "C" {
    // a whole bunch of functions
}
```

Won't that break dynamic linking?

Of course it will! And that's why what we really want is to be able to conditionally use either static or dynamic linking. One way to do it (which I don't love, but we make do) is with cargo features!

If poppler-sys-rs had a static feature, we could have:

Rust code
```rust
// in `poppler-sys-rs/src/`
#[cfg_attr(feature = "static", link(name = "poppler", kind = "static"))]
#[cfg_attr(feature = "static", link(name = "poppler-glib", kind = "static"))]
#[cfg_attr(not(feature = "static"), link(name = "poppler"))]
#[cfg_attr(not(feature = "static"), link(name = "poppler-glib"))]
extern "C" {
    // a whole bunch of functions
}
```

And we could even get gir to generate those!

If only someone... contributed that feature upstream...

Oh no no no not ag-

Looking into gtk-rs/gir

...fuck's sake.

So the first wrinkle is that, at the time of this writing, there's gir 0.14, and gir's default branch, and they generate incompatible code. So we'll need to regenerate all crates.

Weren't we planning on regenerating all of them anyway? (So they build correctly?)

Yeah. So it's not a big deal.

It wasn't too hard to find where #[link] attributes are generated:

Rust code
```rust
// in `gir/src/codegen/` (yes, with the trailing underscore)
fn write_link_attr(w: &mut dyn Write, shared_libs: &[String]) -> Result<()> {
    for it in shared_libs {
        writeln!(
            w,
            "#[link(name = \"{}\")]",
            shared_lib_name_to_link_name(it)
        )?;
    }

    Ok(())
}
```

So, we just need to change that!

Rust code
```rust
fn write_link_attr(w: &mut dyn Write, shared_libs: &[String]) -> Result<()> {
    for it in shared_libs {
        let link_name = shared_lib_name_to_link_name(it);
        writeln!(
            w,
            r#"#[cfg_attr(feature = "static", link(name = "{}", kind = "static"))]"#,
            link_name
        )?;
        writeln!(
            w,
            r#"#[cfg_attr(not(feature = "static"), link(name = "{}"))]"#,
            link_name
        )?;
    }

    Ok(())
}
```

Let's look at the diff before and after that change on poppler-sys-rs, since I own that crate:

```
$ git diff
(cut)
-#[link(name = "poppler-glib")]
-#[link(name = "poppler")]
+#[cfg_attr(feature = "static", link(name = "poppler-glib", kind = "static"))]
+#[cfg_attr(not(feature = "static"), link(name = "poppler-glib"))]
+#[cfg_attr(feature = "static", link(name = "poppler", kind = "static"))]
+#[cfg_attr(not(feature = "static"), link(name = "poppler"))]
```


Next, we need to actually define the feature in Cargo.toml, and have it enable the corresponding feature in the other crates.

Here's part of the code in gir that generates Cargo.toml files:

Rust code
// in `gir/src/codegen/sys/`
fn fill_in(root: &mut Table, env: &Env) {
    // (cut)
    {
        let features = upsert_table(root, "features");
        let versions = collect_versions(env);
        versions.keys().fold(None::<Version>, |prev, &version| {
            let prev_array: Vec<Value> =
                get_feature_dependencies(version, prev, &env.config.feature_dependencies)
                    .iter()
                    .map(|s| Value::String(s.clone()))
                    .collect();
            features.insert(version.to_feature(), Value::Array(prev_array));
            Some(version)
        });
        features.insert(
            "dox".to_string(),
            Value::Array(
                env.config
                    .dox_feature_dependencies
                    .iter()
                    .map(|s| Value::String(s.clone()))
                    .collect(),
            ),
        );
    }
    // (cut)
}

You can see it's using an upsert_table helper: it modifies the manifest in-place rather than regenerating it wholesale, because those manifests are meant to be editable by humans too.

If we just add this:

Rust code
features.insert(
    "static".to_string(),
    Value::Array(
        env.config
            .external_libraries
            .iter()
            .map(|l| Value::String(format!("{}/static", l.crate_name)))
            .collect(),
    ),
);

We're good to go! Here's the PR I opened, which as far as I can tell won't land this year.

Oh, that's cute.

But yeah, I've regenerated everything with it, in bearcove/gtk-rs-core and bearcove/poppler-rs, making sure poppler references the newer crates and has a static feature of its own:

TOML markup
# in `poppler-rs/poppler-rs/Cargo.toml`
[package]
name = "poppler"
version = "0.1.0"
edition = "2021"

[dependencies]
libc = "0.2.107"
bitflags = "1.3.2"

[dependencies.glib]
package = "glib"
version = "0.15.0"
git = ""
branch = "amos/static-build"

[dependencies.cairo]
package = "cairo-rs"
version = "0.15.0"
git = ""
branch = "amos/static-build"

[dependencies.ffi]
package = "poppler-sys"
git = ""
branch = "amos/static-build"

[features]
static = ["ffi/static"]

And with all those changes, and our custom poppler prefix, it's actually almost trivial to make a static build of a simple app, like that one:

TOML markup
# in `poppler-rs/pdftocairo/Cargo.toml`
[package]
name = "pdftocairo"
version = "0.1.0"
edition = "2021"

[dependencies]
camino = "1.0.5"
color-eyre = "0.5.11"
poppler = { path = "../poppler-rs" }
tracing = "0.1.29"
tracing-error = "0.2.0"
tracing-subscriber = { version = "0.3.1", features = ["env-filter"] }

[features]
static = ["poppler/static"]
Rust code
// in `poppler-rs/pdftocairo/src/`
use camino::Utf8PathBuf;
use color_eyre::Report;
use tracing::info;

#[cfg_attr(feature = "static", link(name = "stdc++", kind = "static"))]
extern "C" {}

fn main() -> Result<(), Report> {
    if std::env::var("RUST_LOG").is_err() {
        std::env::set_var("RUST_LOG", "info");
    }
    color_eyre::install()?;
    install_tracing();

    let path = Utf8PathBuf::from("/tmp/export.pdf");
    info!(%path, "Reading file...");
    let data = std::fs::read(&path)?;
    info!(%path, "Reading file... done!");

    let doc = poppler::Document::from_data(&data[..], None)?;
    info!("Got the document! {:#?}", doc);
    info!("Producer = {:#?}", doc.producer());
    info!("Num pages = {:#?}", doc.n_pages());

    Ok(())
}

fn install_tracing() {
    use tracing_error::ErrorLayer;
    use tracing_subscriber::prelude::*;
    use tracing_subscriber::{fmt, EnvFilter};

    let fmt_layer = fmt::layer();
    let filter_layer = EnvFilter::try_from_default_env()
        .or_else(|_| EnvFilter::try_new("info"))
        .unwrap();

    tracing_subscriber::registry()
        .with(filter_layer)
        .with(fmt_layer)
        .with(ErrorLayer::default())
        .init();
}

Hey... that doesn't actually use cairo.

Uhh true. PRs welcome?

The #[link] attribute above is the only thing not covered by the .pc files themselves.

After that, we can do:

Shell session
$ PKG_CONFIG_ALL_STATIC=1 PKG_CONFIG_PATH=/home/amos/bearcove/prefix/lib64/pkgconfig cargo build --verbose --features static

And get an all-static, and gigantic, executable:

Shell session
$ ldd ./target/debug/pdftocairo
        (0x00007ffc77d63000)
        => /lib64/ (0x00007fac18f55000)
        => /lib64/ (0x00007fac18e8a000)
        => /lib64/ (0x00007fac18dae000)
        => /lib64/ (0x00007fac18d93000)
        => /lib64/ (0x00007fac18b89000)
        /lib64/ (0x00007fac19de3000)
        => /lib64/ (0x00007fac18b76000)
        => /lib64/ (0x00007fac18b3b000)
        => /lib64/ (0x00007fac18a65000)
        => /lib64/ (0x00007fac18a57000)
        => /lib64/ (0x00007fac1891c000)
        => /lib64/ (0x00007fac188fb000)
        => /lib64/ (0x00007fac188d8000)
        => /lib64/ (0x00007fac1885e000)

Ah, well, not really. A bunch of libraries escaped there, so I guess we'll stick with our awful tricks for the time being.

What about Windows?

But what about Windows? This is what we did all this for, after all...

Well, with this little PowerShell script:

PowerShell script
# in `poppler-rs/pdftocairo/build.ps1`
$env:PKG_CONFIG_ALL_STATIC = "1"
$env:PKG_CONFIG_PATH = "C:/Users/amos/AppData/Local/temp/pkg-config-hack/poppler-prefix/lib/pkgconfig"
cargo build --features static

And changing our little extern "C" block to this:

Rust code
// in `poppler-rs/pdftocairo/src/`
#[cfg_attr(
    all(feature = "static", target_os = "linux"),
    link(name = "stdc++", kind = "static")
)]
#[cfg_attr(all(feature = "static", target_os = "windows"), link(name = "shell32"))]
#[cfg_attr(all(feature = "static", target_os = "windows"), link(name = "ole32"))]
extern "C" {}

Then it builds!

PowerShell session
$ .\build.ps1
warning: unused import: `glib::StaticType`
  --> C:\Users\amos\bearcove\poppler-rs\poppler-rs\src\auto\
   |
11 | use glib::StaticType;
(cut)

$ .\build.ps1
   Compiling serde v1.0.132
(cut)
   Compiling poppler v0.1.0 (C:\Users\amos\bearcove\poppler-rs\poppler-rs)
   Compiling pdftocairo v0.1.0 (C:\Users\amos\bearcove\poppler-rs\pdftocairo)
    Finished dev [unoptimized + debuginfo] target(s) in 17.27s

Interestingly, it's only 7.8MB!

Using Dependencies (a spiritual successor to Dependency Walker, but that one freezes when I give it my executable 🙃), we can see it only depends on system Windows DLLs:

The productionized build

Now for poppler-meson-crates — this repo contains all the build script hackery that extracts the prefix from S3, so it should work out of the box. Just need to switch to the static-friendly poppler-rs crate:

TOML markup
# in `poppler-meson-crates/poppler-sample/Cargo.toml` [dependencies.poppler] git = "" branch = "amos/static-build" features = ["static"]

And that's all the changes we need.

PowerShell session
$ cargo build
   Compiling proc-macro2 v1.0.34
   Compiling unicode-xid v0.2.2
(cut)
   Compiling cairo-rs v0.15.0 (
   Compiling poppler v0.1.0 (
    Finished dev [unoptimized + debuginfo] target(s) in 1m 36s

That's right. It finally all works out of the box:

PowerShell session
$ .\target\debug\poppler-sample.exe
Producer = Some(
    "Skia/PDF m94",
)

And now for the real thing

This series took so long to write (it's been two and a half months!) that I had time to reinstall my setup from scratch again. This time, it was because WSL2 had gotten on my nerves one too many times, and I decided to switch back to "just giving 32GiB of RAM to a Fedora VM in VMWare".

And with that setup, I couldn't run the desktop app easily. I mean... I could, by just exporting DISPLAY=:0, so it would connect to the display server that was running inside the VM (tucked away in another Windows virtual desktop).

But it was infuriating to have to do that when I had gotten truly headless exports to work. So I just went ahead and folded my experimental "use headless chrome for .drawio -> .pdf" and "use poppler for .pdf -> .svg" code into the main branch of salvage (my command-line asset processor).

And it stopped building on Windows! Because of that static business. So, if we just change the dependencies a little... does it build now?

TOML markup
# in `futile/Cargo.toml`
[dependencies.poppler]
git = ""
branch = "amos/static-build"
features = ["static"]

[dependencies.cairo-rs]
git = ""
branch = "amos/static-build"
features = ["static", "svg"]
PowerShell session
$ cargo build
   Compiling glib v0.15.0 (
   Compiling cairo-rs v0.15.0 (
   Compiling poppler v0.1.0 (
(cut)
   |
20 |         page.get_bounding_box(&mut bb);
   |              ^^^^^^^^^^^^^^^^ method not found in `poppler::Page`

error[E0599]: no method named `render` found for struct `poppler::Page` in the current scope
  --> src\commands\
   |
25 |         page.render(&cx);
   |              ^^^^^^ method not found in `poppler::Page`

Some errors have detailed explanations: E0432, E0599.
For more information about an error, try `rustc --explain E0432`.
error: could not compile `salvage` due to 3 previous errors

Nope! I could swear those methods were in there somewhere...

Rust code
// in `poppler-rs/src/auto/`
impl Page {
    //#[doc(alias = "poppler_page_get_bounding_box")]
    //#[doc(alias = "get_bounding_box")]
    //pub fn is_bounding_box(&self, rect: /*Ignored*/&mut Rectangle) -> bool {
    //    unsafe { TODO: call ffi:poppler_page_get_bounding_box() }
    //}
}

Ah. Turns out my Gir.toml was slightly wrong: a little gir -m not_bound in poppler-rs/poppler-rs let me know that get_bounding_box wasn't generated because poppler.Rectangle wasn't generated.

As for render, I had it set to ignore, which I guess didn't have any effect in gir 0.14?

Anyway, with this Gir.toml:

TOML markup
[options]
library = "Poppler"
version = "0.18"
target_path = "."
min_cfg_version = "0.70"
girs_directories = ["/home/amos/bearcove/prefix/share/gir-1.0", "/usr/share/gir-1.0"]
work_mode = "normal"
# generate_safety_asserts = true
deprecate_by_min_version = true
single_version_file = true

external_libraries = [
    "Gio",
    "GLib",
    "GObject",
    "Cairo",
]

manual = [
    "GLib.Bytes",
    "GLib.Error",
    "GLib.DateTime",
    "cairo.Context",
    "cairo.Surface",
    "cairo.Region",
]

generate = [
    "Poppler.Backend",
    "Poppler.Document",
    "Poppler.Rectangle",
]

[[object]]
name = "Poppler.Page"
status = "generate"

[[object.function]]
name = "get_bounding_box"
rename = "get_bounding_box"

[[object.function]]
name = "get_text_layout"
ignore = true

[[object.function]]
name = "get_text_layout_for_area"
ignore = true

[[object.function]]
name = "get_crop_box"
ignore = true

[[object.function]]
name = "render"

[[object.function.parameter]]
name = "cairo"
const = true

[[object.function]]
name = "render_for_printing"

[[object.function.parameter]]
name = "cairo"
const = true

The methods appeared again! The const = true thingy is a workaround for the lack of proper metadata in .gir files. All the ignore = true entries are for functions that were generating incorrect code. gir is still a work-in-progress!

After fixing all this, I realized I was working off of the wrong poppler-rs repository: I had moved it from GitHub to the GNOME GitLab.

I was also missing this wonderful workaround for the lackluster code generation around Rectangle:

Rust code
// in `poppler-rs/poppler-rs/src/`
use glib::translate::{ToGlibPtr, ToGlibPtrMut};

impl Deref for Rectangle {
    type Target = ffi::PopplerRectangle;

    fn deref(&self) -> &Self::Target {
        unsafe { &*self.to_glib_none().0 }
    }
}

impl DerefMut for Rectangle {
    fn deref_mut(&mut self) -> &mut Self::Target {
        unsafe { &mut *self.to_glib_none_mut().0 }
    }
}

Wonderfully unsafe.

Anyway... does it build now?


Shell session
$ cargo b
   Compiling poppler-sys v0.0.1 (
   Compiling poppler v0.1.0 (
   Compiling salvage v1.4.0 (C:\Users\amos\bearcove\salvage)
    Finished dev [unoptimized + debuginfo] target(s) in 5.17s

It does!!! 🎉

This article was made possible thanks to my patrons: Alexander Payne, Fredrik Østrem, David Barsky, Yufan Lou, Stephen Molyneaux, Barret Rennie, Thomas Corbin, MW, Jacob Cheriathundam, Michael Watzko, Embark Studios, Eugene Bulkin, Marcus Griep, Petar Radosevic, Tool Army, Tully, Santiago Lema, Spencer Gilbert, Jörn Huxhorn, Garrett Ward, DEX, Christian Oudard, Ronen Cohen, Thor Kamphefner, Kamran Khan, Cole Kurkowski, Arjen Laarhoven, Vicente Bosch, Chirag Jain, Ville Mattila, Marie Janssen, Vladyslav Batyrenko, Cameron Clausen, spike grobstein, Jon Gjengset, Paul Marques Mota, Jakub Fijałkowski, Mitchell Hamilton, Brad Luyster, Max von Forell, Jake S, Dimitri Merejkowsky, Chris Biscardi, René Ribaud, Alex Doroshenko, Vincent, Steven McGuire, Chad Birch, Chris Emery, Bob Ippolito, John Van Enk, metabaron, Isak Sunde Singh, Philipp Gniewosz, Mads Johansen, lukvol, Ives van Hoorne, Jan De Landtsheer, Daniel Strittmatter, Evgeniy Dubovskoy, Alex Rudy, Shane Lillie, Romet Tagobert, Douglas Creager, Corey Alexander, Molly Howell, knutwalker, Zachary Dremann, Sebastian Ziebell, Julien Roncaglia, Amber Kowalski, T, queenfartbutt, Paul Kline, Kristoffer Ström, Astrid Bek, Yoh Deadfall, Justin Ossevoort, Tomáš Duda, Jeremy Banks, Rasmus Larsen, Torben Clasen, C J Silverio, Walther, Pete Bevin, Shane Sveller, Clara Schultz, jer, Wonwoo Choi, Hawken Rives, João Veiga, Richard Pringle, Adam Perry, Benjamin Röjder Delnavaz, Matt Jadczak, Jonathan Knapp, Maximilian, Seth Stadick, brianloveswords, Sean Bryant, Ember, Sebastian Zimmer, Makoto Nakashima, Geoff Cant, Geoffroy Couprie, Michael Alyn Miller, o0Ignition0o, Zaki, Raphael Gaschignard, Romain Ruetschi, Ignacio Vergara, Pascal, Jane Lusby, Nicolas Goy, Ted Mielczarek, Aurora.

This article is part 6 of the Don't shell out! series.
