Productionizing our poppler build
👋 This page was last updated ~3 years ago. Just so you know.
I was a bit anxious about running our poppler meson build in CI, because it's the real test, you know? "Works on my machine" only goes so far: things have a tendency to break once you try to make them reproducible.
And I was right to worry... but not for the reasons I thought. As I tried to get everything to build in CI, PyPI went down for maintenance, which prevented me from installing meson, and then SourceForge was acting up.
Apart from that it was relatively smooth sailing? Let's run through the .circleci/config.yml file section by section.
version: 2.1
This opts into the newest (at the time of this writing) version of CircleCI configs, which allows specifying workflows and stuff.
orbs:
  win: circleci/windows@2.4.1
  aws-s3: circleci/aws-s3@3.0
Because we want to build on Windows, we'll need the circleci/windows orb, which gives us access to Windows executors. We'll also want to store artifacts on S3, and there's an orb for that too.
We've got three jobs: the Linux and Windows builds (running in parallel), and finally an upload job:
workflows:
  version: 2
  build:
    jobs:
      - x86_64-unknown-linux-gnu:
          context: [aws]
      - x86_64-pc-windows-msvc
      - upload:
          context: [aws]
          requires:
            - x86_64-unknown-linux-gnu
            - x86_64-pc-windows-msvc
Let's start with the Linux job, the most straightforward:
jobs:
  x86_64-unknown-linux-gnu:
    docker:
      - image: 391789101930.dkr.ecr.us-east-1.amazonaws.com/bearcove-meson:latest
    steps:
      - checkout
      - run: |
          meson setup build --buildtype release --default-library static --prefix /tmp/poppler-prefix
          meson compile -C build
          meson install -C build
          tar -czf poppler-prefix-x86_64-unknown-linux-gnu.tar.gz -C /tmp poppler-prefix
      - persist_to_workspace:
          root: .
          paths: ["poppler-prefix-x86_64-unknown-linux-gnu.tar.gz"]
Passing --buildtype is important (a debug build is almost 3x as large). The prefix is installed in /tmp and we generate a .tar.gz file where everything is prefixed by poppler-prefix/.
That build runs in a Docker container I built specifically for this purpose: mostly it has a recent Python 3, a C/C++ toolchain, Ninja, and meson. It's using the same base I described in My ideal Rust workflow, it's just a separate target:
##############################################
FROM base AS meson
##############################################

# Install python, C & C++ compiler, and ninja
RUN set -eux; \
    apt update; \
    apt install --yes --no-install-recommends \
    #
    # Python package manager (for latest meson)
    python3-pip \
    #
    # C & C++ compiler
    gcc g++ \
    #
    # Ninja build tool
    ninja-build \
    ;

# Install meson
RUN set -eux; \
    pip install meson \
    ;
Now onto the Windows pipeline! That one's fun, because it involves MSVC (aka Microsoft Visual C++). CircleCI's Windows executors have some version of MSVC installed already, but meson apparently can't find it without a little help.
The best language to use there is probably PowerShell, which we've done a little of already, and the little additional challenge is that... usually to "set up" an MSVC command-line environment, you'd call a batch file (or "source" it?). But this is PowerShell.
Luckily, someone with a much better handle on Batch and PowerShell and MSVC in general has solved that problem before, and so I was able to just steal, er, slightly adjust their work to get it working. It involves this wonderful bit of PowerShell:
pushd 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build'
cmd /c "vcvarsall.bat x64&set" |
foreach {
  if ($_ -match "=") {
    # split on the first "=" only, so values that contain "=" stay intact
    $v = $_.split("=", 2); set-item -force -path "ENV:\$($v[0])" -value "$($v[1])"
  }
}
popd
write-host "`nVisual Studio Command Prompt variables set." -ForegroundColor Yellow
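If PowerShell isn't your thing, here is the same trick as a rough Rust sketch: take the text that `set` prints after `vcvarsall.bat` has run, and turn each `NAME=value` line into a pair. The function name `parse_set_output` and the sample input are made up for illustration. The sketch splits on the first `=` only, so a value that itself contains `=` survives intact.

```rust
use std::collections::BTreeMap;

/// Parses `set`-style output: one `NAME=value` pair per line.
/// Lines without `=` (banner text printed by vcvarsall) are skipped.
fn parse_set_output(output: &str) -> BTreeMap<String, String> {
    output
        .lines() // `lines` also strips a trailing `\r`, so CRLF output is fine
        .filter_map(|line| line.split_once('='))
        // A line that starts with `=` has no usable name: ignore it.
        .filter(|(name, _)| !name.is_empty())
        .map(|(name, value)| (name.to_string(), value.to_string()))
        .collect()
}

fn main() {
    // Hypothetical sample, not real vcvarsall output.
    let sample = "some banner line\r\nINCLUDE=C:\\VC\\include\r\nFOO=a=b\r\n";
    for (name, value) in parse_set_output(sample) {
        println!("{} => {}", name, value);
    }
}
```

A real version would then hand each pair to `std::env::set_var`, or to `Command::envs` when spawning meson.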
And here's the job definition itself:
  x86_64-pc-windows-msvc:
    executor:
      name: win/default
      size: medium
    steps:
      - checkout
      - run: |
          pip install "meson==0.60.2"
          .\.circleci\call-vcvarsall.ps1
          meson setup build --vsenv --buildtype release --default-library static --prefix C:/poppler-prefix
          meson compile -C build
          .\msvc-static-install.ps1 build C:/poppler-prefix
          tar -czf poppler-prefix-x86_64-pc-windows-msvc.tar.gz -C C:/ --exclude etc/fonts ./poppler-prefix
      - persist_to_workspace:
          root: "."
          paths: ["poppler-prefix-x86_64-pc-windows-msvc.tar.gz"]
Couple things here: --vsenv forces using MSVC (otherwise, if clang is detected, it will default to it). --buildtype and --default-library, we've already seen. Windows ships with an honest-to-cthulhu tar now, so we don't need to mess with .zip, and that leaves...
...what's with the --exclude?
Yes, that. Well... apparently fontconfig installs symlinks:
$ tree -ah etc | sed 's/\/home\/amos\/bearcove\/poppler-prefix/PREFIX/' etc └── [ 32] fonts ├── [ 652] conf.d │ ├── [ 85] 10-hinting-slight.conf -> PREFIX/share/fontconfig/conf.avail/10-hinting-slight.conf │ ├── [ 89] 10-scale-bitmap-fonts.conf -> PREFIX/share/fontconfig/conf.avail/10-scale-bitmap-fonts.conf │ ├── [ 88] 11-lcdfilter-default.conf -> PREFIX/share/fontconfig/conf.avail/11-lcdfilter-default.conf │ ├── [ 88] 20-unhint-small-vera.conf -> PREFIX/share/fontconfig/conf.avail/20-unhint-small-vera.conf │ ├── [ 85] 30-metric-aliases.conf -> PREFIX/share/fontconfig/conf.avail/30-metric-aliases.conf │ ├── [ 79] 40-nonlatin.conf -> PREFIX/share/fontconfig/conf.avail/40-nonlatin.conf │ ├── [ 78] 45-generic.conf -> PREFIX/share/fontconfig/conf.avail/45-generic.conf │ ├── [ 76] 45-latin.conf -> PREFIX/share/fontconfig/conf.avail/45-latin.conf │ ├── [ 80] 49-sansserif.conf -> PREFIX/share/fontconfig/conf.avail/49-sansserif.conf │ ├── [ 75] 50-user.conf -> PREFIX/share/fontconfig/conf.avail/50-user.conf │ ├── [ 76] 51-local.conf -> PREFIX/share/fontconfig/conf.avail/51-local.conf │ ├── [ 78] 60-generic.conf -> PREFIX/share/fontconfig/conf.avail/60-generic.conf │ ├── [ 76] 60-latin.conf -> PREFIX/share/fontconfig/conf.avail/60-latin.conf │ ├── [ 84] 65-fonts-persian.conf -> PREFIX/share/fontconfig/conf.avail/65-fonts-persian.conf │ ├── [ 79] 65-nonlatin.conf -> PREFIX/share/fontconfig/conf.avail/65-nonlatin.conf │ ├── [ 78] 69-unifont.conf -> PREFIX/share/fontconfig/conf.avail/69-unifont.conf │ ├── [ 80] 80-delicious.conf -> PREFIX/share/fontconfig/conf.avail/80-delicious.conf │ ├── [ 80] 90-synthetic.conf -> PREFIX/share/fontconfig/conf.avail/90-synthetic.conf │ └── [ 1009] README └── [ 2.7K] fonts.conf 2 directories, 20 files
(They're absolute, too! I guess they don't expect people to copy prefixes across different computers). And it caused problems further down the line for me, which I won't get into right now to avoid spoilers.
So we're just not packing these files! I honestly doubt we're even using those fontconfig files for anything - by the time we deal with PDF files, they don't have any text left in them, just shapes.
Then we have the third job, which is boring. And boring is nice, in this case:
  upload:
    docker:
      - image: "cimg/python:3.10"
    steps:
      - attach_workspace:
          at: /tmp/workspace
      - aws-s3/copy:
          from: /tmp/workspace/poppler-prefix-x86_64-unknown-linux-gnu.tar.gz
          to: s3://bearcove-binaries/poppler-prefix/$CIRCLE_SHA1/
      - aws-s3/copy:
          from: /tmp/workspace/poppler-prefix-x86_64-pc-windows-msvc.tar.gz
          to: s3://bearcove-binaries/poppler-prefix/$CIRCLE_SHA1/
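The upload destination matters, because whoever consumes these archives has to rebuild the exact same object key from a commit hash and a target triple. A minimal sketch of that naming scheme, assuming the file name is kept as-is under the `$CIRCLE_SHA1/` directory (the function name `prefix_object_key` and the `abc123` hash are invented for the example):

```rust
/// Builds the object key for one archive, following the layout the upload
/// job writes to: `poppler-prefix/<sha>/poppler-prefix-<target>.tar.gz`.
fn prefix_object_key(commit_sha: &str, target: &str) -> String {
    format!(
        "poppler-prefix/{}/poppler-prefix-{}.tar.gz",
        commit_sha, target
    )
}

fn main() {
    // `abc123` is a placeholder, not a real commit.
    for target in ["x86_64-unknown-linux-gnu", "x86_64-pc-windows-msvc"] {
        println!("{}", prefix_object_key("abc123", target));
    }
}
```

Keying on the commit hash means old archives never get overwritten, so a consumer can pin one exact build.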
And... tada!
The resulting archives are a little chonk, but I'm not about to argue for a fistful of megabytes.
So, are we all done? All good?
Oh no we're not...
You mean to say... this wasn't a theoretical exercise? We're actually going to use these?
Yes we are...
Actually using this static poppler build
Well, easy right? Just make sure that before building any crates that depend on poppler, we download and extract those archives, and export PKG_CONFIG_PATH so that they're found and used. Right?
Well...
Oh no
The thing is...
Oh that's never a good sign.
Okay let's cut to the chase: yes, we can cobble CI pipelines together so that "it builds in CI". It's not that hard: we can add whatever we want as a shell script, so we can definitely run a curl/tar/export some vars and boom here we go.
But it's not enough.
SURE IT IS
No, it's not.
FREE ME
It's not! I want to be able to open those projects in VS Code and have them just build. The whole point of freeing ourselves of these dependencies is to not have to run a bunch of manual steps before we can be productive.
JUST DEVELOP ON LINUX, THEY HAVE PACKAGES
But even then! See, even today, as I had to set up salvage again, which involved installing libavif-tools to provide the avifenc CLI tool and... it was broken! It said something something ABI version mismatch I'm sad you can't have an avif file.
See, when you take on a dependency, it eventually becomes your prob-
BUT THEN YOU SWITCHED TO CAVIF AND IT WORKS NOW
Yes... and that was always the plan. But we're talking about SVG today.
IT'S BEEN A MONTH. POSSIBLY LONGER. WHAT DO YOU MEAN "TODAY"
And we're so close to the goal! All we need is a small build script and we'll be on our w-
NO. No. We don't "just" need "a small build script". Because there's the problem: our dependency tree looks like this right now, correct?
- salvage
  - poppler-rs
    - poppler-sys-rs
      - pkg-config-rs
(omitted: high-level and `-sys` crates for cairo, glib, etc.)
Well you're omitting a bunch of crates but su-
STAY ON TOPIC. It looks like this. And you know the build order there?
The uh.. the arrows go up?
PRECISELY. It'll build poppler-sys-rs first, which relies on pkg-config-rs to invoke pkg-config to find libpoppler-glib.a and friends.
And if you add a build script to salvage you know when it'll run?
Well it'll... ah. Uh.
EXACTLY.
It'll run after all the dependencies are already built. So exporting an environment variable from it would do sweet nothing. The build will have already failed, or picked up some dynamic libraries (if we're on Linux and we have development packages installed).
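To make the ordering concrete, here's a small sketch that walks the simplified tree from above depth-first and emits each crate only after its dependencies. It treats every crate, build script included, as a single node, which is a simplification of what Cargo really schedules. The names `visit`, `build_order` and `simplified_tree` are invented for this example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Visits `krate`'s dependencies first, then emits `krate` itself
/// (depth-first post-order).
fn visit<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    krate: &'a str,
    seen: &mut BTreeSet<&'a str>,
    order: &mut Vec<&'a str>,
) {
    if !seen.insert(krate) {
        return; // already scheduled
    }
    for &dep in deps.get(krate).into_iter().flatten() {
        visit(deps, dep, seen, order);
    }
    order.push(krate);
}

/// Returns the crates reachable from `root`, dependencies first.
fn build_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>, root: &'a str) -> Vec<&'a str> {
    let mut seen = BTreeSet::new();
    let mut order = Vec::new();
    visit(deps, root, &mut seen, &mut order);
    order
}

/// The simplified dependency tree from the article (cairo, glib etc. omitted).
fn simplified_tree() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("salvage", vec!["poppler-rs"]),
        ("poppler-rs", vec!["poppler-sys-rs"]),
        ("poppler-sys-rs", vec!["pkg-config-rs"]),
    ])
}

fn main() {
    let deps = simplified_tree();
    println!("{}", build_order(&deps, "salvage").join(" -> "));
}
```

salvage comes out last. And even if its build script somehow ran earlier, an environment variable set inside one process is not visible to sibling processes, so the build script of poppler-sys-rs would never see it.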
Well we could simply fork, uh, all the-
Oh you want to fork cairo-rs, cairo-sys-rs, glib-sys, glib, glib-macros, gobject-sys, and gio-sys?
When you put it like that... no, I don't really want to.
Right. So stop being a tryhard, cut your losses and QUIT IT WITH THE STATIC BUILDS.
Mh.
Mhhhhhhhhhhh. Unless...
long sigh
...unless we somehow patch pkg-config-rs.
Imagine we had a pkg-config-hack/Cargo.toml like this...
[package] name = "pkg-config" version = "0.3.24" edition = "2021" [dependencies] async-compression = { version = "0.3.8", features = ["gzip", "tokio"] } aws-config = "0.3.0" aws-sdk-s3 = "0.3.0" aws-types = "0.3.0" camino = { version = "1.0.5", features = ["serde1"] } futures = "0.3.17" tokio = { version = "1.15.0", features = ["full"] } tokio-tar = "0.3.0" tokio-util = { version = "0.6.9", features = ["io"] } walkdir = "2.3.2" color-eyre = "0.5.11" named-lock = "0.1.1" serde = { version = "1.0.132", features = ["derive"] } serde_json = "1.0.73" # this allows re-exporting most of the upstream pkg-config. note that we cannot # use a simple crates.io deps, we have to use a git dep because of how we wrap # it. # also, if glib-sys etc. end up bumping their version of pkg-config-rs, that one # will have to be bumped too. [dependencies.upstream] package = "pkg-config" git = "https://github.com/rust-lang/pkg-config-rs" rev = "49a4ac189aafa365167c72e8e503565a7c2697c2"
Then, in our top-level Cargo workspace, or package (if we're not using a workspace), we could do something like this...
# in `salvage/Cargo.toml`

[dependencies]
poppler-rs = "0.18.2"
cairo-rs = { version = "0.14.9", features = ["svg"] }

[patch.crates-io]
pkg-config = { git = "https://github.com/bearcove/poppler-meson-crates", rev = "4f06bd6" }
And then it would replace the pkg-config crate with our own crate.
Fine, sure, okay. Super crimey but okay. Then what?
Well, then we have full control of what's happening. For starters, we need to re-export most of the pkg-config API as-is, except for the Config struct, because it's the one doing the actual lookup:
// in `poppler-meson-crates/pkg-config-hack/src/lib.rs` pub mod config; pub use config::cargo_config; mod prepare; pub use upstream::{Error, Library}; pub struct Config { inner: upstream::Config, } impl Config { pub fn new() -> Self { Self { inner: upstream::Config::new(), } } pub fn atleast_version(&mut self, vers: &str) -> &mut Config { self.inner.atleast_version(vers); self } pub fn print_system_libs(&mut self, print: bool) -> &mut Config { self.inner.print_system_libs(print); self } pub fn cargo_metadata(&mut self, cargo_metadata: bool) -> &mut Config { self.inner.cargo_metadata(cargo_metadata); self } pub fn env_metadata(&mut self, env_metadata: bool) -> &mut Config { self.inner.env_metadata(env_metadata); self } pub fn statik(&mut self, statik: bool) -> &mut Config { self.inner.statik(statik); self } pub fn probe(&self, name: &str) -> Result<Library, Error> { println!("cargo:warning=Probing library {}", name); prepare::prepare_pkgconfig_prefix(); let res = self.inner.probe(name)?; Ok(res) } }
And?
And well, you can see in Config::probe, before we let the actual pkg-config crate call the pkg-config CLI utility, we call prepare.
And that one's a little complicated, but I'm sure y'all can make sense of it.
// in `poppler-meson-crates/pkg-config-hack/src/prepare.rs use async_compression::tokio::bufread::GzipDecoder; use aws_sdk_s3::Client; use camino::{Utf8Path, Utf8PathBuf}; use color_eyre::{eyre::eyre, Report}; use futures::TryStreamExt; use named_lock::NamedLock; use serde::{Deserialize, Serialize}; use std::{env::temp_dir, io, sync::Once, time::Instant}; use tokio::io::BufReader; use tokio_util::io::StreamReader; use walkdir::WalkDir; static COLOR_EYRE_ONCE: Once = Once::new(); pub fn prepare_pkgconfig_prefix() { // Make sure multiple build scripts don't try to prepare the prefix at the same time let lock = NamedLock::create("pkg-config-hack-prepare").unwrap(); let before_lock = Instant::now(); let _guard = lock.lock().unwrap(); println!( "cargo:warning=Acquired lock after {:?}", before_lock.elapsed() ); // Make color-eyre spit out useful errors COLOR_EYRE_ONCE.call_once(|| { std::env::set_var("RUST_LIB_BACKTRACE", "1"); color_eyre::install().unwrap(); }); let pkg_config_path = download_and_extract_prefix().unwrap(); println!("cargo:warning=Setting PKG_CONFIG_PATH={}", pkg_config_path); std::env::set_var("PKG_CONFIG_PATH", pkg_config_path); } const FORMAT_VERSION: u64 = 1; const POPPLER_MESON_VERSION: &str = "8c4735aba88cfa81bf43a6648f6864862dfa495c"; /// Downloads a prefix containing a static build of poppler and its dependencies /// (cairo, glib, etc.), and returns the absolute path to a pkg-config path. #[tokio::main] async fn download_and_extract_prefix() -> Result<Utf8PathBuf, Report> { // (Note: we're using tokio main to start an async executor just for // this function, since the AWS S3 requires one.) let temp_dir = Utf8PathBuf::try_from(temp_dir()).unwrap(); let work_dir = temp_dir.join("pkg-config-hack"); let prefix_dir = work_dir.join("poppler-prefix"); println!("Will download to prefix path {}", work_dir); // Have we already prepared the right version? 
let ticket_path = work_dir.join("ticket.json"); match Ticket::read(&ticket_path).await { Ok(ticket) => { if ticket.up_to_date() { return Ok(ticket.pkg_config_path.clone()); } } Err(e) => { println!("cargo:warning=Could not read ticket: {}", e); } } if is_dir(&work_dir).await { // This should only fail if multiple processes are messing with // the prefix at the same time, which would mean our named lock // doesn't work. tokio::fs::remove_dir_all(&work_dir).await?; } tokio::fs::create_dir_all(&work_dir).await?; let target = std::env::var("TARGET").unwrap(); println!("Building for target {}", target); let shared_config = aws_config::from_env().load().await; let client = Client::new(&shared_config); if shared_config.region().is_none() { panic!("AWS_REGION (and friends) must be set for pkg-config-hack to work"); } println!("AWS region: {}", shared_config.region().unwrap()); let key = format!( "poppler-prefix/{}/poppler-prefix-{}.tar.gz", POPPLER_MESON_VERSION, target ); println!("Fetching ({})", key); let resp = client .get_object() .bucket("bearcove-binaries") .key(&key) .send() .await?; println!("Resp = {:?}", resp); // AWS error => std::io::Error let body = resp .body .map_err(|e| io::Error::new(io::ErrorKind::Other, e)); // Stream<Bytes> => AsyncRead let body = StreamReader::new(body); // AsyncRead => AsyncBufRead let body = BufReader::new(body); // decompress gzip let body = GzipDecoder::new(body); // open tar archive let mut archive = tokio_tar::Archive::new(body); println!("Unpacking all entries..."); archive.unpack(&work_dir).await?; println!("Patching .pc files..."); for entry in WalkDir::new(&work_dir) { let entry = entry?; let path: Utf8PathBuf = entry.path().to_path_buf().try_into()?; if let Some("pc") = path.extension() { println!("Should patch {}", path); let old_contents = tokio::fs::read_to_string(&path).await?; let new_contents = old_contents // for Linux .replace("/tmp/poppler-prefix", prefix_dir.as_str()) // for Windows .replace("C:/poppler-prefix", 
prefix_dir.as_str()); tokio::fs::write(&path, &new_contents).await?; } } // Okay, prefix should exist now... assert!(is_dir(&prefix_dir).await); // The Linux build uses lib64 for some reason, even though it's built on // Ubuntu, which is a Debian derivative, which is never supposed to have // lib64: https://wiki.ubuntu.com/MultiarchSpec - if I had to pick some tool // to blame, it'd be meson, but I don't have to, thank Cthulhu. for libdir in ["lib", "lib64"] { let pkg_config_path = prefix_dir.join(libdir).join("pkgconfig"); if is_dir(&pkg_config_path).await { let ticket = Ticket { format_version: FORMAT_VERSION, poppler_meson_version: POPPLER_MESON_VERSION.to_string(), pkg_config_path: pkg_config_path.clone(), }; ticket.write(ticket_path).await?; println!("cargo:warning=Writing ticket: {:?}", ticket); return Ok(pkg_config_path); } } Err(eyre!( "pkgconfig dir not found, is the prefix even valid? (see file listing above)" )) } /// Returns true if the path is a directory that we can read. Errors out if /// it's anything other than a directory, or we couldn't get its metadata (b/c /// permissions, I/O error, anything else) async fn is_dir<P: AsRef<Utf8Path>>(path: P) -> bool { let path = path.as_ref(); let res = matches!(tokio::fs::metadata(path).await, Ok(meta) if meta.is_dir()); println!("is {} a dir? {}", path, res); res } #[derive(Debug, Serialize, Deserialize)] struct Ticket { format_version: u64, poppler_meson_version: String, pkg_config_path: Utf8PathBuf, } impl Ticket { async fn read<P: AsRef<Utf8Path>>(ticket_path: P) -> Result<Self, Report> { let serialized = tokio::fs::read(ticket_path.as_ref()).await?; Ok(serde_json::from_slice(&serialized[..])?) 
} async fn write<P: AsRef<Utf8Path>>(&self, ticket_path: P) -> Result<(), Report> { let serialized = serde_json::to_vec_pretty(self)?; tokio::fs::write(ticket_path.as_ref(), &serialized[..]).await?; Ok(()) } fn up_to_date(&self) -> bool { if self.format_version != FORMAT_VERSION { return false; } if self.poppler_meson_version != POPPLER_MESON_VERSION { return false; } true } }
That... that could be a series of its own, couldn't it?
Yes it could! But it's not that complicated, basically we:
- Make sure we're the only process trying to set up the prefix at any given time, using the named-lock crate.
- Determine a "stable" temporary directory, something like /tmp/pkg-config-hack on Linux
- Check for any pre-existing "install ticket" that matches what we're trying to install
- If there is such a ticket, we're done!
- If there isn't, or we can't read it, we use the official AWS Rust SDK to download the .tar.gz, buffer it, decompress it, and unpack it as a tar archive.
- While we're at it, we "fix" a bunch of absolute paths in the installed .pc files
- And finally we write an install ticket
- ...and set PKG_CONFIG_PATH to something like /tmp/pkg-config-hack/poppler-prefix/lib/pkgconfig
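The "fix absolute paths" step is easier to picture with an actual file in hand. Here's a tiny sketch of it in isolation, using only the standard library: the two build-time prefixes are the ones the CI jobs pass to meson, while the `.pc` contents and the function name `patch_pc_contents` are invented for the example.

```rust
/// Rewrites the build-time prefixes baked into a `.pc` file so they point
/// at the directory where the archive was actually unpacked.
fn patch_pc_contents(contents: &str, prefix_dir: &str) -> String {
    contents
        // prefix used by the Linux CI job
        .replace("/tmp/poppler-prefix", prefix_dir)
        // prefix used by the Windows CI job
        .replace("C:/poppler-prefix", prefix_dir)
}

fn main() {
    // Made-up `.pc` contents, roughly the shape meson generates.
    let original = "prefix=/tmp/poppler-prefix\nlibdir=${prefix}/lib64\nLibs: -L${libdir} -lpoppler\n";
    let patched = patch_pc_contents(original, "/tmp/pkg-config-hack/poppler-prefix");
    print!("{}", patched);
}
```

Only the `prefix=` line changes here, since the other lines refer back to it through `${prefix}`.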
Easy right?
That is terrifying.
I mean... we have a terrifying number of build dependencies now (maybe bringing in all of tokio, hyper, a rust gzip and tar implementation, the S3 sdk, etc., is a tiny bit of overkill), but I actually think it's pretty readable!
So now all we have to do is actually use it in salvage, say, maybe we do this:
// in `salvage/src/poppler.rs` use cairo::{Context, SvgSurface}; use color_eyre::eyre::{self, eyre}; use poppler::{Document, Rectangle}; use std::{fs::File, path::Path}; #[derive(Clone)] pub struct Poppler {} impl Poppler { pub fn new() -> Self { Self {} } pub fn pdf_to_svg(&self, input: &Path, output: &Path) -> Result<(), eyre::Error> { let pdf_bytes = std::fs::read(input)?; let doc = Document::from_data(&pdf_bytes[..], None)?; let page = doc.page(0).unwrap(); let mut bb: Rectangle = Default::default(); page.get_bounding_box(&mut bb); let out = File::create(&output)?; let surface = SvgSurface::for_stream(bb.x2 - bb.x1, bb.y2 - bb.y1, out)?; let cx = Context::new(&surface)?; page.render(&cx); surface .finish_output_stream() .map_err(|e| eyre!("cairo error: {}", e.to_string()))?; Ok(()) } }
And now we... get a bunch of linking errors.
Ah, right, remember how the installed .pc files are not quite flexible enough to allow static linking? Well, that.
Well, no worries! We can just expose a config module from pkg-config-hack:
// in `poppler-meson-crates/pkg-config-hack/src/config.rs` /// Prints required flags to build against a static build of poppler. pub fn cargo_config() { let target = std::env::var("TARGET").unwrap(); let is_msvc = target.contains("msvc"); // poppler-glib requires poppler println!("cargo:rustc-link-lib=static=poppler"); if is_msvc { // on windows-msvc everything is fine, we just need a couple libraries // for CommandLineToArgvW, SHGetKnownFolderPath println!("cargo:rustc-link-lib=shell32"); // for CoTaskMemFree println!("cargo:rustc-link-lib=ole32"); // for C++ stuff println!("cargo:rustc-link-lib=vcruntime"); } else { // that's where it goes on Fedora I guess ¯\_(ツ)_/¯ // (doesn't hurt other distros) println!("cargo:rustc-link-search=native=/usr/lib/gcc/x86_64-redhat-linux/11/"); // on linux, we want to link statically with the standard C++ library. println!("cargo:rustc-link-lib=static=stdc++"); } // nobody bothers including this in their pkg-config files apparently println!("cargo:rustc-link-lib=static=png16"); // cairo needs this println!("cargo:rustc-link-lib=static=freetype"); // cairo/freetype need this? // the freetype ChangeLog says the dependency graph looks like: // cairo => fontconfig => freetype2 => harfbuzz => cairo println!("cargo:rustc-link-lib=static=fontconfig"); // cairo also needs this println!("cargo:rustc-link-lib=static=pixman-1"); // fontconfig needs this (it's an XML parser) println!("cargo:rustc-link-lib=static=expat"); }
And then we just add a build script for salvage that calls that:
// in `salvage/build.rs`

fn main() {
    pkg_config::config::cargo_config();
}
Let's not forget to add pkg-config as a build dependency...
$ cargo add -B pkg-config Updating 'https://github.com/rust-lang/crates.io-index' index Adding pkg-config v0.3.24 to build-dependencies
And BOOM, it builds!
$ cargo build Compiling salvage v1.3.0 (/home/amos/bearcove/salvage) error: linking with `cc` failed: exit status: 1 | = note: "cc" "-m64" (cut) "-Wl,-Bdynamic" "-lpoppler" "-lpoppler-glib" "-lgobject-2.0" "-lffi" "-lglib-2.0" "-lcairo" "-ldl" "-lpng16" "-lz" "-lfontconfig" "-lfreetype" "-lexpat" "-lpixman-1" "-lm" "-lgio-2.0" "-lresolv" "-lz" "-lgobject-2.0" "-lffi" "-lgmodule-2.0" "-ldl" "-lglib-2.0" "-lm" "-lgobject-2.0" "-lffi" "-lglib-2.0" "-lm" "-lcairo-gobject" "-lgobject-2.0" "-lffi" "-lglib-2.0" "-lcairo" "-ldl" "-lpng16" "-lz" "-lfontconfig" "-lfreetype" "-lexpat" "-lpixman-1" "-lm" "-lgobject-2.0" "-lffi" "-lglib-2.0" "-lm" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-L" "/home/amos/.rustup/toolchains/1.57.0-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/home/amos/bearcove/salvage/target/debug/deps/salvage-622b377b8581b717" "-Wl,--gc-sections" "-pie" "-Wl,-zrelro" "-Wl,-znow" "-nodefaultlibs" "-fuse-ld=lld" = note: ld.lld: error: undefined symbol: __res_nquery >>> referenced by gthreadedresolver.c >>> gthreadedresolver.c.o:(do_lookup_records) in archive /tmp/pkg-config-hack/poppler-prefix/lib64/libgio-2.0.a >>> did you mean: __res_nquery@GLIBC_2.2.5 >>> defined in: /lib64/libc.so.6 ld.lld: error: undefined symbol: __dn_expand >>> referenced by gthreadedresolver.c >>> gthreadedresolver.c.o:(do_lookup_records) in archive /tmp/pkg-config-hack/poppler-prefix/lib64/libgio-2.0.a >>> referenced by gthreadedresolver.c >>> gthreadedresolver.c.o:(do_lookup_records) in archive /tmp/pkg-config-hack/poppler-prefix/lib64/libgio-2.0.a >>> referenced by gthreadedresolver.c >>> gthreadedresolver.c.o:(do_lookup_records) in archive /tmp/pkg-config-hack/poppler-prefix/lib64/libgio-2.0.a >>> referenced 4 more times >>> did you mean: __dn_expand@GLIBC_2.2.5 >>> defined in: /lib64/libc.so.6 collect2: error: ld returned 1 exit status error: could not compile `salvage` due to previous error
Bwahahahah no it doesn't.
It doesn't.
And that one's a little nasty...
See, that libgio-2.0.a was built on Ubuntu 20.04. And right now I'm trying to build something against it, from Fedora 35.
And on Ubuntu, that symbol is there:
$ docker run --rm -it ubuntu:20.04 /bin/bash root@98e93b395dfa:/# apt update && apt install -y --no-install-recommends binutils (cut) root@98e93b395dfa:/# nm -D /usr/lib/x86_64-linux-gnu/libresolv.so.2 | grep dn_expand 00000000000047f0 T __dn_expand root@98e93b395dfa:/#
But on Fedora... it's named something else:
$ docker run --rm -it fedora:35 /bin/bash [root@91a05a86529e /]# nm -D /usr/lib64/libresolv.so.2 | grep dn_expand bash: nm: command not found [root@91a05a86529e /]# dnf provides nm Fedora 35 - x86_64 30 MB/s | 79 MB 00:02 Fedora 35 openh264 (From Cisco) - x86_64 2.5 kB/s | 2.5 kB 00:01 Fedora Modular 35 - x86_64 4.8 MB/s | 3.3 MB 00:00 Fedora 35 - x86_64 - Updates 22 MB/s | 17 MB 00:00 Fedora Modular 35 - x86_64 - Updates 4.0 MB/s | 2.8 MB 00:00 binutils-2.37-10.fc35.i686 : A GNU collection of binary utilities Repo : fedora Matched from: Filename : /usr/bin/nm binutils-2.37-10.fc35.x86_64 : A GNU collection of binary utilities Repo : fedora Matched from: Filename : /usr/bin/nm [root@91a05a86529e /]# dnf install binutils (cut) [root@91a05a86529e /]# nm -D /usr/lib64/libresolv.so.2 | grep dn_expand U __libc_dn_expand@GLIBC_PRIVATE
In fact, it's not even provided by libresolv.so.2:
(continued) [root@91a05a86529e /]# ldd /usr/lib64/libresolv.so.2 linux-vdso.so.1 (0x00007ffd3b872000) libc.so.6 => /lib64/libc.so.6 (0x00007f8d14459000) /lib64/ld-linux-x86-64.so.2 (0x00007f8d1467b000) [root@91a05a86529e /]# nm -D /lib64/libc.so.6 | grep __libc_dn_expand 000000000012eae0 T __libc_dn_expand@@GLIBC_PRIVATE
...but by libc.so.6.
So, what is a bear to do?
Oh, me? I've mentally checked out several pages ago.
Well, we could statically link against libresolv, or patch... gio I guess? To not call those functions, but neither of these sounds really appealing right now, when we can just commit more code crimes.
Crimes, crimes, crimes!
See, we don't actually need dn_expand - it's DNS-related, and we sure never expect poppler, by way of gio, to access the network. So we don't need DNS.
So we could just... stub it.
// in `salvage/src/screw-libresolv.c`

#include <stdio.h>
#include <stdlib.h>

static void bail(void) {
  printf(
    "The program's about to abort. I'm sure you're wondering why?\n"
    "\n"
    "Well, it's DNS. it's *always* DNS.\n"
    "See, this program links against poppler-glib, which links against glib,\n"
    "which includes gio, which includes a DNS resolver. so it links against\n"
    "libresolv.\n"
    "\n"
    "Who provides libresolv, you ask? Depends who you ask!\n"
    "\n"
    "On some platforms it's ISC, as part of BIND. On others, it's part of libc.\n"
    "It exposes symbols like dn_expand. On Fedora, the _actual_ symbol name\n"
    "(provided by the static or dynamic library) is __libc_dn_expand. On Ubuntu\n"
    "it's __dn_expand. Note that the C function name is just 'dn_expand'.\n"
    "\n"
    "That means if you build gio on Ubuntu, it'll expect the __dn_expand symbol\n"
    "to exist. Then if you try to link against gio on Fedora, it'll fail because\n"
    "the actual symbol name will be __libc_dn_expand (in the dynamic library it's\n"
    "even private, via @GLIBC_PRIVATE).\n"
    "\n"
    "How do we resolve this? Well, WE DON'T EVEN NEED DNS. At least not from gio.\n"
    "So, if we expose our own dummy symbols... that just abort... it should build,\n"
    "and nobody should ever have to read this!\n"
  );
  abort();
}

// Weak definitions: if the real symbols are available at link time, they win.
void __attribute__((weak)) __dn_expand(void) { bail(); }
void __attribute__((weak)) __res_nquery(void) { bail(); }
Ohhhh and because it's a weak symbol it won't mess with an existing one?
Exactly!
So we just build it and link it...
$ cargo add -B cc Updating 'https://github.com/rust-lang/crates.io-index' index Adding cc v1.0.72 to build-dependencies
// in `salvage/build.rs`

fn main() {
    // 👇 new!
    if std::env::var("TARGET").unwrap().contains("linux-gnu") {
        cc::Build::new()
            .file("src/screw-libresolv.c")
            .warnings(false)
            .compile("screw-libresolv");
    }

    pkg_config::config::cargo_config();
}
And just like that, it builds. It clocks in at 20MB, but it builds. (18MB with compressed debug sections, 14MB stripped).
More Windows sadness
It doesn't build on Windows though... Well it builds! It just doesn't link. And that's because our various -sys crates, which are generated by gtk-rs/gir, have snippets like these!
// in `poppler-sys-rs/src/lib.rs`

#[link(name = "poppler-glib")]
#[link(name = "poppler")]
extern "C" {
    // a whole bunch of functions
}
And on Linux, this is fine! The toolchains there will pick up a .a like they would an .so, and they'll do the right thing.
But on Windows... it's not the case. I'm not sure exactly which tool is responsible, but at some point someone decides the symbol we want isn't poppler_document_new_from_data, it's actually __imp_poppler_document_new_from_data: as if poppler-glib.lib was an import library for poppler-glib.dll, and not a static library.
If we want static linking to succeed on Windows, we need to change it to something like this:
// in `poppler-sys-rs/src/lib.rs`

#[link(name = "poppler-glib", kind = "static")]
#[link(name = "poppler", kind = "static")]
extern "C" {
    // a whole bunch of functions
}
Won't that break dynamic linking?
Of course it will! And that's why what we really want, is to be able to conditionally use either static or dynamic linking. One way to do it (which I don't love, but we make do) is with cargo features!
If poppler-sys-rs had a static feature, we could have:
// in `poppler-sys-rs/src/lib.rs`

#[cfg_attr(feature = "static", link(name = "poppler", kind = "static"))]
#[cfg_attr(feature = "static", link(name = "poppler-glib", kind = "static"))]
#[cfg_attr(not(feature = "static"), link(name = "poppler"))]
#[cfg_attr(not(feature = "static"), link(name = "poppler-glib"))]
extern "C" {
    // a whole bunch of functions
}
And we could even get gir to generate those!
If only someone... contributed that feature upstream...
Oh no no no not ag-
Looking into gtk-rs/gir
...fuck's sake.
So the first wrinkle is that, at the time of this writing, there's gir 0.14, and gir's default branch, and they generate incompatible code. So we'll need to regenerate all crates.
Weren't we planning on regenerating all of them anyway? (So they build correctly?)
Yeah. So it's not a big deal.
It wasn't too hard to find where #[link] attributes are generated:
// in `gir/src/codegen/lib_.rs` (yes, with the trailing underscore)

fn write_link_attr(w: &mut dyn Write, shared_libs: &[String]) -> Result<()> {
    for it in shared_libs {
        writeln!(
            w,
            "#[link(name = \"{}\")]",
            shared_lib_name_to_link_name(it)
        )?;
    }

    Ok(())
}
So, we just need to change that!
fn write_link_attr(w: &mut dyn Write, shared_libs: &[String]) -> Result<()> {
    for it in shared_libs {
        let link_name = shared_lib_name_to_link_name(it);
        writeln!(
            w,
            r#"#[cfg_attr(feature = "static", link(name = "{}", kind = "static"))]"#,
            link_name
        )?;
        writeln!(
            w,
            r#"#[cfg_attr(not(feature = "static"), link(name = "{}"))]"#,
            link_name
        )?;
    }

    Ok(())
}
Let's look at the diff before and after that change on poppler-rs-sys, since I own that crate:
$ git diff
(cut)
-#[link(name = "poppler-glib")]
-#[link(name = "poppler")]
+#[cfg_attr(feature = "static", link(name = "poppler-glib", kind = "static"))]
+#[cfg_attr(not(feature = "static"), link(name = "poppler-glib"))]
+#[cfg_attr(feature = "static", link(name = "poppler", kind = "static"))]
+#[cfg_attr(not(feature = "static"), link(name = "poppler"))]
Perfect!
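To poke at the generator change without running gir itself, here's a minimal standalone sketch that writes the attributes into an in-memory buffer. Two things in it are assumptions: shared_lib_name_to_link_name is a simplified stand-in (it assumes names shaped like libfoo.so.N), not gir's actual implementation, and render_link_attrs is a hypothetical helper that only exists for this example.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for gir's `shared_lib_name_to_link_name`.
/// Assumption: inputs look like `libfoo.so.N`, and we want `foo`.
fn shared_lib_name_to_link_name(name: &str) -> &str {
    let name = name.strip_prefix("lib").unwrap_or(name);
    match name.find(".so") {
        Some(idx) => &name[..idx],
        None => name,
    }
}

/// Same shape as the modified `write_link_attr`, using `io::Result`
/// instead of gir's own `Result` alias.
fn write_link_attr(w: &mut dyn Write, shared_libs: &[String]) -> io::Result<()> {
    for it in shared_libs {
        let link_name = shared_lib_name_to_link_name(it);
        // Emitted when the `static` feature is enabled.
        writeln!(
            w,
            r#"#[cfg_attr(feature = "static", link(name = "{}", kind = "static"))]"#,
            link_name
        )?;
        // Emitted otherwise: plain dynamic linking, as before.
        writeln!(
            w,
            r#"#[cfg_attr(not(feature = "static"), link(name = "{}"))]"#,
            link_name
        )?;
    }
    Ok(())
}

/// Hypothetical helper: renders the attributes into a `String`.
fn render_link_attrs(shared_libs: &[&str]) -> String {
    let libs: Vec<String> = shared_libs.iter().map(|s| s.to_string()).collect();
    let mut buf = Vec::new();
    write_link_attr(&mut buf, &libs).expect("writing to a Vec cannot fail");
    String::from_utf8(buf).expect("generated attributes are valid UTF-8")
}
```

Each library name passed to render_link_attrs produces two lines, one per cfg_attr branch, in the same order as the diff above.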
Next, we need to actually define the feature in Cargo.toml, and have it enable the same feature in the crates it depends on.
Here's part of the code in gir that generates Cargo.toml files:
// in `gir/src/codegen/sys/cargo_toml.rs`
fn fill_in(root: &mut Table, env: &Env) {
    // (cut)

    {
        let features = upsert_table(root, "features");
        let versions = collect_versions(env);
        versions.keys().fold(None::<Version>, |prev, &version| {
            let prev_array: Vec<Value> =
                get_feature_dependencies(version, prev, &env.config.feature_dependencies)
                    .iter()
                    .map(|s| Value::String(s.clone()))
                    .collect();
            features.insert(version.to_feature(), Value::Array(prev_array));
            Some(version)
        });
        features.insert(
            "dox".to_string(),
            Value::Array(
                env.config
                    .dox_feature_dependencies
                    .iter()
                    .map(|s| Value::String(s.clone()))
                    .collect(),
            ),
        );
    }

    // (cut)
}
You can see it's using an upsert_table helper: it's actually trying to modify the manifest in-place, because those manifests are meant to be edited by humans too.
If we just add this:
features.insert(
    "static".to_string(),
    Value::Array(
        env.config
            .external_libraries
            .iter()
            .map(|l| Value::String(format!("{}/static", l.crate_name)))
            .collect(),
    ),
);
We're good to go! Here's the PR I opened, which as far as I can tell won't land this year.
Oh, that's cute.
But yeah, I've regenerated everything with it, in bearcove/gtk-rs-core and bearcove/poppler-rs, making sure poppler references the newer crates, and giving it a static feature of its own:
# in `poppler-rs/poppler-rs/Cargo.toml`
[package]
name = "poppler"
version = "0.1.0"
edition = "2021"

[dependencies]
libc = "0.2.107"
bitflags = "1.3.2"

[dependencies.glib]
package = "glib"
version = "0.15.0"
git = "https://github.com/bearcove/gtk-rs-core"
branch = "amos/static-build"

[dependencies.cairo]
package = "cairo-rs"
version = "0.15.0"
git = "https://github.com/bearcove/gtk-rs-core"
branch = "amos/static-build"

[dependencies.ffi]
package = "poppler-sys"
git = "https://github.com/bearcove/poppler-rs"
branch = "amos/static-build"

[features]
static = ["ffi/static"]
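The static feature only does anything if each crate forwards it to the crates below it, all the way down to the -sys crates. As a rough illustration of the list the patched generator builds for a -sys crate, here's a standalone sketch. ExternalLibrary is a hypothetical stand-in for gir's config entry, a BTreeMap stands in for the TOML [features] table, and the crate names used with it are made up for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one entry of gir's `external_libraries` config.
struct ExternalLibrary {
    crate_name: String,
}

/// One `<crate>/static` entry per external library, mirroring the
/// `format!("{}/static", l.crate_name)` call in the gir patch.
fn static_feature_deps(libs: &[ExternalLibrary]) -> Vec<String> {
    libs.iter()
        .map(|l| format!("{}/static", l.crate_name))
        .collect()
}

/// Inserts (or overwrites) the `static` feature in a simplified
/// `[features]` table. The real code works on TOML values instead.
fn add_static_feature(
    features: &mut BTreeMap<String, Vec<String>>,
    libs: &[ExternalLibrary],
) {
    features.insert("static".to_string(), static_feature_deps(libs));
}
```

With external libraries named glib and cairo, the table ends up with static = ["glib/static", "cairo/static"], which is the same forwarding pattern as the hand-written static = ["ffi/static"] above.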
And with all those changes, and our custom poppler prefix, it's actually almost trivial to make a static build of a simple app, like this one:
# in `poppler-rs/pdftocairo/Cargo.toml`
[package]
name = "pdftocairo"
version = "0.1.0"
edition = "2021"

[dependencies]
camino = "1.0.5"
color-eyre = "0.5.11"
poppler = { path = "../poppler-rs" }
tracing = "0.1.29"
tracing-error = "0.2.0"
tracing-subscriber = { version = "0.3.1", features = ["env-filter"] }

[features]
static = ["poppler/static"]
// in `poppler-rs/pdftocairo/src/main.rs`
use camino::Utf8PathBuf;
use color_eyre::Report;
use tracing::info;

#[cfg_attr(feature = "static", link(name = "stdc++", kind = "static"))]
extern "C" {}

fn main() -> Result<(), Report> {
    if std::env::var("RUST_LOG").is_err() {
        std::env::set_var("RUST_LOG", "info");
    }
    color_eyre::install()?;
    install_tracing();

    let path = Utf8PathBuf::from("/tmp/export.pdf");
    info!(%path, "Reading file...");
    let data = std::fs::read(&path)?;
    info!(%path, "Reading file... done!");

    let doc = poppler::Document::from_data(&data[..], None)?;
    info!("Got the document! {:#?}", doc);
    info!("Producer = {:#?}", doc.producer());
    info!("Num pages = {:#?}", doc.n_pages());

    Ok(())
}

fn install_tracing() {
    use tracing_error::ErrorLayer;
    use tracing_subscriber::prelude::*;
    use tracing_subscriber::{fmt, EnvFilter};

    let fmt_layer = fmt::layer();
    let filter_layer = EnvFilter::try_from_default_env()
        .or_else(|_| EnvFilter::try_new("info"))
        .unwrap();

    tracing_subscriber::registry()
        .with(filter_layer)
        .with(fmt_layer)
        .with(ErrorLayer::default())
        .init();
}
Hey... that doesn't actually use cairo.
Uhh true. PRs welcome?
The #[link] attribute above is the only thing not covered by the .pc files themselves.
After that, we can do:
$ PKG_CONFIG_ALL_STATIC=1 PKG_CONFIG_PATH=/home/amos/bearcove/prefix/lib64/pkgconfig cargo build --verbose --features static
And get an all-static, and gigantic, executable:
$ ldd ./target/debug/pdftocairo
        linux-vdso.so.1 (0x00007ffc77d63000)
        libz.so.1 => /lib64/libz.so.1 (0x00007fac18f55000)
        libfreetype.so.6 => /lib64/libfreetype.so.6 (0x00007fac18e8a000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fac18dae000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fac18d93000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fac18b89000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fac19de3000)
        libbz2.so.1 => /lib64/libbz2.so.1 (0x00007fac18b76000)
        libpng16.so.16 => /lib64/libpng16.so.16 (0x00007fac18b3b000)
        libharfbuzz.so.0 => /lib64/libharfbuzz.so.0 (0x00007fac18a65000)
        libbrotlidec.so.1 => /lib64/libbrotlidec.so.1 (0x00007fac18a57000)
        libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 (0x00007fac1891c000)
        libgraphite2.so.3 => /lib64/libgraphite2.so.3 (0x00007fac188fb000)
        libbrotlicommon.so.1 => /lib64/libbrotlicommon.so.1 (0x00007fac188d8000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fac1885e000)
Ah, well, not really. I guess a bunch of libraries escaped there. I guess we'll stick with our awful tricks for the time being.
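If you'd rather not eyeball that list every time, here's a small sketch that picks the escapees out of ldd output. It only assumes the "name => path (address)" line shape shown above. Both function names are hypothetical, and which libraries count as "expected to stay dynamic" is a judgment call passed in by the caller.

```rust
/// Extracts library names from `ldd` lines shaped like
/// `name => path (address)`. Lines without `=>` (the vDSO, the
/// dynamic loader) are skipped.
fn dynamic_deps(ldd_output: &str) -> Vec<&str> {
    ldd_output
        .lines()
        .filter_map(|line| line.split_once("=>"))
        .map(|(name, _path)| name.trim())
        .collect()
}

/// Returns the dependencies whose names don't start with any of the
/// `expected` prefixes. The allowlist is the caller's assumption about
/// what is fine to link dynamically (libc, libm, ...).
fn escaped_deps<'a>(ldd_output: &'a str, expected: &[&str]) -> Vec<&'a str> {
    dynamic_deps(ldd_output)
        .into_iter()
        .filter(|name| !expected.iter().any(|prefix| name.starts_with(*prefix)))
        .collect()
}
```

Calling escaped_deps with an allowlist like ["libc.so", "libm.so", "libgcc_s.so"] on the output above would leave libz, libfreetype, libglib and friends, which is the "bunch of libraries" that escaped.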
What about Windows?
But what about Windows? This is what we did all this for, after all...
Well, with this little PowerShell script:
# in `poppler-rs/pdftocairo/build.ps1`
$env:PKG_CONFIG_ALL_STATIC = "1"
$env:PKG_CONFIG_PATH = "C:/Users/amos/AppData/Local/temp/pkg-config-hack/poppler-prefix/lib/pkgconfig"
cargo build --features static
And changing our little extern "C" block to this:
// in `poppler-rs/pdftocairo/src/main.rs`
#[cfg_attr(
    all(feature = "static", target_os = "linux"),
    link(name = "stdc++", kind = "static")
)]
#[cfg_attr(all(feature = "static", target_os = "windows"), link(name = "shell32"))]
#[cfg_attr(all(feature = "static", target_os = "windows"), link(name = "ole32"))]
extern "C" {}
Then it builds!
$ .\build.ps1
warning: unused import: `glib::StaticType`
  --> C:\Users\amos\bearcove\poppler-rs\poppler-rs\src\auto\document.rs:11:5
   |
11 | use glib::StaticType;

poppler-rs/pdftocairo on amos/static-build [!+] is 📦 v0.1.0 via 🦀 v1.57.0
❯ .\build.ps1
   Compiling serde v1.0.132
(cut)
   Compiling poppler v0.1.0 (C:\Users\amos\bearcove\poppler-rs\poppler-rs)
   Compiling pdftocairo v0.1.0 (C:\Users\amos\bearcove\poppler-rs\pdftocairo)
    Finished dev [unoptimized + debuginfo] target(s) in 17.27s
Interestingly, it's only 7.8MB!
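As an aside, the per-platform #[link] attributes on that empty extern "C" block could also be expressed as build-script directives. That's a different technique from the one used here, sketched under the assumption that the crate has a build.rs. The cargo:rustc-link-lib directive and the CARGO_FEATURE_STATIC / CARGO_CFG_TARGET_OS variables are what Cargo provides to build scripts, while link_directives and emit_from_env are hypothetical names.

```rust
/// Returns the `cargo:rustc-link-lib` lines a build script would print
/// for the extra system libraries, mirroring the `#[link]` attributes:
/// static libstdc++ on Linux, shell32 and ole32 on Windows.
fn link_directives(static_feature: bool, target_os: &str) -> Vec<String> {
    if !static_feature {
        // Dynamic builds don't need any of the extras.
        return Vec::new();
    }
    let libs: &[(&str, &str)] = match target_os {
        "linux" => &[("static", "stdc++")],
        "windows" => &[("dylib", "shell32"), ("dylib", "ole32")],
        _ => &[],
    };
    libs.iter()
        .map(|(kind, name)| format!("cargo:rustc-link-lib={}={}", kind, name))
        .collect()
}

/// What a `build.rs` main function could call. Cargo sets
/// `CARGO_FEATURE_STATIC` when the `static` feature is enabled, and
/// `CARGO_CFG_TARGET_OS` to the target (not host) operating system.
fn emit_from_env() {
    let static_feature = std::env::var_os("CARGO_FEATURE_STATIC").is_some();
    let target_os = std::env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    for directive in link_directives(static_feature, &target_os) {
        println!("{}", directive);
    }
}
```

The attribute version keeps everything in main.rs. The build-script version reads the target from the environment, which can be easier to extend if more platforms show up.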
Using Dependencies (a spiritual successor to Dependency Walker, but that one freezes when I give it my executable 🙃), we can see it only depends on system Windows DLLs:
The productionized build
Now for poppler-meson-crates: this repo contains all the build script hackery that extracts the prefix from S3, so it should work out of the box. Just need to switch to the static-friendly poppler-rs crate:
# in `poppler-meson-crates/poppler-sample/Cargo.toml`
[dependencies.poppler]
git = "https://github.com/bearcove/poppler-rs"
branch = "amos/static-build"
features = ["static"]
And that's all the changes we need.
$ cargo build
   Compiling proc-macro2 v1.0.34
   Compiling unicode-xid v0.2.2
(cut)
   Compiling cairo-rs v0.15.0 (https://github.com/bearcove/gtk-rs-core?branch=amos/static-build#dcff5004)
   Compiling poppler v0.1.0 (https://github.com/bearcove/poppler-rs?branch=amos/static-build#e35aeef9)
    Finished dev [unoptimized + debuginfo] target(s) in 1m 36s
That's right. It finally all works out of the box:
$ .\target\debug\poppler-sample.exe
Producer = Some(
    "Skia/PDF m94",
)
And now for the real thing
This series took so long to write (it's been two and a half months!) that I had time to reinstall my setup from scratch again. This time, it was because WSL2 had gotten on my nerves one too many times, and I decided to switch back to "just giving 32GiB of RAM to a Fedora VM in VMWare".
And with that setup, I couldn't run the draw.io desktop app easily. I mean... I could, by just exporting DISPLAY=:0, so it would connect to the display server that was running inside the VM (tucked away in another Windows virtual desktop).

But it was infuriating to have to do that when I had gotten truly headless draw.io exports to work. So I just went ahead and folded my experimental "use headless chrome for .drawio -> .pdf" and "use poppler for .pdf -> .svg" code into the main branch for salvage (my command-line asset processor).
And it stopped building on Windows! Because of that static business. So, if we just change the dependencies a little... does it build now?
# in `futile/Cargo.toml`
[dependencies.poppler]
git = "https://github.com/bearcove/poppler-rs"
branch = "amos/static-build"
features = ["static"]

[dependencies.cairo-rs]
git = "https://github.com/bearcove/gtk-rs-core"
branch = "amos/static-build"
features = ["static", "svg"]
$ cargo build
   Compiling glib v0.15.0 (https://github.com/bearcove/gtk-rs-core?branch=amos/static-build#34fcdc82)
   Compiling cairo-rs v0.15.0 (https://github.com/bearcove/gtk-rs-core?branch=amos/static-build#34fcdc82)
   Compiling poppler v0.1.0 (https://github.com/bearcove/poppler-rs?branch=amos/static-build#e35aeef9)
   |
20 |         page.get_bounding_box(&mut bb);
   |              ^^^^^^^^^^^^^^^^ method not found in `poppler::Page`

error[E0599]: no method named `render` found for struct `poppler::Page` in the current scope
  --> src\commands\poppler.rs:25:14
   |
25 |         page.render(&cx);
   |              ^^^^^^ method not found in `poppler::Page`

Some errors have detailed explanations: E0432, E0599.
For more information about an error, try `rustc --explain E0432`.
error: could not compile `salvage` due to 3 previous errors
Nope! I could swear those methods were in there somewhere...
// in `poppler-rs/src/auto/page.rs`
impl Page {
    //#[doc(alias = "poppler_page_get_bounding_box")]
    //#[doc(alias = "get_bounding_box")]
    //pub fn is_bounding_box(&self, rect: /*Ignored*/&mut Rectangle) -> bool {
    //    unsafe { TODO: call ffi:poppler_page_get_bounding_box() }
    //}
}
Ah. Turns out my Gir.toml was slightly wrong: a little gir -m not_bound in poppler-rs/poppler-rs let me know that get_bounding_box wasn't generated because poppler.Rectangle wasn't generated.
As for render, I had it set to ignore, which I guess didn't have any effect in gir 0.14?
Anyway, with this Gir.toml:
[options]
library = "Poppler"
version = "0.18"
target_path = "."
min_cfg_version = "0.70"
girs_directories = ["/home/amos/bearcove/prefix/share/gir-1.0", "/usr/share/gir-1.0"]
work_mode = "normal"
# generate_safety_asserts = true
deprecate_by_min_version = true
single_version_file = true

external_libraries = [
    "Gio",
    "GLib",
    "GObject",
    "Cairo",
]

manual = [
    "GLib.Bytes",
    "GLib.Error",
    "GLib.DateTime",
    "cairo.Context",
    "cairo.Surface",
    "cairo.Region",
]

generate = [
    "Poppler.Backend",
    "Poppler.Document",
    "Poppler.Rectangle",
]

[[object]]
name = "Poppler.Page"
status = "generate"
    [[object.function]]
    name = "get_bounding_box"
    rename = "get_bounding_box"
    [[object.function]]
    name = "get_text_layout"
    ignore = true
    [[object.function]]
    name = "get_text_layout_for_area"
    ignore = true
    [[object.function]]
    name = "get_crop_box"
    ignore = true
    [[object.function]]
    name = "render"
        [[object.function.parameter]]
        name = "cairo"
        const = true
    [[object.function]]
    name = "render_for_printing"
        [[object.function.parameter]]
        name = "cairo"
        const = true
The methods appeared again! The const = true thingy is a workaround for the lack of proper metadata in .gir files. All the functions marked ignore = true were generating incorrect code. gir is still a work-in-progress!
After fixing all this, I realized I was working off of the wrong poppler-rs repository: I had moved it from GitHub to the GNOME GitLab.
I was also missing this wonderful workaround for the lackluster code generation around Rectangle:
// in `poppler-rs/poppler-rs/src/lib.rs`
use std::ops::{Deref, DerefMut};

use glib::translate::{ToGlibPtr, ToGlibPtrMut};

// Expose the fields of the underlying C struct through the wrapper.
impl Deref for Rectangle {
    type Target = ffi::PopplerRectangle;

    fn deref(&self) -> &Self::Target {
        unsafe { &*self.to_glib_none().0 }
    }
}

impl DerefMut for Rectangle {
    fn deref_mut(&mut self) -> &mut Self::Target {
        unsafe { &mut *self.to_glib_none_mut().0 }
    }
}
Wonderfully unsafe.
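For anyone who hasn't met the trick before: implementing Deref and DerefMut on a wrapper lets callers read and write the fields of the wrapped struct directly. Here's a safe analogue of the pattern, with no glib involved. RawRectangle and this Rectangle are hypothetical stand-ins (the real wrapper goes through to_glib_none and raw pointers), and fill_bounding_box just pretends to be the C function filling an out-parameter.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for `ffi::PopplerRectangle`: a plain C struct.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct RawRectangle {
    x1: f64,
    y1: f64,
    x2: f64,
    y2: f64,
}

/// Hypothetical stand-in for the generated wrapper, which owns a
/// heap-allocated raw struct and exposes no fields of its own.
struct Rectangle {
    inner: Box<RawRectangle>,
}

impl Rectangle {
    fn new() -> Self {
        Rectangle {
            inner: Box::new(RawRectangle::default()),
        }
    }
}

impl Deref for Rectangle {
    type Target = RawRectangle;

    fn deref(&self) -> &Self::Target {
        &self.inner
    }
}

impl DerefMut for Rectangle {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.inner
    }
}

/// Pretend C function that fills an out-parameter.
fn fill_bounding_box(rect: &mut RawRectangle) {
    rect.x1 = 10.0;
    rect.y1 = 20.0;
    rect.x2 = 110.0;
    rect.y2 = 220.0;
}

/// Reads fields straight through the wrapper, thanks to `Deref`.
fn width(rect: &Rectangle) -> f64 {
    rect.x2 - rect.x1
}
```

Passing &mut bb (a Rectangle) to fill_bounding_box works through deref coercion, and bb.x1 afterwards reads the raw field. The unsafe version above gets the same ergonomics, just with a pointer cast where this sketch has a Box.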
Anyway... does it build now?
DOES IT???
$ cargo b
   Compiling poppler-sys v0.0.1 (https://github.com/bearcove/poppler-rs?branch=amos/static-build#3b05dc26)
   Compiling poppler v0.1.0 (https://github.com/bearcove/poppler-rs?branch=amos/static-build#3b05dc26)
   Compiling salvage v1.4.0 (C:\Users\amos\bearcove\salvage)
    Finished dev [unoptimized + debuginfo] target(s) in 5.17s
It does!!! 🎉