# From Inkscape to poppler

What's next? Well... poppler is the library Inkscape uses to import PDFs.

Cool bear's hot tip

Yes, the name comes from Futurama.

Turns out, poppler comes with a bunch of CLI tools, including pdftocairo!

Halfway through this article, I realized the "regular weight" on my system was in fact Iosevka SS01 (Andale Mono Style) (see Releases), but the "bold weight" was the default Iosevka.

So, I removed both and reinstalled them from the official distribution, which explains visual and size changes after that point.

So, with a few more CLI incantations...

Shell session
$ pdftocairo /tmp/export.pdf -svg /tmp/export.svg
$ ls -lhA /tmp/export*
-rw-r--r--. 1 amos amos 159K Nov 19 10:14 /tmp/export.pdf
-rw-r--r--. 1 amos amos 739K Nov 19 10:14 /tmp/export.svg
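
The same invocation can be built from a program with nothing but Rust's standard library. This is only a sketch: the helper name `pdftocairo_svg` is made up for illustration, it assumes a `pdftocairo` binary is on `PATH`, and the only flag it uses is the `-svg` one from the session above.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) `pdftocairo -svg <input> <output>`.
/// `pdftocairo_svg` is a hypothetical helper, not something poppler ships.
fn pdftocairo_svg(input: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("pdftocairo");
    // Options first, then the PDF to read, then the file to write.
    cmd.arg("-svg").arg(input).arg(output);
    cmd
}
```

Calling `.status()` on the returned `Command` would actually spawn the tool and report its exit status. It still shells out to an external program, though, so it doesn't remove the dependency on the CLI tool.
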


We've got an SVG file! And it's a bit large, I wonder if it embeds part of a font, like the PDF does?

Well... it's a bit more complicated.

As it turns out, individual non-bold ("regular weight") letters actually refer to other paths:

But words made up of bold letters are a single, very lengthy path:

I wonder if that's because I've only installed the "Regular" weight for the Iosevka font... let's find out.

After installing the "Bold" weight, renaming /tmp/export.EXT to /tmp/export.regular.EXT, and running both steps again, the PDF export is smaller - and so is the SVG!

Shell session
$ ls -lhAt /tmp/export.*
-rw-r--r--. 1 amos amos 436K Nov 19 10:40 /tmp/export.svg
-rw-r--r--. 1 amos amos 68K Nov 19 10:40 /tmp/export.pdf
-rw-r--r--. 1 amos amos 739K Nov 19 10:39 /tmp/export.regular.svg
-rw-r--r--. 1 amos amos 159K Nov 19 10:39 /tmp/export.regular.pdf

The PDF file now contains two partial embedded fonts:

%% Original object ID: 4 0
9 0 obj
<<
  /BaseFont /Iosevka-Bold
  /DescendantFonts [
    11 0 R
  ]
  /Encoding /Identity-H
  /Subtype /Type0
  /ToUnicode 12 0 R
  /Type /Font
>>
endobj
%% Original object ID: 5 0
10 0 obj
<<
  /BaseFont /Iosevka
  /DescendantFonts [
    14 0 R
  ]
  /Encoding /Identity-H
  /Subtype /Type0
  /ToUnicode 15 0 R
  /Type /Font
>>
endobj

And we can see in the SVG file that bold characters now also take advantage of the SVG use tag.

So what happened with bold in the first export then? How did we even get bold letters, if we didn't have the corresponding font?

Let's look at them both:

The bottom version is what Iosevka is supposed to look like. The top version is Chrome's font renderer (freetype?) doing its best to turn a regular font into a bold font, by just... embiggening stuff.

So anyway, now we have a reasonable SVG. It:

• Should look the same on any machine, no matter what fonts are installed
• Thus, can be downloaded and printed easily
• Does some deduplication for glyphs, so that for example the path for Iosevka's 0 glyph is only defined once, and then re-used a bunch of times

But, well, we used a CLI tool to do it. Ideally we'd be able to just do it from code, since we don't want any external dependencies (Chrome being the notable, and infuriating, exception).

GNOME has a pretty good story when it comes to Rust libraries. But the folks working on them are focusing mainly on cairo, gio, glib, pango, and gtk3/gtk4.

There is a poppler crate on crates.io, but it is hopelessly out-of-date.

But the good news is: there's existing tooling for glib-based C libraries, and poppler is one of them.
Can we use it to generate bindings before this article becomes so large it crashes your browser? Let's find out!

## gobject introspection

In the year of our lord 2021, we could all use a little introspection. And APIs are absolutely no exception.

APIs are typically defined as a bunch of C headers, and that isn't machine-friendly for a bunch of reasons. I know that because I once tried writing a C preprocessor that basically converted #ifdef blocks into cargo features. It was awful.

So what a bunch of folks have been doing instead is to keep a canonical representation of the API in a structured language (that specifically isn't C), and generate bindings from that.

That's what folks at Microsoft are doing with windows-rs, for example. They actually have machinery involving clang and .NET (you can take a look at the win32metadata repository for more information), and the reference definitions look like this (as seen through ILSpy):

Apple is doing something similar with BridgeSupport, although I have found very little documentation about it, and at least one person claimed it was no longer supported.

And, well, the GNOME project has been doing the same thing! If the gobject-introspection Git history is to be trusted, they started their effort in 2004!

The Rust side of it, gtk-rs/gir, was "only" started in 2015.

And like I said earlier, even though poppler is actually an offshoot from xpdf (and so it looks different from a lot of other GNOME-adjacent libraries), it does have a "glib interface" (alongside a Qt interface), and that glib interface has a .gir file, and so we can use it with gtk-rs/gir!

A .gir file is just plain XML, here's an excerpt from /usr/share/gir-1.0/Poppler-0.18.gir on a Fedora 35 install:

XML
<?xml version="1.0"?>
<!-- This file was automatically generated from C sources - DO NOT EDIT!
To affect the contents of this file, edit the original C definitions,
and/or use gtk-doc annotations. -->
<repository version="1.2"
            xmlns="http://www.gtk.org/introspection/core/1.0"
            xmlns:c="http://www.gtk.org/introspection/c/1.0"
            xmlns:glib="http://www.gtk.org/introspection/glib/1.0">
  <include name="GObject" version="2.0"/>
  <include name="Gio" version="2.0"/>
  <include name="cairo" version="1.0"/>
  <package name="poppler-glib"/>
  <c:include name="poppler.h"/>
  <namespace name="Poppler"
             version="0.18"
             shared-library="libpoppler-glib.so.8,libpoppler.so.112"
             c:identifier-prefixes="Poppler"
             c:symbol-prefixes="poppler">
    <!-- (skipping a few things to find the interesting bits...) -->
    <class name="Page"
           c:symbol-prefix="page"
           c:type="PopplerPage"
           parent="GObject.Object"
           glib:type-name="PopplerPage"
           glib:get-type="poppler_page_get_type">
      <method name="render" c:identifier="poppler_page_render">
        <doc xml:space="preserve" filename="glib/poppler-page.cc" line="336">Render the page to the given cairo context. This function is for rendering a page that will be displayed. If you want to render a page that will be printed use poppler_page_render_for_printing() instead. Please see the documentation for that function for the differences between rendering to the screen and rendering to a printer.</doc>
        <source-position filename="glib/poppler-page.h" line="38"/>
        <return-value transfer-ownership="none">
          <type name="none" c:type="void"/>
        </return-value>
        <parameters>
          <instance-parameter name="page" transfer-ownership="none">
            <doc xml:space="preserve" filename="glib/poppler-page.cc" line="338">the page to render from</doc>
            <type name="Page" c:type="PopplerPage*"/>
          </instance-parameter>
          <parameter name="cairo" transfer-ownership="none">
            <doc xml:space="preserve" filename="glib/poppler-page.cc" line="339">cairo context to render to</doc>
            <type name="cairo.Context" c:type="cairo_t*"/>
          </parameter>
        </parameters>
      </method>
    </class>
  </namespace>
</repository>

And, you know, it doesn't have all the information one could dream of, but it's a perfectly fine start to generate Rust bindings.

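
To make that a bit more concrete, here's a toy sketch of the kind of translation a binding generator performs: it takes the handful of facts the `<method>` element records for `poppler_page_render` and prints a declaration suitable for an `extern "C"` block. The `GirMethod` and `GirParam` types, the `extern_decl` function and the tiny type table are all invented for illustration. The real gtk-rs/gir parses the XML itself and handles far more cases.

```rust
/// One parameter of a method, as recorded in a .gir file (heavily simplified).
struct GirParam {
    name: &'static str,
    c_type: &'static str,
}

/// A hypothetical, pared-down view of a GIR `<method>` element.
struct GirMethod {
    c_identifier: &'static str,
    return_c_type: &'static str,
    params: Vec<GirParam>,
}

/// Maps a few C types to Rust FFI types. This table only covers what the
/// excerpt needs; anything else yields `None`.
fn rust_ffi_type(c_type: &str) -> Option<&'static str> {
    match c_type {
        "PopplerPage*" => Some("*mut PopplerPage"),
        "cairo_t*" => Some("*mut cairo::cairo_t"),
        "gboolean" => Some("gboolean"),
        _ => None,
    }
}

/// Renders the declaration that would sit inside an `extern "C"` block,
/// or `None` if any type is missing from the table.
fn extern_decl(method: &GirMethod) -> Option<String> {
    let mut params = Vec::new();
    for p in &method.params {
        params.push(format!("{}: {}", p.name, rust_ffi_type(p.c_type)?));
    }
    // A `void` return simply has no `-> T` part in Rust.
    let ret = match method.return_c_type {
        "void" => String::new(),
        other => format!(" -> {}", rust_ffi_type(other)?),
    };
    Some(format!("pub fn {}({}){};", method.c_identifier, params.join(", "), ret))
}

/// The `render` method from the excerpt, transcribed by hand.
fn poppler_page_render() -> GirMethod {
    GirMethod {
        c_identifier: "poppler_page_render",
        return_c_type: "void",
        params: vec![
            GirParam { name: "page", c_type: "PopplerPage*" },
            GirParam { name: "cairo", c_type: "cairo_t*" },
        ],
    }
}
```

Returning `None` for unknown types mirrors, very loosely, the idea that a generator should skip what it can't describe rather than emit a wrong signature.
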
So after chatting with the wonderful folks in the Gnome/Rust Matrix room, I got to work and started making my own poppler-rs.

A lot of "Rust bindings to C libraries" are actually two crates: a foobar-sys crate that is full of unsafe functions, and a foobar crate that wraps foobar-sys's functionality with safe abstractions.

And that's the model gtk-rs/gir enforces as well, so I made a little workspace...

TOML markup
# in poppler-rs/Cargo.toml
[workspace]
members = [
    "sys",
    "poppler",
]

And for the sys crate, I added a little config:

TOML markup
# in poppler-rs/sys/Gir.toml
[options]
library = "Poppler"
version = "0.18"
target_path = "."
min_cfg_version = "0.70"
girs_directories = ["../../gir-files", "../gir-files"]
work_mode = "sys"
external_libraries = [
    "Gio",
    "GLib",
    "GObject",
    "Cairo",
]
ignore = [
    "Poppler.MAJOR_VERSION",
    "Poppler.MINOR_VERSION",
    "Poppler.MICRO_VERSION",
]

Cool bear's hot tip

MAJOR_VERSION etc. are defines in C. Because we link dynamically against poppler in most scenarios, and the binding is generated once and then used against many different versions of poppler, exposing them to Rust a) is unnecessary, and b) makes gir-generated unit tests fail (because the numbers don't match up, even if the libraries would be compatible).

And then after running gir in the sys/ directory, BOOM, we have a sys crate.

It has a single src/lib.rs file, that has a preamble...

Rust code
// in poppler-rs/sys/src/lib.rs
// Generated by gir (https://github.com/gtk-rs/gir @ 8891a2f2c34b)
// from ../../gir-files (@ c6afb5857607)
// from ../gir-files (@ ec3e62ee546b)
// DO NOT EDIT
#![allow(non_camel_case_types, non_upper_case_globals, non_snake_case)]
#![allow(clippy::approx_constant, clippy::type_complexity, clippy::unreadable_literal, clippy::upper_case_acronyms)]
#![cfg_attr(feature = "dox", feature(doc_cfg))]
use gio_sys as gio;
use glib_sys as glib;
use gobject_sys as gobject;
use cairo_sys as cairo;
#[allow(unused_imports)]
use libc::{c_int, c_char, c_uchar, c_float, c_uint, c_double,
    c_short, c_ushort, c_long, c_ulong,
    c_void, size_t, ssize_t, intptr_t, uintptr_t, time_t, FILE};
#[allow(unused_imports)]
use glib::{gboolean, gconstpointer, gpointer, GType};
// etc.

And then some enums...

Rust code
// Enums
pub type PopplerActionLayerAction = c_int;
pub const POPPLER_ACTION_LAYER_ON: PopplerActionLayerAction = 0;
pub const POPPLER_ACTION_LAYER_OFF: PopplerActionLayerAction = 1;
pub const POPPLER_ACTION_LAYER_TOGGLE: PopplerActionLayerAction = 2;

And then some unions...

Rust code
// Unions
#[repr(C)]
#[derive(Copy, Clone)]
pub union PopplerAction {
    pub type_: PopplerActionType,
    pub any: PopplerActionAny,
    pub goto_dest: PopplerActionGotoDest,
    pub goto_remote: PopplerActionGotoRemote,
    pub launch: PopplerActionLaunch,
    pub uri: PopplerActionUri,
    pub named: PopplerActionNamed,
    pub movie: PopplerActionMovie,
    pub rendition: PopplerActionRendition,
    pub ocg_state: PopplerActionOCGState,
    pub javascript: PopplerActionJavascript,
    pub reset_form: PopplerActionResetForm,
}

Some callbacks (not shown here), some "records", which I guess is what structs are called in gobject-introspection:

Rust code
#[repr(C)]
#[derive(Copy, Clone)]
pub struct PopplerRectangle {
    pub x1: c_double,
    pub y1: c_double,
    pub x2: c_double,
    pub y2: c_double,
}
impl ::std::fmt::Debug for PopplerRectangle {
    fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
        f.debug_struct(&format!("PopplerRectangle @ {:p}", self))
            .field("x1", &self.x1)
            .field("y1", &self.y1)
            .field("x2", &self.x2)
            .field("y2", &self.y2)
            .finish()
    }
}

And then some classes!

Rust code
#[repr(C)]
pub struct PopplerPage(c_void);
impl ::std::fmt::Debug for PopplerPage {
    fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
        f.debug_struct(&format!("PopplerPage @ {:p}", self))
            .finish()
    }
}

And then, well, then there's every function in poppler-glib:

Rust code
#[link(name = "poppler-glib")]
#[link(name = "poppler")]
extern "C" {
    // (MANY functions skipped)
    // Oh look this one is gated behind a cargo feature automatically!
    #[cfg(any(feature = "v0_80", feature = "dox"))]
    #[cfg_attr(feature = "dox", doc(cfg(feature = "v0_80")))]
    pub fn poppler_print_duplex_get_type() -> GType;
    // skipping more...
    pub fn poppler_page_render(page: *mut PopplerPage, cairo: *mut cairo::cairo_t);
    // skipping everything else.
}

And that's how you get a -sys crate.

You'll note that it has only poppler functions. It doesn't have, for example, cairo functions, which is a dependency of poppler.
Those are in other crates, which have already been generated and published to crates.io:

TOML markup
# in poppler-rs/sys/Cargo.toml
[dependencies]
cairo-sys-rs = "0.14.9"
gio-sys = "0.14.0"
glib-sys = "0.14.0"
gobject-sys = "0.14.0"
libc = "0.2"

Now that we have the low-level, unsafe crate, we can generate the high-level crate!

That one's a bit more complicated, because, again, the .gir files are missing some information that matters for languages like Rust.

TOML markup
# in poppler-rs/poppler/Gir.toml
[options]
library = "Poppler"
version = "0.18"
target_path = "."
min_cfg_version = "0.70"
girs_directories = ["../../gir-files", "../gir-files"]
work_mode = "normal" # 👈 this was "sys" for the previous crate
deprecate_by_min_version = true
single_version_file = true
external_libraries = [
    "Gio",
    "GLib",
    "GObject",
    "Cairo",
]
# This tells gir "these types exist in _other crates_, you don't need to
# generate them yourself BUT you shouldn't skip functions that use these"
# (Normally gir skips anything that uses types that aren't explicitly
# allowlisted).
manual = [
    "GLib.Bytes",
    "GLib.Error",
    "GLib.DateTime",
    "cairo.Context",
    "cairo.Surface",
    "cairo.Region",
]
# This is the short way of telling gir what to generate
generate = [
    "Poppler.Backend",
    "Poppler.Document",
]
# This is the long way of telling gir what to generate, where we can ignore
# specific "object functions" (methods, really..), change the constness of some
# parameters, etc.
[[object]]
name = "Poppler.Page"
status = "generate"
    [[object.function]]
    name = "render"
        [[object.function.parameter]]
        name = "cairo"
        const = true
    [[object.function]]
    name = "render_for_printing"
        [[object.function.parameter]]
        name = "cairo"
        const = true
    [[object.function]]
    name = "get_text_layout"
    ignore = true
    [[object.function]]
    name = "get_text_layout_for_area"
    ignore = true
    [[object.function]]
    name = "get_crop_box"
    ignore = true
    [[object.function]]
    name = "get_bounding_box"
    rename = "get_bounding_box"
[[object]]
name = "Poppler.Rectangle"
status = "generate"
boxed_inline = true

There's a couple interesting workarounds I've got baked in there, for some value of "interesting". For example, the poppler_page_get_bounding_box function prototype looks like this:

C code
gboolean poppler_page_get_bounding_box (PopplerPage *page, PopplerRectangle *rect);

And so by default, gtk-rs/gir generated something like this:

Rust code
impl Page {
    fn is_bounding_box(&mut self, rect: &mut Rectangle) -> bool;
}

Ohhh because it returns a bool, right.

...hence the odd "rename get_bounding_box to get_bounding_box" configuration.

get_crop_box generated code that straight up refused to compile, so I had to ignore it - and I ran into a couple other issues, but I have to say I've been using the 0.14 branch of gtk-rs/gir, and the development branch contains a lot of improvements already.

Wait, why did you use 0.14 then?

That's what the existing glib and cairo-rs crates were generated with.

So.. the versions have to match to be able to interoperate?

Precisely!

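
The renaming behavior behind `is_bounding_box` can be approximated in a few lines. This is a guess at the spirit of the rule, inferred only from the behavior described above, and not gir's actual code. The function name `guess_method_name` is made up.

```rust
/// Guesses the Rust method name for a C method name (with the
/// `poppler_page_` style prefix already removed).
///
/// Assumed rule, for illustration only: `get_foo` becomes `foo`,
/// unless it returns a boolean, in which case it becomes `is_foo`.
fn guess_method_name(c_method: &str, returns_bool: bool) -> String {
    match c_method.strip_prefix("get_") {
        Some(rest) if returns_bool => format!("is_{}", rest),
        Some(rest) => rest.to_string(),
        None => c_method.to_string(),
    }
}
```

Under that assumed rule, `guess_method_name("get_bounding_box", true)` produces `is_bounding_box`, which reads like a predicate even though the C function really fills in an out-parameter. An explicit `rename` entry in Gir.toml is what overrides the guess.
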
And again, just running gir generates a whole crate, a high-level, safe one this time:

Rust code
// in poppler-rs/poppler/src/auto/page.rs
// This file was generated by gir (https://github.com/gtk-rs/gir)
// from ../../gir-files
// from ../gir-files
// DO NOT EDIT
use crate::Rectangle;
use glib::object::ObjectType as ObjectType_;
use glib::signal::connect_raw;
use glib::signal::SignalHandlerId;
use glib::translate::*;
use std::boxed::Box as Box_;
use std::fmt;
use std::mem;
use std::mem::transmute; // that's how you know it's gonna get good
glib::wrapper! {
    #[doc(alias = "PopplerPage")]
    pub struct Page(Object<ffi::PopplerPage>);
    match fn {
        type_ => || ffi::poppler_page_get_type(),
    }
}
impl Page {
    // (still not sure why this returns a bool / when this would ever return
    // false, the docs are non-existent)
    #[doc(alias = "poppler_page_get_bounding_box")]
    pub fn get_bounding_box(&self, rect: &mut Rectangle) -> bool {
        unsafe {
            from_glib(ffi::poppler_page_get_bounding_box(self.to_glib_none().0, rect.to_glib_none_mut().0))
        }
    }
    // (skipped a bunch of methods)
    #[doc(alias = "poppler_page_render")]
    pub fn render(&self, cairo: &cairo::Context) {
        unsafe {
            ffi::poppler_page_render(self.to_glib_none().0, mut_override(cairo.to_glib_none().0));
        }
    }
    #[doc(alias = "poppler_page_render_for_printing")]
    pub fn render_for_printing(&self, cairo: &cairo::Context) {
        unsafe {
            ffi::poppler_page_render_for_printing(self.to_glib_none().0, mut_override(cairo.to_glib_none().0));
        }
    }
    // (skipped all the other methods)
}

Just like before, the high-level poppler crate depends on high-level glib/cairo crates. And bitflags, for reasons™️

TOML markup
# in poppler-rs/poppler/Cargo.toml
[dependencies]
glib = "0.14.8"
libc = "0.2.107"
cairo-rs = "0.14.9"
bitflags = "1.3.2"

And now, FINALLY, we can use these bindings.

## Using our fresh poppler-rs bindings

I made a tiny version of pdftocairo that exclusively renders to a cairo SVG surface, just to try things out.
Here it is in its entirety:

TOML markup
# in pdftocairo/Cargo.toml
[package]
name = "pdftocairo"
version = "0.1.0"
edition = "2021"
[dependencies]
# for utf-8 paths
camino = "1.0.5"
# for error handling
color-eyre = "0.5.11"
# *chants* poppler, poppler, poppler!
poppler-rs = { path = "../poppler-rs/poppler" }
# for rendering
cairo-rs = { version = "0.14.9", features = ["svg"] }
# for application-level tracing
tracing = "0.1.29"
tracing-error = "0.2.0"
tracing-subscriber = { version = "0.3.1", features = ["env-filter"] }

Rust code
// in pdftocairo/src/main.rs
use std::fs::File;
use cairo::{Context, SvgSurface};
use camino::Utf8PathBuf;
use color_eyre::{eyre::eyre, Report};
use poppler::Rectangle;
use tracing::info;
fn main() -> Result<(), Report> {
    if std::env::var("RUST_LOG").is_err() {
        std::env::set_var("RUST_LOG", "info");
    }
    color_eyre::install()?;
    install_tracing();
    let path = Utf8PathBuf::from("/tmp/export.pdf");
    info!(%path, "Reading file...");
    let data = std::fs::read(&path)?;
    info!(%path, "Reading file... done!");
    let doc = poppler::Document::from_data(&data[..], None)?;
    info!("Got the document! {:#?}", doc);
    info!("Producer = {:#?}", doc.producer());
    info!("Num pages = {:#?}", doc.n_pages());
    let page = doc.page(0).unwrap();
    info!("page = {:#?}", page);
    let mut bb: Rectangle = Default::default();
    page.get_bounding_box(&mut bb);
    info!("bb = {:#?}", *bb);
    info!("Creating file!");
    let export_path = Utf8PathBuf::from("/tmp/export.svg");
    let f = File::create(&export_path)?;
    info!("Creating surface...");
    let surface = SvgSurface::for_stream(bb.x2 - bb.x1, bb.y2 - bb.y1, f)?;
    info!("Creating context...");
    let cx = Context::new(&surface)?;
    info!("Rendering...");
    page.render(&cx);
    info!("Finishing output stream...");
    surface
        .finish_output_stream()
        .map_err(|e| eyre!("cairo error: {}", e.to_string()))?;
    info!(%export_path, "We're.. done?");
    Ok(())
}
fn install_tracing() {
    use tracing_error::ErrorLayer;
    use tracing_subscriber::prelude::*;
    use tracing_subscriber::{fmt, EnvFilter};
    let fmt_layer = fmt::layer();
    let filter_layer = EnvFilter::try_from_default_env()
        .or_else(|_| EnvFilter::try_new("info"))
        .unwrap();
    tracing_subscriber::registry()
        .with(filter_layer)
        .with(fmt_layer)
        .with(ErrorLayer::default())
        .init();
}

And here's proof it works!

Shell session
$ cargo build
Finished dev [unoptimized + debuginfo] target(s) in 0.02s

$ ./target/debug/pdftocairo
2021-11-24T18:14:36.936369Z INFO pdftocairo: Reading file... path=/tmp/export.pdf
2021-11-24T18:14:36.936467Z INFO pdftocairo: Reading file... done! path=/tmp/export.pdf
2021-11-24T18:14:36.939146Z INFO pdftocairo: Got the document! Document(
    ObjectRef {
        inner: 0x000055bfa9458400,
        type: PopplerDocument,
    },
)
2021-11-24T18:14:36.939199Z INFO pdftocairo: Producer = Some(
    "Skia/PDF m74",
)
2021-11-24T18:14:36.939239Z INFO pdftocairo: Num pages = 1
2021-11-24T18:14:36.939284Z INFO pdftocairo: page = Page(
    ObjectRef {
        inner: 0x000055bfa9458440,
        type: PopplerPage,
    },
)
2021-11-24T18:14:36.941495Z INFO pdftocairo: bb = PopplerRectangle @ 0x55bfa9467810 {
    x1: 0.0,
    y1: 0.0,
    x2: 744.9599599999999,
    y2: 481.91998,
}
2021-11-24T18:14:36.941563Z INFO pdftocairo: Creating file!
2021-11-24T18:14:36.941622Z INFO pdftocairo: Creating surface...
2021-11-24T18:14:36.941667Z INFO pdftocairo: Creating context...
2021-11-24T18:14:36.941691Z INFO pdftocairo: Rendering...
2021-11-24T18:14:36.947172Z INFO pdftocairo: Finishing output stream...
2021-11-24T18:14:36.955067Z INFO pdftocairo: We're.. done? export_path=/tmp/export.svg

Oooh, Skia!

And here's the result:

Shell session
$ head /tmp/export.svg
<?xml version="1.0" encoding="UTF-8"?>
<defs>
<g>
<symbol overflow="visible" id="glyph0-0">
<path style="stroke:none;" d="M 0.703125 0 L 0.703125 -8.8125 L 5.28125 -8.8125 L 5.28125 0 Z M 1.109375 -3.96875 L 2.84375 -6.140625 L 4.65625 -8.40625 L 3.53125 -8.40625 L 1.109375 -5.375 Z M 1.109375 -5.84375 L 2.28125 -7.296875 L 3.15625 -8.40625 L 2.03125 -8.40625 L 1.109375 -7.234375 Z M 1.109375 -7.703125 L 1.671875 -8.40625 L 1.109375 -8.40625 Z M 1.109375 -2.078125 L 3.890625 -5.578125 L 4.890625 -6.8125 L 4.890625 -8.21875 L 1.109375 -3.5 Z M 1.109375 -0.390625 L 1.25 -0.390625 L 4.890625 -4.9375 L 4.890625 -6.359375 L 1.109375 -1.625 Z M 1.625 -0.390625 L 2.75 -0.390625 L 4.890625 -3.0625 L 4.890625 -4.46875 Z M 3.125 -0.390625 L 4.25 -0.390625 L 4.890625 -1.203125 L 4.890625 -2.59375 Z M 4.890625 -0.390625 L 4.890625 -0.734375 L 4.625 -0.390625 Z M 4.890625 -0.390625 "/>
</symbol>
<symbol overflow="visible" id="glyph0-1">
<path style="stroke:none;" d="M 0.796875 0 L 0.796875 -8.8125 L 5.28125 -8.8125 L 5.28125 -7.65625 L 2.140625 -7.65625 L 2.140625 -5.15625 L 4.609375 -5.15625 L 4.609375 -4 L 2.140625 -4 L 2.140625 -1.15625 L 5.28125 -1.15625 L 5.28125 0 Z M 0.796875 0 "/>
</symbol>
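
Those `<symbol>` elements are the deduplicated glyph definitions: each outline is written once and then referenced wherever the glyph appears. A minimal sketch of that idea, using only the standard library, might look like this. The `render_text` function and its `outline_for` callback (a stand-in for a real font) are invented for illustration, and the output is a fragment shaped like cairo's, not a complete SVG document.

```rust
use std::collections::BTreeMap;

/// Emits an SVG fragment in which each distinct character's outline is
/// defined once as a <symbol>, then referenced with <use> per occurrence.
/// `outline_for` is a hypothetical stand-in for a font: it returns the
/// path data for a character. `advance` is the horizontal step per glyph.
fn render_text(text: &str, advance: f64, outline_for: impl Fn(char) -> String) -> String {
    let mut ids: BTreeMap<char, usize> = BTreeMap::new();
    let mut defs = String::new();
    let mut uses = String::new();
    for (i, ch) in text.chars().enumerate() {
        // The next free id, only consumed if `ch` has not been seen before.
        let next_id = ids.len();
        let id = *ids.entry(ch).or_insert_with(|| {
            defs.push_str(&format!(
                "<symbol overflow=\"visible\" id=\"glyph0-{}\"><path d=\"{}\"/></symbol>\n",
                next_id,
                outline_for(ch)
            ));
            next_id
        });
        // Every occurrence, first or repeated, is just a reference.
        uses.push_str(&format!(
            "<use xlink:href=\"#glyph0-{}\" x=\"{}\" y=\"0\"/>\n",
            id,
            i as f64 * advance
        ));
    }
    format!("<defs>\n{}</defs>\n{}", defs, uses)
}
```

For a string like `1001`, this defines two symbols and emits four `<use>` references, which is where the size savings over one long path per word come from.
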


Uhh..

Mhh if I told you you'd probably have me do it!

...fair.

What did we learn?

We were using a tiny subset of what Inkscape can do: rendering a PDF file to an SVG surface, as paths. And it turns out, we only need the poppler and cairo libraries to do that.

Because both have a "glib" interface, we can use all the GTK-cinematic-universe tooling to generate Rust bindings for them. cairo already has an official binding, but the poppler one was out-of-date: we just regenerated it with gtk-rs/gir and we were on our way.