I love diagrams. I love them so much! In fact, I have fairly poor visualization skills, so making a diagram is extremely helpful to me: I'll have some vague idea of how different things are connected, and then I'll make a diagram, and suddenly there's a tangible thing I can look at and talk about.

Of course the diagram only represents a fraction of what I had in mind in the first place, but that's okay: the point is to be able to talk about some aspect of a concept, and so I have to make choices about what to include in the diagram. And maybe make several diagrams.

So, over the past couple years, I've made a lot of diagrams. Here's one diagram I'm particularly happy about: it describes the ELF file format, and it's from the Making our own executable packer series.

I made that diagram using the draw.io desktop application, much like all the other diagrams I've made. It's an Electron app, which, don't hate, and also it runs everywhere and works great.

I've made a bunch of custom tooling for my website. For example, I save screenshots as .png files, but then they get converted to .jpg, .webp and .avif, and then the best of these is served to your browser. The .png is just the source of truth.
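How "the best of these" gets picked isn't spelled out here, so the following is only a sketch of one common approach: look at the `Accept` header the browser sends and serve the most modern format it advertises. The function name, the preference order, and the decision to ignore `q=` weights are all assumptions made for illustration.

```rust
/// Formats the pipeline produces, ordered from most to least preferred.
/// This ordering is an assumption made for the sketch.
const PREFERENCE: [(&str, &str); 3] = [
    ("image/avif", "avif"),
    ("image/webp", "webp"),
    ("image/jpeg", "jpg"),
];

/// Picks a file extension based on the request's `Accept` header.
/// Falls back to JPEG when nothing more modern is advertised.
pub fn pick_extension(accept_header: &str) -> &'static str {
    // Split "image/avif,image/webp;q=0.8" into bare media types,
    // dropping any parameters such as `q=0.8`.
    let accepted: Vec<&str> = accept_header
        .split(',')
        .map(|part| part.split(';').next().unwrap_or("").trim())
        .collect();

    for (mime, ext) in PREFERENCE {
        if accepted.iter().any(|a| a.eq_ignore_ascii_case(mime)) {
            return ext;
        }
    }
    "jpg"
}
```

Calling `pick_extension` with a header that lists `image/avif` yields `"avif"`, while a bare `*/*` falls back to `"jpg"`.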

Originally, I took screenshots of draw.io — my early articles showed the grid and everything. They were also "raster" (or "bitmaps"), i.e. a finite number of pixels, which looked okay on the machine I used to write articles, but not on some fancy HiDPI screen.

A Lenovo X200, the laptop I started writing articles on.

Those screenshots were also not dark mode friendly (whereas the diagrams I do now are - well, they just color flip, but at least you don't get an unholy dark/light mix), and they were fairly hard to maintain: I usually had one large .drawio file and did separate "rectangular selection" screenshots. Updating diagrams was a lot of work.

So, pretty soon I wanted the same workflow I had with screenshots: to be able to just save a .drawio file somewhere, and have whatever needs to happen happen, so that I could get best-of-class vector graphics in the browser.

I wasn't really prepared for what came next.

What's in a .drawio file?

When I complained about converting .drawio files to something else, someone online said "well buddy that's what you get for using proprietary formats".

First off... hello. Second, is it actually proprietary?

Let's find out!

Mhh. That looks like XML... but with some binary payload in there? What's up with that?

Ah right, it's compressed. Let's uncheck that and look at the result:

Alright! It is just XML. But it's not, say, SVG. Can we get draw.io to export as SVG? Yes we can! There's an "export as SVG" option in there.

We can also do it from the comfort of the command line:

Shell session
$ drawio elf64-file-header.drawio --crop --export --format svg --output /tmp/elf-file-header.svg
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
elf64-file-header.drawio -> /tmp/elf-file-header.svg

Cool bear's hot tip

Ignore the libva error - just another day on the Linux desktop.

If we open that SVG in, say, Google Chrome, it looks fine:

But if we open it in, say, GNOME's default image viewer... it doesn't!

Mhh, I'm sure draw.io has other export options... how about PDF?

Shell session
$ drawio elf64-file-header.drawio --crop --export --format pdf --output /tmp/elf-file-header.pdf
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
elf64-file-header.drawio -> /tmp/elf-file-header.pdf

Looks fine in Chrome:

And it looks fine in GNOME's default document viewer!

Great!

But amos... you can't embed PDFs in web pages!

I know that! But a) the layout is right, and b) it's still vector graphics. I'm sure there's a way to get from PDF back to SVG...

But you don't even know why draw.io's SVG export "didn't work right" in the first place.

That's true. We don't even know that it's "wrong" per se, just that it looked weird in Eye of Gnome. But here's something you may not have considered, bear: I'd like my diagrams to look the same everywhere.

So? What would make them look different?

Well, tons of things! But most importantly, fonts. I use the wonderful Iosevka font for all my diagrams. What do you think would happen if people opened the SVG on a computer that didn't have that font installed?

Well... I'm fairly sure PDF has a way to embed fonts... as for SVG, it'd probably show the wrong font. Unless you have it set up as a web font?

Correct on all counts! Except I want people to be able to download diagrams and share them, maybe print them.

Mhh... so the text can't be an SVG text element that says "use this font"? It has to be actual paths?

That's right! Luckily, there's a tool that lets us do both "go from PDF to SVG" and "convert text to paths", and that tool is Inkscape.

We can do that from the command line:

Shell session
$ inkscape /tmp/elf-file-header.pdf --pdf-poppler --export-plain-svg --export-text-to-path --export-filename /tmp/elf-file-header.pdf.svg

$ ls -lhA /tmp/elf-fil*
-rw-r--r--. 1 amos amos 110K Nov 17 11:39 /tmp/elf-file-header.pdf
-rw-r--r--. 1 amos amos 538K Nov 17 11:47 /tmp/elf-file-header.pdf.svg
-rw-r--r--. 1 amos amos  65K Nov 17 11:35 /tmp/elf-file-header.svg

Whoa, that's much larger than draw.io's original svg output!

It is! But it looks great in Chrome...

And in Eye of Gnome:

And where the original SVG referred to the Iosevka font, that one doesn't:

Shell session
$ xmllint --format /tmp/elf-file-header.svg | grep --color=always Iosevka | head
              <div style="display: inline-block; font-size: 16px; font-family: Iosevka; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">
        <text x="479" y="129" fill="rgba(0, 0, 0, 1)" font-family="Iosevka" font-size="16px">Entry poin...</text>
              <div style="display: inline-block; font-size: 16px; font-family: Iosevka; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">
        <text x="569" y="129" fill="rgba(0, 0, 0, 1)" font-family="Iosevka" font-size="16px">Table offs...</text>
              <div style="display: inline-block; font-size: 16px; font-family: Iosevka; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">
        <text x="659" y="129" fill="rgba(0, 0, 0, 1)" font-family="Iosevka" font-size="16px">Table offs...</text>
              <div style="display: inline-block; font-size: 16px; font-family: Iosevka; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">
        <text x="749" y="129" fill="rgba(0, 0, 0, 1)" font-family="Iosevka" font-size="16px">Flags...</text>
              <div style="display: inline-block; font-size: 16px; font-family: Iosevka; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">
        <text x="749" y="369" fill="rgba(0, 0, 0, 1)" font-family="Iosevka" font-size="16px">Header size...</text>

$ xmllint --format /tmp/elf-file-header.pdf.svg | grep --color=always Iosevka | head

Cool bear's hot tip

If you find yourself transported to a time before JSON and YAML ruled them all and in the darkness bound them, and you need to pretty-print XML, there's a few options at your disposal.

Also, grep --color=always is handy if you need to pipe to another utility like less -R, head, tail, etc. By default, grep will detect that stdout is not a tty and disable color, but in that case, we really wanted to see colors (which you can't see above because it was just copied and pasted).
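For the curious, the "is stdout a tty?" decision is easy to reproduce. This is a sketch of the general technique, not grep's actual code: the `ColorMode` enum and both function names are made up for illustration, and `IsTerminal` comes from Rust's standard library (1.70 and later).

```rust
use std::io::IsTerminal;

/// The three values a `--color` style flag typically accepts.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ColorMode {
    Always,
    Never,
    Auto,
}

/// Decides whether to emit ANSI color codes.
/// `stdout_is_tty` is passed in so the decision can be tested without a terminal.
pub fn should_colorize(mode: ColorMode, stdout_is_tty: bool) -> bool {
    match mode {
        ColorMode::Always => true,
        ColorMode::Never => false,
        // In auto mode, only colorize when a human is likely watching.
        ColorMode::Auto => stdout_is_tty,
    }
}

/// Wraps `text` in bold red ANSI escape codes if color is enabled.
pub fn highlight(text: &str, mode: ColorMode) -> String {
    let is_tty = std::io::stdout().is_terminal();
    if should_colorize(mode, is_tty) {
        format!("\x1b[1;31m{text}\x1b[0m")
    } else {
        text.to_string()
    }
}
```

With `ColorMode::Always`, the escape codes survive being piped into `head` or `less -R`, which is exactly the behavior we asked grep for.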

So! Now we have a pretty chunky SVG, what do we do with it? It's great: always displays right, crisp at any scale, can be downloaded and printed anywhere. But it's big. Like, bigger than a PNG of the same size at a very high resolution, because as it turns out, converting a lot of text to paths isn't free.

Well, we can "optimize" that SVG, with a tool like svgo:

Shell session
$ svgo --input /tmp/elf-file-header.pdf.svg --output /tmp/elf-file-header.pdf.smol.svg

elf-file-header.pdf.svg:
Done in 478 ms!
537.335 KiB - 51.7% = 259.625 KiB

And just like that, we made it 50% smaller, and it displays the same:

Because all of these steps, from .drawio to .pdf, then .pdf to .svg, then .svg to smaller .svg, can be done from the command line (by invoking drawio, inkscape, and svgo respectively), all that can be automated with a single tool. And so that's exactly what I did!
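To give an idea of the shape of such a tool, here's a minimal sketch that chains the three commands shown earlier. It is not the actual implementation: `pipeline_steps`, `run_pipeline` and the output naming scheme are assumptions for illustration, and only the command-line flags are taken from the shell sessions above.

```rust
use std::process::Command;

/// One external tool invocation: program name plus arguments.
pub type Step = (&'static str, Vec<String>);

/// Converts borrowed arguments into owned ones.
fn owned(args: &[&str]) -> Vec<String> {
    args.iter().map(|a| a.to_string()).collect()
}

/// Builds the three steps for one diagram.
/// `stem` is the output path without extension, e.g. "/tmp/elf-file-header".
pub fn pipeline_steps(drawio_file: &str, stem: &str) -> Vec<Step> {
    let pdf = format!("{stem}.pdf");
    let svg = format!("{stem}.pdf.svg");
    let small_svg = format!("{stem}.pdf.smol.svg");

    vec![
        // .drawio -> .pdf
        ("drawio", owned(&[drawio_file, "--crop", "--export", "--format", "pdf", "--output", pdf.as_str()])),
        // .pdf -> .svg, with text converted to paths
        ("inkscape", owned(&[pdf.as_str(), "--pdf-poppler", "--export-plain-svg", "--export-text-to-path", "--export-filename", svg.as_str()])),
        // .svg -> smaller .svg
        ("svgo", owned(&["--input", svg.as_str(), "--output", small_svg.as_str()])),
    ]
}

/// Runs each step in order, stopping at the first failure.
pub fn run_pipeline(steps: &[Step]) -> std::io::Result<()> {
    for (program, args) in steps {
        let status = Command::new(program).args(args).status()?;
        if !status.success() {
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                format!("{program} exited with {status}"),
            ));
        }
    }
    Ok(())
}
```

Keeping the argument lists separate from the code that spawns processes makes the plan easy to inspect or log before anything runs.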

So that's it right? Story's over?

Well... not quite.

Containing the disaster

As I've confessed on my Patreon intro, I can't stay in one place. That also means I move across different computers, and different operating systems, all the time. And so I set up my local website copy from scratch more times than I'm comfortable admitting - every couple articles or so.

This may seem nightmarish to you, but you're wrong. This is good, actually, because it makes me care a lot about what that workflow looks like (and I also get to find out pretty early when something breaks. And it's usually something minor I can fix right away, instead of having a "frozen" setup work for 10 years, and then a LOT of work to bring everything up-to-date).

Anyone who's had the displeasure of meeting head-on with an ArchLinux install that hasn't been tended to in over a year will know that pain.

So for the "PNG to JPEG/AVIF/WEBP" pipeline, I need a couple tools installed: ImageMagick, avifenc, and cwebp.

And for the "draw.io to SVG" pipeline, I need the draw.io desktop app, Inkscape, and svgo.

That's a bunch of things to install. It's fine on Fedora for example, from which I am writing these lines and hoping they find you well, because it has packages for all of these. But Ubuntu only started shipping a package for avifenc in Ubuntu 21.04, whereas I find myself interacting with Ubuntu 20.04 (two versions back) quite a bit, because it's an LTS release (long-term support).

So on Ubuntu 20.04, I need to build avifenc myself. And while it's definitely not the worst experience I've had (it doesn't involve Perl, for example, and at least it's not python), it can take a while, if I don't happen to be on a beefy machine.

But there's another argument to be made. For diagrams, the .drawio file is the source of truth. For bitmaps, the .png file is the source of truth. But since I'm using my own tool (salvage) to process those files as I'm writing articles, and deploying my website is a simple git pull from the server (which watches for file modifications, rebuilds changed assets and atomically switches to a newer deploy: it's fancier than it seems at first glance), that means every generated asset gets committed to the repository, right next to its source of truth.

And... it adds up.

Shell session
$ z fast
$ du -h -d0
906M    .

Cool bear's hot tip

zoxide (shown here as z) is a smarter cd command for your terminal. It remembers which directories you cd to a lot, and lets you jump to them using a shorter substring. In this instance, this was equivalent to cd ~/bearcove/fasterthanli.me.

du is shipped with most Linux systems. Its -h option uses "human-readable" sizes, e.g. megabytes and gigabytes instead of bytes. One could also do -c to get a "total count", and pipe it to tail -1, but that's a waste! Setting the depth to zero (-d0) is faster and better.

So ideally, I'd only ever need to commit the source of truth to the repository: the .png and .drawio files. And my custom not-really-static site generator would magically know what to do with those and generate whichever assets are needed.

And because that's a lot of processing, maybe the processed assets are cached locally, or somewhere like an Amazon S3 bucket, since I'm already using AWS for a bunch of things.
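A cache like that needs a key that changes whenever the source changes. Here's a small sketch of one way to derive it, hashing the source bytes together with a pipeline version. Everything in it is hypothetical (`PIPELINE_VERSION`, `cache_key`, the key layout), and `DefaultHasher` is only used to keep the example dependency-free: its output isn't guaranteed to stay stable across Rust releases, so a real cache would want something like SHA-256 instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Bumped whenever the processing pipeline changes, so stale outputs are not reused.
/// (Hypothetical constant, introduced for this sketch.)
pub const PIPELINE_VERSION: u32 = 1;

/// Derives a cache key from the source file's bytes and the target format.
pub fn cache_key(source_bytes: &[u8], target_ext: &str) -> String {
    let mut hasher = DefaultHasher::new();
    PIPELINE_VERSION.hash(&mut hasher);
    source_bytes.hash(&mut hasher);
    target_ext.hash(&mut hasher);
    // 16 hex digits for the 64-bit hash, followed by the extension.
    format!("{:016x}.{}", hasher.finish(), target_ext)
}
```

The same `.drawio` bytes always map to the same key within a given build, so an asset only gets regenerated when its source (or the pipeline version) changes.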

But here's the wrinkle: because of all the command-line tools I'm using, my server, which once was a rather self-contained binary, to the point of bringing its own TLS implementation (the wonderful rustls):

Shell session
$ ldd $(which futile)
        linux-vdso.so.1 (0x00007ffd189f8000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb0a1ad7000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fb0a19fb000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fb0a17f1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb0a3046000)

...would start depending on six different command-line tools. Some of which are not, uh, not exactly command-line tools.

Inkscape is okay! It exists in the GLib/GTK cinematic universe, so it's not the lightest thing in the world, but by 2021 standards, it's okay. draw.io is another matter entirely: it not only runs on the desktop, it also runs in browsers, for example.

So it really does need Electron, short of maintaining a lot of different codebases, and chasing subtle implementation differences till the end of time.

And, well, one problem there is that it really wants an X server in order to function. Even to just "export as PDF".

That's because "exporting to PDF" is just printing to a virtual device. This makes sense if you know the lineage: PDF is based on PostScript, which is what computers tend to send to some printers (proprietary shenanigans aside).

So, when I got tired of setting up all those dependencies by hand, and thought about "simply shoving everything in a Docker container" (a hammer fit for a surprising number of nails), things were somewhat fine until I added draw.io: the package itself added 447MB to the image, then xvfb added another 142MB. That's over half the total size of the image (the 125MB are node.js for svgo, the 147MB are for inkscape):

Cool bear's hot tip

The tool shown above is dive and it's extremely useful to explore "why is my Docker image large".

The image size matters somewhat because I'm storing it in the cloud and downloading a gigabyte over the ocean, while not the worst thing ever, is not the best either. Oh and of course you need a Docker daemon running, so on Windows that means being locked into WSL2, and thus Hyper-V, which cripples VMware Workstation's performance for example.

Couldn't you have a Docker daemon running in some Linux VM and connect it from Windows?

Yes, yes, anything is possible, but I'm trying to /reduce/ the number of setup steps. I've done remote docker a bunch and it's all fun and games until you remember you can't use bind mounts.

Ah, fair.

So, maybe we should look at an alternative solution.

What did we learn?

Making diagrams with draw.io is fun, but unless we manually go through the "Export" flow every time, it's kinda hard to automate exporting to PDF, which we need to get SVG files that can be viewed and printed from any viewer without any fonts installed (and without requiring HTML support).

"Shoving everything inside a Docker container" is an option, but not a great one. In this case, because we can (nobody complain, you choose what you read!), we're going to try to find another solution.

Headless draw.io exports

Here's a fun thought I've had several times over the past year: "oh that seems like a lot of effort, I'm not sure I want to do that..." and immediately after "oh WAIT I've basically made my website from scratch already, this isn't significantly more involved than what I've done so far", and then I do it.

This is one of these times.

So! The first thing we need is to turn a .drawio file, which is a compressed version of an XML "mxGraph" tree of nodes, into an .svg file, which is... also XML, but something like Chromium can understand and render.

Cool bear's hot tip

Unfortunately, draw.io uses the foreignObject tag in SVG to achieve line wrapping, so we really do need a browser to render those SVGs, we can't use something like lyon.

But remember, we can't actually launch draw.io, because we don't want to depend on having Electron/a whole X server around, even a virtual framebuffer one.

So, what do we do? Well!

At first I thought we could try to use the mxGraph library directly to convert the .drawio to an .svg!

The original repository for mxGraph is archived, which sounds to me like the author wanted to narrow scope to just draw.io (which is more than fair, we're on our own past this point). Luckily, the drawio repository contains a copy of mxClient.js.

Upon further inspection, mxGraph doesn't really run outside of a browser environment (in, say, Node, or vanilla V8 from Rust) - it really expects some sort of navigator. That's not the dealbreaker it appears to be at first glance.

We were always going to need Chrome, to render the resulting SVG, because of the presence of foreignObject. So we might as well load part of draw.io's JavaScript code in Chrome, right?

I created the following index.html file:

HTML
<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <script src="js/export-init.js"></script>
  <!-- CSS for print output is needed for using current window -->
  <style type="text/css">
      span.MathJax_SVG svg { shape-rendering: crispEdges; }
      table.mxPageSelector { display: none; }
      hr.mxPageBreak { display: none; }
  </style>
  <link rel="stylesheet" href="mxgraph/css/common.css" charset="UTF-8" type="text/css">
  <link rel="stylesheet" href="export-fonts.css" charset="UTF-8" type="text/css">
  <script src="js/app.min.js"></script>
  <script src="js/export.js"></script>
  <script src="js/test.js"></script>
</head>
<body style="margin:0px;overflow:hidden">
</body>
</html>

With test.js set to:

JavaScript code
window.onload = async function main() {
  let xml = await (await fetch("/sample.drawio")).text();
  let data = {
    xml,
  };
  render(data);
};

And, using sfz as a quick HTTP server... voila!

And now, if we bring headless_chrome in the mix...

Rust code
// somewhere in salvage's source code

use camino::Utf8PathBuf;
use headless_chrome::{protocol::page::PrintToPdfOptions, Browser, LaunchOptionsBuilder};
use tracing::info;

pub fn do_stuff() -> Result<(), Box<dyn std::error::Error>> {
    let browser = Browser::new(LaunchOptionsBuilder::default().headless(true).build()?)?;

    let tab = browser.wait_for_initial_tab()?;

    info!("Navigating...");
    tab.navigate_to("http://localhost:5000/index.html")?;
    tab.wait_until_navigated()?;
    info!("Navigating... done!");

    let width = tab
        .evaluate("document.body.clientWidth", false)?
        .value
        .unwrap()
        .as_u64()
        .unwrap();
    let height = tab
        .evaluate("document.body.clientHeight", false)?
        .value
        .unwrap()
        .as_u64()
        .unwrap();
    info!(%width, %height, "Got dimensions");

    let pdf = tab.print_to_pdf(Some(PrintToPdfOptions {
        display_header_footer: Some(false),
        prefer_css_page_size: Some(false),
        landscape: None,
        print_background: None,
        scale: None,
        // Assuming 96 DPI (dots per inch)
        paper_width: Some(width as f32 / 96.0),
        paper_height: Some(height as f32 / 96.0),
        margin_top: Some(0.0),
        margin_bottom: Some(0.0),
        margin_left: Some(0.0),
        margin_right: Some(0.0),
        page_ranges: None,
        ignore_invalid_page_ranges: None,
        header_template: None,
        footer_template: None,
    }))?;
    let pdf_path = Utf8PathBuf::from("/tmp/export.pdf");
    info!(%pdf_path, "Writing pdf...");
    std::fs::write(&pdf_path, &pdf)?;
    info!(%pdf_path, "Writing pdf... done!");

    Ok(())
}

And just like that...

Shell session
$ cargo run --release -- /tmp
    Finished release [optimized] target(s) in 0.03s
     Running `target/release/salvage /tmp`
2021-11-18T19:02:44.426009Z  INFO headless_chrome::browser::process: Launching Chrome binary at "/usr/bin/google-chrome-stable"    
2021-11-18T19:02:44.426390Z  INFO headless_chrome::browser::process: Started Chrome. PID: 39323    
2021-11-18T19:02:44.579968Z  INFO salvage::chrome_stuff: Navigating...
2021-11-18T19:02:44.585193Z  INFO headless_chrome::browser::tab: Navigating a tab to http://localhost:5000/index.html    
2021-11-18T19:02:45.489303Z  INFO salvage::chrome_stuff: Navigating... done!
2021-11-18T19:02:45.501346Z  INFO salvage::chrome_stuff: Got dimensions width=994 height=643
2021-11-18T19:02:45.556953Z  INFO salvage::chrome_stuff: Writing pdf... pdf_path=/tmp/export.pdf
2021-11-18T19:02:45.557023Z  INFO salvage::chrome_stuff: Writing pdf... done! pdf_path=/tmp/export.pdf
2021-11-18T19:02:45.557041Z  INFO headless_chrome::browser: Dropping browser    
2021-11-18T19:02:45.557077Z  INFO headless_chrome::browser::process: Killing Chrome. PID: 39323    
2021-11-18T19:02:45.557173Z  INFO headless_chrome::browser::transport::web_socket_connection: Sending shutdown message to message handling loop    
2021-11-18T19:02:45.557226Z  INFO headless_chrome::browser::transport: Received shutdown message    
2021-11-18T19:02:45.557248Z  INFO headless_chrome::browser::transport: Shutting down message handling loop    
2021-11-18T19:02:45.557353Z  INFO headless_chrome::browser::tab: finished tab's event handling loop    
2021-11-18T19:02:45.557408Z  INFO headless_chrome::browser: Finished browser's event handling loop    
2021-11-18T19:02:45.557414Z  INFO headless_chrome::browser::transport: cleared listeners, I think   

We got a PDF!
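The only arithmetic in there is the paper size: Chrome's print-to-PDF call takes the paper width and height in inches, while the page reports its size in CSS pixels, and CSS defines one inch as 96 pixels. Pulled out into small helpers (the names are made up for this sketch), the conversion looks like this:

```rust
/// CSS defines one inch as exactly 96 pixels.
pub const CSS_PIXELS_PER_INCH: f32 = 96.0;

/// Converts a length in CSS pixels to inches.
pub fn px_to_inches(px: u64) -> f32 {
    px as f32 / CSS_PIXELS_PER_INCH
}

/// Converts page dimensions in CSS pixels to a (width, height) paper size in inches.
pub fn paper_size_inches(width_px: u64, height_px: u64) -> (f32, f32) {
    (px_to_inches(width_px), px_to_inches(height_px))
}
```

For the 994 by 643 pixel page from the log above, that works out to a sheet of roughly 10.35 by 6.70 inches, with zero margins so the diagram fills it completely.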

The resulting PDF is 107KB. That's pretty small... suspiciously small, even. Sure, there's Deflate compression involved (I checked: the file contains /Filter /FlateDecode), but still.

I doubt the PDF has actually converted text to paths. My guess is that it's only embedding the characters it needs from the Iosevka font. Let's check it out!

Shell session
$ qpdf --qdf --object-streams=disable export.pdf export.uncompressed.pdf
$ ls -lhA export*.pdf
-rw-r--r--. 1 amos amos 107K Nov 18 20:02 export.pdf
-rw-r--r--. 1 amos amos 398K Nov 18 20:12 export.uncompressed.pdf

Near the end of export.uncompressed.pdf, we can find this:

%% Original object ID: 11 0
14 0 obj
<<
  /Ascent 977
  /CapHeight 735
  /Descent -272
  /Flags 5
  /FontBBox [
    -776
    -505
    1276
    1188
  ]
  /FontFile2 15 0 R
  /FontName /IosevkaNerdFontCompleteM-
  /ItalicAngle 0
  /StemV 156
  /Type /FontDescriptor
>>
endobj

AhAH! A font descriptor! Then follows, with object ID 15 0, the actual font data. It's just, again, a deflate-compressed TTF file, which we can extract, and then inspect:

And as expected, it's missing some glyphs! Our diagram contains numbers from 0 to 8, but not 9, so it's not included. Similarly, we never have the uppercase letters G, J, K, Q, U, W, and Z, so they're left out. That's why the resulting .ttf file is only 54KB, instead of the original 1239KB.
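The idea behind subsetting can be approximated in a few lines: collect every character the document actually uses, and anything outside that set can be dropped from the font. This sketch works on characters rather than glyphs (a real subsetter deals with glyph IDs, ligatures and so on), and the function names and sample labels in the usage note are illustrative only.

```rust
use std::collections::BTreeSet;

/// Collects every distinct character used across the diagram's labels.
pub fn used_chars(labels: &[&str]) -> BTreeSet<char> {
    labels.iter().flat_map(|label| label.chars()).collect()
}

/// Returns the characters from `candidates` that never appear in `labels`,
/// in the order `candidates` yields them.
pub fn unused_chars(labels: &[&str], candidates: impl Iterator<Item = char>) -> Vec<char> {
    let used = used_chars(labels);
    candidates.filter(|c| !used.contains(c)).collect()
}
```

Feeding it a handful of labels such as "Entry point", "Flags" and "0x40" along with the range `'0'..='9'` reports every digit except 0 and 4 as unused, which is the same kind of answer we got from inspecting the embedded font.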

Okay, cool, but... is "headless chrome" truly headless? Does it work without Wayland / X?

Well, let's find out!

We'll make a Docker container with only salvage (which is the codebase I chose to mangle to try out headless_chrome) and see what happens.
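To be sure the test means something, it helps to confirm there really is no display server reachable from inside the container. A rough heuristic, and only that, is to check the conventional `DISPLAY` (X11) and `WAYLAND_DISPLAY` (Wayland) environment variables. The helper names below are made up for this sketch, and the lookup is passed in as a closure so the logic can be exercised without touching the real environment.

```rust
/// Returns true if either X11 or Wayland appears to be available,
/// judging only by the conventional environment variables.
pub fn display_server_available(lookup: impl Fn(&str) -> Option<String>) -> bool {
    ["DISPLAY", "WAYLAND_DISPLAY"]
        .iter()
        // A variable that is set but empty does not count.
        .any(|name| lookup(*name).map_or(false, |value| !value.is_empty()))
}

/// Convenience wrapper that reads the real process environment.
pub fn display_server_available_in_env() -> bool {
    display_server_available(|name| std::env::var(name).ok())
}
```

If `display_server_available_in_env()` returns `false` and the export still succeeds, that's decent evidence headless mode lives up to its name.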

But first, because I don't feel like forwarding ports or running sfz inside the Docker container, let's make sure salvage itself can serve the static files as HTTP, so that headless chrome can access them.

We'll use hyper-staticfile for that:

Rust code
use std::{convert::Infallible, net::SocketAddr};

use camino::Utf8PathBuf;
use headless_chrome::{protocol::page::PrintToPdfOptions, Browser, LaunchOptionsBuilder};
use hyper::{service::make_service_fn, Server};
use hyper_staticfile::Static;
use tokio::task::spawn_blocking;
use tracing::info;

pub async fn export_to_pdf() -> Result<(), Box<dyn std::error::Error>> {
    let addr = SocketAddr::from(([127, 0, 0, 1], 5000));
    let make_svc =
        make_service_fn(|_conn| async { Ok::<_, Infallible>(Static::new("mxgraph-tests")) });
    let server = Server::bind(&addr).serve(make_svc);
    let server_task = tokio::spawn(server);

    let chrome_task = spawn_blocking(|| orchestrate_chrome().unwrap());

    chrome_task.await?;
    server_task.abort();

    Ok(())
}

pub fn orchestrate_chrome() -> Result<(), Box<dyn std::error::Error>> {
    let browser = Browser::new(LaunchOptionsBuilder::default().headless(true).build()?)?;

    let tab = browser.wait_for_initial_tab()?;

    info!("Navigating...");
    tab.navigate_to("http://localhost:5000/index.html")?;
    tab.wait_until_navigated()?;
    info!("Navigating... done!");

    // ...then measure the page, print it to PDF, write the file,
    // and return Ok(()): same as before
}

Then, build a release binary of salvage and start a Docker container of Fedora 35 (the Linux distribution I happen to be working from today), with chromium-browser installed:

Shell session
$ cargo build --release
(omitted)
$ docker run -v ${PWD}:/workspace -w /workspace -it fedora:35 /bin/bash

And... it works!

Shell session
$ docker cp 083c61919ead:/tmp/export.pdf /tmp/export.docker.pdf
$ xdg-open /tmp/export.docker.pdf

What did we learn?

draw.io is built on top of web technologies: it runs in browsers. However, the PDF export functionality in particular is kinda tied to Chromium. In the web version, it uses a server to do the conversion.

However, using headless_chrome, we can control a local copy of Google Chrome or Chromium, and reproduce the same export flow the desktop version of draw.io has. But this time, we don't need to have a "display server" like X.org running.