I love diagrams. I love them so much! In fact, I have fairly poor visualization
skills, so making a diagram is extremely helpful to me: I'll have some vague
idea of how different things are connected, and then I'll make a diagram, and
suddenly there's a tangible thing I can look at and talk about.
Of course the diagram only represents a fraction of what I had in mind in the
first place, but that's okay: the point is to be able to talk about some
aspect of a concept, and so I have to make choices about what to include in the
diagram. And maybe make several diagrams.
So, over the past couple years, I've made a lot of diagrams. Here's one diagram
I'm particularly happy about: it describes the ELF file format, and it's from the
Making our own executable packer series.
I made that diagram using the draw.io desktop application, much like all the
other diagrams I've made. It's an Electron app, which, don't hate, and also it
runs everywhere and works great.
I've made a bunch of custom tooling for my website. For example, I save
screenshots as .png files, but then they get converted to .jpg, .webp and
.avif, and then the best of these is served to your browser. The .png is just
the source of truth.
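How "the best of these" gets picked isn't spelled out here, so the following is only a sketch of one plausible approach: look at the request's Accept header and hand out the most modern format the browser claims to support. The preference order and the `pick_format` function are assumptions made for illustration, not a description of the site's actual code.

```rust
/// Candidate formats, most preferred first.
/// The ordering (AVIF, then WebP, then JPEG) is an assumption for this sketch.
const PREFERENCE: [(&str, &str); 3] = [
    ("image/avif", "avif"),
    ("image/webp", "webp"),
    ("image/jpeg", "jpg"),
];

/// Pick a file extension from the value of an HTTP `Accept` header.
/// Falls back to JPEG when nothing more specific is advertised.
fn pick_format(accept: &str) -> &'static str {
    // Turn "image/avif,image/webp;q=0.8" into bare media types.
    let offered: Vec<&str> = accept
        .split(',')
        .map(|part| part.split(';').next().unwrap_or("").trim())
        .collect();

    for (mime, ext) in PREFERENCE {
        if offered.iter().any(|o| o.eq_ignore_ascii_case(mime)) {
            return ext;
        }
    }
    "jpg"
}

fn main() {
    println!("{}", pick_format("image/avif,image/webp,*/*;q=0.8"));
    println!("{}", pick_format("image/webp,*/*"));
    println!("{}", pick_format("*/*"));
}
```

The same decision could just as well be left to the browser with a picture element and several sources; the point is only that the .png never has to be the thing that gets served.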
Originally, I took screenshots of draw.io — my early articles showed the grid
and everything. They were also "raster" (or "bitmaps"), i.e. a finite number of
pixels, which looked okay on the machine I used to write articles, but not on
some fancy HiDPI screen.
A Lenovo X200, the laptop I started writing articles on.
Those screenshots were also not dark mode friendly (whereas the diagrams I do
now are - well, they just color flip, but at least you don't get an unholy
dark/light mix), and they were fairly hard to maintain: I usually had one large
.drawio file and did separate "rectangular selection" screenshots. Updating
diagrams was a lot of work.
So, pretty soon I wanted the same workflow I had with screenshots: to be able to
just save a .drawio file somewhere, and have whatever needs to happen happen,
so that I could get best-of-class vector graphics in the browser.
I wasn't really prepared for what came next.
When I complained about converting .drawio files to something else, someone
online said "well buddy that's what you get for using proprietary formats".
First off.... hello. Second, is it actually proprietary?
Let's find out!
Mhh. That looks like XML... but with some binary payload in there? What's up with that?
Ah right, it's compressed. Let's uncheck that and look at the result:
Alright! It is just XML. But it's not, say, SVG. Can we get draw.io to export as
SVG? Yes we can! There's an "export as SVG" option in there.
We can also do it from the comfort of the command line:
Shell session
$ drawio elf64-file-header.drawio --crop --export --format svg --output /tmp/elf-file-header.svg
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
elf64-file-header.drawio -> /tmp/elf-file-header.svg
Ignore the libva error - just another day on the Linux desktop.
If we open that SVG in, say, Google Chrome, it looks fine:
But if we open it in, say, GNOME's default image viewer... it doesn't!
Mhh, I'm sure draw.io has other export options... how about PDF?
Shell session
$ drawio elf64-file-header.drawio --crop --export --format pdf --output /tmp/elf-file-header.pdf
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
elf64-file-header.drawio -> /tmp/elf-file-header.pdf
Looks fine in Chrome:
And it looks fine in GNOME's default document viewer!
Great!
But amos... you can't embed PDFs in web pages!
I know that! But a) the layout is right, and b) it's still vector graphics. I'm
sure there's a way to get from PDF back to SVG...
But you don't even know why draw.io's SVG export "didn't work right" in the
first place.
That's true. We don't even know that it's "wrong" per se, just that it looked
weird in Eye of Gnome. But here's something you may not have considered, bear:
I'd like my diagrams to look the same everywhere.
So? What would make them look different?
Well, tons of things! But most importantly, fonts. I use the wonderful
Iosevka font for all my diagrams. What do you
think would happen if people opened the SVG on a computer that didn't have that
font installed?
Well... I'm fairly sure PDF has a way to embed fonts... as for SVG, it'd probably
show the wrong font. Unless you have it set up as a web font?
Correct on all counts! Except I want people to be able to download diagrams and
share them, maybe print them.
Mhh... so the text can't be an SVG text element that says "use this font"? It has
to be actual paths?
That's right! Luckily, there's a tool that lets us do both "go from PDF to SVG"
and "convert text to paths", and that tool is Inkscape.
We can do that from the command line:
Shell session
$ inkscape /tmp/elf-file-header.pdf --pdf-poppler --export-plain-svg --export-text-to-path --export-filename /tmp/elf-file-header.pdf.svg
$ ls -lhA /tmp/elf-fil*
-rw-r--r--. 1 amos amos 110K Nov 17 11:39 /tmp/elf-file-header.pdf
-rw-r--r--. 1 amos amos 538K Nov 17 11:47 /tmp/elf-file-header.pdf.svg
-rw-r--r--. 1 amos amos 65K Nov 17 11:35 /tmp/elf-file-header.svg
Whoa, that's much larger than draw.io's original svg output!
It is! But it looks great in Chrome...
And in Eye of Gnome:
And where the original SVG referred to the Iosevka font, that one doesn't:
Shell session
$ xmllint --format /tmp/elf-file-header.svg | grep --color=always Iosevka | head
<div style="display: inline-block; font-size: 16px; font-family: Iosevka; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">
<text x="479" y="129" fill="rgba(0, 0, 0, 1)" font-family="Iosevka" font-size="16px">Entry poin...</text>
<div style="display: inline-block; font-size: 16px; font-family: Iosevka; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">
<text x="569" y="129" fill="rgba(0, 0, 0, 1)" font-family="Iosevka" font-size="16px">Table offs...</text>
<div style="display: inline-block; font-size: 16px; font-family: Iosevka; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">
<text x="659" y="129" fill="rgba(0, 0, 0, 1)" font-family="Iosevka" font-size="16px">Table offs...</text>
<div style="display: inline-block; font-size: 16px; font-family: Iosevka; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">
<text x="749" y="129" fill="rgba(0, 0, 0, 1)" font-family="Iosevka" font-size="16px">Flags...</text>
<div style="display: inline-block; font-size: 16px; font-family: Iosevka; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">
<text x="749" y="369" fill="rgba(0, 0, 0, 1)" font-family="Iosevka" font-size="16px">Header size...</text>
$ xmllint --format /tmp/elf-file-header.pdf.svg | grep --color=always Iosevka | head
If you find yourself transported to a time before JSON and YAML ruled them all
and in the darkness bound them, and you need to pretty-print XML, there are a
few options at your disposal; xmllint --format, used above, is one of them.
Also, grep --color=always is handy if you need to pipe to another utility like
less -R, head, tail, etc. By default, grep will detect that stdout is not a tty
and disable color, but in that case, we really wanted to see colors (which you
can't see above because it was just copied and pasted).
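The "is stdout a tty?" check is easy to reproduce in your own tools. Here's a minimal sketch of the never/always/auto logic using the standard library's IsTerminal trait; the `ColorMode` enum and `use_color` function are names invented for this example, not anything grep exposes.

```rust
use std::io::{self, IsTerminal};

/// Mirrors the three values a `--color` style flag usually accepts.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColorMode {
    Auto,
    Always,
    Never,
}

/// Decide whether to emit ANSI color codes.
/// `stdout_is_tty` is passed in so the decision stays easy to test.
fn use_color(mode: ColorMode, stdout_is_tty: bool) -> bool {
    match mode {
        ColorMode::Always => true,
        ColorMode::Never => false,
        // In auto mode, only color when a human is (probably) watching.
        ColorMode::Auto => stdout_is_tty,
    }
}

fn main() {
    let is_tty = io::stdout().is_terminal();
    for mode in [ColorMode::Auto, ColorMode::Always, ColorMode::Never] {
        println!("{:?}: color = {}", mode, use_color(mode, is_tty));
    }
}
```

Run it directly and then again piped into head: the Auto line flips, the other two don't, which is exactly why --color=always is needed in a pipeline.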
So! Now we have a pretty chunky SVG, what do we do with it? It's great: always
displays right, crisp at any scale, can be downloaded and printed anywhere. But
it's big. Like, bigger than a PNG of the same size at a very high resolution,
because as it turns out, converting a lot of text to paths isn't free.
Well, we can "optimize" that SVG, with a tool like svgo:
Shell session
$ svgo --input /tmp/elf-file-header.pdf.svg --output /tmp/elf-file-header.pdf.smol.svg
elf-file-header.pdf.svg:
Done in 478 ms!
537.335 KiB - 51.7% = 259.625 KiB
And just like that, we made it 50% smaller, and it displays the same:
Because all of these steps, from .drawio to .pdf, then .pdf to .svg, then .svg
to smaller .svg, can be done from the command line (by invoking drawio,
inkscape, and svgo respectively), all that can be automated with a single tool.
And so that's exactly what I did!
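To give an idea of what "a single tool" means here, this is a rough sketch that chains the three commands shown earlier with std::process::Command. The flags are copied from the shell sessions above; the `Step` struct, `pipeline` and `run` are hypothetical names for this example and not the actual code of my tool.

```rust
use std::io;
use std::process::Command;

/// One external tool invocation: program name plus its arguments.
struct Step {
    program: &'static str,
    args: Vec<String>,
}

/// Build the .drawio -> .pdf -> .svg -> smaller .svg commands.
/// `out_stem` is an output path without extension, e.g. "/tmp/elf-file-header".
fn pipeline(drawio_path: &str, out_stem: &str) -> Vec<Step> {
    let pdf = format!("{out_stem}.pdf");
    let svg = format!("{out_stem}.pdf.svg");
    let small = format!("{out_stem}.pdf.smol.svg");
    let s = |v: &[&str]| v.iter().map(|a| a.to_string()).collect::<Vec<_>>();

    vec![
        Step {
            program: "drawio",
            args: s(&[drawio_path, "--crop", "--export", "--format", "pdf", "--output", &pdf]),
        },
        Step {
            program: "inkscape",
            args: s(&[
                &pdf,
                "--pdf-poppler",
                "--export-plain-svg",
                "--export-text-to-path",
                "--export-filename",
                &svg,
            ]),
        },
        Step {
            program: "svgo",
            args: s(&["--input", &svg, "--output", &small]),
        },
    ]
}

/// Run each step in order, stopping at the first failure.
fn run(steps: &[Step]) -> io::Result<()> {
    for step in steps {
        let status = Command::new(step.program).args(&step.args).status()?;
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("{} exited with {}", step.program, status),
            ));
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let steps = pipeline("elf64-file-header.drawio", "/tmp/elf-file-header");
    for step in &steps {
        println!("{} {}", step.program, step.args.join(" "));
    }
    // Only execute the tools when asked to, so the sketch is safe to try.
    if std::env::args().any(|arg| arg == "--run") {
        run(&steps)?;
    }
    Ok(())
}
```

Without arguments this only prints the three command lines; passing --run executes them, which of course requires drawio, inkscape and svgo to be installed.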
So that's it right? Story's over?
Well... not quite.
As I've confessed on my Patreon intro, I
can't stay in one place. That also means I move across different computers, and
different operating systems, all the time. And so I set up my local website copy
from scratch more times than I'm comfortable admitting - every couple articles
or so.
This may seem nightmarish to you, but you're wrong. This is good, actually,
because it makes me care a lot about what that workflow looks like (and I also
get to find out pretty early when something breaks. And it's usually something
minor I can fix right away, instead of having a "frozen" setup work for 10 years,
and then a LOT of work to bring everything up-to-date).
Anyone who's had the displeasure of meeting head-on with an Arch Linux install
that hasn't been tended to in over a year will know that pain.
So for the "PNG to JPEG/AVIF/WEBP" pipeline, I need a couple tools installed:
ImageMagick, avifenc, and cwebp.
And for the "draw.io to SVG" pipeline, I need the draw.io desktop app,
Inkscape, and svgo.
That's a bunch of things to install. It's fine on Fedora for example, from
which I am writing these lines and hoping they find you well, because it has
packages for all of these. But Ubuntu only started shipping a package for
avifenc in Ubuntu 21.04, whereas I find myself interacting with Ubuntu 20.04
(two versions back) quite a bit, because it's an LTS release (long-term
support).
So on Ubuntu 20.04, I need to build avifenc myself. And while it's definitely
not the worst experience I've had (it doesn't involve Perl, for example, and
at least it's not Python), it can take a while, if I don't happen to be on a
beefy machine.
But there's another argument to be made. For diagrams, the .drawio file is the
source of truth. For bitmaps, the .png file is the source of truth. But since
I'm using my own tool (salvage) to process those files as I'm writing
articles, and deploying my website is a simple git pull from the server (which
watches for file modifications, rebuilds changed assets and atomically switches
to a newer deploy: it's fancier than it seems at first glance), that means that:
for every image, I'm committing 4 files (png, jpeg, webp, avif)
for every diagram, I'm committing 2 files (drawio, svg)
And... it adds up.
Shell session
$ z fast
$ du -h -d0
906M .
zoxide (shown here as z) is a smarter cd command for your terminal. It
remembers which directories you cd to a lot, and lets you jump to them using a
shorter substring. In this instance, this was equivalent to
cd ~/bearcove/fasterthanli.me.
du is shipped with most Linux systems. Its -h option uses "human-readable"
sizes, e.g. megabytes and gigabytes instead of bytes. One could also do -c to
get a "total count", and pipe it to tail -1, but that's a waste! Setting the
depth to zero (-d0) is faster and better.
So ideally, I'd only ever need to commit the source of truth to the repository:
the .png and .drawio files. And my custom not-really-static site generator
would magically know what to do with those and generate whichever assets are
needed.
And because that's a lot of processing, maybe the processed assets are cached
locally, or somewhere like an Amazon S3 bucket, since I'm already using AWS for
a bunch of things.
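One way such a cache could be keyed (a sketch, not a design I've committed to) is by hashing the bytes of the source file: if the .png or .drawio hasn't changed, the key is the same and the processed asset can be fetched instead of regenerated. The `cache_key` function is made up for this example, and FNV-1a is used only to keep it dependency-free; a real setup would more likely reach for a cryptographic hash such as SHA-256.

```rust
/// 64-bit FNV-1a hash: tiny and dependency-free.
/// Chosen only for this sketch; it is not collision-resistant.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Name under which a processed asset would be stored, locally or in a bucket.
/// Same source bytes and same target format always give the same key.
fn cache_key(source: &[u8], target_ext: &str) -> String {
    format!("{:016x}.{}", fnv1a64(source), target_ext)
}

fn main() {
    // Stand-in bytes; a real tool would read the .drawio or .png from disk.
    let source = b"<mxfile>...</mxfile>";
    println!("{}", cache_key(source, "svg"));
    println!("{}", cache_key(source, "pdf"));
}
```

With keys like these, the repository only needs the sources of truth, and any machine (or the server) can check the cache before firing up the heavy tools.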
But here's the wrinkle: because of all the command-line tools I'm using, my
server, which once was a rather self-contained binary, to the point of bringing
its own TLS implementation (the wonderful rustls):
Shell session
$ ldd $(which futile)
linux-vdso.so.1 (0x00007ffd189f8000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb0a1ad7000)
libm.so.6 => /lib64/libm.so.6 (0x00007fb0a19fb000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb0a17f1000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb0a3046000)
...would start depending on six different command-line tools. Some of which
are, uh, not exactly command-line tools.
Inkscape is okay! It exists in the GLib/GTK cinematic universe, so it's not the
lightest thing in the world, but by 2021 standards, it's okay. draw.io is
another matter entirely: it not only runs on the desktop, it also runs in
browsers, for example.
So it really does need Electron, short of maintaining a lot of different
codebases, and chasing subtle implementation differences till the end of time.
And, well, one problem there is that it really wants an X server in order to
function. Even to just "export as PDF".
That's because "exporting to PDF" is just printing to a virtual device. This
makes sense if you know the lineage: PDF is based on PostScript, which is what
computers tend to send to some printers (proprietary shenanigans aside).
So, when I got tired of setting up all those dependencies by hand, and thought
about "simply shoving everything in a Docker container" (a hammer fit for a
surprising number of nails), things were somewhat fine until I added draw.io:
the package itself added 447MB to the image, then xvfb added another 142MB.
That's over half the total size of the image (of the rest, 125MB are Node.js
for svgo and 147MB are for Inkscape):
The tool shown above is dive and it's extremely useful
to explore "why is my Docker image large".
The image size matters somewhat because I'm storing it in the cloud and
downloading a gigabyte over the ocean, while not the worst thing ever, is
not the best either. Oh and of course you need a Docker daemon running, so
on Windows that means being locked into WSL2, and thus Hyper-V, which cripples
VMware Workstation's performance for example.
Couldn't you have a Docker daemon running in some Linux VM and connect to it
from Windows?
Yes, yes, anything is possible, but I'm trying to /reduce/ the number of setup
steps. I've done remote docker a bunch and it's all fun and games until you
remember you can't use bind mounts.
So, maybe we should look at an alternative solution.
Making diagrams with draw.io is fun, but unless we
manually go through the "Export" flow every time, it's kinda hard to automate
exporting to PDF, which we need to get SVG files that can be viewed and printed
from any viewer without any fonts installed (and without requiring HTML
support).
"Shoving everything inside a Docker container" is an option, but not a great
one. In this case, because we can (nobody complain, you choose what you read!),
we're going to try to find another solution.
Here's a fun thought I've had several times over the past year: "oh that seems
like a lot of effort, I'm not sure I want to do that..." and immediately after
"oh WAIT I've basically made my website from scratch already, this isn't
significantly more involved than what I've done so far", and then I do it.
This is one of these times.
So! The first thing we need is to turn a .drawio file, which is a compressed
version of an XML "mxGraph" tree of nodes, into an .svg file, which is... also
XML, but something like Chromium can understand and render.
Unfortunately, draw.io uses the foreignObject tag in SVG to achieve line
wrapping, so we really do need a browser to render those SVGs; we can't use
something like lyon.
But remember, we can't actually launch draw.io, because we don't want to depend
on having Electron/a whole X server around, even a virtual framebuffer one.
So, what do we do? Well!
At first I thought we could try to use the mxGraph library directly to convert
the .drawio to an .svg!
The original repository for mxGraph is archived, which sounds to me like the
author wanted to narrow scope to just draw.io (which is more than fair, we're
on our own past this point). Luckily, the drawio repository contains a copy of
mxClient.js.
Upon further inspection, mxGraph doesn't really run outside of a browser
environment (in, say, Node, or vanilla V8 from Rust) - it really expects some
sort of navigator. That's not the dealbreaker it appears to be at first.
We were always going to need Chrome, to render the resulting SVG, because of
the presence of foreignObject. So we might as well load part of draw.io's
JavaScript code in Chrome, right?
I created the following index.html file:
HTML
<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <script src="js/export-init.js"></script>
    <!-- CSS for print output is needed for using current window -->
    <style type="text/css">
        span.MathJax_SVG svg { shape-rendering: crispEdges; }
        table.mxPageSelector { display: none; }
        hr.mxPageBreak { display: none; }
    </style>
    <link rel="stylesheet" href="mxgraph/css/common.css" charset="UTF-8" type="text/css">
    <link rel="stylesheet" href="export-fonts.css" charset="UTF-8" type="text/css">
    <script src="js/app.min.js"></script>
    <script src="js/export.js"></script>
    <script src="js/test.js"></script>
</head>
<body style="margin:0px;overflow:hidden">
</body>
</html>
With test.js set to:
JavaScript code
window.onload = async function main() {
  let xml = await (await fetch("/sample.drawio")).text();
  let data = {
    xml,
  };
  render(data);
};
And, using sfz as a quick HTTP server... voila!
And now, if we bring headless_chrome into the mix...
Rust code
// somewhere in salvage's source code

use camino::Utf8PathBuf;
use headless_chrome::{protocol::page::PrintToPdfOptions, Browser, LaunchOptionsBuilder};
use tracing::info;

pub fn do_stuff() -> Result<(), Box<dyn std::error::Error>> {
    let browser = Browser::new(LaunchOptionsBuilder::default().headless(true).build()?)?;
    let tab = browser.wait_for_initial_tab()?;

    info!("Navigating...");
    tab.navigate_to("http://localhost:5000/index.html")?;
    tab.wait_until_navigated()?;
    info!("Navigating... done!");

    // Ask the page how large the rendered diagram is, in CSS pixels
    let width = tab
        .evaluate("document.body.clientWidth", false)?
        .value
        .unwrap()
        .as_u64()
        .unwrap();
    let height = tab
        .evaluate("document.body.clientHeight", false)?
        .value
        .unwrap()
        .as_u64()
        .unwrap();
    info!(%width, %height, "Got dimensions");

    let pdf = tab.print_to_pdf(Some(PrintToPdfOptions {
        display_header_footer: Some(false),
        prefer_css_page_size: Some(false),
        landscape: None,
        print_background: None,
        scale: None,
        // Assuming 96 DPI (dots per inch)
        paper_width: Some(width as f32 / 96.0),
        paper_height: Some(height as f32 / 96.0),
        margin_top: Some(0.0),
        margin_bottom: Some(0.0),
        margin_left: Some(0.0),
        margin_right: Some(0.0),
        page_ranges: None,
        ignore_invalid_page_ranges: None,
        header_template: None,
        footer_template: None,
    }))?;

    let pdf_path = Utf8PathBuf::from("/tmp/export.pdf");
    info!(%pdf_path, "Writing pdf...");
    std::fs::write(&pdf_path, &pdf)?;
    info!(%pdf_path, "Writing pdf... done!");

    Ok(())
}
And just like that...
Shell session
$ cargo run --release -- /tmp
Finished release [optimized] target(s) in 0.03s
Running `target/release/salvage /tmp`
2021-11-18T19:02:44.426009Z INFO headless_chrome::browser::process: Launching Chrome binary at "/usr/bin/google-chrome-stable"
2021-11-18T19:02:44.426390Z INFO headless_chrome::browser::process: Started Chrome. PID: 39323
2021-11-18T19:02:44.579968Z INFO salvage::chrome_stuff: Navigating...
2021-11-18T19:02:44.585193Z INFO headless_chrome::browser::tab: Navigating a tab to http://localhost:5000/index.html
2021-11-18T19:02:45.489303Z INFO salvage::chrome_stuff: Navigating... done!
2021-11-18T19:02:45.501346Z INFO salvage::chrome_stuff: Got dimensions width=994 height=643
2021-11-18T19:02:45.556953Z INFO salvage::chrome_stuff: Writing pdf... pdf_path=/tmp/export.pdf
2021-11-18T19:02:45.557023Z INFO salvage::chrome_stuff: Writing pdf... done! pdf_path=/tmp/export.pdf
2021-11-18T19:02:45.557041Z INFO headless_chrome::browser: Dropping browser
2021-11-18T19:02:45.557077Z INFO headless_chrome::browser::process: Killing Chrome. PID: 39323
2021-11-18T19:02:45.557173Z INFO headless_chrome::browser::transport::web_socket_connection: Sending shutdown message to message handling loop
2021-11-18T19:02:45.557226Z INFO headless_chrome::browser::transport: Received shutdown message
2021-11-18T19:02:45.557248Z INFO headless_chrome::browser::transport: Shutting down message handling loop
2021-11-18T19:02:45.557353Z INFO headless_chrome::browser::tab: finished tab's event handling loop
2021-11-18T19:02:45.557408Z INFO headless_chrome::browser: Finished browser's event handling loop
2021-11-18T19:02:45.557414Z INFO headless_chrome::browser::transport: cleared listeners, I think
We got a PDF!
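A quick sanity check on the paper size: print_to_pdf wants inches, the page reports CSS pixels, and the code above assumes 96 pixels per inch. Here's that conversion on its own, using the dimensions from the log; `px_to_inches` is just a helper name for this sketch.

```rust
/// CSS pixels per inch, matching the "Assuming 96 DPI" comment in the code above.
const CSS_PIXELS_PER_INCH: f32 = 96.0;

/// Convert a length in CSS pixels to inches.
fn px_to_inches(px: u64) -> f32 {
    px as f32 / CSS_PIXELS_PER_INCH
}

fn main() {
    // Dimensions taken from the "Got dimensions" log line above.
    let (width, height) = (994u64, 643u64);
    println!(
        "{}x{} px -> {:.3} x {:.3} in",
        width,
        height,
        px_to_inches(width),
        px_to_inches(height)
    );
}
```

So the "sheet of paper" ends up a bit over ten inches wide and a little under seven inches tall, with zero margins: exactly the size of the diagram.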
The resulting PDF is 107KB. That's pretty small... suspiciously small, even.
Sure, there's Deflate compression involved (I checked, the file contains
/Filter /FlateDecode).
I doubt the PDF has actually converted text to paths. My guess is that it's only
embedding the characters it needs from the Iosevka font. Let's check it out!
Shell session
$ qpdf --qdf --object-streams=disable export.pdf export.uncompressed.pdf
$ ls -lhA export*.pdf
-rw-r--r--. 1 amos amos 107K Nov 18 20:02 export.pdf
-rw-r--r--. 1 amos amos 398K Nov 18 20:12 export.uncompressed.pdf
Near the end of export.uncompressed.pdf, we can find this:
%% Original object ID: 11 0
14 0 obj
<<
/Ascent 977
/CapHeight 735
/Descent -272
/Flags 5
/FontBBox [
-776
-505
1276
1188
]
/FontFile2 15 0 R
/FontName /IosevkaNerdFontCompleteM-
/ItalicAngle 0
/StemV 156
/Type /FontDescriptor
>>
endobj
AhAH! A font descriptor! Then follows, with object ID 15 0, the actual font
data. It's just, again, a deflate-compressed TTF file, which we can extract,
and then inspect:
And as expected, it's missing some glyphs! Our diagram contains numbers from 0
to 8, but not 9, so it's not included. Similarly, we never have the uppercase
letters G, J, K, Q, U, W, and Z, so they're left out. That's why the resulting
.ttf file is only 54KB, instead of the original 1239KB.
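The idea behind that kind of subsetting can be sketched in a few lines: collect every distinct character that appears in the document's text, and anything outside that set doesn't need a glyph. This is only an illustration of the principle, not how Chrome implements it, and the labels below are made up rather than taken from the real diagram.

```rust
use std::collections::BTreeSet;

/// Collect the distinct non-whitespace characters used across all labels.
fn used_chars(labels: &[&str]) -> BTreeSet<char> {
    labels
        .iter()
        .flat_map(|label| label.chars())
        .filter(|c| !c.is_whitespace())
        .collect()
}

/// List the candidate characters that never appear, i.e. glyphs a subset could drop.
fn missing_from(used: &BTreeSet<char>, candidates: impl Iterator<Item = char>) -> Vec<char> {
    candidates.filter(|c| !used.contains(c)).collect()
}

fn main() {
    // Hypothetical labels, standing in for the text of a diagram.
    let labels = ["Entry point", "Table offset", "Flags", "Header size", "0x18", "64 bytes"];
    let used = used_chars(&labels);

    println!("{} distinct characters used", used.len());
    println!("unused digits: {:?}", missing_from(&used, '0'..='9'));
    println!("unused uppercase: {:?}", missing_from(&used, 'A'..='Z'));
}
```

A font file that only carries outlines for the characters in that set can be a small fraction of the full font, which is consistent with the 54KB versus 1239KB we just saw.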
Okay, cool, but... is "headless chrome" truly headless? Does it work without
Wayland / X?
Well, let's find out!
We'll make a Docker container with only salvage (which is the codebase I chose
to mangle to try out headless_chrome) and see what happens.
But first, because I don't feel like forwarding ports or running sfz inside the
Docker container, let's make sure salvage itself can serve the static files as
HTTP, so that headless chrome can access them.
We'll use hyper-staticfile for that:
Rust code
use std::{convert::Infallible, net::SocketAddr};

use camino::Utf8PathBuf;
use headless_chrome::{protocol::page::PrintToPdfOptions, Browser, LaunchOptionsBuilder};
use hyper::{service::make_service_fn, Server};
use hyper_staticfile::Static;
use tokio::task::spawn_blocking;
use tracing::info;

pub async fn export_to_pdf() -> Result<(), Box<dyn std::error::Error>> {
    // Serve the static files ourselves so headless Chrome can fetch them
    let addr = SocketAddr::from(([127, 0, 0, 1], 5000));
    let make_svc =
        make_service_fn(|_conn| async { Ok::<_, Infallible>(Static::new("mxgraph-tests")) });
    let server = Server::bind(&addr).serve(make_svc);
    let server_task = tokio::spawn(server);

    // headless_chrome is a blocking API, so keep it off the async executor
    let chrome_task = spawn_blocking(|| orchestrate_chrome().unwrap());
    chrome_task.await?;
    server_task.abort();

    Ok(())
}

pub fn orchestrate_chrome() -> Result<(), Box<dyn std::error::Error>> {
    let browser = Browser::new(LaunchOptionsBuilder::default().headless(true).build()?)?;
    let tab = browser.wait_for_initial_tab()?;

    info!("Navigating...");
    tab.navigate_to("http://localhost:5000/index.html")?;
    tab.wait_until_navigated()?;
    info!("Navigating... done!");

    // same as before
}
Then, build a release binary of salvage and start a Docker container of Fedora
35 (the Linux distribution I happen to be working from today), with
chromium-browser installed:
Shell session
$ cargo build --release
(omitted)
$ docker run -v ${PWD}:/workspace -w /workspace -it fedora:35 /bin/bash
And... it works!
Shell session
$ docker cp 083c61919ead:/tmp/export.pdf /tmp/export.docker.pdf
$ xdg-open /tmp/export.docker.pdf
draw.io is built on top of web technologies: it runs in browsers. However, the
PDF export functionality in particular is kinda tied to Chromium. In the web
version, it uses a server to do the conversion.
However, using headless_chrome, we can control a local copy of Google Chrome
or Chromium, and reproduce the same export flow the desktop version of draw.io
has. But this time, we don't need to have a "display server" like X.org running.