The rest of the fucking owl

This article is part of the Don't shell out! series.

NO! No no no.



Well... yes! But also no. We still shell out to a bunch of tools:

Shell session
$ rg 'Command::new'
src/commands/
126:        let variant = if let Ok(output) = run_command(Command::new("wslpath").arg("-m").arg("/")) {

src/commands/
29:            Command::new("cavif")

src/commands/
25:            Command::new(&self.bin)

src/commands/
25:            Command::new("cwebp")

src/commands/
25:            Command::new("svgo")
Cool bear's hot tip

rg is ripgrep: think grep, but wicked fast and respecting your .gitignore (and other ignore files) by default.

The good news? There's crates for all of that.

Let's start simple.

Optimizing SVG

svgo is great actually, but it's JavaScript. Which means the prerequisites are a decently modern node.js and npm install -g svgo.

The svgcleaner crate is a pure Rust alternative that gives decent results:

Shell session
$ svgo /tmp/export.svg -o /tmp/export.svgo.svg

export.svg:
Done in 139 ms!
90.658 KiB - 54.5% = 41.271 KiB

$ svgcleaner /tmp/export.svg /tmp/export.svgcleaner.svg
Your image is 39.70% smaller now.

$ ls -lhA /tmp/export.*
-rw-r--r--. 1 amos amos 3.0K Dec 31 18:50 /tmp/export.pdf
-rw-r--r--. 1 amos amos  91K Dec 31 20:38 /tmp/export.svg
-rw-r--r--. 1 amos amos  55K Dec 31 20:39 /tmp/export.svgcleaner.svg
-rw-r--r--. 1 amos amos  42K Dec 31 20:39 /tmp/export.svgo.svg

But svgcleaner has knobs! We can match svgo's output size if we're willing to compromise on coordinate precisions:

Shell session
$ svgcleaner --paths-coordinates-precision 3 --transforms-precision 3 /tmp/export.svg /tmp/export.svgcleaner-imprecise.svg
Your image is 53.12% smaller now.

$ ls -lhA /tmp/export.{svgo,svgcleaner-imprecise}.svg
-rw-r--r--. 1 amos amos 43K Dec 31 20:42 /tmp/export.svgcleaner-imprecise.svg
-rw-r--r--. 1 amos amos 42K Dec 31 20:39 /tmp/export.svgo.svg

And I don't know about you, but I sure cannot tell the difference:

svgcleaner is a bin+lib package, so we can just.. bring it in!

Shell session
$ cargo add svgcleaner
    Updating '' index
      Adding svgcleaner v0.9.5 to dependencies
Rust code
// in `salvage/src/commands/`

use color_eyre::eyre;
use std::{fmt::Display, path::Path};
use svgcleaner::{
    cleaner::{clean_doc, parse_data, write_buffer},
    CleaningOptions,
};

#[derive(Clone)]
pub struct SvgCleaner {}

impl SvgCleaner {
    pub fn new() -> Self {
        Self {}
    }

    pub fn optimize_svg(&self, input: &Path, output: &Path) -> Result<(), eyre::Error> {
        let data = std::fs::read_to_string(input)?;

        let parse_opts = Default::default();
        let mut doc = parse_data(&data, &parse_opts).map_err(fmt_err)?;

        let clean_opts = CleaningOptions {
            paths_coordinates_precision: 3,
            transforms_precision: 3,
            ..Default::default()
        };
        let write_opts = Default::default();
        clean_doc(&mut doc, &clean_opts, &write_opts).map_err(fmt_err)?;

        let mut buf = vec![];
        write_buffer(&doc, &write_opts, &mut buf);
        std::fs::write(output, buf)?;

        Ok(())
    }
}

fn fmt_err<E: Display>(e: E) -> eyre::Report {
    color_eyre::eyre::eyre!("{}", e)
}

Wait, files? What, why?

So it's a drop-in replacement for the svgo thingy. salvage was really only meant to be a glorified command runner, think make but with hardcoded rules and knowledge of WSL2 and stuff.

I definitely want to change the interfaces here: there's no reason why svgcleaner should read its input from disk, but that way it's trivial to change this:

Rust code
#[tracing::instrument]
fn process_drawio(&self, env: &Environment) -> Result<(), eyre::Error> {
    let tmp_dir = mktemp::Temp::new_dir()?;
    fs::create_dir_all(&tmp_dir)?;

    let pdf_path = tmp_dir.join("temp.pdf");
    let inksvg_path = tmp_dir.join("temp.inkscape.svg");
    let safesvg_path = tmp_dir.join("temp.optimized.svg");

    // TODO: keep a chrome instance (and http server) running?
    // TODO: don't spawn a tokio runtime every time here
    env.commands
        .drawio_headless
        .drawio_to_pdf(&self.input_path, &pdf_path)?;

    // TODO: keep it all in memory, don't write to disk :)
    env.commands.poppler.pdf_to_svg(&pdf_path, &inksvg_path)?;

    env.commands
        .svgo
        .get()?
        .optimize_svg(&inksvg_path, &safesvg_path)?;

    {
        let mut dst = BufWriter::new(File::create(self.output_path())?);
        let mut src = BufReader::new(File::open(&safesvg_path)?);
        std::io::copy(&mut src, &mut dst)?;
        dst.flush()?;
    }

    Ok(())
}

To this:

Rust code
    // cut: drawio_headless & poppler invocation

    env.commands
        .svgcleaner
        .optimize_svg(&inksvg_path, &safesvg_path)?;

I can always refactor that later. Our only mission here is to just get rid of invocations!
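
For what it's worth, here's one shape that later refactor could take: keep the real work as a bytes-in, bytes-out function, and wrap it in a thin file-based adapter so path-based call sites keep working in the meantime. This is a minimal sketch using only the standard library. `via_files` is a made-up name, not something salvage or svgcleaner provides, and it uses `io::Result` instead of `eyre` to stay dependency-free.

```rust
use std::{fs, io, path::Path};

/// Hypothetical adapter: runs an in-memory `transform` against files on disk.
/// The transform itself never touches the filesystem.
pub fn via_files<F>(input: &Path, output: &Path, transform: F) -> io::Result<()>
where
    F: FnOnce(&[u8]) -> io::Result<Vec<u8>>,
{
    // Read the whole input up front...
    let data = fs::read(input)?;
    // ...hand it to the in-memory transform...
    let result = transform(&data)?;
    // ...and only then write the result out.
    fs::write(output, result)
}
```

Once every stage is bytes-in, bytes-out, the adapter (and the temporary directory) can simply go away.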

Replacing imagemagick, cavif and cwebp

The image crate provides PNG decoding, JPEG encoding, and AVIF encoding. Unfortunately, it doesn't do WebP encoding, but the webp crate does!

Let's bring those in, with only the features we need.

Shell session
$ cargo add image --no-default-features --features "png jpeg avif-encoder"
    Updating '' index
      Adding image v0.23.14 to dependencies with features: ["png", "jpeg", "avif-encoder"]
$ cargo add webp
    Updating '' index
      Adding webp v0.2.0 to dependencies

But this time, we'll change the interface a little bit.

Again, salvage is architected so that it can run command-line tools efficiently. So I chose kind of a weird infrastructure: it spins up a bunch of workers:

Rust code
    let mut handles = vec![];
    let num_workers = num_cpus::get();
    for _ in 0..num_workers {
        let rx = rx.clone();
        let env = env.clone();
        handles.push(std::thread::spawn(move || {
            while let Ok(transform) = rx.recv() {
                transform.process(&env).unwrap();
            }
        }));
    }

...and they can react to events:

Rust code
/// A transformation from an input file (e.g. `.drawio`) to
/// an output file (e.g. `.safe.svg`)
pub struct Transform {
    pub kind: TransformKind,
    pub workspace: Arc<Workspace>,
    pub input_path: PathBuf,
    pub output_ext: &'static str,
}

#[derive(Debug)]
pub struct Workspace {
    pub db: Arc<RwLock<Database>>,
    pub source_dir: PathBuf,
    pub output_dir: PathBuf,
}

#[derive(Debug)]
pub enum TransformKind {
    /// Render `.drawio` diagram to font-safe svg
    DrawIO,
    /// Convert an image to jpeg
    Jpeg,
    /// Convert an image to webp
    Webp,
    /// Convert an image to avif
    Avif,
}

...that are sent by this function:

Rust code
#[tracing::instrument]
fn process_dir(tx: Sender<Transform>, workspace: Arc<Workspace>) -> Result<(), eyre::Error> {
    let entries = fs::read_dir(&workspace.source_dir)?;
    for entry in entries {
        let entry = entry?;
        let input_path = entry.path();

        if let Some(ext) = input_path
            .extension()
            .map(|x| x.to_string_lossy().to_string())
        {
            match ext.as_ref() {
                "drawio" => {
                    tx.send(workspace.make_transform(TransformKind::DrawIO, input_path, "svg"))
                        .unwrap();
                }
                "png" => {
                    tx.send(workspace.make_transform(
                        TransformKind::Jpeg,
                        input_path.clone(),
                        "jpg",
                    ))
                    .unwrap();
                    tx.send(workspace.make_transform(
                        TransformKind::Webp,
                        input_path.clone(),
                        "webp",
                    ))
                    .unwrap();
                    tx.send(workspace.make_transform(TransformKind::Avif, input_path, "avif"))
                        .unwrap();
                }
                _ => { /* ignore */ }
            }
        }
    }
    Ok(())
}
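
Put together, those three pieces form a small fan-out pipeline: one producer, a queue, and N workers draining it. Here's a standalone sketch of that shape using only the standard library. It's not salvage's code: `Job` and `run_pool` are hypothetical names, and since `std::sync::mpsc::Receiver` can't be cloned the way `rx` is above, the workers share it behind an `Arc<Mutex<_>>` instead.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for salvage's `Transform`: just a job label.
#[derive(Debug)]
pub struct Job(pub String);

/// Fans `jobs` out to `num_workers` threads and returns what they processed.
pub fn run_pool(jobs: Vec<Job>, num_workers: usize) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Job>();
    // std's Receiver is single-consumer, so the workers share it behind a mutex.
    let rx = Arc::new(Mutex::new(rx));
    let (done_tx, done_rx) = mpsc::channel::<String>();

    let mut handles = vec![];
    for _ in 0..num_workers {
        let rx = Arc::clone(&rx);
        let done_tx = done_tx.clone();
        handles.push(thread::spawn(move || loop {
            // The lock is released at the end of this statement,
            // so the actual processing below runs unlocked.
            let job = match rx.lock().unwrap().recv() {
                Ok(job) => job,
                Err(_) => break, // every sender is gone: no more work
            };
            done_tx.send(format!("processed {}", job.0)).unwrap();
        }));
    }
    // Drop our own copy so `done_rx` ends once the workers are finished.
    drop(done_tx);

    for job in jobs {
        tx.send(job).unwrap();
    }
    // Closing the channel is what lets the workers leave their loops.
    drop(tx);

    for handle in handles {
        handle.join().unwrap();
    }
    let mut results: Vec<String> = done_rx.iter().collect();
    results.sort(); // completion order is nondeterministic
    results
}
```

The important detail is the `drop(tx)`: workers only stop when `recv()` errors out, which happens once every sender has been dropped.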

But honestly, we don't need all that noise anymore. So how are we going to refactor this?

First off, let's change so that we have a single TransformKind for bitmaps:

Rust code
#[derive(Debug)]
pub enum TransformKind {
    /// Render `.drawio` diagram to font-safe svg
    DrawIO,
    /// Convert a PNG image to JPEG, WebP, and AVIF
    Bitmap,
}

We'll send a single Transform for PNGs:

Rust code
            match ext.as_ref() {
                "drawio" => {
                    tx.send(workspace.make_transform(TransformKind::DrawIO, input_path, "svg"))
                        .unwrap();
                }
                "png" => {
                    tx.send(workspace.make_transform(
                        TransformKind::Bitmap,
                        input_path.clone(),
                        "jpg",
                    ))
                    .unwrap();
                }
                _ => { /* ignore */ }
            }
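
With only two arms left, the dispatch boils down to a pure function from path to "what should happen to this file". Here's a small standalone sketch of that idea. `classify` is a hypothetical helper, not part of salvage, and unlike the code above it uses `to_str()` rather than `to_string_lossy()`, so files with non-UTF-8 extensions are simply ignored.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransformKind {
    /// Render `.drawio` diagram to font-safe svg
    DrawIO,
    /// Convert a PNG image to JPEG, WebP, and AVIF
    Bitmap,
}

/// Maps a source path to the transform it triggers and its primary
/// output extension, or `None` when the file should be ignored.
pub fn classify(path: &Path) -> Option<(TransformKind, &'static str)> {
    // No extension (or a non-UTF-8 one) means "not ours".
    match path.extension()?.to_str()? {
        "drawio" => Some((TransformKind::DrawIO, "svg")),
        "png" => Some((TransformKind::Bitmap, "jpg")),
        _ => None,
    }
}
```

Like the original match, this is case-sensitive: `shot.PNG` would be ignored.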

And, fuck it, we'll rewrite process_drawio too:

Rust code
    #[tracing::instrument]
    fn process_drawio(&self) -> Result<(), eyre::Error> {
        let input_bytes = std::fs::read(&self.input_path)?;
        let pdf_bytes = DrawioHeadless::drawio_to_pdf(input_bytes)?;
        let svg_bytes = Poppler::pdf_to_svg(&pdf_bytes[..])?;
        let optimized_svg_bytes = SvgCleaner::optimize_svg(&svg_bytes[..])?;

        info!(output_path = %self.output_path(), "writing optimized SVG");
        std::fs::write(self.output_path(), optimized_svg_bytes)?;

        Ok(())
    }

    #[tracing::instrument]
    fn process_bitmap(&self) -> Result<(), eyre::Error> {
        let img = image::load(
            BufReader::new(File::open(&self.input_path)?),
            ImageFormat::Png,
        )?;

        let out_path = self.output_path();
        let jpeg_path = out_path.with_extension("jpg");
        let webp_path = out_path.with_extension("webp");
        let avif_path = out_path.with_extension("avif");

        // JPEG
        info!(%jpeg_path, "writing JPEG");
        JpegEncoder::new_with_quality(&mut File::create(jpeg_path)?, 45).encode_image(&img)?;

        // WebP
        info!(%webp_path, "writing WebP");
        std::fs::write(
            webp_path,
            &webp::Encoder::from_image(&img)
                .map_err(|e| eyre!("webp error: {}", e))?
                .encode(75.0)[..],
        )?;

        // AVIF
        info!(%avif_path, "writing AVIF");
        AvifEncoder::new_with_speed_quality(File::create(avif_path)?, 4, 60).write_image(
            img.as_bytes(),
            img.width(),
            img.height(),
            img.color(),
        )?;

        Ok(())
    }

There! All in-memory and nice.
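
A small detail worth spelling out: since the transform is sent with `"jpg"` as its output extension, the WebP and AVIF paths are derived from that one output path. Here's a tiny std-only illustration of what `with_extension` does. `sibling_outputs` is a made-up helper for the example, and it uses plain `std::path` types.

```rust
use std::path::{Path, PathBuf};

/// Returns the JPEG, WebP and AVIF sibling paths for one output path.
pub fn sibling_outputs(out_path: &Path) -> [PathBuf; 3] {
    [
        // `with_extension` swaps whatever follows the last `.` in the file name.
        out_path.with_extension("jpg"),
        out_path.with_extension("webp"),
        out_path.with_extension("avif"),
    ]
}
```

Only the last extension is replaced, so `archive.tar.gz` would become `archive.tar.avif`, not `archive.avif`.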

Wait! The JPEG/WebP/AVIF thing used to be parallel and now it's not, let's fix that:

Shell session
$ cargo add crossbeam-utils
    Updating '' index
      Adding crossbeam-utils v0.8.5 to dependencies
Rust code
    #[tracing::instrument]
    fn process_bitmap(&self) -> Result<(), eyre::Error> {
        let img = image::load(
            BufReader::new(File::open(&self.input_path)?),
            ImageFormat::Png,
        )?;

        let out_path = self.output_path();
        let jpeg_path = out_path.with_extension("jpg");
        let webp_path = out_path.with_extension("webp");
        let avif_path = out_path.with_extension("avif");

        crossbeam_utils::thread::scope(|s| {
            let jpeg = s.spawn(|_| {
                info!(%jpeg_path, "writing JPEG");
                JpegEncoder::new_with_quality(&mut File::create(jpeg_path)?, 45)
                    .encode_image(&img)?;
                Ok::<_, eyre::Report>(())
            });

            let webp = s.spawn(|_| {
                info!(%webp_path, "writing WebP");
                std::fs::write(
                    webp_path,
                    &webp::Encoder::from_image(&img)
                        .map_err(|e| eyre!("webp error: {}", e))?
                        .encode(75.0)[..],
                )?;
                Ok::<_, eyre::Report>(())
            });

            let avif = s.spawn(|_| {
                info!(%avif_path, "writing AVIF");
                AvifEncoder::new_with_speed_quality(File::create(avif_path)?, 6, 60)
                    .write_image(img.as_bytes(), img.width(), img.height(), img.color())?;
                Ok::<_, eyre::Report>(())
            });

            jpeg.join().unwrap()?;
            webp.join().unwrap()?;
            avif.join().unwrap()?;

            Ok::<_, eyre::Report>(())
        })
        .unwrap()?;

        Ok(())
    }

There! Not the prettiest, but it'll do.
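
A note for readers on a newer toolchain: since Rust 1.63, the standard library ships its own scoped threads as `std::thread::scope`, so the same borrow-the-image-from-three-threads pattern works without an extra crate. The sketch below shows the shape with toy encoders. `fake_encode` and `encode_all` are placeholders, not real image code, and errors are plain `String`s to keep it dependency-free.

```rust
use std::thread;

/// Toy stand-in for an encoder: it only *borrows* the pixels.
fn fake_encode(tag: &str, pixels: &[u8]) -> Result<String, String> {
    if pixels.is_empty() {
        return Err(format!("{tag}: empty image"));
    }
    Ok(format!("{tag}:{}", pixels.len()))
}

/// Runs the three "encoders" in parallel over the same borrowed buffer.
pub fn encode_all(pixels: &[u8]) -> Result<[String; 3], String> {
    thread::scope(|s| {
        // Each closure borrows `pixels`; the scope guarantees all threads
        // are joined before that borrow ends.
        let jpeg = s.spawn(|| fake_encode("jpeg", pixels));
        let webp = s.spawn(|| fake_encode("webp", pixels));
        let avif = s.spawn(|| fake_encode("avif", pixels));

        // `join()` only fails if a thread panicked; `?` surfaces encoder errors.
        let jpeg = jpeg.join().unwrap()?;
        let webp = webp.join().unwrap()?;
        let avif = avif.join().unwrap()?;

        Ok::<_, String>([jpeg, webp, avif])
    })
}
```

Compared to the crossbeam version, the closures take no `|_|` argument and `scope` returns the closure's value directly, so the outer `.unwrap()` goes away.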

Testing it all

salvage maintains a salvage-db.json file, like so:

{
  "input_files": {
    "/home/amos/bearcove/": "4ce11f85ddc447de",
    "/home/amos/bearcove/": "a15ecb21b94b7e8e",
    "/home/amos/bearcove/": "f03d8d2673407a67",
    "/home/amos/bearcove/": "2f14729992449d6b"
  }
}

If we remove it, it'll simply re-process all those files.
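
The idea behind that file is a plain content-hash cache: if a file's hash matches what's recorded, skip it. Here's a minimal sketch of that logic, with loud caveats. The `Database` shape is guessed from the JSON above, `needs_processing` is a hypothetical method, and `DefaultHasher` is an assumption standing in for whatever hash salvage really uses (its output isn't guaranteed stable across Rust releases, so it would be a poor choice for a file that's persisted to disk).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory shape of `salvage-db.json`:
/// input path -> 16-hex-digit content hash.
#[derive(Default)]
pub struct Database {
    pub input_files: HashMap<String, String>,
}

/// Formats a 64-bit content hash as 16 lowercase hex digits.
/// ASSUMPTION: `DefaultHasher` is a placeholder, not salvage's real hash.
pub fn content_hash(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

impl Database {
    /// Returns `true` (and records the new hash) when the file is new
    /// or its contents changed since the last run.
    pub fn needs_processing(&mut self, path: &str, bytes: &[u8]) -> bool {
        let hash = content_hash(bytes);
        if self.input_files.get(path) == Some(&hash) {
            return false; // unchanged: nothing to do
        }
        self.input_files.insert(path.to_string(), hash);
        true
    }
}
```

Deleting the file is equivalent to starting from `Database::default()`: every lookup misses, so everything gets processed again.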

Shell session
$ rm salvage-db.json && salvage .
2021-12-31T21:25:06.956729Z INFO salvage: Workspace: /home/amos/bearcove/ => /home/amos/bearcove/ dont-shell-out/part-2/assets
2021-12-31T21:25:06.957218Z INFO process: salvage: /home/amos/bearcove/ => /home/amos/bearcove/ ntent/series/dont-shell-out/part-2/assets/svg-letter.jpg
2021-12-31T21:25:06.991646Z INFO salvage: writing JPEG jpeg_path=/home/amos/bearcove/
2021-12-31T21:25:06.991672Z INFO salvage: writing WebP webp_path=/home/amos/bearcove/
2021-12-31T21:25:06.991753Z INFO salvage: writing AVIF avif_path=/home/amos/bearcove/
2021-12-31T21:25:07.007929Z INFO rav1e::api::config: CPU Feature Level: AVX2
2021-12-31T21:25:07.007934Z INFO rav1e::api::config: CPU Feature Level: AVX2
2021-12-31T21:25:07.007979Z INFO rav1e::api::internal: Using 56 tiles (7x8)
2021-12-31T21:25:07.007981Z INFO rav1e::api::internal: Using 56 tiles (7x8)
(cut)

Let's see what we got!

Shell session
$ ls -lhA svg-letter.*
-rw-r--r--. 1 amos amos 805K Dec 31 22:25 svg-letter.avif
-rw-r--r--. 1 amos amos 404K Dec 31 22:25 svg-letter.jpg
-rw-r--r--. 1 amos amos 505K Dec 27 09:27 svg-letter.png
-rw-r--r--. 1 amos amos 289K Dec 31 22:25 svg-letter.webp

Uhh the .avif is larger than I'd expect. And it was slower than I expected at speed=6 too, so I had to bring down the speed to 4. But what's even more annoying is that...

Shell session
$ avifdec svg-letter.avif /tmp/svg-letter.png
Decoding with AV1 codec 'dav1d' (1 worker thread), please wait...
ERROR: Failed to decode image: BMFF parsing failed
Diagnostics:
 * Box[meta] does not have a Box[hdlr] as its first child box
doesn't seem to be a valid AVIF file.

Let's see now...

Shell session
$ cargo tree -i ravif
ravif v0.6.4
└── image v0.23.14
    ├── salvage v1.4.0 (/home/amos/bearcove/salvage)
    └── webp v0.2.0
        └── salvage v1.4.0 (/home/amos/bearcove/salvage)

Ah! It's using an old ravif! The latest is 0.8.8. Let's see if it performs better? First we'll remove the avif-encoder feature of image, and then:

Shell session
$ cargo add ravif
    Updating '' index
      Adding ravif v0.8.8 to dependencies
$ cargo add rgb
    Updating '' index
      Adding rgb v0.8.31 to dependencies

And change our code:

Rust code
            let avif = s.spawn(|_| {
                info!(%avif_path, "writing AVIF");
                let config = ravif::Config {
                    quality: 50.0,
                    alpha_quality: 50.0,
                    speed: 4,
                    premultiplied_alpha: false,
                    color_space: ravif::ColorSpace::YCbCr,
                    threads: 0,
                };
                let img = img.to_rgba8();
                let img = Img::new(
                    img.as_bytes().as_rgba(),
                    img.width() as _,
                    img.height() as _,
                );
                let (avif_bytes, _, _) =
                    ravif::encode_rgba(img, &config).map_err(|e| eyre!("ravif error: {}", e))?;
                std::fs::write(&avif_path, &avif_bytes)?;
                info!(%avif_path, "writing AVIF... done!");
                Ok::<_, eyre::Report>(())
            });

I brought down the speed (which should give better results), and reduced the quality (which should give worse results).

Let's try again!

Shell session
$ rm salvage-db.json && salvage .
(cut)

$ ls -lhA svg-letter*
-rw-r--r--. 1 amos amos 278K Dec 31 22:37 svg-letter.avif
-rw-r--r--. 1 amos amos 404K Dec 31 22:37 svg-letter.jpg
-rw-r--r--. 1 amos amos 505K Dec 27 09:27 svg-letter.png
-rw-r--r--. 1 amos amos 289K Dec 31 22:37 svg-letter.webp

Better! Much better. "Smaller than webp" is what I aim for.

Shell session
$ avifdec svg-letter.avif /tmp/svg-letter.png
Decoding with AV1 codec 'dav1d' (1 worker thread), please wait...
Image decoded: svg-letter.avif
Image details:
 * Resolution     : 2108x1528
 * Bit Depth      : 8
 * Format         : YUV444
 * Alpha          : Not premultiplied
 * Range          : Full
 * Color Primaries: 1
 * Transfer Char. : 13
 * Matrix Coeffs. : 1
 * ICC Profile    : Absent (0 bytes)
 * XMP Metadata   : Absent (0 bytes)
 * EXIF Metadata  : Absent (0 bytes)
 * Transformations: None
Wrote PNG: /tmp/svg-letter.png

Okay, it's a legit AVIF file this time!

Let's compare those images:

The image above is a PNG, by the way.

futile (my website's software) only makes a <picture> tag with AVIF, WebP and JPEG sources when it encounters a Markdown image that ends in .jpg. I would never normally use .jpg, so that's the bat signal.
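
To make the "bat signal" concrete, here's a rough sketch of that kind of rewrite. This is not futile's code: `picture_tag` is a hypothetical function, the exact markup futile emits isn't shown anywhere in this article, and the sketch does no HTML escaping. The one thing that isn't an assumption is how `<picture>` works: the browser picks the first `<source>` whose type it supports and falls back to the `<img>` otherwise, which is why AVIF comes first and JPEG last.

```rust
/// Builds a `<picture>` element for an image path ending in `.jpg`;
/// returns `None` for anything else. The markup is illustrative only
/// and `src`/`alt` are inserted without any HTML escaping.
pub fn picture_tag(src: &str, alt: &str) -> Option<String> {
    // The `.jpg` suffix is the opt-in signal; everything else is left alone.
    let stem = src.strip_suffix(".jpg")?;
    Some(format!(
        "<picture>\
         <source srcset=\"{stem}.avif\" type=\"image/avif\">\
         <source srcset=\"{stem}.webp\" type=\"image/webp\">\
         <img src=\"{stem}.jpg\" alt=\"{alt}\">\
         </picture>"
    ))
}
```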

As expected, JPEG is by far the worst, with no transparency and awful artifacts. I wonder if anyone is reading this from a browser that supports neither WebP nor AVIF. Maybe Internet Explorer 11?

Next steps

This is where this particular series ends: I did everything I set out to do.

But I'm thinking about the future... now that salvage can run completely headless, and instead of shoving those large .jpg, .webp and .avif files in my Git repository... why not have it run as a service on the same machine?

And cache generated assets on some cloud storage somewhere? That sounds like fun! I'm sure it would make for a nice, short article.

Oh boy. Short, yes. We all know what that means...

That's all for me this year!

Until next time, thank you for reading, and have a happy new year 2022. I think we're all going into it with a little more humility than 2021, and that can only be good.

Take care!

This article was made possible thanks to my patrons: Alexander Payne, Fredrik Østrem, David Barsky, Yufan Lou, Stephen Molyneaux, Barret Rennie, Thomas Corbin, MW, Jacob Cheriathundam, Michael Watzko, Embark Studios, Eugene Bulkin, Marcus Griep, Petar Radosevic, Tool Army, Tully, Santiago Lema, Spencer Gilbert, Jörn Huxhorn, Garrett Ward, DEX, Christian Oudard, Ronen Cohen, Thor Kamphefner, Kamran Khan, Cole Kurkowski, Arjen Laarhoven, Vicente Bosch, Chirag Jain, Ville Mattila, Marie Janssen, Vladyslav Batyrenko, Cameron Clausen, spike grobstein, Jon Gjengset, Paul Marques Mota, Jakub Fijałkowski, Mitchell Hamilton, Brad Luyster, Max von Forell, Jake S, Dimitri Merejkowsky, Chris Biscardi, René Ribaud, Alex Doroshenko, Vincent, Steven McGuire, Chad Birch, Chris Emery, Bob Ippolito, John Van Enk, metabaron, Isak Sunde Singh, Philipp Gniewosz, Mads Johansen, lukvol, Ives van Hoorne, Jan De Landtsheer, Daniel Strittmatter, Evgeniy Dubovskoy, Alex Rudy, Shane Lillie, Romet Tagobert, Douglas Creager, Corey Alexander, Molly Howell, knutwalker, Zachary Dremann, Sebastian Ziebell, Julien Roncaglia, Amber Kowalski, T, queenfartbutt, Paul Kline, Kristoffer Ström, Astrid Bek, Yoh Deadfall, Justin Ossevoort, Tomáš Duda, Jeremy Banks, Rasmus Larsen, Torben Clasen, C J Silverio, Walther, Pete Bevin, Shane Sveller, Clara Schultz, jer, Wonwoo Choi, Hawken Rives, João Veiga, Richard Pringle, Adam Perry, Benjamin Röjder Delnavaz, Matt Jadczak, Jonathan Knapp, Maximilian, Seth Stadick, brianloveswords, Sean Bryant, Ember, Sebastian Zimmer, Makoto Nakashima, Geoff Cant, Geoffroy Couprie, Michael Alyn Miller, o0Ignition0o, Zaki, Raphael Gaschignard, Romain Ruetschi, Ignacio Vergara, Pascal, Jane Lusby, Nicolas Goy, Ted Mielczarek, Aurora.

This article is part 7 of the Don't shell out! series.
