The rest of the fucking owl
From the series
Don't shell out!
NO! No no no.
What?
WE WERE DONE!
Well... yes! But also no. We still shell out to a bunch of tools:
$ rg 'Command::new'
src/commands/mod.rs
126:    let variant = if let Ok(output) = run_command(Command::new("wslpath").arg("-m").arg("/")) {

src/commands/cavif.rs
29:        Command::new("cavif")

src/commands/imagemagick.rs
25:        Command::new(&self.bin)

src/commands/cwebp.rs
25:        Command::new("cwebp")

src/commands/svgo.rs
25:        Command::new("svgo")
rg is ripgrep: think grep, but wicked fast and respecting your .gitignore (and other ignore files) by default.
The good news? There's crates for all of that.
Let's start simple.
Optimizing SVG
svgo is great actually, but it's JavaScript.
Which means the prerequisites are a decently modern node.js and npm install -g svgo.
The svgcleaner crate is a pure Rust alternative that gives decent results:
$ svgo /tmp/export.svg -o /tmp/export.svgo.svg

export.svg:
Done in 139 ms!
90.658 KiB - 54.5% = 41.271 KiB

$ svgcleaner /tmp/export.svg /tmp/export.svgcleaner.svg
Your image is 39.70% smaller now.

$ ls -lhA /tmp/export.*
-rw-r--r--. 1 amos amos 3.0K Dec 31 18:50 /tmp/export.pdf
-rw-r--r--. 1 amos amos  91K Dec 31 20:38 /tmp/export.svg
-rw-r--r--. 1 amos amos  55K Dec 31 20:39 /tmp/export.svgcleaner.svg
-rw-r--r--. 1 amos amos  42K Dec 31 20:39 /tmp/export.svgo.svg
But svgcleaner has knobs! We can match svgo's output size if we're willing to compromise on coordinate precision:
$ svgcleaner --paths-coordinates-precision 3 --transforms-precision 3 /tmp/export.svg /tmp/export.svgcleaner-imprecise.svg
Your image is 53.12% smaller now.

$ ls -lhA /tmp/export.{svgo,svgcleaner-imprecise}.svg
-rw-r--r--. 1 amos amos 43K Dec 31 20:42 /tmp/export.svgcleaner-imprecise.svg
-rw-r--r--. 1 amos amos 42K Dec 31 20:39 /tmp/export.svgo.svg
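Why does lowering precision help so much? Path data is mostly numbers, and every digit is a byte. Here's a rough sketch of the idea, using only the standard library. To be clear, this is not svgcleaner's actual code: `format_coord` is a made-up helper, and I'm only assuming the real thing does something in this spirit.

```rust
/// Hypothetical helper (not part of svgcleaner): format a coordinate with
/// at most `precision` decimal places, dropping trailing zeros and a
/// dangling decimal point.
fn format_coord(value: f64, precision: usize) -> String {
    let s = format!("{:.*}", precision, value);
    if s.contains('.') {
        s.trim_end_matches('0').trim_end_matches('.').to_string()
    } else {
        s
    }
}

fn main() {
    // Made-up sample coordinates, just to show the effect.
    let coords = [12.345678_f64, 100.0, 0.5];
    for c in coords {
        println!("{} -> {}", c, format_coord(c, 3));
    }
}
```

Multiply those few saved bytes by every coordinate in every path, and that's presumably where the extra shrinkage comes from.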
And I don't know about you, but I sure cannot tell the difference:
svgcleaner is a bin+lib package, so we can just... bring it in!
$ cargo add svgcleaner
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding svgcleaner v0.9.5 to dependencies
// in `salvage/src/commands/svgcleaner.rs`

use color_eyre::eyre;
use std::{fmt::Display, path::Path};
use svgcleaner::{
    cleaner::{clean_doc, parse_data, write_buffer},
    CleaningOptions,
};

#[derive(Clone)]
pub struct SvgCleaner {}

impl SvgCleaner {
    pub fn new() -> Self {
        Self {}
    }

    pub fn optimize_svg(&self, input: &Path, output: &Path) -> Result<(), eyre::Error> {
        let data = std::fs::read_to_string(input)?;

        let parse_opts = Default::default();
        let mut doc = parse_data(&data, &parse_opts).map_err(fmt_err)?;

        let clean_opts = CleaningOptions {
            paths_coordinates_precision: 3,
            transforms_precision: 3,
            ..Default::default()
        };
        let write_opts = Default::default();
        clean_doc(&mut doc, &clean_opts, &write_opts).map_err(fmt_err)?;

        let mut buf = vec![];
        write_buffer(&doc, &write_opts, &mut buf);
        std::fs::write(output, buf)?;

        Ok(())
    }
}

fn fmt_err<E: Display>(e: E) -> eyre::Report {
    color_eyre::eyre::eyre!("{}", e)
}
Wait, files? What, why?
So it's a drop-in replacement for the svgo thingy. salvage was really only meant to be a glorified command runner, think make but with hardcoded rules and knowledge of WSL2 and stuff.
I definitely want to change the interfaces here: there's no reason why svgcleaner should read its input from disk, but that way it's trivial to change this:
#[tracing::instrument]
fn process_drawio(&self, env: &Environment) -> Result<(), eyre::Error> {
    let tmp_dir = mktemp::Temp::new_dir()?;
    fs::create_dir_all(&tmp_dir)?;

    let pdf_path = tmp_dir.join("temp.pdf");
    let inksvg_path = tmp_dir.join("temp.inkscape.svg");
    let safesvg_path = tmp_dir.join("temp.optimized.svg");

    // TODO: keep a chrome instance (and http server) running?
    // TODO: don't spawn a tokio runtime every time here
    env.commands
        .drawio_headless
        .drawio_to_pdf(&self.input_path, &pdf_path)?;

    // TODO: keep it all in memory, don't write to disk :)
    env.commands.poppler.pdf_to_svg(&pdf_path, &inksvg_path)?;
    env.commands
        .svgo
        .get()?
        .optimize_svg(&inksvg_path, &safesvg_path)?;

    {
        let mut dst = BufWriter::new(File::create(self.output_path())?);
        let mut src = BufReader::new(File::open(&safesvg_path)?);
        std::io::copy(&mut src, &mut dst)?;
        dst.flush()?;
    }

    Ok(())
}
To this:
// cut: drawio_headless & poppler invocation

env.commands
    .svgcleaner
    .optimize_svg(&inksvg_path, &safesvg_path)?;
I can always refactor that later. Our only mission here is to just get rid of invocations!
Replacing imagemagick, cavif and cwebp
The image crate provides PNG decoding, JPEG encoding, and AVIF encoding. Unfortunately, it doesn't do WebP encoding, but the webp crate does!
Let's bring those in, with only the features we need.
$ cargo add image --no-default-features --features "png jpeg avif-encoder"
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding image v0.23.14 to dependencies with features: ["png", "jpeg", "avif-encoder"]

$ cargo add webp
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding webp v0.2.0 to dependencies
But this time, we'll change the interface a little bit.
Again, salvage is architected so that it can run command-line tools efficiently. So I chose kind of a weird infrastructure: it spins up a bunch of workers:
let mut handles = vec![];

let num_workers = num_cpus::get();
for _ in 0..num_workers {
    let rx = rx.clone();
    let env = env.clone();
    handles.push(std::thread::spawn(move || {
        while let Ok(transform) = rx.recv() {
            transform.process(&env).unwrap();
        }
    }));
}
...and they can react to events:
/// A transformation from an input file (e.g. `.drawio`) to
/// an output file (e.g. `.safe.svg`)
pub struct Transform {
    pub kind: TransformKind,
    pub workspace: Arc<Workspace>,
    pub input_path: PathBuf,
    pub output_ext: &'static str,
}

#[derive(Debug)]
pub struct Workspace {
    pub db: Arc<RwLock<Database>>,
    pub source_dir: PathBuf,
    pub output_dir: PathBuf,
}

#[derive(Debug)]
pub enum TransformKind {
    /// Render `.drawio` diagram to font-safe svg
    DrawIO,
    /// Convert an image to jpeg
    Jpeg,
    /// Convert an image to webp
    Webp,
    /// Convert an image to avif
    Avif,
}
...that are sent by this function:
#[tracing::instrument]
fn process_dir(tx: Sender<Transform>, workspace: Arc<Workspace>) -> Result<(), eyre::Error> {
    let entries = fs::read_dir(&workspace.source_dir)?;
    for entry in entries {
        let entry = entry?;
        let input_path = entry.path();
        if let Some(ext) = input_path
            .extension()
            .map(|x| x.to_string_lossy().to_string())
        {
            match ext.as_ref() {
                "drawio" => {
                    tx.send(workspace.make_transform(TransformKind::DrawIO, input_path, "svg"))
                        .unwrap();
                }
                "png" => {
                    tx.send(workspace.make_transform(
                        TransformKind::Jpeg,
                        input_path.clone(),
                        "jpg",
                    ))
                    .unwrap();
                    tx.send(workspace.make_transform(
                        TransformKind::Webp,
                        input_path.clone(),
                        "webp",
                    ))
                    .unwrap();
                    tx.send(workspace.make_transform(TransformKind::Avif, input_path, "avif"))
                        .unwrap();
                }
                _ => { /* ignore */ }
            }
        }
    }

    Ok(())
}
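If you want to play with that workers-and-channel shape without pulling in anything, here's a stripped-down sketch using only the standard library. Assumptions galore: `Job` and `run_pool` are made-up stand-ins for `Transform` and salvage's real setup, and since std's `Receiver` can't be cloned the way `rx` is cloned above, the workers share it behind a `Mutex` instead.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for salvage's `Transform`.
#[derive(Debug)]
struct Job {
    id: u32,
}

/// Spawn `num_workers` threads that pull jobs from a shared queue,
/// then return the ids of all processed jobs, sorted.
fn run_pool(num_workers: usize, jobs: Vec<Job>) -> Vec<u32> {
    let (tx, rx) = mpsc::channel::<Job>();
    // std's Receiver is not Clone, so workers share it behind a mutex.
    let rx = Arc::new(Mutex::new(rx));
    let done = Arc::new(Mutex::new(Vec::new()));

    let mut handles = vec![];
    for _ in 0..num_workers {
        let rx = Arc::clone(&rx);
        let done = Arc::clone(&done);
        handles.push(thread::spawn(move || loop {
            // The lock is held while receiving, and released before
            // the job is "processed".
            let job = match rx.lock().unwrap().recv() {
                Ok(job) => job,
                Err(_) => break, // all senders dropped, queue drained
            };
            done.lock().unwrap().push(job.id);
        }));
    }

    for job in jobs {
        tx.send(job).unwrap();
    }
    drop(tx); // closing the channel lets the workers exit

    for handle in handles {
        handle.join().unwrap();
    }

    let mut ids = done.lock().unwrap().clone();
    ids.sort();
    ids
}

fn main() {
    let jobs = (0..10).map(|id| Job { id }).collect();
    println!("processed: {:?}", run_pool(4, jobs));
}
```

The important bit is `drop(tx)`: workers only leave their loop once every sender is gone.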
But honestly, we don't need all that noise anymore. So how are we going to refactor this?
First off, let's change so that we have a single TransformKind for bitmaps:
#[derive(Debug)]
pub enum TransformKind {
    /// Render `.drawio` diagram to font-safe svg
    DrawIO,
    /// Convert a PNG image to JPEG, WebP, and AVIF
    Bitmap,
}
We'll send a single Transform for PNGs:
match ext.as_ref() {
    "drawio" => {
        tx.send(workspace.make_transform(TransformKind::DrawIO, input_path, "svg"))
            .unwrap();
    }
    "png" => {
        tx.send(workspace.make_transform(
            TransformKind::Bitmap,
            input_path.clone(),
            "jpg",
        ))
        .unwrap();
    }
    _ => { /* ignore */ }
}
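So one input, one kind, and the kind decides how many files come out the other end. Here's a tiny standalone sketch of that mapping. It's simplified on purpose: `classify` and `output_paths` are names I made up for illustration, and the outputs land next to the input instead of going through the workspace's output directory.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum TransformKind {
    DrawIO,
    Bitmap,
}

/// Hypothetical helper: pick a transform kind from the file extension.
fn classify(input: &Path) -> Option<TransformKind> {
    match input.extension()?.to_str()? {
        "drawio" => Some(TransformKind::DrawIO),
        "png" => Some(TransformKind::Bitmap),
        _ => None,
    }
}

/// Hypothetical helper: every file a given input would produce.
fn output_paths(input: &Path) -> Vec<PathBuf> {
    let exts: &[&str] = match classify(input) {
        Some(TransformKind::DrawIO) => &["svg"],
        Some(TransformKind::Bitmap) => &["jpg", "webp", "avif"],
        None => &[],
    };
    exts.iter().map(|ext| input.with_extension(ext)).collect()
}

fn main() {
    for name in ["assets/diagram.drawio", "assets/svg-letter.png", "notes.txt"] {
        println!("{} => {:?}", name, output_paths(Path::new(name)));
    }
}
```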
And, fuck it, we'll rewrite process_drawio too:
#[tracing::instrument]
fn process_drawio(&self) -> Result<(), eyre::Error> {
    let input_bytes = std::fs::read(&self.input_path)?;
    let pdf_bytes = DrawioHeadless::drawio_to_pdf(input_bytes)?;
    let svg_bytes = Poppler::pdf_to_svg(&pdf_bytes[..])?;
    let optimized_svg_bytes = SvgCleaner::optimize_svg(&svg_bytes[..])?;
    info!(output_path = %self.output_path(), "writing optimized SVG");
    std::fs::write(self.output_path(), optimized_svg_bytes)?;

    Ok(())
}

#[tracing::instrument]
fn process_bitmap(&self) -> Result<(), eyre::Error> {
    let img = image::load(
        BufReader::new(File::open(&self.input_path)?),
        ImageFormat::Png,
    )?;

    let out_path = self.output_path();
    let jpeg_path = out_path.with_extension("jpg");
    let webp_path = out_path.with_extension("webp");
    let avif_path = out_path.with_extension("avif");

    // JPEG
    info!(%jpeg_path, "writing JPEG");
    JpegEncoder::new_with_quality(&mut File::create(jpeg_path)?, 45).encode_image(&img)?;

    // WebP
    info!(%webp_path, "writing WebP");
    std::fs::write(
        webp_path,
        &webp::Encoder::from_image(&img)
            .map_err(|e| eyre!("webp error: {}", e))?
            .encode(75.0)[..],
    )?;

    // AVIF
    info!(%avif_path, "writing AVIF");
    AvifEncoder::new_with_speed_quality(File::create(avif_path)?, 4, 60).write_image(
        img.as_bytes(),
        img.width(),
        img.height(),
        img.color(),
    )?;

    Ok(())
}
There! All in-memory and nice.
Wait! The JPEG/WebP/AVIF thing used to be parallel and now it's not, let's fix that:
$ cargo add crossbeam-utils
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding crossbeam-utils v0.8.5 to dependencies
#[tracing::instrument]
fn process_bitmap(&self) -> Result<(), eyre::Error> {
    let img = image::load(
        BufReader::new(File::open(&self.input_path)?),
        ImageFormat::Png,
    )?;

    let out_path = self.output_path();
    let jpeg_path = out_path.with_extension("jpg");
    let webp_path = out_path.with_extension("webp");
    let avif_path = out_path.with_extension("avif");

    crossbeam_utils::thread::scope(|s| {
        let jpeg = s.spawn(|_| {
            info!(%jpeg_path, "writing JPEG");
            JpegEncoder::new_with_quality(&mut File::create(jpeg_path)?, 45)
                .encode_image(&img)?;
            Ok::<_, eyre::Report>(())
        });

        let webp = s.spawn(|_| {
            info!(%webp_path, "writing WebP");
            std::fs::write(
                webp_path,
                &webp::Encoder::from_image(&img)
                    .map_err(|e| eyre!("webp error: {}", e))?
                    .encode(75.0)[..],
            )?;
            Ok::<_, eyre::Report>(())
        });

        let avif = s.spawn(|_| {
            info!(%avif_path, "writing AVIF");
            AvifEncoder::new_with_speed_quality(File::create(avif_path)?, 6, 60)
                .write_image(img.as_bytes(), img.width(), img.height(), img.color())?;
            Ok::<_, eyre::Report>(())
        });

        jpeg.join().unwrap()?;
        webp.join().unwrap()?;
        avif.join().unwrap()?;

        Ok::<_, eyre::Report>(())
    })
    .unwrap()?;

    Ok(())
}
There! Not the prettiest, but it'll do.
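For what it's worth, the standard library has had its own scoped threads since Rust 1.63 (`std::thread::scope`), so on a recent toolchain the same pattern works without crossbeam-utils. A minimal sketch, where `fake_encode` and `encode_all` are made-up stand-ins for the real encoders and not anything from salvage:

```rust
use std::thread;

/// Hypothetical stand-in for an encoder: fails on empty input,
/// otherwise reports how many bytes it was given.
fn fake_encode(name: &str, pixels: &[u8]) -> Result<usize, String> {
    if pixels.is_empty() {
        return Err(format!("{name}: no pixel data"));
    }
    Ok(pixels.len())
}

/// Run three "encoders" in parallel over the same borrowed buffer.
fn encode_all(pixels: &[u8]) -> Result<[usize; 3], String> {
    thread::scope(|s| {
        // Scoped threads may borrow `pixels`: the scope guarantees
        // they are all joined before it returns.
        let jpeg = s.spawn(|| fake_encode("jpeg", pixels));
        let webp = s.spawn(|| fake_encode("webp", pixels));
        let avif = s.spawn(|| fake_encode("avif", pixels));

        // join() only fails if the thread panicked.
        Ok::<_, String>([
            jpeg.join().unwrap()?,
            webp.join().unwrap()?,
            avif.join().unwrap()?,
        ])
    })
}

fn main() {
    println!("{:?}", encode_all(&[0, 1, 2, 3]));
    println!("{:?}", encode_all(&[]));
}
```

Same idea as above: three threads share one decoded image by reference, and each error bubbles up through `?` after the join.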
Testing it all
salvage maintains a salvage-db.json file, like so:
{
  "input_files": {
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.png": "4ce11f85ddc447de",
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-use-tag.png": "a15ecb21b94b7e8e",
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-word.png": "f03d8d2673407a67",
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/true-bold.png": "2f14729992449d6b"
  }
}
If we remove it, it'll simply re-process all those files.
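The logic behind that is plain old change detection: hash each input, compare against what's in the database, skip anything that matches. Here's a sketch of the idea. Big caveat: I'm using std's `DefaultHasher` purely for illustration. It's an assumption, not what salvage uses, and its output isn't guaranteed to stay the same across Rust releases, so it would be a poor choice for a file you persist. `Database::needs_processing` is a made-up name, too.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash file contents into 16 hex digits (illustrative hasher only).
fn content_hash(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical in-memory version of `salvage-db.json`.
#[derive(Default)]
struct Database {
    input_files: HashMap<String, String>,
}

impl Database {
    /// Returns true if `path` is new or its contents changed,
    /// and records the new hash in that case.
    fn needs_processing(&mut self, path: &str, bytes: &[u8]) -> bool {
        let hash = content_hash(bytes);
        if self.input_files.get(path) == Some(&hash) {
            return false;
        }
        self.input_files.insert(path.to_string(), hash);
        true
    }
}

fn main() {
    let mut db = Database::default();
    println!("first run:  {}", db.needs_processing("a.png", b"pixels"));
    println!("second run: {}", db.needs_processing("a.png", b"pixels"));

    // "Removing the database" is just starting from an empty one.
    let mut db = Database::default();
    println!("after rm:   {}", db.needs_processing("a.png", b"pixels"));
}
```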
$ rm salvage-db.json && salvage .
2021-12-31T21:25:06.956729Z  INFO salvage: Workspace: /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets => /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets
2021-12-31T21:25:06.957218Z  INFO process: salvage: /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.png => /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.jpg
2021-12-31T21:25:06.991646Z  INFO salvage: writing JPEG jpeg_path=/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.jpg
2021-12-31T21:25:06.991672Z  INFO salvage: writing WebP webp_path=/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.webp
2021-12-31T21:25:06.991753Z  INFO salvage: writing AVIF avif_path=/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.avif
2021-12-31T21:25:07.007929Z  INFO rav1e::api::config: CPU Feature Level: AVX2
2021-12-31T21:25:07.007934Z  INFO rav1e::api::config: CPU Feature Level: AVX2
2021-12-31T21:25:07.007979Z  INFO rav1e::api::internal: Using 56 tiles (7x8)
2021-12-31T21:25:07.007981Z  INFO rav1e::api::internal: Using 56 tiles (7x8)
(cut)
Let's see what we got!
$ ls -lhA svg-letter.*
-rw-r--r--. 1 amos amos 805K Dec 31 22:25 svg-letter.avif
-rw-r--r--. 1 amos amos 404K Dec 31 22:25 svg-letter.jpg
-rw-r--r--. 1 amos amos 505K Dec 27 09:27 svg-letter.png
-rw-r--r--. 1 amos amos 289K Dec 31 22:25 svg-letter.webp
Uhh the .avif is larger than I'd expect. And it was slower than I expected at speed=6 too, so I had to bring down the speed to 4. But what's even more annoying is that...
$ avifdec svg-letter.avif /tmp/svg-letter.png
Decoding with AV1 codec 'dav1d' (1 worker thread), please wait...
ERROR: Failed to decode image: BMFF parsing failed
Diagnostics:
 * Box[meta] does not have a Box[hdlr] as its first child box
...it doesn't seem to be a valid AVIF file.
Let's see now...
$ cargo tree -i ravif
ravif v0.6.4
└── image v0.23.14
    ├── salvage v1.4.0 (/home/amos/bearcove/salvage)
    └── webp v0.2.0
        └── salvage v1.4.0 (/home/amos/bearcove/salvage)
Ah! It's using an old ravif! The latest is 0.8.8. Let's see if it performs better? First we'll remove the avif-encoder feature of image, and then:
$ cargo add ravif
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding ravif v0.8.8 to dependencies

$ cargo add rgb
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding rgb v0.8.31 to dependencies
And change our code:
let avif = s.spawn(|_| {
    info!(%avif_path, "writing AVIF");
    let config = ravif::Config {
        quality: 50.0,
        alpha_quality: 50.0,
        speed: 4,
        premultiplied_alpha: false,
        color_space: ravif::ColorSpace::YCbCr,
        threads: 0,
    };
    let img = img.to_rgba8();
    let img = Img::new(
        img.as_bytes().as_rgba(),
        img.width() as _,
        img.height() as _,
    );
    let (avif_bytes, _, _) =
        ravif::encode_rgba(img, &config).map_err(|e| eyre!("ravif error: {}", e))?;
    std::fs::write(&avif_path, &avif_bytes)?;
    info!(%avif_path, "writing AVIF... done!");
    Ok::<_, eyre::Report>(())
});
I brought down the speed (which should give better results), and reduced the quality (which should give worse results).
Let's try again!
$ rm salvage-db.json && salvage .
(cut)

$ ls -lhA svg-letter*
-rw-r--r--. 1 amos amos 278K Dec 31 22:37 svg-letter.avif
-rw-r--r--. 1 amos amos 404K Dec 31 22:37 svg-letter.jpg
-rw-r--r--. 1 amos amos 505K Dec 27 09:27 svg-letter.png
-rw-r--r--. 1 amos amos 289K Dec 31 22:37 svg-letter.webp
Better! Much better. "Smaller than webp" is what I aim for.
$ avifdec svg-letter.avif /tmp/svg-letter.png
Decoding with AV1 codec 'dav1d' (1 worker thread), please wait...
Image decoded: svg-letter.avif
Image details:
 * Resolution     : 2108x1528
 * Bit Depth      : 8
 * Format         : YUV444
 * Alpha          : Not premultiplied
 * Range          : Full
 * Color Primaries: 1
 * Transfer Char. : 13
 * Matrix Coeffs. : 1
 * ICC Profile    : Absent (0 bytes)
 * XMP Metadata   : Absent (0 bytes)
 * EXIF Metadata  : Absent (0 bytes)
 * Transformations: None
Wrote PNG: /tmp/svg-letter.png
Okay, it's a legit AVIF file this time!
Let's compare those images:
The image above is a PNG, by the way.
futile (my website's software) only makes a <picture> tag with AVIF, WebP and JPEG sources when it encounters a Markdown image that ends in .jpg. I would never normally use .jpg, so that's the bat signal.
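In case the "bat signal" bit is too abstract, here's roughly what that rewrite could look like. This is a guess at the shape, not futile's actual code: `picture_tag` is a hypothetical function, the exact markup is an assumption, and there's no HTML escaping, which real code would need.

```rust
/// Hypothetical helper: if `src` ends in `.jpg`, build a `<picture>` tag
/// offering AVIF and WebP sources with the JPEG as fallback.
/// Returns None for any other extension. No HTML escaping is done here.
fn picture_tag(src: &str, alt: &str) -> Option<String> {
    let stem = src.strip_suffix(".jpg")?;
    Some(format!(
        concat!(
            "<picture>",
            "<source srcset=\"{stem}.avif\" type=\"image/avif\">",
            "<source srcset=\"{stem}.webp\" type=\"image/webp\">",
            "<img src=\"{src}\" alt=\"{alt}\">",
            "</picture>"
        ),
        stem = stem,
        src = src,
        alt = alt,
    ))
}

fn main() {
    println!("{:?}", picture_tag("svg-letter.jpg", "A letter"));
    println!("{:?}", picture_tag("svg-letter.png", "A letter"));
}
```

Browsers pick the first `<source>` they support, so the order (AVIF, then WebP, then the JPEG fallback) matters.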
As expected, JPEG is by far the worst, with no transparency and awful artifacts. I wonder if anyone is reading this from a browser that supports neither WebP nor AVIF. Maybe Internet Explorer 11?
Next steps
This is where this particular series ends: I did everything I set out to do.
But I'm thinking about the future... now that salvage can run completely headless, and instead of shoving those large .jpg, .webp and .avif files in my Git repository... why not have it run as a service on the same machine?
And cache generated assets on some cloud storage somewhere? That sounds like fun! I'm sure it would make for a nice, short article.
Oh boy. Short, yes. We all know what that means...
That's all for me this year!
Until next time, thank you for reading, and have a happy new year 2022. I think we're all going into it with a little more humility than 2021, and that can only be good.
Take care!
This article is part 7 of the Don't shell out! series.