The rest of the fucking owl
NO! No no no.
What?
WE WERE DONE!
Well... yes! But also no. We still shell out to a bunch of tools:
$ rg 'Command::new'
src/commands/mod.rs
126: let variant = if let Ok(output) = run_command(Command::new("wslpath").arg("-m").arg("/")) {
src/commands/cavif.rs
29: Command::new("cavif")
src/commands/imagemagick.rs
25: Command::new(&self.bin)
src/commands/cwebp.rs
25: Command::new("cwebp")
src/commands/svgo.rs
25: Command::new("svgo")
Cool bear's hot tip
rg is ripgrep: think grep, but wicked fast and respecting your .gitignore (and other ignore files) by default.
The good news? There's crates for all of that.
Let's start simple.
Optimizing SVG
svgo is great actually, but it's JavaScript.
Which means the prerequisites are a decently modern node.js and npm install -g svgo.
The svgcleaner crate is a pure Rust alternative that gives decent results:
$ svgo /tmp/export.svg -o /tmp/export.svgo.svg
export.svg:
Done in 139 ms!
90.658 KiB - 54.5% = 41.271 KiB
$ svgcleaner /tmp/export.svg /tmp/export.svgcleaner.svg
Your image is 39.70% smaller now.
$ ls -lhA /tmp/export.*
-rw-r--r--. 1 amos amos 3.0K Dec 31 18:50 /tmp/export.pdf
-rw-r--r--. 1 amos amos 91K Dec 31 20:38 /tmp/export.svg
-rw-r--r--. 1 amos amos 55K Dec 31 20:39 /tmp/export.svgcleaner.svg
-rw-r--r--. 1 amos amos 42K Dec 31 20:39 /tmp/export.svgo.svg
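The percentages both tools print are plain relative size reductions. A minimal sketch of that arithmetic, using the KiB figures svgo reported above (the function name is made up for illustration):

```rust
/// Percentage by which `optimized` is smaller than `original`.
fn percent_smaller(original: f64, optimized: f64) -> f64 {
    (1.0 - optimized / original) * 100.0
}

fn main() {
    // Sizes in KiB, taken from the svgo output above.
    let svgo = percent_smaller(90.658, 41.271);
    println!("svgo: {:.1}% smaller", svgo); // prints "svgo: 54.5% smaller"
}
```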
But svgcleaner has knobs! We can match svgo's output size if we're willing to compromise on coordinate precisions:
$ svgcleaner --paths-coordinates-precision 3 --transforms-precision 3 /tmp/export.svg /tmp/export.svgcleaner-imprecise.svg
Your image is 53.12% smaller now.
$ ls -lhA /tmp/export.{svgo,svgcleaner-imprecise}.svg
-rw-r--r--. 1 amos amos 43K Dec 31 20:42 /tmp/export.svgcleaner-imprecise.svg
-rw-r--r--. 1 amos amos 42K Dec 31 20:39 /tmp/export.svgo.svg
And I don't know about you, but I sure cannot tell the difference:
svgcleaner is a bin+lib package, so we can just.. bring it in!
$ cargo add svgcleaner
Updating 'https://github.com/rust-lang/crates.io-index' index
Adding svgcleaner v0.9.5 to dependencies
// in `salvage/src/commands/svgcleaner.rs`
use color_eyre::eyre;
use std::{fmt::Display, path::Path};
use svgcleaner::{
    cleaner::{clean_doc, parse_data, write_buffer},
    CleaningOptions,
};

#[derive(Clone)]
pub struct SvgCleaner {}

impl SvgCleaner {
    pub fn new() -> Self {
        Self {}
    }

    pub fn optimize_svg(&self, input: &Path, output: &Path) -> Result<(), eyre::Error> {
        let data = std::fs::read_to_string(input)?;

        let parse_opts = Default::default();
        let mut doc = parse_data(&data, &parse_opts).map_err(fmt_err)?;

        let clean_opts = CleaningOptions {
            paths_coordinates_precision: 3,
            transforms_precision: 3,
            ..Default::default()
        };
        let write_opts = Default::default();
        clean_doc(&mut doc, &clean_opts, &write_opts).map_err(fmt_err)?;

        let mut buf = vec![];
        write_buffer(&doc, &write_opts, &mut buf);
        std::fs::write(output, buf)?;
        Ok(())
    }
}

fn fmt_err<E: Display>(e: E) -> eyre::Report {
    color_eyre::eyre::eyre!("{}", e)
}
Wait, files? What, why?
So it's a drop-in replacement for the svgo thingy. salvage was really only meant to be a glorified command runner, think make but with hardcoded rules and knowledge of WSL2 and stuff.
I definitely want to change the interfaces here: there's no reason why svgcleaner should read its input from disk, but that way it's trivial to change this:
#[tracing::instrument]
fn process_drawio(&self, env: &Environment) -> Result<(), eyre::Error> {
    let tmp_dir = mktemp::Temp::new_dir()?;
    fs::create_dir_all(&tmp_dir)?;

    let pdf_path = tmp_dir.join("temp.pdf");
    let inksvg_path = tmp_dir.join("temp.inkscape.svg");
    let safesvg_path = tmp_dir.join("temp.optimized.svg");

    // TODO: keep a chrome instance (and http server) running?
    // TODO: don't spawn a tokio runtime every time here
    env.commands
        .drawio_headless
        .drawio_to_pdf(&self.input_path, &pdf_path)?;

    // TODO: keep it all in memory, don't write to disk :)
    env.commands.poppler.pdf_to_svg(&pdf_path, &inksvg_path)?;
    env.commands
        .svgo
        .get()?
        .optimize_svg(&inksvg_path, &safesvg_path)?;

    {
        let mut dst = BufWriter::new(File::create(self.output_path())?);
        let mut src = BufReader::new(File::open(&safesvg_path)?);
        std::io::copy(&mut src, &mut dst)?;
        dst.flush()?;
    }

    Ok(())
}
To this:
// cut: drawio_headless & poppler invocation
env.commands
    .svgcleaner
    .optimize_svg(&inksvg_path, &safesvg_path)?;
I can always refactor that later. Our only mission here is to just get rid of invocations!
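For the "later" part, one possible shape is a bytes-in, bytes-out core with a thin path-based wrapper on top. This is only a sketch: `optimize_bytes` and `optimize_file` are hypothetical names, and the "optimizer" is a trivial stand-in (it trims lines and drops blank ones) so the example stays std-only.

```rust
use std::{fs, io, path::Path};

/// Stand-in for a real optimizer: trims each line and drops blank ones.
/// (Hypothetical; svgcleaner does far more than this.)
fn optimize_bytes(input: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(input)
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join("\n")
        .into_bytes()
}

/// Path-based wrapper, kept for callers that still work with files.
fn optimize_file(input: &Path, output: &Path) -> io::Result<()> {
    let data = fs::read(input)?;
    fs::write(output, optimize_bytes(&data))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let input = dir.join("salvage-sketch-in.svg");
    let output = dir.join("salvage-sketch-out.svg");

    fs::write(&input, "<svg>\n\n  <g/>\n</svg>\n")?;
    optimize_file(&input, &output)?;
    println!("{}", fs::read_to_string(&output)?);
    Ok(())
}
```

The in-memory function is the one that gets tested and composed; the file wrapper only exists so old call sites keep working.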
Replacing imagemagick, cavif and cwebp
The image crate provides PNG decoding, JPEG encoding, and AVIF encoding. Unfortunately, it doesn't do WebP encoding, but the webp crate does!
Let's bring those in, with only the features we need.
$ cargo add image --no-default-features --features "png jpeg avif-encoder"
Updating 'https://github.com/rust-lang/crates.io-index' index
Adding image v0.23.14 to dependencies with features: ["png", "jpeg", "avif-encoder"]
$ cargo add webp
Updating 'https://github.com/rust-lang/crates.io-index' index
Adding webp v0.2.0 to dependencies
But this time, we'll change the interface a little bit.
Again, salvage is architected so that it can run command-line tools efficiently. So I chose kind of a weird infrastructure: it spins up a bunch of workers:
let mut handles = vec![];
let num_workers = num_cpus::get();
for _ in 0..num_workers {
    let rx = rx.clone();
    let env = env.clone();
    handles.push(std::thread::spawn(move || {
        while let Ok(transform) = rx.recv() {
            transform.process(&env).unwrap();
        }
    }));
}
...and they can react to events:
/// A transformation from an input file (e.g. `.drawio`) to
/// an output file (e.g. `.safe.svg`)
pub struct Transform {
    pub kind: TransformKind,
    pub workspace: Arc<Workspace>,
    pub input_path: PathBuf,
    pub output_ext: &'static str,
}

#[derive(Debug)]
pub struct Workspace {
    pub db: Arc<RwLock<Database>>,
    pub source_dir: PathBuf,
    pub output_dir: PathBuf,
}

#[derive(Debug)]
pub enum TransformKind {
    /// Render `.drawio` diagram to font-safe svg
    DrawIO,
    /// Convert an image to jpeg
    Jpeg,
    /// Convert an image to webp
    Webp,
    /// Convert an image to avif
    Avif,
}
...that are sent by this function:
#[tracing::instrument]
fn process_dir(tx: Sender<Transform>, workspace: Arc<Workspace>) -> Result<(), eyre::Error> {
    let entries = fs::read_dir(&workspace.source_dir)?;
    for entry in entries {
        let entry = entry?;
        let input_path = entry.path();
        if let Some(ext) = input_path
            .extension()
            .map(|x| x.to_string_lossy().to_string())
        {
            match ext.as_ref() {
                "drawio" => {
                    tx.send(workspace.make_transform(TransformKind::DrawIO, input_path, "svg"))
                        .unwrap();
                }
                "png" => {
                    tx.send(workspace.make_transform(
                        TransformKind::Jpeg,
                        input_path.clone(),
                        "jpg",
                    ))
                    .unwrap();
                    tx.send(workspace.make_transform(
                        TransformKind::Webp,
                        input_path.clone(),
                        "webp",
                    ))
                    .unwrap();
                    tx.send(workspace.make_transform(TransformKind::Avif, input_path, "avif"))
                        .unwrap();
                }
                _ => { /* ignore */ }
            }
        }
    }
    Ok(())
}
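To see the whole worker setup in one runnable piece, here is a std-only sketch. The snippets above clone `rx`, which std's `mpsc::Receiver` doesn't allow, so salvage presumably uses a multi-consumer channel crate (that's an assumption on my part); this version shares the receiver behind a `Mutex` instead. `Job` and `run_pool` are hypothetical stand-ins for `Transform` and the real setup code.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for salvage's `Transform`.
struct Job(u32);

impl Job {
    fn process(&self) -> u32 {
        self.0 * 2
    }
}

/// Runs every job on a pool of worker threads and returns the sum of results.
fn run_pool(jobs: Vec<Job>) -> u32 {
    let (tx, rx) = mpsc::channel::<Job>();
    // std's Receiver is not Clone, so the workers share it behind a mutex.
    let rx = Arc::new(Mutex::new(rx));
    let num_workers = thread::available_parallelism().map_or(1, |n| n.get());

    let mut handles = vec![];
    for _ in 0..num_workers {
        let rx = Arc::clone(&rx);
        handles.push(thread::spawn(move || {
            let mut total = 0u32;
            loop {
                // The lock is released at the end of this statement,
                // so it is not held while the job is being processed.
                let job = rx.lock().unwrap().recv();
                match job {
                    Ok(job) => total += job.process(),
                    Err(_) => break, // channel closed: no more work
                }
            }
            total
        }));
    }

    for job in jobs {
        tx.send(job).unwrap();
    }
    drop(tx); // closing the channel lets the workers leave their loop

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let jobs = (1..=10).map(Job).collect();
    println!("total = {}", run_pool(jobs)); // prints "total = 110"
}
```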
But honestly, we don't need all that noise anymore. So how are we going to refactor this?
First off, let's change things so that we have a single TransformKind for bitmaps:
#[derive(Debug)]
pub enum TransformKind {
    /// Render `.drawio` diagram to font-safe svg
    DrawIO,
    /// Convert a PNG image to JPEG, WebP, and AVIF
    Bitmap,
}
We'll send a single TransformKind for pngs:
match ext.as_ref() {
    "drawio" => {
        tx.send(workspace.make_transform(TransformKind::DrawIO, input_path, "svg"))
            .unwrap();
    }
    "png" => {
        tx.send(workspace.make_transform(
            TransformKind::Bitmap,
            input_path.clone(),
            "jpg",
        ))
        .unwrap();
    }
    _ => { /* ignore */ }
}
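Pulled out of the channel plumbing, the extension-to-transform mapping is small enough to test on its own. A sketch, with `kind_for` as a hypothetical helper name; unlike the code above it uses `to_str()` rather than `to_string_lossy()`, so a non-UTF-8 extension is simply ignored.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum TransformKind {
    /// Render `.drawio` diagram to font-safe svg
    DrawIO,
    /// Convert a PNG image to JPEG, WebP, and AVIF
    Bitmap,
}

/// Maps an input path to the transform it should trigger, if any.
fn kind_for(path: &Path) -> Option<TransformKind> {
    match path.extension()?.to_str()? {
        "drawio" => Some(TransformKind::DrawIO),
        "png" => Some(TransformKind::Bitmap),
        _ => None, // ignore everything else
    }
}

fn main() {
    for name in ["diagram.drawio", "photo.png", "notes.md", "Makefile"] {
        println!("{}: {:?}", name, kind_for(Path::new(name)));
    }
}
```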
And, fuck it, we'll rewrite process_drawio too:
#[tracing::instrument]
fn process_drawio(&self) -> Result<(), eyre::Error> {
    let input_bytes = std::fs::read(&self.input_path)?;
    let pdf_bytes = DrawioHeadless::drawio_to_pdf(input_bytes)?;
    let svg_bytes = Poppler::pdf_to_svg(&pdf_bytes[..])?;
    let optimized_svg_bytes = SvgCleaner::optimize_svg(&svg_bytes[..])?;

    info!(output_path = %self.output_path(), "writing optimized SVG");
    std::fs::write(self.output_path(), optimized_svg_bytes)?;

    Ok(())
}

#[tracing::instrument]
fn process_bitmap(&self) -> Result<(), eyre::Error> {
    let img = image::load(
        BufReader::new(File::open(&self.input_path)?),
        ImageFormat::Png,
    )?;

    let out_path = self.output_path();
    let jpeg_path = out_path.with_extension("jpg");
    let webp_path = out_path.with_extension("webp");
    let avif_path = out_path.with_extension("avif");

    // JPEG
    info!(%jpeg_path, "writing JPEG");
    JpegEncoder::new_with_quality(&mut File::create(jpeg_path)?, 45).encode_image(&img)?;

    // WebP
    info!(%webp_path, "writing WebP");
    std::fs::write(
        webp_path,
        &webp::Encoder::from_image(&img)
            .map_err(|e| eyre!("webp error: {}", e))?
            .encode(75.0)[..],
    )?;

    // AVIF
    info!(%avif_path, "writing AVIF");
    AvifEncoder::new_with_speed_quality(File::create(avif_path)?, 4, 60).write_image(
        img.as_bytes(),
        img.width(),
        img.height(),
        img.color(),
    )?;

    Ok(())
}
There! All in-memory and nice.
Wait! The JPEG/WebP/AVIF thing used to be parallel and now it's not, let's fix that:
$ cargo add crossbeam-utils
Updating 'https://github.com/rust-lang/crates.io-index' index
Adding crossbeam-utils v0.8.5 to dependencies
#[tracing::instrument]
fn process_bitmap(&self) -> Result<(), eyre::Error> {
    let img = image::load(
        BufReader::new(File::open(&self.input_path)?),
        ImageFormat::Png,
    )?;

    let out_path = self.output_path();
    let jpeg_path = out_path.with_extension("jpg");
    let webp_path = out_path.with_extension("webp");
    let avif_path = out_path.with_extension("avif");

    crossbeam_utils::thread::scope(|s| {
        let jpeg = s.spawn(|_| {
            info!(%jpeg_path, "writing JPEG");
            JpegEncoder::new_with_quality(&mut File::create(jpeg_path)?, 45)
                .encode_image(&img)?;
            Ok::<_, eyre::Report>(())
        });

        let webp = s.spawn(|_| {
            info!(%webp_path, "writing WebP");
            std::fs::write(
                webp_path,
                &webp::Encoder::from_image(&img)
                    .map_err(|e| eyre!("webp error: {}", e))?
                    .encode(75.0)[..],
            )?;
            Ok::<_, eyre::Report>(())
        });

        let avif =
            s.spawn(|_| {
                info!(%avif_path, "writing AVIF");
                AvifEncoder::new_with_speed_quality(File::create(avif_path)?, 6, 60)
                    .write_image(img.as_bytes(), img.width(), img.height(), img.color())?;
                Ok::<_, eyre::Report>(())
            });

        jpeg.join().unwrap()?;
        webp.join().unwrap()?;
        avif.join().unwrap()?;
        Ok::<_, eyre::Report>(())
    })
    .unwrap()?;

    Ok(())
}
There! Not the prettiest, but it'll do.
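For readers on a newer toolchain: since Rust 1.63 the standard library ships its own scoped threads, so the same borrow-the-image-from-three-threads pattern works without an extra crate. A minimal std-only sketch, where the "encoders" are trivial stand-ins that each compute a number from the shared pixel buffer:

```rust
use std::thread;

/// Runs three stand-in "encoders" in parallel over the same borrowed pixels.
fn encode_all(pixels: &[u8]) -> (usize, u32, u8) {
    thread::scope(|s| {
        // Each closure borrows `pixels`; no Arc or cloning needed.
        let len = s.spawn(|| pixels.len());
        let sum = s.spawn(|| pixels.iter().map(|&p| u32::from(p)).sum::<u32>());
        let max = s.spawn(|| pixels.iter().copied().max().unwrap_or(0));

        // All threads are guaranteed to finish before `scope` returns.
        (len.join().unwrap(), sum.join().unwrap(), max.join().unwrap())
    })
}

fn main() {
    let pixels = [1u8, 2, 3, 250];
    println!("{:?}", encode_all(&pixels)); // prints "(4, 256, 250)"
}
```

Two differences from the crossbeam version: the spawned closures take no argument, and `thread::scope` returns the closure's value directly rather than wrapping it in a `Result`.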
Testing it all
salvage maintains a salvage-db.json file, like so:
{
  "input_files": {
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.png": "4ce11f85ddc447de",
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-use-tag.png": "a15ecb21b94b7e8e",
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-word.png": "f03d8d2673407a67",
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/true-bold.png": "2f14729992449d6b"
  }
}
If we remove it, it'll simply re-process all those files.
$ rm salvage-db.json && salvage .
2021-12-31T21:25:06.956729Z INFO salvage: Workspace: /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets => /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets
2021-12-31T21:25:06.957218Z INFO process: salvage: /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.png => /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.jpg
2021-12-31T21:25:06.991646Z INFO salvage: writing JPEG jpeg_path=/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.jpg
2021-12-31T21:25:06.991672Z INFO salvage: writing WebP webp_path=/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.webp
2021-12-31T21:25:06.991753Z INFO salvage: writing AVIF avif_path=/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.avif
2021-12-31T21:25:07.007929Z INFO rav1e::api::config: CPU Feature Level: AVX2
2021-12-31T21:25:07.007934Z INFO rav1e::api::config: CPU Feature Level: AVX2
2021-12-31T21:25:07.007979Z INFO rav1e::api::internal: Using 56 tiles (7x8)
2021-12-31T21:25:07.007981Z INFO rav1e::api::internal: Using 56 tiles (7x8)
(cut)
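The skip-unless-changed behaviour implied by salvage-db.json can be sketched as a map from input path to content hash. Everything here is illustrative: the `Database` shape mirrors the JSON above, but the hasher is an assumption (the article doesn't say which one salvage uses; `DefaultHasher` just happens to produce a 64-bit value that prints as 16 hex digits), and persistence to disk is left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory database: input path -> content hash.
#[derive(Default)]
struct Database {
    input_files: BTreeMap<String, String>,
}

/// 64-bit hash rendered as 16 hex digits. The choice of hasher is an
/// assumption made for this sketch.
fn content_hash(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

impl Database {
    /// Returns true if the file is new or changed, and records its hash.
    fn needs_processing(&mut self, path: &str, contents: &[u8]) -> bool {
        let hash = content_hash(contents);
        if self.input_files.get(path) == Some(&hash) {
            return false;
        }
        self.input_files.insert(path.to_string(), hash);
        true
    }
}

fn main() {
    let mut db = Database::default();
    println!("{}", db.needs_processing("svg-letter.png", b"v1")); // new file
    println!("{}", db.needs_processing("svg-letter.png", b"v1")); // unchanged
    println!("{}", db.needs_processing("svg-letter.png", b"v2")); // edited
}
```

Starting from an empty `Database` is the equivalent of deleting the JSON file: every input looks new, so everything gets processed again.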
Let's see what we got!
$ ls -lhA svg-letter.*
-rw-r--r--. 1 amos amos 805K Dec 31 22:25 svg-letter.avif
-rw-r--r--. 1 amos amos 404K Dec 31 22:25 svg-letter.jpg
-rw-r--r--. 1 amos amos 505K Dec 27 09:27 svg-letter.png
-rw-r--r--. 1 amos amos 289K Dec 31 22:25 svg-letter.webp
Uhh the .avif is larger than I'd expect. And it was slower than I expected at speed=6 too, so I had to bring down the speed to 4. But what's even more annoying is that...
$ avifdec svg-letter.avif /tmp/svg-letter.png
Decoding with AV1 codec 'dav1d' (1 worker thread), please wait...
ERROR: Failed to decode image: BMFF parsing failed
Diagnostics:
* Box[meta] does not have a Box[hdlr] as its first child box
...it doesn't seem to be a valid AVIF file.
Let's see now...
$ cargo tree -i ravif
ravif v0.6.4
└── image v0.23.14
    ├── salvage v1.4.0 (/home/amos/bearcove/salvage)
    └── webp v0.2.0
        └── salvage v1.4.0 (/home/amos/bearcove/salvage)
Ah! It's using an old ravif! The latest is 0.8.8. Let's see if it performs better? First we'll remove the avif-encoder feature of image, and then:
$ cargo add ravif
Updating 'https://github.com/rust-lang/crates.io-index' index
Adding ravif v0.8.8 to dependencies
$ cargo add rgb
Updating 'https://github.com/rust-lang/crates.io-index' index
Adding rgb v0.8.31 to dependencies
And change our code:
let avif = s.spawn(|_| {
    info!(%avif_path, "writing AVIF");
    let config = ravif::Config {
        quality: 50.0,
        alpha_quality: 50.0,
        speed: 4,
        premultiplied_alpha: false,
        color_space: ravif::ColorSpace::YCbCr,
        threads: 0,
    };
    let img = img.to_rgba8();
    // Note: `as_rgba()` is provided by the `rgb::FromSlice` trait,
    // which must be in scope for this to compile.
    let img = Img::new(
        img.as_bytes().as_rgba(),
        img.width() as _,
        img.height() as _,
    );
    let (avif_bytes, _, _) =
        ravif::encode_rgba(img, &config).map_err(|e| eyre!("ravif error: {}", e))?;
    std::fs::write(&avif_path, &avif_bytes)?;
    info!(%avif_path, "writing AVIF... done!");
    Ok::<_, eyre::Report>(())
});
I brought down the speed (which should give better results), and reduced the quality (which should give worse results).
Let's try again!
$ rm salvage-db.json && salvage .
(cut)
$ ls -lhA svg-letter*
-rw-r--r--. 1 amos amos 278K Dec 31 22:37 svg-letter.avif
-rw-r--r--. 1 amos amos 404K Dec 31 22:37 svg-letter.jpg
-rw-r--r--. 1 amos amos 505K Dec 27 09:27 svg-letter.png
-rw-r--r--. 1 amos amos 289K Dec 31 22:37 svg-letter.webp
Better! Much better. "Smaller than webp" is what I aim for.
$ avifdec svg-letter.avif /tmp/svg-letter.png
Decoding with AV1 codec 'dav1d' (1 worker thread), please wait...
Image decoded: svg-letter.avif
Image details:
* Resolution : 2108x1528
* Bit Depth : 8
* Format : YUV444
* Alpha : Not premultiplied
* Range : Full
* Color Primaries: 1
* Transfer Char. : 13
* Matrix Coeffs. : 1
* ICC Profile : Absent (0 bytes)
* XMP Metadata : Absent (0 bytes)
* EXIF Metadata : Absent (0 bytes)
* Transformations: None
Wrote PNG: /tmp/svg-letter.png
Okay, it's a legit AVIF file this time!
Let's compare those images:
The image above is a PNG, by the way.
futile (my website's software) only makes a <picture> tag with AVIF, WebP and JPEG sources when it encounters a Markdown image that ends in .jpg. I would never normally use .jpg, so that's the bat signal.
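The rule described above could look something like this. It's a guess at the shape, not futile's actual code: `picture_tag` is a hypothetical function, the exact markup futile emits isn't shown in this article, and `alt` is assumed to be HTML-escaped by the caller. Browsers pick the first `<source>` whose type they support, which is why AVIF comes first and the JPEG sits in the fallback `<img>`.

```rust
/// Builds a <picture> element for an image path ending in `.jpg`.
/// Returns None for any other extension (no bat signal, no <picture>).
/// Assumes `alt` has already been HTML-escaped by the caller.
fn picture_tag(src: &str, alt: &str) -> Option<String> {
    let stem = src.strip_suffix(".jpg")?;

    let mut html = String::from("<picture>");
    html.push_str(&format!("<source srcset=\"{}.avif\" type=\"image/avif\">", stem));
    html.push_str(&format!("<source srcset=\"{}.webp\" type=\"image/webp\">", stem));
    html.push_str(&format!("<img src=\"{}.jpg\" alt=\"{}\">", stem, alt));
    html.push_str("</picture>");
    Some(html)
}

fn main() {
    println!("{:?}", picture_tag("assets/svg-letter.jpg", "An SVG letter"));
    println!("{:?}", picture_tag("assets/svg-letter.png", "An SVG letter"));
}
```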
As expected, JPEG is by far the worst: with no transparency and awful artifacts. I wonder if anyone is reading this from a browser that supports neither WebP nor AVIF. Maybe Internet Explorer 11?
Next steps
This is where this particular series ends: I did everything I set out to do.
But I'm thinking about the future... now that salvage can run completely headless, and instead of shoving those large .jpg, .webp and .avif files in my Git repository... why not have it run as a service on the same machine?
And cache generated assets on some cloud storage somewhere? That sounds like fun! I'm sure it would make for a nice, short article.
Oh boy. Short, yes. We all know what that means...
That's all for me this year!
Until next time, thank you for reading, and have a happy new year 2022. I think we're all going into it with a little more humility than 2021, and that can only be good.
Take care!