The rest of the fucking owl
From the series Don't shell out!

NO! No no no.

What?

WE WERE DONE!

Well... yes! But also no. We still shell out to a bunch of tools:

Shell session
$ rg 'Command::new'
src/commands/mod.rs
126:        let variant = if let Ok(output) = run_command(Command::new("wslpath").arg("-m").arg("/")) {

src/commands/cavif.rs
29:            Command::new("cavif")

src/commands/imagemagick.rs
25:            Command::new(&self.bin)

src/commands/cwebp.rs
25:            Command::new("cwebp")

src/commands/svgo.rs
25:            Command::new("svgo")
Cool bear's hot tip

rg is ripgrep: think grep, but wicked fast and respecting your .gitignore (and other ignore files) by default.

The good news? There's crates for all of that.

Let's start simple.

Optimizing SVG

svgo is great actually, but it's JavaScript. Which means the prerequisites are a decently modern node.js and npm install -g svgo.

The svgcleaner crate is a pure Rust alternative that gives decent results:

Shell session
$ svgo /tmp/export.svg -o /tmp/export.svgo.svg

export.svg:
Done in 139 ms!
90.658 KiB - 54.5% = 41.271 KiB

$ svgcleaner /tmp/export.svg /tmp/export.svgcleaner.svg
Your image is 39.70% smaller now.

$ ls -lhA /tmp/export.*
-rw-r--r--. 1 amos amos 3.0K Dec 31 18:50 /tmp/export.pdf
-rw-r--r--. 1 amos amos  91K Dec 31 20:38 /tmp/export.svg
-rw-r--r--. 1 amos amos  55K Dec 31 20:39 /tmp/export.svgcleaner.svg
-rw-r--r--. 1 amos amos  42K Dec 31 20:39 /tmp/export.svgo.svg

But svgcleaner has knobs! We can match svgo's output size if we're willing to compromise on coordinate precisions:

Shell session
$ svgcleaner --paths-coordinates-precision 3 --transforms-precision 3 /tmp/export.svg /tmp/export.svgcleaner-imprecise.svg
Your image is 53.12% smaller now.

$ ls -lhA /tmp/export.{svgo,svgcleaner-imprecise}.svg
-rw-r--r--. 1 amos amos 43K Dec 31 20:42 /tmp/export.svgcleaner-imprecise.svg
-rw-r--r--. 1 amos amos 42K Dec 31 20:39 /tmp/export.svgo.svg
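
To get a feel for why trimming precision saves bytes, here's a minimal sketch. It is not svgcleaner's actual writer: `format_coord` is a hypothetical helper that prints a coordinate with at most N decimals and drops trailing zeros.

```rust
/// Format `value` with at most `precision` decimals, dropping trailing zeros.
/// Hypothetical helper for illustration only; svgcleaner's real number
/// writer is more involved than this.
fn format_coord(value: f64, precision: usize) -> String {
    let s = format!("{:.*}", precision, value);
    if s.contains('.') {
        // "100.000" -> "100", "0.500" -> "0.5"
        s.trim_end_matches('0').trim_end_matches('.').to_string()
    } else {
        // No fractional part was printed, so there is nothing to trim.
        s
    }
}

fn main() {
    let coords = [12.345678, 100.000001, 0.5];
    for precision in [8, 3] {
        let parts: Vec<String> = coords
            .iter()
            .map(|c| format_coord(*c, precision))
            .collect();
        let joined = parts.join(" ");
        println!("precision {}: {} ({} bytes)", precision, joined, joined.len());
    }
}
```

Multiply those few saved bytes by every coordinate in every path, and the file shrinks noticeably.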

And I don't know about you, but I sure cannot tell the difference:

svgcleaner is a bin+lib package, so we can just.. bring it in!

Shell session
$ cargo add svgcleaner
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding svgcleaner v0.9.5 to dependencies
Rust code
// in `salvage/src/commands/svgcleaner.rs`

use color_eyre::eyre;
use std::{fmt::Display, path::Path};
use svgcleaner::{
    cleaner::{clean_doc, parse_data, write_buffer},
    CleaningOptions,
};

#[derive(Clone)]
pub struct SvgCleaner {}

impl SvgCleaner {
    pub fn new() -> Self {
        Self {}
    }

    pub fn optimize_svg(&self, input: &Path, output: &Path) -> Result<(), eyre::Error> {
        let data = std::fs::read_to_string(input)?;

        let parse_opts = Default::default();
        let mut doc = parse_data(&data, &parse_opts).map_err(fmt_err)?;

        let clean_opts = CleaningOptions {
            paths_coordinates_precision: 3,
            transforms_precision: 3,
            ..Default::default()
        };
        let write_opts = Default::default();
        clean_doc(&mut doc, &clean_opts, &write_opts).map_err(fmt_err)?;

        let mut buf = vec![];
        write_buffer(&doc, &write_opts, &mut buf);

        std::fs::write(output, buf)?;

        Ok(())
    }
}

fn fmt_err<E: Display>(e: E) -> eyre::Report {
    color_eyre::eyre::eyre!("{}", e)
}

Wait, files? What, why?

So it's a drop-in replacement for the svgo thingy. salvage was really only meant to be a glorified command runner, think make but with hardcoded rules and knowledge of WSL2 and stuff.

I definitely want to change the interfaces here: there's no reason why svgcleaner should read its input from disk, but that way it's trivial to change this:

Rust code
    #[tracing::instrument]
    fn process_drawio(&self, env: &Environment) -> Result<(), eyre::Error> {
        let tmp_dir = mktemp::Temp::new_dir()?;
        fs::create_dir_all(&tmp_dir)?;

        let pdf_path = tmp_dir.join("temp.pdf");
        let inksvg_path = tmp_dir.join("temp.inkscape.svg");
        let safesvg_path = tmp_dir.join("temp.optimized.svg");

        // TODO: keep a chrome instance (and http server) running?
        // TODO: don't spawn a tokio runtime every time here
        env.commands
            .drawio_headless
            .drawio_to_pdf(&self.input_path, &pdf_path)?;

        // TODO: keep it all in memory, don't write to disk :)
        env.commands.poppler.pdf_to_svg(&pdf_path, &inksvg_path)?;

        env.commands
            .svgo
            .get()?
            .optimize_svg(&inksvg_path, &safesvg_path)?;

        {
            let mut dst = BufWriter::new(File::create(self.output_path())?);
            let mut src = BufReader::new(File::open(&safesvg_path)?);
            std::io::copy(&mut src, &mut dst)?;
            dst.flush()?;
        }

        Ok(())
    }

To this:

Rust code
        // cut: drawio_headless & poppler invocation

        env.commands
            .svgcleaner
            .optimize_svg(&inksvg_path, &safesvg_path)?;

I can always refactor that later. Our only mission here is to just get rid of invocations!
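
For what it's worth, the two interfaces aren't mutually exclusive. Here's a rough sketch, assuming an in-memory transform of the shape `&[u8] -> Vec<u8>`: a path-based entry point can be a thin adapter on top of it. Both `transform_file` and `squeeze_whitespace` are made-up names, and the "optimizer" is just a stand-in.

```rust
use std::{fs, io, path::Path};

/// Hypothetical adapter: runs an in-memory transform over a file on disk.
fn transform_file<F>(input: &Path, output: &Path, transform: F) -> io::Result<()>
where
    F: FnOnce(&[u8]) -> io::Result<Vec<u8>>,
{
    let bytes = fs::read(input)?;
    let out = transform(&bytes)?;
    fs::write(output, out)
}

/// Stand-in for a real optimizer: collapses runs of ASCII whitespace
/// into a single space. It never touches the filesystem.
fn squeeze_whitespace(input: &[u8]) -> io::Result<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len());
    let mut prev_space = false;
    for &b in input {
        let is_space = b.is_ascii_whitespace();
        if is_space {
            if !prev_space {
                out.push(b' ');
            }
        } else {
            out.push(b);
        }
        prev_space = is_space;
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let input = dir.join("adapter-demo.in.svg");
    let output = dir.join("adapter-demo.out.svg");
    fs::write(&input, "<svg>\n    <g/>\n</svg>")?;

    transform_file(&input, &output, squeeze_whitespace)?;
    println!("{}", fs::read_to_string(&output)?);
    Ok(())
}
```

The nice property is that the in-memory function is trivially testable, and the file-based one only exists for callers that still think in paths.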

Replacing imagemagick, cavif and cwebp

The image crate provides PNG decoding, JPEG encoding, and AVIF encoding. Unfortunately, it doesn't do WebP encoding, but the webp crate does!

Let's bring those in, with only the features we need.

Shell session
$ cargo add image --no-default-features --features "png jpeg avif-encoder"
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding image v0.23.14 to dependencies with features: ["png", "jpeg", "avif-encoder"]

$ cargo add webp
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding webp v0.2.0 to dependencies

But this time, we'll change the interface a little bit.

Again, salvage is architected so that it can run command-line tools efficiently. So I chose kind of a weird infrastructure: it spins up a bunch of workers:

Rust code
    let mut handles = vec![];
    let num_workers = num_cpus::get();
    for _ in 0..num_workers {
        let rx = rx.clone();
        let env = env.clone();
        handles.push(std::thread::spawn(move || {
            while let Ok(transform) = rx.recv() {
                transform.process(&env).unwrap();
            }
        }));
    }

...and they can react to events:

Rust code
/// A transformation from an input file (e.g. `.drawio`) to
/// an output file (e.g. `.safe.svg`)
pub struct Transform {
    pub kind: TransformKind,
    pub workspace: Arc<Workspace>,
    pub input_path: PathBuf,
    pub output_ext: &'static str,
}

#[derive(Debug)]
pub struct Workspace {
    pub db: Arc<RwLock<Database>>,
    pub source_dir: PathBuf,
    pub output_dir: PathBuf,
}

#[derive(Debug)]
pub enum TransformKind {
    /// Render `.drawio` diagram to font-safe svg
    DrawIO,
    /// Convert an image to jpeg
    Jpeg,
    /// Convert an image to webp
    Webp,
    /// Convert an image to avif
    Avif,
}

...that are sent by this function:

Rust code
#[tracing::instrument]
fn process_dir(tx: Sender<Transform>, workspace: Arc<Workspace>) -> Result<(), eyre::Error> {
    let entries = fs::read_dir(&workspace.source_dir)?;
    for entry in entries {
        let entry = entry?;
        let input_path = entry.path();
        if let Some(ext) = input_path
            .extension()
            .map(|x| x.to_string_lossy().to_string())
        {
            match ext.as_ref() {
                "drawio" => {
                    tx.send(workspace.make_transform(TransformKind::DrawIO, input_path, "svg"))
                        .unwrap();
                }
                "png" => {
                    tx.send(workspace.make_transform(
                        TransformKind::Jpeg,
                        input_path.clone(),
                        "jpg",
                    ))
                    .unwrap();
                    tx.send(workspace.make_transform(
                        TransformKind::Webp,
                        input_path.clone(),
                        "webp",
                    ))
                    .unwrap();
                    tx.send(workspace.make_transform(TransformKind::Avif, input_path, "avif"))
                        .unwrap();
                }
                _ => { /* ignore */ }
            }
        }
    }
    Ok(())
}
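
If you want to play with that architecture in isolation, here's a self-contained sketch using only the standard library. The snippets above call `rx.clone()`, which std's `mpsc::Receiver` doesn't support, so this version shares the receiver behind an `Arc<Mutex<_>>` instead. `Kind`, `jobs_for_extension` and `run_pool` are hypothetical names, and the "work" is just counting jobs.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    DrawIO,
    Jpeg,
    Webp,
    Avif,
}

/// Map a file extension to the jobs it produces (same idea as the `match` above).
fn jobs_for_extension(ext: &str) -> Vec<Kind> {
    match ext {
        "drawio" => vec![Kind::DrawIO],
        "png" => vec![Kind::Jpeg, Kind::Webp, Kind::Avif],
        _ => vec![], // ignore everything else
    }
}

/// Run every job on a pool of `num_workers` threads; returns how many jobs ran.
fn run_pool(jobs: Vec<Kind>, num_workers: usize) -> usize {
    let (tx, rx) = mpsc::channel::<Kind>();
    // std's Receiver is not Clone, so the workers share it behind a mutex.
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..num_workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut done = 0usize;
                loop {
                    // The lock guard is dropped at the end of this statement,
                    // so the lock is held while receiving, not while working.
                    let job = rx.lock().unwrap().recv();
                    match job {
                        Ok(_kind) => done += 1, // real code would convert a file here
                        Err(_) => break,        // channel closed: no more jobs
                    }
                }
                done
            })
        })
        .collect();

    for job in jobs {
        tx.send(job).unwrap();
    }
    drop(tx); // closing the sender lets the workers leave their loops

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let mut jobs = jobs_for_extension("png");
    jobs.extend(jobs_for_extension("drawio"));
    println!("processed {} jobs", run_pool(jobs, 4));
}
```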

But honestly, we don't need all that noise anymore. So how are we going to refactor this?

First off, let's change things so that we have a single TransformKind for bitmaps:

Rust code
#[derive(Debug)]
pub enum TransformKind {
    /// Render `.drawio` diagram to font-safe svg
    DrawIO,
    /// Convert a PNG image to JPEG, WebP, and AVIF
    Bitmap,
}

We'll send a single TransformKind for pngs:

Rust code
            match ext.as_ref() {
                "drawio" => {
                    tx.send(workspace.make_transform(TransformKind::DrawIO, input_path, "svg"))
                        .unwrap();
                }
                "png" => {
                    tx.send(workspace.make_transform(
                        TransformKind::Bitmap,
                        input_path.clone(),
                        "jpg",
                    ))
                    .unwrap();
                }
                _ => { /* ignore */ }
            }

And, fuck it, we'll rewrite process_drawio too:

Rust code
    #[tracing::instrument]
    fn process_drawio(&self) -> Result<(), eyre::Error> {
        let input_bytes = std::fs::read(&self.input_path)?;

        let pdf_bytes = DrawioHeadless::drawio_to_pdf(input_bytes)?;
        let svg_bytes = Poppler::pdf_to_svg(&pdf_bytes[..])?;
        let optimized_svg_bytes = SvgCleaner::optimize_svg(&svg_bytes[..])?;

        info!(output_path = %self.output_path(), "writing optimized SVG");
        std::fs::write(self.output_path(), optimized_svg_bytes)?;
        Ok(())
    }

    #[tracing::instrument]
    fn process_bitmap(&self) -> Result<(), eyre::Error> {
        let img = image::load(
            BufReader::new(File::open(&self.input_path)?),
            ImageFormat::Png,
        )?;

        let out_path = self.output_path();
        let jpeg_path = out_path.with_extension("jpg");
        let webp_path = out_path.with_extension("webp");
        let avif_path = out_path.with_extension("avif");

        // JPEG
        info!(%jpeg_path, "writing JPEG");
        JpegEncoder::new_with_quality(&mut File::create(jpeg_path)?, 45).encode_image(&img)?;

        // WebP
        info!(%webp_path, "writing WebP");
        std::fs::write(
            webp_path,
            &webp::Encoder::from_image(&img)
                .map_err(|e| eyre!("webp error: {}", e))?
                .encode(75.0)[..],
        )?;

        // AVIF
        info!(%avif_path, "writing AVIF");
        AvifEncoder::new_with_speed_quality(File::create(avif_path)?, 4, 60).write_image(
            img.as_bytes(),
            img.width(),
            img.height(),
            img.color(),
        )?;

        Ok(())
    }

There! All in-memory and nice.

Wait! The JPEG/WebP/AVIF thing used to be parallel and now it's not, let's fix that:

Shell session
$ cargo add crossbeam-utils
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding crossbeam-utils v0.8.5 to dependencies
Rust code
    #[tracing::instrument]
    fn process_bitmap(&self) -> Result<(), eyre::Error> {
        let img = image::load(
            BufReader::new(File::open(&self.input_path)?),
            ImageFormat::Png,
        )?;

        let out_path = self.output_path();
        let jpeg_path = out_path.with_extension("jpg");
        let webp_path = out_path.with_extension("webp");
        let avif_path = out_path.with_extension("avif");

        crossbeam_utils::thread::scope(|s| {
            let jpeg = s.spawn(|_| {
                info!(%jpeg_path, "writing JPEG");
                JpegEncoder::new_with_quality(&mut File::create(jpeg_path)?, 45)
                    .encode_image(&img)?;
                Ok::<_, eyre::Report>(())
            });
            let webp = s.spawn(|_| {
                info!(%webp_path, "writing WebP");
                std::fs::write(
                    webp_path,
                    &webp::Encoder::from_image(&img)
                        .map_err(|e| eyre!("webp error: {}", e))?
                        .encode(75.0)[..],
                )?;
                Ok::<_, eyre::Report>(())
            });
            let avif =
                s.spawn(|_| {
                    info!(%avif_path, "writing AVIF");
                    AvifEncoder::new_with_speed_quality(File::create(avif_path)?, 6, 60)
                        .write_image(img.as_bytes(), img.width(), img.height(), img.color())?;
                    Ok::<_, eyre::Report>(())
                });

            jpeg.join().unwrap()?;
            webp.join().unwrap()?;
            avif.join().unwrap()?;
            Ok::<_, eyre::Report>(())
        })
        .unwrap()?;

        Ok(())
    }

There! Not the prettiest, but it'll do.
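
Side note: since Rust 1.63, the standard library has `std::thread::scope`, which covers the same pattern without an extra crate. A minimal sketch, with stand-in closures instead of the real encoders (`encode_all` is a hypothetical function, and each "encoder" just returns a made-up byte count):

```rust
use std::thread;

/// Run three stand-in "encoders" in parallel over the same borrowed input.
fn encode_all(pixels: &[u8]) -> Result<(usize, usize, usize), String> {
    thread::scope(|s| {
        // Scoped threads may borrow `pixels`: they are all joined
        // before `scope` returns.
        let jpeg = s.spawn(|| Ok::<_, String>(pixels.len() / 2));
        let webp = s.spawn(|| Ok::<_, String>(pixels.len() / 3));
        let avif = s.spawn(|| {
            if pixels.is_empty() {
                return Err("empty image".to_string());
            }
            Ok(pixels.len() / 4)
        });

        // `join` only returns Err if the thread panicked; the inner
        // Result carries the encoder's own error.
        let jpeg = jpeg.join().unwrap()?;
        let webp = webp.join().unwrap()?;
        let avif = avif.join().unwrap()?;
        Ok::<_, String>((jpeg, webp, avif))
    })
}

fn main() {
    let pixels = [0u8; 12];
    println!("{:?}", encode_all(&pixels));
}
```

Compared to the crossbeam version, the spawned closures take no argument, and `scope` returns the closure's value directly instead of wrapping it in another `Result`.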

Testing it all

salvage maintains a salvage-db.json file, like so:

JSON
{
  "input_files": {
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.png": "4ce11f85ddc447de",
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-use-tag.png": "a15ecb21b94b7e8e",
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-word.png": "f03d8d2673407a67",
    "/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/true-bold.png": "2f14729992449d6b"
  }
}

If we remove it, it'll simply re-process all those files.

Shell session
$ rm salvage-db.json && salvage .
2021-12-31T21:25:06.956729Z  INFO salvage: Workspace: /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets => /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets
2021-12-31T21:25:06.957218Z  INFO process: salvage: /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.png => /home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.jpg
2021-12-31T21:25:06.991646Z  INFO salvage: writing JPEG jpeg_path=/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.jpg
2021-12-31T21:25:06.991672Z  INFO salvage: writing WebP webp_path=/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.webp
2021-12-31T21:25:06.991753Z  INFO salvage: writing AVIF avif_path=/home/amos/bearcove/fasterthanli.me/content/series/dont-shell-out/part-2/assets/svg-letter.avif
2021-12-31T21:25:07.007929Z  INFO rav1e::api::config: CPU Feature Level: AVX2
2021-12-31T21:25:07.007934Z  INFO rav1e::api::config: CPU Feature Level: AVX2
2021-12-31T21:25:07.007979Z  INFO rav1e::api::internal: Using 56 tiles (7x8)
2021-12-31T21:25:07.007981Z  INFO rav1e::api::internal: Using 56 tiles (7x8)
(cut)
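
The "path to hash" idea behind salvage-db.json can be sketched in a few lines. Big caveat: salvage's actual hashing code isn't shown here, so this is an assumption-heavy illustration. `DefaultHasher` is a stand-in (its output isn't guaranteed to be stable across Rust releases), and `content_hash` and `needs_processing` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::Hasher;

/// Hash file contents into a 16-digit hex string.
/// Assumption: `DefaultHasher` is only a stand-in for whatever salvage uses.
fn content_hash(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    format!("{:016x}", hasher.finish())
}

/// Returns true (and records the new hash) when `path` is new
/// or its contents changed since the last run.
fn needs_processing(db: &mut BTreeMap<String, String>, path: &str, bytes: &[u8]) -> bool {
    let hash = content_hash(bytes);
    if db.get(path) == Some(&hash) {
        return false; // same contents as last time: skip
    }
    db.insert(path.to_string(), hash);
    true
}

fn main() {
    let mut db = BTreeMap::new();
    println!("first run:  {}", needs_processing(&mut db, "svg-letter.png", b"pixels"));
    println!("second run: {}", needs_processing(&mut db, "svg-letter.png", b"pixels"));

    // Deleting the database is the same as starting from an empty map.
    db.clear();
    println!("after rm:   {}", needs_processing(&mut db, "svg-letter.png", b"pixels"));
}
```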

Let's see what we got!

Shell session
$ ls -lhA svg-letter.*
-rw-r--r--. 1 amos amos 805K Dec 31 22:25 svg-letter.avif
-rw-r--r--. 1 amos amos 404K Dec 31 22:25 svg-letter.jpg
-rw-r--r--. 1 amos amos 505K Dec 27 09:27 svg-letter.png
-rw-r--r--. 1 amos amos 289K Dec 31 22:25 svg-letter.webp

Uhh the .avif is larger than I'd expect. And it was slower than I expected at speed=6 too, so I had to bring down the speed to 4. But what's even more annoying is that...

Shell session
$ avifdec svg-letter.avif /tmp/svg-letter.png
Decoding with AV1 codec 'dav1d' (1 worker thread), please wait...
ERROR: Failed to decode image: BMFF parsing failed
Diagnostics:
 * Box[meta] does not have a Box[hdlr] as its first child box

...it doesn't seem to be a valid AVIF file.

Let's see now...

Shell session
$ cargo tree -i ravif
ravif v0.6.4
└── image v0.23.14
    ├── salvage v1.4.0 (/home/amos/bearcove/salvage)
    └── webp v0.2.0
        └── salvage v1.4.0 (/home/amos/bearcove/salvage)

Ah! It's using an old ravif! The latest is 0.8.8. Let's see if it performs better? First we'll remove the avif-encoder feature of image, and then:

Shell session
$ cargo add ravif
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding ravif v0.8.8 to dependencies

$ cargo add rgb
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding rgb v0.8.31 to dependencies

And change our code:

Rust code
            let avif = s.spawn(|_| {
                info!(%avif_path, "writing AVIF");
                let config = ravif::Config {
                    quality: 50.0,
                    alpha_quality: 50.0,
                    speed: 4,
                    premultiplied_alpha: false,
                    color_space: ravif::ColorSpace::YCbCr,
                    threads: 0,
                };
                let img = img.to_rgba8();
                let img = Img::new(
                    img.as_bytes().as_rgba(),
                    img.width() as _,
                    img.height() as _,
                );
                let (avif_bytes, _, _) =
                    ravif::encode_rgba(img, &config).map_err(|e| eyre!("ravif error: {}", e))?;
                std::fs::write(&avif_path, &avif_bytes)?;

                info!(%avif_path, "writing AVIF... done!");
                Ok::<_, eyre::Report>(())
            });

I brought down the speed (which should give better results), and reduced the quality (which should give worse results).

Let's try again!

Shell session
$ rm salvage-db.json && salvage .
(cut)

$ ls -lhA svg-letter*
-rw-r--r--. 1 amos amos 278K Dec 31 22:37 svg-letter.avif
-rw-r--r--. 1 amos amos 404K Dec 31 22:37 svg-letter.jpg
-rw-r--r--. 1 amos amos 505K Dec 27 09:27 svg-letter.png
-rw-r--r--. 1 amos amos 289K Dec 31 22:37 svg-letter.webp

Better! Much better. "Smaller than webp" is what I aim for.

Shell session
$ avifdec svg-letter.avif /tmp/svg-letter.png
Decoding with AV1 codec 'dav1d' (1 worker thread), please wait...
Image decoded: svg-letter.avif
Image details:
 * Resolution     : 2108x1528
 * Bit Depth      : 8
 * Format         : YUV444
 * Alpha          : Not premultiplied
 * Range          : Full
 * Color Primaries: 1
 * Transfer Char. : 13
 * Matrix Coeffs. : 1
 * ICC Profile    : Absent (0 bytes)
 * XMP Metadata   : Absent (0 bytes)
 * EXIF Metadata  : Absent (0 bytes)
 * Transformations: None
Wrote PNG: /tmp/svg-letter.png

Okay, it's a legit AVIF file this time!

Let's compare those images:

The image above is a PNG, by the way.

futile (my website's software) only makes a <picture> tag with AVIF, WebP and JPEG sources when it encounters a Markdown image that ends in .jpg. I would never normally use .jpg, so that's the bat signal.
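
Roughly, that rule could look like the sketch below. This is not futile's actual code: `picture_tag` is a hypothetical helper and the real markup may well differ. The one thing it relies on is standard browser behavior: the first `<source>` with a supported type wins, and `<img>` is the fallback.

```rust
/// Build a `<picture>` element for an image whose path ends in `.jpg`.
/// Hypothetical helper; returns None for any other extension.
/// Note: `alt` is inserted as-is, real code would HTML-escape it.
fn picture_tag(src: &str, alt: &str) -> Option<String> {
    let stem = src.strip_suffix(".jpg")?;
    Some(format!(
        "<picture>\
         <source type=\"image/avif\" srcset=\"{stem}.avif\">\
         <source type=\"image/webp\" srcset=\"{stem}.webp\">\
         <img src=\"{stem}.jpg\" alt=\"{alt}\">\
         </picture>",
        stem = stem,
        alt = alt
    ))
}

fn main() {
    match picture_tag("svg-letter.jpg", "A letter drawn in SVG") {
        Some(html) => println!("{}", html),
        None => println!("not a .jpg, emit a plain <img> instead"),
    }
}
```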

As expected, JPEG is by far the worst: with no transparency and awful artifacts. I wonder if anyone is reading this from a browser that supports neither WebP nor AVIF. Maybe Internet Explorer 11?

Next steps

This is where this particular series ends: I did everything I set out to do.

But I'm thinking about the future... now that salvage can run completely headless, and instead of shoving those large .jpg, .webp and .avif files in my Git repository... why not have it run as a service on the same machine?

And cache generated assets on some cloud storage somewhere? That sounds like fun! I'm sure it would make for a nice, short article.

Oh boy. Short, yes. We all know what that means...

That's all for me this year!

Until next time, thank you for reading, and have a happy new year 2022. I think we're all going into it with a little more humility than 2021, and that can only be good.

Take care!

This article is part 7 of the Don't shell out! series.
