One funny way to bundle assets

Bear

There's one thing that bothers me. In part 1, why are we using hyper-staticfile? Couldn't we just use file:/// URLs?

Well, first off: showing off how easy it is to serve some static files, even in a "scary" language like Rust, is just not something I could pass up.

But also: think about distributing salvage as a tool. Will we want to distribute all those HTML/CSS/JS/font files alongside it?

Bear

Mhh, no, we'd probably want to bake them into the executable.

Right! So it's a single, self-contained application. And if those assets are in there, would we want to extract them to disk? When we need them?

Bear

I could see that getting messy... what if there are several instances of salvage running?

Exactly! They might try to write to the same path and we'd end up with corrupted files, or we'd have them generate unique paths and then we'd have a bunch of stuff on disk that might never get cleaned up.

Bear

Mh. And there's also the fact that... the code we showed only ever used a single .drawio file that was already on disk.

Right! Multiple requests to convert .drawio files to .pdf might be in-flight at any time, so our tool has to be resilient against that.

Bear

So what did we end up doing again?

I don't believe I've explained it yet.

Bear

Bonus part?

Bonus part!

One funny way to bundle assets

Here's the basic problem: you have a directory structure like this one:

    src/
        main.rs
    drawio-assets/
        index.html

And you don't want to have to distribute salvage.exe (or just salvage on non-Windows platforms) with all those files. You want them bundled in the binary.

The way I went about it (and it's not the only way) is to make a build script!

TOML markup
# in `salvage/Cargo.toml`

[build-dependencies]
walkdir = "2.3.2"
zip = { version = "0.5.13", default-features = false, features = ["deflate", "time"] }
Rust code
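// in `salvage/build.rs` (cargo's default build-script path)
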
use std::{env, fs::File, path::PathBuf};

use walkdir::WalkDir;
use zip::ZipWriter;

fn main() {
    bundle_drawio_assets();
    // cut: other steps
}

fn bundle_drawio_assets() {
    let manifest_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
    let drawio_assets_dir = manifest_dir.join("drawio-assets");
    println!("cargo:rerun-if-changed={}", drawio_assets_dir.display());
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    let bundle_path = out_dir.join("drawio-assets.zip");
    let mut zw = ZipWriter::new(File::create(&bundle_path).unwrap());
    for entry in WalkDir::new(&drawio_assets_dir) {
        let entry = entry.unwrap();
        let disk_path = entry.path();
        let rel_path = disk_path
            .strip_prefix(&drawio_assets_dir)
            .unwrap()
            .to_string_lossy();
        let meta = entry.metadata().unwrap();

        if meta.is_dir() {
            zw.add_directory(rel_path, Default::default()).unwrap();
        } else if meta.is_file() {
            zw.start_file(rel_path, Default::default()).unwrap();
            std::io::copy(&mut File::open(disk_path).unwrap(), &mut zw).unwrap();
        } else {
            println!("cargo:warning=Ignoring entry {}", disk_path.display());
        }
    }

    // finish the archive explicitly, so any error writing the central
    // directory fails the build instead of being swallowed on drop
    zw.finish().unwrap();
}

It's pretty straightforward! First, we find the drawio-assets directory, based on the value of the CARGO_MANIFEST_DIR environment variable that's passed to all cargo build scripts.

Then, we determine the path we should write our zip file to, somewhere underneath OUT_DIR - which is also standard for cargo build scripts. We make sure to instruct cargo to re-run this script if anything in the drawio-assets/ directory changes, by simply printing cargo:rerun-if-changed=the-path.

Bear

Does this do what you'd expect for directories?

Amos

It does! It's fantastic.

Bear

And then what do we do at run-time?

As discussed, we could straight up extract them. But we don't need to!

First let's discuss the actual bundling: it's simply a matter of calling the include_bytes! macro.

Rust code
// in `salvage/src/processors/drawio.rs`

use std::{
    convert::Infallible, ffi::OsString, future::Future, io::Cursor, net::SocketAddr, pin::Pin,
    sync::Arc,
};

use camino::Utf8Path;
use color_eyre::{
    eyre::{self, eyre},
    Report,
};
use headless_chrome::{protocol::page::PrintToPdfOptions, Browser, LaunchOptionsBuilder};
use hyper::{
    header,
    service::{make_service_fn, Service},
    Body, Request, Response, Server, StatusCode,
};
use tokio::task::spawn_blocking;
use tracing::{debug, warn};
use zip::ZipArchive;

const ZIP_DATA: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/drawio-assets.zip"));
Amos

I've shown all the use directives here as a little appetizer.
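
As a quick aside (this test isn't in salvage, it's just to illustrate): since ZIP_DATA is a plain byte slice, a small unit test can check that the embedded archive actually parses:

Rust code
// hypothetical sanity check, not part of the original code
#[cfg(test)]
mod tests {
    use std::io::Cursor;

    use zip::ZipArchive;

    use super::ZIP_DATA;

    #[test]
    fn embedded_zip_is_readable() {
        // `ZipArchive::new` reads the central directory, so this fails loudly
        // if the build script ever produces a truncated or corrupt bundle
        let archive = ZipArchive::new(Cursor::new(ZIP_DATA)).unwrap();
        assert!(archive.len() > 0, "expected at least one bundled asset");
    }
}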

I think the first big difference is that we let the server bind to any free local port, by asking for port 0. Otherwise, the first instance would bind to 5000 and the second instance would fail:

Rust code
// TODO: make salvage itself async so we don't have to spin up a new tokio
// executor just for these. (this is currently a terrible hack)
// TODO: keep a chrome instance (and http server) running?
#[tokio::main]
pub async fn drawio_to_pdf(input: Vec<u8>) -> Result<Vec<u8>, eyre::Error> {
    let input = Arc::new(input);

    let addr = SocketAddr::from(([127, 0, 0, 1], 0));
    let make_svc = make_service_fn(move |_conn| {
        let input = input.clone();
        async move { Ok::<_, Infallible>(StaticFileService { input }) }
    });

    let server = Server::bind(&addr).serve(make_svc);
    let addr = server.local_addr();
    let port = addr.port();

    let server_task = tokio::spawn(server);
    let chrome_task = spawn_blocking(move || orchestrate_chrome(port));

    let output_bytes = chrome_task.await??;
    server_task.abort();

    Ok(output_bytes)
}

We do something interesting with input, which is no longer a file path: it's just a Vec<u8>. Because we're able to service any number of HTTP requests (and we do service several: some HTML, some JS, some CSS, a webfont, and our input .drawio file), every "instance" of our StaticFileService potentially needs access to that input.

But because none of them needs to mutate it, an Arc<Vec<u8>> works just fine. When the last reference drops (in this case, at the end of drawio_to_pdf), the buffer gets freed. We don't actually clone the contents, we just increment the reference count (which is what input.clone() does here).
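
Here's a tiny standalone sketch (just an illustration, not salvage code) of what that cheap clone means:

Rust code
use std::sync::Arc;

fn main() {
    // one heap allocation for the bytes, wrapped in a reference-counted handle
    let input: Arc<Vec<u8>> = Arc::new(vec![1, 2, 3]);

    // "cloning" only bumps the reference count; the bytes aren't copied
    let for_service = Arc::clone(&input);
    assert_eq!(Arc::strong_count(&input), 2);

    // when the last handle goes away, the Vec<u8> is freed
    drop(for_service);
    assert_eq!(Arc::strong_count(&input), 1);
}

Arc::clone(&input) and input.clone() do the same thing; spelling out Arc::clone just makes it obvious at the call site that only the handle is being cloned.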

Whichever port we ended up with, we pass to orchestrate_chrome, understandably. Which now returns a Vec<u8>, by the way! Since the PDF is just a temporary form of our asset, and we don't want to write that one to disk either:

Rust code
pub fn orchestrate_chrome(port: u16) -> Result<Vec<u8>, Report> {
    // `headless_chrome` can use `chromium-browser` if installed, and Google
    // Chrome on Windows, so this is actually relatively dependency-free.  the
    // `--no-sandbox` stuff is useful within docker containers and in installs
    // where the SUID sandbox helper is not set up correctly.
    let no_sandbox = OsString::from("--no-sandbox");
    let browser = Browser::new(
        LaunchOptionsBuilder::default()
            .headless(true)
            .args(vec![&no_sandbox])
            .build()
            .map_err(|err| eyre!("{}", err))?,
    )
    .map_err(|err| eyre!("{}", err))?;

    let tab = browser
        .wait_for_initial_tab()
        .map_err(|err| eyre!("{}", err))?;

    let url = format!("http://localhost:{}/index.html", port);
    debug!(%url, "Navigating...");

    tab.navigate_to(&url).map_err(|err| eyre!("{}", err))?;
    tab.wait_until_navigated().map_err(|err| eyre!("{}", err))?;
    debug!("Navigating... done!");

    let width = tab
        .evaluate("document.body.clientWidth", false)
        .map_err(|err| eyre!("{}", err))?
        .value
        .unwrap()
        .as_u64()
        .unwrap();
    let height = tab
        .evaluate("document.body.clientHeight", false)
        .map_err(|err| eyre!("{}", err))?
        .value
        .unwrap()
        .as_u64()
        .unwrap();
    debug!(%width, %height, "Got dimensions");

    let pdf = tab
        .print_to_pdf(Some(PrintToPdfOptions {
            display_header_footer: Some(false),
            prefer_css_page_size: Some(false),
            landscape: None,
            print_background: None,
            scale: None,
            paper_width: Some(width as f32 / 96.0),
            paper_height: Some(height as f32 / 96.0),
            margin_top: Some(0.0),
            margin_bottom: Some(0.0),
            margin_left: Some(0.0),
            margin_right: Some(0.0),
            page_ranges: None,
            ignore_invalid_page_ranges: None,
            header_template: None,
            footer_template: None,
        }))
        .map_err(|err| eyre!("{}", err))?;

    Ok(pdf)
}

And then... well, then we don't actually use hyper-staticfile anymore. We have this new StaticFileService:

Rust code
pub struct StaticFileService {
    input: Arc<Vec<u8>>,
}

type BoxFut<O> = Pin<Box<dyn Future<Output = O> + Send + 'static>>;

...along with a type alias that makes clippy very, very happy (without it, that return type would trip clippy's type_complexity lint).

If you want to learn more about tower's Service type, there's an excellent post about it.

Our service is always ready (we don't ever apply backpressure), and it does some straightforward routing: we get a path like /foo/bar/baz and strip the initial /. If the result is input.drawio, we serve from that Arc<Vec<u8>> we've been lugging around!

If it's not, we open our in-memory zip file and try to find an entry that corresponds to what we're trying to serve. If we find one, we serve it, with the appropriate mime type.

Rust code
impl Service<Request<Body>> for StaticFileService {
    type Response = Response<Body>;
    type Error = Report;
    type Future = BoxFut<Result<Self::Response, Self::Error>>;

    fn poll_ready(
        &mut self,
        _cx: &mut std::task::Context<'_>,
    ) -> std::task::Poll<Result<(), Self::Error>> {
        Ok(()).into()
    }

    fn call(&mut self, req: Request<Body>) -> Self::Future {
        let input = self.input.clone();

        Box::pin(async move {
            debug!(?req, "Got request");

            let path = req.uri().path().trim_start_matches('/');
            match path {
                "input.drawio" => {
                    debug!(%path, "Serving input .drawio file");

                    let res = Response::builder()
                        .status(StatusCode::OK)
                        .header(header::CONTENT_TYPE, "application/x-drawio")
                        .body(Body::from(input.to_vec()))?;
                    Ok(res)
                }
                _ => {
                    debug!(%path, "Serving static file!");

                    let utf8_path: &Utf8Path = path.into();
                    let mime = if let Some(ext) = utf8_path.extension() {
                        mime_guess::from_ext(ext).first_or_octet_stream()
                    } else {
                        mime::APPLICATION_OCTET_STREAM
                    };

                    let bytes = {
                        let path = path.to_string();
                        spawn_blocking(move || {
                            let mut zr = ZipArchive::new(Cursor::new(ZIP_DATA))?;
                            let bytes = match zr.by_name(&path) {
                                Ok(mut f) => {
                                    debug!(crc32 = %f.crc32(), size = %f.size(), "Found zip entry");
                                    let mut bytes = Vec::with_capacity(f.size() as usize);
                                    std::io::copy(&mut f, &mut bytes)?;
                                    Some(bytes)
                                }
                                Err(err) => {
                                    warn!(%err, "Could not find zip entry");
                                    None
                                }
                            };
                            Ok::<_, Report>(bytes)
                        })
                        .await??
                    };

                    match bytes {
                        Some(bytes) => {
                            let res = Response::builder()
                                .status(StatusCode::OK)
                                .header(header::CONTENT_TYPE, mime.essence_str())
                                .body(Body::from(bytes))?;
                            Ok(res)
                        }
                        None => Ok(Response::builder()
                            .status(StatusCode::NOT_FOUND)
                            .body(Body::empty())?),
                    }
                }
            }
        })
    }
}

We could potentially do streaming decompression of zip entries as we send them to the user agent, instead of decompressing everything in memory, but I have made the executive decision to Not Care For Now, and look at me: it shipped.
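
If we ever did care, here's roughly what it could look like: a hypothetical helper (a sketch, not code that exists in salvage) that feeds a zip entry into hyper's Body::channel() chunk by chunk, assuming we'd pull in the futures crate for block_on:

Rust code
use std::io::{Cursor, Read};

use color_eyre::Report;
use futures::executor::block_on; // assumed extra dependency for this sketch
use hyper::Body;
use tokio::task::spawn_blocking;
use zip::ZipArchive;

/// Hypothetical sketch: serve a zip entry as a streaming `Body` instead of
/// decompressing the whole file into memory first.
fn streamed_zip_entry(zip_data: &'static [u8], name: String) -> Body {
    let (mut sender, body) = Body::channel();

    // dropping the JoinHandle detaches the task; it keeps running on its own
    let _task = spawn_blocking(move || -> Result<(), Report> {
        let mut zr = ZipArchive::new(Cursor::new(zip_data))?;
        let mut entry = zr.by_name(&name)?;
        let mut buf = vec![0u8; 64 * 1024];
        loop {
            let n = entry.read(&mut buf)?;
            if n == 0 {
                break;
            }
            // we're on tokio's blocking pool, so blocking on the async
            // `send_data` is fine; if the client went away, this errors out
            // and the task simply ends
            block_on(sender.send_data(buf[..n].to_vec().into()))?;
        }
        Ok(())
    });

    body
}

One caveat with that shape: the entry lookup happens inside the detached task, after we've already committed to a response, so a missing entry would show up as a truncated body rather than a clean 404. Which is a decent argument for Not Caring For Now.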

And that's all there is to it!

As mentioned in the comments above, I should probably make all of salvage async-aware instead of (ab)using #[tokio::main] like I did, and I would also want to keep Chrome running - I feel like we're paying for its startup time a lot.

I feel like ideally each tab would be a "worker" in a pool? It would have all the draw.io JavaScript loaded up, and we'd be able to just issue commands, print to PDF, rinse, repeat. There's something beautiful there that's just waiting to be made.

But not today.
