One funny way to bundle assets
There's one thing that bothers me. In part 1, why are we using `hyper-staticfile`? Couldn't we just use `file:///` URLs?
Well, first off: showing off how easy it is to serve some static files, even in a "scary" language like Rust, is just not something I could pass up.
But also: think about distributing `salvage` as a tool. Will we want to distribute all those HTML/CSS/JS/font files alongside it?
Mhh, no, we'd probably want to bake them into the executable.
Right! So it's a single, self-contained application. And if those assets are in there, would we want to extract them to disk when we need them?
I could see that getting messy... what if there are several instances of `salvage` running?
Exactly! They might try to write to the same path and we'd end up with corrupted files, or we'd have them generate unique paths and then we'd have a bunch of stuff on disk that might never get cleaned up.
Mh. And there's also the fact that... the code we showed only ever used a single `.drawio` file that was already on disk.
Right! Multiple requests to convert `.drawio` files to `.pdf` might be in-flight at any time, so our tool has to be resilient against that.
So what did we end up doing again?
I don't believe I've explained it yet.
Bonus part?
Bonus part!
Here's the basic problem: you have a directory structure like this one:
```
src/
  main.rs
drawio-assets/
  index.html
```
And you don't want to have to distribute `futile.exe` (or `futile` on non-Windows platforms) with all those files. You want them bundled in the binary.
The way I went with it (and it's not the only way) is to make a build script!
```toml
# in `salvage/Cargo.toml`

[build-dependencies]
walkdir = "2.3.2"
zip = { version = "0.5.13", default-features = false, features = ["deflate", "time"] }
```
```rust
use std::{env, fs::File, path::PathBuf};

use walkdir::WalkDir;
use zip::ZipWriter;

fn main() {
    bundle_drawio_assets();
    // cut: other steps
}

fn bundle_drawio_assets() {
    let manifest_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
    let drawio_assets_dir = manifest_dir.join("drawio-assets");
    println!("cargo:rerun-if-changed={}", drawio_assets_dir.display());

    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    let bundle_path = out_dir.join("drawio-assets.zip");
    let mut zw = ZipWriter::new(File::create(&bundle_path).unwrap());

    for entry in WalkDir::new(&drawio_assets_dir) {
        let entry = entry.unwrap();
        let disk_path = entry.path();
        let rel_path = disk_path
            .strip_prefix(&drawio_assets_dir)
            .unwrap()
            .to_string_lossy();

        let meta = entry.metadata().unwrap();
        if meta.is_dir() {
            zw.add_directory(rel_path, Default::default()).unwrap();
        } else if meta.is_file() {
            zw.start_file(rel_path, Default::default()).unwrap();
            std::io::copy(&mut File::open(disk_path).unwrap(), &mut zw).unwrap();
        } else {
            println!("cargo:warning=Ignoring entry {}", disk_path.display());
        }
    }
}
```
It's pretty straightforward! First, we find the `drawio-assets` directory, based on the value of the `CARGO_MANIFEST_DIR` environment variable that's passed to all cargo build scripts.
Then, we determine the path we should write our zip file to, somewhere underneath `OUT_DIR` - which is also standard for cargo build scripts. We make sure to instruct cargo to re-run this script if anything in the `drawio-assets/` directory changes, by simply printing `cargo:rerun-if-changed=the-path`.
Does this do what you'd expect for directories?
It does! It's fantastic.
And then what do we do at run-time?
As discussed, we could straight up extract them. But we don't need to!
First, let's discuss the actual bundling: it's simply a matter of calling the `include_bytes!` macro.
```rust
// in `salvage/src/processors/drawio.rs`

use std::{
    convert::Infallible, ffi::OsString, future::Future, io::Cursor, net::SocketAddr, pin::Pin,
    sync::Arc,
};

use camino::Utf8Path;
use color_eyre::{
    eyre::{self, eyre},
    Report,
};
use headless_chrome::{protocol::page::PrintToPdfOptions, Browser, LaunchOptionsBuilder};
use hyper::{
    header,
    service::{make_service_fn, Service},
    Body, Request, Response, Server, StatusCode,
};
use tokio::task::spawn_blocking;
use tracing::{debug, warn};
use zip::ZipArchive;

const ZIP_DATA: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/drawio-assets.zip"));
```
I've shown all the `use` directives here as a little appetizer.
I think the first big difference is that we allow the server to bind to any local port that's not used yet. Otherwise, the first instance would bind to 5000 and the second instance would fail:
```rust
// TODO: make salvage itself async so we don't have to spin up a new tokio
// executor just for these. (this is currently a terrible hack)
// TODO: keep a chrome instance (and http server) running?
#[tokio::main]
pub async fn drawio_to_pdf(input: Vec<u8>) -> Result<Vec<u8>, eyre::Error> {
    let input = Arc::new(input);

    let addr = SocketAddr::from(([127, 0, 0, 1], 0));
    let make_svc = make_service_fn(move |_conn| {
        let input = input.clone();
        async move { Ok::<_, Infallible>(StaticFileService { input }) }
    });
    let server = Server::bind(&addr).serve(make_svc);
    let addr = server.local_addr();
    let port = addr.port();

    let server_task = tokio::spawn(server);
    let chrome_task = spawn_blocking(move || orchestrate_chrome(port));

    let output_bytes = chrome_task.await??;
    server_task.abort();
    Ok(output_bytes)
}
```
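The "port 0" trick deserves a quick demonstration. Here's a tiny stdlib-only sketch (just an illustration, separate from the async hyper code above) showing that the OS hands each listener its own free port, which is why two instances can never collide:

```rust
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // Binding to port 0 asks the OS to pick any free port for us.
    let first = TcpListener::bind("127.0.0.1:0")?;
    let second = TcpListener::bind("127.0.0.1:0")?;

    // `local_addr` tells us which port we actually got.
    let p1 = first.local_addr()?.port();
    let p2 = second.local_addr()?.port();

    // Each listener got a real, distinct port.
    assert_ne!(p1, 0);
    assert_ne!(p2, 0);
    assert_ne!(p1, p2);
    Ok(())
}
```

`Server::bind` with port 0 does the same thing under the hood, which is why we read the port back from `server.local_addr()` afterwards.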
We do something interesting with `input`, which is no longer a file path: it's just a `Vec<u8>`. Because we're able to service any number of HTTP requests (and we do service several: some HTML, some JS, some CSS, a webfont, and our input `.drawio` file), all "instances" of our `StaticFileService` have to potentially have access to that input.
But because it doesn't need to mutate it, an `Arc<Vec<u8>>` works just fine. When the last reference drops (in this case at the end of `drawio_to_pdf`), it'll get freed. We don't actually clone the contents, we just increment its reference count (which is what `input.clone()` does here).
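If you want to convince yourself of that reference-counting behavior, here's a small stdlib-only sketch (just an illustration, not `salvage` code):

```rust
use std::sync::Arc;

fn main() {
    let input: Arc<Vec<u8>> = Arc::new(vec![1, 2, 3]);
    assert_eq!(Arc::strong_count(&input), 1);

    // `clone` only bumps the reference count; the Vec's bytes are not copied.
    let handle = input.clone();
    assert_eq!(Arc::strong_count(&input), 2);

    // Both handles point at the very same allocation.
    assert!(Arc::ptr_eq(&input, &handle));

    // Dropping a clone decrements the count; when it hits zero,
    // the Vec itself is freed.
    drop(handle);
    assert_eq!(Arc::strong_count(&input), 1);
}
```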
Whichever port we ended up with, we pass to `orchestrate_chrome`, understandably. Which now returns a `Vec<u8>`, by the way! Since the PDF is just a temporary form of our asset, and we don't want to write that one to disk either:
```rust
pub fn orchestrate_chrome(port: u16) -> Result<Vec<u8>, Report> {
    // `headless_chrome` can use `chromium-browser` if installed, and Google
    // Chrome on Windows, so this is actually relatively dependency-free. the
    // `--no-sandbox` stuff is useful within docker containers and in installs
    // where the SUID sandbox helper is not set up correctly.
    let no_sandbox = OsString::from("--no-sandbox");
    let browser = Browser::new(
        LaunchOptionsBuilder::default()
            .headless(true)
            .args(vec![&no_sandbox])
            .build()
            .map_err(|err| eyre!("{}", err))?,
    )
    .map_err(|err| eyre!("{}", err))?;

    let tab = browser
        .wait_for_initial_tab()
        .map_err(|err| eyre!("{}", err))?;

    let url = format!("http://localhost:{}/index.html", port);
    debug!(%url, "Navigating...");
    tab.navigate_to(&url).map_err(|err| eyre!("{}", err))?;
    tab.wait_until_navigated().map_err(|err| eyre!("{}", err))?;
    debug!("Navigating... done!");

    let width = tab
        .evaluate("document.body.clientWidth", false)
        .map_err(|err| eyre!("{}", err))?
        .value
        .unwrap()
        .as_u64()
        .unwrap();
    let height = tab
        .evaluate("document.body.clientHeight", false)
        .map_err(|err| eyre!("{}", err))?
        .value
        .unwrap()
        .as_u64()
        .unwrap();
    debug!(%width, %height, "Got dimensions");

    let pdf = tab
        .print_to_pdf(Some(PrintToPdfOptions {
            display_header_footer: Some(false),
            prefer_css_page_size: Some(false),
            landscape: None,
            print_background: None,
            scale: None,
            paper_width: Some(width as f32 / 96.0),
            paper_height: Some(height as f32 / 96.0),
            margin_top: Some(0.0),
            margin_bottom: Some(0.0),
            margin_left: Some(0.0),
            margin_right: Some(0.0),
            page_ranges: None,
            ignore_invalid_page_ranges: None,
            header_template: None,
            footer_template: None,
        }))
        .map_err(|err| eyre!("{}", err))?;
    Ok(pdf)
}
```
And then... well, then we don't actually use `hyper-staticfile` anymore. We have this new `StaticFileService`:
```rust
pub struct StaticFileService {
    input: Arc<Vec<u8>>,
}

type BoxFut<O> = Pin<Box<dyn Future<Output = O> + Send + 'static>>;
```
...along with a type alias that makes clippy very, very happy.
If you want to learn more about tower's `Service` type, there's an excellent post about it.
Our service is always ready (we don't ever apply backpressure), and it does some straightforward routing: we get a path like `/foo/bar/baz` and strip the initial `/`. If the result is `input.drawio`, we serve from that `Arc<Vec<u8>>` we've been lugging around!
If it's not, we open our in-memory zip file and try to find an entry that corresponds to what we're trying to serve. If we find one, we serve it, with the appropriate mime type.
```rust
impl Service<Request<Body>> for StaticFileService {
    type Response = Response<Body>;
    type Error = Report;
    type Future = BoxFut<Result<Self::Response, Self::Error>>;

    fn poll_ready(
        &mut self,
        _cx: &mut std::task::Context<'_>,
    ) -> std::task::Poll<Result<(), Self::Error>> {
        Ok(()).into()
    }

    fn call(&mut self, req: Request<Body>) -> Self::Future {
        let input = self.input.clone();
        Box::pin(async move {
            debug!(?req, "Got request");
            let path = req.uri().path().trim_start_matches('/');
            match path {
                "input.drawio" => {
                    debug!(%path, "Serving input .drawio file");
                    let res = Response::builder()
                        .status(StatusCode::OK)
                        .header(header::CONTENT_TYPE, "application/x-drawio")
                        .body(Body::from(input.to_vec()))?;
                    Ok(res)
                }
                _ => {
                    debug!(%path, "Serving static file!");
                    let utf8_path: &Utf8Path = path.into();
                    let mime = if let Some(ext) = utf8_path.extension() {
                        mime_guess::from_ext(ext).first_or_octet_stream()
                    } else {
                        mime::APPLICATION_OCTET_STREAM
                    };

                    let bytes = {
                        let path = path.to_string();
                        spawn_blocking(move || {
                            let mut zr = ZipArchive::new(Cursor::new(ZIP_DATA))?;
                            let bytes = match zr.by_name(&path) {
                                Ok(mut f) => {
                                    debug!(crc32 = %f.crc32(), size = %f.size(), "Found zip entry");
                                    let mut bytes = vec![0u8; 0];
                                    std::io::copy(&mut f, &mut bytes)?;
                                    Some(bytes)
                                }
                                Err(err) => {
                                    warn!(%err, "Could not find zip entry");
                                    None
                                }
                            };
                            Ok::<_, Report>(bytes)
                        })
                        .await??
                    };

                    match bytes {
                        Some(bytes) => {
                            let res = Response::builder()
                                .status(StatusCode::OK)
                                .header(header::CONTENT_TYPE, mime.essence_str())
                                .body(Body::from(bytes))?;
                            Ok(res)
                        }
                        None => Ok(Response::builder()
                            .status(StatusCode::NOT_FOUND)
                            .body(Body::empty())?),
                    }
                }
            }
        })
    }
}
```
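As a quick sanity check on the path handling in `call`, here's a stdlib-only sketch; note it's only an illustration, since the real code uses `camino` and `mime_guess` for the extension and mime lookup:

```rust
fn main() {
    // Strip the leading slash, just like the router does.
    let req_path = "/js/app.min.js";
    let path = req_path.trim_start_matches('/');
    assert_eq!(path, "js/app.min.js");

    // Pull out the extension that the mime lookup would key on.
    let ext = std::path::Path::new(path)
        .extension()
        .and_then(|e| e.to_str());
    assert_eq!(ext, Some("js"));

    // The special-cased input file has no leading directory at all,
    // so it matches the "input.drawio" arm exactly.
    assert_eq!("/input.drawio".trim_start_matches('/'), "input.drawio");
}
```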
We could potentially do streaming decompression of zip entries as we send them to the user agent, instead of decompressing everything in memory, but I have made the executive decision to Not Care For Now, and look at me: it shipped.
And that's all there is to it!
As mentioned in the comments above, I should probably make all of `salvage` async-aware instead of (ab)using `#[tokio::main]` like I did, and I would also want to keep Chrome running - I feel like we're paying for its startup time a lot.
I feel like ideally each tab would be a "worker" in a pool? And it would have all the draw.io JavaScript loaded up, and we'd be able to just issue commands, print to PDF, rinse, repeat. There's something beautiful there that's just waiting to be made.
But not today.