Extra credit
We've achieved our goals already with this series: we have a web service written in Rust, built into a Docker image with nix, with a nice dev shell, that we can deploy to fly.io.
But there's always room for improvement, and so I wanted to talk about a few things we didn't bother doing in the previous chapters.
Making clash-geoip available in the dev shell
When we ran our app locally, we signed up for a MaxMind account, and had to download and unpack the "Country" GeoLite2 database manually.
But then when we packaged the app, we simply grabbed the clash-geoip package straight from nixpkgs and we were good to go!
Could we do the same for our local dev shell?
First we'll need to remove this line from .envrc:
# in .envrc
export GEOLITE2_COUNTRY_DB="db/GeoLite2-Country.mmdb"
And run direnv allow.
Then, in flake.nix:
devShells.default = mkShell
  {
    inputsFrom = [ bin ];
    # new in list: clash-geoip 👇
    buildInputs = with pkgs; [ dive docker shadow flyctl just clash-geoip ];
    # this is just a shell script, but we can refer to packages here:
    shellHook = ''
      export GEOLITE2_COUNTRY_DB=${clash-geoip}/etc/clash/Country.mmdb
    '';
  };
And after pressing Enter to let our environment reload, we get:
$ env | grep '^GEOLITE'
GEOLITE2_COUNTRY_DB=/nix/store/l1amkbsx8mqv11b4mg98f9lyw5nkrcbv-clash-geoip-20230112/etc/clash/Country.mmdb
Nice! That means we can get rid of our locally-downloaded file in db/:
$ rm db/GeoLite2-Country.mmdb
That'll make it easier for other folks to contribute: remember we don't want to commit this database to git, but we also don't want to force other contributors to download their own copy, or have to send it to them via some side-channel.
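Since the path now comes entirely from the environment, it can be handy to fail with a readable message when someone runs the app outside the dev shell. Here's a minimal sketch of that check, using only the standard library. Note that `geoip_db_path` is a hypothetical helper made up for this example, not something that exists in catscii:

```rust
use std::path::PathBuf;

/// Hypothetical helper (not part of catscii): turns the value of
/// `GEOLITE2_COUNTRY_DB` into a path, or explains what is missing.
fn geoip_db_path(var: Option<String>) -> Result<PathBuf, String> {
    match var {
        // Treat an empty or whitespace-only value the same as "unset"
        Some(value) if !value.trim().is_empty() => Ok(PathBuf::from(value)),
        _ => Err(
            "GEOLITE2_COUNTRY_DB is not set: enter the dev shell (or run `direnv allow`) first"
                .to_string(),
        ),
    }
}

fn main() {
    // `std::env::var` returns Err if the variable is unset or not valid Unicode
    match geoip_db_path(std::env::var("GEOLITE2_COUNTRY_DB").ok()) {
        Ok(path) => println!("using GeoIP database at {}", path.display()),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

Taking an `Option<String>` instead of reading the environment inside the helper keeps it easy to test.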
Switching from openssl to rustls
I like rustls a lot. It's been audited, and it supports all the modern stuff you need to talk to browsers. The only compelling reasons to use openssl at this point are: you need something weird and old and deprecated, probably for IoT (Internet of Things) devices that can't be updated, OR you're targeting a platform that you can't (yet) build Rust code for.
Anyway, getting rid of it is not that hard, let's see what depends on it:
$ cargo tree --invert openssl-sys
openssl-sys v0.9.78
├── native-tls v0.2.11
│   ├── hyper-tls v0.5.0
│   │   └── reqwest v0.11.13
│   │       ├── catscii v0.1.0 (/home/amos/catscii)
│   │       ├── libhoney-rust v0.1.6 (https://github.com/ramosbugs/libhoney-rust?rev=98710516cb63d3393d26a22d5493a421c550525a#98710516)
│   │       │   └── opentelemetry-honeycomb v0.1.0 (https://github.com/fasterthanlime/opentelemetry-honeycomb-rs?branch=simplified#2a197b9b)
│   │       │       └── catscii v0.1.0 (/home/amos/catscii)
│   │       └── sentry v0.29.0
│   │           └── catscii v0.1.0 (/home/amos/catscii)
│   ├── reqwest v0.11.13 (*)
│   ├── sentry v0.29.0 (*)
│   └── tokio-native-tls v0.3.0
│       ├── hyper-tls v0.5.0 (*)
│       └── reqwest v0.11.13 (*)
└── openssl v0.10.43
    └── native-tls v0.2.11 (*)
Alright! sentry and reqwest, we can fix that:
# in `catscii/Cargo.toml`
[dependencies]
# omitted: other dependencies
reqwest = { version = "0.11", default-features = false, features = ["json", "rustls-tls-webpki-roots"] }
sentry = { version = "0.29", default-features = false, features = ["reqwest", "rustls", "backtrace", "contexts", "panic"] }
We're opting out of default features (which for sentry ends up pulling native-tls, which pulls in openssl), and into TLS support via rustls.
Note that for reqwest, I've chosen the rustls-tls-webpki-roots feature, which ships its own set of CA certificates, so not only can we remove openssl from our buildInputs:
# had openssl 👇
buildInputs = with pkgs; [ sqlite ];
...we can also remove cacert from the Docker image!
dockerImage = pkgs.dockerTools.streamLayeredImage {
  name = "catscii";
  tag = "latest";
  # 👇 had `pkgs.cacert`
  contents = [ bin ];
  config = {
    Cmd = [ "${bin}/bin/catscii" ];
    Env = with pkgs; [ "GEOLITE2_COUNTRY_DB=${clash-geoip}/etc/clash/Country.mmdb" ];
  };
};
Using docker compose to run the service locally
I mentioned it in passing, it's time to actually do it!
Let's move a few variables from .envrc to a new .env file:
# in `.env`
SENTRY_DSN="redacted"
HONEYCOMB_API_KEY="redacted"
CARGO_REGISTRIES_CATSCII_TOKEN="redacted"
ANALYTICS_DB="db/analytics.db"
And in .envrc, replace the lines we moved with dotenv .env, making our whole .envrc file look like:
# in `.envrc`
#!/bin/bash
if ! has nix_direnv_version || ! nix_direnv_version 2.2.1; then
  source_url "https://raw.githubusercontent.com/nix-community/nix-direnv/2.2.1/direnvrc" "sha256-zelF0vLbEl5uaqrfIzbgNzJWGmLzCmYAkInj/LNxvKs="
fi
nix_direnv_watch_file rust-toolchain.toml
use flake
# 👇
dotenv .env
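The nice thing about a plain .env file is that several tools can read the same KEY=value lines. To make the format concrete, here's a rough sketch of a parser for it in Rust. This is a deliberately simplified, hypothetical `parse_env_line` function: it only handles `#` comments and one optional pair of double quotes, whereas the real parsers in direnv and Docker Compose handle more cases and may differ in the details.

```rust
/// Hypothetical, simplified parser for one line of a `.env` file.
/// Returns `None` for blank lines, comments, and lines without `=`.
fn parse_env_line(line: &str) -> Option<(String, String)> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // Split on the first `=` only, so values may contain `=` themselves
    let (key, value) = line.split_once('=')?;
    let value = value.trim();
    // Strip one pair of surrounding double quotes, if present
    let value = value
        .strip_prefix('"')
        .and_then(|v| v.strip_suffix('"'))
        .unwrap_or(value);
    Some((key.trim().to_string(), value.to_string()))
}

fn main() {
    let file = "# in `.env`\nSENTRY_DSN=\"redacted\"\nANALYTICS_DB=\"db/analytics.db\"\n";
    for (key, value) in file.lines().filter_map(parse_env_line) {
        println!("{key} => {value}");
    }
}
```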
Now, we can set up a simple docker-compose.yml file that will run our app locally:
# in `docker-compose.yml`
services:
  catscii:
    image: "catscii:latest"
    env_file: ".env"
    ports:
      - "8080:8080"
    stop_signal: SIGINT
And now (after building the image), all we need to do to run it is just:
$ docker compose up
By default, it'll stick in the foreground and show the output of the service:
$ docker compose up
[+] Running 1/0
⠿ Container catscii-catscii-1 Created 0.0s
Attaching to catscii-catscii-1
catscii-catscii-1 | /nix/store/561wgc73s0x1250hrgp7jm22hhv7yfln-bash-5.2-p15/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
catscii-catscii-1 | {"timestamp":"2023-02-23T19:57:35.590683Z","level":"INFO","fields":{"message":"Creating honey client","log.target":"libhoney::client","log.module_path":"libhoney::client","log.file":"/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/2b916d13eeef5c4c978dd7cdbe0195cb3a2cb3e77889acf9f85433ddf74a8e14/libhoney-rust-0.1.6/src/client.rs","log.line":78},"target":"libhoney::client"}
catscii-catscii-1 | {"timestamp":"2023-02-23T19:57:35.590761Z","level":"INFO","fields":{"message":"transmission starting","log.target":"libhoney::transmission","log.module_path":"libhoney::transmission","log.file":"/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/2b916d13eeef5c4c978dd7cdbe0195cb3a2cb3e77889acf9f85433ddf74a8e14/libhoney-rust-0.1.6/src/transmission.rs","log.line":124},"target":"libhoney::transmission"}
catscii-catscii-1 | {"timestamp":"2023-02-23T19:57:35.596155Z","level":"INFO","fields":{"message":"Listening on 0.0.0.0:8080"},"target":"catscii"}
Hitting ctrl-c sends the stop signal, which we configured to be SIGINT (it's SIGTERM by default), and that initiates a graceful shutdown of our server, so everything works fine!
Apart from not having to remember the correct docker run incantation, Compose files come in particularly handy when... composing services.
Say we needed an OpenTelemetry Exporter, a Postgres database, or perhaps an S3-compatible object store like minio, we could just list them there!
Compose files also let us define networks, volumes, etc.
Persisting the analytics database
SPEAKING OF VOLUMES. Do you know what we forgot, bear?
?
Your silence speaks volumes.
You did not just make that joke.
We forgot to persist the analytics database! Right now, both in development and in production, our database lives... in the docker image, kind of. Well it lives in the read-write overlay on top of our docker image, or whatever the actual implementation details are.
Point is, every time we stop the container, the database is lost. Same if we re-deploy the service to fly.io.
Now that we have a docker-compose.yml file, it's easy to fix locally!
First, let's remove ANALYTICS_DB from .env, because it feels like it no longer belongs there. Instead, we can set it directly from where we define the volume:
services:
  catscii:
    image: "catscii:latest"
    # 👇 new!
    volumes:
      - catscii-db:/db
    environment:
      ANALYTICS_DB: /db/analytics.db
    # (old)
    env_file: ".env"
    ports:
      - "8080:8080"
    stop_signal: SIGINT

# 👇 new again!
volumes:
  catscii-db:
And that takes care of our local environment! It doesn't really do GeoIP lookups anyway, since, well, we're only seeing private IP addresses, so it's really just for the exercise.
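To see why local requests never resolve to a country: inside a Docker network, the client address is typically something like 172.17.0.1, which sits in a private range that no GeoIP database maps to a country. The standard library can tell us that much on its own. Here's a small sketch, where `worth_geolocating` is a hypothetical helper for illustration and not something locat does:

```rust
use std::net::IpAddr;

/// Hypothetical helper: false for addresses that can't belong to a country
/// (private ranges, loopback, link-local), true otherwise.
fn worth_geolocating(addr: IpAddr) -> bool {
    match addr {
        IpAddr::V4(v4) => !(v4.is_private() || v4.is_loopback() || v4.is_link_local()),
        // Only the loopback check is done for IPv6 in this sketch
        IpAddr::V6(v6) => !v6.is_loopback(),
    }
}

fn main() {
    for raw in ["172.17.0.1", "127.0.0.1", "8.8.8.8"] {
        let addr: IpAddr = raw.parse().expect("valid IP address literal");
        println!("{raw}: worth geolocating? {}", worth_geolocating(addr));
    }
}
```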
As for our production setup, well if you really don't want to read the fly.io docs, I'll do it for you.
First we create a volume... let's say 1G, I'll provision mine in Paris (CDG airport):
$ fly vol create catscii_db --size 1 --region cdg
Update available 0.0.456 -> v0.0.463.
Run "flyctl version update" to upgrade.
ID: vol_zmjnv8m3dkyrywgx
Name: catscii_db
App: old-frost-6294
Region: cdg
Zone: 0e8c
Size GB: 1
Encrypted: true
Created at: 23 Feb 23 20:20 UTC
Note that I'm running this in the catscii/ directory so:
- We have flyctl in path
- But its auto-updater is sad that our flake pins an older version
- There's a fly.toml file in the current directory, meaning we don't need to pass -a/--app
And then we politely ask fly to mount the volume at /db:
# in `fly.toml`
[mounts]
source = "catscii_db"
destination = "/db"
[env]
# don't forget to update this 👇
ANALYTICS_DB = "/db/analytics.db"
And then we deploy!
$ just deploy
(cut)
And I guess we'll find out if it truly does persist the next time we deploy a change.
Using sqlite asynchronously
There's a bunch of good options to use sqlite asynchronously from Rust.
Well... not truly asynchronously. Async I/O in Rust relies on having an async runtime that probably uses something like epoll under the hood to subscribe to events and only issue write / read calls when it knows they won't block - that's what tokio does for us, underpinning the axum web framework we're using.
But sqlite is a big ball of C code, it has its own concept of I/O. In fact it even has an abstraction over it: virtual filesystems (vfs).
So there's not really a way to plug "async Rust I/O" into sqlite.
We can however, do the next best thing: run sqlite operations in a thread pool!
Some higher-level frameworks handle that directly, but I'm happy to stick with something rusqlite-shaped for now, and so let's just reach for tokio-rusqlite.
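The general idea of shipping blocking work off to another thread can be sketched with nothing but the standard library. What follows is a hand-rolled illustration with made-up names (`Worker`, `Job`), using a single dedicated thread (a pool of one) and blocking channels. It is not tokio-rusqlite's actual code, and an async version would await the result instead of blocking on it:

```rust
use std::sync::mpsc;
use std::thread;

/// A job is a boxed closure that gets exclusive access to the worker's state.
type Job<S> = Box<dyn FnOnce(&mut S) + Send + 'static>;

/// Hypothetical stand-in for a connection handle: it owns a channel to a
/// dedicated thread, and that thread owns the (blocking) state.
struct Worker<S> {
    jobs: mpsc::Sender<Job<S>>,
}

impl<S: Send + 'static> Worker<S> {
    fn spawn(mut state: S) -> Self {
        let (jobs, rx) = mpsc::channel::<Job<S>>();
        thread::spawn(move || {
            // Runs jobs one at a time until every Sender is dropped
            for job in rx {
                job(&mut state);
            }
        });
        Self { jobs }
    }

    /// Runs `f` on the worker thread and waits for its result.
    fn call<R, F>(&self, f: F) -> R
    where
        F: FnOnce(&mut S) -> R + Send + 'static,
        R: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        self.jobs
            .send(Box::new(move |state: &mut S| {
                // Ignore the error: it only means the caller stopped waiting
                let _ = tx.send(f(state));
            }))
            .expect("worker thread is gone");
        rx.recv().expect("worker dropped the job")
    }
}

fn main() {
    // A Vec stands in for the sqlite connection
    let worker = Worker::spawn(Vec::<String>::new());
    let code = "FR".to_owned(); // owned, so the closure can be 'static
    worker.call(move |log| log.push(code));
    let len = worker.call(|log| log.len());
    println!("{len} entries");
}
```

The `'static` bound on the closure is the part worth remembering: since the closure travels to another thread, it can't borrow from the caller's stack, which is why owned values get moved into it.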
This is happening in locat, so:
$ cd locat/
$ cargo add tokio-rusqlite
Command 'cargo' not found, but can be installed with:
sudo snap install rustup # version 1.24.3, or
sudo apt install cargo # version 0.60.0ubuntu1-0ubuntu3
See 'snap info rustup' for additional versions.
Oh, right.
Well... I guess we can make a tiny flake.
# in `locat/flake.nix`
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    flake-utils.url = "github:numtide/flake-utils";
    rust-overlay = {
      url = "github:oxalica/rust-overlay";
      inputs = {
        nixpkgs.follows = "nixpkgs";
        flake-utils.follows = "flake-utils";
      };
    };
  };
  outputs = { self, nixpkgs, flake-utils, rust-overlay }:
    flake-utils.lib.eachDefaultSystem
      (system:
        let
          overlays = [ (import rust-overlay) ];
          pkgs = import nixpkgs {
            inherit system overlays;
          };
        in
        with pkgs; {
          devShells.default = mkShell {
            buildInputs = with pkgs; [ rust-bin.nightly.latest.default sqlite ];
          };
        }
      );
}
And a tiny .envrc file:
#!/bin/bash
# in `locat/.envrc`
if ! has nix_direnv_version || ! nix_direnv_version 2.2.1; then
  source_url "https://raw.githubusercontent.com/nix-community/nix-direnv/2.2.1/direnvrc" "sha256-zelF0vLbEl5uaqrfIzbgNzJWGmLzCmYAkInj/LNxvKs="
fi
use flake
It's all still a bit verbose, but at least here we don't have a rust-toolchain.toml file to worry about.
Now!
$ direnv allow
(cut)
$ cargo add tokio-rusqlite
Updating crates.io index
Adding tokio-rusqlite v0.3.0 to dependencies.
That's more like it.
Let's make sure we don't have conflicts as to the rusqlite dependency:
Note that cargo add tokio-rusqlite will currently conflict with rusqlite 0.28. Upgrading rusqlite to version 0.29 resolves the conflict.
$ cargo tree --invert rusqlite
rusqlite v0.29.0
├── locat v0.4.0 (/home/amos/locat)
└── tokio-rusqlite v0.3.0
└── locat v0.4.0 (/home/amos/locat)
Nope, looks good!
Let's also add tokio explicitly, we'll need it to remove some other blocking operations:
$ cargo add tokio --features fs,test-util
(cut)
Alright, pay attention to the comments now:
// in `locat/src/lib.rs`
use std::net::IpAddr;

// We're using tokio-rusqlite's own Connection type now
use tokio_rusqlite::Connection;

/// Allows geo-locating IPs and keeps analytics
pub struct Locat {
    reader: maxminddb::Reader<Vec<u8>>,
    analytics: Db,
}

#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("maxminddb error: {0}")]
    MaxMindDb(#[from] maxminddb::MaxMindDBError),

    // this can happen while reading the geoip db from disk
    #[error("io error: {0}")]
    Io(#[from] std::io::Error),

    #[error("rusqlite error: {0}")]
    Rusqlite(#[from] rusqlite::Error),
}

impl Locat {
    // this is async now
    pub async fn new(geoip_country_db_path: &str, analytics_db_path: &str) -> Result<Self, Error> {
        // read the geoip database into memory asynchronously. previously
        // we used a synchronous method off of `maxminddb::Reader` directly.
        let geoip_data = tokio::fs::read(geoip_country_db_path).await?;

        Ok(Self {
            // this is all in-memory, no I/O involved here
            reader: maxminddb::Reader::from_source(geoip_data)?,
            analytics: Db::open(analytics_db_path).await?,
        })
    }

    /// Converts an address to an ISO 3166-1 alpha-2 country code
    pub async fn ip_to_iso_code(&self, addr: IpAddr) -> Option<&str> {
        let iso_code = self
            .reader
            .lookup::<maxminddb::geoip2::Country>(addr)
            .ok()?
            .country?
            .iso_code?;

        if let Err(e) = self.analytics.increment(iso_code).await {
            eprintln!("Could not increment analytics: {e}");
        }

        Some(iso_code)
    }

    /// Returns a map of country codes to number of requests
    pub async fn get_analytics(&self) -> Result<Vec<(String, u64)>, Error> {
        Ok(self.analytics.list().await?)
    }
}

struct Db {
    // this used to store a path, now it stores a connection
    conn: Connection,
}

impl Db {
    async fn open(path: &str) -> Result<Self, rusqlite::Error> {
        // open and migrate the database in a non-blocking way
        let conn = Connection::open(path).await?;

        // this is how operations are run on a thread pool: we pass a
        // closure. note that it must be `'static`, so we can't borrow
        // anything from the outside: owned types only.
        conn.call(|conn| {
            // create analytics table
            conn.execute(
                "CREATE TABLE IF NOT EXISTS analytics (
                    iso_code TEXT PRIMARY KEY,
                    count INTEGER NOT NULL
                )",
                [],
            )?;

            Ok::<_, rusqlite::Error>(())
        })
        .await?;

        Ok(Self { conn })
    }

    async fn list(&self) -> Result<Vec<(String, u64)>, rusqlite::Error> {
        self.conn
            .call(|conn| {
                let mut stmt = conn.prepare("SELECT iso_code, count FROM analytics")?;
                let mut rows = stmt.query([])?;
                let mut analytics = Vec::new();
                while let Some(row) = rows.next()? {
                    let iso_code: String = row.get(0)?;
                    let count: u64 = row.get(1)?;
                    analytics.push((iso_code, count));
                }
                Ok(analytics)
            })
            .await
    }

    async fn increment(&self, iso_code: &str) -> Result<(), rusqlite::Error> {
        // we have to use `iso_code` from within the closure and the closure
        // must be 'static, so:
        let iso_code = iso_code.to_owned();
        self.conn
            .call(|conn| {
                let mut stmt = conn
                    .prepare("INSERT INTO analytics (iso_code, count) VALUES (?, 1) ON CONFLICT (iso_code) DO UPDATE SET count = count + 1")?;
                stmt.execute([iso_code])?;
                Ok(())
            })
            .await
    }
}

#[cfg(test)]
mod tests {
    use crate::Db;

    struct RemoveOnDrop {
        path: &'static str,
    }

    impl Drop for RemoveOnDrop {
        fn drop(&mut self) {
            _ = std::fs::remove_file(self.path);
        }
    }

    // this test needs an async runtime now, hence, `tokio::test`
    #[tokio::test]
    async fn test_db() {
        let path = "/tmp/loca-test.db";
        let db = Db::open(path).await.unwrap();
        let _remove_on_drop = RemoveOnDrop { path };

        let analytics = db.list().await.unwrap();
        assert_eq!(analytics.len(), 0);

        db.increment("US").await.unwrap();
        let analytics = db.list().await.unwrap();
        assert_eq!(analytics.len(), 1);

        db.increment("US").await.unwrap();
        db.increment("FR").await.unwrap();
        let analytics = db.list().await.unwrap();
        assert_eq!(analytics.len(), 2);

        // contains US at count 2
        assert!(analytics.contains(&("US".to_string(), 2)));
        // contains FR at count 1
        assert!(analytics.contains(&("FR".to_string(), 1)));
        // doesn't contain DE
        assert!(!analytics.contains(&("DE".to_string(), 0)));
    }
}
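If the upsert in `increment` looks opaque, it may help to see the same behavior modeled without any database: insert the row with a count of 1, or bump the existing row if the primary key is already there. Here's an in-memory sketch of that, where `Counts` is a hypothetical type for illustration only (the real thing is the SQL above):

```rust
use std::collections::HashMap;

/// In-memory stand-in for the `analytics` table: iso_code -> count
#[derive(Default)]
struct Counts(HashMap<String, u64>);

impl Counts {
    /// Mirrors the upsert: start at 1, or add 1 to the existing row
    fn increment(&mut self, iso_code: &str) {
        *self.0.entry(iso_code.to_owned()).or_insert(0) += 1;
    }

    /// Mirrors `SELECT iso_code, count FROM analytics`. Rows are sorted here
    /// for stable output; the SQL query has no ORDER BY.
    fn list(&self) -> Vec<(String, u64)> {
        let mut rows: Vec<_> = self.0.iter().map(|(k, v)| (k.clone(), *v)).collect();
        rows.sort();
        rows
    }
}

fn main() {
    let mut counts = Counts::default();
    counts.increment("US");
    counts.increment("US");
    counts.increment("FR");
    // prints [("FR", 1), ("US", 2)]
    println!("{:?}", counts.list());
}
```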
And you know the rest! Bump locat/Cargo.toml to 1.0.0 (since we broke the API, this is a major release), commit, push, cargo publish, and update the dependency in catscii.
The changes required in catscii fit on one line, and are left as an exercise to the reader.
Deploying again shows that the analytics database does, in fact, persist across deploys now!