We've already achieved our goals with this series: we have a web service written in Rust, built into a Docker image with nix, with a nice dev shell, that we can deploy to fly.io.

But there's always room for improvement, and so I wanted to talk about a few things we didn't bother doing in the previous chapters.

Making clash-geoip available in the dev shell

When we first ran our app locally, we signed up for a MaxMind account and had to download and unpack the "Country" GeoLite2 database manually.

But then when we packaged the app, we simply grabbed the clash-geoip package straight from nixpkgs and we were good to go!

Could we do the same for our local dev shell?

First we'll need to remove this line from .envrc:

Bash
# in .envrc
export GEOLITE2_COUNTRY_DB="db/GeoLite2-Country.mmdb"

And run direnv allow.

Then, in flake.nix:

nix
  devShells.default = mkShell
    {
      inputsFrom = [ bin ];
      # new in list: clash-geoip 👇
      buildInputs = with pkgs; [ dive docker shadow flyctl just clash-geoip ];
      # this is just a shell script, but we can refer to packages here:
      shellHook = ''
        export GEOLITE2_COUNTRY_DB=${clash-geoip}/etc/clash/Country.mmdb
      '';
    };

And after pressing Enter to let our environment reload, we get:

Shell session
$ env | grep '^GEOLITE'
GEOLITE2_COUNTRY_DB=/nix/store/l1amkbsx8mqv11b4mg98f9lyw5nkrcbv-clash-geoip-20230112/etc/clash/Country.mmdb

Nice! That means we can get rid of our locally-downloaded file in db/:

Shell session
$ rm db/GeoLite2-Country.mmdb

That'll make it easier for other folks to contribute: remember we don't want to commit this database to git, but we also don't want to force other contributors to download their own copy, or have to send it to them via some side-channel.
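
Nothing changes on the Rust side. The app still reads the database path from GEOLITE2_COUNTRY_DB, whether the dev shell's shellHook set it or the Docker image's Env did. If you'd like a missing or broken setup to fail loudly at startup, here's a minimal sketch of such a check. `geoip_db_path` is a hypothetical helper, not something catscii actually has. It takes the variable's value as an `Option` so it's easy to test, and you'd call it with `std::env::var("GEOLITE2_COUNTRY_DB").ok()`:

```rust
use std::path::PathBuf;

/// Hypothetical helper: turns the value of `GEOLITE2_COUNTRY_DB` into a path,
/// failing early with a readable message if it's missing or points nowhere.
fn geoip_db_path(var: Option<String>) -> Result<PathBuf, String> {
    let raw = var.ok_or_else(|| {
        "GEOLITE2_COUNTRY_DB is not set (is the dev shell loaded?)".to_string()
    })?;
    let path = PathBuf::from(raw);
    // `is_file` follows symlinks, so nix store paths work fine here
    if !path.is_file() {
        return Err(format!("{} does not exist or is not a file", path.display()));
    }
    Ok(path)
}
```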

Switching from openssl to rustls

I like rustls a lot. It's been audited, and it supports all the modern stuff you need to talk to browsers. At this point, the only compelling reasons to use openssl are: you need something weird, old and deprecated (probably for IoT (Internet of Things) devices that can't be updated), OR you're targeting a platform that you can't (yet) build Rust code for.

Anyway, getting rid of openssl isn't that hard. Let's see what depends on it:

Shell session
$ cargo tree --invert openssl-sys
openssl-sys v0.9.78
├── native-tls v0.2.11
│   ├── hyper-tls v0.5.0
│   │   └── reqwest v0.11.13
│   │       ├── catscii v0.1.0 (/home/amos/catscii)
│   │       ├── libhoney-rust v0.1.6 (https://github.com/ramosbugs/libhoney-rust?rev=98710516cb63d3393d26a22d5493a421c550525a#98710516)
│   │       │   └── opentelemetry-honeycomb v0.1.0 (https://github.com/fasterthanlime/opentelemetry-honeycomb-rs?branch=simplified#2a197b9b)
│   │       │       └── catscii v0.1.0 (/home/amos/catscii)
│   │       └── sentry v0.29.0
│   │           └── catscii v0.1.0 (/home/amos/catscii)
│   ├── reqwest v0.11.13 (*)
│   ├── sentry v0.29.0 (*)
│   └── tokio-native-tls v0.3.0
│       ├── hyper-tls v0.5.0 (*)
│       └── reqwest v0.11.13 (*)
└── openssl v0.10.43
    └── native-tls v0.2.11 (*)

Alright! It all comes from sentry and reqwest, and we can fix that:

TOML markup
# in `catscii/Cargo.toml`

[dependencies]
# omitted: other dependencies
reqwest = { version = "0.11", default-features = false, features = ["json", "rustls-tls-webpki-roots"] }
sentry = { version = "0.29", default-features = false, features = ["reqwest", "rustls", "backtrace", "contexts", "panic"] }

We're opting out of default features (which, for both crates, end up pulling in native-tls, which in turn pulls in openssl), and opting into TLS support via rustls.

Note that for reqwest, I've chosen the rustls-tls-webpki-roots feature, which ships its own set of CA certificates compiled into the binary. So not only can we remove openssl from our buildInputs:

nix
  #                had openssl 👇
  buildInputs = with pkgs; [ sqlite ];

...we can also remove cacert from the Docker image!

nix
  dockerImage = pkgs.dockerTools.streamLayeredImage {
    name = "catscii";
    tag = "latest";
    # 👇 had `pkgs.cacert`
    contents = [ bin ];
    config = {
      Cmd = [ "${bin}/bin/catscii" ];
      Env = with pkgs; [ "GEOLITE2_COUNTRY_DB=${clash-geoip}/etc/clash/Country.mmdb" ];
    };
  };

Using docker compose to run the service locally

I mentioned it in passing earlier, and now it's time to actually do it!

Let's move a few variables from .envrc to a new .env file:

Bash
# in `.env`

SENTRY_DSN="redacted"
HONEYCOMB_API_KEY="redacted"
CARGO_REGISTRIES_CATSCII_TOKEN="redacted"
ANALYTICS_DB="db/analytics.db"

Then, in .envrc, replace the lines we moved with dotenv .env, so that our whole .envrc file looks like this:

Bash
# in `.envrc`

#!/bin/bash

if ! has nix_direnv_version || ! nix_direnv_version 2.2.1; then
  source_url "https://raw.githubusercontent.com/nix-community/nix-direnv/2.2.1/direnvrc" "sha256-zelF0vLbEl5uaqrfIzbgNzJWGmLzCmYAkInj/LNxvKs="
fi

nix_direnv_watch_file rust-toolchain.toml
use flake

# 👇
dotenv .env
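
Both direnv's dotenv and Compose's env_file read the same simple KEY=value format. To illustrate roughly what that format amounts to, here's a deliberately simplified parser sketch using a hypothetical `parse_dotenv` function. It skips blank lines and comments, and it strips one layer of matching quotes. It ignores escapes, `export` prefixes, variable interpolation and multi-line values, which the real implementations each handle in their own, slightly different, ways:

```rust
/// Simplified `.env` parser (illustration only): returns (key, value) pairs
/// in file order.
fn parse_dotenv(contents: &str) -> Vec<(String, String)> {
    let mut vars = Vec::new();
    for line in contents.lines() {
        let line = line.trim();
        // skip blank lines and comments like "# in `.env`"
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // lines without '=' aren't assignments: skip them
        let Some((key, value)) = line.split_once('=') else {
            continue;
        };
        let value = value.trim();
        // strip one layer of matching double or single quotes
        let value = if value.len() >= 2
            && ((value.starts_with('"') && value.ends_with('"'))
                || (value.starts_with('\'') && value.ends_with('\'')))
        {
            &value[1..value.len() - 1]
        } else {
            value
        };
        vars.push((key.trim().to_string(), value.to_string()));
    }
    vars
}
```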

Now, we can set up a simple docker-compose.yml file that will run our app locally:

YAML
# in `docker-compose.yml`

services:
  catscii:
    image: "catscii:latest"
    env_file: ".env"
    ports:
      - "8080:8080"
    stop_signal: SIGINT

And now (after building the image), all we need to run it is:

Shell session
$ docker compose up

By default, it'll stay in the foreground and show the output of the service:

Shell session
$ docker compose up
[+] Running 1/0
 ⠿ Container catscii-catscii-1  Created                0.0s
Attaching to catscii-catscii-1
catscii-catscii-1  | /nix/store/561wgc73s0x1250hrgp7jm22hhv7yfln-bash-5.2-p15/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
catscii-catscii-1  | {"timestamp":"2023-02-23T19:57:35.590683Z","level":"INFO","fields":{"message":"Creating honey client","log.target":"libhoney::client","log.module_path":"libhoney::client","log.file":"/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/2b916d13eeef5c4c978dd7cdbe0195cb3a2cb3e77889acf9f85433ddf74a8e14/libhoney-rust-0.1.6/src/client.rs","log.line":78},"target":"libhoney::client"}
catscii-catscii-1  | {"timestamp":"2023-02-23T19:57:35.590761Z","level":"INFO","fields":{"message":"transmission starting","log.target":"libhoney::transmission","log.module_path":"libhoney::transmission","log.file":"/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/2b916d13eeef5c4c978dd7cdbe0195cb3a2cb3e77889acf9f85433ddf74a8e14/libhoney-rust-0.1.6/src/transmission.rs","log.line":124},"target":"libhoney::transmission"}
catscii-catscii-1  | {"timestamp":"2023-02-23T19:57:35.596155Z","level":"INFO","fields":{"message":"Listening on 0.0.0.0:8080"},"target":"catscii"}

Hitting Ctrl-C sends the stop signal, which we configured to be SIGINT (the default is SIGTERM). That initiates a graceful shutdown of our server, so everything works fine!

Apart from not having to remember the correct docker run incantation, Compose files come in particularly handy when... composing services.

Say we needed an OpenTelemetry collector, a Postgres database, or perhaps an S3-compatible object store like minio: we could just list them there!

Compose files also let us define networks, volumes, etc.

Persisting the analytics database

SPEAKING OF VOLUMES. Do you know what we forgot, bear?

?

Your silence speaks volumes.

You did not just make that joke.

We forgot to persist the analytics database! Right now, both in development and in production, our database lives... in the docker image, kind of. Well, it lives in the read-write overlay on top of our docker image, or whatever the actual implementation details are.

Point is, every time the container is recreated (for example, after we rebuild the image), the database is lost. Same if we re-deploy the service to fly.io.

Now that we have a docker-compose.yml file, it's easy to fix locally!

First, let's remove ANALYTICS_DB from .env, because it feels like it no longer belongs there. Instead, we can set it directly from where we define the volume:

YAML
services:
  catscii:
    image: "catscii:latest"

    # 👇 new!
    volumes:
      - catscii-db:/db
    environment:
      ANALYTICS_DB: /db/analytics.db

    # (old)
    env_file: ".env"
    ports:
      - "8080:8080"
    stop_signal: SIGINT

# 👇 new again!
volumes:
  catscii-db:

And that takes care of our local environment! Not that it does any real geo-ip lookups anyway: locally, we only ever see private IP addresses, so this is really just for the exercise.
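
Private and loopback ranges aren't tied to any country, which is why local lookups come up empty. Here's a std-only sketch of that classification using a hypothetical `is_local_address` helper, which is not part of locat. The IPv6 ranges are checked by hand so the sketch doesn't depend on recent std additions:

```rust
use std::net::IpAddr;

/// Hypothetical helper: returns true for addresses that won't resolve to a
/// country (loopback, private, link-local, unspecified, unique-local).
fn is_local_address(addr: IpAddr) -> bool {
    match addr {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            // IPv4-mapped addresses (::ffff:a.b.c.d): check the IPv4 part
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_local_address(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                // fc00::/7: unique local addresses (IPv6's "private" range)
                || (first & 0xfe00) == 0xfc00
                // fe80::/10: link-local unicast
                || (first & 0xffc0) == 0xfe80
        }
    }
}
```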

As for our production setup: if you really don't want to read the fly.io docs, I'll read them for you.

First, we create a volume. Let's say 1 GB; I'll provision mine in Paris (CDG airport):

Shell session
$ fly vol create catscii_db --size 1 --region cdg
Update available 0.0.456 -> v0.0.463.
Run "flyctl version update" to upgrade.
        ID: vol_zmjnv8m3dkyrywgx
      Name: catscii_db
       App: old-frost-6294
    Region: cdg
      Zone: 0e8c
   Size GB: 1
 Encrypted: true
Created at: 23 Feb 23 20:20 UTC

Note that I'm running this in the catscii/ directory, so flyctl picks up the app name (old-frost-6294 above) from fly.toml.

And then we politely ask fly to mount the volume at /db:

TOML markup
# in `fly.toml`

[mounts]
source = "catscii_db"
destination = "/db"

[env]
# don't forget to update this 👇
ANALYTICS_DB = "/db/analytics.db"

And then we deploy!

Shell session
$ just deploy
(cut)

And I guess we'll find out if it truly does persist the next time we deploy a change.

Using sqlite asynchronously

There are a bunch of good options for using sqlite asynchronously from Rust.

Well... not truly asynchronously. Async I/O in Rust relies on having an async runtime that probably uses something like epoll under the hood to subscribe to events and only issue write / read calls when it knows they won't block - that's what tokio does for us, underpinning the axum web framework we're using.

But sqlite is a big ball of C code with its own concept of I/O. In fact, it even has an abstraction over it: virtual filesystems (VFS).

So there's not really a way to plug "async Rust I/O" into sqlite.

We can, however, do the next best thing: run sqlite operations on a separate thread!

Some higher-level frameworks handle that directly, but I'm happy to stick with something rusqlite-shaped for now, so let's just reach for tokio-rusqlite.
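
To make the idea concrete, here's a std-only sketch of the pattern. This is not tokio-rusqlite's actual code. A hypothetical `Worker` owns some state on a dedicated thread, and callers send it `'static` closures over a channel, getting each result back on a reply channel. tokio-rusqlite takes roughly this approach with a rusqlite connection. However, its `call` is async, so waiting for the reply doesn't block the runtime. In this simplified sketch, `call` blocks the calling thread:

```rust
use std::sync::mpsc;
use std::thread;

// A boxed job that gets exclusive access to the worker's state.
type Job<S> = Box<dyn FnOnce(&mut S) + Send + 'static>;

/// Owns an `S` on a dedicated background thread; `call` ships closures to it.
struct Worker<S> {
    sender: mpsc::Sender<Job<S>>,
}

impl<S: 'static> Worker<S> {
    /// `init` runs on the background thread, so `S` itself needn't be `Send`.
    fn start<I>(init: I) -> Self
    where
        I: FnOnce() -> S + Send + 'static,
    {
        let (sender, receiver) = mpsc::channel::<Job<S>>();
        thread::spawn(move || {
            let mut state = init();
            // runs jobs one at a time, until every sender is dropped
            for job in receiver {
                job(&mut state);
            }
        });
        Self { sender }
    }

    /// Runs `f` on the background thread and waits for its result. The
    /// closure must be `'static`: it can't borrow from the caller, so pass
    /// owned values (e.g. `iso_code.to_owned()`).
    fn call<F, R>(&self, f: F) -> R
    where
        F: FnOnce(&mut S) -> R + Send + 'static,
        R: Send + 'static,
    {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.sender
            .send(Box::new(move |state: &mut S| {
                // the caller may have gone away; ignore send errors
                let _ = reply_tx.send(f(state));
            }))
            .expect("worker thread is gone");
        reply_rx.recv().expect("worker thread panicked")
    }
}
```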

This is happening in locat, so:

Shell session
$ cd locat/

$ cargo add tokio-rusqlite
Command 'cargo' not found, but can be installed with:
sudo snap install rustup  # version 1.24.3, or
sudo apt  install cargo   # version 0.60.0ubuntu1-0ubuntu3
See 'snap info rustup' for additional versions.

Oh, right.

Well... I guess we can make a tiny flake.

nix
# in `locat/flake.nix`

{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    flake-utils.url = "github:numtide/flake-utils";
    rust-overlay = {
      url = "github:oxalica/rust-overlay";
      inputs = {
        nixpkgs.follows = "nixpkgs";
        flake-utils.follows = "flake-utils";
      };
    };
  };
  outputs = { self, nixpkgs, flake-utils, rust-overlay }:
    flake-utils.lib.eachDefaultSystem
      (system:
        let
          overlays = [ (import rust-overlay) ];
          pkgs = import nixpkgs {
            inherit system overlays;
          };
        in
        with pkgs; {
          devShells.default = mkShell {
            buildInputs = with pkgs; [ rust-bin.nightly.latest.default sqlite ];
          };
        }
      );
}

And a tiny .envrc file:

Bash
#!/bin/bash

# in `locat/.envrc`

if ! has nix_direnv_version || ! nix_direnv_version 2.2.1; then
  source_url "https://raw.githubusercontent.com/nix-community/nix-direnv/2.2.1/direnvrc" "sha256-zelF0vLbEl5uaqrfIzbgNzJWGmLzCmYAkInj/LNxvKs="
fi

use flake

It's all still a bit verbose, but at least here we don't have a rust-toolchain.toml file to worry about.

Now!

Shell session
$ direnv allow
(cut)

$ cargo add tokio-rusqlite
    Updating crates.io index
      Adding tokio-rusqlite v0.3.0 to dependencies.

That's more like it.

Let's make sure we don't end up with conflicting versions of rusqlite:

Note that cargo add tokio-rusqlite will currently conflict with rusqlite 0.28. Upgrading rusqlite to version 0.29 resolves the conflict.

Shell session
$ cargo tree --invert rusqlite
rusqlite v0.29.0
├── locat v0.4.0 (/home/amos/locat)
└── tokio-rusqlite v0.3.0
    └── locat v0.4.0 (/home/amos/locat)

Nope, looks good!

Let's also add tokio explicitly, since we'll need it to get rid of some other blocking operations:

Shell session
$ cargo add tokio --features fs,macros,test-util
(cut)

Alright, pay attention to the comments now:

Rust code
// in `locat/src/lib.rs`

use std::net::IpAddr;

// We're using tokio-rusqlite's own Connection type now
use tokio_rusqlite::Connection;

/// Allows geo-locating IPs and keeps analytics
pub struct Locat {
    reader: maxminddb::Reader<Vec<u8>>,
    analytics: Db,
}

#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("maxminddb error: {0}")]
    MaxMindDb(#[from] maxminddb::MaxMindDBError),

    // this can happen while reading the geoip db from disk
    #[error("io error: {0}")]
    Io(#[from] std::io::Error),

    #[error("rusqlite error: {0}")]
    Rusqlite(#[from] rusqlite::Error),
}

impl Locat {
    // this is async now
    pub async fn new(geoip_country_db_path: &str, analytics_db_path: &str) -> Result<Self, Error> {
        // read the geoip database into memory asynchronously. previously
        // we used a synchronous method off of `maxminddb::Reader` directly.
        let geoip_data = tokio::fs::read(geoip_country_db_path).await?;

        Ok(Self {
            // this is all in-memory, no I/O involved here
            reader: maxminddb::Reader::from_source(geoip_data)?,
            analytics: Db::open(analytics_db_path).await?,
        })
    }

    /// Converts an address to an ISO 3166-1 alpha-2 country code
    pub async fn ip_to_iso_code(&self, addr: IpAddr) -> Option<&str> {
        let iso_code = self
            .reader
            .lookup::<maxminddb::geoip2::Country>(addr)
            .ok()?
            .country?
            .iso_code?;

        if let Err(e) = self.analytics.increment(iso_code).await {
            eprintln!("Could not increment analytics: {e}");
        }

        Some(iso_code)
    }

    /// Returns a map of country codes to number of requests
    pub async fn get_analytics(&self) -> Result<Vec<(String, u64)>, Error> {
        Ok(self.analytics.list().await?)
    }
}

struct Db {
    // this used to store a path, now it stores a connection
    conn: Connection,
}

impl Db {
    async fn open(path: &str) -> Result<Self, rusqlite::Error> {
        // open and migrate the database in a non-blocking way
        let conn = Connection::open(path).await?;
        // this is how operations are run off the async runtime: we pass a
        // closure, which tokio-rusqlite runs on the connection's dedicated
        // background thread. note that it must be `'static`, so we can't
        // borrow anything from the outside: owned types only.
        conn.call(|conn| {
            // create analytics table
            conn.execute(
                "CREATE TABLE IF NOT EXISTS analytics (
                iso_code TEXT PRIMARY KEY,
                count INTEGER NOT NULL
            )",
                [],
            )?;

            Ok::<_, rusqlite::Error>(())
        })
        .await?;

        Ok(Self { conn })
    }

    async fn list(&self) -> Result<Vec<(String, u64)>, rusqlite::Error> {
        self.conn
            .call(|conn| {
                let mut stmt = conn.prepare("SELECT iso_code, count FROM analytics")?;
                let mut rows = stmt.query([])?;
                let mut analytics = Vec::new();
                while let Some(row) = rows.next()? {
                    let iso_code: String = row.get(0)?;
                    let count: u64 = row.get(1)?;
                    analytics.push((iso_code, count));
                }
                Ok(analytics)
            })
            .await
    }

    async fn increment(&self, iso_code: &str) -> Result<(), rusqlite::Error> {
        // we have to use `iso_code` from within the closure and the closure
        // must be 'static, so:
        let iso_code = iso_code.to_owned();

        self.conn
            .call(|conn| {
                let mut stmt = conn.prepare(
                    "INSERT INTO analytics (iso_code, count) VALUES (?, 1) \
                     ON CONFLICT (iso_code) DO UPDATE SET count = count + 1",
                )?;
                stmt.execute([iso_code])?;
                Ok(())
            })
            .await
    }
}

#[cfg(test)]
mod tests {
    use crate::Db;

    struct RemoveOnDrop {
        path: &'static str,
    }

    impl Drop for RemoveOnDrop {
        fn drop(&mut self) {
            _ = std::fs::remove_file(self.path);
        }
    }

    // this test needs an async runtime now, hence, `tokio::test`
    #[tokio::test]
    async fn test_db() {
        let path = "/tmp/loca-test.db";
        let db = Db::open(path).await.unwrap();

        let _remove_on_drop = RemoveOnDrop { path };

        let analytics = db.list().await.unwrap();
        assert_eq!(analytics.len(), 0);

        db.increment("US").await.unwrap();
        let analytics = db.list().await.unwrap();
        assert_eq!(analytics.len(), 1);

        db.increment("US").await.unwrap();
        db.increment("FR").await.unwrap();
        let analytics = db.list().await.unwrap();
        assert_eq!(analytics.len(), 2);
        // contains US at count 2
        assert!(analytics.contains(&("US".to_string(), 2)));
        // contains FR at count 1
        assert!(analytics.contains(&("FR".to_string(), 1)));
        // doesn't contain DE
        assert!(!analytics.contains(&("DE".to_string(), 0)));
    }
}
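
The SQL in `increment` is an upsert. The first time a country code shows up, a row is inserted with a count of 1. On later hits, the ON CONFLICT (iso_code) clause bumps the existing row's count instead. Purely to illustrate those semantics (this is not how locat stores anything), here's the same logic modeled on an in-memory `HashMap`. The `increment`/`list` functions are hypothetical and mirror `Db`'s methods, and the checks mirror the `test_db` test:

```rust
use std::collections::HashMap;

/// In-memory model of the `analytics` table: iso_code -> count.
type Analytics = HashMap<String, u64>;

/// Same semantics as the upsert: insert with count 1, or add 1 on conflict.
fn increment(analytics: &mut Analytics, iso_code: &str) {
    *analytics.entry(iso_code.to_owned()).or_insert(0) += 1;
}

/// Same shape as `Db::list`: (iso_code, count) pairs, in no particular order.
fn list(analytics: &Analytics) -> Vec<(String, u64)> {
    analytics
        .iter()
        .map(|(code, count)| (code.clone(), *count))
        .collect()
}
```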

And you know the rest! Bump locat/Cargo.toml to 1.0.0 (since we broke the API, this is a major release), commit, push, cargo publish, and update the dependency in catscii.

The changes required in catscii fit on one line, and are left as an exercise to the reader.

Deploying again shows that the analytics database does, in fact, persist across deploys now!