I sold you on some additional functionality for catscii last chapter, and we got caught up in private registry / Docker shenanigans, so now, let's resume web development as promised.

Adding geolocation

We kinda left the locat crate stubby: it doesn't actually do any IP-to-location lookups. It doesn't even have a dependency on a crate that can do that.

So, let's do that. We'll use the maxminddb crate, which can read the MaxMind DB format that databases like GeoIP2 and GeoLite2 ship in:

Shell session
$ cd locat/
$ cargo add maxminddb@0.23
    Updating crates.io index
      Adding maxminddb v0.23 to dependencies.
             Features as of v0.23.0:
             - memmap2
             - mmap
             - unsafe-str-decode

Because we're going to do things that can fail, we'll also want an error crate like thiserror:

Shell session
$ cargo add thiserror@1
    Updating crates.io index
      Adding thiserror v1.0.38 to dependencies.

And now locat's src/lib.rs becomes:

Rust code
use std::net::IpAddr;

/// Allows geo-locating IPs and keeps analytics
pub struct Locat {
    geoip: maxminddb::Reader<Vec<u8>>,
}

#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("maxminddb error: {0}")]
    MaxMindDb(#[from] maxminddb::MaxMindDBError),
}

impl Locat {
    pub fn new(geoip_country_db_path: &str, _analytics_db_path: &str) -> Result<Self, Error> {
        // TODO: create analytics db

        Ok(Self {
            geoip: maxminddb::Reader::open_readfile(geoip_country_db_path)?,
        })
    }

    /// Converts an address to an ISO 3166-1 alpha-2 country code
    pub async fn ip_to_iso_code(&self, addr: IpAddr) -> Option<&str> {
        self.geoip
            .lookup::<maxminddb::geoip2::Country>(addr)
            .ok()?
            .country?
            .iso_code
    }

    /// Returns a map of country codes to number of requests
    pub async fn get_analytics(&self) -> Vec<(String, u64)> {
        Default::default()
    }
}

We can then bump it to 0.3.0 in Cargo.toml:

TOML markup
[package]
name = "locat"
# 👇 bumped!
version = "0.3.0"
edition = "2021"
publish = ["catscii"]

[dependencies]
maxminddb = "0.23"
thiserror = "1.0.38"

And publish it:

Shell session
$ cargo publish
(cut)
    Finished dev [unoptimized + debuginfo] target(s) in 13.69s
   Uploading locat v0.3.0 (/home/amos/locat)
    Updating `catscii` index

Now, we can bump the dependency in catscii/Cargo.toml from 0.2.0 to 0.3.0, do cargo run again, and...

Shell session
$ cargo run
    Blocking waiting for file lock on package cache
    Updating `catscii` index
    Updating crates.io index
error: failed to select a version for `thiserror`.
    ... required by package `opentelemetry-honeycomb v0.1.0 (https://github.com/fasterthanlime/opentelemetry-honeycomb-rs?branch=simplified#2a197b9b)`
    ... which satisfies git dependency `opentelemetry-honeycomb` (locked to 0.1.0) of package `catscii v0.1.0 (/home/amos/catscii)`
versions that meet the requirements `^1.0` (locked to 1.0.37) are: 1.0.37

all possible versions conflict with previously selected packages.

  previously selected package `thiserror v1.0.38`
    ... which satisfies dependency `thiserror = "^1.0.38"` of package `locat v0.3.0 (registry `catscii`)`
    ... which satisfies dependency `locat = "^0.3.0"` of package `catscii v0.1.0 (/home/amos/catscii)`

failed to select a version for `thiserror` which could resolve this conflict

Ah.

The Cargo.toml for opentelemetry-honeycomb-rs only specifies "1.0", which does cover 1.0.38... so the problem must be the Cargo.lock in catscii, which has thiserror locked to 1.0.37. We can fix that directly with:

Shell session
$ cargo update --package thiserror
    Updating crates.io index
    Updating `catscii` index
      Adding ipnetwork v0.18.0
    Updating locat v0.2.0 (registry `catscii`) -> v0.3.0
      Adding maxminddb v0.23.0
    Updating thiserror v1.0.37 -> v1.0.38
    Updating thiserror-impl v1.0.37 -> v1.0.38

Anyway, as I was saying: cargo run:

Shell session
$ cargo run
(cut)
error[E0308]: mismatched types
   --> src/main.rs:56:25
    |
56  |         locat: Arc::new(Locat::new("todo_geoip_path.mmdb", "todo_analytics.db")),
    |                -------- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `Locat`, found enum `Result`
    |                |
    |                arguments to this function are incorrect
    |
    = note: expected struct `Locat`
                 found enum `Result<Locat, locat::Error>`
note: associated function defined here
   --> /home/amos/.rustup/toolchains/nightly-2022-12-24-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:361:12
    |
361 |     pub fn new(data: T) -> Arc<T> {
    |            ^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `catscii` due to previous error

Ah right, we changed the API - for now let's add an .unwrap() after Locat::new() and try again:

Rust code
    // in `catscii/src/main.rs`, in async fn main()
    let state = ServerState {
        client: Default::default(),
        locat: Arc::new(Locat::new("todo_geoip_path.mmdb", "todo_analytics.db").unwrap()),
    };
Shell session
$ cargo run
   Compiling catscii v0.1.0 (/home/amos/catscii)
    Finished dev [unoptimized + debuginfo] target(s) in 7.68s
     Running `target/debug/catscii`
{"timestamp":"2023-02-14T16:22:11.271524Z","level":"INFO","fields":{"message":"Creating honey client","log.target":"libhoney::client","log.module_path":"libhoney::client","log.file":"/home/amos/.cargo/git/checkouts/libhoney-rust-3b5a30a6076b74c0/9871051/src/client.rs","log.line":78},"target":"libhoney::client"}
{"timestamp":"2023-02-14T16:22:11.271660Z","level":"INFO","fields":{"message":"transmission starting","log.target":"libhoney::transmission","log.module_path":"libhoney::transmission","log.file":"/home/amos/.cargo/git/checkouts/libhoney-rust-3b5a30a6076b74c0/9871051/src/transmission.rs","log.line":124},"target":"libhoney::transmission"}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: MaxMindDb(IoError("No such file or directory (os error 2)"))', src/main.rs:56:81
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Ah, yes, of course. We need to actually have a GeoIP database to read.

Note that because we called .unwrap(), if you set up Sentry like I did, you should've gotten an e-mail about "oh no the application is panicking" (but in development), which is nice.

So, on MaxMind's website, we can sign up and download the GeoLite2 databases. They used to be accessible under a more liberal license but, I guess, they realized what they had and decided to monetize it.

They're not the only game in town: ipgeolocation, for example, sells their "IP to Country" database for a cool $1K per year, per server! Don't scale out, I guess.

So yeah, ok MaxMind, you can have my e-mail address.

Cool bear's hot tip

As an alternative, ipinfo.io offers an IP to Country DB under a CC-BY-4.0 license, and they come in MMDB format, too!

We'll want the "GeoLite2 Country" database in .mmdb format. It comes .tar.gzipped, but once unpacked, we get a single .mmdb file:

Shell session
$ cd catscii/
$ mkdir db
$ cd db/
$ # (download the .tar.gz file here)
$ tar --preserve-permissions --extract --file GeoLite2-Country_20230214.tar.gz 
$ ls GeoLite2-Country_20230214/
COPYRIGHT.txt          GeoLite2-Country.mmdb  LICENSE.txt            
$ mv GeoLite2-Country_20230214/GeoLite2-Country.mmdb .
$ rm GeoLite2-Country_* -rf
$ ls -l --human-readable --almost-all
total 5.5M
-rw-r--r-- 1 amos amos 5.5M Feb 14 00:07 GeoLite2-Country.mmdb

So, yay, we have it! It's reasonably small, too.

We might not want to commit that to Git, since it's a big binary blob, so we can add it to our .gitignore:

# in .gitignore
/target
# 👇 new!
/db

Now, we can stick with our strategy of configuring our service through environment variables, and set it in .envrc:

# in .envrc
# (omitted: SENTRY_DSN, HONEYCOMB_API_KEY, CARGO_REGISTRIES_CATSCII_TOKEN, etc.)
export GEOLITE2_COUNTRY_DB="db/GeoLite2-Country.mmdb"

Don't forget to run direnv allow afterwards, to apply the changes.

And then, we read that environment variable at startup:

Rust code
    // in `catscii/src/main.rs`, in `async fn main()`
    let country_db_env_var = "GEOLITE2_COUNTRY_DB";
    let country_db_path = std::env::var(country_db_env_var)
        .unwrap_or_else(|_| panic!("${country_db_env_var} must be set"));
    let state = ServerState {
        client: Default::default(),
        locat: Arc::new(Locat::new(&country_db_path, "todo_analytics.db").unwrap()),
    };

And finally, cargo run runs.

But this isn't very interesting! The only IP address the development application ever sees is 127.0.0.1, or ::1.
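That said, if you want to sanity-check the lookup itself before deploying, here's a quick test you could drop into locat. This is a sketch: it assumes tokio (with the macros and rt features) as a dev-dependency, and that 8.8.8.8 still maps to "US" in your copy of the database (the data changes over time):

Rust code
// in `locat/src/lib.rs`, in a `#[cfg(test)] mod tests` block (a sketch;
// assumes `tokio` as a dev-dependency, and adjust the database path to
// wherever your copy actually lives)
#[cfg(test)]
mod tests {
    #[tokio::test]
    async fn test_ip_to_iso_code() {
        let locat = crate::Locat::new(
            "../catscii/db/GeoLite2-Country.mmdb",
            "/tmp/locat-test-analytics.db",
        )
        .unwrap();
        // 8.8.8.8 is Google's public DNS; it resolved to "US" in the copy
        // of the database used while writing this (yours may differ)
        let addr: std::net::IpAddr = "8.8.8.8".parse().unwrap();
        assert_eq!(locat.ip_to_iso_code(addr).await, Some("US"));
    }
}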

So let's try and deploy this:

Shell session
$ just de

Wait, wait, no!

Mh?

Just trying to save some trial-and-error here. Is the database file baked into the Docker image?

No, that's what I was trying to show here, and how it's going to be different when we move to nix.

Well let's just copy it in there, shall we?

Okay, sure. So in the Dockerfile, we can add:

Dockerfile
# Copy Geolite2 database
RUN mkdir /db 
COPY ./db/GeoLite2-Country.mmdb /db/

...right before the final CMD. Then we can set this in fly.toml:

TOML markup
# that section existed, but it was empty
[env]
GEOLITE2_COUNTRY_DB = "/db/GeoLite2-Country.mmdb"

# omitted: everything else

Then we can run just deploy, visit our app, and in the logs, we see:

2023-02-14T17:10:04Z app[ff4c6095] cdg [info]{"timestamp":"2023-02-14T17:10:04.736583Z","level":"INFO","fields":{"message":"Got request from FR"},"target":"catscii"}

Hurray! (You gotta scroll right a little: it says "Got request from FR".) I am indeed in France; the system works.
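For reference, the code producing that log line looks roughly like this. It's a sketch: on fly.io the original client address arrives in the fly-client-ip header, but the exact wiring and error handling in catscii were set up in a previous chapter:

Rust code
// in `catscii/src/main.rs` (sketch): fly.io forwards the original client
// address in the `fly-client-ip` header
use axum::http::HeaderMap;
use std::net::IpAddr;

fn get_client_addr(headers: &HeaderMap) -> Option<IpAddr> {
    headers.get("fly-client-ip")?.to_str().ok()?.parse().ok()
}

// ...and then, in the `/` handler, something like:
//
//     if let Some(addr) = get_client_addr(&headers) {
//         if let Some(country) = state.locat.ip_to_iso_code(addr).await {
//             info!("Got request from {country}");
//         }
//     }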

Adding analytics

Now, it would be fun if we kept track of how many visits we get per country.

This is pretty coarse-grained (we only keep country-level counts), so I think we're not going to run afoul of any privacy laws, but it's still entertaining to look at: if you share the link to your app with your online colleagues or friends, you might be surprised how far it travels!

We'll do that with an sqlite database, which is complete overkill, but all of this is still preparation for when we'll need to ship everything with nix.

This is also functionality that belongs in locat, so let's add one more dependency to it:

Shell session
$ cargo add rusqlite@0.28
(cut)

So, let's take a first stab at this:

Rust code
// in `locat/src/lib.rs`

use std::net::IpAddr;

/// Allows geo-locating IPs and keeps analytics
pub struct Locat {
    reader: maxminddb::Reader<Vec<u8>>,
    analytics: Db,
}

#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("maxminddb error: {0}")]
    MaxMindDb(#[from] maxminddb::MaxMindDBError),

    #[error("rusqlite error: {0}")]
    Rusqlite(#[from] rusqlite::Error),
}

impl Locat {
    pub fn new(geoip_country_db_path: &str, analytics_db_path: &str) -> Result<Self, Error> {
        Ok(Self {
            reader: maxminddb::Reader::open_readfile(geoip_country_db_path)?,
            analytics: Db {
                path: analytics_db_path.to_string(),
            },
        })
    }

    /// Converts an address to an ISO 3166-1 alpha-2 country code
    pub async fn ip_to_iso_code(&self, addr: IpAddr) -> Option<&str> {
        let iso_code = self
            .reader
            .lookup::<maxminddb::geoip2::Country>(addr)
            .ok()?
            .country?
            .iso_code?;

        if let Err(e) = self.analytics.increment(iso_code) {
            eprintln!("Could not increment analytics: {e}");
        }

        Some(iso_code)
    }

    /// Returns a map of country codes to number of requests
    pub async fn get_analytics(&self) -> Result<Vec<(String, u64)>, Error> {
        Ok(self.analytics.list()?)
    }
}

struct Db {
    path: String,
}

impl Db {
    fn list(&self) -> Result<Vec<(String, u64)>, rusqlite::Error> {
        let conn = self.get_conn()?;
        let mut stmt = conn.prepare("SELECT iso_code, count FROM analytics")?;
        let mut rows = stmt.query([])?;
        let mut analytics = Vec::new();
        while let Some(row) = rows.next()? {
            let iso_code: String = row.get(0)?;
            let count: u64 = row.get(1)?;
            analytics.push((iso_code, count));
        }
        Ok(analytics)
    }

    fn increment(&self, iso_code: &str) -> Result<(), rusqlite::Error> {
        let conn = self.get_conn()?;
        let mut stmt = conn
            .prepare("INSERT INTO analytics (iso_code, count) VALUES (?, 1) ON CONFLICT (iso_code) DO UPDATE SET count = count + 1")
            ?;
        stmt.execute([iso_code])?;
        Ok(())
    }

    fn get_conn(&self) -> Result<rusqlite::Connection, rusqlite::Error> {
        let conn = rusqlite::Connection::open(&self.path)?;
        self.migrate(&conn)?;
        Ok(conn)
    }

    fn migrate(&self, conn: &rusqlite::Connection) -> Result<(), rusqlite::Error> {
        // create analytics table
        conn.execute(
            "CREATE TABLE IF NOT EXISTS analytics (
                iso_code TEXT PRIMARY KEY,
                count INTEGER NOT NULL
            )",
            [],
        )?;
        Ok(())
    }
}

#[cfg(test)]
mod tests {
    use crate::Db;

    struct RemoveOnDrop {
        path: String,
    }

    impl Drop for RemoveOnDrop {
        fn drop(&mut self) {
            _ = std::fs::remove_file(&self.path);
        }
    }

    #[test]
    fn test_db() {
        let db = Db {
            path: "/tmp/locat-test.db".to_string(),
        };
        let _remove_on_drop = RemoveOnDrop {
            path: db.path.clone(),
        };

        let analytics = db.list().unwrap();
        assert_eq!(analytics.len(), 0);

        db.increment("US").unwrap();
        let analytics = db.list().unwrap();
        assert_eq!(analytics.len(), 1);

        db.increment("US").unwrap();
        db.increment("FR").unwrap();
        let analytics = db.list().unwrap();
        assert_eq!(analytics.len(), 2);
        // contains US at count 2
        assert!(analytics.contains(&("US".to_string(), 2)));
        // contains FR at count 1
        assert!(analytics.contains(&("FR".to_string(), 1)));
        // doesn't contain DE
        assert!(!analytics.contains(&("DE".to_string(), 0)));
    }
}

This code is, of course, awful. That's because it was mostly generated through GitHub Copilot. We'll have plenty of time to roast, review, and improve it later.

Off the top of my head we have: a brand-new SQLite connection opened (and the migration re-run) for every single query, blocking rusqlite calls made from async functions, and analytics errors swallowed with an eprintln! instead of being reported properly.

Don't think of it as terribly awful code, think of it as a lot of room for improvement later — let's focus on shipping this to hit our KPIs and OKRs and other three-letter acronyms.
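Fair. For the curious, though, here's a minimal sketch of what fixing just the first item could look like: open (and migrate) the connection once, then share it behind a Mutex. This is hypothetical, not what we're shipping in this chapter:

Rust code
// hypothetical cleanup for `locat/src/lib.rs`: one connection, one migration
use std::sync::Mutex;

struct Db {
    conn: Mutex<rusqlite::Connection>,
}

impl Db {
    fn open(path: &str) -> Result<Self, rusqlite::Error> {
        let conn = rusqlite::Connection::open(path)?;
        // run the migration once, at open time, instead of on every query
        conn.execute(
            "CREATE TABLE IF NOT EXISTS analytics (
                iso_code TEXT PRIMARY KEY,
                count INTEGER NOT NULL
            )",
            [],
        )?;
        Ok(Self {
            conn: Mutex::new(conn),
        })
    }

    fn increment(&self, iso_code: &str) -> Result<(), rusqlite::Error> {
        // `lock()` only errors if another thread panicked while holding it
        let conn = self.conn.lock().unwrap();
        conn.execute(
            "INSERT INTO analytics (iso_code, count) VALUES (?, 1) ON CONFLICT (iso_code) DO UPDATE SET count = count + 1",
            [iso_code],
        )?;
        Ok(())
    }
}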

I've written some tests, so, we should run them:

Shell session
$ cargo test
   Compiling locat v0.3.0 (/home/amos/locat)
error: linking with `cc` failed: exit status: 1
  |
  = note: "cc" "-m64" "/tmp/rustcwAryeN/symbols.o" "/home/amos/locat/target/debug/deps/locat-925e8cba729664ee.13qpde65w7t2xobj.rcgu.o" (cut) "-Wl,--gc-sections" "-pie" "-Wl,-zrelro,-znow" "-nodefaultlibs"
  = note: /usr/bin/ld: cannot find -lsqlite3: No such file or directory
          collect2: error: ld returned 1 exit status
          

error: could not compile `locat` due to previous error
warning: build failed, waiting for other jobs to finish...

Ah, we don't have sqlite installed! It's a C library, so we must get it from somewhere. The rusqlite crate has a bundled cargo feature that compiles sqlite from source and links it statically, which would make everything work easily, but I've specifically chosen sqlite to show off native dependencies.
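For reference, opting into it would be a one-line change in locat's Cargo.toml, a road we're deliberately not taking:

TOML markup
# in `locat/Cargo.toml`: `bundled` builds sqlite3 from source and links it
# statically, removing the system dependency (we're not doing this)
rusqlite = { version = "0.28", features = ["bundled"] }

So let's keep linking against sqlite dynamically and, in development, just install it on our Ubuntu VM: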

Shell session
$ apt-cache search '^libsqlite'
libsqlite3-0 - SQLite 3 shared library
libsqlite3-dev - SQLite 3 development files
libsqlite3-mod-ceph - SQLite3 VFS for Ceph
libsqlite3-mod-ceph-dev - SQLite3 VFS for Ceph (development files)
libsqlite-tcl - SQLite 2 Tcl bindings
libsqlite0 - SQLite 2 shared library
libsqlite0-dev - SQLite 2 development files
libsqlite3-gst - SQLite bindings for GNU Smalltalk
libsqlite3-mod-blobtoxy - SQLite3 extension module for read-only BLOB to X/Y mapping
libsqlite3-mod-csvtable - SQLite3 extension module for read-only access to CSV files
libsqlite3-mod-impexp - SQLite3 extension module for SQL script, XML, JSON and CSV import/export
libsqlite3-mod-rasterlite2 - SQLite 3 module for huge raster coverages
libsqlite3-mod-spatialite - Geospatial extension for SQLite - loadable module
libsqlite3-mod-virtualpg - Loadable dynamic extension to both SQLite and SpatiaLite
libsqlite3-mod-xpath - SQLite3 extension module for querying XML data with XPath
libsqlite3-mod-zipfile - SQLite3 extension module for read-only access to ZIP files
libsqlite3-ocaml - Embeddable SQL Database for OCaml Programs (runtime)
libsqlite3-ocaml-dev - Embeddable SQL Database for OCaml Programs (development)
libsqlite3-tcl - SQLite 3 Tcl bindings
libsqliteodbc - ODBC driver for SQLite embedded database

Ah. I can never quite remember the names of Ubuntu packages but, luckily, apt-cache search accepts regular expressions! Isn't that nice.

So:

Shell session
$ sudo apt install libsqlite3-dev
[sudo] password for amos: 
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Suggested packages:
  sqlite3-doc
The following NEW packages will be installed:
  libsqlite3-dev
0 upgraded, 1 newly installed, 0 to remove and 46 not upgraded.
Need to get 846 kB of archives.
(cut)

And then:

Shell session
$ cargo test
   Compiling locat v0.3.0 (/home/amos/locat)
    Finished test [unoptimized + debuginfo] target(s) in 0.35s
     Running unittests src/lib.rs (target/debug/deps/locat-925e8cba729664ee)

running 1 test
test tests::test_db ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

   Doc-tests locat

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Looks good! I'm always suspicious of tests passing the first time, so I messed with the test code to make sure it could fail (and was being run at all), and yep, everything seems fine.
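(If you want to reproduce that sanity check, it's as simple as temporarily breaking an assertion and watching cargo test go red, for instance:)

Rust code
// in `test_db`, temporarily: if this still passes, the test isn't running
assert_eq!(analytics.len(), 999); // was 1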

Well then! Time to publish a new version. You know what to do: bump package.version in locat's Cargo.toml, run cargo publish, then bump the dependency in catscii's Cargo.toml, and... let's also add environment variables for the analytics DB everywhere.

In the code:

Rust code
    // in `catscii/src/main.rs`
    let country_db_env_var = "GEOLITE2_COUNTRY_DB";
    let country_db_path = std::env::var(country_db_env_var)
        .unwrap_or_else(|_| panic!("${country_db_env_var} must be set"));

    let analytics_db_env_var = "ANALYTICS_DB";
    let analytics_db_path = std::env::var(analytics_db_env_var)
        .unwrap_or_else(|_| panic!("${analytics_db_env_var} must be set"));

    let state = ServerState {
        client: Default::default(),
        locat: Arc::new(Locat::new(&country_db_path, &analytics_db_path).unwrap()),
    };

In .envrc:

export ANALYTICS_DB="db/analytics.db"
Shell session
$ direnv allow

In fly.toml:

TOML markup
[env]
GEOLITE2_COUNTRY_DB = "/db/GeoLite2-Country.mmdb"
ANALYTICS_DB = "analytics.db"

And let's add an endpoint that shows analytics:

Rust code
    // in `async fn main()`
    let app = Router::new()
        .route("/", get(root_get))
        .route("/analytics", get(analytics_get))
        .route("/panic", get(|| async { panic!("This is a test panic") }))
        .with_state(state);

// later down:
async fn analytics_get(State(state): State<ServerState>) -> Response<BoxBody> {
    let analytics = state.locat.get_analytics().await.unwrap();
    let mut response = String::new();
    use std::fmt::Write;
    for (country, count) in analytics {
        _ = writeln!(&mut response, "{country}: {count}");
    }
    response.into_response()
}
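We can poke at the endpoint locally right away, though it'll come back empty: as noted above, 127.0.0.1 never maps to a country, so nothing gets counted. (Port 8080 here is an assumption; adjust to whatever the server binds in your setup.)

Shell session
$ curl http://localhost:8080/analytics
$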

Alright! I think we're ready to deploy:

Shell session
$ just deploy
(cut)
#20 12.41    Compiling catscii v0.1.0 (/app)
#20 62.70 error: linking with `cc` failed: exit status: 1
#20 62.70   |
#20 62.70   = note: "cc" "-m64" "/tmp/rustcqwxz9u/symbols.o" "/app/target/release/deps/catscii-eff43af45afb3155.catscii.39b5b9b0-cgu.1.rcgu.o" "-Wl,--as-needed" "-L" "/app/target/release/deps" "-L" "/app/target/release/build/ring-ce3ece41d6d6a103/out" "-L" "/root/.rustup/toolchains/nightly-2022-12-24-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "/tmp/rustcqwxz9u/libring-9da25afb38225173.rlib" "/root/.rustup/toolchains/nightly-2022-12-24-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-66b9c3ae5ff29c13.rlib" "-Wl,-Bdynamic" "-lssl" "-lcrypto" "-lsqlite3" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-L" "/root/.rustup/toolchains/nightly-2022-12-24-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/app/target/release/deps/catscii-eff43af45afb3155" "-Wl,--gc-sections" "-pie" "-Wl,-zrelro,-znow" "-nodefaultlibs"
#20 62.70   = note: /usr/bin/ld: cannot find -lsqlite3
#20 62.70           collect2: error: ld returned 1 exit status

Oh whoops, no, we also have to install libsqlite3-dev inside the Docker container. Or rather... we can install libsqlite3-dev for compile-time, and just libsqlite3-0 for run-time:

Dockerfile
# (cut)

# Install compile-time dependencies
RUN set -eux; \
		apt update; \
		apt install --yes --no-install-recommends \
			openssh-client git-core curl ca-certificates gcc libc6-dev pkg-config libssl-dev \
			libsqlite3-dev \
			;

# (cut)

# Install run-time dependencies, remove extra APT files afterwards.
# This must be done in the same `RUN` command, otherwise it doesn't help
# to reduce the image size.
RUN set -eux; \
		apt update; \
		apt install --yes --no-install-recommends \
			ca-certificates \
			libsqlite3-0 \
			; \
		apt clean autoclean; \
		apt autoremove --yes; \
		# Note: 👇 this only works because of the `SHELL` instruction above.
		rm -rf /var/lib/{apt,dpkg,cache,log}/

# (cut)

I was going to do another few back-and-forths about this to really emphasize the pain of using Dockerfiles for this, but essentially: we need to install compile-time and run-time dependencies separately to keep our image slim. Both packages have uncomfortable naming conventions on Ubuntu (the -0 in libsqlite3-0 is the library's ABI version, but still). A missing compile-time dependency fails at docker build time, which is nice; a missing run-time dependency means a crash in production, which isn't nice.
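There's at least a cheap way to enumerate what the run-time image will need: ldd, pointed at the binary we built on the VM, lists the shared libraries it expects to find. A sketch (the exact path and load address will vary on your machine):

Shell session
$ ldd target/debug/catscii | grep sqlite
        libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x...)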

What we should be doing here is use something like docker-compose to run the image locally before we deploy it. We could even have a staging app we deploy to first, so it's as close to production as possible - fly.io makes it easy-ish to do that.
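A minimal sketch of the docker-compose-less version, assuming the image is tagged catscii and that the service listens on port 8080 (both assumptions; adjust ports and secrets to your setup):

Shell session
$ docker build --tag catscii .
$ docker run --rm --publish 8080:8080 \
    --env GEOLITE2_COUNTRY_DB=/db/GeoLite2-Country.mmdb \
    --env ANALYTICS_DB=analytics.db \
    --env SENTRY_DSN --env HONEYCOMB_API_KEY \
    catscii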

And, another just deploy later, our service is up and running. It shows me an ASCII art cat, and https://old-frost-6294.fly.dev/analytics currently shows me:

FR: 1

Success! Don't feel bad about the codebase: companies make more money than your lifetime earnings every month with much worse code.

I'm curious to play with IP geo-location some more though, so, mhh, I don't want to leak the address before the series is published, but I have an idea.

KeyCDN's performance test hits your website from 10 different locations around the world. So let me try that, and... the analytics now look like this:

FR: 1
DE: 1
GB: 1
NL: 1
US: 3
IN: 1
JP: 1
AU: 1
SG: 1

Nice!

All that's left, really, is to deploy this with nix instead. I know it's taken a while to build up to this point, but now we have a service that looks somewhat like a real-world thing, and we have all the classic deployment problems to solve.