Doing geo-location and keeping analytics
I sold you on some additional functionality for catscii
last chapter, and we
got caught up in private registry / docker shenanigans, so, now, let's resume
web development as promised.
Adding geolocation
We kinda left the `locat` crate stubby: it doesn't actually do any IP-to-location lookups. It doesn't even have a dependency on a crate that can do that.
So, let's do that. We'll use the maxminddb crate, which can read the MaxMind DB format, like GeoIP2 and GeoLite2:
```
$ cd locat/
$ cargo add maxminddb@0.23
    Updating crates.io index
      Adding maxminddb v0.23 to dependencies.
             Features as of v0.23.0:
             - memmap2
             - mmap
             - unsafe-str-decode
```
Because we're going to do things that can fail, we'll also want an error crate like thiserror:
```
$ cargo add thiserror@1
    Updating crates.io index
      Adding thiserror v1.0.38 to dependencies.
```
And now `locat`'s `src/lib.rs` becomes:
```rust
use std::net::IpAddr;

/// Allows geo-locating IPs and keeps analytics
pub struct Locat {
    geoip: maxminddb::Reader<Vec<u8>>,
}

#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("maxminddb error: {0}")]
    MaxMindDb(#[from] maxminddb::MaxMindDBError),
}

impl Locat {
    pub fn new(geoip_country_db_path: &str, _analytics_db_path: &str) -> Result<Self, Error> {
        // TODO: create analytics db
        Ok(Self {
            geoip: maxminddb::Reader::open_readfile(geoip_country_db_path)?,
        })
    }

    /// Converts an address to an ISO 3166-1 alpha-2 country code
    pub async fn ip_to_iso_code(&self, addr: IpAddr) -> Option<&str> {
        self.geoip
            .lookup::<maxminddb::geoip2::Country>(addr)
            .ok()?
            .country?
            .iso_code
    }

    /// Returns a map of country codes to number of requests
    pub async fn get_analytics(&self) -> Vec<(String, u64)> {
        Default::default()
    }
}
```
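To make the API shape concrete, here's a quick smoke test of sorts. It's not part of the crate or the series' code, just a sketch: the database path and sample address are placeholders, and it assumes a scratch binary that depends on `locat` and `tokio`:

```rust
use std::net::IpAddr;

// Hypothetical usage example, not from the actual codebase.
#[tokio::main]
async fn main() -> Result<(), locat::Error> {
    // the second argument is ignored for now (see the TODO above)
    let locat = locat::Locat::new("db/GeoLite2-Country.mmdb", "unused.db")?;

    let addr: IpAddr = "8.8.8.8".parse().unwrap();
    match locat.ip_to_iso_code(addr).await {
        Some(code) => println!("{addr} is in {code}"),
        None => println!("no country found for {addr}"),
    }
    Ok(())
}
```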
We can then bump it to 0.3.0 in `Cargo.toml`:
```toml
[package]
name = "locat"
# 👇 bumped!
version = "0.3.0"
edition = "2021"
publish = ["catscii"]

[dependencies]
maxminddb = "0.23"
thiserror = "1.0.38"
```
And publish it:
```
$ cargo publish
(cut)
    Finished dev [unoptimized + debuginfo] target(s) in 13.69s
   Uploading locat v0.3.0 (/home/amos/locat)
    Updating `catscii` index
```
Now, we can bump the dependency in `catscii/Cargo.toml` from `0.2.0` to `0.3.0`, do `cargo run` again, and...
```
$ cargo run
    Blocking waiting for file lock on package cache
    Updating `catscii` index
    Updating crates.io index
error: failed to select a version for `thiserror`.
    ... required by package `opentelemetry-honeycomb v0.1.0 (https://github.com/fasterthanlime/opentelemetry-honeycomb-rs?branch=simplified#2a197b9b)`
    ... which satisfies git dependency `opentelemetry-honeycomb` (locked to 0.1.0) of package `catscii v0.1.0 (/home/amos/catscii)`
versions that meet the requirements `^1.0` (locked to 1.0.37) are: 1.0.37

all possible versions conflict with previously selected packages.

  previously selected package `thiserror v1.0.38`
    ... which satisfies dependency `thiserror = "^1.0.38"` of package `locat v0.3.0 (registry `catscii`)`
    ... which satisfies dependency `locat = "^0.3.0"` of package `catscii v0.1.0 (/home/amos/catscii)`

failed to select a version for `thiserror` which could resolve this conflict
```
Ah.
The `Cargo.toml` for `opentelemetry-honeycomb-rs` only specifies "1.0", which should cover 1.0.38, which means... the problem is the `Cargo.lock` in `catscii`, which we can fix directly with:
```
$ cargo update --package thiserror
    Updating crates.io index
    Updating `catscii` index
      Adding ipnetwork v0.18.0
    Updating locat v0.2.0 (registry `catscii`) -> v0.3.0
      Adding maxminddb v0.23.0
    Updating thiserror v1.0.37 -> v1.0.38
    Updating thiserror-impl v1.0.37 -> v1.0.38
```
Anyway, I was saying: `cargo run`:
```
$ cargo run
(cut)
error[E0308]: mismatched types
  --> src/main.rs:56:25
   |
56 |         locat: Arc::new(Locat::new("todo_geoip_path.mmdb", "todo_analytics.db")),
   |                -------- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `Locat`, found enum `Result`
   |                |
   |                arguments to this function are incorrect
   |
   = note: expected struct `Locat`
                found enum `Result<Locat, locat::Error>`
note: associated function defined here
  --> /home/amos/.rustup/toolchains/nightly-2022-12-24-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:361:12
    |
361 |     pub fn new(data: T) -> Arc<T> {
    |            ^^^
For more information about this error, try `rustc --explain E0308`.
error: could not compile `catscii` due to previous error
```
Ah right, we changed the API - for now let's add an `.unwrap()` after `Locat::new()` and try again:
```rust
// in `catscii/src/main.rs`, in async fn main()
let state = ServerState {
    client: Default::default(),
    locat: Arc::new(Locat::new("todo_geoip_path.mmdb", "todo_analytics.db").unwrap()),
};
```
```
$ cargo run
   Compiling catscii v0.1.0 (/home/amos/catscii)
    Finished dev [unoptimized + debuginfo] target(s) in 7.68s
     Running `target/debug/catscii`
{"timestamp":"2023-02-14T16:22:11.271524Z","level":"INFO","fields":{"message":"Creating honey client","log.target":"libhoney::client","log.module_path":"libhoney::client","log.file":"/home/amos/.cargo/git/checkouts/libhoney-rust-3b5a30a6076b74c0/9871051/src/client.rs","log.line":78},"target":"libhoney::client"}
{"timestamp":"2023-02-14T16:22:11.271660Z","level":"INFO","fields":{"message":"transmission starting","log.target":"libhoney::transmission","log.module_path":"libhoney::transmission","log.file":"/home/amos/.cargo/git/checkouts/libhoney-rust-3b5a30a6076b74c0/9871051/src/transmission.rs","log.line":124},"target":"libhoney::transmission"}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: MaxMindDb(IoError("No such file or directory (os error 2)"))', src/main.rs:56:81
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
Ah, yes, of course. We need to actually have a GeoIP database to read.
Note that because we called `.unwrap()`, if you set up Sentry like I did, you should've gotten an e-mail about "oh no the application is panicking" (but in development), which is nice.
So, on MaxMind's website we can sign up for GeoLite2 and download the databases. They used to be accessible under a more liberal license, but, I guess they realized what they had and decided to monetize it.
They're not the only game in town: if you look at ipgeolocation, they sell their "IP to Country" database for a cool $1K per year per server! Don't scale out, I guess.
So yeah, ok MaxMind, you can have my e-mail address.
As an alternative, ipinfo.io offers an IP to Country DB under a CC-BY-4.0 license, and it comes in MMDB format, too!
We'll want the "GeoLite2 Country" database in `.mmdb` format. It comes .tar.gzipped, but once unpacked, we get a single `.mmdb` file:
```
$ cd catscii/
$ mkdir db
$ cd db/
$ # (download the .tar.gz file here)
$ tar --preserve-permissions --extract --file GeoLite2-Country_20230214.tar.gz
$ ls GeoLite2-Country_20230214/
COPYRIGHT.txt  GeoLite2-Country.mmdb  LICENSE.txt
$ mv GeoLite2-Country_20230214/GeoLite2-Country.mmdb .
$ rm GeoLite2-Country_* -rf
$ ls --long --header --almost-all
total 5.5M
-rw-r--r-- 1 amos amos 5.5M Feb 14 00:07 GeoLite2-Country.mmdb
```
So, yay, we have it! It's reasonably small, too.
We might not want to commit that to Git, since it's a big binary blob, so we can add it to our `.gitignore`:
```
# in .gitignore
/target
# 👇 new!
/db
```
Now, we can keep with our strategy of configuring our service through environment variables, and set it in `.envrc`:
```
# in .envrc
# (omitted: SENTRY_DSN, HONEYCOMB_API_KEY, CARGO_REGISTRIES_CATSCII_TOKEN, etc.)
export GEOLITE2_COUNTRY_DB="db/GeoLite2-Country.mmdb"
```
Don't forget to run `direnv allow` afterwards, to apply the changes.
And then, we read that environment variable at startup:
```rust
// in `catscii/src/main.rs`, in `async fn main()`
let country_db_env_var = "GEOLITE2_COUNTRY_DB";
let country_db_path = std::env::var(country_db_env_var)
    .unwrap_or_else(|_| panic!("${country_db_env_var} must be set"));

let state = ServerState {
    client: Default::default(),
    locat: Arc::new(Locat::new(&country_db_path, "todo_analytics.db").unwrap()),
};
```
And finally, `cargo run` runs.
But this isn't very interesting! The only IP address the development application ever sees is `127.0.0.1`, or `::1`.
So let's try and deploy this:
```
$ just de
```
Wait, wait, no!
Mh?
Just trying to save some trial-and-error here. Is the database file baked into the Docker image?
No, that's what I was trying to show here, and how it's going to be different when we move to nix.
Well let's just copy it in there, shall we?
Okay, sure. So in the `Dockerfile`, we can add:
```dockerfile
# Copy Geolite2 database
RUN mkdir /db
COPY ./db/GeoLite2-Country.mmdb /db/
```
...right before the final `CMD`. Then we can set this in `fly.toml`:
```toml
# that section existed, but it was empty
[env]
GEOLITE2_COUNTRY_DB = "/db/GeoLite2-Country.mmdb"

# omitted: everything else
```
Then we can run `just deploy`, visit our app, and in the logs, we see:
```
2023-02-14T17:10:04Z app[ff4c6095] cdg [info]{"timestamp":"2023-02-14T17:10:04.736583Z","level":"INFO","fields":{"message":"Got request from FR"},"target":"catscii"}
```
Hurray! (You gotta scroll right a little, it says "Got request from FR"). I am indeed in France, the system works.
Adding analytics
Now, it would be fun if we kept track of how many visits we get per country.
This is pretty vague, so I think we're not going to run afoul of any privacy laws, but it's still entertaining to look at: if you share the link to your app with your online colleagues or friends, you might be surprised how far it travels!
We'll do that with an SQLite database, which is complete overkill, but this is all still preparation for when we'll need to ship all of that with nix.
This is also functionality `locat` will have, so, let's add more dependencies to `locat`:
```
$ cargo add rusqlite@0.28
(cut)
```
So, let's take a first stab at this:
```rust
// in `locat/src/lib.rs`

use std::net::IpAddr;

/// Allows geo-locating IPs and keeps analytics
pub struct Locat {
    reader: maxminddb::Reader<Vec<u8>>,
    analytics: Db,
}

#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("maxminddb error: {0}")]
    MaxMindDb(#[from] maxminddb::MaxMindDBError),

    #[error("rusqlite error: {0}")]
    Rusqlite(#[from] rusqlite::Error),
}

impl Locat {
    pub fn new(geoip_country_db_path: &str, analytics_db_path: &str) -> Result<Self, Error> {
        Ok(Self {
            reader: maxminddb::Reader::open_readfile(geoip_country_db_path)?,
            analytics: Db {
                path: analytics_db_path.to_string(),
            },
        })
    }

    /// Converts an address to an ISO 3166-1 alpha-2 country code
    pub async fn ip_to_iso_code(&self, addr: IpAddr) -> Option<&str> {
        let iso_code = self
            .reader
            .lookup::<maxminddb::geoip2::Country>(addr)
            .ok()?
            .country?
            .iso_code?;

        if let Err(e) = self.analytics.increment(iso_code) {
            eprintln!("Could not increment analytics: {e}");
        }

        Some(iso_code)
    }

    /// Returns a map of country codes to number of requests
    pub async fn get_analytics(&self) -> Result<Vec<(String, u64)>, Error> {
        Ok(self.analytics.list()?)
    }
}

struct Db {
    path: String,
}

impl Db {
    fn list(&self) -> Result<Vec<(String, u64)>, rusqlite::Error> {
        let conn = self.get_conn()?;
        let mut stmt = conn.prepare("SELECT iso_code, count FROM analytics")?;
        let mut rows = stmt.query([])?;
        let mut analytics = Vec::new();
        while let Some(row) = rows.next()? {
            let iso_code: String = row.get(0)?;
            let count: u64 = row.get(1)?;
            analytics.push((iso_code, count));
        }
        Ok(analytics)
    }

    fn increment(&self, iso_code: &str) -> Result<(), rusqlite::Error> {
        let conn = self.get_conn()?;
        let mut stmt = conn
            .prepare("INSERT INTO analytics (iso_code, count) VALUES (?, 1) ON CONFLICT (iso_code) DO UPDATE SET count = count + 1")?;
        stmt.execute([iso_code])?;
        Ok(())
    }

    fn get_conn(&self) -> Result<rusqlite::Connection, rusqlite::Error> {
        let conn = rusqlite::Connection::open(&self.path)?;
        self.migrate(&conn)?;
        Ok(conn)
    }

    fn migrate(&self, conn: &rusqlite::Connection) -> Result<(), rusqlite::Error> {
        // create analytics table
        conn.execute(
            "CREATE TABLE IF NOT EXISTS analytics (
                iso_code TEXT PRIMARY KEY,
                count INTEGER NOT NULL
            )",
            [],
        )?;
        Ok(())
    }
}

#[cfg(test)]
mod tests {
    use crate::Db;

    struct RemoveOnDrop {
        path: String,
    }

    impl Drop for RemoveOnDrop {
        fn drop(&mut self) {
            _ = std::fs::remove_file(&self.path);
        }
    }

    #[test]
    fn test_db() {
        let db = Db {
            path: "/tmp/locat-test.db".to_string(),
        };
        let _remove_on_drop = RemoveOnDrop {
            path: db.path.clone(),
        };

        let analytics = db.list().unwrap();
        assert_eq!(analytics.len(), 0);

        db.increment("US").unwrap();
        let analytics = db.list().unwrap();
        assert_eq!(analytics.len(), 1);

        db.increment("US").unwrap();
        db.increment("FR").unwrap();
        let analytics = db.list().unwrap();
        assert_eq!(analytics.len(), 2);

        // contains US at count 2
        assert!(analytics.contains(&("US".to_string(), 2)));
        // contains FR at count 1
        assert!(analytics.contains(&("FR".to_string(), 1)));
        // doesn't contain DE
        assert!(!analytics.contains(&("DE".to_string(), 0)));
    }
}
```
This code is, of course, awful. That's because it was mostly generated through GitHub Copilot. We'll have plenty of time to roast, review, and improve it later.
Off the top of my head we have:

- Doing blocking work (rusqlite stuff) from an `async` function (we should be using something like `tokio::task::spawn_blocking`, or an async wrapper over sqlite; see the sketch right after this list).
- Swallowed errors in `ip_to_iso_code`: since it returns an `Option`, we can't really do anything in case there's an error.
- Mixed concerns: `ip_to_iso_code` increments analytics instead of just being a lookup. That's not great API design.
- Database is re-opened (and re-migrated) for every query. This alleviates some concerns around lifetimes, but it's not what we'd normally want.
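To make that first point concrete, here's a minimal sketch of what a `spawn_blocking` version could look like. It's not what `locat` actually does (yet): it assumes `locat` grows a `tokio` dependency, and that `Db` derives `Clone` (it only holds a `String` path, so cloning is cheap):

```rust
// Hypothetical async wrapper around the blocking `Db::increment` call.
// Assumes `tokio` is a dependency of `locat`, and that `Db` derives `Clone`.
async fn increment_async(db: &Db, iso_code: &str) -> Result<(), Error> {
    let db = db.clone();
    let iso_code = iso_code.to_string();
    // `spawn_blocking` runs the closure on a thread pool reserved for blocking
    // work, so the async executor's worker threads stay free to serve requests.
    tokio::task::spawn_blocking(move || db.increment(&iso_code))
        .await
        .expect("blocking task panicked")?;
    Ok(())
}
```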
Don't think of it as terribly awful code, think of it as a lot of room for improvement later — let's focus on shipping this to hit our KPIs and OKRs and other three-letter acronyms.
I've written some tests, so, we should run them:
```
$ cargo test
   Compiling locat v0.3.0 (/home/amos/locat)
error: linking with `cc` failed: exit status: 1
  |
  = note: "cc" "-m64" "/tmp/rustcwAryeN/symbols.o" "/home/amos/locat/target/debug/deps/locat-925e8cba729664ee.13qpde65w7t2xobj.rcgu.o" (cut) "-Wl,--gc-sections" "-pie" "-Wl,-zrelro,-znow" "-nodefaultlibs"
  = note: /usr/bin/ld: cannot find -lsqlite3: No such file or directory
          collect2: error: ld returned 1 exit status

error: could not compile `locat` due to previous error
warning: build failed, waiting for other jobs to finish...
```
Ah, we don't have sqlite installed! It's a C library, so we must get it from somewhere. There's a `bundled` cargo feature for the `rusqlite` crate, which would make everything work easily, but I've specifically chosen sqlite to show off native dependencies, so let's keep linking with it dynamically and, in development, just install it on our Ubuntu VM:
```
$ apt-cache search '^libsqlite'
libsqlite3-0 - SQLite 3 shared library
libsqlite3-dev - SQLite 3 development files
libsqlite3-mod-ceph - SQLite3 VFS for Ceph
libsqlite3-mod-ceph-dev - SQLite3 VFS for Ceph (development files)
libsqlite-tcl - SQLite 2 Tcl bindings
libsqlite0 - SQLite 2 shared library
libsqlite0-dev - SQLite 2 development files
libsqlite3-gst - SQLite bindings for GNU Smalltalk
libsqlite3-mod-blobtoxy - SQLite3 extension module for read-only BLOB to X/Y mapping
libsqlite3-mod-csvtable - SQLite3 extension module for read-only access to CSV files
libsqlite3-mod-impexp - SQLite3 extension module for SQL script, XML, JSON and CSV import/export
libsqlite3-mod-rasterlite2 - SQLite 3 module for huge raster coverages
libsqlite3-mod-spatialite - Geospatial extension for SQLite - loadable module
libsqlite3-mod-virtualpg - Loadable dynamic extension to both SQLite and SpatiaLite
libsqlite3-mod-xpath - SQLite3 extension module for querying XML data with XPath
libsqlite3-mod-zipfile - SQLite3 extension module for read-only access to ZIP files
libsqlite3-ocaml - Embeddable SQL Database for OCaml Programs (runtime)
libsqlite3-ocaml-dev - Embeddable SQL Database for OCaml Programs (development)
libsqlite3-tcl - SQLite 3 Tcl bindings
libsqliteodbc - ODBC driver for SQLite embedded database
```
Ah. I can never quite remember the names of Ubuntu packages, but luckily, `apt-cache search` accepts regular expressions! Isn't that nice.

So:
```
$ sudo apt install libsqlite3-dev
[sudo] password for amos:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Suggested packages:
  sqlite3-doc
The following NEW packages will be installed:
  libsqlite3-dev
0 upgraded, 1 newly installed, 0 to remove and 46 not upgraded.
Need to get 846 kB of archives.
(cut)
```
And then:
```
$ cargo test
   Compiling locat v0.3.0 (/home/amos/locat)
    Finished test [unoptimized + debuginfo] target(s) in 0.35s
     Running unittests src/lib.rs (target/debug/deps/locat-925e8cba729664ee)

running 1 test
test tests::test_db ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

   Doc-tests locat

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```
Looks good! I'm always suspicious of tests passing the first time, so I messed with the test code to make sure it could fail (and was being run at all), and yep, everything seems fine.
Well then! Time to publish a new version. You know what to do: bump `package.version` in `locat`'s `Cargo.toml`, run `cargo publish`, then bump the dependency in `catscii`'s `Cargo.toml`, and... let's also add environment variables for the analytics DB everywhere.
In the code:
```rust
// in `catscii/src/main.rs`
let country_db_env_var = "GEOLITE2_COUNTRY_DB";
let country_db_path = std::env::var(country_db_env_var)
    .unwrap_or_else(|_| panic!("${country_db_env_var} must be set"));

let analytics_db_env_var = "ANALYTICS_DB";
let analytics_db_path = std::env::var(analytics_db_env_var)
    .unwrap_or_else(|_| panic!("${analytics_db_env_var} must be set"));

let state = ServerState {
    client: Default::default(),
    locat: Arc::new(Locat::new(&country_db_path, &analytics_db_path).unwrap()),
};
```
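(We're now reading two required variables the exact same way. If the duplication bothers you, a tiny helper takes care of it; this is an aside rather than something the codebase actually does, and `required_env` is a made-up name:)

```rust
// Hypothetical helper, not in the actual codebase: reads a required
// environment variable, or panics with a readable message.
fn required_env(name: &str) -> String {
    std::env::var(name).unwrap_or_else(|_| panic!("${name} must be set"))
}

// usage:
// let country_db_path = required_env("GEOLITE2_COUNTRY_DB");
// let analytics_db_path = required_env("ANALYTICS_DB");
```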
In `.envrc`:
```
export ANALYTICS_DB="db/analytics.db"
```
```
$ direnv allow
```
In `fly.toml`:
```toml
[env]
GEOLITE2_COUNTRY_DB = "/db/GeoLite2-Country.mmdb"
ANALYTICS_DB = "analytics.db"
```
And let's add an endpoint that shows analytics:
```rust
// in `async fn main()`
let app = Router::new()
    .route("/", get(root_get))
    .route("/analytics", get(analytics_get))
    .route("/panic", get(|| async { panic!("This is a test panic") }))
    .with_state(state);

// later down:
async fn analytics_get(State(state): State<ServerState>) -> Response<BoxBody> {
    let analytics = state.locat.get_analytics().await.unwrap();
    let mut response = String::new();
    use std::fmt::Write;
    for (country, count) in analytics {
        _ = writeln!(&mut response, "{country}: {count}");
    }
    response.into_response()
}
```
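The rows come back in whatever order sqlite hands them to us. If you'd rather see the busiest countries first, a small variation sorts before rendering; this is an aside of mine, not part of the actual handler:

```rust
// Hypothetical variant of `analytics_get` that sorts by hit count, descending.
async fn analytics_get_sorted(State(state): State<ServerState>) -> Response<BoxBody> {
    let mut analytics = state.locat.get_analytics().await.unwrap();
    // `u64` is `Copy`, so we can bind `count` by value through the reference
    analytics.sort_by_key(|&(_, count)| std::cmp::Reverse(count));

    let mut response = String::new();
    use std::fmt::Write;
    for (country, count) in analytics {
        _ = writeln!(&mut response, "{country}: {count}");
    }
    response.into_response()
}
```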
Alright! I think we're ready to deploy:
```
$ just deploy
(cut)
#20 12.41    Compiling catscii v0.1.0 (/app)
#20 62.70 error: linking with `cc` failed: exit status: 1
#20 62.70   |
#20 62.70   = note: "cc" "-m64" "/tmp/rustcqwxz9u/symbols.o" "/app/target/release/deps/catscii-eff43af45afb3155.catscii.39b5b9b0-cgu.1.rcgu.o" "-Wl,--as-needed" "-L" "/app/target/release/deps" "-L" "/app/target/release/build/ring-ce3ece41d6d6a103/out" "-L" "/root/.rustup/toolchains/nightly-2022-12-24-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "/tmp/rustcqwxz9u/libring-9da25afb38225173.rlib" "/root/.rustup/toolchains/nightly-2022-12-24-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-66b9c3ae5ff29c13.rlib" "-Wl,-Bdynamic" "-lssl" "-lcrypto" "-lsqlite3" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-L" "/root/.rustup/toolchains/nightly-2022-12-24-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/app/target/release/deps/catscii-eff43af45afb3155" "-Wl,--gc-sections" "-pie" "-Wl,-zrelro,-znow" "-nodefaultlibs"
#20 62.70   = note: /usr/bin/ld: cannot find -lsqlite3
#20 62.70 collect2: error: ld returned 1 exit status
```
Oh whoops, no, we also have to install `libsqlite3-dev` inside the Docker container. Or rather... we can install `libsqlite3-dev` for compile-time, and just `libsqlite3-0` for run-time:
```dockerfile
# (cut)

# Install compile-time dependencies
RUN set -eux; \
    apt update; \
    apt install --yes --no-install-recommends \
        openssh-client git-core curl ca-certificates gcc libc6-dev pkg-config libssl-dev \
        libsqlite3-dev \
    ;

# (cut)

# Install run-time dependencies, remove extra APT files afterwards.
# This must be done in the same `RUN` command, otherwise it doesn't help
# to reduce the image size.
RUN set -eux; \
    apt update; \
    apt install --yes --no-install-recommends \
        ca-certificates \
        libsqlite3-0 \
    ; \
    apt clean autoclean; \
    apt autoremove --yes; \
    # Note: 👇 this only works because of the `SHELL` instruction above.
    rm -rf /var/lib/{apt,dpkg,cache,log}/

# (cut)
```
I was going to do another few back-and-forths about this to really emphasize the pain of using Dockerfiles for this, but essentially: we need to install compile-time & run-time dependencies separately to keep our image slim. Both packages have uncomfortable naming conventions on Ubuntu (why the `-0` in `libsqlite3-0`?). A missing compile-time dependency fails at `docker build` time, which is nice, but a missing run-time dependency means a crash in production, which isn't nice.
What we should be doing here is using something like `docker-compose` to run the image locally before we deploy it. We could even have a staging app we deploy to first, so it's as close to production as possible - fly.io makes it easy-ish to do that.
And, another `just deploy` later, our service is up and running. It shows me an ASCII art cat, and https://old-frost-6294.fly.dev/analytics currently shows me:
```
FR: 1
```
Success! Don't feel bad about the codebase: companies make more money than your lifetime earnings every month with much worse code.
I'm curious to play with IP geo-location some more though, so, mhh, I don't want to leak the address before the series is published, but I have an idea.
KeyCDN's performance test hits your website from 10 different locations around the world. So let me try that, and... the analytics now look like this:
```
FR: 1
DE: 1
GB: 1
NL: 1
US: 3
IN: 1
JP: 1
AU: 1
SG: 1
```
Nice!
All that's left, really, is to deploy this with nix instead. I know it's taken a while to build up to this point, but now we have a service that looks somewhat like a real-world thing, and we have all the classic deployment problems to solve.