Remote development with Rust on fly.io
Disclaimer:
At the time of this writing, I benefit from the fly.io "Employee Free Tier". I don't pay for side projects hosted there "within reasonable limits". The project discussed here qualifies for that.
Why you might want a remote dev environment
Fearmongering aside — and Cthulhu knows there's been a bunch, since this unfortunate tweet — there's a bunch of reasons to want a remote dev environment.
For example, maybe the only computer you have available simply isn't performant enough to perform the tasks you want to perform. Such as: building a lot a lot of Rust. Like the compiler, or rust-analyzer, or maybe you're quirky like me and you maintain like 7 big proprietary codebases solo just so your blog slash video platform is juuuuust the way you like it.
So instead of buying a fuck-you CPU (like a Threadripper, or something more consumery like the latest Ryzens), maybe you rent a big cloud machine that you can turn on and off as needed. Just for the big stuff.
If you need a bunch of CPU to work directly on the Rust project itself btw, the Rust foundation has a program for that.
Disclaimer:
At the time of this writing, I am not affiliated with the Rust foundation in any way.
Another good reason is that you invested in another incompatible brand of fuck-you CPUs, like an M1, or an M2 (where will it end?), which is arm64. But you need to deploy for Linux x86_64, for example.
In that case, you can either pretend that macOS arm64 and Linux x86_64 are "close enough" by virtue of being unixy, and maintain your codebase for essentially two targets, leaving nasty surprises for later, OR you can work in a VM. Or an x86_64 Docker container, which, on macOS, runs in a VM.
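(If you go the Docker-on-macOS route, forcing the x86_64 flavor of an image is just a --platform flag away, with emulation doing the heavy lifting. Something like this should report x86_64 even on Apple Silicon:)

$ docker run --platform linux/amd64 --rm -it ubuntu:20.04 uname -m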
And that's fine, but it'll make most laptops take off. I don't know what the Apple Silicon situation is like (somebody send me one!) but I'm assuming even with it being a revolution, emulating x86 on it is still not as fast as an actual x86 processor. I might be wrong. I'm sure I'll find out soon enough.
Another good reason is that you're maybe running a team of developers, it's not just you. And you want to be able to 1) hire folks from all around the world, 2) hire folks who don't already have a chonky computer at home, 3) not have to ship them a chonky computer (which you then have to get back or gift to them), 4) onboard them quickly, giving them a consistent dev environment where everything works the first time.
Me, I don't have a team. Well... I kinda do: there's an army of folks who make up for the absence of an editor by reporting spelling errors, inaccuracies etc. every time I release an article. But all the code side is just me. And I have several computers that are more than up to the task of compiling a metric ton of Rust regularly, as I tend to do. None of them run Linux on the desktop, although I exclusively ship stuff for Linux (both for my day job and for my side gig), but it's nothing VMs can't fix, and I have a bunch of those.
My reason is much simpler, and probably silly: as I'm writing this, it's 37 degrees Celsius outside (99 Fahrenheit). Not only is my big desktop tower not the most environment-friendly machine I have available, it also puts out quite a bit of heat. And it adds up.
Also I like to be able to write from different places, so that means having two computers up, and using one (a laptop) to remote into the other (a desktop), which makes the energy+heat problem even worse and also now we get into "how do you manage to make the desktop stay awake while you're SSH'd into the Linux VM that runs on it, but make it go to sleep very quickly when it's not".
You know the second you hit publish someone will tell you that it's not that hard and if you just do (two pages ensue)
Yes yes I know. There's still the heat and electricity bill problem.
So anyway, since fly.io pays me during weekdays to make their platform better (more precisely fly-proxy, the thing that all TCP traffic goes through right now, except for IPv6 private networking, and which I've recently written about on their blog), I have an employee discount kinda deal.
And the deal is I straight up don't pay for compute there. So here's the big old disclaimer: I DON'T HAVE TO PAY FOR ANY OF THIS. ALSO THEY'RE PAYING ME, but for other stuff. This is week-end amos. The people paying me to write this are my patrons - so I'm gonna give my honest opinion there, as a non-paying customer of fly.io.
Are we good with the disclaimer part? Everybody clear?
So, you're a shill, temporarily cosplaying as "not a shill", and you found a way to sneak Rust in there because you can't help yourself.
I mean, sure... but also I just think it's pretty neat? And I get to explain a bunch of stuff about how it works, and why it works particularly well for me (even if I did have to pay for it).
What the heck is fly.io for, even
This is the part that could be misconstrued as marketing, but really I just want y'all to be clear on what we're working with. It took me a hot minute to really understand what fly was all about, even after going through the docs several times. After hacking on it from the inside, and several side-projects later (such as my video thing), I think I get it.
So essentially what fly lets you do is push your code there, and then boom it runs in the cloud.
Ah, so like Heroku.
Kinda sorta but also no. Heroku has this whole buildpacks thing, and I guess fly supports it too in some way, but I'm not interested in that part at all so I just don't know enough to answer that question.
How I personally see it is that I build a Docker image with anything (x86_64 only right now), push it to fly (so they have their own image registry), and boom it runs in the cloud.
Ah, so like Google Cloud Run or whatever the AWS equivalent is called.
Again, kinda sorta but also no, because it doesn't actually run in Docker. It runs in a Firecracker microVM. Which is a real VM, so you don't have the usual limitations of containers.
Such as?
Let's circle back to that later.
First let me show you how to deploy a thing there. Our thing will be Rust, because my blog my rules, as usual.
So, simplest HTTP server I can think of:
$ cargo new hello-axum
     Created binary (application) `hello-axum` package
$ cd hello-axum
$ cargo add tokio --features full
$ cargo add axum
// in `hello-axum/src/main.rs`

use axum::{response::IntoResponse, routing::get, Router, Server};

#[tokio::main]
async fn main() {
    let app = Router::new().route("/", get(index));
    let addr = "[::]:8080".parse().unwrap();
    println!("Listening on http://{addr}");
    Server::bind(&addr)
        .serve(app.into_make_service())
        .await
        .unwrap();
}

async fn index() -> impl IntoResponse {
    "hello from axum\n"
}
Does it work?
$ cargo run
   Compiling cfg-if v1.0.0
   Compiling pin-project-lite v0.2.9
   Compiling bytes v1.1.0
   Compiling itoa v1.0.2
   Compiling once_cell v1.12.0
   Compiling smallvec v1.8.0
   Compiling scopeguard v1.1.0
   Compiling fnv v1.0.7
(cut)
    Finished dev [unoptimized + debuginfo] target(s) in 2.55s
     Running `target/debug/hello-axum`
Listening on http://[::]:8080
Then, from another shell:
$ curl -i 0:8080
HTTP/1.1 200 OK
content-type: text/plain; charset=utf-8
content-length: 23
date: Sat, 18 Jun 2022 15:38:47 GMT

hello from axum
Okay, time to build it as a Docker image:
# in `hello-axum/Dockerfile`
# syntax = docker/dockerfile:1.4

FROM rust:1.61.0-slim-bullseye AS builder

WORKDIR /app
COPY . .
RUN --mount=type=cache,target=/app/target \
    --mount=type=cache,target=/usr/local/cargo/registry \
    --mount=type=cache,target=/usr/local/cargo/git \
    --mount=type=cache,target=/usr/local/rustup \
    set -eux; \
    rustup install stable; \
    cargo build --release; \
    objcopy --compress-debug-sections target/release/hello-axum ./hello-axum

################################################################################

FROM debian:11.3-slim

RUN set -eux; \
    export DEBIAN_FRONTEND=noninteractive; \
    apt update; \
    apt install --yes --no-install-recommends bind9-dnsutils iputils-ping iproute2 curl ca-certificates htop; \
    apt clean autoclean; \
    apt autoremove --yes; \
    rm -rf /var/lib/{apt,dpkg,cache,log}/; \
    echo "Installed base utils!"

WORKDIR app
COPY --from=builder /app/hello-axum ./hello-axum
CMD ["./hello-axum"]
Also let's exclude /target from the Docker context, we don't need it there:
# in `hello-axum/.dockerignore`
/target
Also make sure you have docker buildkit enabled, it's what you want 99% of the time nowadays and it supports all the nice stuff.
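(If your Docker is old enough that BuildKit isn't the default, the quick fix is the DOCKER_BUILDKIT environment variable:)

$ DOCKER_BUILDKIT=1 docker build -t hello-axum .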
$ docker build -t hello-axum .
[+] Building 2.6s (6/13)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 990B                                       0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 79B                                           0.0s
 => [internal] load metadata for docker.io/library/debian:11.3-slim        1.5s
 => [internal] load metadata for docker.io/library/rust:1.61.0-slim-bullseye
(cut)
 => [stage-1 4/4] COPY --from=builder /app/hello-axum ./hello-axum
 => exporting to image
 => => exporting layers
 => => writing image sha256:a6ae1acc11eb094218c1abb4da319a4e53ee93844d98d94c912698d75e2136e0
 => => naming to docker.io/library/hello-axum
There's a couple of neat tricks in the Dockerfile above: toolchains, dependencies and the target folder are cached, it compresses debug symbols (which aren't there because I forgot to show you that you'd set debug = 1 under [profile.release] in the Cargo.toml, but whatever), and it uses separate stages.
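For reference, that forgotten tweak is just this:

# in `hello-axum/Cargo.toml`
[profile.release]
debug = 1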
Anyway, the resulting image is 152MB for me, not great, not terrible. It could probably go distroless or be based on something like Alpine, but that comes with other tradeoffs, and this isn't a Docker tutorial, so let's move on.
Point is, it works:
$ docker run --detach --rm --name hello-axum --publish 8080:8080 hello-axum
78dde4d1e52dfc199e6ebdeda1b65192ba534cef6d5ea8ba169106e348eb4749
$ curl 0:8080
hello from axum
$ docker kill hello-axum
hello-axum
That is, if you remembered to kill the other process with Ctrl-C. Otherwise it won't be able to bind on port 8080.
Time to deploy it to fly. I'll spare you the pre-onboarding, you need to install flyctl, log in with your fly account, blah blah let's create an app:
$ fly apps create
? App Name: hello-axum
? Select Organization: Amos Wenger (personal)
New app created: hello-axum
Save the config it autogenerated for us:
$ fly config save -a hello-axum
Wrote config file fly.toml
Which gives us this:
# in `hello-axum/fly.toml`
# fly.toml file generated for hello-axum on 2022-06-18T16:02:28Z

app = "hello-axum"
kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[env]

[experimental]
  allowed_public_ports = []
  auto_rollback = true

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["app"]
  protocol = "tcp"
  script_checks = []

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"
Those are the defaults, and it's neat to have them written down. I do want to expose stuff on ports 80 and 443, the concurrency limits seem reasonable for a toy app, I do want to redirect port 80 to 443 automatically, and my internal port is already 8080. The only thing missing is which image to push, so let's add a new section:
[build]
  image = "hello-axum"
And we're off:
$ fly deploy --local-only
==> Verifying app config
--> Verified app config
==> Building image
Searching for image 'hello-axum' locally...
image found: sha256:3f93ceb9158f5e123253060d58d607f7c2a7e2f93797b49b4edbbbcc8e1b3840
==> Pushing image to fly
The push refers to repository [registry.fly.io/hello-axum]
02f75279051e: Pushed
4e38e245312b: Pushed
85ade8c6ca76: Pushed
ad6562704f37: Pushed
deployment-1655568332: digest: sha256:1ddfda6a6d8d84d804602653501db1c9720677b6e04e31008d3256c53ec09723 size: 1159
--> Pushing image done
==> Creating release
--> release v2 created
--> You can detach the terminal anytime without stopping the deployment
==> Monitoring deployment

1 desired, 1 placed, 1 healthy, 0 unhealthy [health checks: 1 total, 1 passing]
--> v0 deployed successfully
Because the image was available locally it just pushed it to the fly docker registry (there's also a remote builder feature which I've never used).
And then it created an instance of the app for us... somewhere? Which was eventually allocated on a worker, and... after what feels like an eternity, but was almost definitely under a minute, our app is running.
$ curl https://hello-axum.fly.dev -i
HTTP/2 200
content-type: text/plain; charset=utf-8
content-length: 16
date: Sat, 18 Jun 2022 16:07:39 GMT
server: Fly/09a15cede3 (2022-06-17)
via: 2 fly.io
fly-request-id: 01G5VS3SPBQ4XY4M7VZXTG8KBJ-cdg

hello from axum
And we can see some new headers here! Also it's using http/2, and you can tell I deployed to production yesterday from the server header.
There's a lot of other cool stuff happening there like built-in metrics but these aren't really relevant. There's also a whole web UI that shows that yes we do have an app running, shows some graphs, even the logs live-streaming etc. but it's easier to just talk about the CLI in this format so there, logs:
$ fly logs
2022-06-18T16:05:53Z runner[fdca430e] cdg [info]Starting instance
2022-06-18T16:05:53Z runner[fdca430e] cdg [info]Configuring virtual machine
2022-06-18T16:05:53Z runner[fdca430e] cdg [info]Pulling container image
2022-06-18T16:05:58Z runner[fdca430e] cdg [info]Unpacking image
2022-06-18T16:05:59Z runner[fdca430e] cdg [info]Preparing kernel init
2022-06-18T16:05:59Z runner[fdca430e] cdg [info]Configuring firecracker
2022-06-18T16:06:00Z runner[fdca430e] cdg [info]Starting virtual machine
2022-06-18T16:06:00Z app[fdca430e] cdg [info][    0.026893] PCI: Fatal: No config space access function found
2022-06-18T16:06:00Z app[fdca430e] cdg [info]Starting init (commit: e21acb3)...
2022-06-18T16:06:00Z app[fdca430e] cdg [info]Preparing to run: `./hello-axum` as root
2022-06-18T16:06:00Z app[fdca430e] cdg [info]2022/06/18 16:06:00 listening on [fdaa:0:446c:a7b:ae02:fdca:430e:2]:22 (DNS: [fdaa::3]:53)
2022-06-18T16:06:00Z app[fdca430e] cdg [info]Listening on http://[::]:8080
I don't know what the PCI message means, don't ask me. init is fly's custom init program, it's also all Rust, here's an old snapshot of it, and we can see our app running.
We even know where it's running (cdg = Charles de Gaulle Airport Paris), the closest region to where I live.
There's a bunch of other useful CLI commands:
$ fly status
App
  Name     = hello-axum
  Owner    = personal
  Version  = 0
  Status   = running
  Hostname = hello-axum.fly.dev

Deployment Status
  ID          = 70edc42a-9bac-0b2a-803c-c0cec866929a
  Version     = v0
  Status      = successful
  Description = Deployment completed successfully
  Instances   = 1 desired, 1 placed, 1 healthy, 0 unhealthy

Instances
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS      RESTARTS CREATED
fdca430e app     0       cdg    run     running 1 total, 1 passing 0        6m50s ago
$ fly vm status fdca430e
Instance
  ID            = fdca430e
  Process       =
  Version       = 0
  Region        = cdg
  Desired       = run
  Status        = running
  Health Checks = 1 total, 1 passing
  Restarts      = 0
  Created       = 7m10s ago

Recent Events
TIMESTAMP            TYPE       MESSAGE
2022-06-18T16:05:52Z Received   Task received by client
2022-06-18T16:05:52Z Task Setup Building Task Directory
2022-06-18T16:06:00Z Started    Task started by client

Checks
ID                               SERVICE  STATE   OUTPUT
3df2415693844068640885b45074b954 tcp-8080 passing TCP connect 172.19.2.2:8080: Success

Recent Logs
And so, yeah, that's classic fly!
With fly regions set we can decide where our app should run, with fly scale count we can change how many instances are running, with fly scale vm we can switch VM types (it's very smol right now), for example here's what I have to serve my videos:
$ fly status
App
  Name     = tube
  Owner    = personal
  Version  = 164
  Status   = running
  Hostname = tube.fly.dev

Instances
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS RESTARTS CREATED
c1f4d89e app     164     sjc    run     running               0        2022-06-14T22:02:22Z
b74afb02 app     164     yyz    run     running               0        2022-05-09T21:07:53Z
8b5ca0c7 app     164     gru    run     running               0        2022-05-09T21:07:15Z
0b08b59c app     164     ams    run     running               0        2022-05-09T21:06:30Z
6389589a app     164     cdg    run     running               0        2022-05-09T21:05:42Z
ea94e5ef app     164     nrt    run     running               0        2022-05-09T21:03:21Z
79ecda2b app     164     iad    run     running               1        2022-05-09T21:02:51Z
26ea7a65 app     164     yyz    run     running               0        2022-05-09T21:02:10Z

$ fly scale show
VM Resources for tube
        VM Size: shared-cpu-1x
      VM Memory: 512 MB
          Count: 8
 Max Per Region: Not set
Trying to make them regret their lifetime employee discount thing, are we?
Hey, rules were made to be tested okay.
Oh yeah also there's volumes! Because instances get created and destroyed and some data you don't want to lose so you stick it in volumes:
$ fly volumes list
ID                   STATE   NAME      SIZE REGION ZONE ATTACHED VM CREATED AT
vol_18l524y8j0er7zmp created tubecache 40GB ams    8aba 0b08b59c    1 month ago
vol_18l524y8j5jr7zmp created tubecache 40GB yyz    d33c 26ea7a65    1 month ago
vol_okgj54580lq4y2wz created tubecache 40GB iad    ddf7             1 month ago
vol_x915grnzw8krn70q created tubecache 40GB nrt    0e0f ea94e5ef    1 month ago
vol_ke628r68g3n4wmnp created tubecache 40GB sjc    c0a5 c1f4d89e    1 month ago
vol_02gk9vwnej1v76wm created tubecache 40GB cdg    0e8c 6389589a    1 month ago
vol_8zmjnv8em85vywgx created tubecache 40GB yyz    5e29 b74afb02    1 month ago
vol_ypkl7vz8k5evqg60 created tubecache 40GB iad    f6cb 79ecda2b    1 month ago
vol_0nylzre12814qmkp created tubecache 40GB gru    2824 8b5ca0c7    1 month ago
vol_52en7r1jpl9rk6yx created tubecache 40GB syd    039e             1 month ago
vol_w1q85vgn7jj4zdxe created tubecache 40GB lhr    ad0e             1 month ago
And you said you didn't want to do marketing? Where's the remote dev environment?
I'm getting to it! So back to our hello-axum app, we can SSH into it:
$ fly ssh console
Connecting to top1.nearest.of.hello-axum.internal... complete
# whoami
root
#
And this is where things get interesting, because this is where you start to notice this isn't actually a container running in docker.
Let's hop into bash to run a few more commands:
root@fdca430e:/# cat /proc/cpuinfo | grep -i mhz
cpu MHz : 2799.998
So we got a single shared core, that's the default.
root@fdca430e:/# uname -a
Linux fdca430e 5.12.2 #1 SMP Thu Jun 2 14:26:49 UTC 2022 x86_64 GNU/Linux
Linux 5.12.2, that's from.. April 2021, still relatively recent. Recent enough that we could play with io-uring if we wanted.
No no no no stay on task.
But yeah our Docker image doesn't provide a kernel, only a userland. The kernel is whatever fly gives us. Still, we have a kernel. Which will come in handy later.
For now, it's time to review why a "classic fly.io app" doesn't really work as a remote dev environment.
For starters, we can't scale to zero.
$ fly scale count 0
Count changed to 0
(A full minute elapses)
$ fly status
App
  Name     = hello-axum
  Owner    = personal
  Version  = 1
  Status   = dead
  Hostname = hello-axum.fly.dev

Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
Okay I guess you can... but as you can see the app's status is now "dead". And fly-proxy suddenly doesn't know the app exists anymore.
So if we try to curl it:
$ curl -v https://hello-axum.fly.dev
*   Trying 2a09:8280:1::1:4857:443...
* TCP_NODELAY set
* Connected to hello-axum.fly.dev (2a09:8280:1::1:4857) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
We'll get stuck there, and eventually:
* OpenSSL SSL_connect: Connection reset by peer in connection to hello-axum.fly.dev:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to hello-axum.fly.dev:443
That's because fly-proxy is waiting around for another instance to be up: which may happen if you have a release strategy where your app temporarily has zero instances between deploys. (You probably don't want to do that, have at least one instance up to avoid downtime).
We can start it back up with fly scale, but it's... not fast.
$ time bash -c 'fly scale count 1; while true; do curl https://hello-axum.fly.dev --max-time 1 && exit 0 || echo "still starting..."; done'
Count changed to 1
curl: (28) Operation timed out after 1000 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1001 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1000 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1001 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1001 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1000 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1000 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1001 milliseconds with 0 out of 0 bytes received
still starting...
hello from axum
bash -c   0.14s user 0.07s system 2% cpu 8.421 total
That was a particularly unlucky run. I've had it start back up in ~3 seconds while I figured out the right bash incantation.
But still, it makes perfect sense given what's actually happening:
- An API call is made to fly.io
- Which creates a nomad job
- Which eventually gets allocated somewhere by nomad
- Which informs fly.io that it's up
- Also some consul services are created
- Eventually fly-proxy knows about the services
- And because services are how it knows that an app even exists right now, it knows about the app again, and can start routing traffic there.
Wait wait wait, are you supposed to share that many internal details?
Oh don't worry, they did it for me.
So, is that suitable for a remote dev environment?
The SSH server problem
fly.io provides an SSH server, but it's not really good enough. Let's look at why.
First off, fly ssh console handles all the dirty details for you. If we want to use vanilla ssh we have to do a bunch more stuff. We could use fly proxy to map a local port to the remote instance's SSH server port.
$ fly proxy 2200:22 hello-axum.internal
Proxying local port 2200 to remote [hello-axum.internal]:22
(Here we only have one instance. If we had multiple I'd need to use d76c732a.vm.hello-axum.internal.)
Then issue an SSH keypair:
$ fly ssh issue
? Select organization: Amos Wenger (personal)
? Email address for user to issue cert: [redacted]

!!!! WARNING: We're now prompting you to save an SSH private key and certificate       !!!!
!!!! (the private key in "id_whatever" and the certificate in "id_whatever-cert.pub"). !!!!
!!!! These SSH credentials are time-limited and handling them in files is clunky;      !!!!
!!!! consider running an SSH agent and running this command with --agent. Things       !!!!
!!!! should just sort of work like magic if you do.                                    !!!!
? Path to store private key: /tmp/id_rsa
Wrote 24-hour SSH credential to /tmp/id_rsa, /tmp/id_rsa-cert.pub
(The note about --agent is wrong now: it worked for me in the past, just not today, and I couldn't debug it.)
And then we can connect:
$ ssh -i /tmp/id_rsa localhost -p 2200 whoami
(cut: fingerprint stuff)
root
But we can't, for example, proxy some ports over it:
$ ssh -i /tmp/id_rsa localhost -p 2200 -L 8080:localhost:8080
#
(Then, from another terminal)
$ curl localhost:8080
curl: (56) Recv failure: Connection reset by peer
I'm not sure why! ssh -vvv isn't super helpful there. But when I tried connecting from VSCode, by shoving this in my ~/.ssh/config:
Host hello-axum
  HostName localhost
  Port 2200
  IdentityFile /tmp/id_rsa
It was unhappy too:
[19:27:01.785] Remote server is listening on 43703
[19:27:01.785] Parsed server configuration: {"serverConfiguration":{"remoteListeningOn":{"port":43703},"osReleaseId":"debian","arch":"x86_64","webUiAccessToken":"","sshAuthSock":"","display":"","tmpDir":"/tmp","platform":"linux","connectionToken":"1a11a111-1111-111a-aaa1-a11a11111111"},"downloadTime":3407,"installTime":1447,"serverStartTime":99,"installUnpackCode":"success"}
[19:27:01.786] Persisting server connection details to /Users/amos/Library/Application Support/Code/User/globalStorage/ms-vscode-remote.remote-ssh/vscode-ssh-host-9a297f3d-30d9c6cd9483b2cc586687151bcbcd635f373630-0.82.1/data.json
[19:27:01.788] Starting forwarding server. localPort 54022 -> socksPort 54016 -> remotePort 43703
[19:27:01.788] Forwarding server listening on 54022
[19:27:01.788] Waiting for ssh tunnel to be ready
[19:27:01.789] Tunneled 43703 to local port 54022
[19:27:01.789] Resolved "ssh-remote+hello-axum" to "127.0.0.1:54022"
[19:27:01.790] [Forwarding server 54022] Got connection 0
[19:27:01.796] ------
[19:27:01.807] [Forwarding server 54022] Got connection 1
[19:27:01.809] Failed to set up socket for dynamic port forward to remote port 43703: connect ECONNREFUSED 127.0.0.1:54016. Is the remote port correct?
[19:27:01.809] > local-server-1> ssh child died, shutting down
[19:27:01.809] Failed to set up socket for dynamic port forward to remote port 43703: Socket closed. Is the remote port correct?
[19:27:01.812] Local server exit: 0
So, we're going to need a better SSH server. And also, personally, I don't want to have to run fly proxy (the flyctl command, not the TCP/HTTP proxy running in front of fly apps) every time I want to connect to my remote dev environment.
Oh and the SSH keys expire after 24 hours. And you can't configure the SSH server, since it's built-in (unless I missed something).
So combine that with slow start/stop times and things don't look too good.
(Especially since it's not clear how we'd start/stop individual instances. Playing with fly regions and fly scale to achieve that sounds dangerous!)
Oh no! The all is lost moment!
Enter fly.io machines
The best way to describe fly.io machines is just "firecracker microVMs as a service", with no Nomad/Consul in-between.
We'll need a new fly.io app for that — you can't just add machines to a regular app for now (or ever? I'm not the PM here).
$ fly apps create
? App Name: axum-machine
? Select Organization: Amos Wenger (personal)
New app created: axum-machine
Because I don't like specifying -a / --app every time, I'll just edit fly.toml and change the app = line to read "axum-machine" instead of "hello-axum". The rest of the file doesn't matter for machines.
And then we can run the same Docker image, but as a fly machine:
$ fly machines run --port 80:8080/tcp:http --port 443:8080/tcp:http:tls --region cdg --size shared-cpu-1x hello-axum
Searching for image 'hello-axum' locally...
image found: sha256:3f93ceb9158f5e123253060d58d607f7c2a7e2f93797b49b4edbbbcc8e1b3840
==> Pushing image to fly
The push refers to repository [registry.fly.io/axum-machine]
02f75279051e: Layer already exists
4e38e245312b: Layer already exists
85ade8c6ca76: Layer already exists
ad6562704f37: Layer already exists
deployment-1655573668: digest: sha256:1ddfda6a6d8d84d804602653501db1c9720677b6e04e31008d3256c53ec09723 size: 1159
--> Pushing image done
Image: registry.fly.io/axum-machine:deployment-1655573668
Image size: 152 MB
Machine is launching...
Success! A machine has been successfully launched, waiting for it to be started
 Machine ID: 217814d9c9ee89
 Instance ID: 01G5VY2TKH0A1MQWSX05S1GPK8
 State: starting
Waiting on firecracker VM...
Waiting on firecracker VM...
Waiting on firecracker VM...
Machine started, you can connect via the following private ip
  fdaa:0:446c:a7b:5b66:d530:1a4b:2
Note that pushing the image was instant, since it already lived in fly's registry.
You can see there's no mention of allocations there or anything: it just starts one VM, as requested, and gives us its private IPv6 address.
That'll only work if we set up private networking, which I can't be bothered to do right now.
Instead, let's check we can still SSH into it with the default SSH server:
$ fly ssh console
Connecting to top1.nearest.of.axum-machine.internal... complete
# whoami
root
So far so good.
$ fly status
App
  Name     = axum-machine
  Owner    = personal
  Version  = 0
  Status   = pending
  Hostname = axum-machine.fly.dev

Machines
ID             NAME                   REGION STATE   CREATED
217814d9c9ee89 ancient-snowflake-1933 cdg    started 2022-06-18T17:34:30Z
That works too, and shows our machine running. Neat!
We also have:
$ fly m list
1 machines have been retrieved.
View them in the UI here (https://fly.io/apps/axum-machine/machines/)

axum-machine
ID             IMAGE                              CREATED              STATE   REGION NAME                   IP ADDRESS
217814d9c9ee89 axum-machine:deployment-1655573668 2022-06-18T17:34:30Z started cdg    ancient-snowflake-1933 fdaa:0:446c:a7b:5b66:d530:1a4b:2
..which has more detail.
Our app has no public IP right now, so hitting the domain with curl won't work.
But we can allocate one - I'll go IPv6 because I have it, and IPv4 addresses are a precious commodity.
$ fly ips allocate-v6
TYPE ADDRESS           REGION CREATED AT
v6   2a09:8280:1::48d5 global 1s ago
And now this works!
$ curl -i https://axum-machine.fly.dev
HTTP/2 200
content-type: text/plain; charset=utf-8
content-length: 16
date: Sat, 18 Jun 2022 17:39:27 GMT
server: Fly/09a15cede3 (2022-06-17)
via: 2 fly.io
fly-request-id: 01G5VYBX04VT7JDNQF626KGZ52-cdg

hello from axum
And here's the neat thing: we can stop machines.
$ fly m stop 217814d9c9ee89
217814d9c9ee89 has been successfully stopped

$ fly m status 217814d9c9ee89
Success! A machine has been retrieved
Machine ID: 217814d9c9ee89
Instance ID: 01G5VY2TKH0A1MQWSX05S1GPK8
State: stopped

Event Logs
MACHINE STATUS EVENT TYPE SOURCE TIMESTAMP
stopped        exit       flyd   2022-06-18T17:40:38.517Z
stopping       stop       user   2022-06-18T17:40:35.245Z
started        start      flyd   2022-06-18T17:34:41.353Z
created        launch     user   2022-06-18T17:34:30.538Z
We can see in the event logs that it did stop alright.
And now if we try to run our curl again...
$ curl -i https://axum-machine.fly.dev
HTTP/2 200
content-type: text/plain; charset=utf-8
content-length: 16
date: Sat, 18 Jun 2022 17:41:46 GMT
server: Fly/09a15cede3 (2022-06-17)
via: 2 fly.io
fly-request-id: 01G5VYG3JYZFJ0871A26DCYGKT-cdg

hello from axum
It... still works?
Surprise! Shock! Awe! Predictable plot twist!
$ fly m status 217814d9c9ee89
Success! A machine has been retrieved
Machine ID: 217814d9c9ee89
Instance ID: 01G5VY2TKH0A1MQWSX05S1GPK8
State: started

Event Logs
MACHINE STATUS EVENT TYPE SOURCE TIMESTAMP
started        start      flyd   2022-06-18T17:41:46.075Z
starting       start      user   2022-06-18T17:41:45.695Z
stopped        exit       flyd   2022-06-18T17:40:38.517Z
stopping       stop       user   2022-06-18T17:40:35.245Z
started        start      flyd   2022-06-18T17:34:41.353Z
created        launch     user   2022-06-18T17:34:30.538Z
Huh, it started up again.
Amos don't feign surprise. You worked on that feature. You know full well what it does.
Okay okay, alright. So if you hit a public port for an app that has machines, it'll try to start a machine to handle the connection (raw TCP) / request (HTTP).
Which we can definitely use to our advantage.
What are you thinking? Expose port 22?
Well yes! Let's try that.
$ fly m remove --force 217814d9c9ee89
machine 217814d9c9ee89 was found and is currently in started state, attempting to destroy...
217814d9c9ee89 has been destroyed

$ fly machines run --app axum-machine --port 22:22/tcp --region cdg --size shared-cpu-1x hello-axum
Searching for image 'hello-axum' locally...
image found: sha256:3f93ceb9158f5e123253060d58d607f7c2a7e2f93797b49b4edbbbcc8e1b3840
==> Pushing image to fly
The push refers to repository [registry.fly.io/axum-machine]
02f75279051e: Layer already exists
4e38e245312b: Layer already exists
85ade8c6ca76: Layer already exists
ad6562704f37: Layer already exists
deployment-1655574325: digest: sha256:1ddfda6a6d8d84d804602653501db1c9720677b6e04e31008d3256c53ec09723 size: 1159
--> Pushing image done
Image: registry.fly.io/axum-machine:deployment-1655574325
Image size: 152 MB
Machine is launching...
Success! A machine has been successfully launched, waiting for it to be started
 Machine ID: 5918536ef46383
 Instance ID: 01G5VYPX14END6ZPAHBB411304
 State: starting
Waiting on firecracker VM...
Waiting on firecracker VM...
Machine started, you can connect via the following private ip
  fdaa:0:446c:a7b:5adc:24:e81f:2
And then:
$ ssh -vvv -i /tmp/id_rsa root@axum-machine.fly.dev
OpenSSH_8.2p1 Ubuntu-4ubuntu0.5, OpenSSL 1.1.1f  31 Mar 2020
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug2: resolving "axum-machine.fly.dev" port 22
debug2: ssh_connect_direct
debug1: Connecting to axum-machine.fly.dev [2a09:8280:1::48d5] port 22.
debug1: Connection established.
debug1: identity file /tmp/id_rsa type -1
debug1: identity file /tmp/id_rsa-cert type 7
debug1: Local version string SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
Mh, it's stuck.
Let's check the app logs...
$ fly logs
2022-06-18T17:47:43Z proxy[5918536ef46383] cdg [info]Machine not ready yet (11.072820024s since start requested)
2022-06-18T17:47:44Z proxy[5918536ef46383] cdg [info]Machine not ready yet (15.250221892s since start requested)
2022-06-18T17:47:45Z proxy[5918536ef46383] cdg [info]Machine not ready yet (33.956303928s since start requested)
2022-06-18T17:47:47Z proxy[5918536ef46383] cdg [info]Machine not ready yet (5.409191838s since start requested)
2022-06-18T17:47:48Z proxy[5918536ef46383] cdg [info]Machine not ready yet (10.043353267s since start requested)
2022-06-18T17:47:48Z proxy[5918536ef46383] cdg [info]Machine not ready yet (16.080325672s since start requested)
2022-06-18T17:47:50Z proxy[5918536ef46383] cdg [info]Machine not ready yet (38.962990983s since start requested)
^C%
Oh lord. Is nothing listening on port 22?
Let's check...
$ fly ssh console
Connecting to top1.nearest.of.axum-machine.internal... complete
# ss -lpn
Netid State  Recv-Q Send-Q                    Local Address:Port  Peer Address:Port Process
nl    UNCONN 0      0                                     0:0                 *
(cut)
nl    UNCONN 0      0                                    18:0                 *
tcp   LISTEN 0      0                                *:8080             *:*          users:(("hello-axum",pid=508,fd=9))
tcp   LISTEN 0      0      [fdaa:0:446c:a7b:5adc:24:e81f:2]:22               *:*          users:(("hallpass",pid=509,fd=6))
v_str LISTEN 0      0                               3:10000             *:*          users:(("init",pid=1,fd=9))
Ah! There is something listening on port 22, called "hallpass". But it's listening on... the private IPv6 address. Not the special 0.0.0.0 / :: address.
So that won't work.
No problem then, we'll just run our own SSH server!
Let's also add a non-root user, for no good reason other than... that's what I'm used to! I usually have a non-root user, with passwordless sudo, key-only authentication for SSH. It's not really for security, more for not accidentally clobbering system files without sudo.
I also switched to an Ubuntu 20.04 base, something I feel a little more comfortable using than Debian:
# in `hello-axum/Dockerfile`
# syntax = docker/dockerfile:1.4

################################################################################
FROM ubuntu:20.04

RUN set -eux; \
    export DEBIAN_FRONTEND=noninteractive; \
    apt update; \
    apt install --yes --no-install-recommends \
        bind9-dnsutils iputils-ping iproute2 curl ca-certificates htop \
        curl wget ca-certificates git-core \
        openssh-server openssh-client \
        sudo less zsh \
        ; \
    apt clean autoclean; \
    apt autoremove --yes; \
    rm -rf /var/lib/{apt,dpkg,cache,log}/; \
    echo "Installed base utils!"

RUN set -eux; \
    useradd -ms /usr/bin/zsh amos; \
    usermod -aG sudo amos; \
    echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers; \
    echo "added user"

RUN set -eux; \
    echo "Port 22" >> /etc/ssh/sshd_config; \
    echo "AddressFamily inet" >> /etc/ssh/sshd_config; \
    echo "ListenAddress 0.0.0.0" >> /etc/ssh/sshd_config; \
    echo "PasswordAuthentication no" >> /etc/ssh/sshd_config; \
    echo "ClientAliveInterval 30" >> /etc/ssh/sshd_config; \
    echo "ClientAliveCountMax 10" >> /etc/ssh/sshd_config; \
    echo "SSH server set up"

USER amos

RUN set -eux; \
    mkdir ~/.ssh; \
    curl https://github.com/fasterthanlime.keys | tee -a ~/.ssh/authorized_keys

WORKDIR app

CMD ["bash", "-c", "sudo service ssh start; echo 'SSH server started'; sleep infinity"]
(Note that we're also no longer building any Rust. Also, the ClientAliveInterval config there? Helps fight against fly's default TCP idle timeout. It makes sure something is sent on the wire regularly as long as you're connected, even if you're not actively doing stuff in your SSH session.)
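(You can also fight it from the client side: OpenSSH has mirror-image options you can put in your local ~/.ssh/config. The option names are standard OpenSSH, the values here are just a suggestion:)

Host hello-axum
  ServerAliveInterval 30
  ServerAliveCountMax 10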
Let's build it up again:
$ docker build -t hello-axum .
And create a new machine, making sure we expose port 22 this time.
Note: there's a way to replace a machine, by passing --id ID to fly m run, but at the time of this writing there are state update issues around it, so until those get fixed, we'll just go ahead and remove / recreate machines. It makes no difference other than the ID not being re-used.
$ fly m remove --force 5918536ef46383
(cut)

$ fly m run -p 22:22/tcp -r cdg -s shared-cpu-8x hello-axum
(cut)
And... voilà!
$ ssh axum-machine.fly.dev whoami
amos
Now we can actually log into the machine with VS Code, and it doesn't complain!
All we have to do is add this to our local ~/.ssh/config:

Host hello-axum
  HostName axum-machine.fly.dev
And then we get to pick which machine to connect to:
And we can edit remote files, open arbitrary terminals, work just as we would normally do in VS Code, except... remotely.
And latency is less of a concern than if we used something like vim over ssh, because it doesn't need to wait for single keystrokes to be sent and then for the terminal to echo back. It's a little more sophisticated than that.
Although, chances are there's a fly.io region where latency isn't too bad for you. For me it's ~10ms:
$ ping6 axum-machine.fly.dev PING6(56=40+8+8 bytes) [redacted] --> 2a09:8280:1::48d5 16 bytes from 2a09:8280:1::48d5, icmp_seq=0 hlim=52 time=15.792 ms 16 bytes from 2a09:8280:1::48d5, icmp_seq=1 hlim=52 time=13.238 ms 16 bytes from 2a09:8280:1::48d5, icmp_seq=2 hlim=52 time=8.906 ms ^C --- axum-machine.fly.dev ping6 statistics --- 3 packets transmitted, 3 packets received, 0.0% packet loss round-trip min/avg/max/std-dev = 8.906/12.645/15.792/2.842 ms
VS Code also knows how to forward ports automatically (and manually, if the detection fails), so if we start a little server over there:
$ sudo apt update && sudo apt install -y python
(cut)
$ cd /etc
$ python -m SimpleHTTPServer
Then VS Code automatically forwards the port to localhost:
And we can open it from a local desktop browser:
And with a couple VS Code extensions I like, like Resource Monitor, that's a pretty ideal setup for me.
Heck, you could even probably figure out a way to mount some folder on the remote machine to your local machine, through something like sshfs, which is apparently no longer maintained? So maybe a maintained alternative instead.
Oh and you'd want a volume. You can create those with fly volumes (or fly vol for short) and mount them by passing --volume vol_name:/path/on/disk.
That CLI option is hidden from the flyctl docs right now. It'll be our little secret!
Caveats are: it's experimental, still, and you can only use one volume (as opposed to "classic" fly apps).
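Putting the volume bits together, it'd look something like this. The name, size, and mount point are made up for illustration:

$ fly volumes create devdata --region cdg --size 10
$ fly m run -p 22:22/tcp -r cdg -s shared-cpu-8x --volume devdata:/home/amos hello-axum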
One thing that's neat in that kind of environment is that you can run Docker easily! It's kind of a hassle to add to our sample Dockerfile, but I'm typing this from my remote environment and I can promise it does in fact run docker:
$ docker info
(cut)
 app: Docker App (Docker Inc., v0.9.1-beta3)
 buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
 compose: Docker Compose (Docker Inc., v2.6.0)
(cut)
Server:
 Server Version: 20.10.17
 Storage Driver: overlay2
(cut)
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.12.2
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.63GiB
(cut)
Also, also! Again because this is a real VM, not just a docker container, we can install something like perf!
We have to build it from sources, but that's no problem:
KERNEL_VERSION=$(uname -r | sed -r 's/(^[^-]+).*/\1/' | sed -r 's/\.0//g')
echo "Installing perf for kernel ${KERNEL_VERSION}"

mkdir ~/kernel-sources
cd ~/kernel-sources
curl --fail --location "https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-${KERNEL_VERSION}.tar.xz" | tar -xJ --strip-components=1
sudo apt install --yes libiberty-dev binutils-dev flex bison libelf-dev libunwind-dev liblzma-dev libzstd-dev libdw-dev
sudo make -C tools/ perf_install prefix=/usr/
And then we see where CPU time is being spent with perf top, for example! See Brendan Gregg's perf page for more info.
There's only one problem with our little setup, and it's price-related.
The machine only stops if you call fly m stop --id MACHINE_ID.
I'd like it to stop when it's just "not used for a while".
And we can solve that problem... with Rust.
Ah! Finally.
A naive TCP proxy with tokio
This is what I use "in production", so to speak. It's absolutely not the only way to do this, and in fact we'll see if we have time to do it a few other fun ways, but it's simple and straight-forward, and I like it.
We don't need axum for this, since we only want to speak TCP, not HTTP.
// in `hello-axum/src/main.rs`

use std::{
    process::Stdio,
    sync::{
        atomic::{AtomicU64, Ordering},
        Arc,
    },
    time::{Duration, Instant},
};

use tokio::{
    net::{TcpListener, TcpStream},
    process::Command,
    time::sleep,
};

#[tokio::main]
async fn main() {
    let status = Command::new("service")
        .arg("ssh")
        .arg("start")
        .stdin(Stdio::null())
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit())
        .status()
        .await
        .unwrap();
    assert!(status.success());

    let num_conns: Arc<AtomicU64> = Default::default();

    tokio::spawn({
        let num_conns = num_conns.clone();
        let mut last_activity = Instant::now();

        async move {
            loop {
                if num_conns.load(Ordering::SeqCst) > 0 {
                    last_activity = Instant::now();
                } else {
                    let idle_time = last_activity.elapsed();
                    println!("Idle for {idle_time:?}");
                    if idle_time > Duration::from_secs(60) {
                        println!("Stopping machine. Goodbye!");
                        std::process::exit(0)
                    }
                }
                sleep(Duration::from_secs(5)).await;
            }
        }
    });

    let listener = TcpListener::bind("[::]:2222").await.unwrap();

    while let Ok((mut ingress, _)) = listener.accept().await {
        let num_conns = num_conns.clone();
        tokio::spawn(async move {
            // We'll tell OpenSSH to listen on this IPv4 address.
            let mut egress = TcpStream::connect("127.0.0.2:22").await.unwrap();
            // did you know: loopback is 127.0.0.1/8, it goes all the way to
            // 127.255.255.254 (and 127.255.255.255 for broadcast)

            num_conns.fetch_add(1, Ordering::SeqCst);
            match tokio::io::copy_bidirectional(&mut ingress, &mut egress).await {
                Ok((to_egress, to_ingress)) => {
                    println!(
                        "Connection ended gracefully ({to_egress} bytes from client, {to_ingress} bytes from server)"
                    );
                }
                Err(err) => {
                    println!("Error while proxying: {}", err);
                }
            }
            num_conns.fetch_sub(1, Ordering::SeqCst);
        });
    }
}
Wait... stopping the machine is just std::process::exit?
Yeah! If our docker image's "CMD" exits, the machine is stopped. In this case, it's much easier to tell from the inside whether the machine needs to be stopped.
(If we could only tell from the outside, we'd use the machines API to stop it instead.)
Anyway, here's our adjusted Dockerfile:
# in `hello-axum/Dockerfile`
# syntax = docker/dockerfile:1.4

################################################################################
# Let's just make our own Rust builder image based on ubuntu:20.04 to avoid
# any libc version problems
FROM ubuntu:20.04 AS builder

# Install base utils: curl to grab rustup, gcc + build-essential for linking.
# we could probably reduce that a bit but /shrug
RUN set -eux; \
    export DEBIAN_FRONTEND=noninteractive; \
    apt update; \
    apt install --yes --no-install-recommends \
        curl ca-certificates \
        gcc build-essential \
        ; \
    apt clean autoclean; \
    apt autoremove --yes; \
    rm -rf /var/lib/{apt,dpkg,cache,log}/; \
    echo "Installed base utils!"

# Install rustup
RUN set -eux; \
    curl --location --fail \
        "https://static.rust-lang.org/rustup/dist/x86_64-unknown-linux-gnu/rustup-init" \
        --output rustup-init; \
    chmod +x rustup-init; \
    ./rustup-init -y --no-modify-path; \
    rm rustup-init;

# Add rustup to path, check that it works
ENV PATH=${PATH}:/root/.cargo/bin
RUN set -eux; \
    rustup --version;

# Build some code!
# Careful: now we need to cache `/root/.cargo/` rather than `/usr/local/cargo`
# since rustup installed things differently than in the rust build image
WORKDIR /app
COPY . .
RUN --mount=type=cache,target=/app/target \
    --mount=type=cache,target=/root/.cargo/registry \
    --mount=type=cache,target=/root/.cargo/git \
    --mount=type=cache,target=/root/.rustup \
    set -eux; \
    rustup install stable; \
    cargo build --release; \
    objcopy --compress-debug-sections target/release/hello-axum ./hello-axum

################################################################################
FROM ubuntu:20.04

RUN set -eux; \
    export DEBIAN_FRONTEND=noninteractive; \
    apt update; \
    apt install --yes --no-install-recommends \
        bind9-dnsutils iputils-ping iproute2 curl ca-certificates htop \
        curl wget ca-certificates git-core \
        openssh-server openssh-client \
        sudo less zsh \
        ; \
    apt clean autoclean; \
    apt autoremove --yes; \
    rm -rf /var/lib/{apt,dpkg,cache,log}/; \
    echo "Installed base utils!"

RUN set -eux; \
    useradd -ms /usr/bin/zsh amos; \
    usermod -aG sudo amos; \
    echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers; \
    echo "added user"

# Note that we've changed the `ListenAddress` here from `0.0.0.0` to
# `127.0.0.2`. It's not really necessary but it's neat that 127.0.0.1 is a /8.
RUN set -eux; \
    echo "Port 22" >> /etc/ssh/sshd_config; \
    echo "AddressFamily inet" >> /etc/ssh/sshd_config; \
    echo "ListenAddress 127.0.0.2" >> /etc/ssh/sshd_config; \
    echo "PasswordAuthentication no" >> /etc/ssh/sshd_config; \
    echo "ClientAliveInterval 30" >> /etc/ssh/sshd_config; \
    echo "ClientAliveCountMax 10" >> /etc/ssh/sshd_config; \
    echo "SSH server set up"

USER amos

# Don't forget to change that if you don't want to give /me/ access to your
# remote dev env! Otherwise I'll ssh in there and fix your code 😈
RUN set -eux; \
    mkdir ~/.ssh; \
    curl https://github.com/fasterthanlime.keys | tee -a ~/.ssh/authorized_keys

WORKDIR app

COPY --from=builder /app/hello-axum ./hello-axum

# Because our top-level process starts the ssh daemon itself, for simplicity,
# let's run it as root. It could drop privileges after that but we already have
# passwordless sudo set up on the machine so double-shrug.
USER root
CMD ["./hello-axum"]
After a quick docker build -t hello-axum ., let's start it up again, mapping edge port 22 to machine port 2222 instead:
$ fly m run -p 22:2222/tcp -r cdg -s shared-cpu-8x hello-axum
(cut)
And that's basically all it takes! You can stop reading the article now!
Right now? But... your brand.
Oh I'll keep going. But you could stop reading the article now. It's missing a volume (explained above), which means right now every time it's stopped, all our data disappears. So we definitely want that.
And because we can only have one volume, I have my "pseudo-init" process create a symlink from /var/lib/docker to /home/amos/docker, and change some permissions, also start the docker daemon, things like that.
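The shell equivalent of that setup is roughly this (hypothetical paths; in practice hello-axum does it with Command before starting sshd):

# keep Docker's state on the persistent volume (mounted at /home/amos here)
sudo mkdir -p /home/amos/docker
sudo rm -rf /var/lib/docker
sudo ln -s /home/amos/docker /var/lib/docker
sudo service docker start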
Oh, I also have the PROXY protocol handler set up on my machine, which I parse with the ppp crate so I'm able to log the real client IPs that try to connect to my remote dev environment, even though these are all TCP connections.
Mh? As opposed to what?
Well if they were HTTP connections we'd get the real IP as the fly-client-ip header. But with TCP there's not really a concept of "headers" / "custom metadata", hence the PROXY protocol.
Oh and I didn't really show it in action: here's what the logs look like when I sign off for over a minute:
2022-06-19T19:24:18Z proxy[e148e394a72e89] cdg [info]Machine became reachable in 12.924218ms
2022-06-19T19:25:21Z app[e148e394a72e89] cdg [info]Connection ended gracefully (259121 bytes from client, 343897 bytes from server)
2022-06-19T19:25:22Z app[e148e394a72e89] cdg [info]Idle for 5.001673407s
2022-06-19T19:25:27Z app[e148e394a72e89] cdg [info]Idle for 10.002938289s
2022-06-19T19:25:32Z app[e148e394a72e89] cdg [info]Idle for 15.004794068s
2022-06-19T19:25:37Z app[e148e394a72e89] cdg [info]Idle for 20.005997194s
2022-06-19T19:25:42Z app[e148e394a72e89] cdg [info]Idle for 25.00744559s
2022-06-19T19:25:47Z app[e148e394a72e89] cdg [info]Idle for 30.008603681s
2022-06-19T19:25:52Z app[e148e394a72e89] cdg [info]Idle for 35.009784886s
2022-06-19T19:25:57Z app[e148e394a72e89] cdg [info]Idle for 40.010062697s
2022-06-19T19:26:02Z app[e148e394a72e89] cdg [info]Idle for 45.011428658s
2022-06-19T19:26:07Z app[e148e394a72e89] cdg [info]Idle for 50.012635341s
2022-06-19T19:26:12Z app[e148e394a72e89] cdg [info]Idle for 55.013845891s
2022-06-19T19:26:17Z app[e148e394a72e89] cdg [info]Idle for 60.014006722s
2022-06-19T19:26:17Z app[e148e394a72e89] cdg [info]Stopping machine. Goodbye!
Alright cool! Seems like our work is done here? Yet you wanted to continue, somehow?
Well yes, because see, if there's one thing I've learned from amateur microbenchmarks doing bullshit comparisons between programming languages...
The salt. God, just sign off amos.
...it's that syscalls are bad. Or slow. Whichever. And right now we do a bunch of syscalls:
e148e394a72e89% sudo strace -ff -p $(pidof hello-axum) 2>&1 | head -30
strace: Process 586 attached with 9 threads
[pid 599] futex(0x7f8ece5a9608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 598] epoll_wait(3, <unfinished ...>
[pid 597] futex(0x7f8ece9b7608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 596] futex(0x7f8ecebbb608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 595] futex(0x7f8ecedbc608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 594] futex(0x7f8ecefbd608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 593] futex(0x7f8ecf1be608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 592] futex(0x7f8ecf3bf608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 586] futex(0x7f8ecf3c1448, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 598] <... epoll_wait resumed>[{EPOLLIN|EPOLLOUT, {u32=16777219, u64=16777219}}], 1024, 336) = 1
[pid 598] recvfrom(12, "\30\317\332\271\354+\345:\3231\223\330\303\333x\177\347%\332[\316\241\235\307\277\200\34~\322\262s\337"..., 8192, 0, NULL, NULL) = 196
[pid 598] sendto(11, "\30\317\332\271\354+\345:\3231\223\330\303\333x\177\347%\332[\316\241\235\307\277\200\34~\322\262s\337"..., 196, MSG_NOSIGNAL, NULL, 0) = 196
[pid 598] recvfrom(12, 0x7f8ea4002c30, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 598] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=16777219, u64=16777219}}], 1024, 334) = 1
[pid 598] recvfrom(12, "\306\214e\204\242x,\315\34\3427\7\241{I\23f\251\321\235\36\262\35#V\372\246\344\277S\4\337"..., 8192, 0, NULL, NULL) = 212
[pid 598] sendto(11, "\306\214e\204\242x,\315\34\3427\7\241{I\23f\251\321\235\36\262\35#V\372\246\344\277S\4\337"..., 212, MSG_NOSIGNAL, NULL, 0) = 212
[pid 598] recvfrom(12, 0x7f8ea4002c30, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 598] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=16777219, u64=16777219}}], 1024, 333) = 1
[pid 598] recvfrom(12, "K\223\332\24h\346#N\37\234t\364-\326\v\221p\320\254\363m<\323\254\206\32\250'\362\346\207\246"..., 8192, 0, NULL, NULL) = 180
[pid 598] sendto(11, "K\223\332\24h\346#N\37\234t\364-\326\v\221p\320\254\363m<\323\254\206\32\250'\362\346\207\246"..., 180, MSG_NOSIGNAL, NULL, 0) = 180
[pid 598] recvfrom(12, 0x7f8ea4002c30, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 598] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=16777219, u64=16777219}}], 1024, 332) = 1
[pid 598] recvfrom(12, "L\361W\16\244\r\254\244\313\360\357\6n\v\26.\362\364\2068\24\262\23\345\22\263\365z]\37\5~"..., 8192, 0, NULL, NULL) = 164
[pid 598] sendto(11, "L\361W\16\244\r\254\244\313\360\357\6n\v\26.\362\364\2068\24\262\23\345\22\263\365z]\37\5~"..., 164, MSG_NOSIGNAL, NULL, 0) = 164
[pid 598] recvfrom(12, 0x7f8ea4002c30, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 598] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=16777219, u64=16777219}}], 1024, 330) = 1
[pid 598] recvfrom(12, "\362\275LJk\200\25*\367\22\370\345\214A\317nX\32L\217;\270gX{\254fZ\206sqL"..., 8192, 0, NULL, NULL) = 140
[pid 598] sendto(11, "\362\275LJk\200\25*\367\22\370\345\214A\317nX\32L\217;\270gX{\254fZ\206sqL"..., 140, MSG_NOSIGNAL, NULL, 0) = 140
[pid 598] recvfrom(12, 0x7f8ea4002c30, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
I'm only showing 30 lines here, but it scrolls by really fast.
I mean, yes. copy_bidirectional is reading data from a socket and copying it to the other socket. And also reading from the other socket and copying it to the first socket. What did you expect?
Nothing, nothing, it's a perfectly reasonable way to do I/O: we have user-space buffers that the kernel copies data into, and out of.
It's just... we have more modern equivalents now.
Such as?
Well, I don't know if this'll work, but let's give it a shot.
A wonderful TCP proxy with tokio-uring
Okay, I hope things will go really smoothly, because I don't have a lot of time left to write this article.
io-uring is a different way to do I/O. I'll explain what I've understood about it, and let The Internet correct me.
So the olden way is to just do blocking syscalls. First you allocate a buffer, then you do a syscall (probably via some libc wrapper, like read or write), passing the address (and size) of the buffer you've allocated, and when it returns, if there were no errors, you have some data in your buffer!
And you can make that scale by having more threads! Since every thread blocks on... having more data to read from somewhere, or a write finishing (it might end up in a kernel buffer, but that's fine).
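To make that concrete, here's a toy version of the olden way: a blocking echo server (not our proxy, just a sketch), one thread per connection, nothing but std. Every call blocks until the kernel has done the work:

use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:4000")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        // one thread per connection: each one spends most of its life
        // blocked inside `read`
        thread::spawn(move || {
            let mut buf = [0u8; 1024];
            loop {
                match stream.read(&mut buf) {
                    // EOF or error: we're done with this connection
                    Ok(0) | Err(_) => return,
                    // echo it back (a proxy would write to the other socket)
                    Ok(n) => {
                        if stream.write_all(&buf[..n]).is_err() {
                            return;
                        }
                    }
                }
            }
        });
    }
    Ok(())
}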
And then there's non-blocking I/O, which is much the same, except you set everything (file descriptors, sockets) to "non-blocking mode", and when you call "read" and "write", IF they don't have data that's immediately available, if the call "would block", they return "EWOULDBLOCK".
But then how do you know when to call read & write? In a loop?
In a loop yes, but first you register your interest in some resource being "ready", and then the only blocking syscall you do (from your async runtime) is one that waits for the next readiness event. (And there might be multiple events because multiple resources might become ready "at the same time").
So from a single thread you do something like:
- Open a, set as non-blocking, register interest
- Open b, set as non-blocking, register interest
- Wait for next readiness event
- We have readiness events!
- One of them is "a is ready to read from"
- Try to read from a, it either succeeds immediately or returns EWOULDBLOCK (spurious wake-ups happen if I recall correctly?)
- Wait for next readiness event
- etc.
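In code, that dance looks roughly like this, using tokio's lower-level readiness API (a sketch; in practice you'd just use AsyncReadExt and let the runtime deal with all of it):

use tokio::net::TcpStream;

async fn read_some(stream: &TcpStream) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; 1024];
    loop {
        // wait until the runtime gets a readiness event for this socket
        stream.readable().await?;

        // then try the actual non-blocking read
        match stream.try_read(&mut buf) {
            Ok(n) => {
                buf.truncate(n);
                return Ok(buf);
            }
            // spurious wake-up: the socket wasn't actually readable after
            // all, go back to waiting
            Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => continue,
            Err(e) => return Err(e),
        }
    }
}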
Okay, and that's what regular tokio does?
Exactly. And then there's io-uring, in which you don't do "one syscall per I/O operation", instead you submit items to a ring buffer and you can monitor completion from another ring buffer, at least I think so, I'm a bit fuzzy on the details still.
Ah, so, fewer syscalls overall! It sounds like a great fit for, like... highly concurrent stuff?
Yeah, which our thing is not... we're just doing bidirectional copy between two sockets. So it's probably not even much of an improvement, but hey, I've never tried it before, and all we need is a 5.11+ kernel and everyone's always saying to try new stuff. THIS IS ME TRYING.
So we'll just want to add tokio-uring:
# in `hello-axum/Cargo.toml`

[package]
name = "hello-axum"
version = "0.1.0"
edition = "2021"

[dependencies]
tokio = { version = "1.19.2", features = ["full"] }
tokio-uring = "0.3.0"
And then... it's easier to explain this one in the comments, so read the comments!
use std::{
    process::Stdio,
    rc::Rc,
    sync::atomic::{AtomicU64, Ordering},
    time::{Duration, Instant},
};

// we can still use regular tokio stuff!
use tokio::{process::Command, time::sleep};

// but we want the uring versions of TCP sockets.
use tokio_uring::{
    buf::IoBuf,
    net::{TcpListener, TcpStream},
};

// can't use a regular main function because we need to start a
// `tokio-uring` runtime, which manages both the main tokio runtime
// and the uring runtime.
fn main() {
    // nobody's stopping us from defining our own main function though.
    tokio_uring::start(main_inner());
}

async fn main_inner() {
    // this is regular tokio stuff, still works fine.
    let status = Command::new("service")
        .arg("ssh")
        .arg("start")
        .stdin(Stdio::null())
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit())
        .status()
        .await
        .unwrap();
    assert!(status.success());

    let num_conns: Rc<AtomicU64> = Default::default();

    // We can still spawn stuff, but with tokio_uring's `spawn`. The future
    // we send doesn't have to be `Send`, since it's all single-threaded.
    tokio_uring::spawn({
        let num_conns = num_conns.clone();
        let mut last_activity = Instant::now();

        async move {
            loop {
                if num_conns.load(Ordering::SeqCst) > 0 {
                    last_activity = Instant::now();
                } else {
                    let idle_time = last_activity.elapsed();
                    println!("Idle for {idle_time:?}");
                    if idle_time > Duration::from_secs(60) {
                        println!("Stopping machine. Goodbye!");
                        std::process::exit(0)
                    }
                }
                sleep(Duration::from_secs(5)).await;
            }
        }
    });

    // tokio-uring's TcpListener wants a `SocketAddr`, not a `ToAddrs` or
    // something, so let's parse it ahead of time.
    let addr = "[::]:2222".parse().unwrap();
    // also it doesn't return a future?
    let listener = TcpListener::bind(addr).unwrap();

    while let Ok((ingress, _)) = listener.accept().await {
        println!("Accepted connection");
        let num_conns = num_conns.clone();
        tokio_uring::spawn(async move {
            // same deal, we need to parse first. if you're puzzled why there's
            // no mention of `SocketAddr` anywhere, it's inferred from what
            // `TcpStream::connect` wants.
            let egress_addr = "127.0.0.2:22".parse().unwrap();
            let egress = TcpStream::connect(egress_addr).await.unwrap();

            num_conns.fetch_add(1, Ordering::SeqCst);

            // `read` and `write` take owned buffers (more on that later), and
            // there's no "per-socket" buffer, so they actually take `&self`.
            // which means we don't need to split them into a read half and a
            // write half like we'd normally do with "regular tokio". Instead,
            // we can send a reference-counted version of it. also, since a
            // tokio-uring runtime is single-threaded, we can use `Rc` instead of
            // `Arc`.
            let egress = Rc::new(egress);
            let ingress = Rc::new(ingress);

            // We need to copy in both directions...
            let mut from_ingress = tokio_uring::spawn(copy(ingress.clone(), egress.clone()));
            let mut from_egress = tokio_uring::spawn(copy(egress.clone(), ingress.clone()));

            // Stop as soon as one of them errors
            let res = tokio::try_join!(&mut from_ingress, &mut from_egress);
            if let Err(e) = res {
                println!("Connection error: {}", e);
            }
            // Make sure the reference count drops to zero and the socket is
            // freed by aborting both tasks (which both hold a `Rc<TcpStream>`
            // for each direction)
            from_ingress.abort();
            from_egress.abort();

            num_conns.fetch_sub(1, Ordering::SeqCst);
        });
    }
}

async fn copy(from: Rc<TcpStream>, to: Rc<TcpStream>) -> Result<(), std::io::Error> {
    let mut buf = vec![0u8; 1024];
    loop {
        // things look weird: we pass ownership of the buffer to `read`, and we get
        // it back, _even if there was an error_. There's a whole trait for that,
        // which `Vec<u8>` implements!
        let (res, buf_read) = from.read(buf).await;
        // Propagate errors, see how many bytes we read
        let n = res?;
        if n == 0 {
            // A read of size zero signals EOF (end of file), finish gracefully
            return Ok(());
        }

        // The `slice` method here is implemented in an extension trait: it
        // returns an owned slice of our `Vec<u8>`, which we later turn back
        // into the full `Vec<u8>`
        let (res, buf_write) = to.write(buf_read.slice(..n)).await;
        res?;

        // Later is now, we want our full buffer back.
        // That's why we declared our binding `mut` way back at the start of `copy`,
        // even though we moved it into the very first `TcpStream::read` call.
        buf = buf_write.into_inner();
    }
}
A docker build, fly m remove --force, fly m run later... it works!
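(In case you've lost count of that dance, it goes roughly like this; the image tag and machine ID are placeholders, the app name is the same axum-machine app as before, and the exact flyctl flags vary a bit between versions.)

$ docker build -t registry.fly.io/axum-machine:latest .
$ docker push registry.fly.io/axum-machine:latest
$ fly m remove --force <machine-id>
$ fly m run -a axum-machine registry.fly.io/axum-machine:latest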
Let's take a look at the syscalls we have now:
59185369a43383% sudo strace -ff -p $(pidof hello-axum) 2>&1 | head -30 strace: Process 584 attached with 3 threads [pid 584] epoll_wait(3, 0x56361d15d240, 1024, 951) = -1 EINTR (Interrupted system call) [pid 584] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 947) = 1 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}, {EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 946) = 2 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 946) = 1 [pid 584] epoll_wait(3, 0x56361d15d240, 1024, 946) = -1 EINTR (Interrupted system call) [pid 584] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 945) = 1 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}, {EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 945) = 2 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 943) = 1 [pid 584] epoll_wait(3, 0x56361d15d240, 1024, 943) = -1 EINTR (Interrupted system call) [pid 584] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 943) = 1 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}, {EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 943) = 2 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 943) = 1 [pid 584] epoll_wait(3, 0x56361d15d240, 1024, 943) = -1 EINTR (Interrupted system call) [pid 584] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 942) = 1 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}, {EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 941) = 2
Well, I see io_uring_enter in there, so it's definitely Doing The Thing(TM), but it's still scrolling by really fast.
Say, Amos?
Yes Bear?
Where do you think strace's output is sent to?
Well, to my terminal. I'm looking right at it.
Right, and how does it end up in your terminal?
Oh. OHHHhhhhhhhhhhhhh right. As strace outputs more data it's sent over SSH back to me, which results in more syscalls, which results in more strace output, which results in more data being sent over SSH back to me, which...
Okay if we really want to snoop without disrupting hello-axum's operation (guess who's regretting picking that name now!), we probably want to connect through hallpass instead:
$ fly ssh console Connecting to top1.nearest.of.axum-machine.internal... complete # bash root@59185369a43383:/# strace -p $(pidof hello-axum) -ff strace: Process 584 attached with 3 threads [pid 584] epoll_wait(3, [], 1024, 565) = 0 [pid 584] epoll_wait(3, [], 1024, 21) = 0 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 2536) = 1 [pid 584] epoll_wait(3, [], 1024, 2536) = 0 [pid 584] epoll_wait(3, [], 1024, 2430) = 0 [pid 584] epoll_wait(3, [], 1024, 30) = 0 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 1631) = 1 [pid 584] epoll_wait(3, [], 1024, 1631) = 0 [pid 584] epoll_wait(3,
Better! Now we only see the SSH keepalives (set through ClientAliveInterval up there, your browser has a search function, I believe in you), and whatever chatter is exchanged between vscode and vscode server.
Okay, now our work is like, super done. Right?
Mhhhhhhhh one could say so, yes. But we're still doing some syscalls. You know what's better than some syscalls?
...no syscalls?
An eBPF thingy with aya
See Bear, we never actually do anything useful with the data we proxy back and forth. It's not like we're doing HTTP, or acting as the SSH server ourselves.
We're merely acting as a pipe, copying stuff back and forth in both directions.
And at first I thought about using syscalls like splice, but I realized that's not even a step up from io-uring. The io-uring solution is more general and completely replaces splice as far as I'm concerned.
All we really need to do is know whether there are packets being sent to/from OpenSSH's port. If there are: we have activity, let's stay up! If not, let's go to sleep.
And you know what's a great way to snoop on network traffic?
Is it... is it in the title? Is it BPF?
Yes! Or eBPF if you want to nitpick, but rest assured I have no plans to learn classic BPF any time soon.
So! Let's get started. We'll actually want two programs here:
- A BPF program, that we'll compile and link for the BPF target (it'll end up being bytecode in an ELF file)
- A regular Linux executable that will be in charge of loading our BPF program and attaching it to a network interface.
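Put differently, by the end of this section the project will look roughly like this (same names as used throughout this article):

hello-axum/
├── Cargo.toml           # the userspace "driver" program
├── Dockerfile
├── src/main.rs
└── flyremote-bpf/       # the BPF program, built for bpfel-unknown-none
    ├── Cargo.toml
    ├── rust-toolchain.toml
    └── src/main.rs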
So let's make a new project:
$ cd hello-axum/
$ cargo new flyremote-bpf
     Created binary (application) `flyremote-bpf` package
Add a couple dependencies:
# in `hello-axum/flyremote-bpf/Cargo.toml`

[package]
name = "flyremote-bpf"
version = "0.1.0"
edition = "2021"

# these are important too!
[profile.release]
lto = true
panic = "abort"
codegen-units = 1

[dependencies]
aya-bpf = { git = "https://github.com/aya-rs/aya", branch = "main" }
aya-log-ebpf = { git = "https://github.com/aya-rs/aya-log", branch = "main" }
Make sure we use nightly Rust for this one:
# in `hello-axum/flyremote-bpf/rust-toolchain.toml`
[toolchain]
channel = "nightly"
There are many different BPF program types. We could look at every single packet going through an interface (and mess with them), but here we really just need to listen for some events: when has a connection just been made? When has it been closed? And of course, on what address/port.
Here's a good starting point:
// in `hello-axum/flyremote-bpf/src/main.rs`

// We won't have an allocator, so we can't bring the Rust standard library
// with us here. Besides, it probably wouldn't pass the BPF verifier.
#![no_std]
#![no_main]

use aya_bpf::{macros::sock_ops, programs::SockOpsContext};
// This works a little like `tracing`!
use aya_log_ebpf::info;

// The proc macro here does the heavy lifting. There's a bunch of linker fuckery
// at hand here that would be fascinating, but that I won't get into.
#[sock_ops(name = "flyremote")]
pub fn flyremote(ctx: SockOpsContext) -> u32 {
    match unsafe { try_flyremote(ctx) } {
        Ok(ret) => ret,
        Err(ret) => ret,
    }
}

// This gets called for every "socket operation" event.
unsafe fn try_flyremote(ctx: SockOpsContext) -> Result<u32, u32> {
    // transmuting from a `u32` to a `[u8; 4]` - should be okay.
    let local_ip4: [u8; 4] = core::mem::transmute([ctx.local_ip4()]);
    let remote_ip4: [u8; 4] = core::mem::transmute([ctx.remote_ip4()]);

    // log some stuff
    info!(
        &ctx,
        "op ({} {}), local port {}, remote port {}, local ip4 = {}.{}.{}.{} remote ip4 = {}.{}.{}.{}",
        op_name(ctx.op()),
        ctx.op(),
        ctx.local_port(),
        // this value is big-endian (but local_port is native-endian)
        u32::from_be(ctx.remote_port()),
        local_ip4[0], local_ip4[1], local_ip4[2], local_ip4[3],
        remote_ip4[0], remote_ip4[1], remote_ip4[2], remote_ip4[3],
    );

    // that's `BPF_SOCK_OPS_STATE_CB_FLAG` - so we receive "state_cb" events,
    // when a socket changes state.
    // this may fail, so it returns a `Result`, but I wouldn't know what to do
    // if it failed anyway.
    let _ = ctx.set_cb_flags(1 << 2);

    // if this is a "state_cb" event, show the old state and new state, which
    // are the first two arguments (we have up to 4 arguments)
    if ctx.op() == 10 {
        info!(
            &ctx,
            "state transition: {} {} => {} {}",
            ctx.arg(0),
            state_name(ctx.arg(0)),
            ctx.arg(1),
            state_name(ctx.arg(1)),
        );
    }

    Ok(0)
}

// gleaned from `bpf.h`
fn op_name(op: u32) -> &'static str {
    match op {
        0 => "void",
        1 => "timeout_init",
        2 => "rwnd_init",
        3 => "tcp_connect_cb",
        4 => "active_established_cb",
        5 => "passive_established_cb",
        6 => "needs_ecn",
        7 => "base_rtt",
        8 => "rto_cb",
        9 => "retrans_cb",
        10 => "state_cb",
        _ => "unknown",
    }
}

// gleaned from `bpf.h` too
fn state_name(op: u32) -> &'static str {
    match op {
        1 => "established",
        2 => "syn-sent",
        3 => "syn-recv",
        4 => "fin-wait1",
        5 => "fin-wait2",
        6 => "time-wait",
        7 => "close",
        8 => "close-wait",
        9 => "last-ack",
        10 => "listen",
        11 => "closing",
        12 => "new-syn-recv",
        _ => "unknown",
    }
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::hint::unreachable_unchecked() }
}
We can now build this for the BPF target using Rust nightly, doing a release build, and asking rustc to build libcore (the smol part of libstd) from scratch, since there's no prebuilt target for bpfel-unknown-none as far as I know:
$ cd hello-axum/flyremote-bpf/ $ cargo +nightly build --verbose --target bpfel-unknown-none -Z build-std=core --release Fresh unicode-ident v1.0.1 Fresh core v0.0.0 (/home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core) Fresh rustc-std-workspace-core v1.99.0 (/home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/rustc-std-workspace-core) Fresh proc-macro2 v1.0.39 (cut) Fresh aya-log-ebpf v0.1.0 (https://github.com/aya-rs/aya-log?branch=main#1b0d3da1) Compiling flyremote-bpf v0.1.0 (/home/amos/bearcove/flyremote-bpf) Running `rustc --crate-name flyremote_bpf --edition=2021 src/main.rs (cut.)` Finished release [optimized] target(s) in 0.88s
We can inspect the result with llvm-objdump:
$ llvm-objdump -t target/bpfel-unknown-none/release/flyremote-bpf target/bpfel-unknown-none/release/flyremote-bpf: file format elf64-bpf SYMBOL TABLE: 0000000000000000 l df *ABS* 0000000000000000 flyremote_bpf-8df4772bd494bad9 0000000000001890 l sockops/flyremote 0000000000000000 LBB0_30 (cut) 0000000000000000 g F sockops/flyremote 0000000000002a48 flyremote 0000000000000000 g O maps 000000000000001c AYA_LOG_BUF 0000000000000040 g F .text 0000000000000058 .hidden memcpy 000000000000001c g O maps 000000000000001c AYA_LOGS 0000000000000000 g F .text 0000000000000040 .hidden memset
To load it into the kernel, we'll need a regular Linux executable. For us it'll be hello-axum (really regretting that name now, it's not axum-powered at all anymore).
We'll need these dependencies:
# in `hello-axum/Cargo.toml`

[package]
name = "hello-axum"
version = "0.1.0"
edition = "2021"

[dependencies]
aya = { version = ">=0.11", features = ["async_tokio"] }
aya-log = "0.1"
clap = { version = "3.1", features = ["derive"] }
color-eyre = "0.6.1"
log = "0.4"
simplelog = "0.12"
tokio = { version = "1.19.2", features = ["full"] }
And then: we include the bytecode as part of our executable, do some syscalls to load it, grab a handle to the flyremote program in there, attach it to the default cgroup (/sys/fs/cgroup/unified), also set up some logging, and off we go:
// in `hello-axum/src/main.rs`
use aya::programs::SockOps;
use aya::{include_bytes_aligned, Bpf};
use aya_log::BpfLogger;
use clap::Parser;
use log::info;
use simplelog::{ColorChoice, ConfigBuilder, LevelFilter, TermLogger, TerminalMode};
use tokio::signal;

#[derive(Debug, Parser)]
struct Opt {
    #[clap(short, long, default_value = "/sys/fs/cgroup/unified")]
    cgroup_path: String,
}

#[tokio::main]
async fn main() -> color_eyre::Result<()> {
    color_eyre::install()?;

    let opt = Opt::parse();

    TermLogger::init(
        LevelFilter::Debug,
        ConfigBuilder::new()
            .set_target_level(LevelFilter::Error)
            .set_location_level(LevelFilter::Error)
            .build(),
        TerminalMode::Mixed,
        ColorChoice::Auto,
    )?;

    let mut bpf = Bpf::load(include_bytes_aligned!(
        "../flyremote-bpf/target/bpfel-unknown-none/release/flyremote-bpf"
    ))?;
    BpfLogger::init(&mut bpf)?;

    let program: &mut SockOps = bpf.program_mut("flyremote").unwrap().try_into()?;
    let cgroup = std::fs::File::open(opt.cgroup_path)?;
    program.load()?;
    program.attach(cgroup)?;

    info!("Waiting for Ctrl-C...");
    signal::ctrl_c().await?;
    info!("Exiting...");

    Ok(())
}
Before we get into Dockerfile business again, I can try to run it on my local machine:
$ cargo run Compiling hello-axum v0.1.0 (/home/amos/bearcove/hello-axum) Finished dev [unoptimized + debuginfo] target(s) in 3.48s Running `target/debug/hello-axum` 09:34:56 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:106] [FEAT PROBE] BPF program name support: true 09:34:56 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:109] [FEAT PROBE] BTF support: false Error: 0: map error 1: the `bpf_map_freeze` syscall failed with code -1 2: Operation not permitted (os error 1) Location: src/main.rs:35 Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it. Run with RUST_BACKTRACE=full to include source snippets.
Oh, uh.
Seems like you need to be root to run this?
Well... there is such a thing as unprivileged BPF, just need to tune a few sysctls and reboot, but most distributions disable it by default because it has complicated security implications.
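(If you want to check what your distribution does, the knob is the kernel.unprivileged_bpf_disabled sysctl: 0 means unprivileged BPF is allowed, non-zero means it isn't, and many distros ship with it set to 1 or 2 these days.)

$ sysctl kernel.unprivileged_bpf_disabled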
It doesn't really matter to us in our microVM, since we're already running the top-level process as root, so, let's just run it as root here too:
$ cargo build --quiet && sudo ./target/debug/hello-axum 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:106] [FEAT PROBE] BPF program name support: true 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:109] [FEAT PROBE] BTF support: true 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:113] [FEAT PROBE] BTF func support: true 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:116] [FEAT PROBE] BTF global func support: true 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:122] [FEAT PROBE] BTF var and datasec support: true 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:128] [FEAT PROBE] BTF float support: false 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:131] [FEAT PROBE] BTF decl_tag support: false 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:134] [FEAT PROBE] BTF type_tag support: false 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:270] relocating program flyremote function flyremote 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:327] relocating call to callee address 64 (relocation) 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:348] callee is memcpy 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:270] relocating program flyremote function memcpy 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:363] finished relocating program flyremote function memcpy 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:327] relocating call to callee address 64 (relocation) 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:348] callee is memcpy 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:327] relocating call to callee address 64 (relocation) 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:348] callee is memcpy 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:363] finished relocating program flyremote function flyremote 09:37:04 [INFO] hello_axum: [src/main.rs:44] Waiting for Ctrl-C...
The program waits for new sockops events. I have an SSH server running here on 127.0.0.2 port 22, so if I try to connect to it from another tab:
$ ssh 127.0.0.2
(omitted: fingerprint stuff, etc.)
...we see:
09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (tcp_connect_cb 3), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (rwnd_init 2), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (timeout_init 1), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (needs_ecn 6), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (rwnd_init 2), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (timeout_init 1), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (needs_ecn 6), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 2 syn-sent => 1 established 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (active_established_cb 4), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (passive_established_cb 5), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:52 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 3 syn-recv => 1 established
Note that we see both the "client" and the "server" socket here being established, since I'm connecting from localhost.
The client socket goes from 127.0.0.1:59920 to 127.0.0.2:22 (where the SSH server lives), and the server socket goes from 127.0.0.2:22 to 127.0.0.1:59920, the exact opposite. tcp_connect_cb is only for "outgoing" connections, and eventually we get an active_established_cb. For the other direction, we eventually have a passive_established_cb.
So, for our actual program, we'll want to watch for passive_established_cb with local port 22.
And when I disconnect, we get this:
09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 1 established => 4 fin-wait1 09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 1 established => 8 close-wait 09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 4 fin-wait1 => 5 fin-wait2 09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 5 fin-wait2 => 7 close 09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 8 close-wait => 9 last-ack 09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 9 last-ack => 7 close
Closing a TCP socket is more involved than it first appears! You can check the TCP state diagram if it's not burned into your brain by virtue of suffering it for long enough.
This is tremendous progress, and if it runs on our fly.io machine, we'll have finally achieved O(0) syscalls.
Or will we? Won't we need to... query some state, still?
Yes! How do you think logging works, currently?
I don't know, it's all hidden by magic proc macros.
Well, I'll tell you! When we did llvm-objdump on our BPF program, we saw these lines, which correspond to exported symbols:
0000000000000000 g O maps 000000000000001c AYA_LOG_BUF
000000000000001c g O maps 000000000000001c AYA_LOGS
And in our "driver" program we had this line:
BpfLogger::init(&mut bpf)?;
I bet if we dig into what this BpfLogger thing actually does, we'll have our answer:
// in `aya-log/src/lib.rs`

impl BpfLogger {
    /// Starts reading log records created with `aya-log-ebpf` and logs them
    /// with the default logger. See [log::logger].
    pub fn init(bpf: &mut Bpf) -> Result<BpfLogger, Error> {
        BpfLogger::init_with_logger(bpf, DefaultLogger {})
    }

    /// Starts reading log records created with `aya-log-ebpf` and logs them
    /// with the given logger.
    pub fn init_with_logger<T: Log + 'static>(
        bpf: &mut Bpf,
        logger: T,
    ) -> Result<BpfLogger, Error> {
        let logger = Arc::new(logger);
        let mut logs: AsyncPerfEventArray<_> = bpf.map_mut("AYA_LOGS")?.try_into()?;

        for cpu_id in online_cpus().map_err(Error::InvalidOnlineCpu)? {
            let mut buf = logs.open(cpu_id, None)?;

            let log = logger.clone();
            tokio::spawn(async move {
                let mut buffers = (0..10)
                    .map(|_| BytesMut::with_capacity(LOG_BUF_CAPACITY))
                    .collect::<Vec<_>>();

                loop {
                    let events = buf.read_events(&mut buffers).await.unwrap();

                    #[allow(clippy::needless_range_loop)]
                    for i in 0..events.read {
                        let buf = &mut buffers[i];
                        log_buf(buf, &*log).unwrap();
                    }
                }
            });
        }

        Ok(BpfLogger {})
    }
}
AhAH! There it is! It's listening for changes to a "perf event array"! I guess that's one way for BPF programs to communicate with userspace.
Userspace?
Our "regular Linux executable" that instructs the kernel to load our BPF program and what to attach it to etc.
Oh, right.
And we've also learned that aya-log messages can be 8K at most. Interesting.
So... I guess we can just do the same to send messages when a new connection (to port 22) is established and when it's closed?
Let's try it!
Our new program becomes this:
// in `hello-axum/flyremote-bpf/src/main.rs`
#![no_std]
#![no_main]

use aya_bpf::{
    macros::{map, sock_ops},
    maps::PerfEventArray,
    programs::SockOpsContext,
};
use aya_log_ebpf::info;

// This is what we'll send over our "perf event array"
#[repr(C)]
pub struct ConnectionEvent {
    // 1 = connected, 2 = disconnected
    pub action: u32,
}

// We could probably make a Rust enum work here, but I don't feel like fighting
// the verifier too much today.
const ACTION_CONNECTED: u32 = 1;
const ACTION_DISCONNECTED: u32 = 2;

// Just like aya-log does, but this only has events we care about
#[map(name = "EVENTS")]
static mut EVENTS: PerfEventArray<ConnectionEvent> =
    PerfEventArray::<ConnectionEvent>::with_max_entries(1024, 0);

#[sock_ops(name = "flyremote")]
pub fn flyremote(ctx: SockOpsContext) -> u32 {
    match unsafe { try_flyremote(ctx) } {
        Ok(ret) => ret,
        Err(ret) => ret,
    }
}

unsafe fn try_flyremote(ctx: SockOpsContext) -> Result<u32, u32> {
    if ctx.local_port() != 22 {
        // don't care if it's not SSH-server-relevant
        return Ok(0);
    }

    // constants gotten from `bpf.h`
    const OP_PASSIVE_ESTABLISHED_CB: u32 = 5;
    const OP_STATE_CB: u32 = 10;
    const STATE_CLOSE: u32 = 7;

    match ctx.op() {
        OP_PASSIVE_ESTABLISHED_CB => {
            info!(&ctx, "Connection accepted!");
            // subscribe to `state_cb` events
            let _ = ctx.set_cb_flags(1 << 2);

            // notify userspace
            let ev = ConnectionEvent {
                action: ACTION_CONNECTED,
            };
            EVENTS.output(&ctx, &ev, 0);
        }
        OP_STATE_CB => {
            let new_state = ctx.arg(1);
            if new_state == STATE_CLOSE {
                info!(&ctx, "Connection closed!");

                // notify userspace
                let ev = ConnectionEvent {
                    action: ACTION_DISCONNECTED,
                };
                EVENTS.output(&ctx, &ev, 0);
            }
        }
        _ => {
            // ignore
        }
    }

    Ok(0)
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::hint::unreachable_unchecked() }
}
We can try it out without changing our userspace program, thanks to aya-log.
$ (cd flyremote-bpf && cargo +nightly build --verbose --target bpfel-unknown-none -Z build-std=core --release)
(cut)
$ cargo build --quiet && sudo ./target/debug/hello-axum
(cut)
10:01:30 [INFO] hello_axum: [src/main.rs:45] Waiting for Ctrl-C...
10:01:36 [INFO] flyremote_bpf: [src/main.rs:49] Connection accepted!
10:01:37 [INFO] flyremote_bpf: [src/main.rs:63] Connection closed!
Wonderful! Now if we just subscribe to our perf event array...
// in `hello-axum/src/main.rs`
use std::{
    fs::File,
    sync::{
        atomic::{AtomicU64, Ordering},
        Arc,
    },
    time::{Duration, Instant},
};

use aya::{include_bytes_aligned, util::online_cpus, Bpf};
use aya::{maps::perf::AsyncPerfEventArray, programs::SockOps};
use aya_log::BpfLogger;
use bytes::BytesMut;
use tokio::{signal, time::sleep};

// This is what we'll receive over our "perf event array". We'd normally
// have a "common" crate we pull from both the bpf-nostd world and the
// userspace-yesstd world, but for this example we're just copying it wholesale.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ConnectionEvent {
    // 1 = connected, 2 = disconnected
    pub action: u32,
}

const ACTION_CONNECTED: u32 = 1;
const ACTION_DISCONNECTED: u32 = 2;

// Because we used `repr(C)` we can treat it as POD (plain old data)
unsafe impl aya::Pod for ConnectionEvent {}

#[tokio::main]
async fn main() -> color_eyre::Result<()> {
    color_eyre::install()?;

    let mut bpf = Bpf::load(include_bytes_aligned!(
        "../flyremote-bpf/target/bpfel-unknown-none/release/flyremote-bpf"
    ))?;
    BpfLogger::init(&mut bpf)?;

    let num_conns: Arc<AtomicU64> = Default::default();

    let mut perf_array = AsyncPerfEventArray::try_from(bpf.map_mut("EVENTS")?)?;
    for cpu_id in online_cpus()? {
        let mut buf = perf_array.open(cpu_id, None)?;
        let num_conns = num_conns.clone();

        tokio::spawn(async move {
            let mut buffers = (0..10)
                .map(|_| BytesMut::with_capacity(1024))
                .collect::<Vec<_>>();

            loop {
                let events = buf.read_events(&mut buffers).await.unwrap();
                for buf in &mut buffers[..events.read] {
                    let ev = unsafe { (buf.as_ptr() as *const ConnectionEvent).read_unaligned() };
                    match ev.action {
                        ACTION_CONNECTED => {
                            println!("Connection accepted!");
                            num_conns.fetch_add(1, Ordering::SeqCst);
                        }
                        ACTION_DISCONNECTED => {
                            println!("Connection closed!");
                            num_conns.fetch_sub(1, Ordering::SeqCst);
                        }
                        unknown => {
                            println!("Unknown action: {}", unknown);
                        }
                    }
                }
            }
        });
    }

    tokio::spawn(async move {
        let mut last_activity = Instant::now();
        loop {
            if num_conns.load(Ordering::SeqCst) > 0 {
                last_activity = Instant::now();
            } else {
                let idle_time = last_activity.elapsed();
                println!("Idle for {idle_time:?}");
                if idle_time > Duration::from_secs(60) {
                    println!("Stopping machine. Goodbye!");
                    std::process::exit(0)
                }
            }
            sleep(Duration::from_secs(5)).await;
        }
    });

    let program: &mut SockOps = bpf.program_mut("flyremote").unwrap().try_into()?;
    let cgroup = File::open("/sys/fs/cgroup/unified")?;
    program.load()?;
    program.attach(cgroup)?;

    println!("Waiting for Ctrl-C...");
    signal::ctrl_c().await?;
    println!("Exiting...");

    Ok(())
}
It does the thing!
$ cargo build --quiet && sudo ./target/debug/hello-axum Idle for 19.527µs Waiting for Ctrl-C... (in another terminal: ssh 127.0.0.2) Connection accepted! (in another terminal: Ctrl-D to close out of SSH) Connection closed! Idle for 5.001708865s Idle for 10.003602174s Idle for 15.004068679s Idle for 20.005524839s Idle for 25.006052848s Idle for 30.007529878s Idle for 35.008838041s Idle for 40.010259957s Idle for 45.011105232s Idle for 50.012581951s Idle for 55.013017848s Idle for 60.01454433s Stopping machine. Goodbye!
Impressive. Very nice. But how the heck is that gonna run "in the cloud"?
Well Bear, I've been doing all of this "in the cloud" already, as I've said before. I've tested that code from my actual remote dev environment on fly.io. Because the kernel they provide is BPF-enabled. That's how I know it'll work.
"All we need to do (TM)", is to re-add starting the SSH server before we do any of this, and have it listen on... let's say port 2222 this time, on IPv4.
Wait, why are we changing ports?
Well because we need to listen on 0.0.0.0 this time. Remember, fly-proxy is the one exposing "edge port 22" to some internal port inside the VM. It's actually connecting through the eth0 interface:
$ ip addr show dev eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1420 qdisc pfifo_fast state UP group default qlen 1000
    link/ether de:ad:f9:57:5a:f4 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.210/29 brd 172.19.0.215 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.19.0.211/29 brd 172.19.0.215 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 2604:1380:71:1403:0:ae09:fd49:1/127 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fdaa:0:6964:a7b:5b66:ae09:fd49:2/112 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::dcad:f9ff:fe57:5af4/64 scope link
       valid_lft forever preferred_lft forever
...but let's not rely on that. We really do want OpenSSH to listen on 0.0.0.0 ("all interfaces") now, and port 22 is already taken by hallpass for a given interface, so the bind would fail.
So let's have it listen on port 2222 instead (left as an exercise to the reader), and make sure to adjust our BPF program so it monitors connections to port 2222 instead of 22 as well (also left as an exercise).
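(If you want a nudge for the first exercise, one way to do it, assuming your image configures OpenSSH through /etc/ssh/sshd_config at build time and that the stock config still has Port and ListenAddress commented out, which is the Ubuntu default, is to append the settings during the Docker build:)

# somewhere in `hello-axum/Dockerfile`, after openssh-server is installed
RUN set -eux; \
    echo "Port 2222" >> /etc/ssh/sshd_config; \
    echo "ListenAddress 0.0.0.0" >> /etc/ssh/sshd_config

(And for the second exercise, the early return in try_flyremote simply checks ctx.local_port() != 2222 instead of != 22.)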
The Dockerfile, I'll help with. But really it's what we've already done, just in Docker: we only need to mess with the very last part of the "builder" target:
# in `hello-axum/Dockerfile`
# syntax = docker/dockerfile:1.4

################################################################################
FROM ubuntu:20.04 AS builder

# (omitted: install base utils, install rustup, add rustup to path)

# Build some code!
WORKDIR /app
COPY . .
RUN --mount=type=cache,target=/app/target \
    --mount=type=cache,target=/root/.cargo/registry \
    --mount=type=cache,target=/root/.cargo/git \
    --mount=type=cache,target=/root/.rustup \
    set -eux; \
    rustup install nightly; \
    rustup component add rust-src --toolchain nightly; \
    cargo +nightly install bpf-linker; \
    (cd flyremote-bpf && cargo +nightly build --verbose --target bpfel-unknown-none -Z build-std=core --release); \
    cargo +nightly build --release; \
    objcopy --compress-debug-sections target/release/hello-axum ./hello-axum

# (omitted: other targets)
Couple notes here: we need nightly to build the BPF program anyway, so I've chosen to use it to build the main program too, just so we don't have to install two different toolchains. We need the rust-src component so we add that. This is definitely the wrong place to install bpf-linker (we want to do that before COPY . .), but you can figure that one out.
By now you should know how to forcefully remove the old machine and run a new one, so I won't show that part - instead I'll show some logs that provide IRREFUTABLE PROOF that it's working as intended:
$ fly logs (cut) 2022-06-20T10:32:59Z app[73d8d463ce7589] cdg [info]Idle for 55.013385687s 2022-06-20T10:33:04Z app[73d8d463ce7589] cdg [info]Idle for 60.014628076s 2022-06-20T10:33:04Z app[73d8d463ce7589] cdg [info]Stopping machine. Goodbye! 2022-06-20T10:35:09Z proxy[73d8d463ce7589] cdg [info]Machine started in 378.712588ms 2022-06-20T10:35:09Z app[73d8d463ce7589] cdg [info] * Starting OpenBSD Secure Shell server sshd 2022-06-20T10:35:09Z app[73d8d463ce7589] cdg [info] ...done. 2022-06-20T10:35:09Z app[73d8d463ce7589] cdg [info]Idle for 351ns 2022-06-20T10:35:09Z app[73d8d463ce7589] cdg [info]Waiting for Ctrl-C... 2022-06-20T10:35:09Z proxy[73d8d463ce7589] cdg [info]Machine became reachable in 162.022275ms 2022-06-20T10:35:09Z app[73d8d463ce7589] cdg [info]Connection accepted! 2022-06-20T10:35:24Z app[73d8d463ce7589] cdg [info]Connection closed! 2022-06-20T10:35:29Z app[73d8d463ce7589] cdg [info]Idle for 5.000919043s 2022-06-20T10:35:33Z app[73d8d463ce7589] cdg [info]Connection accepted! 2022-06-20T10:35:37Z app[73d8d463ce7589] cdg [info]Connection closed! 2022-06-20T10:35:39Z app[73d8d463ce7589] cdg [info]Idle for 5.001182727s 2022-06-20T10:35:44Z app[73d8d463ce7589] cdg [info]Idle for 10.00136033s 2022-06-20T10:35:49Z app[73d8d463ce7589] cdg [info]Idle for 15.002518752s 2022-06-20T10:35:54Z app[73d8d463ce7589] cdg [info]Idle for 20.003763466s 2022-06-20T10:35:59Z app[73d8d463ce7589] cdg [info]Idle for 25.00504594s 2022-06-20T10:36:04Z app[73d8d463ce7589] cdg [info]Idle for 30.006257662s 2022-06-20T10:36:09Z app[73d8d463ce7589] cdg [info]Idle for 35.007527924s 2022-06-20T10:36:14Z app[73d8d463ce7589] cdg [info]Idle for 40.008764092s 2022-06-20T10:36:19Z app[73d8d463ce7589] cdg [info]Idle for 45.010007093s 2022-06-20T10:36:24Z app[73d8d463ce7589] cdg [info]Idle for 50.011195821s 2022-06-20T10:36:29Z app[73d8d463ce7589] cdg [info]Idle for 55.011391438s 2022-06-20T10:36:34Z app[73d8d463ce7589] cdg [info]Idle for 60.01265106s 2022-06-20T10:36:34Z app[73d8d463ce7589] cdg [info]Stopping machine. Goodbye!
Ah... bliss.
Aren't you forgetting something?
Oh, right! How many syscalls are we actually doing? Since it's the Sole Measure of goodness in this wretched world?
To test this, I'm going to connect from VSCode, and run something that constantly outputs text, like... ok maybe not yes, but watch whoami.
And then from fly ssh console, we just run strace...
$ strace -ff -p $(pidof hello-axum) strace: Process 583 attached with 9 threads [pid 596] futex(0x7fb0baa0d618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 595] futex(0x7fb0bac11618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 594] epoll_wait(3, <unfinished ...> [pid 592] futex(0x7fb0bb21d618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 591] futex(0x7fb0bb41e618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 590] futex(0x7fb0bb622618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 589] futex(0x7fb0bb823618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 583] futex(0x7fb0bb825498, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 593] futex(0x7fb0bb019618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 594] <... epoll_wait resumed>[], 1024, 1718) = 0 [pid 594] epoll_wait(3, [], 1024, 30) = 0 [pid 594] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 594] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 1824) = 1 [pid 594] epoll_wait(3, [], 1024, 1824) = 0 [pid 594] epoll_wait(3, [], 1024, 3134) = 0 [pid 594] epoll_wait(3, [], 1024, 37) = 0 [pid 594] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 594] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 919) = 1 [pid 594] epoll_wait(3, [], 1024, 919) = 0 [pid 594] epoll_wait(3,
Huh. It's vewwy vewwy quiet. Let's try to disconnect?
[], 1024, 2775) = 0 [pid 594] epoll_wait(3, [{EPOLLIN, {u32=2, u64=2}}, {EPOLLIN, {u32=10, u64=10}}], 1024, 2173) = 2 [pid 594] futex(0x7fb0bb823618, FUTEX_WAKE_PRIVATE, 1) = 1 [pid 589] <... futex resumed>) = 0 [pid 594] write(1, "Connection closed!\n", 19 <unfinished ...> [pid 589] futex(0x7fb0bb019618, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> [pid 594] <... write resumed>) = 19 [pid 589] <... futex resumed>) = 1 [pid 593] <... futex resumed>) = 0 [pid 594] futex(0x7fb0bae15618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 593] epoll_wait(3, <unfinished ...> [pid 589] futex(0x7fb0bb823618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 593] <... epoll_wait resumed>[], 1024, 1370) = 0 [pid 593] epoll_wait(3, [], 1024, 49) = 0 [pid 593] write(1, "Idle for 5.001369468s\n", 22) = 22 [pid 593] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 593] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 1869) = 1 [pid 593] epoll_wait(3,
And now reconnect?
[], 1024, 1869) = 0 [pid 593] epoll_wait(3, [], 1024, 3070) = 0 [pid 593] epoll_wait(3, [], 1024, 57) = 0 [pid 593] write(1, "Idle for 10.002899968s\n", 23) = 23 [pid 593] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 593] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 964) = 1 [pid 593] epoll_wait(3, [], 1024, 964) = 0 [pid 593] epoll_wait(3, [{EPOLLIN, {u32=2, u64=2}}, {EPOLLIN, {u32=10, u64=10}}], 1024, 4031) = 2 [pid 593] futex(0x7fb0bb823618, FUTEX_WAKE_PRIVATE, 1) = 1 [pid 593] write(1, "Connection accepted!\n", 21) = 21 [pid 589] <... futex resumed>) = 0 [pid 593] epoll_wait(3, <unfinished ...> [pid 589] futex(0x7fb0bb823618, FUTEX_WAIT_PRIVATE, 1, NULL
Wonderful.
Say Amos? Isn't that completely overkill?
Oh, almost definitely. I don't think anybody needs to optimize their remote dev environment that much: we definitely could've lived with the additional overhead of manually copying bufferfuls from kernelspace to userspace and back.
But isn't it cool that we can?
Sure, sure. But I mean... using BPF for this?
Oh yeah that's hella overkill too. And again I'm just flexing at this point (teaching, I mean teaching).
Yeah I was gonna say... there's probably other ways to know what connections OpenSSH's process has anyway, right? Doesn't the kernel keep track of that?
It 100% absolutely does, and so there's a much simpler solution for all that, one we could've done with a bash script.
Ooooh are we writing a bash script?
Ohhhhh no. No no no. Not today.
Simply polling procfs
See, the kernel is kind enough to expose all kinds of information simply through procfs. So if you're willing to read a couple files, you can get this, for example:
root@73d8d463ce7589:/# cat "/proc/$(pidof -s sshd)/net/tcp"
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:08AE 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 10244 1 000000007d0c6713 100 0 0 10 0
   1: 0100007F:8019 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 7194 1 000000000b0ca858 100 0 0 10 0
   2: 1A0413AC:08AE 47851C93:E44A 01 00000000:00000000 02:000A7F5E 00000000     0        0 13366 2 00000000255b322e 21 4 30 10 50
   3: 0100007F:8019 0100007F:E76C 01 00000000:00000000 00:00000000 00000000  1000        0 7682 1 000000004d3b51d2 21 4 2 10 -1
   4: 0100007F:8019 0100007F:E76A 01 00000000:00000000 00:00000000 00000000  1000        0 7680 1 00000000eb58348a 20 4 30 10 -1
   5: 0100007F:E76C 0100007F:8019 01 00000000:00000000 00:00000000 00000000  1000        0 13405 1 00000000d63dc6ce 21 4 18 10 -1
   6: 0100007F:E76A 0100007F:8019 01 00000000:00000000 00:00000000 00000000  1000        0 13403 1 0000000057ff1c3b 20 4 10 10 -1
Mhhh, what are we looking at?
Well it says so! It's a list of TCP connections: there's the local address, the remote address, and some more information.
Okay, but those addresses look... I mean... they look a little like MAC addresses?
Oh no no, they're just hex. So for example, we have sshd listening on port 2222 now, and in hex, that's?
Oh, 0x8AE!
Exactly! And so that one is our active connection:
root@73d8d463ce7589:/# grep "08AE" "/proc/$(pidof -s sshd)/net/tcp"
   0: 00000000:08AE 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 10244 1 000000007d0c6713 100 0 0 10 0
   2: 1A0413AC:08AE 47851C93:E44A 01 00000000:00000000 02:000A50A1 00000000     0        0 13366 2 00000000255b322e 21 4 30 10 50
Wait, no, there's two of them.
Ah, maybe the first one is the listening socket?
Probably! That seems right since it has all-zero values for a bunch of fields.
In fact, let's try disconnecting...
root@73d8d463ce7589:/# grep "08AE" "/proc/$(pidof -s sshd)/net/tcp"
   0: 00000000:08AE 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 10244 1 000000007d0c6713 100 0 0 10 0
Yup! Sounds about right.
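By the way, if you'd rather decode those hex columns than squint at them: the address is a raw 32-bit value, so on a little-endian box the displayed hex is byte-reversed, while the port is plain hex. A tiny sketch, purely illustrative since the procfs crate is about to do all of this for us:

// decodes a `local_address` column like "0100007F:08AE" into (127.0.0.1, 2222)
fn decode(column: &str) -> (std::net::Ipv4Addr, u16) {
    let (ip_hex, port_hex) = column.split_once(':').unwrap();
    let ip = u32::from_str_radix(ip_hex, 16).unwrap();
    let port = u16::from_str_radix(port_hex, 16).unwrap();
    // `to_le_bytes` undoes the byte-swapped display we get on x86_64
    (std::net::Ipv4Addr::from(ip.to_le_bytes()), port)
}

fn main() {
    // prints: (127.0.0.1, 2222)
    println!("{:?}", decode("0100007F:08AE"));
}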
Okay, I see what you mean about the bash script now. Because it's just a file, we could like... grep for 08AE, exclude the 00000000 thingy, and then have a counter that keeps track of how long it's been without connections... and then exit if it's been a while...
Yes, and since bash is notoriously awful at both strings AND numbers (do not @ me), and time, and conditionals, everything in fact, and because there's a neat procfs crate, and because this is still my blog and I still make the rules here, I'm just gonna write Rust.
Let's gooooo:
$ cargo add procfs
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding procfs v0.12.0 to dependencies.
             Features:
             + chrono
             + flate2
             - backtrace
Our Cargo.toml becomes this, very lean, no need for tokio anymore:
# in `hello-axum/Cargo.toml`

[package]
name = "hello-axum"
version = "0.1.0"
edition = "2021"

[dependencies]
color-eyre = "0.6.1"
procfs = "0.12.0"
// in `hello-axum/src/main.rs`
use std::{
    process::{Command, Stdio},
    thread::sleep,
    time::{Duration, Instant},
};

use procfs::net::TcpState;

fn main() -> color_eyre::Result<()> {
    color_eyre::install()?;

    let status = Command::new("service")
        .arg("ssh")
        .arg("start")
        .stdin(Stdio::null())
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit())
        .status()?;
    assert!(status.success());

    let mut last_activity = Instant::now();

    loop {
        if count_conns()? > 0 {
            last_activity = Instant::now();
        } else {
            let idle_time = last_activity.elapsed();
            println!("Idle for {idle_time:?}");
            if idle_time > Duration::from_secs(60) {
                println!("Stopping machine. Goodbye!");
                std::process::exit(0)
            }
        }
        sleep(Duration::from_secs(5));
    }
}

fn count_conns() -> color_eyre::Result<usize> {
    Ok(procfs::net::tcp()?
        .into_iter()
        // don't count listen, only established
        .filter(|entry| matches!(entry.state, TcpState::Established))
        .filter(|entry| matches!(entry.local_address.port(), 2222))
        .count())
}
The build section of our Dockerfile is simple again, too:
# Build some code!
WORKDIR /app
COPY . .
RUN --mount=type=cache,target=/app/target \
    --mount=type=cache,target=/root/.cargo/registry \
    --mount=type=cache,target=/root/.cargo/git \
    --mount=type=cache,target=/root/.rustup \
    set -eux; \
    rustup install stable; \
    cargo build --release; \
    objcopy --compress-debug-sections target/release/hello-axum ./hello-axum
And just as before... it works:
$ fly logs 2022-06-20T11:06:04Z proxy[06e82219b74987] cdg [info]Machine started in 492.723163ms 2022-06-20T11:06:04Z app[06e82219b74987] cdg [info] * Starting OpenBSD Secure Shell server sshd 2022-06-20T11:06:04Z app[06e82219b74987] cdg [info] ...done. 2022-06-20T11:06:04Z app[06e82219b74987] cdg [info]Idle for 125.596µs 2022-06-20T11:06:04Z proxy[06e82219b74987] cdg [info]Machine became reachable in 81.028264ms 2022-06-20T11:06:14Z app[06e82219b74987] cdg [info]Idle for 5.000259697s 2022-06-20T11:06:19Z app[06e82219b74987] cdg [info]Idle for 10.001721387s 2022-06-20T11:06:24Z app[06e82219b74987] cdg [info]Idle for 15.002035967s 2022-06-20T11:06:29Z app[06e82219b74987] cdg [info]Idle for 20.002387236s 2022-06-20T11:06:34Z app[06e82219b74987] cdg [info]Idle for 25.002711273s 2022-06-20T11:06:39Z app[06e82219b74987] cdg [info]Idle for 30.003033687s 2022-06-20T11:06:44Z app[06e82219b74987] cdg [info]Idle for 35.003318902s 2022-06-20T11:06:49Z app[06e82219b74987] cdg [info]Idle for 40.003605699s 2022-06-20T11:06:54Z app[06e82219b74987] cdg [info]Idle for 45.003890203s 2022-06-20T11:06:59Z app[06e82219b74987] cdg [info]Idle for 50.004175949s 2022-06-20T11:07:04Z app[06e82219b74987] cdg [info]Idle for 55.004478496s 2022-06-20T11:07:09Z app[06e82219b74987] cdg [info]Idle for 60.004781203s 2022-06-20T11:07:09Z app[06e82219b74987] cdg [info]Stopping machine. Goodbye!
This is boring. Everything just works.
Right?? And yet I'm thrilled about it. I'm thrilled to know that, with Rust, I can do any of:
- blocking I/O with threads
- non-blocking I/O with tokio
- io-uring with tokio-uring
- eBPF with aya
- just read procfs
And they all, boringly enough, work just fine. And I feel good about leaving them in production and never touching them again.
To me, that is the dream.