Remote development with Rust on fly.io
Disclaimer:
At the time of this writing, I benefit from the fly.io "Employee Free Tier". I don't pay for side projects hosted there "within reasonable limits". The project discussed here qualifies for that.
Why you might want a remote dev environment
Fearmongering aside — and Cthulhu knows there's been a bunch, since this unfortunate tweet — there's a bunch of reasons to want a remote dev environment.
For example, maybe the only computer you have available simply isn't performant enough to perform the tasks you want to perform. Such as: building a lot a lot of Rust. Like the compiler, or rust-analyzer, or maybe you're quirky like me and you maintain like 7 big proprietary codebases solo just so your blog slash video platform is juuuuust the way you like it.
So instead of buying a fuck-you CPU (like a Threadripper, or something more consumery like the latest Ryzens), maybe you rent a big cloud machine that you can turn on and off as needed. Just for the big stuff.
If you need a bunch of CPU to work directly on the Rust project itself btw, the Rust foundation has a program for that.
Disclaimer:
At the time of this writing, I am not affiliated with the Rust foundation in any way.
Another good reason is that you invested in another incompatible brand of fuck-you CPUs, like an M1, or an M2 (where will it end?), which is arm64. But you need to deploy for Linux x86_64, for example.
In that case, you can either pretend that macOS arm64 and Linux x86_64 are "close enough" by virtue of being unixy, and maintain your codebase for essentially two targets, leaving nasty surprises for later, OR you can work in a VM. Or an x86_64 Docker container, which, on macOS, runs in a VM.
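(If you go the Docker-on-macOS route, forcing the x86_64 flavor of an image is just a --platform flag away, with emulation doing the heavy lifting. Something like this should report x86_64 even on Apple Silicon:)

$ docker run --platform linux/amd64 --rm -it ubuntu:20.04 uname -m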
And that's fine, but it'll make most laptops take off. I don't know what the Apple Silicon situation is like (somebody send me one!) but I'm assuming even with it being a revolution, emulating x86 on it is still not as fast as an actual x86 processor. I might be wrong. I'm sure I'll find out soon enough.
Another good reason is that you're maybe running a team of developers, it's not just you. And you want to be able to 1) hire folks from all around the world, 2) hire folks who don't already have a chonky computer at home, 3) not have to ship them a chonky computer (which you then have to get back or gift to them), 4) onboard them quickly, giving them a consistent dev environment where everything works the first time.
Me, I don't have a team. Well... I kinda do: there's an army of folks who make up for the absence of an editor by reporting spelling errors, inaccuracies etc. every time I release an article. But all the code side is just me. And I have several computers that are more than up to the task of compiling a metric ton of Rust regularly, as I tend to do. None of them run Linux on the desktop, although I exclusively ship stuff for Linux (both for my day job and for my side gig), but it's nothing VMs can't fix, and I have a bunch of those.
My reason is much simpler, and probably silly: as I'm writing this, it's 37 degrees Celsius outside (99 Fahrenheit). Not only is my big desktop tower not the most environment-friendly machine I have available, it also puts out quite a bit of heat. And it adds up.
Also I like to be able to write from different places, so that means having two computers up, and using one (a laptop) to remote into the other (a desktop), which makes the energy+heat problem even worse and also now we get into "how do you manage to make the desktop stay awake while you're SSH'd into the Linux VM that runs on it, but make it go to sleep very quickly when it's not".
You know the second you hit publish someone will tell you that it's not that hard and if you just do (two pages ensue)
Yes yes I know. There's still the heat and electricity bill problem.
So anyway, since fly.io pays me during weekdays to make their platform better (more precisely fly-proxy, the thing that all TCP traffic goes through right now, except for IPv6 private networking, and which I've recently written about on their blog), I have an employee discount kinda deal.
And the deal is I straight up don't pay for compute there. So here's the big old disclaimer: I DON'T HAVE TO PAY FOR ANY OF THIS. ALSO THEY'RE PAYING ME, but for other stuff. This is week-end amos. The people paying me to write this are my patrons - so I'm gonna give my honest opinion there, as a non-paying customer of fly.io.
Are we good with the disclaimer part? Everybody clear?
So, you're a shill, temporarily cosplaying as "not a shill", and you found a way to sneak Rust in there because you can't help yourself.
I mean, sure... but also I just think it's pretty neat? And I get to explain a bunch of stuff about how it works, and why it works particularly well for me (even if I did have to pay for it).
What the heck is fly.io for, even
This is the part that could be misconstrued as marketing, but really I just want y'all to be clear on what we're working with. It took me a hot minute to really understand what fly was all about, even after going through the docs several times. After hacking on it from the inside, and several side-projects later (such as my video thing), I think I get it.
So essentially what fly lets you do is push your code there, and then boom it runs in the cloud.
Ah, so like Heroku.
Kinda sorta but also no. Heroku has this whole buildpacks thing, and I guess fly supports it too in some way, but I'm not interested in that part at all so I just don't know enough to answer that question.
How I personally see it is that I build a Docker image with anything (x86_64 only right now), push it to fly (so they have their own image registry), and boom it runs in the cloud.
Ah, so like Google Cloud Run or whatever the AWS equivalent is called.
Again, kinda sorta but also no, because it doesn't actually run in Docker. It runs in a Firecracker microVM. Which is a real VM, so you don't have the usual limitations of containers.
Such as?
Let's circle back to that later.
First let me show you how to deploy a thing there. Our thing will be Rust, because my blog my rules, as usual.
So, simplest HTTP server I can think of:
$ cargo new hello-axum
     Created binary (application) `hello-axum` package
$ cd hello-axum
$ cargo add tokio --features full
$ cargo add axum
// in `hello-axum/src/main.rs`

use axum::{response::IntoResponse, routing::get, Router, Server};

#[tokio::main]
async fn main() {
    let app = Router::new().route("/", get(index));
    let addr = "[::]:8080".parse().unwrap();
    println!("Listening on http://{addr}");
    Server::bind(&addr)
        .serve(app.into_make_service())
        .await
        .unwrap();
}

async fn index() -> impl IntoResponse {
    "hello from axum\n"
}
Does it work?
$ cargo run
   Compiling cfg-if v1.0.0
   Compiling pin-project-lite v0.2.9
   Compiling bytes v1.1.0
   Compiling itoa v1.0.2
   Compiling once_cell v1.12.0
   Compiling smallvec v1.8.0
   Compiling scopeguard v1.1.0
   Compiling fnv v1.0.7
(cut)
    Finished dev [unoptimized + debuginfo] target(s) in 2.55s
     Running `target/debug/hello-axum`
Listening on http://[::]:8080
Then, from another shell:
$ curl -i 0:8080
HTTP/1.1 200 OK
content-type: text/plain; charset=utf-8
content-length: 23
date: Sat, 18 Jun 2022 15:38:47 GMT

hello from axum
Okay, time to build it as a Docker image:
# in `hello-axum/Dockerfile`
# syntax = docker/dockerfile:1.4

FROM rust:1.61.0-slim-bullseye AS builder

WORKDIR /app
COPY . .
RUN --mount=type=cache,target=/app/target \
    --mount=type=cache,target=/usr/local/cargo/registry \
    --mount=type=cache,target=/usr/local/cargo/git \
    --mount=type=cache,target=/usr/local/rustup \
    set -eux; \
    rustup install stable; \
    cargo build --release; \
    objcopy --compress-debug-sections target/release/hello-axum ./hello-axum

################################################################################

FROM debian:11.3-slim

RUN set -eux; \
    export DEBIAN_FRONTEND=noninteractive; \
    apt update; \
    apt install --yes --no-install-recommends bind9-dnsutils iputils-ping iproute2 curl ca-certificates htop; \
    apt clean autoclean; \
    apt autoremove --yes; \
    rm -rf /var/lib/{apt,dpkg,cache,log}/; \
    echo "Installed base utils!"

WORKDIR app
COPY --from=builder /app/hello-axum ./hello-axum
CMD ["./hello-axum"]
Also let's exclude /target from the Docker context, we don't need it there:
# in `hello-axum/.dockerignore`
/target
Also make sure you have docker buildkit enabled, it's what you want 99% of the time nowadays and it supports all the nice stuff.
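(If your Docker is old enough that BuildKit isn't the default, the quick fix is the DOCKER_BUILDKIT environment variable:)

$ DOCKER_BUILDKIT=1 docker build -t hello-axum .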
$ docker build -t hello-axum .
[+] Building 2.6s (6/13)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 990B                                       0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 79B                                           0.0s
 => [internal] load metadata for docker.io/library/debian:11.3-slim        1.5s
 => [internal] load metadata for docker.io/library/rust:1.61.0-slim-bullseye
(cut)
 => [stage-1 4/4] COPY --from=builder /app/hello-axum ./hello-axum
 => exporting to image
 => => exporting layers
 => => writing image sha256:a6ae1acc11eb094218c1abb4da319a4e53ee93844d98d94c912698d75e2136e0
 => => naming to docker.io/library/hello-axum
There's a couple of neat tricks in the Dockerfile above: toolchains, dependencies and the target folder are cached, it compresses debug symbols (which aren't there because I forgot to show you that you'd set debug = 1 under [profile.release] in the Cargo.toml, but whatever), and it uses separate stages.
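For reference, that forgotten tweak is just this:

# in `hello-axum/Cargo.toml`
[profile.release]
debug = 1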
Anyway, the resulting image is 152MB for me, not great, not terrible. It could probably go distroless or be based on something like Alpine, but that comes with other tradeoffs, and this isn't a Docker tutorial, so let's move on.
Point is, it works:
$ docker run --detach --rm --name hello-axum --publish 8080:8080 hello-axum
78dde4d1e52dfc199e6ebdeda1b65192ba534cef6d5ea8ba169106e348eb4749
$ curl 0:8080
hello from axum
$ docker kill hello-axum
hello-axum
That is, if you remembered to kill the other process with Ctrl-C. Otherwise it won't be able to bind on port 8080.
Time to deploy it to fly. I'll spare you the pre-onboarding, you need to install flyctl, log in with your fly account, blah blah let's create an app:
$ fly apps create
? App Name: hello-axum
? Select Organization: Amos Wenger (personal)
New app created: hello-axum
Save the config it autogenerated for us:
$ fly config save -a hello-axum
Wrote config file fly.toml
Which gives us this:
# in `hello-axum/fly.toml`
# fly.toml file generated for hello-axum on 2022-06-18T16:02:28Z

app = "hello-axum"
kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[env]

[experimental]
  allowed_public_ports = []
  auto_rollback = true

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["app"]
  protocol = "tcp"
  script_checks = []

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"
Those are the defaults, and it's neat to have them written down. I do want to expose stuff on ports 80 and 443, the concurrency limits seem reasonable for a toy app, I do want to redirect port 80 to 443 automatically, and my internal port is already 8080. The only thing missing is which image to push, so let's add a new section:
[build]
  image = "hello-axum"
And we're off:
$ fly deploy --local-only
==> Verifying app config
--> Verified app config
==> Building image
Searching for image 'hello-axum' locally...
image found: sha256:3f93ceb9158f5e123253060d58d607f7c2a7e2f93797b49b4edbbbcc8e1b3840
==> Pushing image to fly
The push refers to repository [registry.fly.io/hello-axum]
02f75279051e: Pushed
4e38e245312b: Pushed
85ade8c6ca76: Pushed
ad6562704f37: Pushed
deployment-1655568332: digest: sha256:1ddfda6a6d8d84d804602653501db1c9720677b6e04e31008d3256c53ec09723 size: 1159
--> Pushing image done
==> Creating release
--> release v2 created
--> You can detach the terminal anytime without stopping the deployment
==> Monitoring deployment

1 desired, 1 placed, 1 healthy, 0 unhealthy [health checks: 1 total, 1 passing]
--> v0 deployed successfully
Because the image was available locally it just pushed it to the fly docker registry (there's also a remote builder feature which I've never used).
And then it created an instance of the app for us... somewhere? Which was eventually allocated on a worker, and... after what feels like an eternity, but was almost definitely under a minute, our app is running.
$ curl https://hello-axum.fly.dev -i
HTTP/2 200
content-type: text/plain; charset=utf-8
content-length: 16
date: Sat, 18 Jun 2022 16:07:39 GMT
server: Fly/09a15cede3 (2022-06-17)
via: 2 fly.io
fly-request-id: 01G5VS3SPBQ4XY4M7VZXTG8KBJ-cdg

hello from axum
And we can see some new headers here! Also it's using http/2, and you can tell I deployed to production yesterday from the server header.
There's a lot of other cool stuff happening there like built-in metrics but these aren't really relevant. There's also a whole web UI that shows that yes we do have an app running, shows some graphs, even the logs live-streaming etc. but it's easier to just talk about the CLI in this format so there, logs:
$ fly logs
2022-06-18T16:05:53Z runner[fdca430e] cdg [info]Starting instance
2022-06-18T16:05:53Z runner[fdca430e] cdg [info]Configuring virtual machine
2022-06-18T16:05:53Z runner[fdca430e] cdg [info]Pulling container image
2022-06-18T16:05:58Z runner[fdca430e] cdg [info]Unpacking image
2022-06-18T16:05:59Z runner[fdca430e] cdg [info]Preparing kernel init
2022-06-18T16:05:59Z runner[fdca430e] cdg [info]Configuring firecracker
2022-06-18T16:06:00Z runner[fdca430e] cdg [info]Starting virtual machine
2022-06-18T16:06:00Z app[fdca430e] cdg [info][    0.026893] PCI: Fatal: No config space access function found
2022-06-18T16:06:00Z app[fdca430e] cdg [info]Starting init (commit: e21acb3)...
2022-06-18T16:06:00Z app[fdca430e] cdg [info]Preparing to run: `./hello-axum` as root
2022-06-18T16:06:00Z app[fdca430e] cdg [info]2022/06/18 16:06:00 listening on [fdaa:0:446c:a7b:ae02:fdca:430e:2]:22 (DNS: [fdaa::3]:53)
2022-06-18T16:06:00Z app[fdca430e] cdg [info]Listening on http://[::]:8080
I don't know what the PCI message means, don't ask me. init is fly's custom init program, it's also all Rust, here's an old snapshot of it, and we can see our app running.
We even know where it's running (cdg = Charles de Gaulle Airport Paris), the closest region to where I live.
There's a bunch of other useful CLI commands:
$ fly status
App
  Name     = hello-axum
  Owner    = personal
  Version  = 0
  Status   = running
  Hostname = hello-axum.fly.dev

Deployment Status
  ID          = 70edc42a-9bac-0b2a-803c-c0cec866929a
  Version     = v0
  Status      = successful
  Description = Deployment completed successfully
  Instances   = 1 desired, 1 placed, 1 healthy, 0 unhealthy

Instances
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS      RESTARTS CREATED
fdca430e app     0       cdg    run     running 1 total, 1 passing 0        6m50s ago
$ fly vm status fdca430e
Instance
  ID            = fdca430e
  Process       =
  Version       = 0
  Region        = cdg
  Desired       = run
  Status        = running
  Health Checks = 1 total, 1 passing
  Restarts      = 0
  Created       = 7m10s ago

Recent Events
TIMESTAMP            TYPE       MESSAGE
2022-06-18T16:05:52Z Received   Task received by client
2022-06-18T16:05:52Z Task Setup Building Task Directory
2022-06-18T16:06:00Z Started    Task started by client

Checks
ID                               SERVICE  STATE   OUTPUT
3df2415693844068640885b45074b954 tcp-8080 passing TCP connect 172.19.2.2:8080: Success

Recent Logs
And so, yeah, that's classic fly!
With fly regions set we can decide where our app should run, with fly scale count we can change how many instances are running, with fly scale vm we can switch VM types (it's very smol right now), for example here's what I have to serve my videos:
$ fly status
App
  Name     = tube
  Owner    = personal
  Version  = 164
  Status   = running
  Hostname = tube.fly.dev

Instances
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS RESTARTS CREATED
c1f4d89e app     164     sjc    run     running               0        2022-06-14T22:02:22Z
b74afb02 app     164     yyz    run     running               0        2022-05-09T21:07:53Z
8b5ca0c7 app     164     gru    run     running               0        2022-05-09T21:07:15Z
0b08b59c app     164     ams    run     running               0        2022-05-09T21:06:30Z
6389589a app     164     cdg    run     running               0        2022-05-09T21:05:42Z
ea94e5ef app     164     nrt    run     running               0        2022-05-09T21:03:21Z
79ecda2b app     164     iad    run     running               1        2022-05-09T21:02:51Z
26ea7a65 app     164     yyz    run     running               0        2022-05-09T21:02:10Z

$ fly scale show
VM Resources for tube
        VM Size: shared-cpu-1x
      VM Memory: 512 MB
          Count: 8
 Max Per Region: Not set
Trying to make them regret their lifetime employee discount thing, are we?
Hey, rules were made to be tested okay.
Oh yeah also there's volumes! Because instances get created and destroyed and some data you don't want to lose so you stick it in volumes:
$ fly volumes list
ID                   STATE   NAME      SIZE REGION ZONE ATTACHED VM CREATED AT
vol_18l524y8j0er7zmp created tubecache 40GB ams    8aba 0b08b59c    1 month ago
vol_18l524y8j5jr7zmp created tubecache 40GB yyz    d33c 26ea7a65    1 month ago
vol_okgj54580lq4y2wz created tubecache 40GB iad    ddf7             1 month ago
vol_x915grnzw8krn70q created tubecache 40GB nrt    0e0f ea94e5ef    1 month ago
vol_ke628r68g3n4wmnp created tubecache 40GB sjc    c0a5 c1f4d89e    1 month ago
vol_02gk9vwnej1v76wm created tubecache 40GB cdg    0e8c 6389589a    1 month ago
vol_8zmjnv8em85vywgx created tubecache 40GB yyz    5e29 b74afb02    1 month ago
vol_ypkl7vz8k5evqg60 created tubecache 40GB iad    f6cb 79ecda2b    1 month ago
vol_0nylzre12814qmkp created tubecache 40GB gru    2824 8b5ca0c7    1 month ago
vol_52en7r1jpl9rk6yx created tubecache 40GB syd    039e             1 month ago
vol_w1q85vgn7jj4zdxe created tubecache 40GB lhr    ad0e             1 month ago
And you said you didn't want to do marketing? Where's the remote dev environment?
I'm getting to it! So back to our hello-axum app, we can SSH into it:
$ fly ssh console
Connecting to top1.nearest.of.hello-axum.internal... complete
# whoami
root
#
And this is where things get interesting, because this is where you start to notice this isn't actually a container running in docker.
Let's hop into bash to run a few more commands:
root@fdca430e:/# cat /proc/cpuinfo | grep -i mhz
cpu MHz : 2799.998
So we got a single shared core, that's the default.
root@fdca430e:/# uname -a
Linux fdca430e 5.12.2 #1 SMP Thu Jun 2 14:26:49 UTC 2022 x86_64 GNU/Linux
Linux 5.12.2, that's from.. April 2021, still relatively recent. Recent enough that we could play with io-uring if we wanted.
No no no no stay on task.
But yeah our Docker image doesn't provide a kernel, only a userland. The kernel is whatever fly gives us. Still, we have a kernel. Which will come in handy later.
For now, it's time to review why a "classic fly.io app" doesn't really work as a remote dev environment.
For starters, we can't scale to zero.
$ fly scale count 0
Count changed to 0
(A full minute elapses)
$ fly status
App
  Name     = hello-axum
  Owner    = personal
  Version  = 1
  Status   = dead
  Hostname = hello-axum.fly.dev

Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
Okay I guess you can... but as you can see the app's status is now "dead". And fly-proxy suddenly doesn't know the app exists anymore.
So if we try to curl it:
$ curl -v https://hello-axum.fly.dev
*   Trying 2a09:8280:1::1:4857:443...
* TCP_NODELAY set
* Connected to hello-axum.fly.dev (2a09:8280:1::1:4857) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
We'll get stuck there, and eventually:
* OpenSSL SSL_connect: Connection reset by peer in connection to hello-axum.fly.dev:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to hello-axum.fly.dev:443
That's because fly-proxy is waiting around for another instance to be up: which may happen if you have a release strategy where your app temporarily has zero instances between deploys. (You probably don't want to do that, have at least one instance up to avoid downtime).
We can start it back up with fly scale, but it's... not fast.
$ time bash -c 'fly scale count 1; while true; do curl https://hello-axum.fly.dev --max-time 1 && exit 0 || echo "still starting..."; done'
Count changed to 1
curl: (28) Operation timed out after 1000 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1001 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1000 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1001 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1001 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1000 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1000 milliseconds with 0 out of 0 bytes received
still starting...
curl: (28) Operation timed out after 1001 milliseconds with 0 out of 0 bytes received
still starting...
hello from axum
bash -c   0.14s user 0.07s system 2% cpu 8.421 total
That was a particularly unlucky run. I've had it start back up in ~3 seconds while I figured out the right bash incantation.
But still, it makes perfect sense given what's actually happening:
- An API call is made to fly.io
- Which creates a nomad job
- Which eventually gets allocated somewhere by nomad
- Which informs fly.io that it's up
- Also some consul services are created
- Eventually fly-proxy knows about the services
- And because services are how it knows that an app even exists right now, it knows about the app again, and can start routing traffic there.
Wait wait wait, are you supposed to share that many internal details?
Oh don't worry, they did it for me.
So, is that suitable for a remote dev environment?
The SSH server problem
fly.io provides an SSH server, but it's not really good enough. Let's look at why.
First off, fly ssh console handles all the dirty details for you. If we want to use vanilla ssh we have to do a bunch more stuff. We could use fly proxy to map a local port to the remote instance's SSH server port.
$ fly proxy 2200:22 hello-axum.internal
Proxying local port 2200 to remote [hello-axum.internal]:22
(Here we only have one instance. If we had multiple I'd need to use d76c732a.vm.hello-axum.internal.)
Then issue an SSH keypair:
$ fly ssh issue
? Select organization: Amos Wenger (personal)
? Email address for user to issue cert: [redacted]

!!!! WARNING: We're now prompting you to save an SSH private key and certificate       !!!!
!!!! (the private key in "id_whatever" and the certificate in "id_whatever-cert.pub"). !!!!
!!!! These SSH credentials are time-limited and handling them in files is clunky;      !!!!
!!!! consider running an SSH agent and running this command with --agent. Things       !!!!
!!!! should just sort of work like magic if you do.                                    !!!!
? Path to store private key: /tmp/id_rsa
Wrote 24-hour SSH credential to /tmp/id_rsa, /tmp/id_rsa-cert.pub
(The note about --agent is wrong now: it worked for me in the past, just not today, and I couldn't debug it.)
And then we can connect:
$ ssh -i /tmp/id_rsa localhost -p 2200 whoami
(cut: fingerprint stuff)
root
But we can't, for example, proxy some ports over it:
$ ssh -i /tmp/id_rsa localhost -p 2200 -L 8080:localhost:8080
#
(Then, from another terminal)
$ curl localhost:8080
curl: (56) Recv failure: Connection reset by peer
I'm not sure why! ssh -vvv isn't super helpful there. But when I tried connecting from VSCode, by shoving this in my ~/.ssh/config:
Host hello-axum
  HostName localhost
  Port 2200
  IdentityFile /tmp/id_rsa
It was unhappy too:
[19:27:01.785] Remote server is listening on 43703
[19:27:01.785] Parsed server configuration: {"serverConfiguration":{"remoteListeningOn":{"port":43703},"osReleaseId":"debian","arch":"x86_64","webUiAccessToken":"","sshAuthSock":"","display":"","tmpDir":"/tmp","platform":"linux","connectionToken":"1a11a111-1111-111a-aaa1-a11a11111111"},"downloadTime":3407,"installTime":1447,"serverStartTime":99,"installUnpackCode":"success"}
[19:27:01.786] Persisting server connection details to /Users/amos/Library/Application Support/Code/User/globalStorage/ms-vscode-remote.remote-ssh/vscode-ssh-host-9a297f3d-30d9c6cd9483b2cc586687151bcbcd635f373630-0.82.1/data.json
[19:27:01.788] Starting forwarding server. localPort 54022 -> socksPort 54016 -> remotePort 43703
[19:27:01.788] Forwarding server listening on 54022
[19:27:01.788] Waiting for ssh tunnel to be ready
[19:27:01.789] Tunneled 43703 to local port 54022
[19:27:01.789] Resolved "ssh-remote+hello-axum" to "127.0.0.1:54022"
[19:27:01.790] [Forwarding server 54022] Got connection 0
[19:27:01.796] ------
[19:27:01.807] [Forwarding server 54022] Got connection 1
[19:27:01.809] Failed to set up socket for dynamic port forward to remote port 43703: connect ECONNREFUSED 127.0.0.1:54016. Is the remote port correct?
[19:27:01.809] > local-server-1> ssh child died, shutting down
[19:27:01.809] Failed to set up socket for dynamic port forward to remote port 43703: Socket closed. Is the remote port correct?
[19:27:01.812] Local server exit: 0
So, we're going to need a better SSH server. And also, personally, I don't want to have to run fly proxy (the flyctl command, not the TCP/HTTP proxy running in front of fly apps) every time I want to connect to my remote dev environment.
Oh and the SSH keys expire after 24 hours. And you can't configure the SSH server, since it's built-in (unless I missed something).
So combine that with slow start/stop times and things don't look too good.
(Especially since it's not clear how we'd start/stop individual instances. Playing with fly regions and fly scale to achieve that sounds dangerous!)
Oh no! The all is lost moment!
Enter fly.io machines
The best way to describe fly.io machines is just "firecracker microVMs as a service", with no Nomad/Consul in-between.
We'll need a new fly.io app for that — you can't just add machines to a regular app for now (or ever? I'm not the PM here).
$ fly apps create
? App Name: axum-machine
? Select Organization: Amos Wenger (personal)
New app created: axum-machine
Because I don't like specifying -a / --app every time, I'll just edit fly.toml and change the app = line to read "axum-machine" instead of "hello-axum". The rest of the file doesn't matter for machines.
And then we can run the same Docker image, but as a fly machine:
$ fly machines run --port 80:8080/tcp:http --port 443:8080/tcp:http:tls --region cdg --size shared-cpu-1x hello-axum
Searching for image 'hello-axum' locally...
image found: sha256:3f93ceb9158f5e123253060d58d607f7c2a7e2f93797b49b4edbbbcc8e1b3840
==> Pushing image to fly
The push refers to repository [registry.fly.io/axum-machine]
02f75279051e: Layer already exists
4e38e245312b: Layer already exists
85ade8c6ca76: Layer already exists
ad6562704f37: Layer already exists
deployment-1655573668: digest: sha256:1ddfda6a6d8d84d804602653501db1c9720677b6e04e31008d3256c53ec09723 size: 1159
--> Pushing image done
Image: registry.fly.io/axum-machine:deployment-1655573668
Image size: 152 MB
Machine is launching...
Success! A machine has been successfully launched, waiting for it to be started
 Machine ID: 217814d9c9ee89
 Instance ID: 01G5VY2TKH0A1MQWSX05S1GPK8
 State: starting
Waiting on firecracker VM...
Waiting on firecracker VM...
Waiting on firecracker VM...
Machine started, you can connect via the following private ip
  fdaa:0:446c:a7b:5b66:d530:1a4b:2
Note that pushing the image was instant, since it already lived in fly's registry.
You can see there's no mention of allocations there or anything: it just starts one VM, as requested, and gives us its private IPv6 address.
That'll only work if we set up private networking, which I can't be bothered to do right now.
Instead, let's check we can still SSH into it with the default SSH server:
$ fly ssh console
Connecting to top1.nearest.of.axum-machine.internal... complete
# whoami
root
So far so good.
$ fly status
App
  Name     = axum-machine
  Owner    = personal
  Version  = 0
  Status   = pending
  Hostname = axum-machine.fly.dev

Machines
ID             NAME                   REGION STATE   CREATED
217814d9c9ee89 ancient-snowflake-1933 cdg    started 2022-06-18T17:34:30Z
That works too, and shows our machine running. Neat!
We also have:
$ fly m list
1 machines have been retrieved.
View them in the UI here (https://fly.io/apps/axum-machine/machines/)

axum-machine
ID             IMAGE                              CREATED              STATE   REGION NAME                   IP ADDRESS
217814d9c9ee89 axum-machine:deployment-1655573668 2022-06-18T17:34:30Z started cdg    ancient-snowflake-1933 fdaa:0:446c:a7b:5b66:d530:1a4b:2
..which has more detail.
Our app has no public IP right now, so hitting the domain with curl won't work.
But we can allocate one - I'll go IPv6 because I have it, and IPv4 addresses are a precious commodity.
$ fly ips allocate-v6
TYPE ADDRESS           REGION CREATED AT
v6   2a09:8280:1::48d5 global 1s ago
And now this works!
$ curl -i https://axum-machine.fly.dev
HTTP/2 200
content-type: text/plain; charset=utf-8
content-length: 16
date: Sat, 18 Jun 2022 17:39:27 GMT
server: Fly/09a15cede3 (2022-06-17)
via: 2 fly.io
fly-request-id: 01G5VYBX04VT7JDNQF626KGZ52-cdg

hello from axum
And here's the neat thing: we can stop machines.
$ fly m stop 217814d9c9ee89
217814d9c9ee89 has been successfully stopped

$ fly m status 217814d9c9ee89
Success! A machine has been retrieved
Machine ID: 217814d9c9ee89
Instance ID: 01G5VY2TKH0A1MQWSX05S1GPK8
State: stopped

Event Logs
MACHINE STATUS EVENT TYPE SOURCE TIMESTAMP
stopped        exit       flyd   2022-06-18T17:40:38.517Z
stopping       stop       user   2022-06-18T17:40:35.245Z
started        start      flyd   2022-06-18T17:34:41.353Z
created        launch     user   2022-06-18T17:34:30.538Z
We can see in the event logs that it did stop alright.
And now if we try to run our curl again...
$ curl -i https://axum-machine.fly.dev
HTTP/2 200
content-type: text/plain; charset=utf-8
content-length: 16
date: Sat, 18 Jun 2022 17:41:46 GMT
server: Fly/09a15cede3 (2022-06-17)
via: 2 fly.io
fly-request-id: 01G5VYG3JYZFJ0871A26DCYGKT-cdg

hello from axum
It... still works?
Surprise! Shock! Awe! Predictable plot twist!
$ fly m status 217814d9c9ee89
Success! A machine has been retrieved
Machine ID: 217814d9c9ee89
Instance ID: 01G5VY2TKH0A1MQWSX05S1GPK8
State: started

Event Logs
MACHINE STATUS EVENT TYPE SOURCE TIMESTAMP
started        start      flyd   2022-06-18T17:41:46.075Z
starting       start      user   2022-06-18T17:41:45.695Z
stopped        exit       flyd   2022-06-18T17:40:38.517Z
stopping       stop       user   2022-06-18T17:40:35.245Z
started        start      flyd   2022-06-18T17:34:41.353Z
created        launch     user   2022-06-18T17:34:30.538Z
Huh, it started up again.
Amos don't feign surprise. You worked on that feature. You know full well what it does.
Okay okay, alright. So if you hit a public port for an app that has machines, it'll try to start a machine to handle the connection (raw TCP) / request (HTTP).
Which we can definitely use to our advantage.
What are you thinking? Expose port 22?
Well yes! Let's try that.
$ fly m remove --force 217814d9c9ee89
machine 217814d9c9ee89 was found and is currently in started state, attempting to destroy...
217814d9c9ee89 has been destroyed

$ fly machines run --app axum-machine --port 22:22/tcp --region cdg --size shared-cpu-1x hello-axum
Searching for image 'hello-axum' locally...
image found: sha256:3f93ceb9158f5e123253060d58d607f7c2a7e2f93797b49b4edbbbcc8e1b3840
==> Pushing image to fly
The push refers to repository [registry.fly.io/axum-machine]
02f75279051e: Layer already exists
4e38e245312b: Layer already exists
85ade8c6ca76: Layer already exists
ad6562704f37: Layer already exists
deployment-1655574325: digest: sha256:1ddfda6a6d8d84d804602653501db1c9720677b6e04e31008d3256c53ec09723 size: 1159
--> Pushing image done
Image: registry.fly.io/axum-machine:deployment-1655574325
Image size: 152 MB
Machine is launching...
Success! A machine has been successfully launched, waiting for it to be started
 Machine ID: 5918536ef46383
 Instance ID: 01G5VYPX14END6ZPAHBB411304
 State: starting
Waiting on firecracker VM...
Waiting on firecracker VM...
Machine started, you can connect via the following private ip
  fdaa:0:446c:a7b:5adc:24:e81f:2
And then:
$ ssh -vvv -i /tmp/id_rsa root@axum-machine.fly.dev
OpenSSH_8.2p1 Ubuntu-4ubuntu0.5, OpenSSL 1.1.1f  31 Mar 2020
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug2: resolving "axum-machine.fly.dev" port 22
debug2: ssh_connect_direct
debug1: Connecting to axum-machine.fly.dev [2a09:8280:1::48d5] port 22.
debug1: Connection established.
debug1: identity file /tmp/id_rsa type -1
debug1: identity file /tmp/id_rsa-cert type 7
debug1: Local version string SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
Mh, it's stuck.
Let's check the app logs...
$ fly logs
2022-06-18T17:47:43Z proxy[5918536ef46383] cdg [info]Machine not ready yet (11.072820024s since start requested)
2022-06-18T17:47:44Z proxy[5918536ef46383] cdg [info]Machine not ready yet (15.250221892s since start requested)
2022-06-18T17:47:45Z proxy[5918536ef46383] cdg [info]Machine not ready yet (33.956303928s since start requested)
2022-06-18T17:47:47Z proxy[5918536ef46383] cdg [info]Machine not ready yet (5.409191838s since start requested)
2022-06-18T17:47:48Z proxy[5918536ef46383] cdg [info]Machine not ready yet (10.043353267s since start requested)
2022-06-18T17:47:48Z proxy[5918536ef46383] cdg [info]Machine not ready yet (16.080325672s since start requested)
2022-06-18T17:47:50Z proxy[5918536ef46383] cdg [info]Machine not ready yet (38.962990983s since start requested)
^C%
Oh lord. Is nothing listening on port 22?
Let's check...
$ fly ssh console
Connecting to top1.nearest.of.axum-machine.internal... complete
# ss -lpn
Netid State  Recv-Q Send-Q                    Local Address:Port  Peer Address:Port Process
nl    UNCONN 0      0                                     0:0                 *
(cut)
nl    UNCONN 0      0                                    18:0                 *
tcp   LISTEN 0      0                                *:8080             *:*          users:(("hello-axum",pid=508,fd=9))
tcp   LISTEN 0      0      [fdaa:0:446c:a7b:5adc:24:e81f:2]:22               *:*          users:(("hallpass",pid=509,fd=6))
v_str LISTEN 0      0                               3:10000             *:*          users:(("init",pid=1,fd=9))
Ah! There is something listening on port 22, called "hallpass". But it's listening on... the private IPv6 address. Not the special 0.0.0.0 / :: address.
So that won't work.
No problem then, we'll just run our own SSH server!
Let's also add a non-root user, for no good reason other than... that's what I'm used to! I usually have a non-root user, with passwordless sudo, key-only authentication for SSH. It's not really for security, more for not accidentally clobbering system files without sudo.
I also switched to an Ubuntu 20.04 base, something I feel a little more comfortable using than Debian:
# in `hello-axum/Dockerfile`
# syntax = docker/dockerfile:1.4

################################################################################
FROM ubuntu:20.04

RUN set -eux; \
    export DEBIAN_FRONTEND=noninteractive; \
    apt update; \
    apt install --yes --no-install-recommends \
        bind9-dnsutils iputils-ping iproute2 curl ca-certificates htop \
        curl wget ca-certificates git-core \
        openssh-server openssh-client \
        sudo less zsh \
        ; \
    apt clean autoclean; \
    apt autoremove --yes; \
    rm -rf /var/lib/{apt,dpkg,cache,log}/; \
    echo "Installed base utils!"

RUN set -eux; \
    useradd -ms /usr/bin/zsh amos; \
    usermod -aG sudo amos; \
    echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers; \
    echo "added user"

RUN set -eux; \
    echo "Port 22" >> /etc/ssh/sshd_config; \
    echo "AddressFamily inet" >> /etc/ssh/sshd_config; \
    echo "ListenAddress 0.0.0.0" >> /etc/ssh/sshd_config; \
    echo "PasswordAuthentication no" >> /etc/ssh/sshd_config; \
    echo "ClientAliveInterval 30" >> /etc/ssh/sshd_config; \
    echo "ClientAliveCountMax 10" >> /etc/ssh/sshd_config; \
    echo "SSH server set up"

USER amos

RUN set -eux; \
    mkdir ~/.ssh; \
    curl https://github.com/fasterthanlime.keys | tee -a ~/.ssh/authorized_keys

WORKDIR app

CMD ["bash", "-c", "sudo service ssh start; echo 'SSH server started'; sleep infinity"]
(Note that we're also no longer building any Rust. Also, the ClientAliveInterval config there? Helps fight against fly's default TCP idle timeout. It makes sure something is sent on the wire regularly as long as you're connected, even if you're not actively doing stuff in your SSH session.)
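(You can also fight it from the client side: OpenSSH has mirror-image options you can put in your local ~/.ssh/config. The option names are standard OpenSSH, the values here are just a suggestion:)

Host hello-axum
  ServerAliveInterval 30
  ServerAliveCountMax 10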
Let's build it up again:
$ docker build -t hello-axum .
And create a new machine, making sure we expose port 22 this time.
Note: there's a way to replace a machine, by passing --id ID to fly m run, but at the time of this writing there are state update issues around it, so until those get fixed, we'll just go ahead and remove / recreate machines. It makes no difference other than the ID not being re-used.
$ fly m remove --force 5918536ef46383
(cut)

$ fly m run -p 22:22/tcp -r cdg -s shared-cpu-8x hello-axum
(cut)
And... voilà!
$ ssh axum-machine.fly.dev whoami
amos
Now we can actually log into the machine with VS Code, and it doesn't complain!
All we have to do is add this to our local ~/.ssh/config:

Host hello-axum
  HostName axum-machine.fly.dev
And then we get to pick which machine to connect to:
And we can edit remote files, open arbitrary terminals, work just as we would normally do in VS Code, except... remotely.
And latency is less of a concern than if we used something like vim over ssh, because it doesn't need to wait for single keystrokes to be sent and then for the terminal to echo back. It's a little more sophisticated than that.
Although, chances are there's a fly.io region where latency isn't too bad for you. For me it's ~10ms:
$ ping6 axum-machine.fly.dev PING6(56=40+8+8 bytes) [redacted] --> 2a09:8280:1::48d5 16 bytes from 2a09:8280:1::48d5, icmp_seq=0 hlim=52 time=15.792 ms 16 bytes from 2a09:8280:1::48d5, icmp_seq=1 hlim=52 time=13.238 ms 16 bytes from 2a09:8280:1::48d5, icmp_seq=2 hlim=52 time=8.906 ms ^C --- axum-machine.fly.dev ping6 statistics --- 3 packets transmitted, 3 packets received, 0.0% packet loss round-trip min/avg/max/std-dev = 8.906/12.645/15.792/2.842 ms
VS Code also knows how to forward ports automatically (and manually, if the detection fails), so if we start a little server over there:
$ sudo apt update && sudo apt install -y python
(cut)
$ cd /etc
$ python -m SimpleHTTPServer
Then VS Code automatically forwards the port to localhost:
And we can open it from a local desktop browser:
And with a couple VS Code extensions I like, like Resource Monitor, that's a pretty ideal setup for me.
Heck, you could even probably figure out a way to mount some folder on the remote machine to your local machine, through something like sshfs, which is apparently no longer maintained? So maybe a maintained alternative instead.
Oh and you'd want a volume. You can create those with fly volumes (or fly vol for short) and mount them by passing --volume vol_name:/path/on/disk.
That CLI option is hidden from the flyctl docs right now. It'll be our little secret!
Caveats are: it's experimental, still, and you can only use one volume (as opposed to "classic" fly apps).
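Putting the volume bits together, it'd look something like this. The name, size, and mount point are made up for illustration:

$ fly volumes create devdata --region cdg --size 10
$ fly m run -p 22:22/tcp -r cdg -s shared-cpu-8x --volume devdata:/home/amos hello-axum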
One thing that's neat in that kind of environment is that you can run Docker easily! It's kind of a hassle to add to our sample Dockerfile, but I'm typing this from my remote environment and I can promise it does in fact run docker:
$ docker info
(cut)
 app: Docker App (Docker Inc., v0.9.1-beta3)
 buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
 compose: Docker Compose (Docker Inc., v2.6.0)
(cut)
Server:
 Server Version: 20.10.17
 Storage Driver: overlay2
(cut)
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.12.2
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.63GiB
(cut)
Also, also! Again because this is a real VM, not just a docker container, we can install something like perf!
We have to build it from sources, but that's no problem:
KERNEL_VERSION=$(uname -r | sed -r 's/(^[^-]+).*/\1/' | sed -r 's/\.0//g')
echo "Installing perf for kernel ${KERNEL_VERSION}"

mkdir ~/kernel-sources
cd ~/kernel-sources
curl --fail --location "https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-${KERNEL_VERSION}.tar.xz" | tar -xJ --strip-components=1
sudo apt install --yes libiberty-dev binutils-dev flex bison libelf-dev libunwind-dev liblzma-dev libzstd-dev libdw-dev
sudo make -C tools/ perf_install prefix=/usr/
And then we see where CPU time is being spent with perf top, for example! See Brendan Gregg's perf page for more info.
There's only one problem with our little setup, and it's price-related.
The machine only stops if you call fly m stop --id MACHINE_ID.
I'd like it to stop when it's just "not used for a while".
And we can solve that problem... with Rust.
Ah! Finally.
A naive TCP proxy with tokio
This is what I use "in production", so to speak. It's absolutely not the only way to do this, and in fact we'll see if we have time to do it a few other fun ways, but it's simple and straight-forward, and I like it.
We don't need axum for this, since we only want to speak TCP, not HTTP.
// in `hello-axum/src/main.rs`

use std::{
    process::Stdio,
    sync::{
        atomic::{AtomicU64, Ordering},
        Arc,
    },
    time::{Duration, Instant},
};

use tokio::{
    net::{TcpListener, TcpStream},
    process::Command,
    time::sleep,
};

#[tokio::main]
async fn main() {
    let status = Command::new("service")
        .arg("ssh")
        .arg("start")
        .stdin(Stdio::null())
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit())
        .status()
        .await
        .unwrap();
    assert!(status.success());

    let num_conns: Arc<AtomicU64> = Default::default();

    tokio::spawn({
        let num_conns = num_conns.clone();
        let mut last_activity = Instant::now();

        async move {
            loop {
                if num_conns.load(Ordering::SeqCst) > 0 {
                    last_activity = Instant::now();
                } else {
                    let idle_time = last_activity.elapsed();
                    println!("Idle for {idle_time:?}");
                    if idle_time > Duration::from_secs(60) {
                        println!("Stopping machine. Goodbye!");
                        std::process::exit(0)
                    }
                }
                sleep(Duration::from_secs(5)).await;
            }
        }
    });

    let listener = TcpListener::bind("[::]:2222").await.unwrap();

    while let Ok((mut ingress, _)) = listener.accept().await {
        let num_conns = num_conns.clone();
        tokio::spawn(async move {
            // We'll tell OpenSSH to listen on this IPv4 address.
            let mut egress = TcpStream::connect("127.0.0.2:22").await.unwrap();
            // did you know: loopback is 127.0.0.1/8, it goes all the way to
            // 127.255.255.254 (and 127.255.255.255 for broadcast)

            num_conns.fetch_add(1, Ordering::SeqCst);
            match tokio::io::copy_bidirectional(&mut ingress, &mut egress).await {
                Ok((to_egress, to_ingress)) => {
                    println!(
                        "Connection ended gracefully ({to_egress} bytes from client, {to_ingress} bytes from server)"
                    );
                }
                Err(err) => {
                    println!("Error while proxying: {}", err);
                }
            }
            num_conns.fetch_sub(1, Ordering::SeqCst);
        });
    }
}
Wait... stopping the machine is just std::process::exit?
Yeah! If our docker image's "CMD" exits, the machine is stopped. In this case, it's much easier to tell from the inside whether the machine needs to be stopped.
(If we could only tell from the outside, we'd use the machines API to stop it instead.)
Anyway, here's our adjusted Dockerfile:
# in `hello-axum/Dockerfile`
# syntax = docker/dockerfile:1.4

################################################################################
# Let's just make our own Rust builder image based on ubuntu:20.04 to avoid
# any libc version problems
FROM ubuntu:20.04 AS builder

# Install base utils: curl to grab rustup, gcc + build-essential for linking.
# we could probably reduce that a bit but /shrug
RUN set -eux; \
    export DEBIAN_FRONTEND=noninteractive; \
    apt update; \
    apt install --yes --no-install-recommends \
        curl ca-certificates \
        gcc build-essential \
        ; \
    apt clean autoclean; \
    apt autoremove --yes; \
    rm -rf /var/lib/{apt,dpkg,cache,log}/; \
    echo "Installed base utils!"

# Install rustup
RUN set -eux; \
    curl --location --fail \
        "https://static.rust-lang.org/rustup/dist/x86_64-unknown-linux-gnu/rustup-init" \
        --output rustup-init; \
    chmod +x rustup-init; \
    ./rustup-init -y --no-modify-path; \
    rm rustup-init;

# Add rustup to path, check that it works
ENV PATH=${PATH}:/root/.cargo/bin
RUN set -eux; \
    rustup --version;

# Build some code!
# Careful: now we need to cache `/root/.cargo/` rather than `/usr/local/cargo`
# since rustup installed things differently than in the rust build image
WORKDIR /app
COPY . .
RUN --mount=type=cache,target=/app/target \
    --mount=type=cache,target=/root/.cargo/registry \
    --mount=type=cache,target=/root/.cargo/git \
    --mount=type=cache,target=/root/.rustup \
    set -eux; \
    rustup install stable; \
    cargo build --release; \
    objcopy --compress-debug-sections target/release/hello-axum ./hello-axum

################################################################################
FROM ubuntu:20.04

RUN set -eux; \
    export DEBIAN_FRONTEND=noninteractive; \
    apt update; \
    apt install --yes --no-install-recommends \
        bind9-dnsutils iputils-ping iproute2 curl ca-certificates htop \
        curl wget ca-certificates git-core \
        openssh-server openssh-client \
        sudo less zsh \
        ; \
    apt clean autoclean; \
    apt autoremove --yes; \
    rm -rf /var/lib/{apt,dpkg,cache,log}/; \
    echo "Installed base utils!"

RUN set -eux; \
    useradd -ms /usr/bin/zsh amos; \
    usermod -aG sudo amos; \
    echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers; \
    echo "added user"

# Note that we've changed the `ListenAddress` here from `0.0.0.0` to
# `127.0.0.2`. It's not really necessary but it's neat that 127.0.0.1 is a /8.
RUN set -eux; \
    echo "Port 22" >> /etc/ssh/sshd_config; \
    echo "AddressFamily inet" >> /etc/ssh/sshd_config; \
    echo "ListenAddress 127.0.0.2" >> /etc/ssh/sshd_config; \
    echo "PasswordAuthentication no" >> /etc/ssh/sshd_config; \
    echo "ClientAliveInterval 30" >> /etc/ssh/sshd_config; \
    echo "ClientAliveCountMax 10" >> /etc/ssh/sshd_config; \
    echo "SSH server set up"

USER amos

# Don't forget to change that if you don't want to give /me/ access to your
# remote dev env! Otherwise I'll ssh in there and fix your code 😈
RUN set -eux; \
    mkdir ~/.ssh; \
    curl https://github.com/fasterthanlime.keys | tee -a ~/.ssh/authorized_keys

WORKDIR app

COPY --from=builder /app/hello-axum ./hello-axum

# Because our top-level process starts the ssh daemon itself, for simplicity,
# let's run it as root. It could drop privileges after that but we already have
# passwordless sudo set up on the machine so double-shrug.
USER root
CMD ["./hello-axum"]
After a quick docker build -t hello-axum ., let's start it up again, mapping edge port 22 to machine port 2222 instead:
$ fly m run -p 22:2222/tcp -r cdg -s shared-cpu-8x hello-axum
(cut)
And that's basically all it takes! You can stop reading the article now!
Right now? But... your brand.
Oh I'll keep going. But you could stop reading the article now. It's missing a volume (explained above), which means right now every time it's stopped, all our data disappears. So we definitely want that.
And because we can only have one volume, I have my "pseudo-init" process create a symlink from /var/lib/docker to /home/amos/docker, and change some permissions, also start the docker daemon, things like that.
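The shell equivalent of that setup is roughly this (hypothetical paths; in practice hello-axum does it with Command before starting sshd):

# keep Docker's state on the persistent volume (mounted at /home/amos here)
sudo mkdir -p /home/amos/docker
sudo rm -rf /var/lib/docker
sudo ln -s /home/amos/docker /var/lib/docker
sudo service docker start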
Oh, I also have the PROXY protocol handler set up on my machine, which I parse with the ppp crate so I'm able to log the real client IPs that try to connect to my remote dev environment, even though these are all TCP connections.
Mh? As opposed to what?
Well if they were HTTP connections we'd get the real IP as the fly-client-ip header. But with TCP there's not really a concept of "headers" / "custom metadata", hence the PROXY protocol.
Oh and I didn't really show it in action: here's what the logs look like when I sign off for over a minute:
2022-06-19T19:24:18Z proxy[e148e394a72e89] cdg [info]Machine became reachable in 12.924218ms
2022-06-19T19:25:21Z app[e148e394a72e89] cdg [info]Connection ended gracefully (259121 bytes from client, 343897 bytes from server)
2022-06-19T19:25:22Z app[e148e394a72e89] cdg [info]Idle for 5.001673407s
2022-06-19T19:25:27Z app[e148e394a72e89] cdg [info]Idle for 10.002938289s
2022-06-19T19:25:32Z app[e148e394a72e89] cdg [info]Idle for 15.004794068s
2022-06-19T19:25:37Z app[e148e394a72e89] cdg [info]Idle for 20.005997194s
2022-06-19T19:25:42Z app[e148e394a72e89] cdg [info]Idle for 25.00744559s
2022-06-19T19:25:47Z app[e148e394a72e89] cdg [info]Idle for 30.008603681s
2022-06-19T19:25:52Z app[e148e394a72e89] cdg [info]Idle for 35.009784886s
2022-06-19T19:25:57Z app[e148e394a72e89] cdg [info]Idle for 40.010062697s
2022-06-19T19:26:02Z app[e148e394a72e89] cdg [info]Idle for 45.011428658s
2022-06-19T19:26:07Z app[e148e394a72e89] cdg [info]Idle for 50.012635341s
2022-06-19T19:26:12Z app[e148e394a72e89] cdg [info]Idle for 55.013845891s
2022-06-19T19:26:17Z app[e148e394a72e89] cdg [info]Idle for 60.014006722s
2022-06-19T19:26:17Z app[e148e394a72e89] cdg [info]Stopping machine. Goodbye!
Alright cool! Seems like our work is done here? Yet you wanted to continue, somehow?
Well yes, because see, if there's one thing I've learned from amateur microbenchmarks doing bullshit comparisons between programming languages...
The salt. God, just sign off amos.
...it's that syscalls are bad. Or slow. Whichever. And right now we do a bunch of syscalls:
e148e394a72e89% sudo strace -ff -p $(pidof hello-axum) 2>&1 | head -30
strace: Process 586 attached with 9 threads
[pid 599] futex(0x7f8ece5a9608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 598] epoll_wait(3, <unfinished ...>
[pid 597] futex(0x7f8ece9b7608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 596] futex(0x7f8ecebbb608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 595] futex(0x7f8ecedbc608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 594] futex(0x7f8ecefbd608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 593] futex(0x7f8ecf1be608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 592] futex(0x7f8ecf3bf608, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 586] futex(0x7f8ecf3c1448, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 598] <... epoll_wait resumed>[{EPOLLIN|EPOLLOUT, {u32=16777219, u64=16777219}}], 1024, 336) = 1
[pid 598] recvfrom(12, "\30\317\332\271\354+\345:\3231\223\330\303\333x\177\347%\332[\316\241\235\307\277\200\34~\322\262s\337"..., 8192, 0, NULL, NULL) = 196
[pid 598] sendto(11, "\30\317\332\271\354+\345:\3231\223\330\303\333x\177\347%\332[\316\241\235\307\277\200\34~\322\262s\337"..., 196, MSG_NOSIGNAL, NULL, 0) = 196
[pid 598] recvfrom(12, 0x7f8ea4002c30, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 598] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=16777219, u64=16777219}}], 1024, 334) = 1
[pid 598] recvfrom(12, "\306\214e\204\242x,\315\34\3427\7\241{I\23f\251\321\235\36\262\35#V\372\246\344\277S\4\337"..., 8192, 0, NULL, NULL) = 212
[pid 598] sendto(11, "\306\214e\204\242x,\315\34\3427\7\241{I\23f\251\321\235\36\262\35#V\372\246\344\277S\4\337"..., 212, MSG_NOSIGNAL, NULL, 0) = 212
[pid 598] recvfrom(12, 0x7f8ea4002c30, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 598] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=16777219, u64=16777219}}], 1024, 333) = 1
[pid 598] recvfrom(12, "K\223\332\24h\346#N\37\234t\364-\326\v\221p\320\254\363m<\323\254\206\32\250'\362\346\207\246"..., 8192, 0, NULL, NULL) = 180
[pid 598] sendto(11, "K\223\332\24h\346#N\37\234t\364-\326\v\221p\320\254\363m<\323\254\206\32\250'\362\346\207\246"..., 180, MSG_NOSIGNAL, NULL, 0) = 180
[pid 598] recvfrom(12, 0x7f8ea4002c30, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 598] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=16777219, u64=16777219}}], 1024, 332) = 1
[pid 598] recvfrom(12, "L\361W\16\244\r\254\244\313\360\357\6n\v\26.\362\364\2068\24\262\23\345\22\263\365z]\37\5~"..., 8192, 0, NULL, NULL) = 164
[pid 598] sendto(11, "L\361W\16\244\r\254\244\313\360\357\6n\v\26.\362\364\2068\24\262\23\345\22\263\365z]\37\5~"..., 164, MSG_NOSIGNAL, NULL, 0) = 164
[pid 598] recvfrom(12, 0x7f8ea4002c30, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 598] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=16777219, u64=16777219}}], 1024, 330) = 1
[pid 598] recvfrom(12, "\362\275LJk\200\25*\367\22\370\345\214A\317nX\32L\217;\270gX{\254fZ\206sqL"..., 8192, 0, NULL, NULL) = 140
[pid 598] sendto(11, "\362\275LJk\200\25*\367\22\370\345\214A\317nX\32L\217;\270gX{\254fZ\206sqL"..., 140, MSG_NOSIGNAL, NULL, 0) = 140
[pid 598] recvfrom(12, 0x7f8ea4002c30, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
I'm only showing 30 lines here, but it scrolls by really fast.
I mean, yes. copy_bidirectional is reading data from a socket and copying it to the other socket. And also reading from the other socket and copying it to the first socket. What did you expect?
Nothing, nothing, it's a perfectly reasonable way to do I/O: we have user-space buffers that the kernel copies data into, and out of.
It's just... we have more modern equivalents now.
Such as?
Well, I don't know if this'll work, but let's give it a shot.
A wonderful TCP proxy with tokio-uring
Okay, I hope things will go really smoothly, because I don't have a lot of time left to write this article.
io-uring is a different way to do I/O. I'll explain what I've understood about it, and let The Internet correct me.
So the olden way is to just do blocking syscalls. First you allocate a buffer, then you do a syscall (probably via some libc wrapper, like read or write), passing the address (and size) of the buffer you've allocated, and when it returns, if there were no errors, you have some data in your buffer!
And you can make that scale by having more threads! Since every thread blocks on... having more data to read from somewhere, or a write finishing (it might end up in a kernel buffer, but that's fine).
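To make that concrete, here's a toy version of the olden way: a blocking echo server (not our proxy, just a sketch), one thread per connection, nothing but std. Every call blocks until the kernel has done the work:

use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:4000")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        // one thread per connection: each one spends most of its life
        // blocked inside `read`
        thread::spawn(move || {
            let mut buf = [0u8; 1024];
            loop {
                match stream.read(&mut buf) {
                    // EOF or error: we're done with this connection
                    Ok(0) | Err(_) => return,
                    // echo it back (a proxy would write to the other socket)
                    Ok(n) => {
                        if stream.write_all(&buf[..n]).is_err() {
                            return;
                        }
                    }
                }
            }
        });
    }
    Ok(())
}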
And then there's non-blocking I/O, which is much the same, except you set everything (file descriptors, sockets) to "non-blocking mode", and when you call "read" and "write", IF they don't have data that's immediately available, if the call "would block", they return "EWOULDBLOCK".
But then how do you know when to call read & write? In a loop?
In a loop yes, but first you register your interest in some resource being "ready", and then the only blocking syscall you do (from your async runtime) is one that waits for the next readiness event. (And there might be multiple events because multiple resources might become ready "at the same time").
So from a single thread you do something like:
- Open a, set as non-blocking, register interest
- Open b, set as non-blocking, register interest
- Wait for next readiness event
- We have readiness events!
- One of them is "a is ready to read from"
- Try to read from a, it either succeeds immediately or returns EWOULDBLOCK (spurious wake-ups happen if I recall correctly?)
- Wait for next readiness event
- etc.
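In code, that dance looks roughly like this, using tokio's lower-level readiness API (a sketch; in practice you'd just use AsyncReadExt and let the runtime deal with all of it):

use tokio::net::TcpStream;

async fn read_some(stream: &TcpStream) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; 1024];
    loop {
        // wait until the runtime gets a readiness event for this socket
        stream.readable().await?;

        // then try the actual non-blocking read
        match stream.try_read(&mut buf) {
            Ok(n) => {
                buf.truncate(n);
                return Ok(buf);
            }
            // spurious wake-up: the socket wasn't actually readable after
            // all, go back to waiting
            Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => continue,
            Err(e) => return Err(e),
        }
    }
}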
Okay, and that's what regular tokio does?
Exactly. And then there's io-uring, in which you don't do "one syscall per I/O operation", instead you submit items to a ring buffer and you can monitor completion from another ring buffer, at least I think so, I'm a bit fuzzy on the details still.
Ah, so, fewer syscalls overall! It sounds like a great fit for, like... highly concurrent stuff?
Yeah, which our thing is not... we're just doing bidirectional copy between two sockets. So it's probably not even much of an improvement, but hey, I've never tried it before, and all we need is a 5.11+ kernel and everyone's always saying to try new stuff. THIS IS ME TRYING.
So we'll just want to add tokio-uring:
# in `hello-axum/Cargo.toml`

[package]
name = "hello-axum"
version = "0.1.0"
edition = "2021"

[dependencies]
tokio = { version = "1.19.2", features = ["full"] }
tokio-uring = "0.3.0"
And then... it's easier to explain this one in the comments, so read the comments!
use std::{
    process::Stdio,
    rc::Rc,
    sync::atomic::{AtomicU64, Ordering},
    time::{Duration, Instant},
};

// we can still use regular tokio stuff!
use tokio::{process::Command, time::sleep};

// but we want the uring versions of TCP sockets.
use tokio_uring::{
    buf::IoBuf,
    net::{TcpListener, TcpStream},
};

// can't use a regular main function because we need to start a
// `tokio-uring` runtime, which manages both the main tokio runtime
// and the uring runtime.
fn main() {
    // nobody's stopping us from defining our own main function though.
    tokio_uring::start(main_inner());
}

async fn main_inner() {
    // this is regular tokio stuff, still works fine.
    let status = Command::new("service")
        .arg("ssh")
        .arg("start")
        .stdin(Stdio::null())
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit())
        .status()
        .await
        .unwrap();
    assert!(status.success());

    let num_conns: Rc<AtomicU64> = Default::default();

    // We can still spawn stuff, but with tokio_uring's `spawn`. The future
    // we send doesn't have to be `Send`, since it's all single-threaded.
    tokio_uring::spawn({
        let num_conns = num_conns.clone();
        let mut last_activity = Instant::now();

        async move {
            loop {
                if num_conns.load(Ordering::SeqCst) > 0 {
                    last_activity = Instant::now();
                } else {
                    let idle_time = last_activity.elapsed();
                    println!("Idle for {idle_time:?}");
                    if idle_time > Duration::from_secs(60) {
                        println!("Stopping machine. Goodbye!");
                        std::process::exit(0)
                    }
                }
                sleep(Duration::from_secs(5)).await;
            }
        }
    });

    // tokio-uring's TcpListener wants a `SocketAddr`, not a `ToAddrs` or
    // something, so let's parse it ahead of time.
    let addr = "[::]:2222".parse().unwrap();
    // also it doesn't return a future?
    let listener = TcpListener::bind(addr).unwrap();

    while let Ok((ingress, _)) = listener.accept().await {
        println!("Accepted connection");
        let num_conns = num_conns.clone();
        tokio_uring::spawn(async move {
            // same deal, we need to parse first. if you're puzzled why there's
            // no mention of `SocketAddr` anywhere, it's inferred from what
            // `TcpStream::connect` wants.
            let egress_addr = "127.0.0.2:22".parse().unwrap();
            let egress = TcpStream::connect(egress_addr).await.unwrap();

            num_conns.fetch_add(1, Ordering::SeqCst);

            // `read` and `write` take owned buffers (more on that later), and
            // there's no "per-socket" buffer, so they actually take `&self`.
            // which means we don't need to split them into a read half and a
            // write half like we'd normally do with "regular tokio". Instead,
            // we can send a reference-counted version of it. also, since a
            // tokio-uring runtime is single-threaded, we can use `Rc` instead of
            // `Arc`.
            let egress = Rc::new(egress);
            let ingress = Rc::new(ingress);

            // We need to copy in both directions...
            let mut from_ingress = tokio_uring::spawn(copy(ingress.clone(), egress.clone()));
            let mut from_egress = tokio_uring::spawn(copy(egress.clone(), ingress.clone()));

            // Stop as soon as one of them errors
            let res = tokio::try_join!(&mut from_ingress, &mut from_egress);
            if let Err(e) = res {
                println!("Connection error: {}", e);
            }
            // Make sure the reference count drops to zero and the socket is
            // freed by aborting both tasks (which both hold a `Rc<TcpStream>`
            // for each direction)
            from_ingress.abort();
            from_egress.abort();

            num_conns.fetch_sub(1, Ordering::SeqCst);
        });
    }
}

async fn copy(from: Rc<TcpStream>, to: Rc<TcpStream>) -> Result<(), std::io::Error> {
    let mut buf = vec![0u8; 1024];
    loop {
        // things look weird: we pass ownership of the buffer to `read`, and we get
        // it back, _even if there was an error_. There's a whole trait for that,
        // which `Vec<u8>` implements!
        let (res, buf_read) = from.read(buf).await;
        // Propagate errors, see how many bytes we read
        let n = res?;
        if n == 0 {
            // A read of size zero signals EOF (end of file), finish gracefully
            return Ok(());
        }

        // The `slice` method here is implemented in an extension trait: it
        // returns an owned slice of our `Vec<u8>`, which we later turn back
        // into the full `Vec<u8>`
        let (res, buf_write) = to.write(buf_read.slice(..n)).await;
        res?;

        // Later is now, we want our full buffer back.
        // That's why we declared our binding `mut` way back at the start of `copy`,
        // even though we moved it into the very first `TcpStream::read` call.
        buf = buf_write.into_inner();
    }
}
A docker build, fly m remove --force, fly m run later... it works!
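(In case you've lost count of that dance, it goes roughly like this; the image tag and machine ID are placeholders, the app name is the same axum-machine app as before, and the exact flyctl flags vary a bit between versions.)

$ docker build -t registry.fly.io/axum-machine:latest .
$ docker push registry.fly.io/axum-machine:latest
$ fly m remove --force <machine-id>
$ fly m run -a axum-machine registry.fly.io/axum-machine:latest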
Let's take a look at the syscalls we have now:
59185369a43383% sudo strace -ff -p $(pidof hello-axum) 2>&1 | head -30 strace: Process 584 attached with 3 threads [pid 584] epoll_wait(3, 0x56361d15d240, 1024, 951) = -1 EINTR (Interrupted system call) [pid 584] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 947) = 1 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}, {EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 946) = 2 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 946) = 1 [pid 584] epoll_wait(3, 0x56361d15d240, 1024, 946) = -1 EINTR (Interrupted system call) [pid 584] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 945) = 1 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}, {EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 945) = 2 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 943) = 1 [pid 584] epoll_wait(3, 0x56361d15d240, 1024, 943) = -1 EINTR (Interrupted system call) [pid 584] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 943) = 1 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}, {EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 943) = 2 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 943) = 1 [pid 584] epoll_wait(3, 0x56361d15d240, 1024, 943) = -1 EINTR (Interrupted system call) [pid 584] epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 942) = 1 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] io_uring_enter(9, 1, 0, 0, NULL, 128) = 1 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}, {EPOLLIN|EPOLLOUT, {u32=1, u64=1}}], 1024, 941) = 2
Well, I see io_uring_enter in there, so it's definitely Doing The Thing(TM), but it's still scrolling by really fast.
Say, Amos?
Yes Bear?
Where do you think strace's output is sent to?
Well, to my terminal. I'm looking right at it.
Right, and how does it end up in your terminal?
Oh. OHHHhhhhhhhhhhhhh right. As strace outputs more data it's sent over SSH back to me, which results in more syscalls, which results in more strace output, which results in more data being sent over SSH back to me, which...
Okay if we really want to snoop without disrupting hello-axum's operation (guess who's regretting picking that name now!), we probably want to connect through hallpass instead:
$ fly ssh console Connecting to top1.nearest.of.axum-machine.internal... complete # bash root@59185369a43383:/# strace -p $(pidof hello-axum) -ff strace: Process 584 attached with 3 threads [pid 584] epoll_wait(3, [], 1024, 565) = 0 [pid 584] epoll_wait(3, [], 1024, 21) = 0 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 2536) = 1 [pid 584] epoll_wait(3, [], 1024, 2536) = 0 [pid 584] epoll_wait(3, [], 1024, 2430) = 0 [pid 584] epoll_wait(3, [], 1024, 30) = 0 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 584] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 1631) = 1 [pid 584] epoll_wait(3, [], 1024, 1631) = 0 [pid 584] epoll_wait(3,
Better! Now we only see the SSH keepalives (set through ClientAliveInterval up there, your browser has a search function, I believe in you), and whatever chatter is exchanged between vscode and vscode server.
Okay, now our work is like, super done. Right?
Mhhhhhhhh one could say so, yes. But we're still doing some syscalls. You know what's better than some syscalls?
...no syscalls?
An eBPF thingy with aya
See Bear, we never actually do anything useful with the data we proxy back and forth. It's not like we're doing HTTP, or acting as the SSH server ourselves.
We're merely acting as a pipe, copying stuff back and forth in both directions.
And at first I thought about using syscalls like splice, but I realized that's not even a step up from io-uring. The io-uring solution is more general and completely replaces splice as far as I'm concerned.
All we really need to do is know whether there are packets being sent to/from OpenSSH's port. If there are: we have activity, let's stay up! If not, let's go to sleep.
And you know what's a great way to snoop on network traffic?
Is it... is it in the title? Is it BPF?
Yes! Or eBPF if you want to nitpick, but rest assured I have no plans to learn classic BPF any time soon.
So! Let's get started. We'll actually want two programs here:
- A BPF program, that we'll compile and link for the BPF target (it'll end up being bytecode in an ELF file)
- A regular Linux executable that will be in charge of loading our BPF program and attaching it to a network interface.
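Put differently, by the end of this section the project will look roughly like this (same names as used throughout this article):

hello-axum/
├── Cargo.toml           # the userspace "driver" program
├── Dockerfile
├── src/main.rs
└── flyremote-bpf/       # the BPF program, built for bpfel-unknown-none
    ├── Cargo.toml
    ├── rust-toolchain.toml
    └── src/main.rs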
So let's make a new project:
$ cd hello-axum/
$ cargo new flyremote-bpf
     Created binary (application) `flyremote-bpf` package
Add a couple dependencies:
# in `hello-axum/flyremote-bpf/Cargo.toml`

[package]
name = "flyremote-bpf"
version = "0.1.0"
edition = "2021"

# these are important too!
[profile.release]
lto = true
panic = "abort"
codegen-units = 1

[dependencies]
aya-bpf = { git = "https://github.com/aya-rs/aya", branch = "main" }
aya-log-ebpf = { git = "https://github.com/aya-rs/aya-log", branch = "main" }
Make sure we use nightly Rust for this one:
# in `hello-axum/flyremote-bpf/rust-toolchain.toml`
[toolchain]
channel = "nightly"
There are many different BPF program types. We could look at every single packet going through an interface (and mess with them), but here we really just need to listen for some events: when has a connection just been made? When has it been closed? And of course, on what address/port.
Here's a good starting point:
// in `hello-axum/flyremote-bpf/src/main.rs`

// We won't have an allocator, so we can't bring the Rust standard library
// with us here. Besides, it probably wouldn't pass the BPF verifier.
#![no_std]
#![no_main]

use aya_bpf::{macros::sock_ops, programs::SockOpsContext};
// This works a little like `tracing`!
use aya_log_ebpf::info;

// The proc macro here does the heavy lifting. There's a bunch of linker fuckery
// at hand here that would be fascinating, but that I won't get into.
#[sock_ops(name = "flyremote")]
pub fn flyremote(ctx: SockOpsContext) -> u32 {
    match unsafe { try_flyremote(ctx) } {
        Ok(ret) => ret,
        Err(ret) => ret,
    }
}

// This gets called for every "socket operation" event.
unsafe fn try_flyremote(ctx: SockOpsContext) -> Result<u32, u32> {
    // transmuting from a `u32` to a `[u8; 4]` - should be okay.
    let local_ip4: [u8; 4] = core::mem::transmute([ctx.local_ip4()]);
    let remote_ip4: [u8; 4] = core::mem::transmute([ctx.remote_ip4()]);

    // log some stuff
    info!(
        &ctx,
        "op ({} {}), local port {}, remote port {}, local ip4 = {}.{}.{}.{} remote ip4 = {}.{}.{}.{}",
        op_name(ctx.op()),
        ctx.op(),
        ctx.local_port(),
        // this value is big-endian (but local_port is native-endian)
        u32::from_be(ctx.remote_port()),
        local_ip4[0], local_ip4[1], local_ip4[2], local_ip4[3],
        remote_ip4[0], remote_ip4[1], remote_ip4[2], remote_ip4[3],
    );

    // that's `BPF_SOCK_OPS_STATE_CB_FLAG` - so we receive "state_cb" events,
    // when a socket changes state.
    // this may fail, so it returns a `Result`, but I wouldn't know what to do
    // if it failed anyway.
    let _ = ctx.set_cb_flags(1 << 2);

    // if this is a "state_cb" event, show the old state and new state, which
    // are the first two arguments (we have up to 4 arguments)
    if ctx.op() == 10 {
        info!(
            &ctx,
            "state transition: {} {} => {} {}",
            ctx.arg(0),
            state_name(ctx.arg(0)),
            ctx.arg(1),
            state_name(ctx.arg(1)),
        );
    }

    Ok(0)
}

// gleaned from `bpf.h`
fn op_name(op: u32) -> &'static str {
    match op {
        0 => "void",
        1 => "timeout_init",
        2 => "rwnd_init",
        3 => "tcp_connect_cb",
        4 => "active_established_cb",
        5 => "passive_established_cb",
        6 => "needs_ecn",
        7 => "base_rtt",
        8 => "rto_cb",
        9 => "retrans_cb",
        10 => "state_cb",
        _ => "unknown",
    }
}

// gleaned from `bpf.h` too
fn state_name(op: u32) -> &'static str {
    match op {
        1 => "established",
        2 => "syn-sent",
        3 => "syn-recv",
        4 => "fin-wait1",
        5 => "fin-wait2",
        6 => "time-wait",
        7 => "close",
        8 => "close-wait",
        9 => "last-ack",
        10 => "listen",
        11 => "closing",
        12 => "new-syn-recv",
        _ => "unknown",
    }
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::hint::unreachable_unchecked() }
}
We can now build this for the BPF target using Rust nightly, doing a release build, and asking rustc to build libcore (the smol part of libstd) from scratch, since there's no prebuilt target for bpfel-unknown-none as far as I know:
$ cd hello-axum/flyremote-bpf/ $ cargo +nightly build --verbose --target bpfel-unknown-none -Z build-std=core --release Fresh unicode-ident v1.0.1 Fresh core v0.0.0 (/home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core) Fresh rustc-std-workspace-core v1.99.0 (/home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/rustc-std-workspace-core) Fresh proc-macro2 v1.0.39 (cut) Fresh aya-log-ebpf v0.1.0 (https://github.com/aya-rs/aya-log?branch=main#1b0d3da1) Compiling flyremote-bpf v0.1.0 (/home/amos/bearcove/flyremote-bpf) Running `rustc --crate-name flyremote_bpf --edition=2021 src/main.rs (cut.)` Finished release [optimized] target(s) in 0.88s
We can inspect the result with llvm-objdump:
$ llvm-objdump -t target/bpfel-unknown-none/release/flyremote-bpf target/bpfel-unknown-none/release/flyremote-bpf: file format elf64-bpf SYMBOL TABLE: 0000000000000000 l df *ABS* 0000000000000000 flyremote_bpf-8df4772bd494bad9 0000000000001890 l sockops/flyremote 0000000000000000 LBB0_30 (cut) 0000000000000000 g F sockops/flyremote 0000000000002a48 flyremote 0000000000000000 g O maps 000000000000001c AYA_LOG_BUF 0000000000000040 g F .text 0000000000000058 .hidden memcpy 000000000000001c g O maps 000000000000001c AYA_LOGS 0000000000000000 g F .text 0000000000000040 .hidden memset
To load it into the kernel, we'll need a regular Linux executable. For us it'll be hello-axum (really regretting that name now, it's not axum-powered at all anymore).
We'll need these dependencies:
# in `hello-axum/Cargo.toml`

[package]
name = "hello-axum"
version = "0.1.0"
edition = "2021"

[dependencies]
aya = { version = ">=0.11", features = ["async_tokio"] }
aya-log = "0.1"
clap = { version = "3.1", features = ["derive"] }
color-eyre = "0.6.1"
log = "0.4"
simplelog = "0.12"
tokio = { version = "1.19.2", features = ["full"] }
And then: we include the bytecode as part of our executable, do some syscalls to load it, grab a handle to the flyremote program in there, attach it to the default cgroup (/sys/fs/cgroup/unified), also set up some logging, and off we go:
// in `hello-axum/src/main.rs`
use aya::programs::SockOps;
use aya::{include_bytes_aligned, Bpf};
use aya_log::BpfLogger;
use clap::Parser;
use log::info;
use simplelog::{ColorChoice, ConfigBuilder, LevelFilter, TermLogger, TerminalMode};
use tokio::signal;

#[derive(Debug, Parser)]
struct Opt {
    #[clap(short, long, default_value = "/sys/fs/cgroup/unified")]
    cgroup_path: String,
}

#[tokio::main]
async fn main() -> color_eyre::Result<()> {
    color_eyre::install()?;

    let opt = Opt::parse();

    TermLogger::init(
        LevelFilter::Debug,
        ConfigBuilder::new()
            .set_target_level(LevelFilter::Error)
            .set_location_level(LevelFilter::Error)
            .build(),
        TerminalMode::Mixed,
        ColorChoice::Auto,
    )?;

    let mut bpf = Bpf::load(include_bytes_aligned!(
        "../flyremote-bpf/target/bpfel-unknown-none/release/flyremote-bpf"
    ))?;
    BpfLogger::init(&mut bpf)?;

    let program: &mut SockOps = bpf.program_mut("flyremote").unwrap().try_into()?;
    let cgroup = std::fs::File::open(opt.cgroup_path)?;
    program.load()?;
    program.attach(cgroup)?;

    info!("Waiting for Ctrl-C...");
    signal::ctrl_c().await?;
    info!("Exiting...");

    Ok(())
}
Before we get into Dockerfile business again, I can try to run it on my local machine:
$ cargo run Compiling hello-axum v0.1.0 (/home/amos/bearcove/hello-axum) Finished dev [unoptimized + debuginfo] target(s) in 3.48s Running `target/debug/hello-axum` 09:34:56 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:106] [FEAT PROBE] BPF program name support: true 09:34:56 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:109] [FEAT PROBE] BTF support: false Error: 0: map error 1: the `bpf_map_freeze` syscall failed with code -1 2: Operation not permitted (os error 1) Location: src/main.rs:35 Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it. Run with RUST_BACKTRACE=full to include source snippets.
Oh, uh.
Seems like you need to be root to run this?
Well... there is such a thing as unprivileged BPF, just need to tune a few sysctls and reboot, but most distributions disable it by default because it has complicated security implications.
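(If you want to check what your distribution does, the knob is the kernel.unprivileged_bpf_disabled sysctl: 0 means unprivileged BPF is allowed, non-zero means it isn't, and many distros ship with it set to 1 or 2 these days.)

$ sysctl kernel.unprivileged_bpf_disabled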
It doesn't really matter to us in our microVM, since we're already running the top-level process as root, so, let's just run it as root here too:
$ cargo build --quiet && sudo ./target/debug/hello-axum 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:106] [FEAT PROBE] BPF program name support: true 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:109] [FEAT PROBE] BTF support: true 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:113] [FEAT PROBE] BTF func support: true 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:116] [FEAT PROBE] BTF global func support: true 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:122] [FEAT PROBE] BTF var and datasec support: true 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:128] [FEAT PROBE] BTF float support: false 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:131] [FEAT PROBE] BTF decl_tag support: false 09:37:03 [DEBUG] (1) aya::bpf: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/bpf.rs:134] [FEAT PROBE] BTF type_tag support: false 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:270] relocating program flyremote function flyremote 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:327] relocating call to callee address 64 (relocation) 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:348] callee is memcpy 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:270] relocating program flyremote function memcpy 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:363] finished relocating program flyremote function memcpy 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:327] relocating call to callee address 64 (relocation) 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:348] callee is memcpy 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:327] relocating call to callee address 64 (relocation) 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:348] callee is memcpy 09:37:04 [DEBUG] (1) aya::obj::relocation: [/home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/aya-0.11.0/src/obj/relocation.rs:363] finished relocating program flyremote function flyremote 09:37:04 [INFO] hello_axum: [src/main.rs:44] Waiting for Ctrl-C...
The program waits for new sockops events. I have an SSH server running here on 127.0.0.2 port 22, so if I try to connect to it from another tab:
$ ssh 127.0.0.2
(omitted: fingerprint stuff, etc.)
...we see:
09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (tcp_connect_cb 3), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (rwnd_init 2), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (timeout_init 1), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (needs_ecn 6), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (rwnd_init 2), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (timeout_init 1), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (needs_ecn 6), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 2 syn-sent => 1 established 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (active_established_cb 4), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (passive_established_cb 5), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:52 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:52 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 3 syn-recv => 1 established
Note that we see both the "client" and the "server" socket here being established, since I'm connecting from localhost.
The client socket goes from 127.0.0.1:59920 to 127.0.0.2:22 (where the SSH server lives), and the server socket goes from 127.0.0.2:22 to 127.0.0.1:59920, the exact opposite. tcp_connect_cb is only for "outgoing" connections, and eventually we get an active_established_cb. For the other direction, we eventually have a passive_established_cb.
So, for our actual program, we'll want to watch for passive_established_cb with local port 22.
And when I disconnect, we get this:
09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 1 established => 4 fin-wait1 09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 1 established => 8 close-wait 09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 4 fin-wait1 => 5 fin-wait2 09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 59920, remote port 22, local ip4 = 127.0.0.1 remote ip4 = 127.0.0.2 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 5 fin-wait2 => 7 close 09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 8 close-wait => 9 last-ack 09:38:56 [INFO] flyremote_bpf: [src/main.rs:26] op (state_cb 10), local port 22, remote port 59920, local ip4 = 127.0.0.2 remote ip4 = 127.0.0.1 09:38:56 [INFO] flyremote_bpf: [src/main.rs:52] state transition: 9 last-ack => 7 close
Closing a TCP socket is more involved than it first appears! You can check the TCP state diagram if it's not burned into your brain by virtue of suffering it for long enough.
This is tremendous progress, and if it runs on our fly.io machine, we'll have finally achieved O(0) syscalls.
Or will we? Won't we need to... query some state, still?
Yes! How do you think logging works, currently?
I don't know, it's all hidden by magic proc macros.
Well, I'll tell you! When we did llvm-objdump on our BPF program, we saw these lines, which correspond to exported symbols:
0000000000000000 g O maps 000000000000001c AYA_LOG_BUF
000000000000001c g O maps 000000000000001c AYA_LOGS
And in our "driver" program we had this line:
BpfLogger::init(&mut bpf)?;
I bet if we dig into what this BpfLogger thing actually does, we'll have our answer:
// in `aya-log/src/lib.rs`

impl BpfLogger {
    /// Starts reading log records created with `aya-log-ebpf` and logs them
    /// with the default logger. See [log::logger].
    pub fn init(bpf: &mut Bpf) -> Result<BpfLogger, Error> {
        BpfLogger::init_with_logger(bpf, DefaultLogger {})
    }

    /// Starts reading log records created with `aya-log-ebpf` and logs them
    /// with the given logger.
    pub fn init_with_logger<T: Log + 'static>(
        bpf: &mut Bpf,
        logger: T,
    ) -> Result<BpfLogger, Error> {
        let logger = Arc::new(logger);
        let mut logs: AsyncPerfEventArray<_> = bpf.map_mut("AYA_LOGS")?.try_into()?;

        for cpu_id in online_cpus().map_err(Error::InvalidOnlineCpu)? {
            let mut buf = logs.open(cpu_id, None)?;

            let log = logger.clone();
            tokio::spawn(async move {
                let mut buffers = (0..10)
                    .map(|_| BytesMut::with_capacity(LOG_BUF_CAPACITY))
                    .collect::<Vec<_>>();

                loop {
                    let events = buf.read_events(&mut buffers).await.unwrap();

                    #[allow(clippy::needless_range_loop)]
                    for i in 0..events.read {
                        let buf = &mut buffers[i];
                        log_buf(buf, &*log).unwrap();
                    }
                }
            });
        }

        Ok(BpfLogger {})
    }
}
AhAH! There it is! It's listening for changes to a "perf event array"! I guess that's one way for BPF programs to communicate with userspace.
Userspace?
Our "regular Linux executable" that instructs the kernel to load our BPF program and what to attach it to etc.
Oh, right.
And we've also learned that aya-log messages can be 8K at most. Interesting.
So... I guess we can just do the same to send messages when a new connection (to port 22) is established and when it's closed?
Let's try it!
Our new program becomes this:
// in `hello-axum/flyremote-bpf/src/main.rs`
#![no_std]
#![no_main]

use aya_bpf::{
    macros::{map, sock_ops},
    maps::PerfEventArray,
    programs::SockOpsContext,
};
use aya_log_ebpf::info;

// This is what we'll send over our "perf event array"
#[repr(C)]
pub struct ConnectionEvent {
    // 1 = connected, 2 = disconnected
    pub action: u32,
}

// We could probably make a Rust enum work here, but I don't feel like fighting
// the verifier too much today.
const ACTION_CONNECTED: u32 = 1;
const ACTION_DISCONNECTED: u32 = 2;

// Just like aya-log does, but this only has events we care about
#[map(name = "EVENTS")]
static mut EVENTS: PerfEventArray<ConnectionEvent> =
    PerfEventArray::<ConnectionEvent>::with_max_entries(1024, 0);

#[sock_ops(name = "flyremote")]
pub fn flyremote(ctx: SockOpsContext) -> u32 {
    match unsafe { try_flyremote(ctx) } {
        Ok(ret) => ret,
        Err(ret) => ret,
    }
}

unsafe fn try_flyremote(ctx: SockOpsContext) -> Result<u32, u32> {
    if ctx.local_port() != 22 {
        // don't care if it's not SSH-server-relevant
        return Ok(0);
    }

    // constants gotten from `bpf.h`
    const OP_PASSIVE_ESTABLISHED_CB: u32 = 5;
    const OP_STATE_CB: u32 = 10;
    const STATE_CLOSE: u32 = 7;

    match ctx.op() {
        OP_PASSIVE_ESTABLISHED_CB => {
            info!(&ctx, "Connection accepted!");
            // subscribe to `state_cb` events
            let _ = ctx.set_cb_flags(1 << 2);

            // notify userspace
            let ev = ConnectionEvent {
                action: ACTION_CONNECTED,
            };
            EVENTS.output(&ctx, &ev, 0);
        }
        OP_STATE_CB => {
            let new_state = ctx.arg(1);
            if new_state == STATE_CLOSE {
                info!(&ctx, "Connection closed!");

                // notify userspace
                let ev = ConnectionEvent {
                    action: ACTION_DISCONNECTED,
                };
                EVENTS.output(&ctx, &ev, 0);
            }
        }
        _ => {
            // ignore
        }
    }

    Ok(0)
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::hint::unreachable_unchecked() }
}
We can try it out without changing our userspace program, thanks to aya-log.
$ (cd flyremote-bpf && cargo +nightly build --verbose --target bpfel-unknown-none -Z build-std=core --release)
(cut)
$ cargo build --quiet && sudo ./target/debug/hello-axum
(cut)
10:01:30 [INFO] hello_axum: [src/main.rs:45] Waiting for Ctrl-C...
10:01:36 [INFO] flyremote_bpf: [src/main.rs:49] Connection accepted!
10:01:37 [INFO] flyremote_bpf: [src/main.rs:63] Connection closed!
Wonderful! Now if we just subscribe to our perf event array...
// in `hello-axum/src/main.rs`
use std::{
    fs::File,
    sync::{
        atomic::{AtomicU64, Ordering},
        Arc,
    },
    time::{Duration, Instant},
};

use aya::{include_bytes_aligned, util::online_cpus, Bpf};
use aya::{maps::perf::AsyncPerfEventArray, programs::SockOps};
use aya_log::BpfLogger;
use bytes::BytesMut;
use tokio::{signal, time::sleep};

// This is what we'll receive over our "perf event array". We'd normally
// have a "common" crate we pull from both the bpf-nostd world and the
// userspace-yesstd world, but for this example we're just copying it wholesale.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ConnectionEvent {
    // 1 = connected, 2 = disconnected
    pub action: u32,
}

const ACTION_CONNECTED: u32 = 1;
const ACTION_DISCONNECTED: u32 = 2;

// Because we used `repr(C)` we can treat it as POD (plain old data)
unsafe impl aya::Pod for ConnectionEvent {}

#[tokio::main]
async fn main() -> color_eyre::Result<()> {
    color_eyre::install()?;

    let mut bpf = Bpf::load(include_bytes_aligned!(
        "../flyremote-bpf/target/bpfel-unknown-none/release/flyremote-bpf"
    ))?;
    BpfLogger::init(&mut bpf)?;

    let num_conns: Arc<AtomicU64> = Default::default();

    let mut perf_array = AsyncPerfEventArray::try_from(bpf.map_mut("EVENTS")?)?;
    for cpu_id in online_cpus()? {
        let mut buf = perf_array.open(cpu_id, None)?;
        let num_conns = num_conns.clone();

        tokio::spawn(async move {
            let mut buffers = (0..10)
                .map(|_| BytesMut::with_capacity(1024))
                .collect::<Vec<_>>();

            loop {
                let events = buf.read_events(&mut buffers).await.unwrap();
                for buf in &mut buffers[..events.read] {
                    let ev = unsafe { (buf.as_ptr() as *const ConnectionEvent).read_unaligned() };
                    match ev.action {
                        ACTION_CONNECTED => {
                            println!("Connection accepted!");
                            num_conns.fetch_add(1, Ordering::SeqCst);
                        }
                        ACTION_DISCONNECTED => {
                            println!("Connection closed!");
                            num_conns.fetch_sub(1, Ordering::SeqCst);
                        }
                        unknown => {
                            println!("Unknown action: {}", unknown);
                        }
                    }
                }
            }
        });
    }

    tokio::spawn(async move {
        let mut last_activity = Instant::now();
        loop {
            if num_conns.load(Ordering::SeqCst) > 0 {
                last_activity = Instant::now();
            } else {
                let idle_time = last_activity.elapsed();
                println!("Idle for {idle_time:?}");
                if idle_time > Duration::from_secs(60) {
                    println!("Stopping machine. Goodbye!");
                    std::process::exit(0)
                }
            }
            sleep(Duration::from_secs(5)).await;
        }
    });

    let program: &mut SockOps = bpf.program_mut("flyremote").unwrap().try_into()?;
    let cgroup = File::open("/sys/fs/cgroup/unified")?;
    program.load()?;
    program.attach(cgroup)?;

    println!("Waiting for Ctrl-C...");
    signal::ctrl_c().await?;
    println!("Exiting...");

    Ok(())
}
It does the thing!
$ cargo build --quiet && sudo ./target/debug/hello-axum Idle for 19.527µs Waiting for Ctrl-C... (in another terminal: ssh 127.0.0.2) Connection accepted! (in another terminal: Ctrl-D to close out of SSH) Connection closed! Idle for 5.001708865s Idle for 10.003602174s Idle for 15.004068679s Idle for 20.005524839s Idle for 25.006052848s Idle for 30.007529878s Idle for 35.008838041s Idle for 40.010259957s Idle for 45.011105232s Idle for 50.012581951s Idle for 55.013017848s Idle for 60.01454433s Stopping machine. Goodbye!
Impressive. Very nice. But how the heck is that gonna run "in the cloud"?
Well Bear, I've been doing all of this "in the cloud" already, as I've said before. I've tested that code from my actual remote dev environment on fly.io. Because the kernel they provide is BPF-enabled. That's how I know it'll work.
"All we need to do (TM)", is to re-add starting the SSH server before we do any of this, and have it listen on... let's say port 2222 this time, on IPv4.
Wait, why are we changing ports?
Well because we need to listen on 0.0.0.0 this time. Remember, fly-proxy is the one exposing "edge port 22" to some internal port inside the VM. It's actually connecting through the eth0 interface:
$ ip addr show dev eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1420 qdisc pfifo_fast state UP group default qlen 1000
    link/ether de:ad:f9:57:5a:f4 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.210/29 brd 172.19.0.215 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.19.0.211/29 brd 172.19.0.215 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 2604:1380:71:1403:0:ae09:fd49:1/127 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fdaa:0:6964:a7b:5b66:ae09:fd49:2/112 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::dcad:f9ff:fe57:5af4/64 scope link
       valid_lft forever preferred_lft forever
...but let's not rely on that. We really do want OpenSSH to listen on 0.0.0.0 ("all interfaces") now, and port 22 is already taken by hallpass for a given interface, so the bind would fail.
So let's have it listen on port 2222 instead (left as an exercise to the reader), and make sure to adjust our BPF program so it monitors connections to port 2222 instead of 22 as well (also left as an exercise).
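(If you want a nudge for the first exercise, one way to do it, assuming your image configures OpenSSH through /etc/ssh/sshd_config at build time and that the stock config still has Port and ListenAddress commented out, which is the Ubuntu default, is to append the settings during the Docker build:)

# somewhere in `hello-axum/Dockerfile`, after openssh-server is installed
RUN set -eux; \
    echo "Port 2222" >> /etc/ssh/sshd_config; \
    echo "ListenAddress 0.0.0.0" >> /etc/ssh/sshd_config

(And for the second exercise, the early return in try_flyremote simply checks ctx.local_port() != 2222 instead of != 22.)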
The Dockerfile, I'll help with. But really it's what we've already done, just in Docker: we only need to mess with the very last part of the "builder" target:
# in `hello-axum/Dockerfile`
# syntax = docker/dockerfile:1.4

################################################################################
FROM ubuntu:20.04 AS builder

# (omitted: install base utils, install rustup, add rustup to path)

# Build some code!
WORKDIR /app
COPY . .
RUN --mount=type=cache,target=/app/target \
    --mount=type=cache,target=/root/.cargo/registry \
    --mount=type=cache,target=/root/.cargo/git \
    --mount=type=cache,target=/root/.rustup \
    set -eux; \
    rustup install nightly; \
    rustup component add rust-src --toolchain nightly; \
    cargo +nightly install bpf-linker; \
    (cd flyremote-bpf && cargo +nightly build --verbose --target bpfel-unknown-none -Z build-std=core --release); \
    cargo +nightly build --release; \
    objcopy --compress-debug-sections target/release/hello-axum ./hello-axum

# (omitted: other targets)
Couple notes here: we need nightly to build the BPF program anyway, so I've chosen to use it to build the main program too, just so we don't have to install two different toolchains. We need the rust-src component so we add that. This is definitely the wrong place to install bpf-linker (we want to do that before COPY . .), but you can figure that one out.
By now you should know how to forcefully remove the old machine and run a new one, so I won't show that part - instead I'll show some logs that provide IRREFUTABLE PROOF that it's working as intended:
$ fly logs (cut) 2022-06-20T10:32:59Z app[73d8d463ce7589] cdg [info]Idle for 55.013385687s 2022-06-20T10:33:04Z app[73d8d463ce7589] cdg [info]Idle for 60.014628076s 2022-06-20T10:33:04Z app[73d8d463ce7589] cdg [info]Stopping machine. Goodbye! 2022-06-20T10:35:09Z proxy[73d8d463ce7589] cdg [info]Machine started in 378.712588ms 2022-06-20T10:35:09Z app[73d8d463ce7589] cdg [info] * Starting OpenBSD Secure Shell server sshd 2022-06-20T10:35:09Z app[73d8d463ce7589] cdg [info] ...done. 2022-06-20T10:35:09Z app[73d8d463ce7589] cdg [info]Idle for 351ns 2022-06-20T10:35:09Z app[73d8d463ce7589] cdg [info]Waiting for Ctrl-C... 2022-06-20T10:35:09Z proxy[73d8d463ce7589] cdg [info]Machine became reachable in 162.022275ms 2022-06-20T10:35:09Z app[73d8d463ce7589] cdg [info]Connection accepted! 2022-06-20T10:35:24Z app[73d8d463ce7589] cdg [info]Connection closed! 2022-06-20T10:35:29Z app[73d8d463ce7589] cdg [info]Idle for 5.000919043s 2022-06-20T10:35:33Z app[73d8d463ce7589] cdg [info]Connection accepted! 2022-06-20T10:35:37Z app[73d8d463ce7589] cdg [info]Connection closed! 2022-06-20T10:35:39Z app[73d8d463ce7589] cdg [info]Idle for 5.001182727s 2022-06-20T10:35:44Z app[73d8d463ce7589] cdg [info]Idle for 10.00136033s 2022-06-20T10:35:49Z app[73d8d463ce7589] cdg [info]Idle for 15.002518752s 2022-06-20T10:35:54Z app[73d8d463ce7589] cdg [info]Idle for 20.003763466s 2022-06-20T10:35:59Z app[73d8d463ce7589] cdg [info]Idle for 25.00504594s 2022-06-20T10:36:04Z app[73d8d463ce7589] cdg [info]Idle for 30.006257662s 2022-06-20T10:36:09Z app[73d8d463ce7589] cdg [info]Idle for 35.007527924s 2022-06-20T10:36:14Z app[73d8d463ce7589] cdg [info]Idle for 40.008764092s 2022-06-20T10:36:19Z app[73d8d463ce7589] cdg [info]Idle for 45.010007093s 2022-06-20T10:36:24Z app[73d8d463ce7589] cdg [info]Idle for 50.011195821s 2022-06-20T10:36:29Z app[73d8d463ce7589] cdg [info]Idle for 55.011391438s 2022-06-20T10:36:34Z app[73d8d463ce7589] cdg [info]Idle for 60.01265106s 2022-06-20T10:36:34Z app[73d8d463ce7589] cdg [info]Stopping machine. Goodbye!
Ah... bliss.
Aren't you forgetting something?
Oh, right! How many syscalls are we actually doing? Since it's the Sole Measure of goodness in this wretched world?
To test this, I'm going to connect from VSCode, and run something that constantly outputs text, like... ok maybe not yes, but watch whoami.
And then from fly ssh console, we just run strace...
$ strace -ff -p $(pidof hello-axum) strace: Process 583 attached with 9 threads [pid 596] futex(0x7fb0baa0d618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 595] futex(0x7fb0bac11618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 594] epoll_wait(3, <unfinished ...> [pid 592] futex(0x7fb0bb21d618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 591] futex(0x7fb0bb41e618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 590] futex(0x7fb0bb622618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 589] futex(0x7fb0bb823618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 583] futex(0x7fb0bb825498, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 593] futex(0x7fb0bb019618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 594] <... epoll_wait resumed>[], 1024, 1718) = 0 [pid 594] epoll_wait(3, [], 1024, 30) = 0 [pid 594] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 594] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 1824) = 1 [pid 594] epoll_wait(3, [], 1024, 1824) = 0 [pid 594] epoll_wait(3, [], 1024, 3134) = 0 [pid 594] epoll_wait(3, [], 1024, 37) = 0 [pid 594] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 594] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 919) = 1 [pid 594] epoll_wait(3, [], 1024, 919) = 0 [pid 594] epoll_wait(3,
Huh. It's vewwy vewwy quiet. Let's try to disconnect?
[], 1024, 2775) = 0 [pid 594] epoll_wait(3, [{EPOLLIN, {u32=2, u64=2}}, {EPOLLIN, {u32=10, u64=10}}], 1024, 2173) = 2 [pid 594] futex(0x7fb0bb823618, FUTEX_WAKE_PRIVATE, 1) = 1 [pid 589] <... futex resumed>) = 0 [pid 594] write(1, "Connection closed!\n", 19 <unfinished ...> [pid 589] futex(0x7fb0bb019618, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> [pid 594] <... write resumed>) = 19 [pid 589] <... futex resumed>) = 1 [pid 593] <... futex resumed>) = 0 [pid 594] futex(0x7fb0bae15618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 593] epoll_wait(3, <unfinished ...> [pid 589] futex(0x7fb0bb823618, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 593] <... epoll_wait resumed>[], 1024, 1370) = 0 [pid 593] epoll_wait(3, [], 1024, 49) = 0 [pid 593] write(1, "Idle for 5.001369468s\n", 22) = 22 [pid 593] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 593] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 1869) = 1 [pid 593] epoll_wait(3,
And now reconnect?
[], 1024, 1869) = 0 [pid 593] epoll_wait(3, [], 1024, 3070) = 0 [pid 593] epoll_wait(3, [], 1024, 57) = 0 [pid 593] write(1, "Idle for 10.002899968s\n", 23) = 23 [pid 593] write(4, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 593] epoll_wait(3, [{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 964) = 1 [pid 593] epoll_wait(3, [], 1024, 964) = 0 [pid 593] epoll_wait(3, [{EPOLLIN, {u32=2, u64=2}}, {EPOLLIN, {u32=10, u64=10}}], 1024, 4031) = 2 [pid 593] futex(0x7fb0bb823618, FUTEX_WAKE_PRIVATE, 1) = 1 [pid 593] write(1, "Connection accepted!\n", 21) = 21 [pid 589] <... futex resumed>) = 0 [pid 593] epoll_wait(3, <unfinished ...> [pid 589] futex(0x7fb0bb823618, FUTEX_WAIT_PRIVATE, 1, NULL
Wonderful.
Say Amos? Isn't that completely overkill?
Oh, almost definitely. I don't think anybody needs to optimize their remote dev environment that much: we definitely could've lived with the additional overhead of manually copying bufferfuls from kernelspace to userspace and back.
But isn't it cool that we can?
Sure, sure. But I mean... using BPF for this?
Oh yeah that's hella overkill too. And again I'm just flexing at this point (teaching, I mean teaching).
Yeah I was gonna say... there's probably other ways to know what connections OpenSSH's process has anyway, right? Doesn't the kernel keep track of that?
It 100% absolutely does, and so there's a much simpler solution for all that, one we could've done with a bash script.
Ooooh are we writing a bash script?
Ohhhhh no. No no no. Not today.
Simply polling procfs
See, the kernel is kind enough to expose all kinds of information simply through procfs. So if you're willing to read a couple files, you can get this, for example:
root@73d8d463ce7589:/# cat "/proc/$(pidof -s sshd)/net/tcp"
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:08AE 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 10244 1 000000007d0c6713 100 0 0 10 0
   1: 0100007F:8019 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 7194 1 000000000b0ca858 100 0 0 10 0
   2: 1A0413AC:08AE 47851C93:E44A 01 00000000:00000000 02:000A7F5E 00000000     0        0 13366 2 00000000255b322e 21 4 30 10 50
   3: 0100007F:8019 0100007F:E76C 01 00000000:00000000 00:00000000 00000000  1000        0 7682 1 000000004d3b51d2 21 4 2 10 -1
   4: 0100007F:8019 0100007F:E76A 01 00000000:00000000 00:00000000 00000000  1000        0 7680 1 00000000eb58348a 20 4 30 10 -1
   5: 0100007F:E76C 0100007F:8019 01 00000000:00000000 00:00000000 00000000  1000        0 13405 1 00000000d63dc6ce 21 4 18 10 -1
   6: 0100007F:E76A 0100007F:8019 01 00000000:00000000 00:00000000 00000000  1000        0 13403 1 0000000057ff1c3b 20 4 10 10 -1
Mhhh, what are we looking at?
Well it says so! It's a list of TCP connections: there's the local address, the remote address, and some more information.
Okay, but those addresses look... I mean... they look a little like MAC addresses?
Oh no no, they're just hex. So for example, we have sshd listening on port 2222 now, and in hex, that's?
Oh, 0x8AE!
Exactly! And so that one is our active connection:
root@73d8d463ce7589:/# grep "08AE" "/proc/$(pidof -s sshd)/net/tcp"
   0: 00000000:08AE 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 10244 1 000000007d0c6713 100 0 0 10 0
   2: 1A0413AC:08AE 47851C93:E44A 01 00000000:00000000 02:000A50A1 00000000     0        0 13366 2 00000000255b322e 21 4 30 10 50
Wait, no, there's two of them.
Ah, maybe the first one is the listening socket?
Probably! That seems right since it has all-zero values for a bunch of fields.
In fact, let's try disconnecting...
root@73d8d463ce7589:/# grep "08AE" "/proc/$(pidof -s sshd)/net/tcp"
   0: 00000000:08AE 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 10244 1 000000007d0c6713 100 0 0 10 0
Yup! Sounds about right.
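By the way, if you'd rather decode those hex columns than squint at them: the address is a raw 32-bit value, so on a little-endian box the displayed hex is byte-reversed, while the port is plain hex. A tiny sketch, purely illustrative since the procfs crate is about to do all of this for us:

// decodes a `local_address` column like "0100007F:08AE" into (127.0.0.1, 2222)
fn decode(column: &str) -> (std::net::Ipv4Addr, u16) {
    let (ip_hex, port_hex) = column.split_once(':').unwrap();
    let ip = u32::from_str_radix(ip_hex, 16).unwrap();
    let port = u16::from_str_radix(port_hex, 16).unwrap();
    // `to_le_bytes` undoes the byte-swapped display we get on x86_64
    (std::net::Ipv4Addr::from(ip.to_le_bytes()), port)
}

fn main() {
    // prints: (127.0.0.1, 2222)
    println!("{:?}", decode("0100007F:08AE"));
}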
Okay, I see what you mean about the bash script now. Because it's just a file, we could like... grep for 08AE, exclude the 00000000 thingy, and then have a counter that keeps track of how long it's been without connections... and then exit if it's been a while...
Yes, and since bash is notoriously awful at both strings AND numbers (do not @ me), and time, and conditionals, everything in fact, and because there's a neat procfs crate, and because this is still my blog and I still make the rules here, I'm just gonna write Rust.
Let's gooooo:
$ cargo add procfs
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding procfs v0.12.0 to dependencies.
             Features:
             + chrono
             + flate2
             - backtrace
Our Cargo.toml becomes this, very lean, no need for tokio anymore:
# in `hello-axum/Cargo.toml`

[package]
name = "hello-axum"
version = "0.1.0"
edition = "2021"

[dependencies]
color-eyre = "0.6.1"
procfs = "0.12.0"
// in `hello-axum/src/main.rs`
use std::{
    process::{Command, Stdio},
    thread::sleep,
    time::{Duration, Instant},
};

use procfs::net::TcpState;

fn main() -> color_eyre::Result<()> {
    color_eyre::install()?;

    let status = Command::new("service")
        .arg("ssh")
        .arg("start")
        .stdin(Stdio::null())
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit())
        .status()?;
    assert!(status.success());

    let mut last_activity = Instant::now();

    loop {
        if count_conns()? > 0 {
            last_activity = Instant::now();
        } else {
            let idle_time = last_activity.elapsed();
            println!("Idle for {idle_time:?}");
            if idle_time > Duration::from_secs(60) {
                println!("Stopping machine. Goodbye!");
                std::process::exit(0)
            }
        }
        sleep(Duration::from_secs(5));
    }
}

fn count_conns() -> color_eyre::Result<usize> {
    Ok(procfs::net::tcp()?
        .into_iter()
        // don't count listen, only established
        .filter(|entry| matches!(entry.state, TcpState::Established))
        .filter(|entry| matches!(entry.local_address.port(), 2222))
        .count())
}
The build section of our Dockerfile is simple again, too:
# Build some code!
WORKDIR /app
COPY . .
RUN --mount=type=cache,target=/app/target \
    --mount=type=cache,target=/root/.cargo/registry \
    --mount=type=cache,target=/root/.cargo/git \
    --mount=type=cache,target=/root/.rustup \
    set -eux; \
    rustup install stable; \
    cargo build --release; \
    objcopy --compress-debug-sections target/release/hello-axum ./hello-axum
And just as before... it works:
$ fly logs 2022-06-20T11:06:04Z proxy[06e82219b74987] cdg [info]Machine started in 492.723163ms 2022-06-20T11:06:04Z app[06e82219b74987] cdg [info] * Starting OpenBSD Secure Shell server sshd 2022-06-20T11:06:04Z app[06e82219b74987] cdg [info] ...done. 2022-06-20T11:06:04Z app[06e82219b74987] cdg [info]Idle for 125.596µs 2022-06-20T11:06:04Z proxy[06e82219b74987] cdg [info]Machine became reachable in 81.028264ms 2022-06-20T11:06:14Z app[06e82219b74987] cdg [info]Idle for 5.000259697s 2022-06-20T11:06:19Z app[06e82219b74987] cdg [info]Idle for 10.001721387s 2022-06-20T11:06:24Z app[06e82219b74987] cdg [info]Idle for 15.002035967s 2022-06-20T11:06:29Z app[06e82219b74987] cdg [info]Idle for 20.002387236s 2022-06-20T11:06:34Z app[06e82219b74987] cdg [info]Idle for 25.002711273s 2022-06-20T11:06:39Z app[06e82219b74987] cdg [info]Idle for 30.003033687s 2022-06-20T11:06:44Z app[06e82219b74987] cdg [info]Idle for 35.003318902s 2022-06-20T11:06:49Z app[06e82219b74987] cdg [info]Idle for 40.003605699s 2022-06-20T11:06:54Z app[06e82219b74987] cdg [info]Idle for 45.003890203s 2022-06-20T11:06:59Z app[06e82219b74987] cdg [info]Idle for 50.004175949s 2022-06-20T11:07:04Z app[06e82219b74987] cdg [info]Idle for 55.004478496s 2022-06-20T11:07:09Z app[06e82219b74987] cdg [info]Idle for 60.004781203s 2022-06-20T11:07:09Z app[06e82219b74987] cdg [info]Stopping machine. Goodbye!
This is boring. Everything just works.
Right?? And yet I'm thrilled about it. I'm thrilled to know that, with Rust, I can do any of:
- blocking I/O with threads
- non-blocking I/O with tokio
- io-uring with tokio-uring
- eBPF with aya
- just read procfs
And they all, boringly enough, work just fine. And I feel good about leaving them in production and never touching them again.
To me, that is the dream.