Impromptu disaster recovery

Background

im-promp-tu (im-ˈpräm(p)-(ˌ)tü)

  1. made, done, or formed on or as if on the spur of the moment: improvised

  2. composed or uttered without previous preparation: extemporaneous

Merriam-Webster

On March 18th, 2025, I thought I would look into self-hosted project management solutions — something kanban-y, but.. better?

A screenshot of WeKan, the open-source Kanban. It looks.. from another age.

This one does not spark joy.

After discovering that Teamhood was awesome (and EU-based), but had a 3-seat minimum on their subscriptions, I resigned myself to reluctantly self-hosting something.

The teamhood kanban view. I really like their interface.

No mobile support though.

I’m familiar with self-hosting. Like everyone, I’ve been SSH-ing into VPSs for as long as they’ve been cheap, which is… pretty long now.

Before that, shared PHP hosts, and nowadays, I just run a small k3s cluster on a dedicated Hetzner server in Germany, with nodes around the world (also all on Hetzner — they have a southeast-asia region now!).

A list of available Hetzner locations as of March 2025. There's Nuremberg, Falkenstein, Helsinki, Singapore, Hillsborough, Oregon, and Ashburn, Virginia.

This is honestly all you need for most projects.

Hetzner Locations

On this cluster, we find a custom-made CDN that powers this website, along with a bunch of internal services, like a Forgejo instance.

My forgejo homepage, with the URL redacted. You can see some activity in a project called 'snugkit', 'snug', and 'fasterthanli.me'.

My instance is only available to logged in users, so I don't have issues with AI crawlers. Big up to everyone fighting the good fight out there.

After finding out that taiga has a couple of unofficial helm charts on ArtifactHub, I opened my infra repo (hosted on GitHub, not on Forgejo, to avoid cyclical dependencies) and started drafting manifests to deploy taiga.

I keep forgetting what Artifact Hub is named.

There are three Helm charts for Taiga, none are official and none have been touched in the last two years.

This doesn't mean they don't work — maybe the software is just very stable!

ArtifactHub search for taiga

How I deploy stuff on k3s

The most straightforward way to deploy something to Kubernetes is to use kubectl. This works just as well on k3s, the lightweight distribution of Kubernetes I use.

As you can see, my website is back up as I’m writing this:

~ k -n snug get pods
NAME                   READY   STATUS    RESTARTS   AGE
cub-7c59bdf6f7-86m69   1/1     Running   0          50m
cub-7c59bdf6f7-9z9hk   1/1     Running   0          50m
cub-7c59bdf6f7-csk89   1/1     Running   0          50m
cub-7c59bdf6f7-kvxbq   1/1     Running   0          50m
cub-7c59bdf6f7-nq44t   1/1     Running   0          50m
cub-7c59bdf6f7-th9sf   1/1     Running   0          50m
mom-5ccc54d65-26bmn    1/1     Running   0          50m
Amos

Real ones have kubectl aliased to k!

I don’t like that approach, because there’s a disconnect between the YAML manifests as text files on your disk or in a repository, and what actually exists as resources in your Kubernetes server.

What if you forget to run kubectl apply on one of the files? What if you run it in a different order and some resources get overwritten?
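kubectl does have a way to surface that kind of drift, though: kubectl diff compares what’s on disk with what’s live, and exits non-zero if they differ. A rough sketch, assuming your manifests apply cleanly with plain kubectl:

# compare every manifest on disk against the live cluster state;
# a non-zero exit code means "something drifted", which is easy to wire into a check
kubectl diff -f manifests/ --recursive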

Cool bear

Isn’t that where something like Spinnaker comes in?

It absolutely is. Once you’re done with continuous integration, making sure all the parts fit together, then you can do continuous delivery, which makes sure that you are deploying changes as quickly as possible in a safe way.

Amos

Speaking of CI, this is a fun watch:

This can involve rolling out changes to only a portion of nodes, monitoring some metrics to make sure they don’t move in the wrong direction, and deciding whether to roll out to the rest of the fleet or to roll back.

I have thus far resisted the temptation to resort to something like Spinnaker for my blog.

Instead, I use rsync!

#!/bin/bash -eux
ssh root@brat "touch /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip" &
rsync -av --delete ./manifests/ root@brat:/var/lib/rancher/k3s/server/manifests/custom/
ssh root@brat "journalctl -fxu k3s"

That’s my infra/deploy-manifests script. When the new files are written to that folder, the k3s reconciler receives notifications that files changed and applies those changes.

The state of the Kubernetes server and the state of my git repository can still drift if I’m not careful, but most of the time I remember and things are fine.

Cool bear

Why not resolve the problem by running that rsync from a CI job?

I would like to do it from Forgejo Actions, but then we have a chicken-and-egg problem: Forgejo itself is deployed with k3s…

Cool bear

…right, a circular dependency like that would be bad in case of outages.

Exactly. I could do something like GitHub Actions, but apart from the fact that I really don’t like GitHub Actions, I don’t feel comfortable giving it a private key that can log into all my production servers, even on a private repository.

Cool bear

Are you sure you’re not just finding excuses to avoid setting it up?

Cool bear

Isn’t it just laziness?

Mayhaps.

The k3s reconciler

Trouble is: the k3s reconciler is much, uhhh simpler than I anticipated.

See, my resources are actually spread across multiple YAML files:

infra on  main [$] via 🦀 v1.85.0
tree -ahC manifests
[ 512]  manifests
├── [ 345]  README.md
├── [ 160]  cert-manager
│   ├── [ 451]  300-cert-manager-chart.yaml
│   ├── [ 374]  900-cert-manager-issuer-letsencrypt-prod.yaml
│   └── [ 139]  README.md
├── [ 160]  cloudnative-pg
│   ├── [ 109]  000-cloudnative-pg-namespace.yaml
│   ├── [ 273]  100-cloudnative-pg-chart.yaml
│   └── [ 208]  README.md
├── [ 448]  forgejo
│   ├── [ 115]  000-forgejo-namespace.yaml
│   ├── [4.6K]  001-forgejo-config-secret.yaml
│   ├── [ 174]  002-forgejo-runner-secret.yaml
✂️

This is something k3s understands: on startup, it reads and processes all .yaml files, compares the whole set against existing resources in k8s, and then does whatever CRUD actions are needed.
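Each of those files shows up in the cluster as an Addon resource — that’s the k3s.cattle.io/v1 kind you’ll see in the logs further down — so you can at least ask the API server what the reconciler thinks it has applied (if memory serves, the plural is addons):

# every manifest k3s picked up gets a corresponding Addon object in kube-system
kubectl get addons.k3s.cattle.io -n kube-system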

Cool bear

Is there an intermediate “here’s the plan, do you want to go ahead” step like you’d have in a tool like OpenTofu, or a dry-run option like you’d have in a tool like Ansible?

Nope!

I’d gotten used to the reconciler’s limitations over time.

I would routinely comment out and uncomment resources to force re-creation, rather than an in-place modification, which often fails.

And of course, the only way to know if something failed is to look at its logs:

# journalctl -u k3s | grep -F 'custom/' | ccze -A | tail -4
Mar 18 15:45:57 brat k3s[52573]: I0318 15:45:57.545897 52573 event.go:389] "Event occurred" object="kube-system/410-snug-tenants-ingress" fieldPath="" kind="Addon" apiVersion="k3s.cattle.io/v1" type="Normal" reason="AppliedManifest" message="Applied manifest at \"/var/lib/rancher/k3s/server/manifests/custom/snug/410-snug-tenants-ingress.yaml\""
Mar 18 15:45:57 brat k3s[52573]: I0318 15:45:57.552720 52573 event.go:389] "Event occurred" object="kube-system/300-traefik-v3-chart" fieldPath="" kind="Addon" apiVersion="k3s.cattle.io/v1" type="Normal" reason="ApplyingManifest" message="Applying manifest at \"/var/lib/rancher/k3s/server/manifests/custom/traefik-v3/300-traefik-v3-chart.yaml\""
Mar 18 15:45:57 brat k3s[52573]: I0318 15:45:57.575145 52573 event.go:389] "Event occurred" object="kube-system/300-traefik-v3-chart" fieldPath="" kind="Addon" apiVersion="k3s.cattle.io/v1" type="Normal" reason="AppliedManifest" message="Applied manifest at \"/var/lib/rancher/k3s/server/manifests/custom/traefik-v3/300-traefik-v3-chart.yaml\""
Cool bear Cool Bear's hot tip

ccze is a log colorizer as old as the world itself, which is to say you can install it with apt install ccze on Debian 12 — it makes journalctl output almost bearable.

That makes this setup worse than running kubectl apply locally, because at least kubectl apply blocks until the apply is done and shows you errors.

~ k apply -f /etc/hosts
error: error validating "/etc/hosts": error validating data: invalid object to validate; if you choose to ignore these errors, turn validation off with --validate=false

After a while, I had gotten so good at predicting what the reconciler would do that I stopped checking the logs systematically.

I’d only SSH in if things didn’t magically start working within 30s or so.

Amos

Kubernetes controllers love to panic.

You have to go out of your way to properly validate data in Go, so some just.. don’t. It’s probably not as bad in widely used controllers, but this is a systemic issue in the Go ecosystem.

Until March 18, 2025, when things went very, very wrong.

The reformattening

I recently changed my code editor settings to indent all files with 4 spaces instead of 2. Not just Rust, but YAML too.

Upon opening existing manifests to refresh my memory, I noticed they were indented with two spaces, so I asked “Claude 3.5 Sonnet (Fast Edit)” to give me a command to reformat all of them.

Amos

The “(Fast Edit)” mode is, as far as I know, only available in the Zed code editor.

For my money, it’s the best code help from the current set of assistants because it has very limited smarts, but it is very, very fast, so I don’t actually have time to get out of flow state.

It gave me this command:

yq -i -P '.' manifests/**/*.yaml
Amos

I know, because it’s in my atuin history. atuin is awesome.

I installed yq, rightly assuming it’s jq but for YAML.

The command looked good to me on the surface, and because this is all happening in a git repository, I figured that if it didn’t work, I could just revert.

I ran it, and checked git status:

infra on  main [$] via 🦀 v1.85.0
yq -i -P '.' manifests/**/*.yaml

infra on  main [$!] via 🦀 v1.85.0
gwS
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   manifests/cert-manager/300-cert-manager-chart.yaml

no changes added to commit (use "git add" and/or "git commit -a")

I was disappointed to see that it had only worked on one file.

I briefly thought about adapting it to use a loop, but then I remembered I’d switched to the fish shell, and I don’t remember its looping syntax off the top of my head yet.

So I committed the work so far, remembered to run ./deploy-manifests, and then ran zed manifests/**/*.yaml to open them all as tabs — I only had maybe 15, I figured I’d just repeatedly ask it to format them.

While I was doing that, I happened to have the k3s logs tailing in a window on my second screen.

Because it started scrolling furiously, I took a closer look.

It was trying (and failing, repeatedly) to create… duplicates of resources that already existed?

After frowning at the logs for a bit I found that it mentioned 300-cert-manager-chart.yaml a lot, and… wait a minute, why is it 1.75MB?

Cool bear What did we learn?

We’ll talk more about the roles LLMs played in that story later.

Since this isn’t a blameless postmortem, I, cool bear, will be in charge of assigning blame.

I’m observing that:

  • Amos did not read the man page for yq before running the command.
  • Amos only looked at git status, not git diff — even a very quick look would have shown that something was seriously wrong.
  • Amos was under a lot of stress that day, didn’t have enough sleep, and probably should not have been touching infrastructure at all.

Further blame will be assigned later.

No thoughts head weighty

This command did not do what I thought it would:

yq -i -P '.' manifests/**/*.yaml

I’m fully bracing myself for a thousand replies saying we told you so blah blah blah — I know, I know. But let’s take the experiment to its logical conclusion and ask a bunch of LLMs to explain what’s wrong with this command.

GPT-4o

GPT-4o is the default “non-reasoning” model from OpenAI if you have a “Plus” subscription.

At the time of this writing, GPT-4.5 did technically come out, but I don’t know anybody who’s using it seriously. Many people doubt that non-reasoning models will get much smarter than GPT-4o.

Something OpenAI did recently-ish is adding sources to answers.

Amos

Explain what this does?

❯ yq -i -P ‘.’ manifests/**/*.yaml

This command uses yq, a command-line YAML processor, to modify YAML files in place with pretty-printing.

Breakdown:

  1. yq - The YAML processor (like jq for JSON but for YAML).
  2. -i - Edits the files in place (modifies the original files rather than printing output to stdout).
  3. -P - Pretty-prints the output (ensures consistent formatting with indentation and spacing).
  4. '.' - The yq expression. In this case, . represents the entire YAML structure, meaning it reads and rewrites the YAML file as-is without modifying the content itself.
  5. manifests/**/*.yaml - Targets all YAML files recursively inside the manifests/ directory and its subdirectories.

Effect:

  • It reformats all YAML files under manifests/ with consistent indentation and spacing.
  • If the YAML files were messy or had inconsistent formatting, they will now be neatly formatted.
  • No actual content change happens unless yq has an opinionated reformatting behavior.

This is useful for enforcing consistent YAML formatting across multiple files, especially in projects with many configuration files (e.g., Kubernetes manifests).

GPT-4o didn’t see the problem.

I had to insist.

Amos

well, it actually merges all the files into the first argument.

did you know that?

You’re absolutely right! The behavior of yq can be a bit surprising when using -i (in-place editing) with multiple files.

What’s Happening?

  • When yq -i -P '.' manifests/**/*.yaml is executed, yq processes all matching YAML files.
  • However, -i makes yq write back the result to a single file—specifically, the first file in the glob expansion (manifests/**/*.yaml).
  • Instead of formatting each file separately, all files get merged into the first file, effectively overwriting it with a merged version of all YAML contents.
Cool bear

Ah, so it does sort of “understand” what the problem is! I mean, not understand, but…

Well, it then went on to say this:

How to Fix It?

If the goal is to reformat each file separately rather than merging them, use:

yq -i -P '.' manifests/**/*

This ensures yq processes files individually, preventing unwanted merging.

I didn’t want to let it get away with that, so I pressed on:

Amos

…did you just give me the same command again?

it still has the same problem.

You’re absolutely right to call that out. I should have been more careful! The issue still persists because the core problem is how yq handles multiple files when using -i.

It eventually gave me two solutions that do work:

Solution using find + xargs

find manifests -type f -name '*.yaml' -print0 | xargs -0 -I{} yq -i -P '.' {}

Solution using a simple for loop

for file in manifests/**/*.yaml; do
  yq -i -P '.' "$file"
done
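If I were feeling extra careful — and clearly I should have been — I’d drop -i entirely on a first pass and just look at what yq wants to do. A sketch, assuming bash with globstar enabled:

# dry-run variant: print a diff per file instead of rewriting anything
shopt -s globstar
for f in manifests/**/*.yaml; do
  diff -u "$f" <(yq -P '.' "$f")
done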

Before we close the LLM parenthesis, let’s check what some other models would say.

Claude 3.7 Sonnet

This command uses the yq tool to process YAML files. Let me break it down:

yq -i -P '.' manifests/**/*.yaml does the following:

  • yq is a command-line tool for processing YAML files (similar to how jq processes JSON)

  • The flags:

    • -i stands for “in-place” - it modifies the files directly instead of printing to standard output
    • -P stands for “pretty-print” - it formats the YAML with proper indentation and structure
  • '.' is the filter expression, which means “the entire document” (essentially, process the whole document without changing its content)

  • manifests/**/*.yaml is a glob pattern that matches:

    • All files with .yaml extension
    • Inside any subdirectory (** means “any depth of directories”)
    • Under the manifests directory

In summary, this command reformats and pretty-prints all YAML files within the manifests directory and its subdirectories, updating the files directly with consistent formatting. This is commonly used in Kubernetes or other infrastructure-as-code projects to ensure YAML files are consistently formatted.

Then, when I point out it merges everything into the first file:

You’re absolutely right, and I apologize for my mistake. Thank you for the correction.

Indeed, the command yq -i -P '.' manifests/**/*.yaml does not just pretty-print each file individually - it merges the content of all matched YAML files into the first file in the list. This is a critical distinction!

What’s happening is:

  1. The shell expands the glob pattern manifests/**/*.yaml into a list of files
  2. yq takes this list and, because of the -i (in-place) flag, it merges the content of all those files into the first file in the list
  3. The remaining files are left untouched

This behavior can be very destructive if not intended, as it would combine the contents of potentially many YAML files into just one file, while leaving the others unchanged.

If someone wanted to pretty-print each file individually without merging, they would need to use a different approach, such as a loop:

find manifests -name "*.yaml" -exec yq -i -P '.' {} \;

Thank you for pointing this out - it’s an important distinction that could cause significant problems if misunderstood.

I like Claude’s solution better — there’s no need to go through xargs or worry about IFS at all.

Deepseek R1

The command you provided uses yq, a command-line YAML processor, to modify YAML files in place. Here’s a breakdown of what it does:

  1. yq: This is the command-line tool for processing YAML files. It is similar to jq but specifically designed for YAML.

  2. -i: This flag tells yq to edit the files in place. Without this flag, yq would output the modified YAML to the terminal without saving the changes to the original files.

  3. -P: This flag tells yq to pretty-print the output. It ensures that the YAML files are formatted in a readable way with proper indentation and structure.

  4. '.': This is the filter expression. In this case, it simply means “select the entire document.” It doesn’t modify the content but ensures the file is processed and rewritten with the specified formatting.

  5. manifests/**/*.yaml: This is the file path pattern. It tells yq to process all YAML files (*.yaml) in the manifests directory and its subdirectories (**).

What does this command do?

  • It processes all YAML files in the manifests directory and its subdirectories.
  • It ensures the YAML files are pretty-printed (formatted nicely).
  • It saves the changes back to the original files (in place).

Example

If you have a YAML file like this:

foo: bar
baz:
  - qux
  - quux

After running the command, the file will be reformatted (if necessary) to ensure proper indentation and structure, but the content will remain the same.

Use Case

This command is useful for cleaning up or standardizing the formatting of YAML files in a directory, especially when working with Kubernetes manifests or other YAML-based configurations.

Still oblivious.

After correction, it goes on at length:

Ah, you’re absolutely correct! I missed that subtle but important detail in my initial explanation. Let me clarify what actually happens when you run the command:

yq -i -P '.' manifests/**/*.yaml

What it actually does:

  1. manifests/**/*.yaml: This glob pattern matches all YAML files in the manifests directory and its subdirectories.
  2. yq: When multiple files are passed to yq, it merges them into a single YAML document by default.
  3. '.': The filter expression '.' means “select the entire document,” but since multiple files are passed, yq merges them into one.
  4. -i: The -i flag tells yq to write the result back to the first file in the list (in place).
  5. -P: The -P flag ensures the output is pretty-printed.

Key Behavior:

  • Merging: All the YAML files matched by manifests/**/*.yaml are merged into a single YAML document.
  • In-place editing: The merged result is written back to the first file in the list (overwriting its content).
  • The other files remain unchanged.

DeepSeek R1 thought it important enough to make a whole heading answering my question: did it know about this?

Did I know this?

Yes, I’m aware of this behavior, but I missed it in my initial explanation. Thank you for pointing it out! This merging behavior is a feature of yq when multiple files are passed, and it’s important to be cautious when using -i (in-place editing) to avoid accidentally overwriting files.

Use Case

This merging behavior can be useful when you want to combine multiple YAML files into one, such as consolidating Kubernetes manifests or configuration files. However, you should always double-check the results to ensure the merge is what you intended.

Let me know if you’d like further clarification! 😊

Mistral’s Le Chat

The command yq -i -P '.' manifests/**/*.yaml is used to process YAML files with the yq tool. Here’s a breakdown of what each part of the command does:

  1. yq: This is a command-line YAML processor. It allows you to parse, edit, and manipulate YAML files using a syntax similar to jq, which is used for JSON files.

  2. -i: This flag stands for “in-place.” It means that the changes made by the yq command will be applied directly to the original files, rather than outputting the results to the standard output or a new file.

  3. -P: This flag stands for “pretty-print.” It ensures that the output YAML is formatted in a readable way, with proper indentation and line breaks.

  4. '.': This is the expression that yq will evaluate. In this case, the dot (.) represents the entire YAML document. So, this expression selects the whole content of each YAML file.

  5. manifests/**/*.yaml: This is a glob pattern that matches all YAML files (*.yaml) in the manifests directory and its subdirectories. The ** part means that it will recursively search through all subdirectories.

Putting it all together, the command yq -i -P '.' manifests/**/*.yaml will:

  • Recursively find all YAML files in the manifests directory and its subdirectories.
  • Pretty-print the entire content of each YAML file.
  • Update each YAML file in place with the pretty-printed version.

This is useful for ensuring that all YAML files in a project have

Le Chat cites sources from the yq repository, the yq docs frontpage, the Tips, Tricks and Troubleshooting part of the yq docs, and a 2021 blog article called Mastering YAML Processing in Command Line.

But of course, it’s still wrong.

You are correct. The command yq -i -P '.' manifests/**/*.yaml will indeed merge all the YAML files matched by the glob pattern into the first file listed. This behavior is due to how yq processes multiple files when using the -i (in-place) flag.

Here’s a more accurate explanation:

  • yq -i -P '.' manifests/**/*.yaml:
    • yq: The command-line YAML processor.
    • -i: Update the files in place.
    • -P: Pretty-print the output.
    • '.': Select the entire content of each YAML file.
    • manifests/**/*.yaml: Match all YAML files in the manifests directory and its subdirectories.

When multiple files are specified with the -i flag, yq merges the content of all the files into the first file in the list. This means that the content of all YAML files matched by manifests/**/*.yaml will be merged into the first file in that list, and the result will be pretty-printed and saved in place.

This behavior can be useful for consolidating multiple YAML configuration files into a single file, but it’s important to be aware of this merging effect to avoid unintentionally overwriting data.

Thank you for pointing that out!

Short and to the point — why waste time apologizing, the damage is done.

All of the models are aligned enough to thank me, but of course, my corrections did not make a difference. Those models don’t evolve — maybe the next generation’s training set will include this article, and then we’ll know how to invoke yq.

The aftermath

This isn’t anything new. It’s fun that they all got it wrong, though, even o3-mini-high.

I haven’t tried GPT-4.5, Claude Sonnet 3.7 with the thinking knob turned up, or whatever the flavor of the day is, but I imagine they all get it wrong.

And you know what? I’m kinda siding with the LLMs on that one.

First off, I’m the idiot, right? I’m the one who pressed enter after skimming over the command line, going “ah, that might work”.

Say it with me:

A computer can never be held accountable, therefore a computer must never make a management decision.

Isn't that a mood.

Internal IBM training in 1979

And second, I actually think that’s poor design in yq.

I foresee forgetting about this quirk and falling into that trap again.

However, I also recognize it’s probably too late to do anything about it: this would be a breaking change. Oh well.

While I’m blaming tools instead of myself, let me keep on critiquing!

From bad to worse

When I saw that the reconciler was busy creating duplicate copies of everything and running into a lot of conflicts…

And I realized that all the duplicates came from this single file…

…I removed the duplicates.

Cool bear

OH BOY.

But what do you think the k3s reconciler thought of that?

Cool bear

Well, if it only reads one file… it would… see that a lot of resources are gone and get busy deleting?

Quite! Very fast, in fact.

Again, what caught my eye in the logs were some errors where it was failing to delete some namespaces.

Which got me thinking: why is it deleting namespaces? And before updown.io could notify me, I knew I was in for a fun day.

At this point, most of my Kubernetes resources were gone.

Deleting deployments deleted pods, which in turn stopped and deleted containers.

Certificates and their secrets were deleted as well, resulting in traefik serving its default certificate.

The services and ingress routes went too, so traefik started returning 404 for any domain and any route. Eventually, traefik itself was gone!

I looked for ways to convince k3s to stop deleting what little was left, and quickly decided it was probably not worth it.

I am prepared for disaster recovery, I have backups of everything, and I had deployments I was barely using anyway — I decided it was time for, uhh, an early spring cleaning.

I decided I would wipe the machine hosting the k3s server and start fresh — after all, how hard could it be?

No matter how prepared…

What follows is a quick account of putting everything back together and the things I learned along the way.

rsync or swim

The first thing I did was grab everything out of /var/lib/rancher/k3s with rsync, just for safety.

rsync -avz --progress \
  root@brat:/var/lib/rancher/k3s \
  ./var-lib-rancher-k3s

I quickly discovered two things:

  • 600K small files were hiding under agent/containerd
  • rsync is single-threaded

And rediscovered that, if you have a fast link, compression (at least whatever zlib implementation and setting rsync uses by default) is just holding you back.

I was impatient and eventually switched to:

rsync -av --progress \
  --exclude 'agent/containerd' \
  root@brat:/var/lib/rancher/k3s \
  ./var-lib-rancher-k3s

Which was done fairly quickly, despite being 30GB.

Amos

I’m really glad this happened at home, where I keep my comfort 2.5Gbps internet connection.

Debian 12 reinstall

After taking a quick look around the S3 console to make sure the backups I was counting on did, in fact, exist, I headed over to Hetzner Robot to enable:

The Hetzner robot rescue tab, letting you pick between Linux and Linux (old), a public key and a keyboard layout.

Believe it or not, I moved away from ed25519 keys because of a bug in some Go software.

The bad blood runs deep.

Hetzner robot

You then reboot, wait a bit, ssh back into it, run:

installimage

And then you see a nice ncurses TUI (terminal user interface)!

Hetzner Online GmBH - installimage. Your server will be installed now, this will take some minutes You can abort at any time with CTRL+C ... Reading configuration Loading image file variables etc.
Lowendbox Blog (because I forgot to take a screenshot myself)

…unless you’ve switched to Ghostty, in which case it just throws you directly into nano because ghostty has its own TERM value of xterm-ghostty and a lot of places don’t have the requisite terminfo. This is easily fixed by:

infocmp -x | ssh root@brat -- tic -x -

I briefly researched “is any of the RHEL derivatives a good idea for this?” and decided today was exciting enough already.

I have feelings about Debian package management, but really, the applications I ran are all containerized, so the host matters only somewhat.

Amos

In fact, Debian has a needlessly large attack surface and there are more suitable distros for that, but, shrug.

I like Hetzner’s installimage thingy. It’s small, it’ll set up RAID 1 for me, it’s okay in my book.

A new k3s cluster

Although it’s entirely possible to export an etcd snapshot to S3 and restore it later, I was going for a “fresh install”, so I decided not to do that.
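For reference, that path looks roughly like this — assuming the server runs with embedded etcd rather than the default SQLite datastore, and with a placeholder bucket name (double-check the exact flags against the k3s docs before trusting me):

# on the old server: push a snapshot to S3
k3s etcd-snapshot save --s3 --s3-bucket=my-snapshots --s3-region=eu-central-1

# on the new server: reset the cluster from a local copy of that snapshot
k3s server --cluster-reset --cluster-reset-restore-path=/path/to/snapshot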

My poor edge nodes were crying for mommy at this point, so I put them to sleep for the time being by running the very aptly named ansible playbook:

./ansible-playbook playbooks/are-you-sure/k3s-nuke.yaml -l cloud

I could have easily torn down all the VMs altogether and created them anew through OpenTofu, but I didn’t need to mess with that part, and besides, I recently:

…so I have seen enough HCL for the calendar year, methinks.

Setting up k3s on the leader node is as easy as:

./ansible-playbook playbooks/k3s.yaml -l brat

Just enough Ansible to be dangerous

Ansible is basically “Python reads YAML to run sudo on machines over SSH”.

RedHat maintains it. I think.

I also think nobody really likes it? But it’s been around long enough that people have made it do pretty much everything you would like to do.

In my case, that’s setting up k3s.
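If you’ve never touched Ansible, the mental model fits in an ad-hoc one-liner or two — these use built-in modules and the inventory file from my wrapper script further down:

# "can I even reach everything?"
ansible all -u root -i config/inventory.yaml -m ping

# "run this one task on the whole cluster" -- playbooks are just many of these, in order
ansible k3s_cluster -u root -i config/inventory.yaml -m apt -a "name=htop state=present"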

The playbook is defined as:

---
- hosts: k3s_cluster
  gather_facts: true
  roles:
    - role: k3s-prereqs
    - role: k3s-download
    - role: ssh

- hosts: k3s_leader
  roles:
    - role: k3s/leader

- hosts: k3s_node
  roles:
    - role: k3s/node

k3s_leader and k3s_node are groups in the inventory, which is generated automatically from OpenTofu state by a Rust script — hence the ./ansible-playbook wrapper:

# in `./ansible-playbook`
#!/bin/bash -eux
source ./config/ansible-env
ansible-playbook -u root -i ./config/inventory.yaml "$@"

Where ansible-env is:

./gen-inventory.rs
export ANSIBLE_CONFIG=$PWD/config/ansible.cfg

And gen-inventory.rs looks something like:

#!/usr/bin/env -S cargo +nightly -Zscript --quiet
---
[package]
edition = "2021"

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0", features = ["preserve_order"] }
indexmap = { version = "2.2.6", features = ["serde"] }
---

use serde::Deserialize;
use serde_json::Map;
use indexmap::IndexMap;
use std::net::{Ipv4Addr, Ipv6Addr};
use std::fs::File;

#[derive(Deserialize, Debug)]
struct TfState {
    outputs: Outputs,
}

#[derive(Deserialize, Debug)]
struct Outputs {
    all_servers: AllServers,
}

#[derive(Deserialize, Debug)]
struct AllServers {
    value: IndexMap<String, Server>,
}

#[derive(Deserialize, Debug)]
#[allow(unused)]
#[serde(deny_unknown_fields)]
struct Server {
    aws_region: String,
    ipv4: Ipv4Addr,
    ipv6: Ipv6Addr,
    location: Option<String>, // Since this is not present in all nodes, make it optional
    provider: String,
    node_type: String,
    ready: bool,
    latlong: (f64, f64),
}

fn main() {
    let tfstate: TfState = serde_json::from_reader(File::open("terraform.tfstate").unwrap()).unwrap();
    let all_servers = &tfstate.outputs.all_servers.value;

    let servers_of_type = |node_type: String| {
        all_servers.iter().filter(move |(_k, v)| v.node_type == node_type)
    };

    let mut children = serde_json::json!({
        "k3s_leader": { "children": { "dedicated": {} } },
        "k3s_node": { "children": { "cloud": {} } },
        "k3s_cluster": { "children": { "k3s_leader": {}, "k3s_node": {} } },
        "dedicated": {
            "children": (servers_of_type("dedicated".into()).map(|(name, _)| (name.clone(), Map::default())).collect::<IndexMap<_, _>>())
        },
        "cloud": {
            "children": (servers_of_type("cloud".into()).map(|(name, _)| (name.clone(), Map::default())).collect::<IndexMap<_, _>>())
        }
    });

    {
        // add individual nodes
        let children = children.as_object_mut().unwrap();
        for (k, v) in all_servers.iter() {
            children.insert(k.clone(), serde_json::json!({
                "hosts": {
                    format!("{}-node", k): {
                        "ansible_host": v.ipv4.to_string(),
                        "ipv6": v.ipv6.to_string(),
                        "region": v.aws_region,
                        "node_type": v.node_type,
                        "node_name": k,
                        "provider": v.provider,
                        "ansible_user": "root",
                        "ansible_become": false,
                        "latlong": v.latlong,
                        "ansible_python_interpreter": "/usr/bin/python3.11",
                    }
                }
            }));
        }
    }

    let inventory = serde_json::json!({
        "all": {
            "children": children
        }
    });

    let output = serde_json::to_string_pretty(&inventory).unwrap();
    let out_path = "config/inventory.yaml";
    std::fs::write(out_path, output).unwrap();
    println!("Wrote inventory to {out_path}")
}

This uses the unstable -Zscript cargo flag, and I’m honestly thrilled about it.

Cool bear

Weird syntax for a python script but okay.

I won’t bore you with the details of the k3s-leader and k3s-node roles (they are adapted from k3s-ansible, if memory serves), but I will point out, because this fact never ceases to amuse me, that Ansible has a whopping 22 levels of variable precedence.

CA troubles

After setting up both the leader and the nodes this way, the nodes kept complaining:

Mar 18 11:25:40 hawk k3s[174830]: time="2025-03-18T11:25:40Z" level=error msg="token CA hash does not match the Cluster CA certificate hash: de13... != d262..."

A k3s server has its own certificate authority — when it bootstraps, it generates a keypair for it, and, well, the new install had a different cert than the old.

…but where was the old cert’s CA hash / token specified anyway?

In an ansible variable somewhere. Apparently I didn’t have enough faith to figure out how to pull out /var/lib/rancher/k3s/server/node-token from the leader and feed it to the nodes, so I just copied it manually into an ansible variable.
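The no-Ansible version of that would have been something like this — K3S_URL/K3S_TOKEN is what the official install script understands, and the token value here is obviously a placeholder:

# grab the join token straight off the leader...
ssh root@brat "cat /var/lib/rancher/k3s/server/node-token"

# ...and feed it to an agent, e.g. via the install script
curl -sfL https://get.k3s.io | K3S_URL=https://brat:6443 K3S_TOKEN=<paste-token-here> sh -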

Amos

That variable is committed to the infra repo but encrypted via git-crypt, a compromise I’m happy with.

However, by mistake, the k3s-leader role also installed that file (the old value), so reading the node-token file fresh off the server didn’t help.

GPT-4o did though, letting me know that the structure of node-token was:

K10<token-ca-hash>::server:<random-token>

And that I just needed to replace what was between the K10 and the :: with the failed assertion value. Same way I write AUR packages and Homebrew formulas 😬.
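If I understand the token format right, that hash is just the SHA-256 of the cluster CA certificate, so you can sanity-check it against the file on the server:

# should print the same hex digest that sits between "K10" and "::" in node-token
ssh root@brat "sha256sum /var/lib/rancher/k3s/server/tls/server-ca.crt"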

Amos

I actually found out after the fact that GPT-4o explained that to me.

At the time, I figured it out by looking at the hashes — skipping over “K10” took some focus, but I did it. Please clap.

This took an embarrassingly long time, but eventually, I had the nodes connected to the central k3s server.

Bringing back the essentials

Traefik v3

After that, I took some time to bring services back, starting from the most essential one: traefik.

Cool bear

Wait, doesn’t-

Yes, k3s does come with Traefik, but they ship v2. I want non-experimental HTTP/3 support, so I disable their traefik version — that’s what that line in deploy-manifests did:

touch /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip
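There’s also a blessed way to do this, if memory serves: k3s can disable packaged components outright, either with --disable=traefik on the server command line, or through its config file — something like:

# alternative to the .skip file: tell k3s not to ship its bundled traefik at all
cat >>/etc/rancher/k3s/config.yaml <<'EOF'
disable:
  - traefik
EOF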

And I add this manifest to grab traefik v3:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: traefik
  namespace: traefik
spec:
  repo: https://traefik.github.io/charts
  chart: traefik
  version: 34.4.1
  valuesContent: |
    image:
      repository: "traefik"
      tag: "v3.3.4"
    deployment:
      kind: DaemonSet
    logs:
      general:
        level: "INFO"
    hostNetwork: true
    # ✂️

Cert-manager

cert-manager lets you provision TLS certificates through Let’s Encrypt (among others). Again, it’s just one HelmChart resource away:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cert-manager
  namespace: kube-system
spec:
  repo: https://charts.jetstack.io
  chart: cert-manager
  version: 1.17.1
  valuesContent: |
    crds:
      enabled: true
    config:
      apiVersion: controller.config.cert-manager.io/v1alpha1
      kind: ControllerConfiguration
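The chart only gets you the controller — the letsencrypt-prod ClusterIssuer referenced later lives in that 900-cert-manager-issuer-letsencrypt-prod.yaml file from the tree earlier. A minimal sketch of what such an issuer typically looks like (the email and solver class here are placeholders, not my actual config):

kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: certs@example.org
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
EOF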

Minio

Sometimes you want object storage but you don’t need S3’s durability guarantee.

My dedicated server comes with 2x512GB of SSD storage — it’d be a shame not to use it:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: minio
  namespace: minio
spec:
  repo: https://charts.bitnami.com/bitnami
  chart: minio
  version: 12.13.1
  valuesContent: |
    auth:
      existingSecret: minio
    nodeSelector:
      kubernetes.io/hostname: brat
    persistence:
      enabled: true
      size: 250Gi

k8up

k8up allows backing up folders using restic.

Again, there’s a helm chart we can use to enjoy someone else’s default configuration:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: k8up
  namespace: k8up
spec:
  repo: https://k8up-io.github.io/k8up
  chart: k8up
  version: 4.8.4
  valuesContent: |
    replicaCount: 1
    nodeSelector:
      kubernetes.io/hostname: brat
    k8up:
      enableLeaderElection: false
    metrics:
      serviceMonitor:
        enabled: true
        additionalLabels:
          release: kube-prometheus-stack
      prometheusRule:
        enabled: true
        additionalLabels:
          release: kube-prometheus-stack

Reflector

Cool bear

I’ve noticed you deploy various services to different namespaces, any reason?

Yes! I like things to be neatly separated, although my understanding is that at my scale it’s kinda ridiculous to bother.

Personally I like to be able to make a mess in one namespace, and delete it forcefully after I’m done, making sure that every resource in that namespace is gone for good.

Which causes one notable problem: you can’t read secrets from another namespace.

Hence, reflector:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: reflector
  namespace: kube-system
spec:
  repo: https://emberstack.github.io/helm-charts
  chart: reflector
  version: 9.0.322

Now, when defining secrets, you specify in which namespaces they should be “reflected”:

---
kind: Secret
apiVersion: v1
metadata:
  name: s3-credentials
  namespace: default
  annotations:
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "umami,forgejo,k8up"
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "umami,forgejo,k8up"
type: Opaque
stringData:
  ACCESS_KEY: REDACTED
  SECRET_KEY: REDACTED
A screenshot of the k9s TUI tool. It shows 4 instances of the s3-credentials secret, in namespaces default, forgejo, k8up, and umami. The secret is of type opaque and it has 2 data.

No, seriously, k9s is really cool.

k9s
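If you’d rather not reach for a TUI, plain kubectl can pull up the same picture — you can’t get a namespaced resource by name across all namespaces, but a field selector does the trick:

# list every reflected copy of the secret, across namespaces
kubectl get secrets --all-namespaces --field-selector metadata.name=s3-credentials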

CloudNativePG

Running a database in Kubernetes is annoying because… databases are stateful. Very much so. Of all the Postgres controllers I’ve tried, cnpg is the one that’s gotten the least in my way.

It has, of course, a helm chart, and I want the controller to run on my dedicated node (brat, if you hadn’t picked up on that by now):

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cnpg
  namespace: cnpg
spec:
  repo: https://cloudnative-pg.io/charts/
  chart: cloudnative-pg
  version: 0.23.2
  valuesContent: |
    nodeSelector:
      kubernetes.io/hostname: brat

Bringing forgejo back

Forgejo is annoying, because it wants object storage (which lives in actual Amazon S3, since I want it to be persistent), and it wants local storage, for git repositories and whatnot.

Postgres

First off, I needed to restore the Postgres database from backup — easy peasy:

--- kind: Cluster apiVersion: postgresql.cnpg.io/v1 metadata: name: forgejo-db namespace: forgejo labels: cnpg.io/reload: "true" spec: instances: 1 imageName: ghcr.io/cloudnative-pg/postgresql:16 # Specify PostgreSQL 16 image primaryUpdateStrategy: unsupervised affinity: nodeSelector: kubernetes.io/hostname: brat storage: size: 10Gi pvcTemplate: metadata: annotations: # do not back up the db volume with k8up (we back it up # with barman, see below) k8up.io/backup: "false" nodeSelector: kubernetes.io/hostname: brat bootstrap: # we're doing disaster recovery aw yiss recovery: source: cluster-backup externalClusters: - name: cluster-backup barmanObjectStore: destinationPath: "s3://bearcove-cnpg-backups/forgejo/brat-1/" serverName: forgejo-db wal: compression: snappy data: compression: snappy s3Credentials: accessKeyId: name: s3-credentials key: ACCESS_KEY secretAccessKey: name: s3-credentials key: SECRET_KEY backup: barmanObjectStore: # pro-tip: this needs to be a different path, otherwise it's very unhappy at you. destinationPath: "s3://bearcove-cnpg-backups/forgejo/brat-2025-03-18/" wal: compression: snappy data: compression: snappy s3Credentials: accessKeyId: name: s3-credentials key: ACCESS_KEY secretAccessKey: name: s3-credentials key: SECRET_KEY retentionPolicy: "30d"
Amos

I too recoil in horror when looking at this amount of YAML, but… it does the trick!

Amos

It’s also really annoying to read the cnpg 1.25 docs — I’m sure MkDocs is much better than nothing, I just think we needn’t stop there.

Anyway — this queues up a restore job (you can view those in k9s with :jobs), and you can keep an eye on its logs from k9s as well. Eventually, the data’s back.

Persistent volume restore

This one, believe it or not, was a lot more trouble?

Kubernetes really wants you to think of storage, and compute, and memory, as fungible resources. Say how much you want, not where.

Well, you can say where, but it’s annoying.

For example, k8s provides a local-path storage class:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: forgejo
  namespace: forgejo
  annotations:
    k8up.io/backup: "true"
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 20Gi

And what it does is provision a volume under /var/lib/rancher/k3s/storage — as a regular folder.

root@brat /var/lib/rancher/k3s/storage
# ls -lhA
total 20K
drwxrwsrwx 3 root 1001 4.0K Mar 18 14:08 pvc-02c60622-d5c9-4dc5-8ea8-22a51eee0b83_minio_minio
drwxrwsrwx 3 root tape 4.0K Mar 28 11:33 pvc-1d42f27a-b955-49ec-9f10-42ed222602f4_umami_umami-db-pg17-1
drwxrwsrwx 3 root tape 4.0K Mar 18 14:32 pvc-2a1953cf-8867-4980-8b59-56ff2ed6411c_forgejo_forgejo-db-1
drwxrwsrwx 4 root amos 4.0K Mar 18 15:31 pvc-a79bde12-25c0-40b9-98c3-ab16a6d12afa_forgejo_forgejo
drwxrwsrwx 3 root tape 4.0K Mar 28 11:49 pvc-bfa94fbd-4105-4860-aa19-06b95d7dd573_forgejo_forgejo-db-pg17-1

This is “odd” merely because the common case is to do “managed” k8s — you pay someone to run your cluster. You ask for compute and they bill you very precisely for what you use. And they have volumes, of course — so when you ask for a volume, it’s created using their volume provider.

But when you run k3s on a bunch of VMs, you don’t have that. And you can’t attach, for example, a DigitalOcean volume to a Hetzner VM — doesn’t work.

So either you roll with something like longhorn or ceph (probably via rook), for which you really honestly want at least 3 (preferably 5) dedicated servers that have similar specs, in the same datacenter, etc. — or you just use local storage because, well, it suits your use case perfectly.

Problem is — the persistent volume claim doesn’t get “provisioned” (not sure what the proper term here is) until it’s actually needed by something — like a pod.

And if the first pod that needs it is a k8up restore pod (which runs restic), and that restore pod just happens to run on one of your small edge nodes rather than your big and round dedicated server, then…

Cool bear

Oh gosh — the volume ends up in the wrong place?

Correctamundo! Now you see why I’m used to deleting and re-creating stuff.
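The mechanism, as far as I can tell: the local-path storage class uses WaitForFirstConsumer volume binding, so the volume only materializes on whichever node the first consuming pod lands on. Easy to check:

kubectl get storageclass local-path -o jsonpath='{.volumeBindingMode}{"\n"}'
# => WaitForFirstConsumer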

It’s also why, when I started redeploying everything, I split things into files prefixed with three digits, to know in which order things should happen:

infra/manifests/forgejo on  main [$]
l
Permissions Size User Date Modified Name
.rw-r--r--@  115 amos 18 Mar 18:27  000-forgejo-namespace.yaml
.rw-r--r--@ 4.8k amos 28 Mar 13:27  001-forgejo-config-secret.yaml
.rw-r--r--@ 4.2k amos 28 Mar 16:40  100-forgejo-db-cluster.yaml
.rw-r--r--@  643 amos 28 Mar 13:27  101-forgejo-db-backups.yaml
.rw-r--r--@  927 amos 18 Mar 18:27  200-forgejo-persistent-volumes.yaml
.rw-r--r--@  161 amos 18 Mar 18:27  201-forgejo-backup-secrets.yaml
.rw-r--r--@ 1.4k amos 18 Mar 18:27  202-forgejo-backups.yaml
.rw-r--r--@ 1.2k amos 18 Mar 18:27  203-forgejo-backup-schedule.yaml
.rw-r--r--@ 1.5k amos 23 Mar 16:57  300-forgejo-deployment.yaml
.rw-r--r--@ 1.2k amos 23 Mar 16:57  400-forgejo-ingress.yaml
.rw-r--r--    33 amos 18 Mar 17:14  README.md

k3s doesn’t actually interpret that ordering — I simply staged them under old-manifests/ and manually copied them into manifests/, applying them one by one.

Cool bear

Okay, but — what about the volume on the wrong node?

Right, well! We just force the volume to be provisioned where we want, by creating a dummy pod!

---
## This forces the volume to be created on node 'brat'
apiVersion: v1
kind: Pod
metadata:
  name: forgejo-ls2
  namespace: forgejo
spec:
  restartPolicy: Never
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
  containers:
    - name: ls-container
      image: busybox
      command: ["ls", "-lhA", "/workdir"]
      volumeMounts:
        - name: forgejo-workdir
          mountPath: /workdir
  nodeSelector:
    kubernetes.io/hostname: brat
  volumes:
    - name: forgejo-workdir
      persistentVolumeClaim:
        claimName: forgejo

Is this the stupidest shit ever? Probably. Does it work? Hell yeah.
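To double-check where a local-path volume actually ended up, the node name hides in the PV’s nodeAffinity — something like:

kubectl get pv -o custom-columns='PV:.metadata.name,CLAIM:.spec.claimRef.name,NODE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]'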

As for actually restoring from k8up, that was hard too.

For some reason, I use a rootless image of forgejo, and that means that, well, inside the container, we’re not root:

~ kubectl exec forgejo-f9dd988c4-9svkx -n forgejo -it -- /bin/bash
forgejo-f9dd988c4-9svkx:/var/lib/gitea$ whoami
git
forgejo-f9dd988c4-9svkx:/var/lib/gitea$ id
uid=1000(git) gid=1000(git) groups=1000(git)
forgejo-f9dd988c4-9svkx:/var/lib/gitea$ exit

We’re user git, with UID 1000.

So, if we do a normal k8up restore, we’ll end up with a bunch of files owned by root, which git won’t be able to read, and that’s no good!

Now, you are able to specify a podSecurityContext so that the restore pod (the one that runs restic) also runs as the same user:

apiVersion: k8up.io/v1
kind: Restore
# ✂️
spec:
  podSecurityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    fsGroupChangePolicy: "OnRootMismatch"
  # ✂️

But then — amazingly — it fails to write to its own cache dir!!!

The way restic works, from what I can gather, is that it starts by making a whole bunch of S3 GET commands to figure out which backups were made, what they contain, etc.

It stores all those in its local cache, on disk, because it refers back to that data a lot during the restore process.

If the cache isn’t writable, it still works, it’s just… very, very, VERY slow.

Cool bear

And I’m assuming, costly?

I think I would’ve died of old age first. Anyway, if you set RESTIC_CACHE_DIR to a path you can write to as a non-root user, then it works fine. Oh, also if you give it a bunch more CPU and memory than the default:

--- ############################################################################## # k8up backup/restore setup ############################################################################## apiVersion: v1 kind: ConfigMap metadata: name: restic-vars namespace: forgejo data: RESTIC_CACHE_DIR: /tmp/restic-cache --- apiVersion: k8up.io/v1 kind: Restore metadata: name: restore-workdir-2024-03-18-b namespace: forgejo spec: snapshot: e9e9a75d # last `/data/forgejo` backup podSecurityContext: runAsUser: 1000 runAsGroup: 1000 fsGroup: 1000 fsGroupChangePolicy: "OnRootMismatch" restoreMethod: folder: # restore to PVC forgejo claimName: forgejo backend: repoPasswordSecretRef: name: backup-repo key: password resources: requests: cpu: 10 memory: 1Gi limits: cpu: 10 memory: 4Gi envFrom: - configMapRef: name: restic-vars s3: bucket: bearcove-k8up-backups endpoint: https://s3.eu-central-1.amazonaws.com accessKeyIDSecretRef: name: s3-credentials key: ACCESS_KEY secretAccessKeySecretRef: name: s3-credentials key: SECRET_KEY

Deployment, service, ingress

I’m not going to go into details for all these, because we’ve spent our quota of YAML for the year and it’s only March.

I guess I can show one combo for forgejo, since it’s not that complicated.

---
kind: Service
apiVersion: v1
metadata:
  name: forgejo
  namespace: forgejo
spec:
  selector:
    app: forgejo
  ipFamilyPolicy: RequireDualStack
  ports:
    - protocol: TCP
      port: 80
      targetPort: http

This is the service: it makes it so anything in the cluster can connect to forgejo.forgejo.svc.cluster.local:80.
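A quick way to sanity-check that from inside the cluster is a throwaway curl pod — something along these lines:

kubectl -n forgejo run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -- curl -sI http://forgejo.forgejo.svc.cluster.local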

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: tls
  namespace: forgejo
spec:
  secretName: tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames: [redacted.example.org]

This requests a TLS certificate for my forgejo instance’s domain.

---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: request-body-limit
  namespace: forgejo
spec:
  buffering:
    maxRequestBodyBytes: 1073741824 # 1 GiB
    memRequestBodyBytes: 67108864 # 64 MiB

This raises the max POST limit, since apparently that’s an issue when uploading “generic” packages to a forgejo registry 🤷

---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: forgejo
  namespace: forgejo
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`redacted.example.org`)
      kind: Rule
      services:
        - name: forgejo
          port: 80
      middlewares:
        - name: request-body-limit
          namespace: forgejo
  tls:
    secretName: tls-secret

And this is the ingress route!

Cool bear

Mhh this feels a little verbose, and a little ingress-specific, even for k8s?

Yeah — you can do shorter with labels (but I could never figure it out), and most people actually use nginx for ingress, but of C and Go, I’ll take the lesser evil.

Bringing home back

The software that powers my website, currently named home, is also deployed through k3s.

It doesn’t have a persistent database: assets (images, videos, etc.) are in object storage, there’s a central “mom” service that has one sqlite database per tenant — that database remembers which assets were already uploaded to object storage, which revision is the latest, and the list of sponsors fetched from GitHub/Patreon.

That’s about it:

################################################################################ # 🐻 MOM DEPLOYMENT ################################################################################ --- kind: Deployment apiVersion: apps/v1 metadata: name: mom namespace: home labels: group: home app: mom spec: replicas: 1 selector: matchLabels: app: mom template: metadata: labels: group: home app: mom spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: node-type operator: In values: - dedicated imagePullSecrets: - name: forgejo-docker-pull-secrets containers: - name: mom image: redacted.example.org/bearcove/home:32.2.4 command: ["home", "mom"] workingDir: /var/lib/home envFrom: - secretRef: name: home-vars - secretRef: name: home-conf env: - name: RUST_LOG value: "info" - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName ports: - containerPort: 1118 name: http readinessProbe: httpGet: path: /health port: http initialDelaySeconds: 1 periodSeconds: 1 resources: requests: memory: "400Mi" cpu: "0.5" limits: memory: "16000Mi" cpu: "20" volumeMounts: - name: mom mountPath: /var/lib/home - name: metadata mountPath: /metadata readOnly: true volumes: - name: mom persistentVolumeClaim: claimName: mom - name: metadata hostPath: path: /metadata type: DirectoryOrCreate

As for edge nodes, they’re named “cubs”, and the deployment manifest is fun — it deploys one pod per edge node, while allowing for some pods to be down, and, while that’s the case, routing to the closest node.

################################################################################ # 🧸 CUB DEPLOYMENT ################################################################################ --- kind: Deployment apiVersion: apps/v1 metadata: name: cub namespace: home labels: group: home app: cub spec: replicas: 6 # 1 dedicated + 5 edge nodes topologySpreadConstraints: # Define topology spread constraints for the deployment - maxSkew: 1 # defines the maximum skew between the number of pods in different topology domains # Use zone as the topology key topologyKey: topology.kubernetes.io/zone # Allow scheduling even if constraints are not met whenUnsatisfiable: ScheduleAnyway # Specify the label selector for the pods labelSelector: matchLabels: # Match pods with the label app: cub app: cub affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: node-type operator: In values: - cloud topologyKey: "kubernetes.io/hostname" selector: matchLabels: app: cub template: metadata: labels: app: cub group: home spec: imagePullSecrets: - name: forgejo-docker-pull-secrets containers: - name: cub image: redacted.example.org/bearcove/home:32.2.4 command: ["home", "serve"] envFrom: - secretRef: name: home-vars - secretRef: name: home-conf env: - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName ports: - containerPort: 1111 name: http # define readiness probe: must serve HTTP on port 1111 readinessProbe: httpGet: path: / port: http httpHeaders: - name: x-forwarded-host value: fasterthanli.me initialDelaySeconds: 1 periodSeconds: 1 resources: requests: memory: "400Mi" cpu: "0.5" limits: memory: "1200Mi" cpu: "12" volumeMounts: - name: cub mountPath: /var/lib/home - name: metadata mountPath: /metadata readOnly: true volumes: - name: cub persistentVolumeClaim: claimName: cub - name: metadata hostPath: path: /metadata type: DirectoryOrCreate

Zero-downtime deploys

This is nice, because it allows zero-downtime deploys.

When everything is fine, here’s what the home namespace looks like:

k9s screenshot, showing cub with 6/6 pods ready, and mom with 1/1 pods ready.

All the cub pods are healthy:

6 pods are listed now, they each have 1/1 containers ready, they have status running, 0 restarts, use between 30 and 49 CPU and between 360 and 522 Mem. They have internal IPs and are running on nodes named marl, flam, brat, kaya, heim, hawk.
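The same picture is available from plain kubectl, which is handy to keep running during a rollout:

# watch pods churn as a new version rolls out, then confirm the rollout finished
kubectl -n home get pods -l app=cub -w
kubectl -n home rollout status deployment/cub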

Now let’s say I cause trouble on purpose, by pushing a change that crashes home:

home on  main via 🦀 v1.85.1
g show
commit 984d0f5b7fa5cff9172ea1bc16091b1fbc6bed6a (HEAD -> main, origin/main, origin/HEAD)
Author: Amos Wenger <amos@bearcove.eu>
Date:   Fri Mar 28 17:35:46 2025 +0100

    crash on purpose

diff --git a/crates/home/src/main.rs b/crates/home/src/main.rs
index 119e3e91..959fa414 100644
--- a/crates/home/src/main.rs
+++ b/crates/home/src/main.rs
@@ -30,6 +30,8 @@ async fn real_main() -> eyre::Result<()> {
     errhandling::load().install();
     tracingsub::load().install();

+    panic!("woopsie doopsie I'm doing a demo");
+
     let args = clap::load().parse();

     let res = match args.sub {

Ah, uh, no:

forgejo actions screenshot, showing that my CI pipeline is running --version on the binary, so it failed the job.

Okay, FINE, let’s push a change that… only crashes on serve, and only if we’re in production.

home on  main [!] via 🦀 v1.85.1
gwd

crates/mod-cub/src/lib.rs --- Rust
 8  #[dylo::export]
 9  impl Mod for ModImpl {
10      fn serve(&self, config: Config) -> BoxFuture<'static, Result<()>> {
11          Box::pin(async {
12              if std::env::var("KUBERNETES_SERVICE_HOST").is_ok() {
13                  panic!("is this a container? LET ME OUT! LEMMEOUT");
14              }
15
16              impls::serve(config)
17                  .await
18                  .map_err(|e| noteyre::eyre!("{}", e))

This time the pipelines pass, and I get a chance to show some of the automation I’ve been doing recently:

home on  main via 🦀 v1.85.1
bd bump
Fetched all tags from remote.
Latest tag: v32.2.4
Choose version bump type:
1. Patch (32.2.5)
2. Minor (32.3.0)
3. Major (33.0.0)
1
Creating new tag: v32.2.5
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
To https://redacted.example.org/bearcove/home
 * [new tag]         v32.2.5 -> v32.2.5
Tag v32.2.5 created and pushed successfully
Cool bear

bd stands for beardist, by the way!

A screenshot of beardist running in CI, showing various bits of info about the build system, environment, etc. You can see that the trigger-formula-update job was run this time around.

Those -build jobs pushed a generic package:

The generic package page for home, which contains curl -OJ instructions generated by forgejo. You can see it's built for x86_64-unknown-linux-gnu and for aarch64-apple-darwin

…and some Homebrew tap was updated automatically (that’s what the trigger-formula-update job was about).

Along with a container image:

The home container package, containing `docker pull` instructions, a sha256 digest. it's for the linux/amd64 platform and is 427 Mib total.

That one isn’t deployed automatically — I like to be the one to give the green light, also via beardist.

The k8s subcommand first identifies where the given image is referenced:

infra on  main [$] via 🦀 v1.85.0
bd k8s bearcove/home

Searching for manifests in: manifests
YAML files containing 'bearcove/home' are:

File: manifests/home/301-home-cub-deployment.yaml
Version 32.2.4 at positions 1761 to 1808
Context:
>>>        - name: cub
>>>          image: redacted.example.org/bearcove/home:32.2.4
>>>          command: ["home", "serve"]
>>>          envFrom:
>>>            - secretRef:

File: manifests/home/300-home-mom-deployment.yaml
Version 32.2.4 at positions 1074 to 1121
Context:
>>>        - name: mom
>>>          image: redacted.example.org/bearcove/home:32.2.4
>>>          command: ["home", "mom"]
>>>          workingDir: /var/lib/home
>>>          envFrom:
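There's no magic to that search, by the way; a plain grep over the manifests directory finds the same spots:

# Find every manifest that pins a bearcove/home image
grep -rn --include='*.yaml' 'bearcove/home' manifests/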

And then, it repeatedly polls the forgejo instance, waiting for a ‘different’ version to pop up:

Initializing Forgejo client...
Checking for new versions...
Fetching latest version for package 'home' from 'https://redacted.example.org/api/v1/packages/bearcove'
Request completed in 385ms with status 200 OK
Received 13 packages in response
Filtered to 12 matching packages
Found 11 valid versions
Latest version found: 32.2.5
New version detected: 32.2.5
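The polling loop hits the same Forgejo packages endpoint you can see in that log. Here's a rough curl + jq sketch of the idea; the FORGEJO_TOKEN variable, the query parameters, and the jq filter are my assumptions, not beardist's actual code:

# Poll the package list until a version newer than the one in the manifests appears
current="32.2.4"
while true; do
  latest=$(curl -fsS -H "Authorization: token $FORGEJO_TOKEN" \
      "https://redacted.example.org/api/v1/packages/bearcove?type=generic&q=home" \
    | jq -r '.[] | select(.name == "home") | .version' \
    | sort -V | tail -n 1)
  if [ -n "$latest" ] && [ "$latest" != "$current" ]; then
    echo "New version detected: $latest"
    break
  fi
  sleep 10
done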

At which point, it updates the manifests:

Updating manifests...
Updated manifests/home/301-home-cub-deployment.yaml
Updated manifests/home/300-home-mom-deployment.yaml
Staging changes...
Showing staged changes:

manifests/home/300-home-mom-deployment.yaml --- YAML
33     imagePullSecrets:
34       - name: forgejo-docker-pull-secrets
35     containers:
36       - name: mom
37         image: redacted.example.org/bearcove/home:32.2.4
37         image: redacted.example.org/bearcove/home:32.2.5
38         command: ["home", "mom"]
39         workingDir: /var/lib/home
40         envFrom:
41           - secretRef:

manifests/home/301-home-cub-deployment.yaml --- YAML
48     imagePullSecrets:
49       - name: forgejo-docker-pull-secrets
50     containers:
51       - name: cub
52         image: redacted.example.org/bearcove/home:32.2.4
52         image: redacted.example.org/bearcove/home:32.2.5
53         command: ["home", "serve"]
54         envFrom:
55           - secretRef:
56               name: home-vars

Commits and pushes:

Committing changes...

> bearcove-infra@1.0.0 lint-staged
> lint-staged

No staged files match any configured task.
[main d37d8eb] bump bearcove/home to 32.2.5
 2 files changed, 2 insertions(+), 2 deletions(-)
Pushing changes...
Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.17 KiB | 1.17 MiB/s, done.
Total 6 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 4 local objects.
To https://github.com/bearcove/infra.git
   89bddc6..d37d8eb  main -> main

And calls ./deploy-manifests, which learned a couple new tricks since last time:

Deploying manifests...
🔍 Performing dry run...
Source: ./manifests/
Destination: root@brat.bearcove.cloud:/var/lib/rancher/k3s/server/manifests/custom/
==================================================
🚨 REVIEW THIS CAREFULLY 🚨
==================================================
The following changes will be made:
==================================================
<fc.T.... home/300-home-mom-deployment.yaml
<fc.T.... home/301-home-cub-deployment.yaml
==================================================
Please review the above changes carefully before proceeding.
==================================================

🚨 Warning: This will perform the changes above.
Are you sure you want to continue? (y/n) y
🔧 Performing operations...
📁 Creating skip file: /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip
🔄 Syncing files...
📜 Viewing logs...
✂️
building file list ...
60 files to consider
home/300-home-mom-deployment.yaml
        2.92K 100%    2.12MB/s    0:00:00 (xfer#1, to-check=29/60)
home/301-home-cub-deployment.yaml
        3.71K 100%    3.53MB/s    0:00:00 (xfer#2, to-check=28/60)

sent 4.41K bytes  received 130 bytes  3.02K bytes/sec
total size is 1.07M  speedup is 236.91

By “learned a couple new tricks”, I mostly mean “I figured out the right set of flags to tell rsync to show what it’s actually going to do”:

# in `deploy-manifests`
#!/bin/bash
set -euo pipefail

SRC="./manifests/"
DST="root@brat.bearcove.cloud:/var/lib/rancher/k3s/server/manifests/custom/"

# Define rsync flags
RSYNC_FLAGS=(--recursive --delete --checksum --human-readable --progress --include='*/' --include='*.yaml' --exclude='*')

# Perform a dry run of rsync to show what would happen
echo "🔍 Performing dry run..."
printf "Source: \033[33m%s\033[0m\n" "$SRC"
printf "Destination: \033[33m%s\033[0m\n" "$DST"
printf "\033[2m==================================================\033[0m\n"
printf "\033[2m🚨 REVIEW THIS CAREFULLY 🚨\033[0m\n"
printf "\033[2m==================================================\033[0m\n"
printf "\033[2mThe following changes will be made:\033[0m\n"
printf "\033[2m==================================================\033[0m\n"
rsync "${RSYNC_FLAGS[@]}" --dry-run --itemize-changes "$SRC" "$DST"
printf "\033[2m==================================================\033[0m\n"
printf "\033[2mPlease review the above changes carefully before proceeding.\033[0m\n"
printf "\033[2m==================================================\033[0m\n"

# Ask for consent before continuing
printf "\n🚨 \033[1;31mWarning:\033[0m This will perform the changes above.\n"
read -p "Are you sure you want to continue? (y/n) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]
then
    printf "❌ \033[1;31mOperation cancelled.\033[0m\n"
    exit 1
fi

# The actual operations
echo "🔧 Performing operations..."
printf "📁 Creating skip file: \033[33m/var/lib/rancher/k3s/server/manifests/traefik.yaml.skip\033[0m\n"
ssh root@brat "touch /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip" &

echo "🔄 Syncing files..."
rsync "${RSYNC_FLAGS[@]}" "$SRC" "$DST" &

echo "📜 Viewing logs..."
ssh -t root@brat "journalctl -fxu k3s | ccze -A"

Anyway, after that’s done, and after the k3s reconciler wakes up, we’re looking at this:

The cub deployment now has 5/8 pods ready.

More precisely (one Enter key later), this:

We have 5 healthy pods, the other 3 have 0/1 ready containers, and they're in status CrashLoopBackOff. They have 6 restarts already.
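For the record, the same picture from plain kubectl: the new ReplicaSet's pods crash before they ever become ready, so the rollout simply never completes.

# The three new pods sit in CrashLoopBackOff while the old ones keep serving
kubectl -n home get pods -l app=cub

# This waits for the rollout to finish and eventually gives up instead
kubectl -n home rollout status deployment/cub --timeout=60s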

Two more presses of Enter show us the logs:

The logs show: reading config from the HOME_JSON_BASE64 environment variable, then decoded 5358 bytes, 138 lines. Reading config from env (HOMECONF_ prefix) and /tmp/home.json (in green), then: The application panicked (crashed).
Message: is this a container? LET ME OUT! LEMMEOUT
Location: crates/mod-cub/src/lib.rs:13
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it. Run with RUST_BACKTRACE=full to include source snippets.
stream closed EOF for home/cub-764bfbdfd-lhwg6 (cub)

But my site is still up!

A screenshot of keycdn's Performance Test. It shows TTFB varying from 47ms (in Singapore) to 75ms (most locations), to 162ms in Bangalore and 402ms in Sydney.

At this point, I can either roll back:

infra on  main [$] via 🦀 v1.85.0
g revert d37d8eb5fdf101dfb70427a845fee8924c058dad

[main d9602ff] Revert "bump bearcove/home to 32.2.5"
 2 files changed, 2 insertions(+), 2 deletions(-)

infra on  main [$⇡] via 🦀 v1.85.0
gp

Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.23 KiB | 1.23 MiB/s, done.
Total 6 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 4 local objects.
To https://github.com/bearcove/infra.git
   35d0771..d9602ff  main -> main

infra on  main [$] via 🦀 v1.85.0
./deploy-manifests

🔍 Performing dry run...
Source: ./manifests/
Destination: root@brat.bearcove.cloud:/var/lib/rancher/k3s/server/manifests/custom/
==================================================
🚨 REVIEW THIS CAREFULLY 🚨
==================================================
The following changes will be made:
==================================================
<fc.T.... home/300-home-mom-deployment.yaml
<fc.T.... home/301-home-cub-deployment.yaml
==================================================
Please review the above changes carefully before proceeding.
==================================================

🚨 Warning: This will perform the changes above.
Are you sure you want to continue? (y/n) y
🔧 Performing operations...
📁 Creating skip file: /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip
🔄 Syncing files...
📜 Viewing logs...
Mar 28 18:21:43 brat k3s[3054412]: I0328 18:21:43.178939 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
Mar 28 18:21:43 brat k3s[3054412]: E0328 18:21:43.179199 3054412 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cub\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cub pod=cub-764bfbdfd-pmst4_home(d8884c82-0983-476a-ae97-ef6a89de2e08)\"" pod="home/cub-764bfbdfd-pmst4" podUID="d8884c82-0983-476a-ae97-ef6a89de2e08"
Mar 28 18:21:46 brat k3s[3054412]: I0328 18:21:46.188632 3054412 range_allocator.go:247] "Successfully synced" key="brat"
Mar 28 18:21:57 brat k3s[3054412]: I0328 18:21:57.179142 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
Mar 28 18:21:57 brat k3s[3054412]: E0328 18:21:57.179508 3054412 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cub\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cub pod=cub-764bfbdfd-pmst4_home(d8884c82-0983-476a-ae97-ef6a89de2e08)\"" pod="home/cub-764bfbdfd-pmst4" podUID="d8884c82-0983-476a-ae97-ef6a89de2e08"
Mar 28 18:22:09 brat k3s[3054412]: I0328 18:22:09.179891 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
Mar 28 18:22:09 brat k3s[3054412]: E0328 18:22:09.180186 3054412 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cub\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cub pod=cub-764bfbdfd-pmst4_home(d8884c82-0983-476a-ae97-ef6a89de2e08)\"" pod="home/cub-764bfbdfd-pmst4" podUID="d8884c82-0983-476a-ae97-ef6a89de2e08"
Mar 28 18:22:21 brat k3s[3054412]: I0328 18:22:21.178435 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
building file list ...
60 files to consider
home/300-home-mom-deployment.yaml
        2.92K 100%    2.12MB/s    0:00:00 (xfer#1, to-check=29/60)
home/301-home-cub-deployment.yaml
        3.71K 100%    3.53MB/s    0:00:00 (xfer#2, to-check=28/60)

sent 4.41K bytes  received 130 bytes  3.02K bytes/sec
total size is 1.07M  speedup is 236.91
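(Kubernetes also has its own escape hatch, which flips the Deployment back to the previous ReplicaSet without touching any files. The catch: the manifests in git are supposed to be the source of truth, so an in-cluster-only rollback would just reintroduce drift.)

# In-cluster rollback to the previous ReplicaSet; the YAML on disk is left untouched
kubectl -n home rollout undo deployment/cub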

Or I could just push a fix, and roll forward!

Amos

Which I did, right after writing this.

Closing words

So, how did this disaster recovery go? Pretty well, all in all.

I didn’t need to tear down my control node and rebuild it from scratch.

Nor did I need to carefully reorganize all my manifests. But I was already down, and it’s been a shit month, so I decided to treat myself.

The deploy-manifests script, even in its latest incarnation, still blows.

A better version would merge all resources into a single YAML file and compare it against what the control node already has, requiring a --fuck-me-up flag if more than two resources would be deleted, for example.
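A cheaper middle ground, sketched here rather than actually wired up, would be to lean on kubectl's server-side diff before rsyncing anything:

# Compare the manifests on disk against what the cluster is actually running.
# Exits 0 when there is nothing to change, 1 when there is a diff to review.
kubectl diff -R -f manifests/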

But… that would be making a faster horse, right? I probably do want a proper continuous deployment (CD) solution. Something that can do progressive rollouts, automated rollbacks, etc.

For now, I’ll just be very, very careful, and be proud that I didn’t actually lose any data!

Amos

I was a bit scared to store repositories only in Forgejo (without mirroring them to GitHub) at first, but this isn’t my first disaster, and I’ve recovered pretty well each time.

Cool bear

Oh, and in the end, did you end up using taiga for Kanban?

Amos

Nope! I’m happily on Teamhood’s free tier, which does everything I need for now.

Thanks for reading this far, and I hope to talk more about beardist on the next season of the Self-Directed Research podcast!

I’ll leave you with a somewhat popular episode of it:


(JavaScript is required to see this. Or maybe my stuff broke)
