Impromptu disaster recovery

Background

im-promp-tu (im-ˈpräm(p)-(ˌ)tü)

  1. made, done, or formed on or as if on the spur of the moment: improvised

  2. composed or uttered without previous preparation: extemporaneous

Merriam-Webster

On March 18th, 2025, I thought I would look into self-hosted project management solutions — something kanban-y, but.. better?

A screenshot of WeKan, the open-source Kanban. It looks.. from another age.

This one does not spark joy.

After discovering that Teamhood was awesome (and EU-based), but had a 3-seat minimum on their subscriptions, I resigned myself to reluctantly self-hosting something.

The teamhood kanban view. I really like their interface.

No mobile support though.

I’m familiar with self-hosting. Like everyone, I’ve been SSH-ing into VPSs for as long as they’ve been cheap, which is… pretty long now.

Before that, shared PHP hosts, and nowadays, I just run a small k3s cluster on a dedicated Hetzner server in Germany, with nodes around the world (also all on Hetzner — they have a southeast-asia region now!).

A list of available Hetzner locations as of March 2025. There's Nuremberg, Falkenstein, Helsinki, Singapore, Hillsborough, Oregon, and Ashburn, Virginia.

This is honestly all you need for most projects.

Hetzner Locations

On this cluster, we find a custom-made CDN that powers this website, along with a bunch of internal services, like a Forgejo instance.

My forgejo homepage, with the URL redacted. You can see some activity in a project called 'snugkit', 'snug', and 'fasterthanli.me'.

My instance is only available to logged in users, so I don't have issues with AI crawlers. Big up to everyone fighting the good fight out there.

After finding out that taiga has a couple of unofficial helm charts on ArtifactHub, I opened my infra repo (hosted on GitHub, not on Forgejo, to avoid cyclical dependencies) and started drafting manifests to deploy taiga.

I keep forgetting what Artifact Hub is named.

There are three Helm charts for Taiga, none are official and none have been touched in the last two years.

This doesn't mean they don't work — maybe the software is just very stable!

ArtifactHub search for taiga

How I deploy stuff on k3s

The most straightforward way to deploy something to Kubernetes is to use kubectl. This works just as well on k3s, the lightweight distribution of Kubernetes I use.

As you can see, my website is back up as I’m writing this:

~ k -n snug get pods
NAME                   READY   STATUS    RESTARTS   AGE
cub-7c59bdf6f7-86m69   1/1     Running   0          50m
cub-7c59bdf6f7-9z9hk   1/1     Running   0          50m
cub-7c59bdf6f7-csk89   1/1     Running   0          50m
cub-7c59bdf6f7-kvxbq   1/1     Running   0          50m
cub-7c59bdf6f7-nq44t   1/1     Running   0          50m
cub-7c59bdf6f7-th9sf   1/1     Running   0          50m
mom-5ccc54d65-26bmn    1/1     Running   0          50m
Amos

Real ones have kubectl aliased to k!

I don’t like that approach, because there’s a disconnect between the YAML manifests as text files on your disk or in a repository, and what actually exists as resources in your Kubernetes server.

What if you forget to run kubectl apply on one of the files? What if you run it in a different order and some resources get overwritten?
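kubectl does have a way to surface that kind of drift, though: kubectl diff compares what’s on disk with what’s live, and exits non-zero if they differ. A rough sketch, assuming your manifests apply cleanly with plain kubectl:

# compare every manifest on disk against the live cluster state;
# a non-zero exit code means "something drifted", which is easy to wire into a check
kubectl diff -f manifests/ --recursive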

Cool bear

Isn’t that where something like Spinnaker comes in?

It absolutely is. Once you’re done with continuous integration, making sure all the parts fit together, then you can do continuous delivery, which makes sure that you are deploying changes as quickly as possible in a safe way.

Amos

Speaking of CI, this is a fun watch:

This can involve rolling out changes to only a portion of nodes, monitoring some metrics to make sure they don’t move in the wrong direction, and deciding whether to roll out to the rest of the fleet or to roll back.

I have thus far resisted the temptation to resort to something like Spinnaker for my blog.

Instead, I use rsync!

#!/bin/bash -eux
ssh root@brat "touch /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip" &
rsync -av --delete ./manifests/ root@brat:/var/lib/rancher/k3s/server/manifests/custom/
ssh root@brat "journalctl -fxu k3s"

That’s my infra/deploy-manifests script. When the new files are written to that folder, the k3s reconciler receives notifications that files changed and applies those changes.

The state of the Kubernetes server and the state of my git repository can still drift if I’m not careful, but most of the time I remember and things are fine.

Cool bear

Why not resolve the problem by running that rsync from a CI job?

I would like to do it from Forgejo Actions, but then we have a chicken-and-egg problem: Forgejo itself is deployed with k3s…

Cool bear

…right, a circular dependency like that would be bad in case of outages.

Exactly. I could do something like GitHub Actions, but apart from the fact that I really don’t like GitHub Actions, I don’t feel comfortable giving it a private key that can log into all my production servers, even on a private repository.

Cool bear

Are you sure you’re not just finding excuses to avoid setting it up?

Cool bear

Isn’t it just laziness?

Mayhaps.

The k3s reconciler

Trouble is: the k3s reconciler is much, uhhh simpler than I anticipated.

See, my resources are actually spread across multiple YAML files:

infra on  main [$] via 🦀 v1.85.0
tree -ahC manifests
[ 512]  manifests
├── [ 345]  README.md
├── [ 160]  cert-manager
│   ├── [ 451]  300-cert-manager-chart.yaml
│   ├── [ 374]  900-cert-manager-issuer-letsencrypt-prod.yaml
│   └── [ 139]  README.md
├── [ 160]  cloudnative-pg
│   ├── [ 109]  000-cloudnative-pg-namespace.yaml
│   ├── [ 273]  100-cloudnative-pg-chart.yaml
│   └── [ 208]  README.md
├── [ 448]  forgejo
│   ├── [ 115]  000-forgejo-namespace.yaml
│   ├── [4.6K]  001-forgejo-config-secret.yaml
│   ├── [ 174]  002-forgejo-runner-secret.yaml
✂️

This is something k3s understands: on startup, it reads and processes all .yaml files, compares the whole set against existing resources in k8s, and then does whatever CRUD actions are needed.
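Each of those files shows up in the cluster as an Addon resource — that’s the k3s.cattle.io/v1 kind you’ll see in the logs further down — so you can at least ask the API server what the reconciler thinks it has applied (if memory serves, the plural is addons):

# every manifest k3s picked up gets a corresponding Addon object in kube-system
kubectl get addons.k3s.cattle.io -n kube-system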

Cool bear

Is there an intermediate “here’s the plan, do you want to go ahead” step like you’d have in a tool like OpenTofu, or a dry-run option like you’d have in a tool like Ansible?

Nope!

I’d gotten used to the reconciler’s limitations over time.

I would routinely comment out and uncomment resources to force re-creation, rather than an in-place modification, which often fails.

And of course, the only way to know if something failed is to look at its logs:

# journalctl -u k3s | grep -F 'custom/' | ccze -A | tail -4
Mar 18 15:45:57 brat k3s[52573]: I0318 15:45:57.545897 52573 event.go:389] "Event occurred" object="kube-system/410-snug-tenants-ingress" fieldPath="" kind="Addon" apiVersion="k3s.cattle.io/v1" type="Normal" reason="AppliedManifest" message="Applied manifest at \"/var/lib/rancher/k3s/server/manifests/custom/snug/410-snug-tenants-ingress.yaml\""
Mar 18 15:45:57 brat k3s[52573]: I0318 15:45:57.552720 52573 event.go:389] "Event occurred" object="kube-system/300-traefik-v3-chart" fieldPath="" kind="Addon" apiVersion="k3s.cattle.io/v1" type="Normal" reason="ApplyingManifest" message="Applying manifest at \"/var/lib/rancher/k3s/server/manifests/custom/traefik-v3/300-traefik-v3-chart.yaml\""
Mar 18 15:45:57 brat k3s[52573]: I0318 15:45:57.575145 52573 event.go:389] "Event occurred" object="kube-system/300-traefik-v3-chart" fieldPath="" kind="Addon" apiVersion="k3s.cattle.io/v1" type="Normal" reason="AppliedManifest" message="Applied manifest at \"/var/lib/rancher/k3s/server/manifests/custom/traefik-v3/300-traefik-v3-chart.yaml\""
Cool bear Cool Bear's hot tip

ccze is a log colorizer as old as the world itself, which is to say you can install it with apt install ccze on Debian 12 — it makes journalctl output almost bearable.

That makes this setup worse than running kubectl apply locally, because at least kubectl apply blocks until the apply is done and shows you errors.

~ k apply -f /etc/hosts
error: error validating "/etc/hosts": error validating data: invalid object to validate; if you choose to ignore these errors, turn validation off with --validate=false

After a while, I had gotten so good at predicting what the reconciler would do that I stopped checking the logs systematically.

I’d only SSH in if things didn’t magically start working within 30s or so.

Amos

Kubernetes controllers love to panic.

You have to go out of your way to properly validate data in Go, so some just.. don’t. It’s probably not as bad in widely used controllers, but this is a systemic issue in the Go ecosystem.

Until March 18, 2025, when things went very, very wrong.

The reformattening

I recently changed my code editor settings to indent all files with 4 spaces instead of 2. Not just Rust, but YAML too.

Upon opening existing manifests to refresh my memory, I noticed they were indented with two spaces, so I asked “Claude 3.5 Sonnet (Fast Edit)” to give me a command to reformat all of them.

Amos

The “(Fast Edit)” mode is, as far as I know, only available in the Zed code editor.

For my money, it’s the best code help from the current set of assistants because it has very limited smarts, but it is very, very fast, so I don’t actually have time to get out of flow state.

It gave me this command:

yq -i -P '.' manifests/**/*.yaml
Amos

I know, because it’s in my atuin history. atuin is awesome.

I installed yq, rightly assuming it’s jq but for YAML.

The command looked good to me on the surface, and because this is all happening in a git repository, I figured that if it didn’t work, I could just revert.

I ran it, and checked git status:

infra on  main [$] via 🦀 v1.85.0
yq -i -P '.' manifests/**/*.yaml

infra on  main [$!] via 🦀 v1.85.0
gwS
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   manifests/cert-manager/300-cert-manager-chart.yaml

no changes added to commit (use "git add" and/or "git commit -a")

I was disappointed to see that it had only worked on one file.

I briefly thought about adapting it to use a loop, but then I remembered I’d switched to the fish shell, and I don’t remember its looping syntax off the top of my head yet.

So I committed the work so far, remembered to run ./deploy-manifests, and then ran zed manifests/**/*.yaml to open them all as tabs — I only had maybe 15, I figured I’d just repeatedly ask it to format them.

While I was doing that, I happened to have the k3s logs tailing in a window on my second screen.

Because it started scrolling furiously, I took a closer look.

It was trying (and failing, repeatedly) to create… duplicates of resources that already existed?

After frowning at the logs for a bit I found that it mentioned 300-cert-manager-chart.yaml a lot, and… wait a minute, why is it 1.75MB?

Cool bear What did we learn?

We’ll talk more about the roles LLMs played in that story later.

Since this isn’t a blameless postmortem, I, cool bear, will be in charge of assigning blame.

I’m observing that:

  • Amos did not read the man page for yq before running the command.
  • Amos only looked at git status, not git diff — even a very quick look would have shown that something was seriously wrong.
  • Amos was under a lot of stress that day, didn’t have enough sleep, and probably should not have been touching infrastructure at all.

Further blame will be assigned later.

No thoughts head weighty

This command did not do what I thought it would:

yq -i -P '.' manifests/**/*.yaml

I’m fully bracing myself for a thousand replies saying we told you so blah blah blah — I know, I know. But let’s take the experiment to its logical conclusion and ask a bunch of LLMs to explain what’s wrong with this command.

GPT-4o

GPT-4o is the default “non-reasoning” model from OpenAI if you have a “Plus” subscription.

At the time of this writing, GPT-4.5 did technically come out, but I don’t know anybody who’s using it seriously. Many people doubt that non-reasoning models will get much smarter than GPT-4o.

Something OpenAI did recently-ish is adding sources to answers.

Amos

Explain what this does?

❯ yq -i -P ‘.’ manifests/**/*.yaml

This command uses yq, a command-line YAML processor, to modify YAML files in place with pretty-printing.

Breakdown:

  1. yq - The YAML processor (like jq for JSON but for YAML).
  2. -i - Edits the files in place (modifies the original files rather than printing output to stdout).
  3. -P - Pretty-prints the output (ensures consistent formatting with indentation and spacing).
  4. '.' - The yq expression. In this case, . represents the entire YAML structure, meaning it reads and rewrites the YAML file as-is without modifying the content itself.
  5. manifests/**/*.yaml - Targets all YAML files recursively inside the manifests/ directory and its subdirectories.

Effect:

  • It reformats all YAML files under manifests/ with consistent indentation and spacing.
  • If the YAML files were messy or had inconsistent formatting, they will now be neatly formatted.
  • No actual content change happens unless yq has an opinionated reformatting behavior.

This is useful for enforcing consistent YAML formatting across multiple files, especially in projects with many configuration files (e.g., Kubernetes manifests).

GPT-4o didn’t see the problem.

I had to insist.

Amos

well, it actually merges all the files into the first argument.

did you know that?

You’re absolutely right! The behavior of yq can be a bit surprising when using -i (in-place editing) with multiple files.

What’s Happening?

  • When yq -i -P '.' manifests/**/*.yaml is executed, yq processes all matching YAML files.
  • However, -i makes yq write back the result to a single file—specifically, the first file in the glob expansion (manifests/**/*.yaml).
  • Instead of formatting each file separately, all files get merged into the first file, effectively overwriting it with a merged version of all YAML contents.
Cool bear

Ah, so it does sort of “understand” what the problem is! I mean, not understand, but…

Well, it then went on to say this:

How to Fix It?

If the goal is to reformat each file separately rather than merging them, use:

yq -i -P '.' manifests/**/*

This ensures yq processes files individually, preventing unwanted merging.

I didn’t want to let it get away with that, so I pressed on:

Amos

…did you just give me the same command again?

it still has the same problem.

You’re absolutely right to call that out. I should have been more careful! The issue still persists because the core problem is how yq handles multiple files when using -i.

It eventually gave me two solutions that do work:

Solution using find + xargs

find manifests -type f -name '*.yaml' -print0 | xargs -0 -I{} yq -i -P '.' {}

Solution using a simple for loop

for file in manifests/**/*.yaml; do
  yq -i -P '.' "$file"
done
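If I were feeling extra careful — and clearly I should have been — I’d drop -i entirely on a first pass and just look at what yq wants to do. A sketch, assuming bash with globstar enabled:

# dry-run variant: print a diff per file instead of rewriting anything
shopt -s globstar
for f in manifests/**/*.yaml; do
  diff -u "$f" <(yq -P '.' "$f")
done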

Before we close the LLM parenthesis, let’s check what some other models would say.

Claude 3.7 Sonnet

This command uses the yq tool to process YAML files. Let me break it down:

yq -i -P '.' manifests/**/*.yaml does the following:

  • yq is a command-line tool for processing YAML files (similar to how jq processes JSON)

  • The flags:

    • -i stands for “in-place” - it modifies the files directly instead of printing to standard output
    • -P stands for “pretty-print” - it formats the YAML with proper indentation and structure
  • '.' is the filter expression, which means “the entire document” (essentially, process the whole document without changing its content)

  • manifests/**/*.yaml is a glob pattern that matches:

    • All files with .yaml extension
    • Inside any subdirectory (** means “any depth of directories”)
    • Under the manifests directory

In summary, this command reformats and pretty-prints all YAML files within the manifests directory and its subdirectories, updating the files directly with consistent formatting. This is commonly used in Kubernetes or other infrastructure-as-code projects to ensure YAML files are consistently formatted.

Then, when I point out it merges everything into the first file:

You’re absolutely right, and I apologize for my mistake. Thank you for the correction.

Indeed, the command yq -i -P '.' manifests/**/*.yaml does not just pretty-print each file individually - it merges the content of all matched YAML files into the first file in the list. This is a critical distinction!

What’s happening is:

  1. The shell expands the glob pattern manifests/**/*.yaml into a list of files
  2. yq takes this list and, because of the -i (in-place) flag, it merges the content of all those files into the first file in the list
  3. The remaining files are left untouched

This behavior can be very destructive if not intended, as it would combine the contents of potentially many YAML files into just one file, while leaving the others unchanged.

If someone wanted to pretty-print each file individually without merging, they would need to use a different approach, such as a loop:

find manifests -name "*.yaml" -exec yq -i -P '.' {} \;

Thank you for pointing this out - it’s an important distinction that could cause significant problems if misunderstood.

I like Claude’s solution better — there’s no need to go through xargs or worry about IFS at all.

Deepseek R1

The command you provided uses yq, a command-line YAML processor, to modify YAML files in place. Here’s a breakdown of what it does:

  1. yq: This is the command-line tool for processing YAML files. It is similar to jq but specifically designed for YAML.

  2. -i: This flag tells yq to edit the files in place. Without this flag, yq would output the modified YAML to the terminal without saving the changes to the original files.

  3. -P: This flag tells yq to pretty-print the output. It ensures that the YAML files are formatted in a readable way with proper indentation and structure.

  4. '.': This is the filter expression. In this case, it simply means “select the entire document.” It doesn’t modify the content but ensures the file is processed and rewritten with the specified formatting.

  5. manifests/**/*.yaml: This is the file path pattern. It tells yq to process all YAML files (*.yaml) in the manifests directory and its subdirectories (**).

What does this command do?

  • It processes all YAML files in the manifests directory and its subdirectories.
  • It ensures the YAML files are pretty-printed (formatted nicely).
  • It saves the changes back to the original files (in place).

Example

If you have a YAML file like this:

foo: bar
baz:
  - qux
  - quux

After running the command, the file will be reformatted (if necessary) to ensure proper indentation and structure, but the content will remain the same.

Use Case

This command is useful for cleaning up or standardizing the formatting of YAML files in a directory, especially when working with Kubernetes manifests or other YAML-based configurations.

Still oblivious.

After correction, it goes on at length:

Ah, you’re absolutely correct! I missed that subtle but important detail in my initial explanation. Let me clarify what actually happens when you run the command:

yq -i -P '.' manifests/**/*.yaml

What it actually does:

  1. manifests/**/*.yaml: This glob pattern matches all YAML files in the manifests directory and its subdirectories.
  2. yq: When multiple files are passed to yq, it merges them into a single YAML document by default.
  3. '.': The filter expression '.' means “select the entire document,” but since multiple files are passed, yq merges them into one.
  4. -i: The -i flag tells yq to write the result back to the first file in the list (in place).
  5. -P: The -P flag ensures the output is pretty-printed.

Key Behavior:

  • Merging: All the YAML files matched by manifests/**/*.yaml are merged into a single YAML document.
  • In-place editing: The merged result is written back to the first file in the list (overwriting its content).
  • The other files remain unchanged.

DeepSeek R1 thought it important enough to make a whole heading answering my question: did it know about this?

Did I know this?

Yes, I’m aware of this behavior, but I missed it in my initial explanation. Thank you for pointing it out! This merging behavior is a feature of yq when multiple files are passed, and it’s important to be cautious when using -i (in-place editing) to avoid accidentally overwriting files.

Use Case

This merging behavior can be useful when you want to combine multiple YAML files into one, such as consolidating Kubernetes manifests or configuration files. However, you should always double-check the results to ensure the merge is what you intended.

Let me know if you’d like further clarification! 😊

Mistral’s Le Chat

The command yq -i -P '.' manifests/**/*.yaml is used to process YAML files with the yq tool. Here’s a breakdown of what each part of the command does:

  1. yq: This is a command-line YAML processor. It allows you to parse, edit, and manipulate YAML files using a syntax similar to jq, which is used for JSON files.

  2. -i: This flag stands for “in-place.” It means that the changes made by the yq command will be applied directly to the original files, rather than outputting the results to the standard output or a new file.

  3. -P: This flag stands for “pretty-print.” It ensures that the output YAML is formatted in a readable way, with proper indentation and line breaks.

  4. '.': This is the expression that yq will evaluate. In this case, the dot (.) represents the entire YAML document. So, this expression selects the whole content of each YAML file.

  5. manifests/**/*.yaml: This is a glob pattern that matches all YAML files (*.yaml) in the manifests directory and its subdirectories. The ** part means that it will recursively search through all subdirectories.

Putting it all together, the command yq -i -P '.' manifests/**/*.yaml will:

  • Recursively find all YAML files in the manifests directory and its subdirectories.
  • Pretty-print the entire content of each YAML file.
  • Update each YAML file in place with the pretty-printed version.

This is useful for ensuring that all YAML files in a project have

Le Chat cites sources from the yq repository, the yq docs frontpage, the Tips, Tricks and Troubleshooting part of the yq docs, and a 2021 blog article called Mastering YAML Processing in Command Line.

But of course, it’s still wrong.

You are correct. The command yq -i -P '.' manifests/**/*.yaml will indeed merge all the YAML files matched by the glob pattern into the first file listed. This behavior is due to how yq processes multiple files when using the -i (in-place) flag.

Here’s a more accurate explanation:

  • yq -i -P '.' manifests/**/*.yaml:
    • yq: The command-line YAML processor.
    • -i: Update the files in place.
    • -P: Pretty-print the output.
    • '.': Select the entire content of each YAML file.
    • manifests/**/*.yaml: Match all YAML files in the manifests directory and its subdirectories.

When multiple files are specified with the -i flag, yq merges the content of all the files into the first file in the list. This means that the content of all YAML files matched by manifests/**/*.yaml will be merged into the first file in that list, and the result will be pretty-printed and saved in place.

This behavior can be useful for consolidating multiple YAML configuration files into a single file, but it’s important to be aware of this merging effect to avoid unintentionally overwriting data.

Thank you for pointing that out!

Short and to the point — why waste time apologizing, the damage is done.

All of the models are aligned enough to thank me, but of course, my corrections did not make a difference. Those models don’t evolve — maybe the next generation’s training set will include this article, and then we’ll know how to invoke yq.

The aftermath

This isn’t anything new. It’s fun that they all got it wrong, though, even o3-mini-high.

I haven’t tried GPT-4.5, Claude Sonnet 3.7 with the thinking knob turned up, or whatever the flavor of the day is, but I imagine they all get it wrong.

And you know what? I’m kinda siding with the LLMs on that one.

First off, I’m the idiot, right? I’m the one who pressed enter after skimming over the command line, going “ah, that might work”.

Say it with me:

A computer can never be held accountable, therefore a computer must never make a management decision.

Isn't that a mood.

Internal IBM training in 1979

And second, I actually think that’s poor design in yq.

I foresee forgetting about this quirk and falling into that trap again.

However, I also recognize it’s probably too late to do anything about it: this would be a breaking change. Oh well.

While I’m blaming tools instead of myself, let me keep on critiquing!

From bad to worse

When I saw that the reconciler was busy creating duplicate copies of everything and running into a lot of conflicts…

And I realized that all the duplicates came from this single file…

…I removed the duplicates.

Cool bear

OH BOY.

But what do you think the k3s reconciler thought of that?

Cool bear

Well, if it only reads one file… it would… see that a lot of resources are gone and get busy deleting?

Quite! Very fast, in fact.

Again, what caught my eye in the logs were some errors where it was failing to delete some namespaces.

Which got me thinking: why is it deleting namespaces? And before updown.io could notify me, I knew I was in for a fun day.

At this point, most of my Kubernetes resources were gone.

Deleting deployments deleted pods, which in turn stopped and deleted containers.

Certificates and their secrets were deleted as well, resulting in traefik serving its default certificate.

The services and ingress routes went too, so traefik started returning 404 for any domain and any route. Eventually, traefik itself was gone!

I looked for ways to convince k3s to stop deleting what little was left, and quickly decided it was probably not worth it.

I am prepared for disaster recovery, I have backups of everything, and I had deployments I was barely using anyway — I decided it was time for, uhh, an early spring cleaning.

I decided I would wipe the machine hosting the k3s server and start fresh — after all, how hard could it be?

No matter how prepared…

What follows is a quick account of putting everything back together and the things I learned along the way.

rsync or swim

The first thing I did was grab everything out of /var/lib/rancher/k3s with rsync, just for safety.

rsync -avz --progress \
  root@brat:/var/lib/rancher/k3s \
  ./var-lib-rancher-k3s

I quickly discovered two things:

  • 600K small files were hiding under agent/containerd
  • rsync is single-threaded

And rediscovered that, if you have a fast link, compression (at least whatever zlib implementation and setting rsync uses by default) is just holding you back.

I was impatient and eventually switched to:

rsync -av --progress \
  --exclude 'agent/containerd' \
  root@brat:/var/lib/rancher/k3s \
  ./var-lib-rancher-k3s

Which was done fairly quickly, despite being 30GB.

Amos

I’m really glad this happened at home, where I keep my comfort 2.5Gbps internet connection.

Debian 12 reinstall

After taking a quick look around the S3 console to make sure the backups I was counting on did, in fact, exist, I headed over to Hetzner Robot to enable:

The Hetzner robot rescue tab, letting you pick between Linux and Linux (old), a public key and a keyboard layout.

Believe it or not, I moved away from ed25519 keys because of a bug in some Go software.

The bad blood runs deep.

Hetzner robot

You then reboot, wait a bit, ssh back into it, run:

installimage

And then you see a nice ncurses TUI (terminal user interface)!

Hetzner Online GmBH - installimage. Your server will be installed now, this will take some minutes You can abort at any time with CTRL+C ... Reading configuration Loading image file variables etc.
Lowendbox Blog (because I forgot to take a screenshot myself)

…unless you’ve switched to Ghostty, in which case it just throws you directly into nano because ghostty has its own TERM value of xterm-ghostty and a lot of places don’t have the requisite terminfo. This is easily fixed by:

infocmp -x | ssh root@brat -- tic -x -

I briefly researched “is any of the RHEL derivatives a good idea for this?” and decided today was exciting enough already.

I have feelings about Debian package management, but really, the applications I ran are all containerized, so the host matters only somewhat.

Amos

In fact, Debian has a needlessly large attack surface and there are more suitable distros for that, but, shrug.

I like Hetzner’s installimage thingy. It’s small, it’ll set up RAID 1 for me, it’s okay in my book.

A new k3s cluster

Although it’s entirely possible to export an etcd snapshot to S3 and restore it later, I was going for a “fresh install”, so I decided not to do that.
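For reference, that path looks roughly like this — assuming the server runs with embedded etcd rather than the default SQLite datastore, and with a placeholder bucket name (double-check the exact flags against the k3s docs before trusting me):

# on the old server: push a snapshot to S3
k3s etcd-snapshot save --s3 --s3-bucket=my-snapshots --s3-region=eu-central-1

# on the new server: reset the cluster from a local copy of that snapshot
k3s server --cluster-reset --cluster-reset-restore-path=/path/to/snapshot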

My poor edge nodes were crying for mommy at this point, so I put them to sleep for the time being by running the very aptly named ansible playbook:

./ansible-playbook playbooks/are-you-sure/k3s-nuke.yaml -l cloud

I could have easily torn down all the VMs altogether and created them anew through OpenTofu, but I didn’t need to mess with that part, and besides, I recently:

…so I have seen enough HCL for the calendar year, methinks.

Setting up k3s on the leader node is as easy as:

./ansible-playbook playbooks/k3s.yaml -l brat

Just enough Ansible to be dangerous

Ansible is basically “Python reads YAML to run sudo on machines over SSH”.

RedHat maintains it. I think.

I also think nobody really likes it? But it’s been around long enough that people have made it do pretty much everything you would like to do.

In my case, that’s setting up k3s.
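If you’ve never touched Ansible, the mental model fits in an ad-hoc one-liner or two — these use built-in modules and the inventory file from my wrapper script further down:

# "can I even reach everything?"
ansible all -u root -i config/inventory.yaml -m ping

# "run this one task on the whole cluster" -- playbooks are just many of these, in order
ansible k3s_cluster -u root -i config/inventory.yaml -m apt -a "name=htop state=present"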

The playbook is defined as:

---
- hosts: k3s_cluster
  gather_facts: true
  roles:
    - role: k3s-prereqs
    - role: k3s-download
    - role: ssh

- hosts: k3s_leader
  roles:
    - role: k3s/leader

- hosts: k3s_node
  roles:
    - role: k3s/node

k3s_leader and k3s_node are groups in the inventory, which is generated automatically from OpenTofu state by a Rust script — hence the ./ansible-playbook wrapper:

# in `./ansible-playbook`
#!/bin/bash -eux
source ./config/ansible-env
ansible-playbook -u root -i ./config/inventory.yaml "$@"

Where ansible-env is:

./gen-inventory.rs
export ANSIBLE_CONFIG=$PWD/config/ansible.cfg

And gen-inventory.rs looks something like:

#!/usr/bin/env -S cargo +nightly -Zscript --quiet
---
[package]
edition = "2021"

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0", features = ["preserve_order"] }
indexmap = { version = "2.2.6", features = ["serde"] }
---

use serde::Deserialize;
use serde_json::Map;
use indexmap::IndexMap;
use std::net::{Ipv4Addr, Ipv6Addr};
use std::fs::File;

#[derive(Deserialize, Debug)]
struct TfState {
    outputs: Outputs,
}

#[derive(Deserialize, Debug)]
struct Outputs {
    all_servers: AllServers,
}

#[derive(Deserialize, Debug)]
struct AllServers {
    value: IndexMap<String, Server>,
}

#[derive(Deserialize, Debug)]
#[allow(unused)]
#[serde(deny_unknown_fields)]
struct Server {
    aws_region: String,
    ipv4: Ipv4Addr,
    ipv6: Ipv6Addr,
    location: Option<String>, // Since this is not present in all nodes, make it optional
    provider: String,
    node_type: String,
    ready: bool,
    latlong: (f64, f64),
}

fn main() {
    let tfstate: TfState = serde_json::from_reader(File::open("terraform.tfstate").unwrap()).unwrap();
    let all_servers = &tfstate.outputs.all_servers.value;

    let servers_of_type = |node_type: String| {
        all_servers.iter().filter(move |(_k, v)| v.node_type == node_type)
    };

    let mut children = serde_json::json!({
        "k3s_leader": { "children": { "dedicated": {} } },
        "k3s_node": { "children": { "cloud": {} } },
        "k3s_cluster": { "children": { "k3s_leader": {}, "k3s_node": {} } },
        "dedicated": {
            "children": (servers_of_type("dedicated".into()).map(|(name, _)| (name.clone(), Map::default())).collect::<IndexMap<_, _>>())
        },
        "cloud": {
            "children": (servers_of_type("cloud".into()).map(|(name, _)| (name.clone(), Map::default())).collect::<IndexMap<_, _>>())
        }
    });

    {
        // add individual nodes
        let children = children.as_object_mut().unwrap();
        for (k, v) in all_servers.iter() {
            children.insert(k.clone(), serde_json::json!({
                "hosts": {
                    format!("{}-node", k): {
                        "ansible_host": v.ipv4.to_string(),
                        "ipv6": v.ipv6.to_string(),
                        "region": v.aws_region,
                        "node_type": v.node_type,
                        "node_name": k,
                        "provider": v.provider,
                        "ansible_user": "root",
                        "ansible_become": false,
                        "latlong": v.latlong,
                        "ansible_python_interpreter": "/usr/bin/python3.11",
                    }
                }
            }));
        }
    }

    let inventory = serde_json::json!({
        "all": {
            "children": children
        }
    });

    let output = serde_json::to_string_pretty(&inventory).unwrap();
    let out_path = "config/inventory.yaml";
    std::fs::write(out_path, output).unwrap();
    println!("Wrote inventory to {out_path}")
}

This uses the unstable -Zscript cargo flag, and I’m honestly thrilled about it.

Cool bear

Weird syntax for a python script but okay.

I won’t bore you with the details of the k3s-leader and k3s-node roles (they are adapted from k3s-ansible, if memory serves), but I will point out, because this fact never ceases to amuse me, that Ansible has a whopping 22 levels of variable precedence.

CA troubles

After setting up both the leader and the nodes this way, the nodes kept complaining:

Mar 18 11:25:40 hawk k3s[174830]: time="2025-03-18T11:25:40Z" level=error msg="token CA hash does not match the Cluster CA certificate hash: de13... != d262..."

A k3s server has its own certificate authority — when it bootstraps, it generates a keypair for it, and, well, the new install had a different cert than the old.

…but where was the old cert’s CA hash / token specified anyway?

In an ansible variable somewhere. Apparently I didn’t have enough faith to figure out how to pull out /var/lib/rancher/k3s/server/node-token from the leader and feed it to the nodes, so I just copied it manually into an ansible variable.
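The no-Ansible version of that would have been something like this — K3S_URL/K3S_TOKEN is what the official install script understands, and the token value here is obviously a placeholder:

# grab the join token straight off the leader...
ssh root@brat "cat /var/lib/rancher/k3s/server/node-token"

# ...and feed it to an agent, e.g. via the install script
curl -sfL https://get.k3s.io | K3S_URL=https://brat:6443 K3S_TOKEN=<paste-token-here> sh -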

Amos

That variable is committed to the infra repo but encrypted via git-crypt, a compromise I’m happy with.

However, by mistake, the k3s-leader role also installed that file (the old value), so reading the node-token file fresh off the server didn’t help.

GPT-4o did though, letting me know that the structure of node-token was:

K10<token-ca-hash>::server:<random-token>

And that I just needed to replace what was between the K10 and the :: with the failed assertion value. Same way I write AUR packages and Homebrew formulas 😬.
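If I understand the token format right, that hash is just the SHA-256 of the cluster CA certificate, so you can sanity-check it against the file on the server:

# should print the same hex digest that sits between "K10" and "::" in node-token
ssh root@brat "sha256sum /var/lib/rancher/k3s/server/tls/server-ca.crt"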

Amos

I actually found out after the fact that GPT-4o explained that to me.

At the time, I figured it out by looking at the hashes — skipping over “K10” took some focus, but I did it. Please clap.

This took an embarrassingly long time, but eventually, I had the nodes connected to the central k3s server.

Bringing back the essentials

Traefik v3

After that, I took some time to bring services back, starting from the most essential one: traefik.

Cool bear

Wait, doesn’t-

Yes, k3s does come with Traefik, but they ship v2. I want non-experimental HTTP/3 support, so I disable their traefik version — that’s what that line in deploy-manifests did:

touch /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip
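There’s also a blessed way to do this, if memory serves: k3s can disable packaged components outright, either with --disable=traefik on the server command line, or through its config file — something like:

# alternative to the .skip file: tell k3s not to ship its bundled traefik at all
cat >>/etc/rancher/k3s/config.yaml <<'EOF'
disable:
  - traefik
EOF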

And I add this manifest to grab traefik v3:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: traefik
  namespace: traefik
spec:
  repo: https://traefik.github.io/charts
  chart: traefik
  version: 34.4.1
  valuesContent: |
    image:
      repository: "traefik"
      tag: "v3.3.4"
    deployment:
      kind: DaemonSet
    logs:
      general:
        level: "INFO"
    hostNetwork: true
    # ✂️

Cert-manager

cert-manager lets you provision TLS certificates through Let’s Encrypt (among others). Again, it’s just one HelmChart resource away:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cert-manager
  namespace: kube-system
spec:
  repo: https://charts.jetstack.io
  chart: cert-manager
  version: 1.17.1
  valuesContent: |
    crds:
      enabled: true
    config:
      apiVersion: controller.config.cert-manager.io/v1alpha1
      kind: ControllerConfiguration
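The chart only gets you the controller — the letsencrypt-prod ClusterIssuer referenced later lives in that 900-cert-manager-issuer-letsencrypt-prod.yaml file from the tree earlier. A minimal sketch of what such an issuer typically looks like (the email and solver class here are placeholders, not my actual config):

kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: certs@example.org
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
EOF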

Minio

Sometimes you want object storage but you don’t need S3’s durability guarantee.

My dedicated server comes with 2x512GB of SSD storage — it’d be a shame not to use it:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: minio
  namespace: minio
spec:
  repo: https://charts.bitnami.com/bitnami
  chart: minio
  version: 12.13.1
  valuesContent: |
    auth:
      existingSecret: minio
    nodeSelector:
      kubernetes.io/hostname: brat
    persistence:
      enabled: true
      size: 250Gi

k8up

k8up allows backing up folders using restic.

Again, there’s a helm chart we can use to enjoy someone else’s default configuration:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: k8up
  namespace: k8up
spec:
  repo: https://k8up-io.github.io/k8up
  chart: k8up
  version: 4.8.4
  valuesContent: |
    replicaCount: 1
    nodeSelector:
      kubernetes.io/hostname: brat
    k8up:
      enableLeaderElection: false
    metrics:
      serviceMonitor:
        enabled: true
        additionalLabels:
          release: kube-prometheus-stack
      prometheusRule:
        enabled: true
        additionalLabels:
          release: kube-prometheus-stack

Reflector

Cool bear

I’ve noticed you deploy various services to different namespaces, any reason?

Yes! I like things to be neatly separated, although my understanding is that at my scale it’s kinda ridiculous to bother.

Personally I like to be able to make a mess in one namespace, and delete it forcefully after I’m done, making sure that every resource in that namespace is gone for good.

Which causes one notable problem: you can’t read secrets from another namespace.

Hence, reflector:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: reflector
  namespace: kube-system
spec:
  repo: https://emberstack.github.io/helm-charts
  chart: reflector
  version: 9.0.322

Now, when defining secrets, you specify in which namespaces they should be “reflected”:

---
kind: Secret
apiVersion: v1
metadata:
  name: s3-credentials
  namespace: default
  annotations:
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "umami,forgejo,k8up"
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "umami,forgejo,k8up"
type: Opaque
stringData:
  ACCESS_KEY: REDACTED
  SECRET_KEY: REDACTED
A screenshot of the k9s TUI tool. It shows 4 instances of the s3-credentials secret, in namespaces default, forgejo, k8up, and umami. The secret is of type opaque and it has 2 data.

No, seriously, k9s is really cool.

k9s
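If you’d rather not reach for a TUI, plain kubectl can pull up the same picture — you can’t get a namespaced resource by name across all namespaces, but a field selector does the trick:

# list every reflected copy of the secret, across namespaces
kubectl get secrets --all-namespaces --field-selector metadata.name=s3-credentials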

CloudNativePG

Running a database in Kubernetes is annoying because… databases are stateful. Very much so. Of all the Postgres controllers I’ve tried, cnpg is the one that’s gotten the least in my way.

It has, of course, a helm chart, and I want the controller to run on my dedicated node (brat, if you hadn’t picked up on that by now):

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cnpg
  namespace: cnpg
spec:
  repo: https://cloudnative-pg.io/charts/
  chart: cloudnative-pg
  version: 0.23.2
  valuesContent: |
    nodeSelector:
      kubernetes.io/hostname: brat

Bringing forgejo back

Forgejo is annoying, because it wants object storage (which lives in actual Amazon S3, since I want it to be persistent), and it wants local storage, for git repositories and whatnot.

Postgres

First off, I needed to restore the Postgres database from backup — easy peasy:

--- kind: Cluster apiVersion: postgresql.cnpg.io/v1 metadata: name: forgejo-db namespace: forgejo labels: cnpg.io/reload: "true" spec: instances: 1 imageName: ghcr.io/cloudnative-pg/postgresql:16 # Specify PostgreSQL 16 image primaryUpdateStrategy: unsupervised affinity: nodeSelector: kubernetes.io/hostname: brat storage: size: 10Gi pvcTemplate: metadata: annotations: # do not back up the db volume with k8up (we back it up # with barman, see below) k8up.io/backup: "false" nodeSelector: kubernetes.io/hostname: brat bootstrap: # we're doing disaster recovery aw yiss recovery: source: cluster-backup externalClusters: - name: cluster-backup barmanObjectStore: destinationPath: "s3://bearcove-cnpg-backups/forgejo/brat-1/" serverName: forgejo-db wal: compression: snappy data: compression: snappy s3Credentials: accessKeyId: name: s3-credentials key: ACCESS_KEY secretAccessKey: name: s3-credentials key: SECRET_KEY backup: barmanObjectStore: # pro-tip: this needs to be a different path, otherwise it's very unhappy at you. destinationPath: "s3://bearcove-cnpg-backups/forgejo/brat-2025-03-18/" wal: compression: snappy data: compression: snappy s3Credentials: accessKeyId: name: s3-credentials key: ACCESS_KEY secretAccessKey: name: s3-credentials key: SECRET_KEY retentionPolicy: "30d"
Amos

I too recoil in horror when looking at this amount of YAML, but… it does the trick!

Amos

It’s also really annoying to read the cnpg 1.25 docs — I’m sure MkDocs is much better than nothing, I just think we needn’t stop there.

Anyway — this queues up a restore job (you can view those in k9s with :jobs), and you can keep an eye on its logs from k9s as well. Eventually, the data’s back.

Persistent volume restore

This one, believe it or not, was a lot more trouble?

Kubernetes really wants you to think of storage, and compute, and memory, as fungible resources. Say how much you want, not where.

Well, you can say where, but it’s annoying.

For example, k8s provides a local-path storage class:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: forgejo
  namespace: forgejo
  annotations:
    k8up.io/backup: "true"
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 20Gi

And what it does is provision a volume under /var/lib/rancher/k3s/storage — as a regular folder.

root@brat /var/lib/rancher/k3s/storage
# ls -lhA
total 20K
drwxrwsrwx 3 root 1001 4.0K Mar 18 14:08 pvc-02c60622-d5c9-4dc5-8ea8-22a51eee0b83_minio_minio
drwxrwsrwx 3 root tape 4.0K Mar 28 11:33 pvc-1d42f27a-b955-49ec-9f10-42ed222602f4_umami_umami-db-pg17-1
drwxrwsrwx 3 root tape 4.0K Mar 18 14:32 pvc-2a1953cf-8867-4980-8b59-56ff2ed6411c_forgejo_forgejo-db-1
drwxrwsrwx 4 root amos 4.0K Mar 18 15:31 pvc-a79bde12-25c0-40b9-98c3-ab16a6d12afa_forgejo_forgejo
drwxrwsrwx 3 root tape 4.0K Mar 28 11:49 pvc-bfa94fbd-4105-4860-aa19-06b95d7dd573_forgejo_forgejo-db-pg17-1

This is “odd” merely because the common case is to do “managed” k8s — you pay someone to run your cluster. You ask for compute and they bill you very precisely for what you use. And they have volumes, of course — so when you ask for a volume, it’s created using their volume provider.

But when you run k3s on a bunch of VMs, you don’t have that. And you can’t attach, for example, a DigitalOcean volume to a Hetzner VM — doesn’t work.

So either you roll with something like longhorn or ceph (probably via rook), for which you really honestly want at least 3 (preferably 5) dedicated servers that have similar specs, in the same datacenter, etc. — or you just use local storage because, well, it suits your use case perfectly.

Problem is — the persistent volume claim doesn’t get “provisioned” (not sure what the proper term here is) until it’s actually needed by something — like a pod.

And if the first pod that needs it is a k8up restore pod (which runs restic), and that restore pod just happens to run on one of your small edge nodes rather than your big and round dedicated server, then…

Cool bear

Oh gosh — the volume ends up in the wrong place?

Correctamundo! Now you see why I’m used to deleting and re-creating stuff.
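The mechanism, as far as I can tell: the local-path storage class uses WaitForFirstConsumer volume binding, so the volume only materializes on whichever node the first consuming pod lands on. Easy to check:

kubectl get storageclass local-path -o jsonpath='{.volumeBindingMode}{"\n"}'
# => WaitForFirstConsumer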

It’s also why, when I started redeploying everything, I split things into files prefixed with three digits, to know in which order things should happen:

infra/manifests/forgejo on  main [$]
l
Permissions Size User Date Modified Name
.rw-r--r--@  115 amos 18 Mar 18:27  000-forgejo-namespace.yaml
.rw-r--r--@ 4.8k amos 28 Mar 13:27  001-forgejo-config-secret.yaml
.rw-r--r--@ 4.2k amos 28 Mar 16:40  100-forgejo-db-cluster.yaml
.rw-r--r--@  643 amos 28 Mar 13:27  101-forgejo-db-backups.yaml
.rw-r--r--@  927 amos 18 Mar 18:27  200-forgejo-persistent-volumes.yaml
.rw-r--r--@  161 amos 18 Mar 18:27  201-forgejo-backup-secrets.yaml
.rw-r--r--@ 1.4k amos 18 Mar 18:27  202-forgejo-backups.yaml
.rw-r--r--@ 1.2k amos 18 Mar 18:27  203-forgejo-backup-schedule.yaml
.rw-r--r--@ 1.5k amos 23 Mar 16:57  300-forgejo-deployment.yaml
.rw-r--r--@ 1.2k amos 23 Mar 16:57  400-forgejo-ingress.yaml
.rw-r--r--    33 amos 18 Mar 17:14  README.md

k3s doesn’t actually interpret that ordering — I simply staged them under old-manifests/ and manually copied them into manifests/, applying them one by one.

Cool bear

Okay, but — what about the volume on the wrong node?

Right, well! We just force the volume to be provisioned where we want, by creating a dummy pod!

---
## This forces the volume to be created on node 'brat'
apiVersion: v1
kind: Pod
metadata:
  name: forgejo-ls2
  namespace: forgejo
spec:
  restartPolicy: Never
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
  containers:
    - name: ls-container
      image: busybox
      command: ["ls", "-lhA", "/workdir"]
      volumeMounts:
        - name: forgejo-workdir
          mountPath: /workdir
  nodeSelector:
    kubernetes.io/hostname: brat
  volumes:
    - name: forgejo-workdir
      persistentVolumeClaim:
        claimName: forgejo

Is this the stupidest shit ever? Probably. Does it work? Hell yeah.
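To double-check where a local-path volume actually ended up, the node name hides in the PV’s nodeAffinity — something like:

kubectl get pv -o custom-columns='PV:.metadata.name,CLAIM:.spec.claimRef.name,NODE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]'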

As for actually restoring from k8up, that was hard too.

For some reason, I use a rootless image of forgejo, and that means that, well, inside the container, we’re not root:

~ kubectl exec forgejo-f9dd988c4-9svkx -n forgejo -it -- /bin/bash
forgejo-f9dd988c4-9svkx:/var/lib/gitea$ whoami
git
forgejo-f9dd988c4-9svkx:/var/lib/gitea$ id
uid=1000(git) gid=1000(git) groups=1000(git)
forgejo-f9dd988c4-9svkx:/var/lib/gitea$ exit

We’re user git, with UID 1000.

So, if we do a normal k8up restore, we’ll end up with a bunch of files owned by root, which git won’t be able to read, and that’s no good!

Now, you are able to specify a podSecurityContext so that the restore pod (the one that runs restic) also runs as the same user:

apiVersion: k8up.io/v1
kind: Restore
# ✂️
spec:
  podSecurityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    fsGroupChangePolicy: "OnRootMismatch"
  # ✂️

But then — amazingly — it fails to write to its own cache dir!!!

The way restic works, from what I can gather, is that it starts by making a whole bunch of S3 GET commands to figure out which backups were made, what they contain, etc.

It stores all those in its local cache, on disk, because it refers back to that data a lot during the restore process.

If the cache isn’t writable, it still works, it’s just… very, very, VERY slow.

Cool bear

And I’m assuming, costly?

I think I would’ve died of old age first. Anyway, if you set RESTIC_CACHE_DIR to a path you can write to as a non-root user, then it works fine. Oh, also if you give it a bunch more CPU and memory than the default:

--- ############################################################################## # k8up backup/restore setup ############################################################################## apiVersion: v1 kind: ConfigMap metadata: name: restic-vars namespace: forgejo data: RESTIC_CACHE_DIR: /tmp/restic-cache --- apiVersion: k8up.io/v1 kind: Restore metadata: name: restore-workdir-2024-03-18-b namespace: forgejo spec: snapshot: e9e9a75d # last `/data/forgejo` backup podSecurityContext: runAsUser: 1000 runAsGroup: 1000 fsGroup: 1000 fsGroupChangePolicy: "OnRootMismatch" restoreMethod: folder: # restore to PVC forgejo claimName: forgejo backend: repoPasswordSecretRef: name: backup-repo key: password resources: requests: cpu: 10 memory: 1Gi limits: cpu: 10 memory: 4Gi envFrom: - configMapRef: name: restic-vars s3: bucket: bearcove-k8up-backups endpoint: https://s3.eu-central-1.amazonaws.com accessKeyIDSecretRef: name: s3-credentials key: ACCESS_KEY secretAccessKeySecretRef: name: s3-credentials key: SECRET_KEY

Deployment, service, ingress

I’m not going to go into details for all these, because we’ve spent our quota of YAML for the year and it’s only March.

I guess I can show one combo for forgejo, since it’s not that complicated.

---
kind: Service
apiVersion: v1
metadata:
  name: forgejo
  namespace: forgejo
spec:
  selector:
    app: forgejo
  ipFamilyPolicy: RequireDualStack
  ports:
    - protocol: TCP
      port: 80
      targetPort: http

This is the service: it makes it so anything in the cluster can connect to forgejo.forgejo.svc.cluster.local:80.
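A quick way to sanity-check that from inside the cluster is a throwaway curl pod — something along these lines:

kubectl -n forgejo run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -- curl -sI http://forgejo.forgejo.svc.cluster.local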

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: tls
  namespace: forgejo
spec:
  secretName: tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames: [redacted.example.org]

This requests a TLS certificate for my forgejo instance’s domain.

---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: request-body-limit
  namespace: forgejo
spec:
  buffering:
    maxRequestBodyBytes: 1073741824 # 1 GiB
    memRequestBodyBytes: 67108864 # 64 MiB

This raises the max POST limit, since apparently that’s an issue when uploading “generic” packages to a forgejo registry 🤷

---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: forgejo
  namespace: forgejo
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`redacted.example.org`)
      kind: Rule
      services:
        - name: forgejo
          port: 80
      middlewares:
        - name: request-body-limit
          namespace: forgejo
  tls:
    secretName: tls-secret

And this is the ingress route!

Cool bear

Mhh this feels a little verbose, and a little ingress-specific, even for k8s?

Yeah — you can do shorter with labels (but I could never figure it out), and most people actually use nginx for ingress, but of C and Go, I’ll take the lesser evil.

Bringing home back

The software that powers my website, currently named home, is also deployed through k3s.

It doesn’t have a persistent database: assets (images, videos, etc.) are in object storage, there’s a central “mom” service that has one sqlite database per tenant — that database remembers which assets were already uploaded to object storage, which revision is the latest, and the list of sponsors fetched from GitHub/Patreon.

That’s about it:

################################################################################ # 🐻 MOM DEPLOYMENT ################################################################################ --- kind: Deployment apiVersion: apps/v1 metadata: name: mom namespace: home labels: group: home app: mom spec: replicas: 1 selector: matchLabels: app: mom template: metadata: labels: group: home app: mom spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: node-type operator: In values: - dedicated imagePullSecrets: - name: forgejo-docker-pull-secrets containers: - name: mom image: redacted.example.org/bearcove/home:32.2.4 command: ["home", "mom"] workingDir: /var/lib/home envFrom: - secretRef: name: home-vars - secretRef: name: home-conf env: - name: RUST_LOG value: "info" - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName ports: - containerPort: 1118 name: http readinessProbe: httpGet: path: /health port: http initialDelaySeconds: 1 periodSeconds: 1 resources: requests: memory: "400Mi" cpu: "0.5" limits: memory: "16000Mi" cpu: "20" volumeMounts: - name: mom mountPath: /var/lib/home - name: metadata mountPath: /metadata readOnly: true volumes: - name: mom persistentVolumeClaim: claimName: mom - name: metadata hostPath: path: /metadata type: DirectoryOrCreate

As for edge nodes, they’re named “cubs”, and the deployment manifest is fun — it deploys one pod per edge node, while allowing for some pods to be down, and, while that’s the case, routing to the closest node.

################################################################################ # 🧸 CUB DEPLOYMENT ################################################################################ --- kind: Deployment apiVersion: apps/v1 metadata: name: cub namespace: home labels: group: home app: cub spec: replicas: 6 # 1 dedicated + 5 edge nodes topologySpreadConstraints: # Define topology spread constraints for the deployment - maxSkew: 1 # defines the maximum skew between the number of pods in different topology domains # Use zone as the topology key topologyKey: topology.kubernetes.io/zone # Allow scheduling even if constraints are not met whenUnsatisfiable: ScheduleAnyway # Specify the label selector for the pods labelSelector: matchLabels: # Match pods with the label app: cub app: cub affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: node-type operator: In values: - cloud topologyKey: "kubernetes.io/hostname" selector: matchLabels: app: cub template: metadata: labels: app: cub group: home spec: imagePullSecrets: - name: forgejo-docker-pull-secrets containers: - name: cub image: redacted.example.org/bearcove/home:32.2.4 command: ["home", "serve"] envFrom: - secretRef: name: home-vars - secretRef: name: home-conf env: - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName ports: - containerPort: 1111 name: http # define readiness probe: must serve HTTP on port 1111 readinessProbe: httpGet: path: / port: http httpHeaders: - name: x-forwarded-host value: fasterthanli.me initialDelaySeconds: 1 periodSeconds: 1 resources: requests: memory: "400Mi" cpu: "0.5" limits: memory: "1200Mi" cpu: "12" volumeMounts: - name: cub mountPath: /var/lib/home - name: metadata mountPath: /metadata readOnly: true volumes: - name: cub persistentVolumeClaim: claimName: cub - name: metadata hostPath: path: /metadata type: DirectoryOrCreate

Zero-downtime deploys

This is nice, because it allows zero-downtime deploys.

When everything is fine, here’s what the home namespace looks like:

k9s screenshot, showing cub with 6/6 pods ready, and mom with 1/1 pods ready.

All the cub pods are healthy:

6 pods are listed now, they each have 1/1 containers ready, they have status running, 0 restarts, use between 30 and 49 CPU and between 360 and 522 Mem. They have internal IPs and are running on nodes named marl, flam, brat, kaya, heim, hawk.
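The same picture is available from plain kubectl, which is handy to keep running during a rollout:

# watch pods churn as a new version rolls out, then confirm the rollout finished
kubectl -n home get pods -l app=cub -w
kubectl -n home rollout status deployment/cub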

Now let’s say I cause trouble on purpose, by pushing a change that crashes home:

home on  main via 🦀 v1.85.1
g show
commit 984d0f5b7fa5cff9172ea1bc16091b1fbc6bed6a (HEAD -> main, origin/main, origin/HEAD)
Author: Amos Wenger <amos@bearcove.eu>
Date:   Fri Mar 28 17:35:46 2025 +0100

    crash on purpose

diff --git a/crates/home/src/main.rs b/crates/home/src/main.rs
index 119e3e91..959fa414 100644
--- a/crates/home/src/main.rs
+++ b/crates/home/src/main.rs
@@ -30,6 +30,8 @@ async fn real_main() -> eyre::Result<()> {
     errhandling::load().install();
     tracingsub::load().install();

+    panic!("woopsie doopsie I'm doing a demo");
+
     let args = clap::load().parse();

     let res = match args.sub {

Ah, uh, no:

forgejo actions screenshot, showing that my CI pipeline is running --version on the binary, so it failed the job.

Okay, FINE, let’s push a change that… only crashes on serve, and only if we’re in production.

home on  main [!] via 🦀 v1.85.1
gwd

crates/mod-cub/src/lib.rs --- Rust
 8  #[dylo::export]
 9  impl Mod for ModImpl {
10      fn serve(&self, config: Config) -> BoxFuture<'static, Result<()>> {
11          Box::pin(async {
12              if std::env::var("KUBERNETES_SERVICE_HOST").is_ok() {
13                  panic!("is this a container? LET ME OUT! LEMMEOUT");
14              }
15
16              impls::serve(config)
17                  .await
18                  .map_err(|e| noteyre::eyre!("{}", e))

This time the pipelines pass, and I get a chance to show some of the automation I’ve been doing recently:

home on  main via 🦀 v1.85.1
bd bump
Fetched all tags from remote.
Latest tag: v32.2.4
Choose version bump type:
1. Patch (32.2.5)
2. Minor (32.3.0)
3. Major (33.0.0)
1
Creating new tag: v32.2.5
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
To https://redacted.example.org/bearcove/home
 * [new tag]         v32.2.5 -> v32.2.5
Tag v32.2.5 created and pushed successfully
Cool bear

bd stands for beardist, by the way!

A screenshot of beardist running in CI, showing various bits of info about the build system, environment, etc. You can see that the trigger-formula-update job was run this time around.

Those -build jobs pushed a generic package:

The generic package page for home, which contains curl -OJ instructions generated by forgejo. You can see it's built for x86_64-unknown-linux-gnu and for aarch64-apple-darwin

…and some Homebrew tap was updated automatically (that’s what the trigger-formula-update job was about).

Along with a container image:

The home container package, containing `docker pull` instructions, a sha256 digest. it's for the linux/amd64 platform and is 427 Mib total.

That one isn’t deployed automatically — I like to be the one to give the green light, also via beardist.

The k8s subcommand first identifies where the given image is referenced:

infra on  main [$] via 🦀 v1.85.0
bd k8s bearcove/home

Searching for manifests in: manifests
YAML files containing 'bearcove/home' are:

File: manifests/home/301-home-cub-deployment.yaml
Version 32.2.4 at positions 1761 to 1808
Context:
>>>        - name: cub
>>>          image: redacted.example.org/bearcove/home:32.2.4
>>>          command: ["home", "serve"]
>>>          envFrom:
>>>            - secretRef:

File: manifests/home/300-home-mom-deployment.yaml
Version 32.2.4 at positions 1074 to 1121
Context:
>>>        - name: mom
>>>          image: redacted.example.org/bearcove/home:32.2.4
>>>          command: ["home", "mom"]
>>>          workingDir: /var/lib/home
>>>          envFrom:
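There's no magic to that search, by the way; a plain grep over the manifests directory finds the same spots:

# Find every manifest that pins a bearcove/home image
grep -rn --include='*.yaml' 'bearcove/home' manifests/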

And then, it repeatedly polls the forgejo instance, waiting for a ‘different’ version to pop up:

Initializing Forgejo client...
Checking for new versions...
Fetching latest version for package 'home' from 'https://redacted.example.org/api/v1/packages/bearcove'
Request completed in 385ms with status 200 OK
Received 13 packages in response
Filtered to 12 matching packages
Found 11 valid versions
Latest version found: 32.2.5
New version detected: 32.2.5
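The polling loop hits the same Forgejo packages endpoint you can see in that log. Here's a rough curl + jq sketch of the idea; the FORGEJO_TOKEN variable, the query parameters, and the jq filter are my assumptions, not beardist's actual code:

# Poll the package list until a version newer than the one in the manifests appears
current="32.2.4"
while true; do
  latest=$(curl -fsS -H "Authorization: token $FORGEJO_TOKEN" \
      "https://redacted.example.org/api/v1/packages/bearcove?type=generic&q=home" \
    | jq -r '.[] | select(.name == "home") | .version' \
    | sort -V | tail -n 1)
  if [ -n "$latest" ] && [ "$latest" != "$current" ]; then
    echo "New version detected: $latest"
    break
  fi
  sleep 10
done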

At which point, it updates the manifests:

Updating manifests...
Updated manifests/home/301-home-cub-deployment.yaml
Updated manifests/home/300-home-mom-deployment.yaml
Staging changes...
Showing staged changes:

manifests/home/300-home-mom-deployment.yaml --- YAML
33     imagePullSecrets:
34       - name: forgejo-docker-pull-secrets
35     containers:
36       - name: mom
37         image: redacted.example.org/bearcove/home:32.2.4
37         image: redacted.example.org/bearcove/home:32.2.5
38         command: ["home", "mom"]
39         workingDir: /var/lib/home
40         envFrom:
41           - secretRef:

manifests/home/301-home-cub-deployment.yaml --- YAML
48     imagePullSecrets:
49       - name: forgejo-docker-pull-secrets
50     containers:
51       - name: cub
52         image: redacted.example.org/bearcove/home:32.2.4
52         image: redacted.example.org/bearcove/home:32.2.5
53         command: ["home", "serve"]
54         envFrom:
55           - secretRef:
56               name: home-vars

Commits and pushes:

Committing changes...

> bearcove-infra@1.0.0 lint-staged
> lint-staged

No staged files match any configured task.
[main d37d8eb] bump bearcove/home to 32.2.5
 2 files changed, 2 insertions(+), 2 deletions(-)
Pushing changes...
Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.17 KiB | 1.17 MiB/s, done.
Total 6 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 4 local objects.
To https://github.com/bearcove/infra.git
   89bddc6..d37d8eb  main -> main

And calls ./deploy-manifests, which learned a couple new tricks since last time:

Deploying manifests...
🔍 Performing dry run...
Source: ./manifests/
Destination: root@brat.bearcove.cloud:/var/lib/rancher/k3s/server/manifests/custom/
==================================================
🚨 REVIEW THIS CAREFULLY 🚨
==================================================
The following changes will be made:
==================================================
<fc.T.... home/300-home-mom-deployment.yaml
<fc.T.... home/301-home-cub-deployment.yaml
==================================================
Please review the above changes carefully before proceeding.
==================================================

🚨 Warning: This will perform the changes above.
Are you sure you want to continue? (y/n) y
🔧 Performing operations...
📁 Creating skip file: /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip
🔄 Syncing files...
📜 Viewing logs...
✂️
building file list ...
60 files to consider
home/300-home-mom-deployment.yaml
        2.92K 100%    2.12MB/s    0:00:00 (xfer#1, to-check=29/60)
home/301-home-cub-deployment.yaml
        3.71K 100%    3.53MB/s    0:00:00 (xfer#2, to-check=28/60)

sent 4.41K bytes  received 130 bytes  3.02K bytes/sec
total size is 1.07M  speedup is 236.91

By “learned a couple new tricks”, I mostly mean “I figured out the right set of flags to tell rsync to show what it’s actually going to do”:

# in `deploy-manifests`
#!/bin/bash
set -euo pipefail

SRC="./manifests/"
DST="root@brat.bearcove.cloud:/var/lib/rancher/k3s/server/manifests/custom/"

# Define rsync flags
RSYNC_FLAGS=(--recursive --delete --checksum --human-readable --progress --include='*/' --include='*.yaml' --exclude='*')

# Perform a dry run of rsync to show what would happen
echo "🔍 Performing dry run..."
printf "Source: \033[33m%s\033[0m\n" "$SRC"
printf "Destination: \033[33m%s\033[0m\n" "$DST"
printf "\033[2m==================================================\033[0m\n"
printf "\033[2m🚨 REVIEW THIS CAREFULLY 🚨\033[0m\n"
printf "\033[2m==================================================\033[0m\n"
printf "\033[2mThe following changes will be made:\033[0m\n"
printf "\033[2m==================================================\033[0m\n"
rsync "${RSYNC_FLAGS[@]}" --dry-run --itemize-changes "$SRC" "$DST"
printf "\033[2m==================================================\033[0m\n"
printf "\033[2mPlease review the above changes carefully before proceeding.\033[0m\n"
printf "\033[2m==================================================\033[0m\n"

# Ask for consent before continuing
printf "\n🚨 \033[1;31mWarning:\033[0m This will perform the changes above.\n"
read -p "Are you sure you want to continue? (y/n) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]
then
    printf "❌ \033[1;31mOperation cancelled.\033[0m\n"
    exit 1
fi

# The actual operations
echo "🔧 Performing operations..."
printf "📁 Creating skip file: \033[33m/var/lib/rancher/k3s/server/manifests/traefik.yaml.skip\033[0m\n"
ssh root@brat "touch /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip" &

echo "🔄 Syncing files..."
rsync "${RSYNC_FLAGS[@]}" "$SRC" "$DST" &

echo "📜 Viewing logs..."
ssh -t root@brat "journalctl -fxu k3s | ccze -A"

Anyway, after that’s done, and after the k3s reconciler wakes up, we’re looking at this:

The cub deployment now has 5/8 pods ready.

More precisely (one Enter key later), this:

We have 5 healthy pods, the other 3 have 0/1 ready containers, and they're in status CrashLoopBackOff. They have 6 restarts already.
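For the record, the same picture from plain kubectl: the new ReplicaSet's pods crash before they ever become ready, so the rollout simply never completes.

# The three new pods sit in CrashLoopBackOff while the old ones keep serving
kubectl -n home get pods -l app=cub

# This waits for the rollout to finish and eventually gives up instead
kubectl -n home rollout status deployment/cub --timeout=60s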

Two more presses of Enter show us the logs:

The logs show: reading config from the HOME_JSON_BASE64 environment variable, then decoded 5358 bytes, 138 lines. Reading config from env (HOMECONF_ prefix) and /tmp/home.json (in green), then: The application panicked (crashed).
Message: is this a container? LET ME OUT! LEMMEOUT
Location: crates/mod-cub/src/lib.rs:13
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it. Run with RUST_BACKTRACE=full to include source snippets.
stream closed EOF for home/cub-764bfbdfd-lhwg6 (cub)

But my site is still up!

A screenshot of keycdn's Performance Test. It shows TTFB varying from 47ms (in Singapore) to 75ms (most locations), to 162ms in Bangalore and 402ms in Sydney.

At this point, I can either roll back:

infra on  main [$] via 🦀 v1.85.0
g revert d37d8eb5fdf101dfb70427a845fee8924c058dad

[main d9602ff] Revert "bump bearcove/home to 32.2.5"
 2 files changed, 2 insertions(+), 2 deletions(-)

infra on  main [$⇡] via 🦀 v1.85.0
gp

Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.23 KiB | 1.23 MiB/s, done.
Total 6 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 4 local objects.
To https://github.com/bearcove/infra.git
   35d0771..d9602ff  main -> main

infra on  main [$] via 🦀 v1.85.0
./deploy-manifests

🔍 Performing dry run...
Source: ./manifests/
Destination: root@brat.bearcove.cloud:/var/lib/rancher/k3s/server/manifests/custom/
==================================================
🚨 REVIEW THIS CAREFULLY 🚨
==================================================
The following changes will be made:
==================================================
<fc.T.... home/300-home-mom-deployment.yaml
<fc.T.... home/301-home-cub-deployment.yaml
==================================================
Please review the above changes carefully before proceeding.
==================================================

🚨 Warning: This will perform the changes above.
Are you sure you want to continue? (y/n) y
🔧 Performing operations...
📁 Creating skip file: /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip
🔄 Syncing files...
📜 Viewing logs...
Mar 28 18:21:43 brat k3s[3054412]: I0328 18:21:43.178939 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
Mar 28 18:21:43 brat k3s[3054412]: E0328 18:21:43.179199 3054412 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cub\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cub pod=cub-764bfbdfd-pmst4_home(d8884c82-0983-476a-ae97-ef6a89de2e08)\"" pod="home/cub-764bfbdfd-pmst4" podUID="d8884c82-0983-476a-ae97-ef6a89de2e08"
Mar 28 18:21:46 brat k3s[3054412]: I0328 18:21:46.188632 3054412 range_allocator.go:247] "Successfully synced" key="brat"
Mar 28 18:21:57 brat k3s[3054412]: I0328 18:21:57.179142 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
Mar 28 18:21:57 brat k3s[3054412]: E0328 18:21:57.179508 3054412 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cub\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cub pod=cub-764bfbdfd-pmst4_home(d8884c82-0983-476a-ae97-ef6a89de2e08)\"" pod="home/cub-764bfbdfd-pmst4" podUID="d8884c82-0983-476a-ae97-ef6a89de2e08"
Mar 28 18:22:09 brat k3s[3054412]: I0328 18:22:09.179891 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
Mar 28 18:22:09 brat k3s[3054412]: E0328 18:22:09.180186 3054412 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cub\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cub pod=cub-764bfbdfd-pmst4_home(d8884c82-0983-476a-ae97-ef6a89de2e08)\"" pod="home/cub-764bfbdfd-pmst4" podUID="d8884c82-0983-476a-ae97-ef6a89de2e08"
Mar 28 18:22:21 brat k3s[3054412]: I0328 18:22:21.178435 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
building file list ...
60 files to consider
home/300-home-mom-deployment.yaml
        2.92K 100%    2.12MB/s    0:00:00 (xfer#1, to-check=29/60)
home/301-home-cub-deployment.yaml
        3.71K 100%    3.53MB/s    0:00:00 (xfer#2, to-check=28/60)

sent 4.41K bytes  received 130 bytes  3.02K bytes/sec
total size is 1.07M  speedup is 236.91
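(Kubernetes also has its own escape hatch, which flips the Deployment back to the previous ReplicaSet without touching any files. The catch: the manifests in git are supposed to be the source of truth, so an in-cluster-only rollback would just reintroduce drift.)

# In-cluster rollback to the previous ReplicaSet; the YAML on disk is left untouched
kubectl -n home rollout undo deployment/cub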

Or I could just push a fix, and roll forward!

Amos

Which I did, right after writing this.

Closing words

So, how did this disaster recovery go? Pretty well, all in all.

I didn’t need to tear down my control node and rebuild it from scratch.

Nor did I need to carefully reorganize all my manifests. But I was already down, and it’s been a shit month, so I decided to treat myself.

The deploy-manifests script, even in its latest incarnation, still blows.

A better version would merge all resources into a single YAML file and compare it against what the control node already has, requiring a --fuck-me-up flag if more than two resources would be deleted, for example.
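A cheaper middle ground, sketched here rather than actually wired up, would be to lean on kubectl's server-side diff before rsyncing anything:

# Compare the manifests on disk against what the cluster is actually running.
# Exits 0 when there is nothing to change, 1 when there is a diff to review.
kubectl diff -R -f manifests/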

But… that would be making a faster horse, right? I probably do want a proper continuous deployment (CD) solution. Something that can do progressive rollouts, automated rollbacks, etc.

For now, I’ll just be very, very careful, and be proud that I didn’t actually lose any data!

Amos

I was a bit scared to store repositories only in Forgejo (without mirroring them to GitHub) at first, but this isn’t my first disaster, and I’ve recovered pretty well each time.

Cool bear

Oh, and in the end, did you end up using taiga for Kanban?

Amos

Nope! I’m happily on Teamhood’s free tier, which does everything I need for now.

Thanks for reading this far, and I hope to talk more about beardist on the next season of the Self-Directed Research podcast!

I’ll leave you with a somewhat popular episode of it:


(JavaScript is required to see this. Or maybe my stuff broke)
