Impromptu disaster recovery
Background
im-promp-tu (
im-ˈpräm(p)-(ˌ)tü
)
made, done, or formed on or as if on the spur of the moment: improvised
composed or uttered without previous preparation: extemporaneous
On March 18th, 2025, I thought I would look into self-hosted project management solutions — something kanban-y, but.. better?
This one does not spark joy.
After discovering that Teamhood was awesome (and EU-based), but had a 3-seat minimum on their subscriptions, I resigned myself to reluctantly self-hosting something.
The Teamhood kanban view. I really like their interface.
No mobile support though.
I’m familiar with self-hosting. Like everyone, I’ve been SSH-ing into VPSs for as long as they’ve been cheap, which is… pretty long now.
Before that, shared PHP hosts; nowadays, I just run a small k3s cluster on a dedicated Hetzner server in Germany, with nodes around the world (all on Hetzner too, now that they have a southeast-asia region!).
This is honestly all you need for most projects.
On this cluster, we find a custom-made CDN that powers this website, along with a bunch of internal services, like a Forgejo instance.
My instance is only available to logged-in users, so I don’t have issues with AI crawlers. Big up to everyone fighting the good fight out there.
After finding out that Taiga has a couple of unofficial Helm charts on ArtifactHub, I opened my infra repo (hosted on GitHub, not on Forgejo, to avoid cyclical dependencies) and started drafting manifests to deploy Taiga.
I keep forgetting what Artifact Hub is named.
There are three Helm charts for Taiga; none are official, and none have been touched in the last two years.
This doesn't mean they don't work — maybe the software is just very stable!
How I deploy stuff on k3s
The most straightforward way to deploy something to Kubernetes is to use kubectl. This works just as well on k3s, the lightweight distribution of Kubernetes I use.
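In its simplest form, that just means pointing kubectl apply at a file or a folder of manifests (the path below is made up):
kubectl apply -f manifests/snug/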
As you can see, my website is back up as I’m writing this:
~
❯ k -n snug get pods
NAME READY STATUS RESTARTS AGE
cub-7c59bdf6f7-86m69 1/1 Running 0 50m
cub-7c59bdf6f7-9z9hk 1/1 Running 0 50m
cub-7c59bdf6f7-csk89 1/1 Running 0 50m
cub-7c59bdf6f7-kvxbq 1/1 Running 0 50m
cub-7c59bdf6f7-nq44t 1/1 Running 0 50m
cub-7c59bdf6f7-th9sf 1/1 Running 0 50m
mom-5ccc54d65-26bmn 1/1 Running 0 50m
Real ones have kubectl aliased to k!
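For the uninitiated, in bash that’s just the alias plus one line to keep tab-completion working:
alias k=kubectl
complete -o default -F __start_kubectl k   # reuse kubectl's bash completion for the alias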
I don’t like that approach, because there’s a disconnect between the YAML manifest as text files on your disk or in a repository, and what actually exists as resources in your Kubernetes server.
What if you forget to run kubectl apply on one of the files? What if you apply them in a different order and some resources get overwritten?
Isn’t that where something like Spinnaker comes in?
It absolutely is. Once you’re done with continuous integration, making sure all the parts fit together, then you can do continuous delivery, which makes sure that you are deploying changes as quickly as possible in a safe way.
Speaking of CI, this is a fun watch:
This can involve rolling out changes to only a portion of nodes, monitoring some metrics to make sure they don’t go up (or don’t go down), and deciding whether to roll out to the rest of the fleet or to roll back.
I have thus far resisted the temptation to resort to something like Spinnaker for my blog.
Instead, I use rsync!
#!/bin/bash -eux
ssh root@brat "touch /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip" &
rsync -av --delete ./manifests/ root@brat:/var/lib/rancher/k3s/server/manifests/custom/
ssh root@brat "journalctl -fxu k3s"
That’s my infra/deploy-manifests script. When new files are written to that folder, the k3s reconciler receives notifications that files changed and applies those changes.
The state of the Kubernetes server and the state of my git repository can still drift if I’m not careful, but most of the time I remember and things are fine.
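When I do want to check for drift, kubectl diff gets most of the way there; something along these lines (flags from memory, double-check them before trusting):
kubectl diff -R -f manifests/   # compare local manifests against what's live in the cluster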
Why not resolve the problem by running that rsync from a CI job?
I would like to do it from Forgejo Actions, but then we have a chicken-and-egg problem: Forgejo itself is deployed with k3s…
…right, a circular dependency like that would be bad in case of outages.
Exactly. I could do something like GitHub Actions, but apart from the fact that I really don’t like GitHub Actions, I don’t feel comfortable giving it a private key that can log into all my production servers, even on a private repository.
Are you sure you’re not just finding excuses to avoid setting it up?
Isn’t it just laziness?
Mayhaps.
The k3s reconciler
Trouble is: the k3s reconciler is much, uhhh simpler than I anticipated.
See, my resources are actually spread across multiple YAML files:
infra on main [$] via 🦀 v1.85.0
❯ tree -ahC manifests
[ 512] manifests
├── [ 345] README.md
├── [ 160] cert-manager
│ ├── [ 451] 300-cert-manager-chart.yaml
│ ├── [ 374] 900-cert-manager-issuer-letsencrypt-prod.yaml
│ └── [ 139] README.md
├── [ 160] cloudnative-pg
│ ├── [ 109] 000-cloudnative-pg-namespace.yaml
│ ├── [ 273] 100-cloudnative-pg-chart.yaml
│ └── [ 208] README.md
├── [ 448] forgejo
│ ├── [ 115] 000-forgejo-namespace.yaml
│ ├── [4.6K] 001-forgejo-config-secret.yaml
│ ├── [ 174] 002-forgejo-runner-secret.yaml
✂️
This is something k3s understands: on startup, it reads and processes all .yaml files, compares the whole set against existing resources in k8s, and then does whatever CRUD actions are needed.
Nope!
I’d gotten used to the reconciler limitations over time.
I would routinely comment and uncomment resources to force recreation rather than a modification that often fails.
And of course, the only way to know if something failed is to look at its logs:
# journalctl -u k3s | grep -F 'custom/' | ccze -A | tail -4
Mar 18 15:45:57 brat k3s[52573]: I0318 15:45:57.545897 52573 event.go:389] "Event occurred" object="kube-system/410-snug-tenants-ingress" fieldPath="" kind="Addon" apiVersion="k3s.cattle.io/v1" type="Normal" reason="AppliedManifest" message="Applied manifest at \"/var/lib/rancher/k3s/server/manifests/custom/snug/410-snug-tenants-ingress.yaml\""
Mar 18 15:45:57 brat k3s[52573]: I0318 15:45:57.552720 52573 event.go:389] "Event occurred" object="kube-system/300-traefik-v3-chart" fieldPath="" kind="Addon" apiVersion="k3s.cattle.io/v1" type="Normal" reason="ApplyingManifest" message="Applying manifest at \"/var/lib/rancher/k3s/server/manifests/custom/traefik-v3/300-traefik-v3-chart.yaml\""
Mar 18 15:45:57 brat k3s[52573]: I0318 15:45:57.575145 52573 event.go:389] "Event occurred" object="kube-system/300-traefik-v3-chart" fieldPath="" kind="Addon" apiVersion="k3s.cattle.io/v1" type="Normal" reason="AppliedManifest" message="Applied manifest at \"/var/lib/rancher/k3s/server/manifests/custom/traefik-v3/300-traefik-v3-chart.yaml\""
ccze is a log colorizer as old as the world itself, which is to say you can install it with apt install ccze on Debian 12 — it makes journalctl output almost bearable.
All of this makes the setup worse than running kubectl apply locally, because at least kubectl apply blocks until the apply is done and shows you errors.
~
❯ k apply -f /etc/hosts
error: error validating "/etc/hosts": error validating data: invalid object to validate; if you choose to ignore these errors, turn validation off with --validate=false
After a while, I had gotten so good at predicting what the reconciler would do that I stopped checking the logs systematically.
I’d only SSH in if things didn’t magically start working within 30s or so.
Kubernetes controllers love to panic.
You have to go out of your way to properly validate data in Go, so some just.. don’t. It’s probably not as bad in widely used controllers, but this is a systemic issue in the Go ecosystem.
Until March 18, 2025, when things went very, very wrong.
The reformattening
I recently changed my code editor settings to indent all files with 4 spaces instead of 2. Not just Rust, but YAML too.
Upon opening the existing manifests to refresh my memory, I noticed they were indented with two spaces, so I asked “Claude 3.5 Sonnet (Fast Edit)” to give me a command to reformat all of them.
The “(Fast Edit)” mode is, as far as I know, only available in the Zed code editor.
For my money, it’s the best code help from the current set of assistants because it has very limited smarts, but it is very, very fast, so I don’t actually have time to get out of flow state.
It gave me this command:
yq -i -P '.' manifests/**/*.yaml
I know, because it’s in my atuin history. atuin is awesome.
I installed yq, rightly assuming it’s jq but for YAML.
The command looked good to me on the surface, and because this is all happening in a git repository, I figured that if it didn’t work, I could just revert.
I ran it, and checked git status:
infra on main [$] via 🦀 v1.85.0
❯ yq -i -P '.' manifests/**/*.yaml
infra on main [$!] via 🦀 v1.85.0
❯ gwS
On branch main
Your branch is up to date with 'origin/main'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: manifests/cert-manager/300-cert-manager-chart.yaml
no changes added to commit (use "git add" and/or "git commit -a")
I was disappointed to see that it had only touched one file.
I briefly thought about adapting it to use a loop, but then I remembered I’d switched to the fish shell and I don’t remember its looping syntax off the top of my head yet.
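For posterity, the fish version isn’t even that bad; something like this, untested and from memory:
for f in manifests/**/*.yaml
    yq -i -P '.' $f
end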
So I committed the work so far, remembered to run ./deploy-manifests, and then ran zed manifests/**/*.yaml to open them all as tabs — I only had maybe 15, so I figured I’d just repeatedly ask it to format them.
While I was doing that, I happened to have the k3s logs tailing in a window on my second screen.
Because it started scrolling furiously, I took a closer look.
It was trying (and failing, repeatedly) to create… duplicates of resources that already existed?
After frowning at the logs for a bit, I found that they mentioned 300-cert-manager-chart.yaml a lot, and… wait a minute, why is that file 1.75MB?
We’ll talk more about the roles LLMs played in that story later.
Since this isn’t a blameless postmortem, I, cool bear, will be in charge of assigning blame.
I’m observing that:
- Amos did not read the man page for yq before running the command.
- Amos only looked at git status, not git diff — even a very quick look would have shown that something was seriously wrong.
- Amos was under a lot of stress that day, didn’t have enough sleep, and probably should not have been touching infrastructure at all.
Further blame will be assigned later.
No thoughts head weighty
This command did not do what I thought it would:
yq -i -P '.' manifests/**/*.yaml
I’m fully bracing myself for a thousand replies saying we told you so blah blah blah — I know, I know. But let’s take the experiment to its logical conclusion and ask a bunch of LLMs to explain what’s wrong with this command.
GPT-4o
GPT-4o is the default “non-reasoning” model from OpenAI if you have a “Plus” subscription.
At the time of this writing, GPT-4.5 did technically come out, but I don’t know anybody who’s using it seriously. Many people doubt that non-reasoning models will get much smarter than GPT-4o.
Something OpenAI did recently-ish is adding sources to answers.
Explain what this does?
❯ yq -i -P '.' manifests/**/*.yaml
This command uses yq, a command-line YAML processor, to modify YAML files in place with pretty-printing.
Breakdown:
- yq - The YAML processor (like jq for JSON but for YAML).
- -i - Edits the files in place (modifies the original files rather than printing output to stdout).
- -P - Pretty-prints the output (ensures consistent formatting with indentation and spacing).
- '.' - The yq expression. In this case, . represents the entire YAML structure, meaning it reads and rewrites the YAML file as-is without modifying the content itself.
- manifests/**/*.yaml - Targets all YAML files recursively inside the manifests/ directory and its subdirectories.
Effect:
- It reformats all YAML files under manifests/ with consistent indentation and spacing.
- If the YAML files were messy or had inconsistent formatting, they will now be neatly formatted.
- No actual content change happens unless yq has an opinionated reformatting behavior.
This is useful for enforcing consistent YAML formatting across multiple files, especially in projects with many configuration files (e.g., Kubernetes manifests).
GPT-4o didn’t see the problem.
I had to insist.
well, it actually merges all the files into the first argument.
did you know that?
You’re absolutely right! The behavior of yq can be a bit surprising when using -i (in-place editing) with multiple files.
What’s Happening?
- When yq -i -P '.' manifests/**/*.yaml is executed, yq processes all matching YAML files.
- However, -i makes yq write back the result to a single file—specifically, the first file in the glob expansion (manifests/**/*.yaml).
- Instead of formatting each file separately, all files get merged into the first file, effectively overwriting it with a merged version of all YAML contents.
Ah, so it does sort of “understand” what the problem is! I mean, not understand, but…
Well, it then went on to say this:
How to Fix It?
If the goal is to reformat each file separately rather than merging them, use:
yq -i -P '.' manifests/**/*
This ensures yq processes files individually, preventing unwanted merging.
I didn’t want to let it get away with that, so I pressed on:
…did you just give me the same command again?
it still has the same problem.
You’re absolutely right to call that out. I should have been more careful! The issue still persists because the core problem is how yq handles multiple files when using -i.
It eventually gave me two solutions that do work:
Solution using find + xargs
find manifests -type f -name '*.yaml' -print0 | xargs -0 -I{} yq -i -P '.' {}
Solution using a simple for loop
for file in manifests/**/*.yaml; do
yq -i -P '.' "$file"
done
Before we close the LLM parenthesis, let’s check what some other models would say.
Claude 3.7 Sonnet
This command uses the yq tool to process YAML files. Let me break it down:
yq -i -P '.' manifests/**/*.yaml does the following:
- yq is a command-line tool for processing YAML files (similar to how jq processes JSON)
- The flags:
  - -i stands for “in-place” - it modifies the files directly instead of printing to standard output
  - -P stands for “pretty-print” - it formats the YAML with proper indentation and structure
- '.' is the filter expression, which means “the entire document” (essentially, process the whole document without changing its content)
- manifests/**/*.yaml is a glob pattern that matches:
  - All files with .yaml extension
  - Inside any subdirectory (** means “any depth of directories”)
  - Under the manifests directory
In summary, this command reformats and pretty-prints all YAML files within the manifests directory and its subdirectories, updating the files directly with consistent formatting. This is commonly used in Kubernetes or other infrastructure-as-code projects to ensure YAML files are consistently formatted.
Then, when I point out it merges everything into the first file:
You’re absolutely right, and I apologize for my mistake. Thank you for the correction.
Indeed, the command yq -i -P '.' manifests/**/*.yaml does not just pretty-print each file individually - it merges the content of all matched YAML files into the first file in the list. This is a critical distinction!
What’s happening is:
- The shell expands the glob pattern manifests/**/*.yaml into a list of files
- yq takes this list and, because of the -i (in-place) flag, it merges the content of all those files into the first file in the list
- The remaining files are left untouched
This behavior can be very destructive if not intended, as it would combine the contents of potentially many YAML files into just one file, while leaving the others unchanged.
If someone wanted to pretty-print each file individually without merging, they would need to use a different approach, such as a loop:
find manifests -name "*.yaml" -exec yq -i -P '.' {} \;
Thank you for pointing this out - it’s an important distinction that could cause significant problems if misunderstood.
I like Claude’s solution better — there’s no need to go through xargs or worry about IFS at all.
Deepseek R1
The command you provided uses yq
, a command-line YAML processor, to modify YAML files in place. Here’s a breakdown of what it does:
- yq: This is the command-line tool for processing YAML files. It is similar to jq but specifically designed for YAML.
- -i: This flag tells yq to edit the files in place. Without this flag, yq would output the modified YAML to the terminal without saving the changes to the original files.
- -P: This flag tells yq to pretty-print the output. It ensures that the YAML files are formatted in a readable way with proper indentation and structure.
- '.': This is the filter expression. In this case, it simply means “select the entire document.” It doesn’t modify the content but ensures the file is processed and rewritten with the specified formatting.
- manifests/**/*.yaml: This is the file path pattern. It tells yq to process all YAML files (*.yaml) in the manifests directory and its subdirectories (**).
What does this command do?
- It processes all YAML files in the manifests directory and its subdirectories.
- It ensures the YAML files are pretty-printed (formatted nicely).
- It saves the changes back to the original files (in place).
Example
If you have a YAML file like this:
foo: bar
baz:
- qux
- quux
After running the command, the file will be reformatted (if necessary) to ensure proper indentation and structure, but the content will remain the same.
Use Case
This command is useful for cleaning up or standardizing the formatting of YAML files in a directory, especially when working with Kubernetes manifests or other YAML-based configurations.
Still oblivious.
After correction, it goes on at length:
Ah, you’re absolutely correct! I missed that subtle but important detail in my initial explanation. Let me clarify what actually happens when you run the command:
yq -i -P '.' manifests/**/*.yaml
What it actually does:
- manifests/**/*.yaml: This glob pattern matches all YAML files in the manifests directory and its subdirectories.
- yq: When multiple files are passed to yq, it merges them into a single YAML document by default.
- '.': The filter expression '.' means “select the entire document,” but since multiple files are passed, yq merges them into one.
- -i: The -i flag tells yq to write the result back to the first file in the list (in place).
- -P: The -P flag ensures the output is pretty-printed.
Key Behavior:
- Merging: All the YAML files matched by manifests/**/*.yaml are merged into a single YAML document.
- In-place editing: The merged result is written back to the first file in the list (overwriting its content).
- The other files remain unchanged.
DeepSeek R1 thought it important enough to make a whole heading answering my question: did it know about this?
Did I know this?
Yes, I’m aware of this behavior, but I missed it in my initial explanation. Thank you for pointing it out! This merging behavior is a feature of yq
when multiple files are passed, and it’s important to be cautious when using -i
(in-place editing) to avoid accidentally overwriting files.
Use Case
This merging behavior can be useful when you want to combine multiple YAML files into one, such as consolidating Kubernetes manifests or configuration files. However, you should always double-check the results to ensure the merge is what you intended.
Let me know if you’d like further clarification! 😊
Mistral’s Le Chat
The command yq -i -P '.' manifests/**/*.yaml is used to process YAML files with the yq tool. Here’s a breakdown of what each part of the command does:
- yq: This is a command-line YAML processor. It allows you to parse, edit, and manipulate YAML files using a syntax similar to jq, which is used for JSON files.
- -i: This flag stands for “in-place.” It means that the changes made by the yq command will be applied directly to the original files, rather than outputting the results to the standard output or a new file.
- -P: This flag stands for “pretty-print.” It ensures that the output YAML is formatted in a readable way, with proper indentation and line breaks.
- '.': This is the expression that yq will evaluate. In this case, the dot (.) represents the entire YAML document. So, this expression selects the whole content of each YAML file.
- manifests/**/*.yaml: This is a glob pattern that matches all YAML files (*.yaml) in the manifests directory and its subdirectories. The ** part means that it will recursively search through all subdirectories.
Putting it all together, the command yq -i -P '.' manifests/**/*.yaml will:
- Recursively find all YAML files in the manifests directory and its subdirectories.
- Pretty-print the entire content of each YAML file.
- Update each YAML file in place with the pretty-printed version.
This is useful for ensuring that all YAML files in a project have…
Le Chat cites sources from the yq repository, the yq docs frontpage, the Tips, Trick and Troubleshooting part of the yq docs, and a 2021 blog article called Mastering YAML Processing in Command Line.
But of course, it’s still wrong.
You are correct. The command yq -i -P '.' manifests/**/*.yaml
will indeed merge all the YAML files matched by the glob pattern into the first file listed. This behavior is due to how yq
processes multiple files when using the -i
(in-place) flag.
Here’s a more accurate explanation:
yq -i -P '.' manifests/**/*.yaml:
- yq: The command-line YAML processor.
- -i: Update the files in place.
- -P: Pretty-print the output.
- '.': Select the entire content of each YAML file.
- manifests/**/*.yaml: Match all YAML files in the manifests directory and its subdirectories.
When multiple files are specified with the -i
flag, yq
merges the content of all the files into the first file in the list. This means that the content of all YAML files matched by manifests/**/*.yaml
will be merged into the first file in that list, and the result will be pretty-printed and saved in place.
This behavior can be useful for consolidating multiple YAML configuration files into a single file, but it’s important to be aware of this merging effect to avoid unintentionally overwriting data.
Thank you for pointing that out!
Short and to the point — why waste time apologizing, the damage is done.
All of the models are aligned enough to thank me, but of course, my corrections did not make a difference. Those models don’t evolve — maybe the next generation’s training set will include this article, and then we’ll know how to invoke yq.
The aftermath
This isn’t anything new. It’s fun that they all got it wrong, though, even o3-mini-high.
I haven’t tried GPT-4.5, Claude Sonnet 3.7 with the thinking knob turned up, or whatever the flavor of the day is, but I imagine they all get it wrong.
And you know what? I’m kinda siding with the LLMs on that one.
First off, I’m the idiot, right? I’m the one who pressed enter after skimming over the command line, going “ah, that might work”.
Say it with me:
Isn't that a mood.
And second, I actually think that’s poor design in yq.
I foresee forgetting about this quirk and falling into that trap again.
However, I also recognize it’s probably too late to do anything about it: this would be a breaking change. Oh well.
While I’m blaming tools instead of myself, let me keep on critiquing!
From bad to worse
When I saw that the reconciler was busy creating duplicate copies of everything and running into a lot of conflicts…
And I realized that all the duplicates came from this single file…
…I removed the duplicates.
OH BOY.
But what do you think the k3s reconciler thought of that?
Well, if it only reads one file… it would… see that a lot of resources are gone and get busy deleting?
Quite! Very fast, in fact.
Again, what caught my eye in the logs were some errors: it was failing to delete some namespaces.
Which got me thinking: why is it deleting namespaces? And before updown.io could notify me, I knew I was in for a fun day.
At this point, most of my Kubernetes resources were gone.
Deleting deployments deleted pods, which in turn stopped and deleted containers.
Certificates and their secrets were deleted as well, resulting in traefik serving its default certificate.
The services and ingress routes went too, so traefik started returning 404 for any domain and any route. Eventually, traefik itself was gone!
I looked for ways to convince k3s to stop deleting what little was left, and quickly decided it was probably not worth it.
I am prepared for disaster recovery, I have backups of everything, and I had deployments I was barely using anyway — I decided it was time for, uhh, an early spring cleaning.
I decided I would wipe the machine hosting the k3s server and start fresh — after all, how hard could it be?
No matter how prepared…
What follows is a quick account of putting everything back together and the things I learned and discovered along the way.
rsync or swim
The first thing I did was grab everything out of /var/lib/rancher/k3s with rsync, just for safety.
rsync -avz --progress \
root@brat:/var/lib/rancher/k3s \
./var-lib-rancher-k3s
I quickly discovered two things:
- 600K small files were hiding under agent/containerd
- rsync is single-threaded
And rediscovered that, if you have a fast link, compression (at least whatever zlib implementation and setting rsync uses by default) is just holding you back.
I was impatient and eventually switched to:
rsync -av --progress \
--exclude 'agent/containerd' \
root@brat:/var/lib/rancher/k3s \
./var-lib-rancher-k3s
Which was done fairly quickly, despite being 30GB.
I’m really glad this happened at home, where I keep my comfort 2.5Gbps internet connection.
Debian 12 reinstall
After taking a quick look around the S3 console to make sure the backups I was counting on did, in fact, exist, I headed over to Hetzner Robot to enable the rescue system.
Believe it or not, I moved away from ed25519 keys because of a bug in some Go software.
The bad blood runs deep.
You then reboot, wait a bit, ssh back into it, run:
installimage
And then you see a nice ncurses TUI (terminal user interface)!
…unless you’ve switched to Ghostty, in which case it just throws you directly into nano, because Ghostty has its own TERM value of xterm-ghostty and a lot of places don’t have the requisite terminfo.
This is easily fixed by:
infocmp -x | ssh root@brat -- tic -x -
I briefly researched “is any of the RHEL derivatives a good idea for this?” and decided today was exciting enough already.
I have feelings about Debian package management, but really, the applications I ran are all containerized, so the host matters only somewhat.
In fact, Debian has a needlessly large attack surface and there are more suitable distros for that, but, shrug.
I like Hetzner’s installimage thingy. It’s small, it’ll set up RAID 1 for me, it’s okay in my book.
A new k3s cluster
Although it’s entirely possible to export an etcd snapshot to S3 and restore it later, I was going for a “fresh install”, so I decided not to do that.
My poor edge nodes were crying for mommy at this point, so I put them to sleep for the time being by running the very aptly named ansible playbook:
./ansible-playbook playbooks/are-you-sure/k3s-nuke.yaml -l cloud
I could have easily torn down all the VMs altogether and created them anew through OpenTofu, but I didn’t need to mess with that part, and besides, I recently:
- Migrated from Route53 back to GCore (they both do GeoDNS)
- Migrated some PoPs from DigitalOcean to Upcloud then to Hetzner
…so I have seen enough HCL for the calendar year, methinks.
Setting up k3s on the leader node is as easy as:
./ansible-playbook playbooks/k3s.yaml -l brat
Just enough Ansible to be dangerous
Ansible is basically “Python reads YAML to run sudo on machines over SSH”.
RedHat maintains it. I think.
I also think nobody really likes it? But it’s been around long enough that people have made it do pretty much everything you would like to do.
In my case, that’s setting up k3s.
The playbook is defined as:
---
- hosts: k3s_cluster
gather_facts: true
roles:
- role: k3s-prereqs
- role: k3s-download
- role: ssh
- hosts: k3s_leader
roles:
- role: k3s/leader
- hosts: k3s_node
roles:
- role: k3s/node
k3s_leader and k3s_node are groups in the inventory, which is generated automatically from OpenTofu state by a Rust script — hence the ./ansible-playbook wrapper:
# in `./ansible-playbook`
#!/bin/bash -eux
source ./config/ansible-env
ansible-playbook -u root -i ./config/inventory.yaml "$@"
Where ansible-env is:
./gen-inventory.rs
export ANSIBLE_CONFIG=$PWD/config/ansible.cfg
And gen-inventory.rs looks something like:
#!/usr/bin/env -S cargo +nightly -Zscript --quiet
---
[package]
edition = "2021"
[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0", features = ["preserve_order"] }
indexmap = { version = "2.2.6", features = ["serde"] }
---
use serde::Deserialize;
use serde_json::Map;
use indexmap::IndexMap;
use std::net::{Ipv4Addr, Ipv6Addr};
use std::fs::File;
#[derive(Deserialize, Debug)]
struct TfState {
outputs: Outputs,
}
#[derive(Deserialize, Debug)]
struct Outputs {
all_servers: AllServers,
}
#[derive(Deserialize, Debug)]
struct AllServers {
value: IndexMap<String, Server>,
}
#[derive(Deserialize, Debug)]
#[allow(unused)]
#[serde(deny_unknown_fields)]
struct Server {
aws_region: String,
ipv4: Ipv4Addr,
ipv6: Ipv6Addr,
location: Option<String>, // Since this is not present in all nodes, make it optional
provider: String,
node_type: String,
ready: bool,
latlong: (f64, f64),
}
fn main() {
let tfstate: TfState = serde_json::from_reader(File::open("terraform.tfstate").unwrap()).unwrap();
let all_servers = &tfstate.outputs.all_servers.value;
let servers_of_type = |node_type: String| {
all_servers.iter().filter(move |(_k, v)| v.node_type == node_type)
};
let mut children = serde_json::json!({
"k3s_leader": {
"children": {
"dedicated": {}
}
},
"k3s_node": {
"children": {
"cloud": {}
}
},
"k3s_cluster": {
"children": {
"k3s_leader": {},
"k3s_node": {}
}
},
"dedicated": {
"children": (servers_of_type("dedicated".into()).map(|(name, _)| (name.clone(), Map::default())).collect::<IndexMap<_, _>>())
},
"cloud": {
"children": (servers_of_type("cloud".into()).map(|(name, _)| (name.clone(), Map::default())).collect::<IndexMap<_, _>>())
}
});
{
// add individual nodes
let children = children.as_object_mut().unwrap();
for (k, v) in all_servers.iter() {
children.insert(k.clone(), serde_json::json!({
"hosts": {
format!("{}-node", k): {
"ansible_host": v.ipv4.to_string(),
"ipv6": v.ipv6.to_string(),
"region": v.aws_region,
"node_type": v.node_type,
"node_name": k,
"provider": v.provider,
"ansible_user": "root",
"ansible_become": false,
"latlong": v.latlong,
"ansible_python_interpreter": "/usr/bin/python3.11",
}
}
}));
}
}
let inventory = serde_json::json!({
"all": {
"children": children
}
});
let output = serde_json::to_string_pretty(&inventory).unwrap();
let out_path = "config/inventory.yaml";
std::fs::write(out_path, output).unwrap();
println!("Wrote inventory to {out_path}")
}
This uses the unstable -Zscript cargo flag, and I’m honestly thrilled about it.
Weird syntax for a python script but okay.
I won’t bore you with the details of the k3s-leader and k3s-node roles (they are adapted from k3s-ansible, if memory serves), but I will point out, because this fact never ceases to amuse me, that Ansible has a whopping 22 levels of variable precedence.
CA troubles
After setting up both the leader and the nodes this way, the nodes kept complaining:
Mar 18 11:25:40 hawk k3s[174830]: time="2025-03-18T11:25:40Z" level=error
msg="token CA hash does not match the Cluster CA certificate hash: de13... != d262..."
A k3s server has its own certificate authority — when it bootstraps, it generates a keypair for it, and, well, the new install had a different cert than the old.
…but where was the old cert’s CA hash / token specified anyway?
In an ansible variable somewhere. Apparently I didn’t have enough faith to figure out how to pull out /var/lib/rancher/k3s/server/node-token from the leader and feed it to the nodes, so I just copied it manually into an ansible variable.
That variable is committed to the infra repo but encrypted via git-crypt, a compromise I’m happy with.
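If you haven’t used git-crypt: you mark paths in .gitattributes and it transparently encrypts them on commit. The entry looks roughly like this (the path here is hypothetical, not my actual layout):
# .gitattributes
group_vars/secrets/** filter=git-crypt diff=git-crypt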
However, by mistake, the k3s-leader role also installed that file (the old value), so reading the node-token file fresh off the server didn’t help.
GPT-4o did though, letting me know that the structure of node-token was:
K10<token-ca-hash>::server:<random-token>
And that I just needed to replace what was between the K10 and the :: with the failed assertion value. Same way I write AUR packages and Homebrew formulas 😬.
I actually found out after the fact that GPT-4o explained that to me.
At the time, I figured it out by looking at the hashes — skipping over “K10” took some focus, but I did it. Please clap.
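For future me: as far as I can tell, that CA hash is just the SHA-256 of the cluster CA certificate, so something like this should reproduce it on the server (paths from memory, verify before trusting):
# should match the hex between "K10" and "::" in node-token
sha256sum /var/lib/rancher/k3s/server/tls/server-ca.crt
cat /var/lib/rancher/k3s/server/node-token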
This took an embarrassingly long time, but eventually, I had the nodes connected to the central k3s server.
Bringing back the essentials
Traefik v3
After that, I took some time to bring services back, starting from the most essential one: traefik.
Wait, doesn’t-
Yes, k3s does come with Traefik, but they ship v2. I want non-experimental HTTP/3 support, so I disable their version of Traefik; that’s what that line in deploy-manifests did:
touch /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip
And I add this manifest to grab traefik v3:
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
name: traefik
namespace: traefik
spec:
repo: https://traefik.github.io/charts
chart: traefik
version: 34.4.1
valuesContent: |
image:
repository: "traefik"
tag: "v3.3.4"
deployment:
kind: DaemonSet
logs:
general:
level: "INFO"
hostNetwork: true
# ✂️
Cert-manager
cert-manager lets you provision TLS certificates through Let’s Encrypt (among others); again, it’s just one HelmChart resource away:
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
name: cert-manager
namespace: kube-system
spec:
repo: https://charts.jetstack.io
chart: cert-manager
version: 1.17.1
valuesContent: |
crds:
enabled: true
config:
apiVersion: controller.config.cert-manager.io/v1alpha1
kind: ControllerConfiguration
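The chart alone doesn’t issue anything; you also need an issuer, which is the 900-cert-manager-issuer-letsencrypt-prod.yaml from the tree earlier. It looks roughly like this (the email and solver details are placeholders, not my exact config):
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: certs@example.org # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: traefik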
Minio
Sometimes you want object storage but you don’t need S3’s durability guarantee.
My dedicated server comes with 2x512GB of SSD storage; it’d be a shame not to use it:
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
name: minio
namespace: minio
spec:
repo: https://charts.bitnami.com/bitnami
chart: minio
version: 12.13.1
valuesContent: |
auth:
existingSecret: minio
nodeSelector:
kubernetes.io/hostname: brat
persistence:
enabled: true
size: 250Gi
k8up
k8up allows backing up folders using restic.
Again, there’s a helm chart we can use to enjoy someone else’s default configuration:
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
name: k8up
namespace: k8up
spec:
repo: https://k8up-io.github.io/k8up
chart: k8up
version: 4.8.4
valuesContent: |
replicaCount: 1
nodeSelector:
kubernetes.io/hostname: brat
k8up:
enableLeaderElection: false
metrics:
serviceMonitor:
enabled: true
additionalLabels:
release: kube-prometheus-stack
prometheusRule:
enabled: true
additionalLabels:
release: kube-prometheus-stack
Reflector
I’ve noticed you deploy various services to different namespaces, any reason?
Yes! I like things to be neatly separated, although my understanding is that at my scale it’s kinda ridiculous to bother.
Personally I like to be able to make a mess in one namespace, and delete it forcefully after I’m done, making sure that every resource in that namespace is gone for good.
Which causes one notable problem: you can’t read secrets from another namespace.
Hence, reflector:
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
name: reflector
namespace: kube-system
spec:
repo: https://emberstack.github.io/helm-charts
chart: reflector
version: 9.0.322
Now, when defining secrets, you can specify in which namespaces they’re “reflected”:
---
kind: Secret
apiVersion: v1
metadata:
name: s3-credentials
namespace: default
annotations:
reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "umami,forgejo,k8up"
reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "umami,forgejo,k8up"
type: Opaque
stringData:
ACCESS_KEY: REDACTED
SECRET_KEY: REDACTED
No, seriously, k9s is really cool.
CloudNativePG
Running a database in Kubernetes is annoying because… databases are stateful. Very much so. Of all the Postgres controllers I’ve tried, cnpg is the one that’s gotten the least in my way.
It has, of course, a helm chart, and I want the controller to run on my dedicated node (brat, if you hadn’t picked up by now):
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
name: cnpg
namespace: cnpg
spec:
repo: https://cloudnative-pg.io/charts/
chart: cloudnative-pg
version: 0.23.2
valuesContent: |
nodeSelector:
kubernetes.io/hostname: brat
Bringing forgejo back
Forgejo is annoying, because it wants object storage (which lives in actual Amazon S3, since I want it to be persistent), and it wants local storage, for git repositories and whatnot.
Postgres
First off, I needed to restore the Postgres database from backup — easy peasy:
---
kind: Cluster
apiVersion: postgresql.cnpg.io/v1
metadata:
name: forgejo-db
namespace: forgejo
labels:
cnpg.io/reload: "true"
spec:
instances: 1
imageName: ghcr.io/cloudnative-pg/postgresql:16 # Specify PostgreSQL 16 image
primaryUpdateStrategy: unsupervised
affinity:
nodeSelector:
kubernetes.io/hostname: brat
storage:
size: 10Gi
pvcTemplate:
metadata:
annotations:
# do not back up the db volume with k8up (we back it up
# with barman, see below)
k8up.io/backup: "false"
nodeSelector:
kubernetes.io/hostname: brat
bootstrap:
# we're doing disaster recovery aw yiss
recovery:
source: cluster-backup
externalClusters:
- name: cluster-backup
barmanObjectStore:
destinationPath: "s3://bearcove-cnpg-backups/forgejo/brat-1/"
serverName: forgejo-db
wal:
compression: snappy
data:
compression: snappy
s3Credentials:
accessKeyId:
name: s3-credentials
key: ACCESS_KEY
secretAccessKey:
name: s3-credentials
key: SECRET_KEY
backup:
barmanObjectStore:
# pro-tip: this needs to be a different path, otherwise it's very unhappy at you.
destinationPath: "s3://bearcove-cnpg-backups/forgejo/brat-2025-03-18/"
wal:
compression: snappy
data:
compression: snappy
s3Credentials:
accessKeyId:
name: s3-credentials
key: ACCESS_KEY
secretAccessKey:
name: s3-credentials
key: SECRET_KEY
retentionPolicy: "30d"
I too recoil in horror when looking at this amount of YAML, but… it does the trick!
It’s also really annoying to read the cnpg 1.25 docs — I’m sure MkDocs is much better than nothing, I just think we needn’t stop there.
Anyway — this queues up a restore job (you can view those in k9s with :jobs); you can keep an eye on its logs from k9s as well, and eventually, the data’s back.
Persistent volume restore
This one, believe it or not, was a lot more trouble?
Kubernetes really wants you to think of storage, and compute, and memory, as fungible resources. Say how much you want, not where.
Well, you can say where, but it’s annoying.
For example, k3s provides a local-path storage class:
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: forgejo
namespace: forgejo
annotations:
k8up.io/backup: "true"
spec:
accessModes:
- ReadWriteOnce
storageClassName: local-path
resources:
requests:
storage: 20Gi
And what it does is provision a volume under /var/lib/rancher/k3s/storage — as a regular folder.
root@brat /var/lib/rancher/k3s/storage # ls -lhA
total 20K
drwxrwsrwx 3 root 1001 4.0K Mar 18 14:08 pvc-02c60622-d5c9-4dc5-8ea8-22a51eee0b83_minio_minio
drwxrwsrwx 3 root tape 4.0K Mar 28 11:33 pvc-1d42f27a-b955-49ec-9f10-42ed222602f4_umami_umami-db-pg17-1
drwxrwsrwx 3 root tape 4.0K Mar 18 14:32 pvc-2a1953cf-8867-4980-8b59-56ff2ed6411c_forgejo_forgejo-db-1
drwxrwsrwx 4 root amos 4.0K Mar 18 15:31 pvc-a79bde12-25c0-40b9-98c3-ab16a6d12afa_forgejo_forgejo
drwxrwsrwx 3 root tape 4.0K Mar 28 11:49 pvc-bfa94fbd-4105-4860-aa19-06b95d7dd573_forgejo_forgejo-db-pg17-1
This is “odd” merely because the common case is to do “managed” k8s — you pay someone to run your cluster. You ask for compute and they bill you very precisely for what you use. And they have volumes, of course — so when you ask for a volume, it’s created using their volume provider.
But when you run k3s on a bunch of VMs, you don’t have that. And you can’t attach, for example, a DigitalOcean volume to a Hetzner VM — doesn’t work.
So either you roll with something like longhorn or ceph (probably via rook), for which you really honestly want at least 3 (preferably 5) dedicated servers that have similar specs, in the same datacenter, etc. — or you just use local storage because, well, it suits your use case perfectly.
Problem is — the persistent volume claim doesn’t get “provisioned” (not sure what the proper term here is) until it’s actually needed by something — like a pod.
And if the first pod that needs it is a k8up restore pod (which runs restic), and that restore pod just happens to run on one of your small edge nodes rather than your big and round dedicated server, then…
Oh gosh — the volume ends up in the wrong place?
Correctamundo! Now you see why I’m used to deleting and re-creating stuff.
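For what it’s worth, that lazy provisioning comes from the storage class’s volume binding mode; assuming the stock k3s local-path provisioner, this should confirm it:
kubectl get storageclass local-path -o jsonpath='{.volumeBindingMode}'
# expected: WaitForFirstConsumer (the volume is only created once a pod using the PVC is scheduled)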
It’s also why, when I started redeploying everything, I started splitting it into files starting with three digits, to know in which order things should happen:
infra/manifests/forgejo on main [$]
❯ l
Permissions Size User Date Modified Name
.rw-r--r--@ 115 amos 18 Mar 18:27 000-forgejo-namespace.yaml
.rw-r--r--@ 4.8k amos 28 Mar 13:27 001-forgejo-config-secret.yaml
.rw-r--r--@ 4.2k amos 28 Mar 16:40 100-forgejo-db-cluster.yaml
.rw-r--r--@ 643 amos 28 Mar 13:27 101-forgejo-db-backups.yaml
.rw-r--r--@ 927 amos 18 Mar 18:27 200-forgejo-persistent-volumes.yaml
.rw-r--r--@ 161 amos 18 Mar 18:27 201-forgejo-backup-secrets.yaml
.rw-r--r--@ 1.4k amos 18 Mar 18:27 202-forgejo-backups.yaml
.rw-r--r--@ 1.2k amos 18 Mar 18:27 203-forgejo-backup-schedule.yaml
.rw-r--r--@ 1.5k amos 23 Mar 16:57 300-forgejo-deployment.yaml
.rw-r--r--@ 1.2k amos 23 Mar 16:57 400-forgejo-ingress.yaml
.rw-r--r-- 33 amos 18 Mar 17:14 README.md
k3s doesn’t actually interpret that ordering; I simply staged them under old-manifests/ and manually copied them into manifests/, applying them one by one.
Okay, but — what about the volume on the wrong node?
Right, well! We just force the volume to be provisioned where we want, by creating a dummy pod!
---
## This forces the volume to be created on node 'brat'
apiVersion: v1
kind: Pod
metadata:
name: forgejo-ls2
namespace: forgejo
spec:
restartPolicy: Never
securityContext:
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
containers:
- name: ls-container
image: busybox
command: ["ls", "-lhA", "/workdir"]
volumeMounts:
- name: forgejo-workdir
mountPath: /workdir
nodeSelector:
kubernetes.io/hostname: brat
volumes:
- name: forgejo-workdir
persistentVolumeClaim:
claimName: forgejo
Is this the stupidest shit ever? Probably. Does it work? Hell yeah.
As for actually restoring from k8up, that was hard too.
For some reason, I use a rootless image of forgejo, and that means that, well, inside the container, we’re not root:
~
❯ kubectl exec forgejo-f9dd988c4-9svkx -n forgejo -it -- /bin/bash
forgejo-f9dd988c4-9svkx:/var/lib/gitea$ whoami
git
forgejo-f9dd988c4-9svkx:/var/lib/gitea$ id
uid=1000(git) gid=1000(git) groups=1000(git)
forgejo-f9dd988c4-9svkx:/var/lib/gitea$
exit
We’re user git, with UID 1000.
So, if we do a normal k8up restore, we’ll end up with a bunch of files owned by root, which git won’t be able to read, and that’s no good!
Now, you are able to specify a podSecurityContext so that the restore pod (the one that runs restic) also runs as the same user:
apiVersion: k8up.io/v1
kind: Restore
# ✂️
spec:
podSecurityContext:
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
fsGroupChangePolicy: "OnRootMismatch"
# ✂️
But then — amazingly — it fails to write to its own cache dir!!!
The way restic works, from what I can gather, is that it starts by making a whole bunch of S3 GET requests to figure out which backups were made, what they contain, etc.
It stores all those in its local cache, on disk, because it refers back to that data a lot during the restore process.
If the cache isn’t writable, it still works, it’s just… very, very, VERY slow.
And I’m assuming, costly?
I think I would’ve died of old age first. Anyway, if you set RESTIC_CACHE_DIR to a path you can write to as a non-root user, then it works fine. Oh, also if you give it a bunch more CPU and memory than the default:
---
##############################################################################
# k8up backup/restore setup
##############################################################################
apiVersion: v1
kind: ConfigMap
metadata:
name: restic-vars
namespace: forgejo
data:
RESTIC_CACHE_DIR: /tmp/restic-cache
---
apiVersion: k8up.io/v1
kind: Restore
metadata:
name: restore-workdir-2024-03-18-b
namespace: forgejo
spec:
snapshot: e9e9a75d # last `/data/forgejo` backup
podSecurityContext:
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
fsGroupChangePolicy: "OnRootMismatch"
restoreMethod:
folder:
# restore to PVC forgejo
claimName: forgejo
backend:
repoPasswordSecretRef:
name: backup-repo
key: password
resources:
requests:
cpu: 10
memory: 1Gi
limits:
cpu: 10
memory: 4Gi
envFrom:
- configMapRef:
name: restic-vars
s3:
bucket: bearcove-k8up-backups
endpoint: https://s3.eu-central-1.amazonaws.com
accessKeyIDSecretRef:
name: s3-credentials
key: ACCESS_KEY
secretAccessKeySecretRef:
name: s3-credentials
key: SECRET_KEY
Deployment, service, ingress
I’m not going to go into details for all these, because we’ve spent our quota of YAML for the year and it’s only March.
I guess I can show one combo for forgejo, since it’s not that complicated.
---
kind: Service
apiVersion: v1
metadata:
name: forgejo
namespace: forgejo
spec:
selector:
app: forgejo
ipFamilyPolicy: RequireDualStack
ports:
- protocol: TCP
port: 80
targetPort: http
This is the service: it makes it so anyone in the cluster can connect to forgejo.forgejo.svc.cluster.local:80 (or just forgejo:80 from within the namespace).
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: tls
namespace: forgejo
spec:
secretName: tls-secret
issuerRef:
name: letsencrypt-prod
kind: ClusterIssuer
dnsNames: [redacted.example.org]
This requests a TLS certificate for my forgejo instance’s domain.
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: request-body-limit
namespace: forgejo
spec:
buffering:
maxRequestBodyBytes: 1073741824 # 1 GiB
memRequestBodyBytes: 67108864 # 64 MiB
This raises the max POST limit, since apparently that’s an issue when uploading “generic” packages to a forgejo registry 🤷
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: forgejo
namespace: forgejo
spec:
entryPoints:
- websecure
routes:
- match: Host(`redacted.example.org`)
kind: Rule
services:
- name: forgejo
port: 80
middlewares:
- name: request-body-limit
namespace: forgejo
tls:
secretName: tls-secret
And this is the ingress route!
Mhh this feels a little verbose, and a little ingress-specific, even for k8s?
Yeah — you can do shorter with labels (but I could never figure it out), and most people actually use nginx for ingress, but of C and Go, I’ll take the lesser evil.
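For reference, the shorter route would be a standard Ingress resource with a couple of Traefik annotations. A sketch I haven’t actually verified end to end:
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: forgejo
  namespace: forgejo
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  rules:
    - host: redacted.example.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: forgejo
                port:
                  number: 80
  tls:
    - hosts: [redacted.example.org]
      secretName: tls-secret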
Bringing home back
The software that powers my website, currently named home, is also deployed through k3s.
It doesn’t have a persistent database: assets (images, videos, etc.) are in object storage, there’s a central “mom” service that has one sqlite database per tenant — that database remembers which assets were already uploaded to object storage, which revision is the latest, and the list of sponsors fetched from GitHub/Patreon.
That’s about it:
################################################################################
# 🐻 MOM DEPLOYMENT
################################################################################
---
kind: Deployment
apiVersion: apps/v1
metadata:
name: mom
namespace: home
labels:
group: home
app: mom
spec:
replicas: 1
selector:
matchLabels:
app: mom
template:
metadata:
labels:
group: home
app: mom
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values:
- dedicated
imagePullSecrets:
- name: forgejo-docker-pull-secrets
containers:
- name: mom
image: redacted.example.org/bearcove/home:32.2.4
command: ["home", "mom"]
workingDir: /var/lib/home
envFrom:
- secretRef:
name: home-vars
- secretRef:
name: home-conf
env:
- name: RUST_LOG
value: "info"
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
ports:
- containerPort: 1118
name: http
readinessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 1
periodSeconds: 1
resources:
requests:
memory: "400Mi"
cpu: "0.5"
limits:
memory: "16000Mi"
cpu: "20"
volumeMounts:
- name: mom
mountPath: /var/lib/home
- name: metadata
mountPath: /metadata
readOnly: true
volumes:
- name: mom
persistentVolumeClaim:
claimName: mom
- name: metadata
hostPath:
path: /metadata
type: DirectoryOrCreate
As for edge nodes, they’re named “cubs”, and the deployment manifest is fun — it deploys one pod per edge node, while allowing for some pods to be down, and, while that’s the case, routing to the closest node.
################################################################################
# 🧸 CUB DEPLOYMENT
################################################################################
---
kind: Deployment
apiVersion: apps/v1
metadata:
name: cub
namespace: home
labels:
group: home
app: cub
spec:
replicas: 6 # 1 dedicated + 5 edge nodes
topologySpreadConstraints:
# Define topology spread constraints for the deployment
- maxSkew: 1 # defines the maximum skew between the number of pods in different topology domains
# Use zone as the topology key
topologyKey: topology.kubernetes.io/zone
# Allow scheduling even if constraints are not met
whenUnsatisfiable: ScheduleAnyway
# Specify the label selector for the pods
labelSelector:
matchLabels:
# Match pods with the label app: cub
app: cub
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: node-type
operator: In
values:
- cloud
topologyKey: "kubernetes.io/hostname"
selector:
matchLabels:
app: cub
template:
metadata:
labels:
app: cub
group: home
spec:
imagePullSecrets:
- name: forgejo-docker-pull-secrets
containers:
- name: cub
image: redacted.example.org/bearcove/home:32.2.4
command: ["home", "serve"]
envFrom:
- secretRef:
name: home-vars
- secretRef:
name: home-conf
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
ports:
- containerPort: 1111
name: http
# define readiness probe: must serve HTTP on port 1111
readinessProbe:
httpGet:
path: /
port: http
httpHeaders:
- name: x-forwarded-host
value: fasterthanli.me
initialDelaySeconds: 1
periodSeconds: 1
resources:
requests:
memory: "400Mi"
cpu: "0.5"
limits:
memory: "1200Mi"
cpu: "12"
volumeMounts:
- name: cub
mountPath: /var/lib/home
- name: metadata
mountPath: /metadata
readOnly: true
volumes:
- name: cub
persistentVolumeClaim:
claimName: cub
- name: metadata
hostPath:
path: /metadata
type: DirectoryOrCreate
Zero-downtime deploys
This is nice, because it allows zero-downtime deploys.
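Part of why that works: the Deployment’s default RollingUpdate strategy only retires old pods once their replacements pass the readiness probe above. Spelled out explicitly, it would look something like this (a sketch; my actual manifests just rely on the defaults):
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1 # take down at most one cub at a time
      maxSurge: 1 # allow one extra pod during the rollout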
When everything is fine, here’s what the home namespace looks like:
All the cub pods are healthy:
Now let’s say I cause trouble by pushing a change that crashes home on purpose:
home on main via 🦀 v1.85.1
❯ g show
commit 984d0f5b7fa5cff9172ea1bc16091b1fbc6bed6a (HEAD -> main, origin/main, origin/HEAD)
Author: Amos Wenger <amos@bearcove.eu>
Date: Fri Mar 28 17:35:46 2025 +0100
crash on purpose
diff --git a/crates/home/src/main.rs b/crates/home/src/main.rs
index 119e3e91..959fa414 100644
--- a/crates/home/src/main.rs
+++ b/crates/home/src/main.rs
@@ -30,6 +30,8 @@ async fn real_main() -> eyre::Result<()> {
errhandling::load().install();
tracingsub::load().install();
+ panic!("woopsie doopsie I'm doing a demo");
+
let args = clap::load().parse();
let res = match args.sub {
Ah, uh, no:
Okay, FINE, let’s push a change that… only crashes on serve, and only if we’re in production.
home on main [!] via 🦀 v1.85.1
❯ gwd
crates/mod-cub/src/lib.rs --- Rust
8 #[dylo::export]
9 impl Mod for ModImpl {
10 fn serve(&self, config: Config) -> BoxFuture<'static, Result<()>> {
11 Box::pin(async {
12 if std::env::var("KUBERNETES_SERVICE_HOST").is_ok() {
13 panic!("is this a container? LET ME OUT! LEMMEOUT");
14 }
15
16 impls::serve(config)
17 .await
18 .map_err(|e| noteyre::eyre!("{}", e))
This time the pipelines pass, and I get a chance to show some of the automation I’ve been doing recently:
home on main via 🦀 v1.85.1
❯ bd bump
Fetched all tags from remote.
Latest tag: v32.2.4
Choose version bump type:
1. Patch (32.2.5)
2. Minor (32.3.0)
3. Major (33.0.0)
1
Creating new tag: v32.2.5
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
To https://redacted.example.org/bearcove/home
* [new tag] v32.2.5 -> v32.2.5
Tag v32.2.5 created and pushed successfully
bd stands for beardist, by the way!
Those -build jobs pushed a generic package:
…and some homebrew tap was updated automatically (that’s what the trigger-formula-update job was about).
Along with a container image:
That one isn’t deployed automatically — I like to be the one to give the greenlight, also via beardist.
The k8s subcommand first identifies where the given image is referenced:
infra on main [$] via 🦀 v1.85.0
❯ bd k8s bearcove/home
Searching for manifests in: manifests
YAML files containing 'bearcove/home' are:
File: manifests/home/301-home-cub-deployment.yaml
Version 32.2.4 at positions 1761 to 1808
Context:
>>> - name: cub
>>> image: redacted.example.org/bearcove/home:32.2.4
>>> command: ["home", "serve"]
>>> envFrom:
>>> - secretRef:
File: manifests/home/300-home-mom-deployment.yaml
Version 32.2.4 at positions 1074 to 1121
Context:
>>> - name: mom
>>> image: redacted.example.org/bearcove/home:32.2.4
>>> command: ["home", "mom"]
>>> workingDir: /var/lib/home
>>> envFrom:
And then, it repeatedly polls the forgejo instance, waiting for a ‘different’ version to pop up:
Initializing Forgejo client...
Checking for new versions...
Fetching latest version for package 'home' from 'https://redacted.example.org/api/v1/packages/bearcove'
Request completed in 385ms with status 200 OK
Received 13 packages in response
Filtered to 12 matching packages
Found 11 valid versions
Latest version found: 32.2.5
New version detected: 32.2.5
At which point, it updates the manifests:
Updating manifests...
Updated manifests/home/301-home-cub-deployment.yaml
Updated manifests/home/300-home-mom-deployment.yaml
Staging changes...
Showing staged changes:
manifests/home/300-home-mom-deployment.yaml --- YAML
33 imagePullSecrets:
34 - name: forgejo-docker-pull-secrets
35 containers:
36 - name: mom
37 image: redacted.example.org/bearcove/home:32.2.4
37 image: redacted.example.org/bearcove/home:32.2.5
38 command: ["home", "mom"]
39 workingDir: /var/lib/home
40 envFrom:
41 - secretRef:
manifests/home/301-home-cub-deployment.yaml --- YAML
48 imagePullSecrets:
49 - name: forgejo-docker-pull-secrets
50 containers:
51 - name: cub
52 image: redacted.example.org/bearcove/home:32.2.4
52 image: redacted.example.org/bearcove/home:32.2.5
53 command: ["home", "serve"]
54 envFrom:
55 - secretRef:
56 name: home-vars
Commits and pushes:
Committing changes...
> bearcove-infra@1.0.0 lint-staged
> lint-staged
→ No staged files match any configured task.
[main d37d8eb] bump bearcove/home to 32.2.5
2 files changed, 2 insertions(+), 2 deletions(-)
Pushing changes...
Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.17 KiB | 1.17 MiB/s, done.
Total 6 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 4 local objects.
To https://github.com/bearcove/infra.git
89bddc6..d37d8eb main -> main
And calls ./deploy-manifests, which learned a couple new tricks since last time:
Deploying manifests...
🔍 Performing dry run...
Source: ./manifests/
Destination: root@brat.bearcove.cloud:/var/lib/rancher/k3s/server/manifests/custom/
==================================================
🚨 REVIEW THIS CAREFULLY 🚨
==================================================
The following changes will be made:
==================================================
<fc.T.... home/300-home-mom-deployment.yaml
<fc.T.... home/301-home-cub-deployment.yaml
==================================================
Please review the above changes carefully before proceeding.
==================================================
🚨 Warning: This will perform the changes above.
Are you sure you want to continue? (y/n) y
🔧 Performing operations...
📁 Creating skip file: /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip
🔄 Syncing files...
📜 Viewing logs...
✂️
building file list ...
60 files to consider
home/300-home-mom-deployment.yaml
2.92K 100% 2.12MB/s 0:00:00 (xfer#1, to-check=29/60)
home/301-home-cub-deployment.yaml
3.71K 100% 3.53MB/s 0:00:00 (xfer#2, to-check=28/60)
sent 4.41K bytes received 130 bytes 3.02K bytes/sec
total size is 1.07M speedup is 236.91
By “learned a couple new tricks”, I mostly mean “I figured out the right set of flags to tell rsync to show what it’s actually going to do”:
# in `deploy-manifests`
#!/bin/bash
set -euo pipefail  # fail on errors, unset variables, and pipeline failures
SRC="./manifests/"
DST="root@brat.bearcove.cloud:/var/lib/rancher/k3s/server/manifests/custom/"
# Define rsync flags
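# The include/exclude trio below means: descend into every directory, transfer
# only *.yaml files, and ignore everything else. Combined with --delete, stray
# .yaml files on the server get removed, while excluded files are left alone.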
RSYNC_FLAGS=(--recursive --delete --checksum --human-readable --progress --include='*/' --include='*.yaml' --exclude='*')
# Perform a dry run of rsync to show what would happen
echo "🔍 Performing dry run..."
printf "Source: \033[33m%s\033[0m\n" "$SRC"
printf "Destination: \033[33m%s\033[0m\n" "$DST"
printf "\033[2m==================================================\033[0m\n"
printf "\033[2m🚨 REVIEW THIS CAREFULLY 🚨\033[0m\n"
printf "\033[2m==================================================\033[0m\n"
printf "\033[2mThe following changes will be made:\033[0m\n"
printf "\033[2m==================================================\033[0m\n"
rsync "${RSYNC_FLAGS[@]}" --dry-run --itemize-changes "$SRC" "$DST"
printf "\033[2m==================================================\033[0m\n"
printf "\033[2mPlease review the above changes carefully before proceeding.\033[0m\n"
printf "\033[2m==================================================\033[0m\n"
# Ask for consent before continuing
printf "\n🚨 \033[1;31mWarning:\033[0m This will perform the changes above.\n"
read -p "Are you sure you want to continue? (y/n) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]
then
printf "❌ \033[1;31mOperation cancelled.\033[0m\n"
exit 1
fi
# The actual operations
echo "🔧 Performing operations..."
printf "📁 Creating skip file: \033[33m/var/lib/rancher/k3s/server/manifests/traefik.yaml.skip\033[0m\n"
ssh root@brat "touch /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip" &
echo "🔄 Syncing files..."
rsync "${RSYNC_FLAGS[@]}" "$SRC" "$DST" &
echo "📜 Viewing logs..."
ssh -t root@brat "journalctl -fxu k3s | ccze -A"
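In case the <fc.T.... lines in the dry run read like line noise: that’s rsync’s --itemize-changes notation. As far as I can tell, with this particular set of flags it decodes roughly as:
# <     the file is being sent to the remote side
# f     it is a regular file
# c     the checksum differs (we compare with --checksum)
# .     the size is unchanged
# T     the modification time will be set to the transfer time (we don't pass --times)
# ....  the remaining attributes (permissions, owner, group, ...) are unchanged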
Anyway, after that’s done, and after the k3s reconciler wakes up, we’re looking at this:
More precisely (one Enter key later), this:
Two more Enter presses show us the logs:
But my site is still up!
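If you’d rather poke at the crashing pod with kubectl than squint at journalctl, the usual triage looks like this (a generic sketch, with the pod name lifted from the k3s logs you’ll see a bit further down):
# generic CrashLoopBackOff triage, nothing beardist-specific
kubectl -n home get pods                              # spot the crashing pod
kubectl -n home describe pod cub-764bfbdfd-pmst4      # events and last container state
kubectl -n home logs cub-764bfbdfd-pmst4 --previous   # output of the crashed container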
At this point, I can either roll back:
infra on main [$] via 🦀 v1.85.0
❯ g revert d37d8eb5fdf101dfb70427a845fee8924c058dad
[main d9602ff] Revert "bump bearcove/home to 32.2.5"
2 files changed, 2 insertions(+), 2 deletions(-)
infra on main [$⇡] via 🦀 v1.85.0
❯ gp
Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.23 KiB | 1.23 MiB/s, done.
Total 6 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 4 local objects.
To https://github.com/bearcove/infra.git
35d0771..d9602ff main -> main
infra on main [$] via 🦀 v1.85.0
❯ ./deploy-manifests
🔍 Performing dry run...
Source: ./manifests/
Destination: root@brat.bearcove.cloud:/var/lib/rancher/k3s/server/manifests/custom/
==================================================
🚨 REVIEW THIS CAREFULLY 🚨
==================================================
The following changes will be made:
==================================================
<fc.T.... home/300-home-mom-deployment.yaml
<fc.T.... home/301-home-cub-deployment.yaml
==================================================
Please review the above changes carefully before proceeding.
==================================================
🚨 Warning: This will perform the changes above.
Are you sure you want to continue? (y/n) y
🔧 Performing operations...
📁 Creating skip file: /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip
🔄 Syncing files...
📜 Viewing logs...
Mar 28 18:21:43 brat k3s[3054412]: I0328 18:21:43.178939 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
Mar 28 18:21:43 brat k3s[3054412]: E0328 18:21:43.179199 3054412 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cub\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cub pod=cub-764bfbdfd-pmst4_home(d8884c82-0983-476a-ae97-ef6a89de2e08)\"" pod="home/cub-764bfbdfd-pmst4" podUID="d8884c82-0983-476a-ae97-ef6a89de2e08"
Mar 28 18:21:46 brat k3s[3054412]: I0328 18:21:46.188632 3054412 range_allocator.go:247] "Successfully synced" key="brat"
Mar 28 18:21:57 brat k3s[3054412]: I0328 18:21:57.179142 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
Mar 28 18:21:57 brat k3s[3054412]: E0328 18:21:57.179508 3054412 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cub\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cub pod=cub-764bfbdfd-pmst4_home(d8884c82-0983-476a-ae97-ef6a89de2e08)\"" pod="home/cub-764bfbdfd-pmst4" podUID="d8884c82-0983-476a-ae97-ef6a89de2e08"
Mar 28 18:22:09 brat k3s[3054412]: I0328 18:22:09.179891 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
Mar 28 18:22:09 brat k3s[3054412]: E0328 18:22:09.180186 3054412 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cub\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cub pod=cub-764bfbdfd-pmst4_home(d8884c82-0983-476a-ae97-ef6a89de2e08)\"" pod="home/cub-764bfbdfd-pmst4" podUID="d8884c82-0983-476a-ae97-ef6a89de2e08"
Mar 28 18:22:21 brat k3s[3054412]: I0328 18:22:21.178435 3054412 scope.go:117] "RemoveContainer" containerID="125d2b6f93b70a5d2ff61c941c1e44128d6d847e82b7f526a8a6609caf970e11"
building file list ...
60 files to consider
home/300-home-mom-deployment.yaml
2.92K 100% 2.12MB/s 0:00:00 (xfer#1, to-check=29/60)
home/301-home-cub-deployment.yaml
3.71K 100% 3.53MB/s 0:00:00 (xfer#2, to-check=28/60)
sent 4.41K bytes received 130 bytes 3.02K bytes/sec
total size is 1.07M speedup is 236.91
Or I could just push a fix, and roll forward!
Which I did, right after writing this.
Closing words
So, how did this disaster recovery go? Pretty well, all in all.
I didn’t need to tear down my control node and rebuild it from scratch, nor did I need to carefully reorganize all my manifests.
But I was already down, and it’s been a shit month, so I decided to treat myself.
The deploy-manifests script, even in its latest incarnation, still blows.
A better version would merge all resources into a single YAML file and compare against what the control node already has, requiring --fuck-me-up if more than 2 resources would be deleted, for example.
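Something in this spirit, maybe. A very rough sketch: it compares resource identities instead of truly merging the YAML, it assumes yq (the Go one) is installed, and none of it exists in the real script:
#!/bin/bash
# sketch only: refuse to delete more than 2 resources without an extra flag
set -euo pipefail

SRC="./manifests/"
DST="root@brat.bearcove.cloud:/var/lib/rancher/k3s/server/manifests/custom/"
REMOTE_COPY="$(mktemp -d)"

# pull a read-only copy of what the control node currently has
rsync --recursive --include='*/' --include='*.yaml' --exclude='*' "$DST" "$REMOTE_COPY/"

# list every resource in a directory as "kind/namespace/name", one per line
resource_ids() {
  find "$1" -name '*.yaml' -print0 |
    xargs -0 yq e -N '[.kind, .metadata.namespace // "default", .metadata.name] | join("/")' |
    sort -u
}

# resources present on the control node but absent locally would get deleted
deleted=$(comm -23 <(resource_ids "$REMOTE_COPY") <(resource_ids "$SRC") | grep -c . || true)

if (( deleted > 2 )) && [[ "${1:-}" != "--fuck-me-up" ]]; then
  printf '❌ this would delete %d resources; pass --fuck-me-up if you really mean it\n' "$deleted" >&2
  exit 1
fi
printf '✅ %d resource(s) would be deleted, carrying on\n' "$deleted"
# ...then sync the manifests over, same as today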
But… that would be making a faster horse, right? I probably do want a proper continuous deployment (CD) solution. Something that can do progressive rollouts, automated rollbacks, etc.
For now, I’ll just be very, very careful, and be proud that I didn’t actually lose any data!
I was a bit scared to store repositories only in Forgejo (without mirroring them to GitHub) at first, but this isn’t my first disaster and I’ve recovered pretty well each time.
Oh, and did you end up using taiga for kanban, in the end?
Nope! I’m happily on Teamhood’s free tier, which does everything I need for now.
Thanks for reading this far, and I hope to talk more about beardist on the next season of the Self-Directed Research podcast!
I’ll leave you with a somewhat popular episode of it: