More devops than I bargained for
Thanks to my sponsors: Chris Walker, Chirag Jain, (18D)eezNuts, L0r3m1p5um, Matt Jackson, Alan O'Donnell, Gorazd Brumen, Berkus Decker, callym, Cole Kurkowski, Dennis Henderson, Pete Bevin, Chris Thackrey, Jack Duvall, Marcus Brito, Daniel Papp, Paige Ruten, Bob Ippolito, Johnathan Pagnutti, Mikkel Rasmussen and 262 more
Background
I recently had a bit of impromptu disaster recovery, and it gave me a hunger for more! More downtime! More kubernetes manifests! More DNS! Ahhhh!
The plan was really simple. I love dedicated Hetzner servers with all my heart, but they are not very fungible.
You have to wait entire minutes for a new dedicated server to be provisioned. Sometimes you pay a setup fee, et cetera. And at some point, to serve static websites and act as a k3s server, it’s simply too big, and approximately twice the price I should be paying.
I have gotten nervous about the world economy — Amos wrote on April 7th, as the American and Japanese stock markets just crashed — but it’s also a fun optimization problem. How much money do I actually need to spend on my infrastructure to get it to perform the way I want it to?
So I decided to move from an x86_64 dedicated server with 32 gigs of RAM and 16 cores, which cost me about 41 euros per month, to an aarch64 instance with 8 Ampere cores, 16 gigs of RAM, which costs 12 euros a month!
See, it’s not a significant saving, but it’s the first in my fleet of servers that is arm64 — and I figured, well, I recently set up continuous integration and continuous delivery for my CMS software so that it will build and ship x86_64-unknown-linux-gnu and aarch64-apple-darwin binaries as Forgejo generic packages and to a private Homebrew tap, so… what’s one more target?
Right?
Ha ha ha.
Has anyone ever built it for arm64 linux before?
For most things, the answer is yes.
On the “main” / “control” / “k3s server” node, I run services like:
- well, k3s itself, obvs
- cert-manager
- traefik v3 (and I get HTTP/3)
- a full prometheus stack, including grafana
- a couple postgres clusters
- umami for analytics
All of those are either ubiquitous or written in Go, which has excellent tooling for cross compilation, which means they’ve had ARM64 images forever.
A few of my Dockerfile(s) downloaded binaries for stuff like regclient, an ffmpeg static build, etc. — a simple “make this work for arm64 too” prompt to Claude 3.5 Sonnet was enough to add the requisite bashisms:
# Download the archive
echo -e "\033[1;34m📥 Downloading home-drawio \033[1;33m${HOME_DRAWIO_VERSION}\033[0m for \033[1;36m${ARCH_NAME}\033[0m..."

# Map platform architecture to package architecture string
if [ "${ARCH_NAME}" == "amd64" ]; then
  PKG_ARCH="x86_64-unknown-linux-gnu"
elif [ "${ARCH_NAME}" == "arm64" ]; then
  PKG_ARCH="aarch64-unknown-linux-gnu"
else
  echo -e "\033[1;31m❌ Error: Unsupported architecture: ${ARCH_NAME}\033[0m" >&2
  exit 1
fi

curl --fail --location --retry 3 --retry-delay 5 -H "Authorization: token ${FORGEJO_READWRITE_TOKEN}" \
  "https://code.bearcove.cloud/api/packages/bearcove/generic/home-drawio/${HOME_DRAWIO_VERSION}/${PKG_ARCH}.tar.xz" \
  -o "${TEMP_DIR}/home-drawio.tar.xz"
I like to request colors to make the log output more readable, and emojis, which also help with readability. I ask LLMs to generate tools that always show a plan of what they’re going to do first, ask the user for consent, report progress while doing it, and print a summary of actions taken and errors encountered at the end.
I have used them with great success for “devops”: there are a few pieces that need to be really solid, but the rest is all glue. I typically prototype in bash or TypeScript and then port it to Rust if I need it to run fast or be more correct.
Didn’t LLMs lead you astray last time?
Babe, I mean bear, they lead me astray every time. But I’m the one driving.
Fair enough — cool bear said, uttering words Amos had written.
I had forgotten how many moving parts were involved in my own software.
Most native dependencies are just an APT install away, since I use Debian 12 as a base image, and the Debian project has done the hard work of packaging just about everything.
I think the only thing I built from source is libdav1d, so that it’s recent enough.
home-drawio is one of my custom components: it’s a binary that’s able to convert draw.io diagrams to SVG. I used to shell out to node.js instead, but decided I didn’t like it, so now it’s bundled with bun as bytecode:
This is a Justfile, for the just task runner, which replaces make for me, since I already have one (or three) build systems.
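The recipe boils down to a bun “compile to a standalone binary” step. Something like this, as a sketch: the entry point path is hypothetical, and the flags reflect my understanding of bun’s standalone-executable options, not the actual recipe:
# Bundle the TypeScript entry point into a single executable,
# with the JS precompiled to bytecode (entry point path is made up).
bun build ./src/main.ts --compile --bytecode --outfile dist/home-drawio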
Its output is pleasing.
Multi-arch container images
One problem I ran into pretty early is that I had no idea how to make and push a container image that works for multiple architectures.
Up until now, I’d always been building images, and pushing them immediately with tags like:
code.bearcove.cloud/bearcove/beardist:latest
code.bearcove.cloud/bearcove/home:33.0.0
As far as I can tell, the way to go is to pick a convention for arch-specific tags, like:
code.bearcove.cloud/bearcove/beardist:latest-arm64
code.bearcove.cloud/bearcove/beardist:latest-amd64
The fact that arm64 and amd64 look so close from afar is a disgrace, btw.
And then create a multi-arch manifest, which is the thing pushed under :latest.
If you’re using docker to build images, then you can do something like this:
echo -e "\033[1;31m🗑️ Removing existing manifest: \033[0;32m{{BASE}}/$target:latest\033[0m" && \
docker manifest rm "{{BASE}}/$target:latest" || true && \
echo -e "\033[1;36m📝 Creating manifest: \033[0;32m{{BASE}}/$target:latest\033[0m" && \
docker manifest create "{{BASE}}/$target:latest" \
  $(for platform in $PLATFORMS; do \
    arch=$(echo $platform | cut -d/ -f2); \
    echo "{{BASE}}/$target:latest-$arch"; \
  done) && \
echo -e "\033[1;32m📤 Pushing manifest: \033[0;32m{{BASE}}/$target:latest\033[0m" && \
docker manifest push "{{BASE}}/$target:latest"; \
echo -e "\033[1;32m✅ Completed \033[1;33m$target\033[1;32m successfully!\033[0m"; \
If you’re using docker buildx, then it can do multi-arch builds for you! But that is not supported by OrbStack, or at least, I couldn’t get it working.
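For reference, the buildx route looks roughly like this (the builder name is arbitrary; the image ref reuses one from above):
# One-time: create a builder that can target multiple platforms
docker buildx create --name multi --use

# Build both architectures and push a single multi-arch tag in one go
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag code.bearcove.cloud/bearcove/home:33.0.0 \
  --push \
  .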
However, that’s irrelevant to me, because most of my Dockerfiles are just there to declare dependencies — I don’t actually build inside of them.
Base images + regctl
See, it’s annoying to need access to a docker daemon in CI. Really, I’m a grown up: I can take on the risk of making the build environment and runtime environment match — I just want to copy my binary into a base image I know and control.
So… I have this repack.sh script; the timestamp handling in it is particularly load-bearing.
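The gist of the timestamp trick is to normalize mtimes before the layer gets tarred, so unchanged content produces a byte-identical, reusable layer. A sketch of the idea, not the actual repack.sh:
# Clamp every mtime to a fixed date: an unchanged directory then tars
# to the exact same bytes, so the resulting layer digest can be reused.
find "$OCI_LAYOUT_DIR" -exec touch -h -d '2000-01-01 00:00:00 UTC' {} +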
This is, like… not nix, but it provides a lot of the value that nix gave me — assembling docker images without docker, allowing us the nice property “if a layer didn’t change, then it can just be reused”.
The rest of the value I got from nix, and from earthly after that (Cthulhu rest its eternal soul), is “don’t rebuild if you don’t need to rebuild”, which I achieved through timelord, a simple utility that saves and restores file timestamps, unless their contents have changed.
I look forward to timelord being completely deprecated by cargo’s checksum-freshness feature, just like I look forward to replacing cargo-sweep with gc, and cargo-hakari with feature unification.
So anyway, this is the important part of the script:
regctl image mod "$BASE_IMAGE" --create "$IMAGE_NAME" \
--layer-add "dir=$OCI_LAYOUT_DIR"
regctl comes from regclient and does not need a docker daemon present.
This adds a layer from a directory (which means it has to tar it and sha256 it — that’s basically all an OCI layer is).
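To make that concrete, here is roughly what “add a layer from a directory” boils down to (a sketch, not regctl’s actual code):
# A layer is just a tarball of the directory's files...
tar -C "$OCI_LAYOUT_DIR" -cf layer.tar .
# ...addressed by the sha256 digest of its bytes...
sha256sum layer.tar
# ...with the byte size recorded next to the digest in the image manifest.
wc -c < layer.tar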
Then we push it to the registry:
regctl image copy "$IMAGE_NAME"{,}
And then… then what? Then we can’t actually create a manifest because, contrary to base images, we need to build images like home (the name my CMS has this week, for those who follow along at… well, at home) from a machine with a matching architecture, because the process is:
- In a Debian 12 arm64/amd64 container
- Build with beardist (which invokes cargo build, copies around dynamic libraries, does verifications, compression, uploads)
- Add the built binary on top of the base layer of the correct arch, and push the OCI image with regctl
So the architecture outside the image and inside the image must match.
beardist itself is distributed as a (multi-arch) image, and in fact, is built using itself, which means it has to be bootstrapped somehow.
And the way it’s bootstrapped is:
- From a build environment matching the target env…
- Run cargo install --path . in beardist/
- Run BEARDIST_CACHE=/tmp/beardist beardist build
- Run ./repack.sh
And voila! Now, beardist can build itself in CI, using its own docker image, which will be overwritten on every tag release.
If needed, the bootstrap can be redone, or an earlier “working” tag can simply be used. The chain hasn’t broken yet, a couple weeks in.
Once both architectures are built in CI, as two different Forgejo Actions jobs, a third job is triggered:
That multify.sh script is a bit more manual than the previous strategy since we don’t have docker! Only regctl, which doesn’t come with “manifest-building” utilities.
Luckily, it’s “just JSON”, right?
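Roughly: fetch each arch-specific manifest’s digest and size, assemble the manifest list by hand, and push it. A sketch of the approach, not the actual multify.sh; it assumes the per-arch images were tagged <version>-amd64 / <version>-arm64 and that regctl’s manifest get/put flags behave as I remember:
IMG=code.bearcove.cloud/bearcove/beardist
TAG=3.8.9

# Digest and size of each arch-specific image manifest
AMD64_DIGEST=$(regctl image digest "$IMG:$TAG-amd64")
ARM64_DIGEST=$(regctl image digest "$IMG:$TAG-arm64")
AMD64_SIZE=$(regctl manifest get "$IMG:$TAG-amd64" --format raw-body | wc -c)
ARM64_SIZE=$(regctl manifest get "$IMG:$TAG-arm64" --format raw-body | wc -c)

# Assemble the manifest list ("it's just JSON")
jq -n --arg ad "$AMD64_DIGEST" --argjson asz "$AMD64_SIZE" \
      --arg rd "$ARM64_DIGEST" --argjson rsz "$ARM64_SIZE" '{
  schemaVersion: 2,
  mediaType: "application/vnd.docker.distribution.manifest.list.v2+json",
  manifests: [
    {mediaType: "application/vnd.docker.distribution.manifest.v2+json",
     size: $asz, digest: $ad, platform: {architecture: "amd64", os: "linux"}},
    {mediaType: "application/vnd.docker.distribution.manifest.v2+json",
     size: $rsz, digest: $rd, platform: {architecture: "arm64", os: "linux"}}
  ]
}' > manifest.json

# Push it under both the version tag and :latest
regctl manifest put "$IMG:$TAG" \
  --content-type application/vnd.docker.distribution.manifest.list.v2+json < manifest.json
regctl manifest put "$IMG:latest" \
  --content-type application/vnd.docker.distribution.manifest.list.v2+json < manifest.json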
This script, too, is pleasing:
amos in 🌐 souffle in beardist on main [!] via 🦀 v1.86.0
❯ time GITHUB_REF=refs/tags/v3.8.9 ./multify.sh
🔍 Starting multi-architecture container manifest creation...
📦 Detected tag: 3.8.9
📦 Getting digests and sizes...
⬇️ Fetching AMD64 digest...
⬇️ Fetching ARM64 digest...
✅ ARM64 digest retrieved successfully!
✅ AMD64 digest retrieved successfully!
📏 Calculating AMD64 manifest size...
✅ AMD64 size: 2243 bytes
📏 Calculating ARM64 manifest size...
✅ ARM64 size: 2242 bytes
📝 Creating manifest.json...
✅ manifest.json created successfully!
🚀 Pushing manifest.json to registry...
📤 Pushing manifest for tag: 3.8.9
✅ Successfully pushed manifest for tag: 3.8.9
📤 Pushing manifest for tag: latest
✅ Successfully pushed manifest for tag: latest
🎉 Multi-architecture manifest(s) successfully pushed to registry!
________________________________________________________
Executed in 1.45 secs fish external
usr time 110.06 millis 0.29 millis 109.77 millis
sys time 110.70 millis 2.23 millis 108.47 millis
…and will probably be rewritten in Rust eventually, or collapsed into beardist, which isn’t linked because it’s not open-source. It’s custom-made for my needs: make your own!
Here’s the generated manifest:
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 2243,
      "digest": "sha256:b2dc52ed0fc06d10b4681405289004da8dab86776223466beb4a84a86fbc8ade",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 2242,
      "digest": "sha256:f79fbe3ae00713394d69970bb5f74af0d043dbf703d8b9ccb2a3f3c110cbd88d",
      "platform": {
        "architecture": "arm64",
        "os": "linux"
      }
    }
  ]
}
And uhh yeah, it works!
Well. It worked for beardist — and then I could have other builds operate from the beardist:latest image, no matter whether they were running on arm64 workers or amd64 workers…
…but by this point, I didn’t really have any good amd64 workers left.
I had:
- Some VM I run with UTM on macOS
- That 8-core arm64 machine (good enough for Rust CI builds)
- 5 2-core amd64 machines
And uhhhh… I tried. But after 30 minutes, the Forgejo Actions job timeout kicked in and… yeah. It couldn’t build my entire website software.
Which, to be fair:
home on main via 🦀 v1.85.1
❯ cat Cargo.lock | grep -F '[[package' | wc -l
838
…is not surprising.
Because there is persistent build storage, I could’ve just retried until it finally built, but… my site was still down at this point! I had preemptively migrated everything else, including postgres clusters, forgejo volumes etc. — but had left my CMS for last because, well, it’s my CMS! I know this!
At this point I realized my Mac Studio has to be on all the time, since it’s running a VM which does the macOS builds. And it has 32GB RAM… it can probably fit another x86_64 Linux VM, right?
Well, it can, but:
- Losing 6GB of RAM is kinda brutal when I’m editing 4K videos
- x86_64 emulation via qemu is slow, and multicore emulation even more so
- USB SATA SSDs are slow (I don’t have enough internal storage for all my VMs)
I only realized that after hours of fiddling around to get IPv6 to work inside a container inside the VM inside my Mac Studio, becauuseeeee….
More like IPv5
I don’t know, okay? At some point I’ll do a deep dive, but… it was past 1AM, I don’t know, I just needed things to work.
Here’s what I think I understood. Maybe.
In Kubernetes, workloads are performed in “containers”, which are run in “pods”, which are scheduled on “nodes”.
In my setup, “nodes” are just the Hetzner Cloud VMs:
I like the nice little map visualization. I think more cloud providers should do that.
Their API is also very fast.
And my x86_64 VM that I ran on my MacBook Pro.
To k3s, they’re the same, they’re all just… nodes:
~
❯ k get nodes
NAME STATUS ROLES AGE VERSION
domino Ready <none> 15h v1.31.6+k3s1
flam Ready <none> 18h v1.31.6+k3s1
hawk Ready <none> 18h v1.31.6+k3s1
heim Ready <none> 18h v1.31.6+k3s1
kaya Ready <none> 18h v1.31.6+k3s1
marl Ready <none> 18h v1.31.6+k3s1
styx Ready control-plane,etcd,master 18h v1.31.6+k3s1
Those nodes need not have a publicly routable IPv4 or IPv6 address: they can be behind NAT (Network Address Translation), and they’ll still be able to:
- reach out to the k3s server
- register themselves as nodes (given the proper auth token)
- and join the overlay network
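Joining such a node is just the documented k3s agent install, pointed at the server with that token. Roughly (the server hostname is a placeholder; the token lives at /var/lib/rancher/k3s/server/node-token on the server):
# On the machine that should become a worker node:
curl -sfL https://get.k3s.io | \
  K3S_URL="https://styx.example.net:6443" \
  K3S_TOKEN="<contents of the server's node-token file>" \
  sh -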
Why an overlay network? Because pods have their own IP address.
And in a simple setup like this, the pod IP addresses are not publicly routable either.
In my current setup…
infra on main [$] via 🦀 v1.85.0
❯ rg 'cidr' roles/
roles/k3s/leader/templates/config.yaml.j2
1:cluster-cidr: 10.42.0.0/16,fd00:42::/48
2:service-cidr: 10.43.0.0/16,fd00:43::/112
…neither the IPv4 nor the IPv6 addresses are “publicly routable” — if you send a packet with any of these as the destination to an internet router, it will chuckle and drop the packet.
The IPv4 address block is called “private address space” and the IPv6 address block is called “Unique Local Address” or ULA.
However, these are perfectly fine to use for a private overlay network like the one set up by k3s so that pods can talk to each other.
What’s the level of granularity of a pod? Like… how many pods to an app?
To give you an example: traefik is the “ingress”, aka the HTTP reverse proxy, so it needs one pod per edge node:
fasterthanli.me on main [$!]
❯ k get pods -n 'traefik' -o json | jq -c '.items[] | {nodeName: .spec.nodeName, podIP: .status.podIP}'
{"nodeName":"domino","podIP":"192.168.210.3"}
{"nodeName":"styx","podIP":"49.13.119.8"}
{"nodeName":"domino","podIP":"192.168.1.100"}
{"nodeName":"heim","podIP":"157.180.27.172"}
{"nodeName":"hawk","podIP":"116.202.24.111"}
{"nodeName":"kaya","podIP":"5.223.56.87"}
{"nodeName":"marl","podIP":"5.78.90.129"}
{"nodeName":"flam","podIP":"5.161.220.244"}
But those pods are a little special — they’re using host networking.
When I point a DNS record for fasterthanli.me at one of my nodes, I need it to listen on ports 80 and 443, and I need those connections to go straight to traefik — hence, the pod IP is actually the publicly routable IP of that node.
fasterthanli.me on main [$!]
❯ ssh root@49.13.119.8 -- "ip addr show eth0 | grep --color=always -E '(inet|inet6) ([0-9a-f:.]+)'"
inet 49.13.119.8/32 brd 49.13.119.8 scope global dynamic eth0
inet6 2a01:4f8:c17:34b1::1/64 scope global
inet6 fe80::9400:4ff:fe32:8ea/64 scope link
Redundant, I know!
But most pods are not special. They have an IP address that comes from the CIDR we defined earlier: that’s the case of pods in the home namespace:
fasterthanli.me on main [$!]
❯ k get pods -n 'home' -o json | jq -c '.items[] | {nodeName: .spec.nodeName, podIP: .status.podIP}'
{"nodeName":"heim","podIP":"10.42.40.130"}
{"nodeName":"hawk","podIP":"10.42.123.3"}
{"nodeName":"marl","podIP":"10.42.71.66"}
{"nodeName":"kaya","podIP":"10.42.29.194"}
{"nodeName":"hawk","podIP":"10.42.123.2"}
{"nodeName":"heim","podIP":"10.42.40.131"}
{"nodeName":"styx","podIP":"10.42.29.2"}
These are all in 10.42.0.0/16!
From one pod, we can reach another:
fasterthanli.me on main [$!]
❯ k exec -n home cub-dc9f5b494-bhnjr -it -- curl -H 'Host: fasterthanli.me' -I http://10.42.40.130:1111
HTTP/1.1 200 OK
content-type: text/html; charset=utf-8
cache-control: no-cache
x-source: eu-north-1.heim.cub-dc9f5b494-bhnjr
content-length: 105153
date: Mon, 07 Apr 2025 17:04:57 GMT
And that is what the overlay network is about.
But it’s not the same thing as having actual connectivity to the internet, or “egress”.
I’ll save you all the different troubleshooting steps I went through, but basically, here’s how things ended up working out: I ended up installing Calico to replace Flannel.
The first big difference is that instead of sending overlay packets as VXLAN over UDP, it establishes a WireGuard network — traffic between nodes is now properly encrypted.
Apparently Flannel supports that too; it’s just not enabled by default.
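In practice, the swap looks something like this (a sketch under my assumptions, not the actual playbook): run k3s without its bundled CNI, install Calico, then flip on WireGuard.
# k3s without Flannel (and without its default network policy controller):
k3s server --flannel-backend=none --disable-network-policy

# After installing Calico, enable WireGuard for node-to-node traffic:
calicoctl patch felixconfiguration default --type=merge \
  -p '{"spec":{"wireguardEnabled":true,"wireguardEnabledV6":true}}'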
And the second big difference is that it’s actually able to do something called NAT66.
Wait wait wait. What?
The NAT king calls
Okay, so let’s look at the simple case, right? We have a pod on a Hetzner cloud VM.
It makes an outbound request to a public IPv4 address — how is it routed?
I don’t know, let’s check traceroute?
Good instinct! Let’s do that. So we’ll create a pod named net-shooter, using the netshoot image:
---
apiVersion: v1
kind: Pod
metadata:
  name: net-shooter
  labels:
    app: net-shooter
spec:
  containers:
    - name: net-shooter
      image: nicolaka/netshoot
      command:
        - sleep
        - infinity
  nodeSelector:
    provider: hcloud
Oh yeah, by the way, I changed my deploy script to not use yq or rsync and… just use kubectl with a bunch of flags:
infra on main [$?] via 🦀 v1.85.0
❯ ./deploy manifests/tests/
🔍 Performing dry run of kubectl apply...
pod/net-shooter created (server dry run)
❓ Do you want to apply these changes? (y/n)
y
✅ Applying changes...
pod/net-shooter created
📤 Preparing to commit and push changes...
❓ Enter a commit message:
create test pod
[main 1ae5b54] create test pod
1 file changed, 14 insertions(+)
create mode 100644 manifests/tests/000-ip-routing-test.yaml
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 12 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 531 bytes | 531.00 KiB/s, done.
Total 5 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/bearcove/infra.git
f60c806..1ae5b54 main -> main
✅ Changes have been committed and pushed.
Here’s roughly what the deploy script does, if you’re interested:
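A sketch of that flow (not the verbatim script), with prompts matching the output above:
#!/usr/bin/env bash
set -euo pipefail
DIR="${1:?usage: deploy <manifest-dir>}"

# Server-side dry run first, so you see exactly what would change
echo "🔍 Performing dry run of kubectl apply..."
kubectl apply --dry-run=server -f "$DIR"

read -r -p "❓ Do you want to apply these changes? (y/n) " answer
[ "$answer" = "y" ] || exit 1

echo "✅ Applying changes..."
kubectl apply -f "$DIR"

# Keep the manifests repo in sync with what's actually deployed
echo "📤 Preparing to commit and push changes..."
read -r -p "❓ Enter a commit message: " msg
git add "$DIR"
git commit -m "$msg"
git push
echo "✅ Changes have been committed and pushed."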
Is our pod running?
infra on main [$] via 🦀 v1.85.0
❯ k get pods -l app=net-shooter
NAME READY STATUS RESTARTS AGE
net-shooter 1/1 Running 0 10s
Yes, good! Let’s run some tests, shall we?
infra on main [$] via 🦀 v1.85.0
❯ k exec net-shooter -- ip addr show dev eth0 | grep -E 'inet |inet6 '
inet 10.42.29.15/32 scope global eth0
inet6 fd00:42:0:1d1b:89d4:e2d6:158f:6f0f/128 scope global
inet6 fe80::ccf8:a1ff:fe55:ac8d/64 scope link proto kernel_ll
Okay, it definitely has an IPv4 address and an IPv6 address taken from our respective CIDR ranges, and also a link-local address starting with fe80.
Let’s try to do a traceroute to one of the other pods:
infra on main [$] via 🦀 v1.85.0
❯ k exec net-shooter -- traceroute 10.42.123.3
traceroute to 10.42.123.3 (10.42.123.3), 30 hops max, 46 byte packets
1 styx (49.13.119.8) 0.007 ms 0.015 ms 0.007 ms
2 10.42.123.1 (10.42.123.1) 2.354 ms 1.786 ms 0.844 ms
3 10.42.123.3 (10.42.123.3) 0.688 ms 1.061 ms 0.592 ms
Pretty straightforward.
Pods also have IPv6 addresses, since we’re dual-stack!
fasterthanli.me on main [$]
❯ k get pods -n 'home' -o json | jq -c '.items[] | {nodeName: .spec.nodeName, name: .metadata.name, podIPs: .status.podIPs}'
{"nodeName":"flam","name":"cub-695b7f6fdd-42z5m","podIPs":[{"ip":"10.42.52.80"},{"ip":"fd00:42:0:42b4:6c65:9873:2890:3453"}]}
{"nodeName":"flam","name":"cub-695b7f6fdd-65dj5","podIPs":[{"ip":"10.42.52.78"},{"ip":"fd00:42:0:42b4:6c65:9873:2890:3451"}]}
{"nodeName":"marl","name":"cub-695b7f6fdd-89ztk","podIPs":[{"ip":"10.42.71.75"},{"ip":"fd00:42:0:4746:36a6:b9d9:c23:ef8e"}]}
{"nodeName":"hawk","name":"cub-695b7f6fdd-c5s5s","podIPs":[{"ip":"10.42.123.13"},{"ip":"fd00:42:0:f6da:458d:a644:59c2:d4e"}]}
{"nodeName":"marl","name":"cub-695b7f6fdd-knh47","podIPs":[{"ip":"10.42.71.77"},{"ip":"fd00:42:0:4746:36a6:b9d9:c23:ef90"}]}
{"nodeName":"heim","name":"cub-695b7f6fdd-rhll9","podIPs":[{"ip":"10.42.40.136"},{"ip":"fd00:42:0:28ae:bd57:7c2d:4a15:a89"}]}
{"nodeName":"styx","name":"mom-85d6745745-vkjw2","podIPs":[{"ip":"10.42.29.20"},{"ip":"fd00:42:0:1d1b:89d4:e2d6:158f:6f16"}]}
And similarly we can trace that route:
infra on main [$] via 🦀 v1.85.0
❯ k exec net-shooter -- traceroute fd00:42:0:4746:36a6:b9d9:c23:ef8e
traceroute to fd00:42:0:4746:36a6:b9d9:c23:ef8e (fd00:42:0:4746:36a6:b9d9:c23:ef8e), 30 hops max, 72 byte packets
1 fd00:42:0:1d1b:89d4:e2d6:158f:6f15 (fd00:42:0:1d1b:89d4:e2d6:158f:6f15) 0.013 ms 0.010 ms 0.011 ms
2 fd00:42:0:4746:36a6:b9d9:c23:ef89 (fd00:42:0:4746:36a6:b9d9:c23:ef89) 170.372 ms 169.522 ms 169.491 ms
3 fd00:42:0:4746:36a6:b9d9:c23:ef8e (fd00:42:0:4746:36a6:b9d9:c23:ef8e) 169.560 ms 169.474 ms 169.501 ms
But traceroute is… borderline useless.
What we want to know here is answered better by the ip route command:
infra on main [$] via 🦀 v1.85.0
❯ k exec net-shooter -it -- ip -4 route show
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
infra on main [$] via 🦀 v1.85.0
❯ k exec net-shooter -it -- ip -6 route show
fd00:42:0:1d1b:89d4:e2d6:158f:6f0f dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via fe80::ecee:eeff:feee:eeee dev eth0 metric 1024 pref medium
To me, this is interesting, because, well… both 169.254.1.1 and fe80::/64 are “link-local” addresses: the only other place I’ve seen them is when DHCP fails and your computer decides to pick an address that, I guess, would allow you to communicate with something at the other end even without DHCP?
So, actually, the trick is coming from outside the pod… because if we ask a random server from the internet what’s our IP address, it will have a radically different answer than what we’ve seen so far:
infra on main [$] via 🦀 v1.85.0
❯ k exec net-shooter -it -- curl -4 https://icanhazip.com
49.13.119.8
infra on main [$] via 🦀 v1.85.0
❯ k exec net-shooter -it -- curl -6 https://icanhazip.com
2a01:4f8:c17:34b1::1
This is the IP address of the node, not the pod — NAT is happening, both for IPv4 (NAT44):
root@styx ~# iptables -4 -t nat -L cali-POSTROUTING -v -n
Chain cali-POSTROUTING (1 references)
pkts bytes target prot opt in out source destination
12104 740K cali-fip-snat 0 -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:Z-c7XtVd2Bq7s_hA */
12104 740K cali-nat-outgoing 0 -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:nYKhEzDlr11Jccal */
0 0 MASQUERADE 0 -- * vxlan.calico 0.0.0.0/0 0.0.0.0/0 /* cali:e9dnSgSVNmIcpVhP */ ADDRTYPE match src-type !LOCAL limit-out ADDRTYPE match src-type LOCAL random-fully
0 0 MASQUERADE 0 -- * wireguard.cali 0.0.0.0/0 0.0.0.0/0 /* cali:kgfCOPW4UKtzMAmO */ ADDRTYPE match src-type !LOCAL limit-out ADDRTYPE match src-type LOCAL random-fully
And for IPv6 (NAT66):
root@styx ~# ip6tables -t nat -L cali-POSTROUTING -v -n
Chain cali-POSTROUTING (1 references)
pkts bytes target prot opt in out source destination
2015 173K cali-fip-snat 0 -- * * ::/0 ::/0 /* cali:Z-c7XtVd2Bq7s_hA */
2015 173K cali-nat-outgoing 0 -- * * ::/0 ::/0 /* cali:nYKhEzDlr11Jccal */
0 0 MASQUERADE 0 -- * vxlan-v6.calico ::/0 ::/0 /* cali:MtS-9OgAQy-fAM-w */ ADDRTYPE match src-type !LOCAL limit-out ADDRTYPE match src-type LOCAL random-fully
And this is fine and great for virtual machines hosted on Hetzner which have a public IPv4 address and have a public IPv6 prefix.
But what happens when, say, you tell your home computer to join your Kubernetes cluster? That’s exactly what I ended up doing: let’s see what happens with it!
infra on main [$] via 🦀 v1.85.0
❯ ssh root@domino ip addr show dev enp3s0 | grep -E 'inet6?'
inet 192.168.1.100/24 brd 192.168.1.255 scope global dynamic noprefixroute enp3s0
inet6 2a01:e0a:de8:a760:bf51:36ff:d905:1432/64 scope global temporary dynamic
inet6 2a01:e0a:de8:a760:7656:3cff:fe28:5746/64 scope global dynamic mngtmpaddr noprefixroute
inet6 fe80::7656:3cff:fe28:5746/64 scope link noprefixroute
It does have a publicly routable IPv6 because NAT is not required there. NAT is only being done for IPv4.
infra on main [$] via 🦀 v1.85.0
❯ ssh root@domino -- ip -6 route show default
default via fe80::3a07:16ff:fec2:bc19 dev enp3s0 proto ra metric 100 pref medium
infra on main [$] via 🦀 v1.85.0
❯ ssh root@domino -- ip -4 route show default
default via 192.168.1.254 dev enp3s0 proto dhcp src 192.168.1.100 metric 100
The default routes for IPv4 and IPv6 go directly to the router, and icanhazip reveals our actual public addresses:
infra on main [$] via 🦀 v1.85.0
❯ ssh root@domino -- curl -s -6 https://icanhazip.com
2a01:e0a:de8:a760:bf51:36ff:d905:1432
infra on main [$] via 🦀 v1.85.0
❯ ssh root@domino -- curl -s -4 https://icanhazip.com
87.182.152.211
And you know what this means?
No, but you’ll tell us, right?
Why the fuck is everything broken
Again and again and again: ever since I had this home node join my Kubernetes cluster, I have had nothing but issues.
And every time it’s been the exact same issue and it’s taken me so long to realize what was happening.
This node, called Domino, has IPv4 egress, but doesn’t have IPv4 ingress!
domino is able to establish a connection with google.com over IPv4 and exchange packets no problem, but it cannot host an IPv4 service! If it tries, it’s going to be giving out an IP that is not publicly routable!
Back to our node IPs, let’s take kaya for example:
infra on main [$] via 🦀 v1.85.0
❯ k get nodes -o json | jq -c '.items[] | select(.metadata.name == "kaya") | .status.addresses'
[{"address":"5.223.56.87","type":"InternalIP"},{"address":"2a01:4ff:2f0:10be::1","type":"InternalIP"},{"address":"kaya","type":"Hostname"}]
It has an internal IP of 5.223.56.87 — very well. What’s the IP of the traefik pod on that node?
infra on main [$] via 🦀 v1.85.0
❯ k get pods -n traefik -o json | jq -c -r '.items[] | select(.spec.nodeName == "kaya") | .status.podIPs'
[{"ip":"5.223.56.87"},{"ip":"2a01:4ff:2f0:10be::1"}]
The very same! It uses host networking.
But on domino?
infra on main [$] via 🦀 v1.85.0
❯ k get pods -n traefik -o json | jq -c -r '.items[] | select(.spec.nodeName == "domino") | .status.podIPs'
[{"ip":"192.168.1.100"},{"ip":"2a01:e0a:de8:a760:17c3:ece0:634:8ec7"}]
It’s that LAN IP, 192.168.1.100.
And that caused me some problems when, after restarting pods, the Kubernetes server decided to schedule cert-manager challenge pods on domino.
What’s cert-manager? It’s a neat thing that provisions TLS certificates automatically: you create a Certificate, like so:
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: fasterthanli-me-cert
  namespace: home
spec:
  secretName: fasterthanli-me-cert-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames: [fasterthanli.me, cdn.fasterthanli.me]
And then internally it makes CertificateRequest objects, it makes Orders, it talks with the Let’s Encrypt system, and it’s able to do challenges of different kinds. The one I was using was HTTP-01.
Which works by making a request on some well-known path. Literally a path that starts with /.well-known/acme-challenge/: cert-manager creates a temporary HTTP endpoint that the Let’s Encrypt servers can access to verify that you control the domain you’re requesting a certificate for.
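Concretely, the validation request looks something like this; the token here is hypothetical, Let’s Encrypt generates one per order:
# Plain HTTP, no TLS needed: this is how you prove control before you have a cert.
curl -i "http://fasterthanli.me/.well-known/acme-challenge/SOME_TOKEN"
# The temporary solver endpoint must answer with "SOME_TOKEN.<account key thumbprint>".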
cert-manager does this by creating an Ingress resource, which in turn is handled by traefik, to serve just that path over HTTP (and the usual site over HTTPS). As soon as the TLS certificate is created, it’s swapped in and traefik starts using it.
And that’s all well and good. But this is where we find out that there are actually at least two types of services that can be used in a Kubernetes setup like mine: ClusterIP and NodePort.
And really, in my situation, there’s no good reason for any service to be NodePort except for traefik, which must use host networking since there’s no load balancing in front of it, I’m not doing managed kubernetes, and I also can’t roll my own layer 3 load balancer.
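If you want to feel the difference, kubectl can create one of each against the net-shooter pod from earlier (service names are made up for the example):
# ClusterIP: gets an address from the service CIDR (10.43.0.0/16 here),
# only reachable from inside the cluster / overlay.
kubectl expose pod net-shooter --port 8080 --type ClusterIP --name demo-clusterip

# NodePort: same, plus a high port opened on every node's own address,
# which is exactly what goes wrong on a node whose IPv4 isn't publicly routable.
kubectl expose pod net-shooter --port 8080 --type NodePort --name demo-nodeport

kubectl get svc demo-clusterip demo-nodeport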
And yet, the cert-manager challenge services defaulted to NodePort for some reason — which used to always work on the nodes that were actually hosted on Hetzner Cloud VMs, but didn’t work on domino!
Because domino is doing double NAT for IPv4, and, for IPv6, is still doing single NAT, because even though the node would be able to route a whole /64’s worth of IPv6 addresses, Calico is picking pod IP addresses from the pools we gave it:
---
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: ipv4-pool
spec:
  cidr: 10.42.0.0/16
  ipipMode: Never
  vxlanMode: Always
  natOutgoing: true
  disabled: false
  nodeSelector: all()
---
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: ipv6-pool
spec:
  cidr: fd00:42::/48
  ipipMode: Never
  vxlanMode: Always
  natOutgoing: true
  disabled: false
  nodeSelector: all()
And that’s… just not great… to learn about, at 4 in the morning, when everything’s been down for hours.
Like… I’ve never asked to learn all this, man. I was just trying to throw a little arm64 in the mix. I miss solving problems with Dockerfile. Let me out. LET ME OUT.
Wait, wait, wait, so is there a way to disable NAT66 just for that node?
I think there is, but I’ve just been too scared to touch it so far.
But… where’s the fun in that?
Ah, damn it, you’re right.
Bye NATalie
If I understood everything correctly, we need to create another IP pool, just for our node:
# ✂️: ipv4 pool
---
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: ipv6-pool
spec:
  cidr: fd00:42::/48
  ipipMode: Never
  vxlanMode: Always
  natOutgoing: true
  disabled: false
  nodeSelector: "kubernetes.io/hostname != 'domino'"
---
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: public-ipv6-pool
spec:
  cidr: 2a01:e0a:de8:a760::/64
  ipipMode: Never
  vxlanMode: Always
  natOutgoing: false
  disabled: false
  nodeSelector: "kubernetes.io/hostname == 'domino'"
And apply it… and restart the pods, and… I don’t know, let’s test it:
---
apiVersion: v1
kind: Pod
metadata:
  name: ipv6-server
spec:
  nodeSelector:
    kubernetes.io/hostname: domino
  containers:
    - name: server
      image: python:3
      command:
        - python3
        - -m
        - http.server
        - "8080"
        - "--bind"
        - "::" # <-- Listen on all IPv6 interfaces
      ports:
        - containerPort: 8080
          protocol: TCP
infra on main [$?] via 🦀 v1.85.0
❯ kubectl apply -f manifests/tests/100-python.yaml
pod/ipv6-server created
Let’s see what IPs were assigned…
infra on main [$?] via 🦀 v1.85.0
❯ kubectl get pod ipv6-server -o jsonpath='{.status.podIPs}' | jq -c .
[{"ip":"10.42.210.16"},{"ip":"2a01:e0a:de8:a760:8ccd:f32f:73e5:da03"}]
…oooh, promising!
Now let’s see if we can access that pod?
infra on main [$] via 🦀 v1.85.0
❯ curl --connect-timeout 2 -I 'http://[2a01:e0a:de8:a760:8ccd:f32f:73e5:da03]:8080'
curl: (28) Failed to connect to 2a01:e0a:de8:a760:8ccd:f32f:73e5:da03 port 8080 after 2006 ms: Timeout was reached
Oh. We can’t.
Wait, but that’s from inside the LAN.
And? You think it’s going to work better outside the LAN?
Try it
Fine, fine if y-
amos in 🌐 styx in ~
❯ curl --connect-timeout 2 -I 'http://[2a01:e0a:de8:a760:8ccd:f32f:73e5:da03]:8080'
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.13.2
Date: Mon, 07 Apr 2025 19:50:04 GMT
Content-type: text/html; charset=utf-8
Content-Length: 832
You… what? The fuck?
Claude tells me this can be caused by “LAN Hairpinning” or “NDP Scope Problems”.
Well, I guess that’s why we have Happy Eyeballs, so that the IPv4 path will work on LAN and the IPv6 path will work on the public internet.
Good night, everyone — and thanks for following along!