Homelab HA Kubernetes Cluster Upgrade: My New Shrine / Altar

Published
11 min read

INTRODUCTION

In the beginning, there was MicroK8s on a Mac Studio. It was fast, with 3 control-plane and 3 worker nodes; it was ARM64; but it was lonely. Today, I stand before a high-availability monument built on Proxmox with Terraform, orchestrated with Ansible, and maintained with GitOps via FluxCD.

Not long ago, my entire Kubernetes universe lived inside a humble Mac Studio — a single microk8s cluster with 6 nodes running on ARM64. It was cute, quiet, and completely unfit for the kind of multi‑DC, production‑grade nonsense I wanted to learn.

So I burned it down. And built this new place of worship.

Today, I run a high‑availability kubeadm cluster across three bare‑metal Proxmox Datacenters, all managed with Terraform, Ansible, and FluxCD. No cloud vendor lock‑in. No magic. Just a rack full of metal, a bunch of cables, and a lot of terminal time.

This is the story of my shrine — and how you can build one too.

UGLY WIRING:

THE MAJOR REASON WHY I CALLED IT A SHRINE 😂

Traffic Flow at a Glance

Before we dive into the layers, here's how the traffic moves from my "pulpit" (Mac Studio) to the "shrine" (the cluster):

No inbound holes – all management traffic originates from my Mac or the cluster itself (GitOps pulls). This is how real datacenters work.

┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                         🖥️ macOS COMMAND CENTER (The Pulpit)                         │
│                                                                                      │
│              kubectl  │  Terraform  │  Ansible  │  Flux CLI  │  Git                  │
│                                                                                      │
│                        (All management tools installed locally)                      │
│                                                                                      │
└─────────────────────────────────────────┬────────────────────────────────────────────┘
                                          │
                          SSH │ API (HTTPS) │ Git (SSH/HTTPS)
                                          │
                                          ▼
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                         🛡️ OPNsense Firewall (10.0.1.x)                              │
│                                                                                      │
│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                      │
│   │ DHCP Server     │  │ Static DHCP     │  │ WireGuard VPN   │                      │
│   │ 10.0.1.100-xxx  │  │ MAC → IP Pinning │  │ Remote Access   │                      │
│   └─────────────────┘  └─────────────────┘  └─────────────────┘                      │
│                                                                                      │
│   • Split-Horizon DNS: *.georgehomelab.com → 10.0.1.x                               │
│   • Gateway for all Proxmox + Kubernetes traffic                                     │
│   • Firewall rules: WAN → LAN passes for management                                  │
│                                                                                      │
└─────────────────────────────────────────┬────────────────────────────────────────────┘
                                          │
                                          │ LAN (10.0.0.0/16)
                                          │ 2.5GbE Links
                                          ▼
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                         🔌 Zyxel XMG1915-10E Switch                                  │
│                                                                                      │
│                     Star topology │ 8× 2.5GbE + 2× SFP+                              │
│                                                                                      │
└─────────────────────────────────────────┬────────────────────────────────────────────┘
                                          │
              ┌───────────────────────────┼───────────────────────────┐
              │                           │                           │
              ▼                           ▼                           ▼
┌─────────────────────────┐ ┌─────────────────────────┐ ┌─────────────────────────┐
│                         │ │                         │ │                         │
│  🏗️ Proxmox Node 1      │ │  🏗️ Proxmox Node 2      │ │  🏗️ Proxmox Node 3      │
│  (proxmox-dc-1)         │ │  (proxmox-dc-2)         │ │  (proxmox-dc-3)         │
│  10.0.1.1x              │ │  10.0.1.1x              │ │  10.0.1.1x              │
│                         │ │                         │ │                         │
│  • Local ZFS Storage    │ │  • Local ZFS Storage    │ │  • Local ZFS Storage    │
│  • vmbr0 Bridge         │ │  • vmbr0 Bridge         │ │  • vmbr0 Bridge         │
│  • NFS Client (Backups) │ │  • NFS Client (Backups) │ │  • NFS Client (Backups) │
│                         │ │                         │ │                         │
│  Terraform → VM Creation via Proxmox API (telmate/provider)                          │
│  Packer → Ubuntu Cloud-Init Templates                                                │
│                                                                                      │
└─────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘
              │                           │                           │
              │ Cloud-Init DHCP (Static Reservations → Predictable IPs)                │
              │                           │                           │
              └───────────────────────────┼───────────────────────────┘
                                          ▼
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                         ☸️ HA Kubernetes Cluster (kubeadm)                           │
│                                                                                      │
│   ┌─────────────────────────┐ ┌─────────────────────────┐ ┌─────────────────────────┐
│   │ Control Plane Node 1    │ │ Control Plane Node 2    │ │ Control Plane Node 3    │
│   │ k8s-cp-1                │ │ k8s-cp-2                │ │ k8s-cp-3                │
│   │ 10.0.1.1xx              │ │ 10.0.1.1xx              │ │ 10.0.1.1xx              │
│   │                         │ │                         │ │                         │
│   │ • etcd (stacked)        │ │ • etcd (stacked)        │ │ • etcd (stacked)        │
│   │ • kube-apiserver        │ │ • kube-apiserver        │ │ • kube-apiserver        │
│   │ • kube-vip (VIP)        │ │ • kube-vip (VIP)        │ │ • kube-vip (VIP)        │
│   └─────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘
│                                                                                      │
│   ┌─────────────────────────┐ ┌─────────────────────────┐ ┌─────────────────────────┐
│   │ Worker Node 1           │ │ Worker Node 2           │ │ Worker Node 3           │
│   │ k8s-worker-1            │ │ k8s-worker-2            │ │ k8s-worker-3            │
│   │ 10.0.1.1xx              │ │ 10.0.1.1xx              │ │ 10.0.1.1xx              │
│   │                         │ │                         │ │                         │
│   │ • Calico CNI (BGP)      │ │ • Calico CNI (BGP)      │ │ • Calico CNI (BGP)      │
│   │ • kube-proxy            │ │ • kube-proxy            │ │ • kube-proxy            │
│   │ • Workload Pods         │ │ • Workload Pods         │ │ • Workload Pods         │
│   └─────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘
│                                                                                      │
│   Pod CIDR: 10.244.0.0/16 │ Service CIDR: 10.245.0.0/16 │ MetalLB: 10.0.1.2xx-2xx  │
│                                                                                      │
│   🔧 Bootstrapped entirely by Ansible (kubeadm playbook)                            │
│                                                                                      │
└─────────────────────────────────────────┬────────────────────────────────────────────┘
                                          │
                                          │ GitOps Sync (Outbound Only)
                                          │ FluxCD pulls from GitHub (no inbound!)
                                          ▼
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                         🔄 FluxCD System (Inside Cluster)                            │
│                                                                                      │
│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                      │
│   │ source-         │  │ kustomize-      │  │ helm-           │                      │
│   │ controller      │  │ controller      │  │ controller      │                      │
│   └─────────────────┘  └─────────────────┘  └─────────────────┘                      │
│                                                                                      │
│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                      │
│   │ notification-   │  │ image-reflector-│  │ image-          │                      │
│   │ controller      │  │ controller      │  │ automation-     │                      │
│   └─────────────────┘  └─────────────────┘  └─────────────────┘                      │
│                                                                                      │
│   • Deployed as part of Ansible playbook (not a separate step)                       │
│   • Continuously reconciles cluster state with Git                                   │
│   • Auto-heals configuration drift                                                   │
│                                                                                      │
└─────────────────────────────────────────┬────────────────────────────────────────────┘
                                          │
                                          │ HTTPS/SSH (Outbound Pull)
                                          │
                                          ▼
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                              📦 GitHub Private Repository                            │
│                                                                                      │
│   ┌─────────────────────────────────────────────────────────────────────────────┐    │
│   │  clusters/prod/                                                             │    │
│   │  ├── flux-system/          # Flux bootstrapping config                      │    │
│   │  │   ├── gotk-components.yaml                                               │    │
│   │  │   └── gotk-sync.yaml                                                     │    │
│   │  ├── apps/                  # Application deployments                       │    │
│   │  │   ├── metallb/                                                          │    │
│   │  │   ├── istio-ingress/                                                    │    │
│   │  │   └── prometheus-stack/                                                 │    │
│   │  └── infrastructure/        # Cluster-wide config                          │    │
│   │      ├── namespaces.yaml                                                   │    │
│   │      └── storage-class.yaml                                                │    │
│   └─────────────────────────────────────────────────────────────────────────────┘    │
│                                                                                      │
│   🔑 Source of Truth: Every change starts as a PR, reviewed, merged, then applied   │
│                                                                                      │
└──────────────────────────────────────────────────────────────────────────────────────┘

Level 1: The Physical Layer (The Foundations)

Every altar begins with something tangible.

  • Hardware: A fleet of three Minisforum MS-01 machines acting as the compute datacenter, each with 96GB of RAM and an 8GB NVIDIA GPU — 288GB of RAM and 24GB of GPU memory in total

  • Network Entry Point: My Mac Studio (the “pulpit”), connected via Wi-Fi

  • Firewall: OPNsense bridging external (192.168.1.x) to internal lab network (10.0.1.x)

  • Out-of-Band Access: TinyPilot Voyager 2a and TESmart 4‑port HDMI KVM — BIOS-level control even when the OS is down

  • Switch: Zyxel XMG1915-10E (2.5GbE + SFP+) — the central nervous system, carrying high-velocity east-west traffic (low latency / high throughput for etcd and storage)

Why I worship here:

Physical simplicity enables logical complexity.

No mystery cables. Everything is deliberate. This playground gives me the opportunity to experiment with any cloud-native tool with ease.

Level 2: The Infrastructure Layer (Proxmox Datacenter)

Before automation, there must be a foundation.

  • Proxmox VE installed manually on all three Minisforum MS-01 machines

  • Clustered into a single datacenter abstraction

  • Networking:

    • vmbr0 → Kubernetes network

    • Static host IPs:

      • 10.0.1.1x

      • 10.0.1.1x

      • 10.0.1.1x

    • Gateway: 10.0.1.x (OPNsense)

  • Storage:

    • Local ZFS (NVMe)

    • NFS for shared ISO + backups

The ritual:

I installed Proxmox VE manually on each machine via TinyPilot’s virtual media, driving everything from my Mac Studio’s browser over Wi-Fi.

No HDMI cable ever touched my desk.

Level 3: The Node Layer (Terraform Automation)

I no longer click buttons to create infrastructure.

I declare it.

Using the Proxmox Terraform provider, I define:

  • VM CPU, memory, disk

  • Network interfaces

  • Clone source (Ubuntu template from Packer)

# `nodes` is a map variable describing each VM — a sketch of the shape for_each expects
variable "nodes" {
  type = map(object({
    name         = string
    proxmox_node = string
    cores        = number
    memory       = number
  }))
}

resource "proxmox_vm_qemu" "k8s_node" {
  for_each = var.nodes

  name        = each.value.name
  target_node = each.value.proxmox_node
  clone       = "ubuntu-24-04-template"
  cores       = each.value.cores
  memory      = each.value.memory

  # ipconfig0 is a VM-level cloud-init argument in the telmate provider,
  # not part of the network block
  ipconfig0 = "ip=dhcp"

  network {
    model  = "virtio"
    bridge = "vmbr0"
  }
}

The DHCP Decision (And Why It Matters)

This was one of the most important lessons in my journey.

In my old Mac Studio setup, I used pure DHCP for Kubernetes nodes.

It worked… until every restart broke my cluster access.

What went wrong?

  • Control plane nodes changed IPs

  • kubeconfig became invalid

  • API server endpoints broke

  • etcd stability was at risk

Even with 3 control planes, the cluster wasn’t truly stable.

Why Not Static IPs?

Because static IPs inside the OS mean:

  • Manual netplan configuration

  • Hardcoding network logic into templates

  • Reduced rebuild flexibility

That’s not how cloud-native systems behave.

The Solution: DHCP + Reservations

I used DHCP everywhere — but configured static reservations in OPNsense.

✔ Nodes auto-configure
✔ IPs never change
✔ Rebuilds are seamless
✔ etcd remains stable
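Because addressing lives entirely in the DHCP server, the netplan config baked into the Ubuntu template stays completely generic — a minimal sketch (the interface name here is an assumption; cloud-init usually renders this file for you):

```yaml
# /etc/netplan/50-cloud-init.yaml — generic template config (sketch)
network:
  version: 2
  ethernets:
    eth0:           # interface name is an assumption; matching by MAC also works
      dhcp4: true   # the OPNsense static reservation pins the actual address
```

One file works for every node, so the template never needs per-host edits.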

💡 The Real Insight

Kubernetes doesn’t care how IPs are assigned — only that they don’t change.

Level 4: The Cluster Layer (Ansible + Kubeadm)

Once the infrastructure exists, it must be transformed.

Using Ansible:

  • OS hardening

  • Swap disabled

  • containerd installed

  • kubeadm, kubelet, kubectl configured
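Those steps can be sketched as a handful of Ansible tasks — illustrative, not my exact playbook (it assumes the Kubernetes apt repository is already configured on the nodes):

```yaml
# Sketch of the node-prep tasks (illustrative, not the full playbook)
- name: Disable swap immediately
  ansible.builtin.command: swapoff -a
  when: ansible_swaptotal_mb > 0

- name: Comment out swap entries in fstab so it stays off after reboot
  ansible.builtin.replace:
    path: /etc/fstab
    regexp: '^([^#].*\sswap\s.*)$'
    replace: '# \1'

- name: Load kernel modules required by containerd and the CNI
  community.general.modprobe:
    name: "{{ item }}"
    state: present
  loop: [overlay, br_netfilter]

- name: Install containerd and the Kubernetes packages
  ansible.builtin.apt:
    name: [containerd, kubeadm, kubelet, kubectl]
    state: present
```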

HA Control Plane

  • 3 control plane nodes

  • Stacked etcd (homelab-friendly)

  • kube-vip for API virtual IP
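With kube-vip holding a virtual IP in front of the API servers, kubeadm only needs to be told about that endpoint at init time. A minimal ClusterConfiguration sketch — the VIP and version are placeholders, while the CIDRs match the diagram above:

```yaml
# kubeadm ClusterConfiguration sketch — VIP and version are placeholders
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0
controlPlaneEndpoint: "10.0.1.250:6443"   # kube-vip virtual IP (placeholder)
networking:
  podSubnet: "10.244.0.0/16"       # Calico pod CIDR
  serviceSubnet: "10.245.0.0/16"
```

Every control plane node joins against the VIP, so losing any single node never breaks API access.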

Level 5: The Application Layer (GitOps with FluxCD)

This is where everything changes.

Instead of imperative deployments — or even declarative ones applied by hand with kubectl — I use GitOps with FluxCD.

GitOps From Day One

FluxCD is not an add-on.

It is deployed during cluster creation via Ansible.

That means:

  • Cluster is GitOps-ready immediately

  • No manual bootstrap later

  • No drift from day one
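In practice, that is just one more play at the end of the cluster playbook — a sketch with placeholder variables, using the real `flux bootstrap github` CLI flags:

```yaml
# Sketch: bootstrapping Flux as the final Ansible play (variables are placeholders)
- name: Bootstrap FluxCD against the GitHub repo
  ansible.builtin.command: >
    flux bootstrap github
    --owner={{ github_owner }}
    --repository={{ github_repo }}
    --branch=main
    --path=clusters/prod
    --personal
  environment:
    GITHUB_TOKEN: "{{ github_token }}"
  run_once: true
  delegate_to: "{{ groups['control_plane'][0] }}"
```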

The Pull Model

  • Flux runs inside the cluster

  • Watches Git repository

  • Pulls changes automatically

No inbound access required.
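Under the hood, the pull model is just two objects: a GitRepository source and a Kustomization that applies it. A sketch (the repo URL is a placeholder; the path matches the layout shown earlier):

```yaml
# Flux pull-model sketch — repo URL is a placeholder
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m
  url: ssh://git@github.com/<user>/<repo>.git   # placeholder
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/prod
  prune: true          # auto-heals drift by removing objects deleted from Git
  sourceRef:
    kind: GitRepository
    name: flux-system
```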

Traffic Flow

Mac Studio (192.168.1.x)
        │
        ▼
OPNsense Firewall (10.0.1.x)
        │
        ▼
Proxmox Cluster (10.0.1.1x–1x)
        │
        ▼
Kubernetes Nodes (DHCP → Reserved IPs)
        │
        ▼
FluxCD Controllers (inside cluster)
        │
        ▼
GitHub (OUTBOUND pull model)

Key Insight:

  • ❌ GitHub never connects to your cluster

  • ❌ No firewall holes needed

  • ✅ Flux initiates outbound sync

Current State of the Shrine

  • 3 control plane nodes ✅

  • 3 worker nodes ✅

  • etcd cluster healthy ✅

  • Flux controllers distributed across nodes ✅

  • Calico networking active ✅

This is no longer a lab.

It is a self-healing platform.

Before vs After

Feature        | Old (Mac Studio)  | New Shrine (Proxmox HA)
---------------|-------------------|------------------------
Architecture   | Single Node       | 3-Node HA
Provisioning   | Manual            | Terraform
Configuration  | Scripts           | Ansible
Deployment     | kubectl           | GitOps (FluxCD)
Network        | DHCP (unstable)   | DHCP + Reservations
Resilience     | Low               | High

What I Learned

  1. DHCP + reservations is the sweet spot

  2. etcd requires stable identity, not static config

  3. GitOps removes human drift completely

  4. Terraform + Ansible + FluxCD = powerful combination

  5. Firewalls must allow internal routing for automation

  6. Never use root API for automation — use scoped tokens

What’s Next on the Altar

  • Ceph or Longhorn for HA storage

  • Velero for cluster backups

  • External Secrets + Vault

  • Cluster autoscaler experiments

Final Words

This homelab is more than a project.

It is a practice ground for real-world platform engineering.

The move from a single ARM node to a distributed HA cluster wasn’t just an upgrade in hardware — it was an upgrade in mindset.

My Mac Studio is no longer the host.

It is the pulpit.

The Shrine runs independently.

If you’re thinking of building something like this — do it.
Start small. Break things. Rebuild them better.

Now go build your own altar. 🛐

🤝 Stay Connected

Found this guide helpful? Follow my homelabbing journey on LinkedIn! Click the blue LinkedIn button to connect: George Ezejiofor. Let’s keep building scalable, secure cloud-native systems, one project at a time! 🌐🔧
