Azure Kubernetes Service (AKS)

Azure Kubernetes Service reduces the operational burden of running Kubernetes by managing the control plane — API server, etcd, scheduler — so you don’t have to. What it doesn’t manage is everything else: node pool configuration, workload identity, storage class selection, networking topology, upgrade timing, cost governance, and security posture. The “managed” label covers a narrow slice of what actually requires operational attention.

The articles in this collection address the decisions AKS leaves to platform engineers. Identity configuration covers Workload Identity Federation — why it’s more complex than service account tokens and where credentials still leak despite federation. Storage articles examine what happens to PVCs when node pools get replaced and which storage classes survive real restore scenarios. Networking content covers multi-cluster hub-spoke topologies and when mesh complexity becomes justified rather than premature.

Cluster upgrades are a recurring theme because the documentation describes them optimistically. The articles cover cordon and drain behavior, Pod Disruption Budget configuration that actually prevents downtime rather than just satisfying a checkbox, and multi-node-pool rollout strategies that make upgrades reproducible rather than heroic one-off events.
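To make the "checkbox" point concrete, here is a minimal sketch of the arithmetic behind a Pod Disruption Budget, assuming integer values only (percentage values and their rounding are left out). The `Budget` class and the `allowed_disruptions` function are hypothetical names used for illustration, not part of any Kubernetes client library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Illustrative stand-in for a PDB spec; set exactly one field."""
    min_available: Optional[int] = None
    max_unavailable: Optional[int] = None


def allowed_disruptions(budget: Budget, expected_pods: int, healthy_pods: int) -> int:
    """Return how many pods a drain may evict right now under this budget."""
    if (budget.min_available is None) == (budget.max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if budget.min_available is not None:
        desired_healthy = budget.min_available
    else:
        desired_healthy = max(expected_pods - budget.max_unavailable, 0)
    # Evictions are only permitted while healthy pods exceed the desired floor.
    return max(healthy_pods - desired_healthy, 0)


# A budget equal to the replica count never permits an eviction, so a drain stalls.
print(allowed_disruptions(Budget(min_available=3), expected_pods=3, healthy_pods=3))
# Allowing one unavailable pod lets the drain proceed one pod at a time.
print(allowed_disruptions(Budget(max_unavailable=1), expected_pods=3, healthy_pods=3))
```

Under these assumptions, a budget that matches the replica count protects availability on paper but blocks every voluntary eviction, which is one way a PDB ends up satisfying a checkbox while making upgrades harder.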

Cost governance gets specific attention because resource limits are not a cost strategy. Node pool design decisions, spot VM integration without reliability regressions, and FinOps tagging that produces actionable attribution are each covered with the trade-offs named explicitly.

At scale — clusters above a few hundred nodes — AKS behavior changes in ways that smaller clusters don’t expose. etcd limits under high object churn, network saturation, observability overhead that compounds with cluster size, and cost patterns that emerge from early architectural decisions are addressed in articles grounded in production experience rather than constructed scenarios.

Platform Engineering Without Backstage: Pragmatic IDPs on Azure

Every platform engineering conference talk in the last two years has had a Backstage slide. Glossy catalogue screenshot, a scaffolder demo that creates a repo in four clicks, a knowing nod about “developer experience”. What the slide never shows is the six months the team spent building plugins, the Postgres instance somebody now babysits, the TechDocs theme nobody asked for, and the 0.4 of an engineer permanently assigned to chasing Backstage’s two-week release cadence.

There is no shame in any of this. Backstage is a serious project and serious teams run it well. The shame is treating it as the default (the thing you reach for on day one) when most teams could ship 80% of the value with a tenth of the effort and a fraction of the running cost. Backstage is a platform for building platforms. Most teams need a platform, not a platform-platform.

This post describes the Internal Developer Platform (IDP) I keep building when nobody is forcing me to use Backstage. It is small, opinionated, runs on Azure plumbing you already pay for, and ships value in the first quarter instead of the third year.

AKS Cluster Upgrades: Zero-Downtime Operations That Actually Work

AKS cluster upgrades involve node replacement and pod eviction, which can cause service disruption without proper controls. This article explains cordon and drain mechanics, Pod Disruption Budget configuration, and multi-node-pool rollout strategies with validation-driven automation for reliable zero-downtime upgrades.
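As a rough illustration of what a validation-driven multi-node-pool rollout could look like, the sketch below upgrades pools one at a time and stops at the first failed check. The `roll_out` function and the `upgrade` and `validate` callables are hypothetical placeholders; a real pipeline would wire them to whatever upgrade command and health checks the team actually uses.

```python
from typing import Callable, Iterable, List


def roll_out(
    pools: Iterable[str],
    upgrade: Callable[[str], None],
    validate: Callable[[str], bool],
) -> List[str]:
    """Upgrade pools in order, halting before the next pool if validation fails."""
    completed: List[str] = []
    for pool in pools:
        upgrade(pool)  # assumed to block until the pool upgrade finishes
        if not validate(pool):
            raise RuntimeError(
                f"validation failed after upgrading {pool!r}; completed: {completed}"
            )
        completed.append(pool)
    return completed


if __name__ == "__main__":
    # Stand-in callables for demonstration only; pool names are made up.
    upgraded: List[str] = []
    result = roll_out(["system", "user1"], upgraded.append, lambda pool: True)
    print(result)
```

The design choice worth noting is the gate between pools: a failed check leaves the remaining pools on the old version, so the blast radius of a bad upgrade is a single pool rather than the whole cluster.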