Azure Kubernetes Service reduces the operational burden of running Kubernetes by managing the control plane — API server, etcd, scheduler — so you don’t have to. What it doesn’t manage is everything else: node pool configuration, workload identity, storage class selection, networking topology, upgrade timing, cost governance, and security posture. The “managed” label covers a narrow slice of what actually requires operational attention.
The articles in this collection address the decisions AKS leaves to platform engineers. Identity configuration covers Workload Identity Federation — why it’s more complex than service account tokens and where credentials still leak despite federation. Storage articles examine what happens to PVCs when node pools get replaced and which storage classes survive real restore scenarios. Networking content covers multi-cluster hub-spoke topologies and when mesh complexity becomes justified rather than premature.
Cluster upgrades are a recurring theme because the documentation describes them optimistically. The articles cover cordon and drain behavior, Pod Disruption Budget configuration that actually prevents downtime rather than just satisfying a checkbox, and multi-node-pool rollout strategies that make upgrades reproducible rather than heroic one-off events.
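To make the Pod Disruption Budget point concrete, here is a minimal sketch of the arithmetic that decides whether a drain can evict a pod. It is not the Kubernetes controller code. The function name `allowed_disruptions` and its parameters are hypothetical, and the model assumes the documented behavior that percentage values round up and that allowed disruptions equal healthy pods minus the required healthy count.

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]  # e.g. 2 or "50%"


def _scaled(value: IntOrPercent, total: int) -> int:
    """Resolve an int-or-percent value against a pod count, rounding percentages up."""
    if isinstance(value, str):
        if not value.endswith("%"):
            raise ValueError(f"expected a percentage like '50%', got {value!r}")
        percent = int(value[:-1])
        # Integer ceiling of percent * total / 100.
        return (percent * total + 99) // 100
    return value


def allowed_disruptions(
    expected: int,
    healthy: int,
    min_available: Optional[IntOrPercent] = None,
    max_unavailable: Optional[IntOrPercent] = None,
) -> int:
    """Simplified model: how many voluntary evictions a PDB would permit right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = _scaled(min_available, expected)
    else:
        desired_healthy = max(expected - _scaled(max_unavailable, expected), 0)
    return max(healthy - desired_healthy, 0)


# A budget that satisfies the checkbox but blocks every drain:
print(allowed_disruptions(expected=3, healthy=3, min_available=3))    # 0
# A budget that lets an upgrade proceed one pod at a time:
print(allowed_disruptions(expected=3, healthy=3, max_unavailable=1))  # 1
# The same budget while one replica is already unhealthy:
print(allowed_disruptions(expected=3, healthy=2, max_unavailable=1))  # 0
```

Under this model, a budget whose `min_available` equals the replica count never permits an eviction, so a node drain waits until it times out or someone intervenes. A single-replica workload with `min_available=1` behaves the same way.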
Cost governance gets specific attention because resource limits are not a cost strategy. Node pool design decisions, spot VM integration without reliability regressions, and FinOps tagging that produces actionable attribution are each covered with the trade-offs named explicitly.
At scale — clusters above a few hundred nodes — AKS behavior changes in ways that smaller clusters don’t expose. etcd limits under high object churn, network saturation, observability overhead that compounds with cluster size, and cost patterns that emerge from early architectural decisions are addressed in articles grounded in production experience rather than constructed scenarios.
