{"authors":[{"name":"Martin Stühmer","url":"https://daily-devops.net/authors/martin/"},{"name":"Jendrik Brack","url":"https://daily-devops.net/authors/jendrik/"}],"description":"Recent content in Platform Engineering for .NET \u0026 Azure Teams on Daily DevOps \u0026 .NET","favicon":"https://daily-devops.net/images/logo_hu_6465d873dfa490cf.png","feed_url":"https://daily-devops.net/tags/platform-engineering/feed.json","home_page_url":"https://daily-devops.net/tags/platform-engineering/","icon":"https://daily-devops.net/images/logo_hu_5926de77762241ba.png","items":[{"authors":[{"name":"Jendrik Brack","url":"https://daily-devops.net/authors/jendrik/"}],"content_html":"\u003cp\u003eEvery platform engineering conference talk in the last two years has had a Backstage slide. Glossy catalogue screenshot, a scaffolder demo that creates a repo in four clicks, a knowing nod about \u0026ldquo;developer experience\u0026rdquo;. What the slide never shows is the six months the team spent building plugins, the Postgres instance somebody now babysits, the TechDocs theme nobody asked for, and the 0.4 of an engineer permanently assigned to chasing Backstage\u0026rsquo;s two-week release cadence.\u003c/p\u003e\n\u003cp\u003eThere is no shame in any of this. Backstage is a serious project and serious teams run it well. The shame is treating it as the \u003cem\u003edefault\u003c/em\u003e (the thing you reach for on day one) when most teams could ship 80% of the value with a tenth of the effort and a fraction of the running cost. Backstage is a platform for building platforms. Most teams need a platform, not a platform-platform.\u003c/p\u003e\n\u003cp\u003eThis post is the Internal Developer Platform (IDP) I keep building when nobody is forcing me to use Backstage. 
It is small, opinionated, runs on Azure plumbing you already pay for, and ships value in the first quarter instead of the third year.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"what-an-idp-actually-needs-to-do\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#what-an-idp-actually-needs-to-do\" title=\"What an IDP Actually Needs to Do\"\u003eWhat an IDP Actually Needs to Do\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eBefore the tooling argument, the value list. An Internal Developer Platform exists to:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003e\u003cstrong\u003eLower the time to first deployment\u003c/strong\u003e for a new service. Day one, not day thirty.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eMake the right thing the easy thing.\u003c/strong\u003e Security, observability, and cost defaults arrive for free, not as a checklist the team forgets.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eProvide a catalogue\u003c/strong\u003e so anyone can answer \u0026ldquo;who owns this?\u0026rdquo; without a Slack archaeology session at 02:00.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eStop the SRE from writing the same answer a third time.\u003c/strong\u003e Every repeated question is a missing platform feature.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eEverything else is decoration. If a feature doesn\u0026rsquo;t advance one of those four goals, skip it.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"the-pragmatic-stack-on-azure\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#the-pragmatic-stack-on-azure\" title=\"The Pragmatic Stack on Azure\"\u003eThe Pragmatic Stack on Azure\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eFive components. None of them require Backstage. 
All of them you can stand up with tools your org already owns.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eGolden-path repositories:\u003c/strong\u003e skeleton projects for the shapes you support (API, worker, frontend), each pre-wired to the platform.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eReusable GitHub Actions workflows:\u003c/strong\u003e a small set of \u003ccode\u003eworkflow_call\u003c/code\u003e templates that do the right thing so teams don\u0026rsquo;t copy-paste CI.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eOpinionated Bicep modules:\u003c/strong\u003e one module per platform service (Container App, Azure Kubernetes Service (AKS) namespace, Key Vault, Storage), with sane defaults and clearly marked escape hatches.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eA service catalogue:\u003c/strong\u003e one YAML file per service repo, aggregated into a queryable, static site. No database.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eScaffolding:\u003c/strong\u003e \u003ccode\u003egh repo create --template\u003c/code\u003e is enough for most teams. A CLI comes \u003cem\u003elater\u003c/em\u003e, if ever.\u003c/li\u003e\n\u003c/ul\u003e\n\n\n\n\n\u003ch3 id=\"golden-path-repository-the-highest-leverage-artefact\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#golden-path-repository-the-highest-leverage-artefact\" title=\"Golden-Path Repository: The Highest-Leverage Artefact\"\u003eGolden-Path Repository: The Highest-Leverage Artefact\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eA new service starts as a repository generated from a template repo. 
Day one, the team has CI, CD, infrastructure-as-code, and platform integration already wired together.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-text\" data-lang=\"text\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003epayments-api/                      (created from my-org/template-dotnet-api)\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e├── .github/\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e│   └── workflows/\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e│       ├── ci.yml                 → calls the platform reusable workflow\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e│       └── cd.yml                 → calls the platform reusable workflow\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e├── infra/\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e│   └── main.bicep                 → consumes the platform Bicep module\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e├── src/                           → service code (the only part the team writes)\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e├── catalog-info.yaml              → one file, registers the service in the catalogue\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e├── .editorconfig                  → org formatting defaults\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e├── .gitignore\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e└── README.md                      → pre-filled: 
how to run, deploy, and page on-call\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe discipline that makes this work: the team owns \u003ccode\u003esrc/\u003c/code\u003e and almost nothing else. The moment teams start hand-editing the workflow files or the Bicep, the golden path is dead and you\u0026rsquo;re back to snowflakes. Keep the platform-owned files thin pointers to versioned platform artefacts, and the template stays maintainable for years.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"reusable-workflows-one-place-to-improve-everyones-ci\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#reusable-workflows-one-place-to-improve-everyones-ci\" title=\"Reusable Workflows: One Place to Improve Everyone\u0026rsquo;s CI\"\u003eReusable Workflows: One Place to Improve Everyone\u0026rsquo;s CI\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eThis is the highest-leverage line of code you will write all quarter. When you fix something in a reusable workflow (a vulnerable action version, a missing Software Bill of Materials (SBOM) step, a flaky cache), \u003cem\u003eevery\u003c/em\u003e service that calls it picks up the fix on its next run. No PR to fifty repos. 
No migration guide nobody reads.\u003c/p\u003e\n\u003cp\u003eThe platform owns the real workflow:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-yaml\" data-lang=\"yaml\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c\"\u003e# my-org/platform-workflows/.github/workflows/dotnet-api.yml\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003edotnet-api\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eon\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003eworkflow_call\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003einputs\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"nt\"\u003eservice\u003c/span\u003e\u003cspan 
class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003erequired\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"kc\"\u003etrue\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003etype\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003estring\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"nt\"\u003eenvironment\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003erequired\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"kc\"\u003efalse\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003etype\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003estring\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan 
class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003edefault\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003estaging\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003epermissions\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003eid-token\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003ewrite   \u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"c\"\u003e# OIDC federation to Azure — no long-lived secrets in any repo\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003econtents\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eread\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003epackages\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan 
class=\"l\"\u003ewrite\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ejobs\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003ebuild-test\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003eruns-on\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eubuntu-latest\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003esteps\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003euses\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eactions/checkout@v4\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan 
class=\"nt\"\u003euses\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eactions/setup-dotnet@v4\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003ewith\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          \u003c/span\u003e\u003cspan class=\"nt\"\u003edotnet-version\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;10.0.x\u0026#34;\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003erun\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003edotnet restore\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003erun\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003edotnet build --no-restore -c Release\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003erun\u003c/span\u003e\u003cspan 
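class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003edotnet test --no-build -c Release --logger trx\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"c\"\u003e# az acr build needs an authenticated Azure session in this job as well.\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003euses\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eazure/login@v2\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003ewith\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          \u003c/span\u003e\u003cspan class=\"nt\"\u003eclient-id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003e${{ secrets.AZURE_CLIENT_ID }}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          \u003c/span\u003e\u003cspan class=\"nt\"\u003etenant-id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003e${{ vars.AZURE_TENANT_ID }}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          \u003c/span\u003e\u003cspan class=\"nt\"\u003esubscription-id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003e${{ vars.AZURE_SUBSCRIPTION_ID }}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"c\"\u003e# SBOM + image build are platform defaults, not per-team decisions.\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eBuild and push image\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003erun\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e|\u003c/span\u003e\u003cspan class=\"sd\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          az acr build --registry ${{ vars.PLATFORM_ACR }} \\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e            --image ${{ inputs.service }}:${{ github.sha }} .\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 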
class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003edeploy\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003eneeds\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003ebuild-test\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003eruns-on\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eubuntu-latest\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003eenvironment\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003e${{ inputs.environment }}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003esteps\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003euses\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan 
class=\"l\"\u003eazure/login@v2\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003ewith\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          \u003c/span\u003e\u003cspan class=\"nt\"\u003eclient-id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003e${{ secrets.AZURE_CLIENT_ID }}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          \u003c/span\u003e\u003cspan class=\"nt\"\u003etenant-id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003e${{ vars.AZURE_TENANT_ID }}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          \u003c/span\u003e\u003cspan class=\"nt\"\u003esubscription-id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003e${{ vars.AZURE_SUBSCRIPTION_ID }}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003euses\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan 
class=\"l\"\u003eactions/checkout@v4\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eDeploy infra + app\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003erun\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e|\u003c/span\u003e\u003cspan class=\"sd\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          az deployment group create \\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e            --resource-group rg-${{ inputs.service }}-${{ inputs.environment }} \\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e            --template-file infra/main.bicep \\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e            --parameters image=${{ vars.PLATFORM_ACR }}.azurecr.io/${{ inputs.service }}:${{ github.sha }}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe team\u0026rsquo;s entire deployment pipeline is then twelve lines that nobody needs to understand:\u003c/p\u003e\n\u003cdiv 
class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-yaml\" data-lang=\"yaml\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c\"\u003e# payments-api/.github/workflows/cd.yml (complete file)\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003ecd\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eon\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003epush\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003ebranches\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"l\"\u003emain]\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ejobs\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  
\u003c/span\u003e\u003cspan class=\"nt\"\u003eship\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003euses\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003emy-org/platform-workflows/.github/workflows/dotnet-api.yml@v3\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003ewith\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"nt\"\u003eservice\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003epayments-api\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"nt\"\u003eenvironment\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eproduction\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003esecrets\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan 
class=\"l\"\u003einherit\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003ePin callers to a release tag such as \u003ccode\u003e@v3\u003c/code\u003e and treat platform releases like a product, with a changelog and semantic versions. If \u003ccode\u003ev3\u003c/code\u003e is a major tag that the platform team moves forward with each compatible release, fixes still reach every caller on its next run, while breaking changes wait for an explicit tag bump instead of silently arriving through \u003ccode\u003e@main\u003c/code\u003e.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"bicep-modules-the-opinions-codified\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#bicep-modules-the-opinions-codified\" title=\"Bicep Modules: The Opinions, Codified\"\u003eBicep Modules: The Opinions, Codified\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eA Bicep module per platform service replaces hours of copy-paste, and more importantly, replaces hours of \u003cem\u003earguing about defaults\u003c/em\u003e. Teams call the module with the three or four parameters that matter for their service. The platform team owns the parameters that matter for everyone: tags, managed identity, network posture, the things an auditor will ask about. 
This is the same argument I make in \u003ca href=\"/posts/infrastructure-as-code-compliance-bicep/\"\u003eWhy Your Azure Portal Clicks Will Fail the Next Audit\u003c/a\u003e: defaults that live in code are defaults you can prove.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-bicep\" data-lang=\"bicep\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e// modules/container-app.bicep — one module, sane defaults, marked escape hatches\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e@\u003c/span\u003e\u003cspan class=\"nf\"\u003edescription\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;Service name. Drives naming, tags, and identity.\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eparam\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eservice\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003estring\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e@\u003c/span\u003e\u003cspan class=\"nf\"\u003edescription\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;Container image including registry and tag.\u0026#39;\u003c/span\u003e\u003cspan 
class=\"p\"\u003e)\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eparam\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eimage\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003estring\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eparam\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003elocation\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003estring\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e=\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nf\"\u003eresourceGroup\u003c/span\u003e\u003cspan class=\"p\"\u003e().\u003c/span\u003e\u003cspan class=\"nv\"\u003elocation\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e// --- Escape hatches: teams override only what their service genuinely needs. 
---\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e@\u003c/span\u003e\u003cspan class=\"nf\"\u003eallowed\u003c/span\u003e\u003cspan class=\"p\"\u003e([\u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;0.25\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;0.5\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;1.0\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;2.0\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e])\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eparam\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003ecpu\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003estring\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e=\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;0.5\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eparam\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003ememory\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003estring\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e=\u003c/span\u003e\u003cspan 
class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;1Gi\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eparam\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eminReplicas\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eint\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e=\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003e1\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eparam\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003emaxReplicas\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eint\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e=\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003e10\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e// --- Platform-owned defaults teams do NOT get to weaken. 
---\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003evar\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003etags\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e=\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nv\"\u003eservice\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eservice\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;managed-by\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;platform\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;cost-center\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;engineering\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan 
class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eresource\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eenv\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;Microsoft.App/managedEnvironments@2024-03-01\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"kd\"\u003eexisting\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e=\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nv\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;platform-aca-env\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eresource\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eidentity\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan 
class=\"s\"\u003e\u0026#39;Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e=\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nv\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;id-\u003c/span\u003e\u003cspan class=\"si\"\u003e${\u003c/span\u003e\u003cspan class=\"nv\"\u003eservice\u003c/span\u003e\u003cspan class=\"si\"\u003e}\u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nv\"\u003elocation\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003elocation\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nv\"\u003etags\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003etags\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan 
class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eresource\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eapp\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;Microsoft.App/containerApps@2024-03-01\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e=\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nv\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eservice\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nv\"\u003elocation\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003elocation\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nv\"\u003etags\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003etags\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan 
class=\"nv\"\u003eidentity\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"kd\"\u003etype\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;UserAssigned\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nv\"\u003euserAssignedIdentities\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;\u003c/span\u003e\u003cspan class=\"si\"\u003e${\u003c/span\u003e\u003cspan class=\"nv\"\u003eidentity\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nv\"\u003eid\u003c/span\u003e\u003cspan class=\"si\"\u003e}\u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{}\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nv\"\u003eproperties\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nv\"\u003emanagedEnvironmentId\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eenv\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nv\"\u003eid\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nv\"\u003econfiguration\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"nv\"\u003eingress\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eexternal\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"kc\"\u003etrue\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003etargetPort\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e 
\u003c/span\u003e\u003cspan class=\"nv\"\u003e8080\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003etransport\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#39;auto\u0026#39;\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nv\"\u003etemplate\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"nv\"\u003econtainers\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eservice\u003c/span\u003e\u003cspan 
class=\"p\"\u003e,\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eimage\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eimage\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eresources\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003ecpu\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nf\"\u003ejson\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"nv\"\u003ecpu\u003c/span\u003e\u003cspan class=\"p\"\u003e),\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003ememory\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003ememory\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"nv\"\u003escale\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan 
class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eminReplicas\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003eminReplicas\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003emaxReplicas\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003emaxReplicas\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kd\"\u003eoutput\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003efqdn\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nv\"\u003estring\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e=\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan 
class=\"nv\"\u003eapp\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nv\"\u003eproperties\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nv\"\u003econfiguration\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nv\"\u003eingress\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nv\"\u003efqdn\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe escape hatches are explicit and few. If a team needs to override something that isn\u0026rsquo;t a parameter, it\u0026rsquo;s a signal: either the default is wrong for everyone (fix the module), or the service is a genuine snowflake. Never allow a quiet fork of the module into a service repo. The day that happens, you have N modules to patch, not one.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"service-catalogue-a-yaml-file-and-a-build-step\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#service-catalogue-a-yaml-file-and-a-build-step\" title=\"Service Catalogue: A YAML File and a Build Step\"\u003eService Catalogue: A YAML File and a Build Step\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eSkipping the catalogue is the most expensive shortcut on this list, and teams skip it first because it feels like paperwork. It\u0026rsquo;s the difference between a five-second answer and a forty-minute incident dig. 
You don\u0026rsquo;t need Postgres, a graph database, or Backstage to have one.\u003c/p\u003e\n\u003cp\u003eOne file per service repo:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-yaml\" data-lang=\"yaml\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c\"\u003e# payments-api/catalog-info.yaml\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eservice\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003epayments-api\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eowner\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eteam-payments\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003etier\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"m\"\u003e1\u003c/span\u003e\u003cspan class=\"w\"\u003e                       \u003c/span\u003e\u003cspan class=\"c\"\u003e# 1 = revenue-critical; drives alerting + review policy\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003elanguage\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003edotnet\u003c/span\u003e\u003cspan 
class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ehosting\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003econtainer-apps\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eon_call\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003epayments-oncall@example.com\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003elinks\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003erepo\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003ehttps://github.com/example/payments-api\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003erunbook\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003ehttps://example.atlassian.net/wiki/payments-api\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan 
class=\"nt\"\u003edashboard\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003ehttps://example.grafana.net/d/payments\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003edepends_on\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e- \u003cspan class=\"l\"\u003epostgres-payments\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e- \u003cspan class=\"l\"\u003eauth-api\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eA nightly job in the platform repo collects every one of those files across the org and emits a single static JSON the catalogue site reads. 
The whole \u0026ldquo;backend\u0026rdquo; is the GitHub search API, \u003ccode\u003ejq\u003c/code\u003e, and \u003ccode\u003eyq\u003c/code\u003e:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-yaml\" data-lang=\"yaml\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c\"\u003e# my-org/platform/.github/workflows/build-catalogue.yml\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eon\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003eschedule\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e- \u003cspan class=\"nt\"\u003ecron\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;0 3 * * *\u0026#34;\u003c/span\u003e\u003cspan class=\"w\"\u003e     \u003c/span\u003e\u003cspan class=\"c\"\u003e# nightly; the org doesn\u0026#39;t reorganise faster than that\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003eworkflow_dispatch\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ejobs\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003eaggregate\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003eruns-on\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eubuntu-latest\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003esteps\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eCollect every catalog-info.yaml in the org\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003eenv\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          
\u003c/span\u003e\u003cspan class=\"nt\"\u003eGH_TOKEN\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003e${{ secrets.CATALOG_READ_TOKEN }}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003erun\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e|\u003c/span\u003e\u003cspan class=\"sd\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          gh search code --owner my-org --filename catalog-info.yaml \\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e            --limit 1000 --json path,repository \\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          | jq -r \u0026#39;.[] | \u0026#34;\\(.repository.nameWithOwner) \\(.path)\u0026#34;\u0026#39; \\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          | while read -r repo path; do\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e              printf \u0026#39;\\n---\\n\u0026#39;  # separate documents so yq reads one per file\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e              gh api \u0026#34;repos/$repo/contents/$path\u0026#34; --jq \u0026#39;.content\u0026#39; | base64 -d\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e            done \\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          | yq ea \u0026#39;[.]\u0026#39; -o=json \u0026gt; catalog.json\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003ePublish to the static catalogue site\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003erun\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e|\u003c/span\u003e\u003cspan class=\"sd\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          az storage blob upload --account-name platformcatalogue \\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e            --container-name \u0026#39;$web\u0026#39; --name catalog.json --file catalog.json --overwrite\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThat \u003ccode\u003ecatalog.json\u003c/code\u003e lands in an Azure Storage static website (\u003ccode\u003e$web\u003c/code\u003e container) that costs roughly the price of a coffee per year to host. The catalogue front-end is a single page that fetches the JSON and renders a searchable table. \u0026ldquo;Who owns \u003ccode\u003epayments-api\u003c/code\u003e? What\u0026rsquo;s the runbook? 
What does it depend on?\u0026rdquo; All answered in one query, by anyone, without a database to back up or a service to keep alive at 02:00.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"from-new-service-to-first-deploy-under-30-minutes\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#from-new-service-to-first-deploy-under-30-minutes\" title=\"From New Service to First Deploy: Under 30 Minutes\"\u003eFrom New Service to First Deploy: Under 30 Minutes\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003e\u003ccode\u003egh repo create --template\u003c/code\u003e. Fill in four lines of \u003ccode\u003ecatalog-info.yaml\u003c/code\u003e. Write your logic. \u003ccode\u003egit push\u003c/code\u003e. The workflow builds, tests, and deploys with identity, tags, and labels already correct. The catalogue picks it up overnight. No ticket, no request to the platform team. The golden path \u003cem\u003eis\u003c/em\u003e the request system.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"when-backstage-actually-is-the-right-answer\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#when-backstage-actually-is-the-right-answer\" title=\"When Backstage Actually Is the Right Answer\"\u003eWhen Backstage Actually Is the Right Answer\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eBackstage is the right answer when:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eThe platform team is five-plus people\u003c/strong\u003e with genuine capacity to run Backstage as a product, including the upgrade treadmill.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eThe service count is large enough that catalogue queries genuinely matter:\u003c/strong\u003e hundreds of services, not dozens. 
At dozens, a YAML file and a static page answer every question Backstage would.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eYou need a dependency graph, not a list.\u003c/strong\u003e When \u0026ldquo;what breaks if \u003ccode\u003eauth-api\u003c/code\u003e goes down?\u0026rdquo; needs a traversable graph across hundreds of nodes, a flat YAML catalogue stops being enough.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eYou\u0026rsquo;ll get real use from existing Backstage plugins\u003c/strong\u003e for proprietary in-house systems, where the plugin ecosystem saves you building integrations from scratch.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eTwo or three engineers and a few dozen services? Backstage will eat your runway. Pick the tool that matches your headcount, not the one that matched the headcount of the team whose conference talk you watched.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"where-teams-go-wrong\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#where-teams-go-wrong\" title=\"Where Teams Go Wrong\"\u003eWhere Teams Go Wrong\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eTeams build what they want to operate rather than what developers need to ship through.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eToo much, too soon:\u003c/strong\u003e UI before catalogue (the data model is the hard part; the UI is a weekend). CLI before template (\u003ccode\u003egh repo create --template\u003c/code\u003e handles the first fifty services). Plugins for tools nobody uses yet. Speculative integration is speculative debt.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNot enough:\u003c/strong\u003e The catalogue gets skipped because it feels like paperwork. Every incident without it costs you the forty minutes you \u0026ldquo;saved\u0026rdquo;. Documentation nobody can find at 02:00 is a graveyard. And the platform needs an on-call rota: no owner when it breaks means teams won\u0026rsquo;t depend on it. 
As I argue in \u003ca href=\"/posts/incident-response-github-actions/\"\u003eYour Incident Response Plan Is a Lie. Here\u0026rsquo;s How to Fix It.\u003c/a\u003e: undefined ownership is the failure, not the missing tool.\u003c/p\u003e\n\u003cp\u003eIf a feature doesn\u0026rsquo;t lower time-to-deploy, encode a good default, feed the catalogue, or kill a repeated SRE question: it waits.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"what-an-idp-cannot-fix\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#what-an-idp-cannot-fix\" title=\"What an IDP Cannot Fix\"\u003eWhat an IDP Cannot Fix\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eTooling is downstream of organisation. Team dynamics at war, a production swamp nobody has time to drain, leadership that won\u0026rsquo;t give teams space to adopt the golden path: these are org problems, not tool problems. It\u0026rsquo;s the same trap I describe in \u003ca href=\"/posts/kubernetes-not-platform-strategy/\"\u003eKubernetes Is Not a Platform Strategy\u003c/a\u003e. The tool is how you express the strategy, not a substitute for one.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"what-i-recommend\"\u003e\u003ca href=\"/posts/platform-engineering-without-backstage/#what-i-recommend\" title=\"What I Recommend\"\u003eWhat I Recommend\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eBuild one golden-path repo in two weeks: one service shape, CI, CD, Bicep, \u003ccode\u003ecatalog-info.yaml\u003c/code\u003e. Adopt it on two real services, not pilots. Add the catalogue in an afternoon. 
Only ask whether you need Backstage at month three or four, once service count and team size give you an actual answer.\u003c/p\u003e\n\u003cp\u003eEarn the right to operate Backstage by first delivering value without it.\u003c/p\u003e","date_modified":"2026-05-27T18:08:49+02:00","date_published":"2026-05-27T18:00:00+02:00","id":"https://daily-devops.net/posts/platform-engineering-without-backstage/","language":"en","summary":"Backstage is not the only path to an Internal Developer Platform. Pragmatic IDP patterns on Azure that ship value before the YAML eats your team.","tags":["platform-engineering","azure","devops","aks"],"title":"Platform Engineering Without Backstage: Pragmatic IDPs on Azure","url":"https://daily-devops.net/posts/platform-engineering-without-backstage/"},{"authors":[{"name":"Jendrik Brack","url":"https://daily-devops.net/authors/jendrik/"}],"content_html":"\n\n\n\n\u003ch2 id=\"the-problem-traditional-storage-models-dont-translate-to-kubernetes\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#the-problem-traditional-storage-models-dont-translate-to-kubernetes\" title=\"The Problem: Traditional Storage Models Don\u0026rsquo;t Translate to Kubernetes\"\u003eThe Problem: Traditional Storage Models Don\u0026rsquo;t Translate to Kubernetes\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eRunning stateful workloads in Kubernetes means more than deploying a database pod. Traditional storage models (provision a disk, format it, mount it, expect it to stay) collide with Kubernetes\u0026rsquo; ephemeral, distributed architecture. Pods get rescheduled, scaled, and terminated. Your database shouldn\u0026rsquo;t lose data when that happens.\u003c/p\u003e\n\u003cp\u003eThe core challenge: \u003cstrong\u003ehow do you attach persistent storage to ephemeral compute?\u003c/strong\u003e On-premises infrastructure relies on SAN devices, NFS mounts, or local disks with predictable failure domains. You know which server hosts which disk. 
In AKS, you work with Azure storage primitives: Managed Disks, Azure Files, blob storage. These need seamless integration with Kubernetes lifecycle management. The abstractions differ, the failure modes differ, and operational patterns require rethinking.\u003c/p\u003e\n\u003cp\u003eComplexity multiplies with backup requirements, disaster recovery expectations, and multi-cluster data synchronization. Whether migrating legacy apps that expect local RAID controllers or building cloud-native data platforms from scratch, AKS storage architecture knowledge is foundational. Get it wrong: data loss, performance bottlenecks, escalating cloud bills.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"pvcpv-architecture-how-storage-binds-to-pods-in-aks\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#pvcpv-architecture-how-storage-binds-to-pods-in-aks\" title=\"PVC/PV Architecture: How Storage Binds to Pods in AKS\"\u003ePVC/PV Architecture: How Storage Binds to Pods in AKS\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes abstracts storage through two key objects: \u003cstrong\u003ePersistentVolumes (PV)\u003c/strong\u003e and \u003cstrong\u003ePersistentVolumeClaims (PVC)\u003c/strong\u003e. A PV represents the actual storage resource (Azure Disk, Azure Files share). A PVC represents the request for that storage. The relationship mirrors compute abstractions: nodes are physical machines, pods are logical units consuming node resources. 
Similarly, PVs are physical storage, PVCs are logical requests consuming PV capacity.\u003c/p\u003e\n\u003cp\u003eThe binding flow:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eDeveloper creates a PVC specifying size, access mode, and storage class\u003c/li\u003e\n\u003cli\u003eKubernetes finds or provisions a matching PV based on the storage class\u003c/li\u003e\n\u003cli\u003ePVC binds to the PV, making it available to pods\u003c/li\u003e\n\u003cli\u003ePods reference the PVC in their volume mounts\u003c/li\u003e\n\u003cli\u003eWhen the pod terminates, the PVC remains (data persists across pod lifecycles)\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eAccess modes matter:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eReadWriteOnce (RWO)\u003c/strong\u003e: Single node can mount the volume (Azure Disk)\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eReadWriteMany (RWX)\u003c/strong\u003e: Multiple nodes can mount simultaneously (Azure Files)\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eReadOnlyMany (ROX)\u003c/strong\u003e: Multiple nodes, read-only access\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eMost stateful apps (databases, message queues) use RWO. Azure Disks provide better IOPS and latency than Azure Files. For shared storage (parallel batch processing, shared config directories, legacy apps expecting NFS semantics), use RWX: Azure Files or third-party CSI drivers like NFS or CephFS.\u003c/p\u003e\n\u003cp\u003eCritical insight: \u003cstrong\u003ePVCs decouple storage requests from storage implementation.\u003c/strong\u003e Developers don\u0026rsquo;t need to know if they get a Premium SSD or Standard HDD. They request 100Gi of fast storage, the storage class handles provisioning. 
This abstraction enables platform teams to enforce policies (all production PVCs use Premium tier) without touching application manifests.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"azure-disk-vs-azure-files-performance-cost-regional-constraints\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#azure-disk-vs-azure-files-performance-cost-regional-constraints\" title=\"Azure Disk vs. Azure Files: Performance, Cost, Regional Constraints\"\u003eAzure Disk vs. Azure Files: Performance, Cost, Regional Constraints\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eChoosing between Azure Disk and Azure Files isn\u0026rsquo;t a one-size-fits-all decision. Each has distinct performance profiles, cost implications, and operational constraints.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAzure Disk (Managed Disks):\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003ePerformance:\u003c/strong\u003e Lower latency, higher IOPS. Premium SSDs reach 20,000 IOPS, Ultra Disks exceed that.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eAccess:\u003c/strong\u003e Single-node attachment (RWO). Pod rescheduling to another node triggers disk detach and reattach (expect brief delay).\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eUse cases:\u003c/strong\u003e Databases (PostgreSQL, MongoDB), stateful apps requiring low-latency I/O.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eCost:\u003c/strong\u003e Pay per provisioned disk size. A 1TB Premium SSD costs more than a 1TB Standard HDD, regardless of actual usage.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRegional constraints:\u003c/strong\u003e Disks are zone-specific. With availability zones, pods must schedule in the same zone as the disk.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eAzure Files (SMB/NFS):\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003ePerformance:\u003c/strong\u003e Higher latency than disks. 
Premium Files tier improves performance but still trails disk I/O.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eAccess:\u003c/strong\u003e Multi-node (RWX). Multiple pods across nodes can mount the same share.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eUse cases:\u003c/strong\u003e Shared logs, static assets, config files, legacy apps expecting NFS.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eCost:\u003c/strong\u003e Pay per storage consumed plus transactions. Transaction costs surprise teams on high-throughput workloads.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRegional constraints:\u003c/strong\u003e File shares are regional, not zonal. Better for cross-zone workloads, still tied to single region.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eDecision criteria:\u003c/strong\u003e Default to Azure Disk for databases and high-IOPS apps. Use Azure Files only when RWX access or legacy NFS compatibility is required. For backup targets or archival storage, consider blob storage with CSI drivers (experimental, improving).\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"the-disk-attachment-penalty\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#the-disk-attachment-penalty\" title=\"The Disk Attachment Penalty\"\u003eThe Disk Attachment Penalty\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eGotcha: \u003cstrong\u003edisk attachment times.\u003c/strong\u003e Pod rescheduling requires Azure to detach the disk from the old node and attach it to the new one. This takes 30 to 90 seconds. 
Apps that cannot tolerate this downtime need application-level replication (PostgreSQL streaming replication) or third-party solutions like Portworx.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"storage-classes--dynamic-provisioning-automating-the-lifecycle\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#storage-classes--dynamic-provisioning-automating-the-lifecycle\" title=\"Storage Classes \u0026amp; Dynamic Provisioning: Automating the Lifecycle\"\u003eStorage Classes \u0026amp; Dynamic Provisioning: Automating the Lifecycle\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eStatic provisioning (manually creating PVs, hoping someone claims them) creates operational overhead. \u003cstrong\u003eStorage classes\u003c/strong\u003e enable dynamic provisioning: Kubernetes automatically creates a PV when a PVC is submitted.\u003c/p\u003e\n\u003cp\u003eAKS ships with default storage classes:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003edefault\u003c/code\u003e: Standard SSD Azure Disk (RWO)\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003emanaged-premium\u003c/code\u003e: Premium SSD Azure Disk (RWO)\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003eazurefile\u003c/code\u003e: Azure Files share (RWX)\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003eazurefile-premium\u003c/code\u003e: Premium Azure Files share (RWX)\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eYou can define custom storage classes to fine-tune parameters:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-yaml\" data-lang=\"yaml\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eapiVersion\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003estorage.k8s.io/v1\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan 
class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ekind\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eStorageClass\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003emetadata\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003efast-ssd\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eprovisioner\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003edisk.csi.azure.com\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eparameters\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003eskuName\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003ePremium_LRS\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003ekind\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eManaged\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003ecachingMode\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eReadOnly\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"c\"\u003e# Zone redundant storage (ZRS) for higher durability\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"c\"\u003e# skuName: Premium_ZRS\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eallowVolumeExpansion\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"kc\"\u003etrue\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ereclaimPolicy\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eRetain\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan 
class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003evolumeBindingMode\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eWaitForFirstConsumer\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eKey parameters:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003ereclaimPolicy:\u003c/strong\u003e \u003ccode\u003eDelete\u003c/code\u003e removes the disk when the PVC is deleted; \u003ccode\u003eRetain\u003c/code\u003e keeps it. For production databases, \u003ccode\u003eRetain\u003c/code\u003e prevents accidental data deletion.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003evolumeBindingMode:\u003c/strong\u003e \u003ccode\u003eWaitForFirstConsumer\u003c/code\u003e delays PV creation until pod scheduling. Critical for zone-aware clusters (Kubernetes creates the disk in the same zone as the pod).\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eallowVolumeExpansion:\u003c/strong\u003e Enables PVC resizing without recreation. Azure Disks support this; not all storage backends do.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eBest practice:\u003c/strong\u003e Create environment-specific storage classes (dev, staging, prod) with different \u003ccode\u003eskuName\u003c/code\u003e values. Dev clusters use Standard HDDs; prod uses Premium SSDs. Developers keep the same manifests across environments; only the storage class name changes.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"backup--recovery-rtorpo-implications\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#backup--recovery-rtorpo-implications\" title=\"Backup \u0026amp; Recovery: RTO/RPO Implications\"\u003eBackup \u0026amp; Recovery: RTO/RPO Implications\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes doesn\u0026rsquo;t back up data by default. 
Running \u003ccode\u003ekubectl delete pvc\u003c/code\u003e without a recovery plan means permanent data loss.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eVelero\u003c/strong\u003e (formerly Heptio Ark) is the de facto standard for Kubernetes backup. It snapshots PVs, captures Kubernetes object state, stores backups in object storage (Azure Blob, S3, GCS).\u003c/p\u003e\n\u003cp\u003eExample Velero backup schedule (via CLI):\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-bash\" data-lang=\"bash\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Install Velero with Azure plugin\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003evelero install \u003cspan class=\"se\"\u003e\\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e  --provider azure \u003cspan class=\"se\"\u003e\\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e  --plugins velero/velero-plugin-for-microsoft-azure:v1.9.0 \u003cspan class=\"se\"\u003e\\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e  --bucket velero-backups \u003cspan class=\"se\"\u003e\\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e  --secret-file ./credentials-velero \u003cspan class=\"se\"\u003e\\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e  --backup-location-config \u003cspan class=\"nv\"\u003eresourceGroup\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003eaks-backups-rg,storageAccount\u003cspan class=\"o\"\u003e=\u003c/span\u003eaksbackupssa\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Create a daily backup schedule for production namespace\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003evelero schedule create daily-prod-backup \u003cspan class=\"se\"\u003e\\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e  --schedule\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;0 2 * * *\u0026#34;\u003c/span\u003e \u003cspan class=\"se\"\u003e\\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e  --include-namespaces production \u003cspan class=\"se\"\u003e\\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e  --snapshot-volumes \u003cspan class=\"se\"\u003e\\\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e  --ttl 720h\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\n\n\n\u003ch3 id=\"rto-and-rpo-considerations\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#rto-and-rpo-considerations\" title=\"RTO And RPO Considerations\"\u003eRTO And RPO Considerations\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003e\u003cstrong\u003eRTO/RPO considerations:\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eSnapshot-based backups (Azure Disk snapshots via Velero):\u003c/strong\u003e RPO equals backup frequency (hourly, daily). RTO equals time to provision new PV plus restore data (5 to 30 minutes).\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eNative Azure Backup for AKS:\u003c/strong\u003e Microsoft managed solution. 
It integrates with Azure Backup policies, but restores are slower and less granular than with Velero.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eApplication-level backups (pg_dump, mongodump):\u003c/strong\u003e Bypass Kubernetes entirely. Lower RTO with automated restore scripts, but they require custom orchestration.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eGotcha:\u003c/strong\u003e Velero relies on Azure Disk snapshots. Restoring a disk from Zone 1 into a cluster in Zone 2 requires a cross-zone snapshot copy, which is not instant. Test restore procedures in non-prod clusters. A backup that has never been restored is wishful thinking.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"multi-aks-replication-patterns-for-cross-cluster-data-synchronization\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#multi-aks-replication-patterns-for-cross-cluster-data-synchronization\" title=\"Multi-AKS Replication: Patterns for Cross-Cluster Data Synchronization\"\u003eMulti-AKS Replication: Patterns for Cross-Cluster Data Synchronization\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eRunning stateful workloads across multiple AKS clusters (whether for HA, disaster recovery, or multi-region latency requirements) adds another layer of complexity.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePattern 1: Application-Level Replication\u003c/strong\u003e\nLet the application handle replication. PostgreSQL streaming replication, MongoDB replica sets, and Kafka replication understand their data models and replicate efficiently.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003ePros:\u003c/strong\u003e No Kubernetes-specific dependencies. 
Works identically in VMs, on-premises, or managed services.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eCons:\u003c/strong\u003e You manage replication lag, split-brain scenarios, and failover logic.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003ePattern 2: Storage-Level Replication\u003c/strong\u003e\nUse Azure NetApp Files or third-party solutions like Portworx for block-level or file-level replication.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003ePros:\u003c/strong\u003e Transparent to applications. Works with legacy apps lacking native replication.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eCons:\u003c/strong\u003e Expensive. NetApp Files Premium tier and Portworx licensing (which scales with node count) add significant cost.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003ePattern 3: Backup-Based DR\u003c/strong\u003e\nTake Velero backups from the primary cluster and restore them to the secondary on failover.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003ePros:\u003c/strong\u003e Cost-effective (blob storage only).\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eCons:\u003c/strong\u003e RPO equals the backup interval (hours, not seconds). 
RTO includes restore time (minutes to hours).\u003c/li\u003e\n\u003c/ul\u003e\n\n\n\n\n\u003ch3 id=\"a-multi-region-postgresql-pattern\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#a-multi-region-postgresql-pattern\" title=\"A Multi-Region PostgreSQL Pattern\"\u003eA Multi-Region PostgreSQL Pattern\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003e\u003cstrong\u003eReal-world example:\u003c/strong\u003e A multi-region PostgreSQL deployment pattern I\u0026rsquo;ve encountered:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003ePrimary AKS cluster (West Europe):\u003c/strong\u003e Production traffic\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eSecondary AKS cluster (North Europe):\u003c/strong\u003e Read replicas via PostgreSQL streaming replication\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eVelero backups:\u003c/strong\u003e Azure Blob in a third region (East US) for regulatory compliance\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThis provides sub-second RPO within Europe (streaming replication), hourly RPO globally (Velero), and a 5-minute RTO for regional failover (promote the read replica).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eOperational reality:\u003c/strong\u003e Multi-cluster data replication is complex. If possible, avoid it by using managed services (Azure Database for PostgreSQL with geo-replication). Your 3 AM self will appreciate that decision. If you do run databases in AKS, budget for automation, monitoring, and runbooks.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"final-thoughts\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#final-thoughts\" title=\"Final Thoughts\"\u003eFinal Thoughts\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eStorage in AKS represents a set of trade-offs requiring deliberate navigation. Azure Disk provides performance with zone-locking. Azure Files offers flexibility with latency penalties. Velero enables backups but demands operational discipline and testing. 
Multi-cluster replication delivers resilience with non-linear operational complexity.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"a-pragmatic-starting-point\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#a-pragmatic-starting-point\" title=\"A Pragmatic Starting Point\"\u003eA Pragmatic Starting Point\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003ePragmatic approach: Start with managed storage classes and Velero. Use Azure Disk for databases and high-IOPS workloads. Use Azure Files only when RWX access or legacy NFS compatibility is genuinely required. Test restore procedures quarterly, not during outages. Schedule fire drills: delete a namespace, restore from backup. Measure actual RTO/RPO instead of assuming SLA compliance.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"when-to-leave-aks-for-managed-data-services\"\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/#when-to-leave-aks-for-managed-data-services\" title=\"When To Leave AKS For Managed Data Services\"\u003eWhen To Leave AKS For Managed Data Services\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eWhen stateful workload requirements outgrow AKS storage primitives (sub-second cross-region replication, disk attachment latency breaking your app, spiraling storage costs), don\u0026rsquo;t force solutions. Consider Azure managed services (Azure Database for PostgreSQL, Cosmos DB) or specialized data platforms (Confluent Cloud for Kafka, MongoDB Atlas). Sometimes the best Kubernetes storage strategy is avoiding stateful workloads in Kubernetes.\u003c/p\u003e\n\u003cp\u003eKubernetes excels at stateless orchestration. For stateful workloads, it\u0026rsquo;s capable but demands understanding the plumbing, accepting trade-offs, building operational muscle around backups, monitoring, and runbooks. Treat storage as infrastructure that will fail, not infrastructure that just works. 
Plan accordingly.\u003c/p\u003e\n","date_modified":"2026-05-26T10:22:03+02:00","date_published":"2026-02-04T17:00:00+01:00","id":"https://daily-devops.net/posts/storage-architecture-stateful-workloads-aks/","language":"en","summary":"PVC/PV patterns, Azure Disk vs Files trade-offs, Velero backup strategies, and cross-cluster replication for production stateful workloads in AKS.","tags":["storage","azure","kubernetes","cloud","database","reliability","operations","platform-engineering","disaster-recovery"],"title":"Storage Architecture \u0026 Stateful Workloads in AKS","url":"https://daily-devops.net/posts/storage-architecture-stateful-workloads-aks/"},{"authors":[{"name":"Jendrik Brack","url":"https://daily-devops.net/authors/jendrik/"}],"content_html":"\u003cp\u003eAKS documentation will get you to a running cluster. It won\u0026rsquo;t tell you why your pod authenticated in staging and gets a 401 in production. It won\u0026rsquo;t explain why upgrading a 50-node cluster at 2 AM felt fine but a 300-node upgrade at noon caused cascading evictions. It won\u0026rsquo;t show you which storage class to avoid when your database needs to survive node pool replacements.\u003c/p\u003e\n\u003cp\u003eThis series covers the operational reality — the decisions that distinguish AKS clusters that run quietly in production from clusters that generate 3 AM alerts. Nine articles, each examining a specific architectural domain with the specificity that matters when something breaks.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"why-aks-operations-is-different\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#why-aks-operations-is-different\" title=\"Why AKS Operations Is Different\"\u003eWhy AKS Operations Is Different\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eMicrosoft manages the AKS control plane. 
That sounds like less work, and in some ways it is — you don\u0026rsquo;t patch etcd, you don\u0026rsquo;t replace failed control plane VMs, you don\u0026rsquo;t worry about API server certificate rotation. What it doesn\u0026rsquo;t mean is that running AKS in production is simple or that managed Kubernetes hands you a reliable platform and steps aside.\u003c/p\u003e\n\u003cp\u003eEvery node pool configuration decision is yours. Every storage class binding, every PVC lifecycle policy, every decision about which node pool hosts which workload — that\u0026rsquo;s on you. RBAC spans three separate systems simultaneously: Kubernetes RBAC, Azure RBAC, and Azure AD. A misconfiguration in any one of them produces an access failure that looks identical from the application\u0026rsquo;s perspective. The documentation will show you how to configure each system in isolation. It will not show you why they interact in non-obvious ways under specific conditions, or what the failure mode looks like when you get the federation configuration slightly wrong.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"where-networking-stops-being-managed\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#where-networking-stops-being-managed\" title=\"Where Networking Stops Being Managed\"\u003eWhere Networking Stops Being Managed\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eNetworking is another area where \u0026ldquo;managed\u0026rdquo; has a narrower meaning than the word implies. Microsoft manages the control plane networking. Your VNet, your subnets, your IP address planning, your DNS configuration, your ingress architecture — all of it is your responsibility, and the decisions compound. IP exhaustion caused by node pool scaling is a common production incident that no amount of control plane management prevents. 
Private cluster DNS resolution breaks in ways that take hours to diagnose if you haven\u0026rsquo;t encountered the pattern before.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"upgrades-the-gap-between-docs-and-reality\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#upgrades-the-gap-between-docs-and-reality\" title=\"Upgrades: The Gap Between Docs and Reality\"\u003eUpgrades: The Gap Between Docs and Reality\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eUpgrades are perhaps the clearest illustration of the gap between documentation and reality. The documentation describes upgrade mechanics accurately. What it doesn\u0026rsquo;t describe is how Pod Disruption Budget misconfigurations interact with cluster autoscaler behavior during node pool drain, why the timing of upgrades relative to workload peak matters more than most teams expect, or how a PDB that looks correct on paper blocks drain indefinitely on a cluster that\u0026rsquo;s handling real traffic. Managed Kubernetes handles the control plane upgrade. The workload upgrade is a careful orchestration problem that the platform does not solve for you.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"storage-where-managed-disappears\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#storage-where-managed-disappears\" title=\"Storage: Where \u0026ldquo;Managed\u0026rdquo; Disappears\"\u003eStorage: Where \u0026ldquo;Managed\u0026rdquo; Disappears\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eStorage is where the word \u0026ldquo;managed\u0026rdquo; disappears entirely. Azure manages the underlying disk and file services. AKS provides the CSI drivers. Everything between your application and the storage backend — PVC binding, reclaim policies, volume expansion behavior, backup orchestration, behavior during node failure or node pool deletion — is configuration you own. 
Teams that treat storage as a detail find out it isn\u0026rsquo;t when a node pool replacement deletes volumes that were bound to nodes rather than to the cluster.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"cost-decisions-compound-silently\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#cost-decisions-compound-silently\" title=\"Cost Decisions Compound Silently\"\u003eCost Decisions Compound Silently\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eCost is a dimension that managed Kubernetes actively obscures. The control plane is free only on the Free tier; the Standard and Premium tiers add a per-cluster management fee. Node pool costs scale with what you configure, and the configuration space is large: VM SKU selection, autoscaler min/max bounds, system versus user node pool separation, spot VM integration, and pod density targets. None of these have obviously correct values. All of them interact. Teams that inherit clusters often inherit cost structures that made sense at a different scale or for a different workload profile, and reversing those decisions requires careful sequencing to avoid downtime.\u003c/p\u003e\n\u003cp\u003eThe happy paths in the documentation work. They work because they\u0026rsquo;re constructed to work. Production clusters encounter the edges (the configuration combinations, the scale thresholds, the timing sensitivities) that happy paths don\u0026rsquo;t cover. This series is about the edges.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"what-this-series-covers\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#what-this-series-covers\" title=\"What This Series Covers\"\u003eWhat This Series Covers\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003e\u003ca href=\"/posts/pod-identity-access-control-aks/\"\u003ePod Identity \u0026amp; Access Control in AKS: What Actually Breaks\u003c/a\u003e\u003c/strong\u003e starts with identity because identity failures are the most common source of production incidents. 
Workload Identity Federation eliminates credential lifecycle problems but introduces configuration complexity spanning three separate RBAC systems — Kubernetes RBAC, Azure RBAC, and Azure AD permissions. The article explains where credentials still leak despite federation, how layers interact and fail, and validation patterns that catch misconfigurations before they become incidents.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003ca href=\"/posts/storage-architecture-stateful-workloads-aks/\"\u003eStorage Architecture \u0026amp; Stateful Workloads in AKS\u003c/a\u003e\u003c/strong\u003e addresses what most AKS guides skip: what actually happens to your data when a node gets replaced. PVC/PV architecture, Azure Disk versus Azure Files performance trade-offs, Velero backup configurations that survive real restore scenarios, and multi-cluster replication patterns for production stateful workloads.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003ca href=\"/posts/cost-optimization-resource-governance-aks/\"\u003eAKS Cost Optimization: Resource Governance That Actually Works\u003c/a\u003e\u003c/strong\u003e covers the gap between \u0026ldquo;set resource limits\u0026rdquo; and actually controlling spend at scale. Pod density strategies, node pool design decisions that compound over time, spot VM integration without reliability regressions, and FinOps tagging that produces actionable cost attribution rather than unread dashboards.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003ca href=\"/posts/multi-aks-cluster-networking-hub-spoke/\"\u003eMulti-AKS Cluster Networking \u0026amp; Hub-Spoke Topology\u003c/a\u003e\u003c/strong\u003e examines what happens to networking when you move from one cluster to many. 
VNet peering patterns, hub-spoke routing, cross-cluster DNS resolution, shared ingress options, and — critically — the decision criteria for when mesh complexity becomes justified rather than premature.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003ca href=\"/posts/cluster-upgrades-zero-downtime-aks/\"\u003eAKS Cluster Upgrades: Zero-Downtime Operations That Actually Work\u003c/a\u003e\u003c/strong\u003e covers upgrade mechanics that documentation describes optimistically. Cordon and drain behavior, Pod Disruption Budget configuration that prevents service disruption rather than theater-level protection, multi-node-pool rollout strategies, and validation-driven automation that makes upgrades reproducible rather than heroic.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003ca href=\"/posts/container-registry-image-security-aks/\"\u003eContainer Registry \u0026amp; Image Security in AKS Deployments\u003c/a\u003e\u003c/strong\u003e covers ACR hardening beyond the basics. A production-ready sequence: vulnerability scanning, image signing with Notation, RBAC scoping, private endpoints, policy enforcement through Azure Policy and admission controllers, and geo-replication strategies with clear trade-offs explained.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003ca href=\"/posts/disaster-recovery-business-continuity-aks/\"\u003eAKS Disaster Recovery: Why Your Untested Backup Will Fail\u003c/a\u003e\u003c/strong\u003e addresses the gap between having backups and having a tested recovery plan. 
Velero configuration, realistic RTO/RPO targets that match business risk rather than wishful thinking, restore testing procedures that catch problems before outages, and multi-region failover steps your team can actually execute under pressure.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003ca href=\"/posts/hybrid-aks-on-prem-azure-arc/\"\u003eHybrid AKS: Bridging Cloud and On-Prem with Azure Arc\u003c/a\u003e\u003c/strong\u003e covers the operational patterns for organizations running Kubernetes across cloud and on-premises simultaneously. ExpressRoute and VPN connectivity, Azure Arc for unified management across heterogeneous environments, consistent policy enforcement, DNS resolution, and identity federation without duplicating systems.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003ca href=\"/posts/aks-at-scale-mega-cluster-lessons/\"\u003eAKS at Scale: Hard-Won Lessons from 1000+ Node Clusters\u003c/a\u003e\u003c/strong\u003e closes the series with what changes when clusters grow large enough that the platform itself becomes the bottleneck. etcd limits under high object churn, network saturation at scale, observability overhead that compounds with cluster size, and cost spirals that emerge from architectural decisions that seemed fine at 50 nodes.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"who-this-is-for\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#who-this-is-for\" title=\"Who This Is For\"\u003eWho This Is For\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003ePlatform engineers and infrastructure-focused developers responsible for AKS clusters in production, or teams about to inherit that responsibility. Each article assumes you\u0026rsquo;ve run AKS before and want operational depth, not introductory setup instructions.\u003c/p\u003e\n\u003cp\u003eThe series covers Terraform, Bicep, kubectl, and Azure CLI patterns throughout. 
Examples are grounded in production scenarios rather than constructed to demonstrate features.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"how-these-articles-were-written\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#how-these-articles-were-written\" title=\"How These Articles Were Written\"\u003eHow These Articles Were Written\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eEach article in this series is based on production experience — clusters that handled real traffic, failed in real ways, and required real fixes under time pressure. That distinction matters for what you\u0026rsquo;ll find here and what you won\u0026rsquo;t.\u003c/p\u003e\n\u003cp\u003eProduction experience means the failure patterns are specific. Not \u0026ldquo;storage can be tricky\u0026rdquo; but which storage class binding decisions survive node pool replacements and which don\u0026rsquo;t. Not \u0026ldquo;upgrades can cause downtime\u0026rdquo; but which combination of PDB configuration and autoscaler behavior produces an indefinitely blocked drain. Not \u0026ldquo;identity is complex\u0026rdquo; but the exact configuration gap in Workload Identity Federation that causes silent auth failures in one environment and not another. The specificity isn\u0026rsquo;t for its own sake — it\u0026rsquo;s the difference between an article that confirms your intuition and one that actually changes what you configure next.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"trade-offs-over-single-right-answers\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#trade-offs-over-single-right-answers\" title=\"Trade-offs Over Single Right Answers\"\u003eTrade-offs Over Single Right Answers\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eWhat production experience doesn\u0026rsquo;t mean is that every approach here is the only valid one. 
Large-scale AKS operation involves genuine trade-offs — between cost and resilience, between operational simplicity and flexibility, between standardization and workload-specific tuning. The articles explain the reasoning behind recommendations rather than just stating them, because the reasoning is what lets you adapt the approach to your constraints. A node pool design that works for a batch processing workload is wrong for a latency-sensitive API, and the article on cost governance explains why rather than presenting a single correct answer.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"when-aks-makes-things-harder\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#when-aks-makes-things-harder\" title=\"When AKS Makes Things Harder\"\u003eWhen AKS Makes Things Harder\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eThe articles were not written to showcase features or to demonstrate that AKS has a solution for every problem. Some of them document problems that AKS makes harder than it should be, and say so directly. If a particular architectural pattern has a known failure mode at scale, that failure mode appears in the article rather than in a footnote or an FAQ three pages into the documentation. If a feature has a meaningful limitation that affects how you should configure it, that limitation is in the main text, not in a callout box labeled \u0026ldquo;note.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThe goal is for these articles to be the thing you read before a production incident rather than the thing you find during one.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"where-to-start\"\u003e\u003ca href=\"/posts/aks-architecture-operations/#where-to-start\" title=\"Where to Start\"\u003eWhere to Start\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eRead in published order if you\u0026rsquo;re building out AKS infrastructure from scratch — identity and storage are foundational, and later articles reference earlier concepts. 
Jump to specific articles if you\u0026rsquo;re dealing with an immediate operational problem: the titles are specific enough that the right article for your situation should be obvious.\u003c/p\u003e\n\u003cp\u003eThe scale article at the end is worth reading early if your cluster is already growing or if you\u0026rsquo;re designing for growth — some architectural decisions made at 50 nodes are expensive to reverse at 500.\u003c/p\u003e\n","date_modified":"2026-05-25T23:41:10+02:00","date_published":"2026-01-21T17:00:00+01:00","id":"https://daily-devops.net/posts/aks-architecture-operations/","language":"en","summary":"Nine articles on production AKS—identity, storage, multi-cluster networking, cost governance, DR, and running 1000-node clusters in practice.","tags":["kubernetes","azure","cloud","devops","operations","platform-engineering"],"title":"AKS Architecture \u0026 Operations — The Complete Series","url":"https://daily-devops.net/posts/aks-architecture-operations/"},{"authors":[{"name":"Martin Stühmer","url":"https://daily-devops.net/authors/martin/"}],"content_html":"\u003cp\u003eKubernetes has transitioned from a technical option to an assumed default. In organizations and projects I\u0026rsquo;ve worked with, discussions no longer start with whether Kubernetes is appropriate. They start with migration timelines. I\u0026rsquo;ve sat through planning sessions where the question wasn\u0026rsquo;t \u0026ldquo;Should we use Kubernetes?\u0026rdquo; but rather \u0026ldquo;When can we have everything moved over?\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThis shift isn\u0026rsquo;t driven by application requirements. It\u0026rsquo;s driven by narrative. Consulting decks and reference architectures present \u003cem\u003e\u003cstrong\u003eKubernetes as a universal platform\u003c/strong\u003e\u003c/em\u003e that absorbs governance, security, scalability, observability, recovery, and operational responsibility. 
The implicit promise: once your software runs on Kubernetes, the hard parts are handled. I\u0026rsquo;ve watched teams adopt this belief wholesale, only to discover the gaps six months into production.\u003c/p\u003e\n\u003cp\u003eThat promise is incomplete. Kubernetes primarily addresses \u003cstrong\u003eone phase\u003c/strong\u003e: runtime orchestration. Most architectural risk, cost overruns, and operational failures occur \u003cstrong\u003ebefore\u003c/strong\u003e runtime during design and delivery, or \u003cstrong\u003eafter\u003c/strong\u003e runtime when incidents happen and systems evolve. I\u0026rsquo;ve debugged production incidents where Kubernetes ran flawlessly while the system failed spectacularly because architectural problems existed upstream and downstream of container orchestration.\u003c/p\u003e\n\u003cp\u003eTreating Kubernetes as a lifecycle platform rather than a runtime component introduces complexity that stays invisible during planning and becomes unavoidable in production. The demos look clean. The reference architectures are elegant. Then you hit reality.\u003c/p\u003e\n\u003cp\u003eTwo questions matter: Not whether Kubernetes works (it does, consistently, in its domain), but where its responsibility ends and whether your organization can handle what lies beyond those boundaries.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"kubernetes-in-the-net-reality\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#kubernetes-in-the-net-reality\" title=\"Kubernetes in the .NET Reality\"\u003eKubernetes in the .NET Reality\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes clusters rarely host a single, clean workload type in practice. They become convergence points: ASP.NET Core APIs, background workers, event-driven processors, migrated Windows Services, and platform components all sharing infrastructure. 
I\u0026rsquo;ve inherited clusters running everything from modern microservices to decade-old .NET Framework services wrapped in Windows containers, all competing for the same resources.\u003c/p\u003e\n\u003cp\u003eFor stateless, Linux-based ASP.NET Core services, Kubernetes is genuinely strong. Deployments are predictable. Rollouts are controlled. Health checks integrate cleanly. You implement a simple health endpoint:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-csharp\" data-lang=\"csharp\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kt\"\u003evar\u003c/span\u003e \u003cspan class=\"n\"\u003ebuilder\u003c/span\u003e \u003cspan class=\"p\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eWebApplication\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eCreateBuilder\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eargs\u003c/span\u003e\u003cspan class=\"p\"\u003e);\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003ebuilder\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eServices\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eAddHealthChecks\u003c/span\u003e\u003cspan class=\"p\"\u003e();\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kt\"\u003evar\u003c/span\u003e \u003cspan class=\"n\"\u003eapp\u003c/span\u003e \u003cspan class=\"p\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003ebuilder\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eBuild\u003c/span\u003e\u003cspan class=\"p\"\u003e();\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003eapp\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eMapHealthChecks\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s\"\u003e\u0026#34;/health\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e);\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003eapp\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eRun\u003c/span\u003e\u003cspan class=\"p\"\u003e();\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThen you deploy 3 replicas and Kubernetes does what you asked: it keeps exactly 3 running, rolling out updates without downtime, removing failed pods from traffic automatically. You push a new image and watch the update complete—no manual intervention, no traffic loss, no coordination overhead.\u003c/p\u003e\n\u003cp\u003eThis is where Kubernetes works exactly as intended: the application exposes its state honestly, and the platform responds intelligently. Three replicas means three replicas, constantly. A pod fails, it gets replaced within seconds. A rolling update happens seamlessly because Kubernetes orchestrates the transition and the application cooperates through its health endpoint. The first time you watch this happen without manually managing anything, it feels like magic.\u003c/p\u003e\n\u003cp\u003eThis experience—predictable, reliable, hands-off—becomes the template in your mind for how Kubernetes should work everywhere.\u003c/p\u003e\n\u003cp\u003eThe mistake begins when this success gets generalized. I\u0026rsquo;ve seen this pattern repeatedly: success with stateless APIs leads to confidence that everything belongs in Kubernetes. 
Then the complexity arrives.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"governance-structure-without-enforcement\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#governance-structure-without-enforcement\" title=\"Governance: Structure Without Enforcement\"\u003eGovernance: Structure Without Enforcement\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes offers namespaces, labels, and RBAC. These are primitives, not governance. Real enterprise governance requires enforceable policy, auditability, cost attribution, and environmental separation. In Azure-centric environments, these concerns traditionally live at the subscription, management group, and Azure Policy layer, where they\u0026rsquo;re auditable, mandatory, and enforced at the platform level.\u003c/p\u003e\n\u003cp\u003eIntroducing Kubernetes adds a second governance plane. Without deliberate policy enforcement, clusters drift. I\u0026rsquo;ve seen production and experimental workloads coexist in the same cluster because namespace isolation felt sufficient. It wasn\u0026rsquo;t. Cost attribution becomes opaque. Who actually paid for that node pool? Which business unit owns this? When incidents happen, these questions waste critical time.\u003c/p\u003e\n\u003cp\u003eIn one organization, we discovered experimental ML workloads running on production infrastructure because someone had \u003ccode\u003ekubectl\u003c/code\u003e access and \u0026ldquo;just needed to test something quickly.\u0026rdquo; The namespace separation existed. The policy enforcement didn\u0026rsquo;t.\u003c/p\u003e\n\u003cp\u003eKubernetes doesn\u0026rsquo;t prevent this drift. 
It accelerates it by making deployment so frictionless that governance becomes an afterthought.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"identity-kubernetes-stops-where-entra-id-starts\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#identity-kubernetes-stops-where-entra-id-starts\" title=\"Identity: Kubernetes Stops Where Entra ID Starts\"\u003eIdentity: Kubernetes Stops Where Entra ID Starts\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003e.NET applications rely on Entra ID (formerly Azure AD) for authentication, authorization, managed identities, and conditional access. Kubernetes has no native concept of enterprise identity. It doesn\u0026rsquo;t integrate with Entra ID\u0026rsquo;s policy layer, conditional access rules, or compliance tracking. This isn\u0026rsquo;t a limitation; it\u0026rsquo;s architectural reality.\u003c/p\u003e\n\u003cp\u003eKubernetes RBAC governs access to cluster resources: who can deploy pods, create services, read secrets. But application identity—the identity your code runs under, the services it authenticates to, the permissions it holds—that\u0026rsquo;s entirely separate. Kubernetes facilitates the technical handshake (workload identity token exchange), but the authority making identity decisions lives outside the cluster in Entra ID. Your application integrates with Entra ID directly, not through Kubernetes.\u003c/p\u003e\n\u003cp\u003eThis boundary is invisible until you\u0026rsquo;re three months into production and security asks about conditional access policies, device compliance rules, or audit trails. Kubernetes doesn\u0026rsquo;t track any of that. It can\u0026rsquo;t. The identity system is external, and Kubernetes merely provides the plumbing to connect to it.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;ve worked with teams who expected Kubernetes to handle enterprise identity because it handled everything else. It doesn\u0026rsquo;t. 
That realization typically arrives when security reviews surface the integration gaps.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"networking-where-kubernetes-abstraction-fails-first\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#networking-where-kubernetes-abstraction-fails-first\" title=\"Networking: Where Kubernetes Abstraction Fails First\"\u003eNetworking: Where Kubernetes Abstraction Fails First\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eNetworking is where Kubernetes myths collapse fastest. I\u0026rsquo;ve seen the most preventable production incidents here. Kubernetes introduces its own networking model, but it doesn\u0026rsquo;t replace enterprise networking. It operates \u003cstrong\u003einside\u003c/strong\u003e it. This distinction matters when things go wrong.\u003c/p\u003e\n\u003cp\u003eIn Azure-based architectures, your first line of defense exists outside the cluster:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eVirtual networks and subnet isolation\u003c/li\u003e\n\u003cli\u003eUser-defined routing (UDR)\u003c/li\u003e\n\u003cli\u003eAzure Firewall or Network Virtual Appliance (NVA)\u003c/li\u003e\n\u003cli\u003eApplication Gateway or Front Door with Web Application Firewall (WAF)\u003c/li\u003e\n\u003cli\u003ePrivate endpoints and service endpoints\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eIngress controllers route traffic. They don\u0026rsquo;t defend the network. They\u0026rsquo;re application-layer components running inside pods, not hardened network appliances.\u003c/p\u003e\n\u003cp\u003eTreating Kubernetes ingress as your security perimeter shifts responsibility from hardened network controls to application-level components that were never designed to absorb hostile traffic at scale. 
I\u0026rsquo;ve seen this assumption lead to security incidents where attackers bypassed ingress controllers by targeting services directly once they gained cluster access.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"azure-cni-and-ip-exhaustion\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#azure-cni-and-ip-exhaustion\" title=\"Azure CNI and IP Exhaustion\"\u003eAzure CNI and IP Exhaustion\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eWith Azure CNI, every pod consumes a real IP address from your virtual network subnet. Scaling pods means scaling IP consumption linearly. Poor subnet sizing surfaces late—usually in production when teams suddenly can\u0026rsquo;t scale further and the error message is cryptic. Kubernetes schedules pods until the network says no, then fails silently.\u003c/p\u003e\n\u003cp\u003eThis isn\u0026rsquo;t a Kubernetes failure. It\u0026rsquo;s a networking responsibility that Kubernetes exposes. I\u0026rsquo;ve debugged this scenario more times than I\u0026rsquo;d like to admit, always with the same root cause: network planning happened before anyone calculated peak pod counts under load.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"east-west-traffic-and-lateral-movement\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#east-west-traffic-and-lateral-movement\" title=\"East-West Traffic and Lateral Movement\"\u003eEast-West Traffic and Lateral Movement\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eKubernetes networking is flat by default. Every pod can reach every other pod within the cluster. Network policies are optional and frequently incomplete. In organizations without dedicated platform teams, they\u0026rsquo;re often absent entirely.\u003c/p\u003e\n\u003cp\u003eFor multi-service .NET systems, this makes lateral movement trivial once any single pod is compromised. An attacker who gains access to a frontend pod can immediately probe backend services, database connections, and internal APIs. 
Kubernetes provides the mechanism (network policies) but doesn\u0026rsquo;t enforce discipline. I worked on an incident response where a compromised pod accessed 12 different internal services before we detected it. Network policies existed in the repository. They weren\u0026rsquo;t applied.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"egress-control\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#egress-control\" title=\"Egress Control\"\u003eEgress Control\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eIngress gets constant attention: WAF rules, TLS certificates, rate limiting. Egress almost never does. By default, all pods can reach the internet: any destination, any port. In regulated environments, that\u0026rsquo;s unacceptable. Egress control requires forced routing through Azure Firewall and explicit allow-listing of destinations.\u003c/p\u003e\n\u003cp\u003eKubernetes has no native concept of allowed destinations. You build this external to the cluster, then spend weeks troubleshooting why perfectly valid application calls fail because someone forgot to allow-list a critical API endpoint.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"security-responsibility-is-concentrated-not-removed\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#security-responsibility-is-concentrated-not-removed\" title=\"Security: Responsibility Is Concentrated, Not Removed\"\u003eSecurity: Responsibility Is Concentrated, Not Removed\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes provides security mechanisms. Almost none are enabled by default. 
A .NET application on Azure App Service benefits from opinionated defaults: automatic image scanning, encrypted secrets, preconfigured network isolation, integrated runtime monitoring.\u003c/p\u003e\n\u003cp\u003eIn Kubernetes, every guarantee requires deliberate recreation:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eImage provenance through admission controllers and policy enforcement\u003c/li\u003e\n\u003cli\u003eSecret handling through external secret stores (Azure Key Vault integration)\u003c/li\u003e\n\u003cli\u003eNetwork segmentation through network policies and firewall rules\u003c/li\u003e\n\u003cli\u003eRuntime monitoring through service mesh sidecars or host-level agents\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eEach added controller or sidecar increases capability and attack surface simultaneously. I\u0026rsquo;ve reviewed Kubernetes configurations where security controls outnumbered application pods. The cluster became a security platform that happened to run some software.\u003c/p\u003e\n\u003cp\u003eKubernetes doesn\u0026rsquo;t reduce security effort. It concentrates it into your platform team, assuming you have one.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"cicd-and-supply-chain-kubernetes-consumes-trust\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#cicd-and-supply-chain-kubernetes-consumes-trust\" title=\"CI/CD and Supply Chain: Kubernetes Consumes Trust\"\u003eCI/CD and Supply Chain: Kubernetes Consumes Trust\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes consumes artifacts. It doesn\u0026rsquo;t produce trust. CI pipelines, artifact promotion, image immutability, and signing decisions all happen long before Kubernetes schedules a pod. A broken supply chain can\u0026rsquo;t be repaired at runtime. If a malicious image makes it to your registry, Kubernetes will happily deploy it.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;ve worked with a team who discovered their CI pipeline had been compromised for three weeks. 
Kubernetes deployed every malicious image perfectly—on schedule, with zero-downtime rolling updates. The orchestration worked flawlessly. The supply chain didn\u0026rsquo;t. Kubernetes enforces desired state but doesn\u0026rsquo;t validate how that state was produced. That validation is your responsibility in your build pipelines, artifact registries, and admission controllers.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"observability-infrastructure-metrics-are-not-insight\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#observability-infrastructure-metrics-are-not-insight\" title=\"Observability: Infrastructure Metrics Are Not Insight\"\u003eObservability: Infrastructure Metrics Are Not Insight\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes emits metrics and logs: CPU usage per pod, memory consumption, network I/O. These describe platform health, not system behavior. .NET systems require application-level observability—distributed tracing across service boundaries, dependency tracking to external systems, structured logging with correlation IDs.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-csharp\" data-lang=\"csharp\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003ebuilder\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eServices\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eAddOpenTelemetry\u003c/span\u003e\u003cspan class=\"p\"\u003e()\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eWithTracing\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003et\u003c/span\u003e \u003cspan class=\"p\"\u003e=\u0026gt;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"n\"\u003et\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eAddAspNetCoreInstrumentation\u003c/span\u003e\u003cspan class=\"p\"\u003e()\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e         \u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eAddHttpClientInstrumentation\u003c/span\u003e\u003cspan class=\"p\"\u003e());\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eWithout integration into Azure Monitor and Application Insights, incidents become reconstruction exercises. I\u0026rsquo;ve sat in war rooms where Kubernetes dashboards stayed green—all pods healthy, all nodes operational—while users experienced cascading timeouts. Pod restarts hide underlying failures instead of surfacing them. A pod that crashes and restarts every 30 seconds looks \u0026ldquo;healthy\u0026rdquo; to Kubernetes if it passes health checks between crashes.\u003c/p\u003e\n\u003cp\u003eObservability requires design. You bring it, or you debug blind.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"scalability-kubernetes-scales-pods-not-systems\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#scalability-kubernetes-scales-pods-not-systems\" title=\"Scalability: Kubernetes Scales Pods, Not Systems\"\u003eScalability: Kubernetes Scales Pods, Not Systems\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes scales replicas, not architectures. Database contention, synchronous dependencies, external API limits—they all remain regardless of how many pod copies you create. Kubernetes can amplify bottlenecks just as effectively as it amplifies capacity.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;ve watched auto-scaling create 50 pod replicas, all waiting for the same database connection pool that maxed out at 100 connections. 
More pods didn\u0026rsquo;t solve the problem—they made it worse by consuming resources while waiting.\u003c/p\u003e\n\u003cp\u003eEvent-driven scaling improves this, but only with architectural redesign. Kubernetes enables the \u003cstrong\u003emechanism\u003c/strong\u003e for elasticity—you can scale replicas based on external signals. But the architecture determines whether that mechanism translates into actual scalability. Scaling 50 pods won\u0026rsquo;t help if they\u0026rsquo;re all waiting on the same bottleneck. That\u0026rsquo;s a design problem, not an orchestration problem.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"backup-and-recovery-kubernetes-stops-completely\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#backup-and-recovery-kubernetes-stops-completely\" title=\"Backup and Recovery: Kubernetes Stops Completely\"\u003eBackup and Recovery: Kubernetes Stops Completely\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes restarts containers. It doesn\u0026rsquo;t restore systems. State lives outside the cluster in databases, message queues, caches, and storage accounts. Backup and recovery remain responsibilities of data platforms and operational processes. Kubernetes has no concept of business continuity or disaster recovery beyond \u0026ldquo;restart the pod.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eHigh availability masks failure. It doesn\u0026rsquo;t undo it. A corrupted database doesn\u0026rsquo;t care how many pod replicas exist or how fast Kubernetes can reschedule them. 
I\u0026rsquo;ve responded to incidents where Kubernetes performed perfectly—immediate failover, health-driven routing—while the underlying data corruption spread across all replicas.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"windows-containers-on-kubernetes-a-strong-architectural-smell\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#windows-containers-on-kubernetes-a-strong-architectural-smell\" title=\"Windows Containers on Kubernetes: A Strong Architectural Smell\"\u003eWindows Containers on Kubernetes: A Strong Architectural Smell\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eWindows containers are supported but introduce slower startup times (minutes versus seconds), limited ecosystem support, and operational asymmetry—separate node pools, different update cadence, higher costs. They\u0026rsquo;re frequently used to avoid refactoring legacy workloads, turning Kubernetes into a compatibility layer rather than a platform.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;ve seen .NET Framework applications from 2010 wrapped in Windows containers and deployed to Kubernetes because \u0026ldquo;we\u0026rsquo;re moving to cloud-native.\u0026rdquo; The workload hadn\u0026rsquo;t changed. The infrastructure complexity increased dramatically. They function, they complicate operations, and they rarely age well.\u003c/p\u003e\n\u003cp\u003eEvery Windows container deployment I\u0026rsquo;ve reviewed eventually became a maintenance burden. The startup time alone makes scaling problematic. Windows licensing costs amplify infrastructure expenses. 
And the operational split between Linux and Windows node pools fragments your platform team\u0026rsquo;s expertise.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"cost-and-organizational-economics\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#cost-and-organizational-economics\" title=\"Cost and Organizational Economics\"\u003eCost and Organizational Economics\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes isn\u0026rsquo;t cost-neutral—a realization that typically arrives 3-6 months after initial deployment when finance asks why cloud costs doubled. It shifts cost visibility from infrastructure to organization: platform teams grow from 2 to 8 people, node pools sit idle waiting for burst capacity that happens twice a month, Windows nodes amplify costs through licensing and compute, observability instrumentation adds runtime overhead and egress costs.\u003c/p\u003e\n\u003cp\u003eTechnical efficiency—improved resource utilization through bin-packing and scheduling—often comes at \u003cstrong\u003eorganizational expense\u003c/strong\u003e: larger platform teams, slower iteration velocity (every change needs cluster-wide validation), distributed debugging complexity (which of the 15 services in the trace actually caused the timeout?).\u003c/p\u003e\n\u003cp\u003eThe calculation isn\u0026rsquo;t universal. It depends on workload mix, team structure, organizational tolerance for operational complexity. For companies running 200+ microservices with dedicated SRE teams, Kubernetes pays dividends. 
For companies running 8 services with 3 developers, it\u0026rsquo;s often overhead.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"conclusion-kubernetes-concentrates-architectural-responsibility\"\u003e\u003ca href=\"/posts/kubernetes-not-platform-strategy/#conclusion-kubernetes-concentrates-architectural-responsibility\" title=\"Conclusion: Kubernetes Concentrates Architectural Responsibility\"\u003eConclusion: Kubernetes Concentrates Architectural Responsibility\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubernetes is powerful and, in specific scenarios, the right choice: stateless Linux-based APIs with clean 12-factor design, event-driven background workers that scale horizontally, organizations with dedicated platform teams who can absorb operational complexity, and standardized workload portfolios where 80%+ of applications fit predictable patterns.\u003c/p\u003e\n\u003cp\u003eOutside these boundaries, Kubernetes doesn\u0026rsquo;t remove responsibility. It concentrates it. The responsibilities I\u0026rsquo;ve outlined (governance, identity, networking, security, observability, backup) don\u0026rsquo;t disappear. They become explicit architectural decisions that someone on your team must own, implement, and maintain.\u003c/p\u003e\n\u003cp\u003eKubernetes is not governance. That lives at the subscription, policy, and organizational level. It\u0026rsquo;s not identity. That authority is Entra ID. It\u0026rsquo;s not the security perimeter. That\u0026rsquo;s the network, the firewall, and the defense-in-depth controls you build around the cluster. It\u0026rsquo;s not backup and recovery. That responsibility belongs to data platforms and business continuity planning. It\u0026rsquo;s not observability. 
That\u0026rsquo;s an application design concern requiring deliberate instrumentation.\u003c/p\u003e\n\u003cp\u003eKubernetes orchestrates workloads, and it does this extremely well.\u003c/p\u003e\n\u003cp\u003eFrom an architect\u0026rsquo;s perspective—someone who has designed, deployed, and maintained these systems in production—Kubernetes can be the most visible component of a hosting solution but never the \u003cstrong\u003ewhole\u003c/strong\u003e solution. The promise that it absorbs the software lifecycle is marketing, not engineering reality.\u003c/p\u003e\n\u003cp\u003eThat distinction isn\u0026rsquo;t theoretical. It\u0026rsquo;s operational reality I\u0026rsquo;ve experienced across multiple organizations, multiple industries, multiple failure modes.\u003c/p\u003e\n\u003cp\u003eThe question isn\u0026rsquo;t whether Kubernetes works—it does, consistently, predictably, within its domain. The question is whether your organization can handle everything Kubernetes \u003cstrong\u003edoesn\u0026rsquo;t\u003c/strong\u003e do, and whether the complexity trade-off makes sense for your specific context, team capability, and workload characteristics.\u003c/p\u003e\n\u003cp\u003eAnswer that question honestly before committing your platform strategy.\u003c/p\u003e\n","date_modified":"2026-05-26T10:22:03+02:00","date_published":"2026-01-13T17:00:00+01:00","id":"https://daily-devops.net/posts/kubernetes-not-platform-strategy/","language":"en","summary":"Kubernetes orchestrates containers brilliantly. 
But governance, identity, and recovery live elsewhere—and ignoring those boundaries breaks production.\n","tags":["kubernetes","architecture","platform-engineering","dotnet","cloudnative"],"title":"Kubernetes Is Not a Platform Strategy\n","url":"https://daily-devops.net/posts/kubernetes-not-platform-strategy/"},{"authors":[{"name":"Jendrik Brack","url":"https://daily-devops.net/authors/jendrik/"}],"content_html":"\u003cp\u003eNetwork segmentation is a fundamental security control for modern Kubernetes environments. AKS supports multiple networking models such as kubenet, Azure CNI, and overlay CNIs. The networking model matters, but the decisive factor for enforcing isolation and compliance is the consistent application of network policies.\u003c/p\u003e\n\u003cp\u003eThis article describes how network policies work in AKS, the available engines, practical examples, and recommended practices for enforcing a zero-trust posture within a cluster.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"why-network-policies-matter\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#why-network-policies-matter\" title=\"Why network policies matter\"\u003eWhy network policies matter\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eBy default, Kubernetes allows all pod-to-pod communication, which simplifies operations but increases risk. Without network policies, an attacker or a compromised workload can move laterally, access internal services, exfiltrate data, or generate unintended traffic. Network policies let you express explicit allow rules, reducing the cluster attack surface and supporting compliance requirements.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"aks-network-policy-engines\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#aks-network-policy-engines\" title=\"AKS network policy engines\"\u003eAKS network policy engines\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eAKS offers two commonly used network policy implementations. 
Choose between them based on feature needs and operational constraints.\u003c/p\u003e\n\u003cp\u003eAKS also supports Cilium as a network policy and dataplane option. Evaluate Cilium if you require advanced eBPF-based dataplane features (see Microsoft Docs).\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"azure-network-policies\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#azure-network-policies\" title=\"Azure Network Policies\"\u003eAzure Network Policies\u003c/a\u003e\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eNative AKS integration.\u003c/li\u003e\n\u003cli\u003eRequires Azure CNI (see Microsoft Docs: Use network policies in AKS).\u003c/li\u003e\n\u003cli\u003eHigh performance and deep integration with Azure networking.\u003c/li\u003e\n\u003cli\u003ePolicies are enforced by Azure Network Policy Manager (NPM).\u003c/li\u003e\n\u003cli\u003eBest suited for organizations that prefer a managed, Azure-native solution.\u003c/li\u003e\n\u003c/ul\u003e\n\n\n\n\n\u003ch3 id=\"calico-network-policies\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#calico-network-policies\" title=\"Calico Network Policies\"\u003eCalico Network Policies\u003c/a\u003e\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eOpen-source and widely adopted.\u003c/li\u003e\n\u003cli\u003eSupports advanced features such as egress controls and global policies.\u003c/li\u003e\n\u003cli\u003eWorks with Azure CNI and kubenet (see Microsoft Docs: Use network policies in AKS).\u003c/li\u003e\n\u003cli\u003eSuitable for complex architectures, multi-cloud deployments, or teams that need granular L3/L4 controls.\u003c/li\u003e\n\u003c/ul\u003e\n\n\n\n\n\u003ch2 id=\"how-network-policies-work\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#how-network-policies-work\" title=\"How network policies work\"\u003eHow network policies work\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eNetwork policies declare allowed traffic in terms of pod selectors, namespace 
selectors, ports, and protocols. A policy can specify ingress rules, egress rules, or both. Importantly, once any policy selects a pod, traffic in the direction that policy covers (ingress, egress, or both) is denied unless a rule explicitly allows it. That default-deny behavior is the basis for predictable and auditable isolation.\u003c/p\u003e\n\u003cp\u003eNote: Network policy is commonly set at cluster creation (for example: \u003ccode\u003eaz aks create --network-plugin azure --network-policy azure\u003c/code\u003e). You can enable or change the network policy engine on an existing cluster (for example: \u003ccode\u003eaz aks update --resource-group myRG --name myAKSCluster --network-policy calico\u003c/code\u003e). However, changing the network policy engine can trigger node-pool reimaging and temporary disruption.\u003c/p\u003e\n\u003cp\u003ePractical maintenance steps when changing the network policy engine:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eTest the change in a staging cluster first. Example create command for a disposable test cluster:\u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-bash\" data-lang=\"bash\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003eaz aks create -g myRG -n test-cluster --network-plugin azure --network-policy calico --node-count \u003cspan class=\"m\"\u003e1\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cul\u003e\n\u003cli\u003eWhen rolling changes through production, update one node pool at a time and verify workloads before proceeding.\u003c/li\u003e\n\u003cli\u003eBefore making changes, cordon and drain affected nodes to allow graceful eviction:\u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-bash\" data-lang=\"bash\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003ekubectl cordon 
\u0026lt;node-name\u0026gt;\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003ekubectl drain \u0026lt;node-name\u0026gt; --ignore-daemonsets --delete-emptydir-data\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cul\u003e\n\u003cli\u003eAfter the update, validate workloads and then uncordon nodes: \u003ccode\u003ekubectl uncordon \u0026lt;node-name\u0026gt;\u003c/code\u003e.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003ePlan a maintenance window for these operations and automate the rollback or node-pool recreation path if validation fails.\u003c/p\u003e\n\u003cp\u003eNote: Kubernetes NetworkPolicy is an L3/L4 mechanism. It controls IP- and port-level access between pods and namespaces. For L7 (HTTP/FQDN) filtering you need an engine that explicitly supports L7 policies (for example, Cilium\u0026rsquo;s L7 features) or a service-mesh / proxy-based approach.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"practical-example-allow-only-specific-traffic\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#practical-example-allow-only-specific-traffic\" title=\"Practical example: Allow only specific traffic\"\u003ePractical example: Allow only specific traffic\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eThis policy allows only requests from pods labeled role=app to pods labeled role=backend on TCP port 8080 in the production namespace.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-yml\" data-lang=\"yml\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eapiVersion\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003enetworking.k8s.io/v1\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ekind\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eNetworkPolicy\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003emetadata\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eallow-app-to-backend\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003enamespace\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eproduction\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003espec\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003epodSelector\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan 
class=\"nt\"\u003ematchLabels\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"nt\"\u003erole\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003ebackend\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003eingress\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e- \u003cspan class=\"nt\"\u003efrom\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e- \u003cspan class=\"nt\"\u003epodSelector\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e            \u003c/span\u003e\u003cspan class=\"nt\"\u003ematchLabels\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e              \u003c/span\u003e\u003cspan class=\"nt\"\u003erole\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e 
\u003c/span\u003e\u003cspan class=\"l\"\u003eapp\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"nt\"\u003eports\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e- \u003cspan class=\"nt\"\u003eprotocol\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eTCP\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          \u003c/span\u003e\u003cspan class=\"nt\"\u003eport\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"m\"\u003e8080\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eWithout other allow rules, all other traffic to the selected backend pods will be blocked. 
This approach supports a least-privilege model for intra-cluster communication.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"how-to-validate-policies\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#how-to-validate-policies\" title=\"How to validate policies\"\u003eHow to validate policies\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eQuick validation steps you can run in a test cluster:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eCreate a small test cluster with Calico enabled:\u003c/li\u003e\n\u003c/ol\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-bash\" data-lang=\"bash\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003eaz aks create -g myRG -n test-calico --network-plugin azure --network-policy calico --node-count \u003cspan class=\"m\"\u003e1\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003col start=\"2\"\u003e\n\u003cli\u003eDeploy two lightweight pods and verify connectivity:\u003c/li\u003e\n\u003c/ol\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-bash\" data-lang=\"bash\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003ekubectl run client --image\u003cspan class=\"o\"\u003e=\u003c/span\u003ebusybox --restart\u003cspan class=\"o\"\u003e=\u003c/span\u003eNever -- sleep \u003cspan class=\"m\"\u003e3600\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# run busybox httpd in the server pod so that something listens on port 8080\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003ekubectl run server --image\u003cspan class=\"o\"\u003e=\u003c/span\u003ebusybox --restart\u003cspan class=\"o\"\u003e=\u003c/span\u003eNever -- httpd -f -p \u003cspan class=\"m\"\u003e8080\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003ekubectl get pods -o wide\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003ekubectl \u003cspan 
class=\"nb\"\u003eexec\u003c/span\u003e -it client -- /bin/sh\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# from inside the client pod try to reach the server pod IP (replace \u0026lt;server-pod-ip\u0026gt;):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003enc -zv \u0026lt;server-pod-ip\u0026gt; \u003cspan class=\"m\"\u003e8080\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003col\u003e\n\u003cli\u003eApply your NetworkPolicy and repeat the test. Use \u003ccode\u003ekubectl describe networkpolicy \u0026lt;name\u0026gt;\u003c/code\u003e to inspect selectors and rules.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThese steps are intended for validation only. Do not run them against production clusters.\u003c/p\u003e\n\u003cp\u003eCI validation snippet (example):\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-bash\" data-lang=\"bash\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# apply policy and run quick connectivity check\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003ekubectl apply -f mypolicy.yaml\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003ekubectl run client --image\u003cspan class=\"o\"\u003e=\u003c/span\u003ebusybox --restart\u003cspan class=\"o\"\u003e=\u003c/span\u003eNever -- sleep \u003cspan class=\"m\"\u003e3600\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003ekubectl run server --image\u003cspan class=\"o\"\u003e=\u003c/span\u003ebusybox --restart\u003cspan class=\"o\"\u003e=\u003c/span\u003eNever -- sleep \u003cspan 
class=\"m\"\u003e3600\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nv\"\u003eSERVER_IP\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"k\"\u003e$(\u003c/span\u003ekubectl get pod -l \u003cspan class=\"nv\"\u003erun\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003eserver -o \u003cspan class=\"nv\"\u003ejsonpath\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s1\"\u003e\u0026#39;{.items[0].status.podIP}\u0026#39;\u003c/span\u003e\u003cspan class=\"k\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003ekubectl \u003cspan class=\"nb\"\u003eexec\u003c/span\u003e client -- nc -zv \u003cspan class=\"nv\"\u003e$SERVER_IP\u003c/span\u003e \u003cspan class=\"m\"\u003e8080\u003c/span\u003e \u003cspan class=\"o\"\u003e||\u003c/span\u003e \u003cspan class=\"nb\"\u003eexit\u003c/span\u003e \u003cspan class=\"m\"\u003e1\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eCI security guidance:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003ePrefer ephemeral test clusters created by the pipeline and destroyed after the run. If that is not possible, create a Kubernetes ServiceAccount with minimal RBAC instead of storing a full-cluster admin \u003ccode\u003eKUBECONFIG\u003c/code\u003e in secrets.\u003c/li\u003e\n\u003cli\u003eUse a least-privilege service principal or OIDC-based login for Azure authentication and scope credentials to the smallest resource group or cluster role necessary. 
Avoid exposing long-lived admin credentials in CI secrets.\u003c/li\u003e\n\u003c/ul\u003e\n\n\n\n\n\u003ch2 id=\"namespace-isolation\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#namespace-isolation\" title=\"Namespace isolation\"\u003eNamespace isolation\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eNamespaces help organize workloads but do not enforce network isolation by themselves. Apply a policy that denies ingress to all pods unless explicitly allowed to implement namespace-level segmentation.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-yml\" data-lang=\"yml\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eapiVersion\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003enetworking.k8s.io/v1\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ekind\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eNetworkPolicy\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003emetadata\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003edeny-cross-namespace\u003c/span\u003e\u003cspan 
class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003espec\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003epodSelector\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e{}\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003eingress\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e[]\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\n\n\n\u003ch2 id=\"egress-control\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#egress-control\" title=\"Egress control\"\u003eEgress control\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eOutbound traffic is often overlooked, yet many compromises involve unfiltered egress. Use egress policies to permit only required external destinations. 
Example: allow DNS to a specific resolver. Because this policy selects every pod in the namespace and declares \u003ccode\u003eEgress\u003c/code\u003e, all other outbound traffic from those pods, including lookups against the cluster DNS service, is denied until you add further allow rules.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-yml\" data-lang=\"yml\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eapiVersion\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003enetworking.k8s.io/v1\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ekind\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eNetworkPolicy\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003emetadata\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eallow-egress-dns\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003espec\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003epodSelector\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e 
\u003c/span\u003e{}\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003epolicyTypes\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e- \u003cspan class=\"l\"\u003eEgress\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan class=\"nt\"\u003eegress\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e- \u003cspan class=\"nt\"\u003eto\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e- \u003cspan class=\"nt\"\u003eipBlock\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e            \u003c/span\u003e\u003cspan class=\"nt\"\u003ecidr\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"m\"\u003e8.8.8.8\u003c/span\u003e\u003cspan class=\"l\"\u003e/32\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e\u003cspan class=\"nt\"\u003eports\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e- \u003cspan class=\"nt\"\u003eprotocol\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eUDP\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          \u003c/span\u003e\u003cspan class=\"nt\"\u003eport\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"m\"\u003e53\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\n\n\n\u003ch2 id=\"choosing-the-right-engine\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#choosing-the-right-engine\" title=\"Choosing the right engine\"\u003eChoosing the right engine\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eFeature comparison at a glance:\u003c/p\u003e\n\u003ctable class=\"striped\"\u003e\n\t\u003cthead\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003cth\u003eFeature\u003c/th\u003e\n\t\t\t\t\t\u003cth style=\"text-align: right\"\u003eAzure Network Policies\u003c/th\u003e\n\t\t\t\t\t\u003cth style=\"text-align: right\"\u003eCalico\u003c/th\u003e\n\t\t\t\t\t\u003cth style=\"text-align: right\"\u003eCilium\u003c/th\u003e\n\t\t\t\u003c/tr\u003e\n\t\u003c/thead\u003e\n\t\u003ctbody\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003eAKS integration\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eVery good\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: 
right\"\u003eGood\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eGood\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003ePerformance\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eHigh\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eHigh\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eHigh\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003eComplexity\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eLow\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eMedium\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eMedium\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003eAdvanced egress\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eNo\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eYes\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eYes\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003eGlobal policies\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eNo\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eYes\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eYes\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003eMulti-cloud support\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eNo\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eYes\u003c/td\u003e\n\t\t\t\t\t\u003ctd style=\"text-align: right\"\u003eYes\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eNote on Cilium: Cilium provides an eBPF-based dataplane and supports advanced L7 features and cluster/global policy CRDs. 
Many of Cilium\u0026rsquo;s advanced capabilities rely on Linux eBPF support; feature parity on Windows nodes is limited. Check the AKS Cilium and Cilium docs for supported scenarios and any AKS-specific integration steps.\u003c/p\u003e\n\u003cp\u003eRecommendation: use Azure Network Policies if you need a managed Azure-native solution and do not require advanced Calico features. Choose Calico if you need advanced egress controls, global policies, or multi-cloud consistency.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"best-practices\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#best-practices\" title=\"Best practices\"\u003eBest practices\u003c/a\u003e\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eStart with a default-deny posture. Block traffic first, then explicitly allow required flows.\u003c/li\u003e\n\u003cli\u003eOrganize policies per namespace to simplify governance and reduce accidental exposure.\u003c/li\u003e\n\u003cli\u003eVersion and test policies as part of CI pipelines. Tools such as Kyverno or Gatekeeper help validate and enforce policy changes before they reach production.\u003c/li\u003e\n\u003cli\u003eInstrument and visualize traffic flows using Azure Monitor, Calico UI, or third-party observability tools. Visibility is critical for troubleshooting and verification.\u003c/li\u003e\n\u003cli\u003eCombine network policies with Pod Security Standards to protect workloads and reduce risk at multiple layers.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eAuthor tip: Test policy changes in a disposable staging cluster and automate policy validation in CI pipelines. This reduces surprises during production rollouts and helps detect overly broad or blocking rules early.\u003c/p\u003e\n\u003cp\u003eAuthor note: I will be honest, when I first started working with AKS network policies I found the default behaviour a bit surprising — and you probably will too. So, a pretty simple rule of thumb I use is: start small, test often, and iterate. 
If you take nothing else from this article, just run the validation steps in a throwaway cluster and you\u0026rsquo;ll learn quickly what gets blocked and what does not.\u003c/p\u003e\n\u003cp\u003eKnown limitations and version notes\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eWindows node support and feature parity can differ from Linux; check the AKS Windows guidance for details. (See Microsoft Docs.)\u003c/li\u003e\n\u003cli\u003eSome advanced Calico features may require specific Calico versions; refer to the Calico and AKS release notes before adopting L7 or global policy features.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-yaml\" data-lang=\"yaml\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eValidate NetworkPolicy\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003eon\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"l\"\u003epush]\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003ejobs\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e  \u003c/span\u003e\u003cspan 
class=\"nt\"\u003evalidate-policy\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003eruns-on\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eubuntu-latest\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e    \u003c/span\u003e\u003cspan class=\"nt\"\u003esteps\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eCheckout\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003euses\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eactions/checkout@v4\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan 
class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eSet up kubectl\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003euses\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eazure/setup-kubectl@v3\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e      \u003c/span\u003e- \u003cspan class=\"nt\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eApply policy and test connectivity\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        \u003c/span\u003e\u003cspan class=\"nt\"\u003eenv\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e          \u003c/span\u003e\u003cspan class=\"nt\"\u003eKUBECONFIG\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003e${{ secrets.KUBECONFIG }}\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e        
\u003c/span\u003e\u003cspan class=\"nt\"\u003erun\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e|\u003c/span\u003e\u003cspan class=\"sd\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          # KUBECONFIG holds the file contents here, but kubectl expects a path\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          printf \u0026#39;%s\u0026#39; \u0026#34;$KUBECONFIG\u0026#34; \u0026gt; \u0026#34;$RUNNER_TEMP/kubeconfig\u0026#34;\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          export KUBECONFIG=\u0026#34;$RUNNER_TEMP/kubeconfig\u0026#34;\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          kubectl apply -f mypolicy.yaml\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          kubectl run client --image=busybox --restart=Never -- sleep 3600\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          kubectl run server --image=busybox --restart=Never -- httpd -f -p 8080\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          kubectl wait --for=condition=Ready pod/client pod/server --timeout=60s\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          SERVER_IP=$(kubectl get pod -l run=server -o jsonpath=\u0026#39;{.items[0].status.podIP}\u0026#39;)\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"sd\"\u003e          kubectl exec client -- nc -zv $SERVER_IP 8080 || exit 1\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eA few quick, practical pointers: name your test namespace \u003ccode\u003enp-test\u003c/code\u003e, use labels like \u003ccode\u003eapp=demo\u003c/code\u003e and \u003ccode\u003erole=backend\u003c/code\u003e, and keep the \u003ccode\u003eKUBECONFIG\u003c/code\u003e you store in CI secrets bound to a least-privilege ServiceAccount rather than a cluster admin. 
These tiny, somewhat mundane conventions make reproducible tests a lot easier.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"conclusion\"\u003e\u003ca href=\"/posts/aks-network-policies-zero-trust/#conclusion\" title=\"Conclusion\"\u003eConclusion\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eNetwork policies are a foundational control for securing AKS clusters. They enable a zero-trust approach inside the cluster, reduce the attack surface, separate workloads, and allow precise control of inbound and outbound traffic. Whether you adopt Azure Network Policies or Calico, apply policies consistently, automate testing and deployment, and maintain visibility to ensure the cluster remains secure and auditable.\u003c/p\u003e","date_modified":"2026-05-26T10:22:03+02:00","date_published":"2025-12-10T11:45:00+01:00","id":"https://daily-devops.net/posts/aks-network-policies-zero-trust/","language":"en","summary":"Learn why AKS Network Policies are essential for Zero Trust, pod isolation, and Kubernetes security—plus how to implement them the right way.","tags":["networking","azure","cloud","kubernetes","platform-engineering"],"title":"AKS Network Policies: The Security Layer Your Cluster Is Missing","url":"https://daily-devops.net/posts/aks-network-policies-zero-trust/"},{"authors":[{"name":"Jendrik Brack","url":"https://daily-devops.net/authors/jendrik/"}],"content_html":"\u003cp\u003eSelecting the right network model is arguably one of the most critical architectural decisions you will make when deploying a Kubernetes cluster on Azure Kubernetes Service (AKS). This choice ripples through nearly every aspect of your cluster\u0026rsquo;s lifecycle, influencing how pods communicate, how efficiently you use your IP address space, which Azure services integrate seamlessly with your workloads, and ultimately, how well your infrastructure scales to meet future demands. 
It affects scalability, security posture, operational cost, performance characteristics, available integration options, and your long-term operational flexibility.\u003c/p\u003e\n\u003cp\u003eFor many years, AKS administrators have largely found themselves choosing between two well-established options: \u003cstrong\u003ekubenet\u003c/strong\u003e and \u003cstrong\u003eAzure CNI\u003c/strong\u003e. Each brought distinct tradeoffs to the table. kubenet offered simplicity and IP efficiency at the cost of limited integration, while Azure CNI provided rich enterprise capabilities but introduced significant IP consumption challenges that required careful VNet planning. With the introduction of \u003cstrong\u003eAzure CNI Overlay\u003c/strong\u003e, Microsoft has addressed these historical limitations by adding a genuinely modern option that thoughtfully combines IP efficiency with comprehensive enterprise networking capabilities.\u003c/p\u003e\n\u003cp\u003eThis article walks through a comprehensive, practical comparison of all three networking models. We\u0026rsquo;ll examine how each one works under the hood, explore the genuine strengths and limitations of each approach, and ultimately provide you with the guidance you need to make an informed decision about which model best suits your specific organizational requirements and technical constraints.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"why-the-network-model-actually-matters\"\u003e\u003ca href=\"/posts/aks-networking-clash/#why-the-network-model-actually-matters\" title=\"Why the Network Model Actually Matters\"\u003eWhy the Network Model Actually Matters\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eYour choice of network model influences practically every layer of your cluster. How pods receive IP addresses, how they communicate with each other and the VNet, performance and latency characteristics, security boundaries, and policy enforcement all hinge on this decision. 
So does your ability to integrate with Azure services, your scalability ceiling, your cluster density potential, and ultimately your VNet planning complexity.\u003c/p\u003e\n\u003cp\u003eChanging this decision later is difficult and sometimes impossible. It\u0026rsquo;s not a setting you adjust casually after launch. Getting it right from the start matters considerably.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"kubenet-simplicity-at-the-cost-of-integration\"\u003e\u003ca href=\"/posts/aks-networking-clash/#kubenet-simplicity-at-the-cost-of-integration\" title=\"kubenet: Simplicity at the Cost of Integration\"\u003ekubenet: Simplicity at the Cost of Integration\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eKubenet is effectively legacy for new projects. Microsoft maintains it for existing clusters, but no production workloads should start with it today.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"how-it-works\"\u003e\u003ca href=\"/posts/aks-networking-clash/#how-it-works\" title=\"How it works\"\u003eHow it works\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003ekubenet is the simplest networking approach for AKS. Each node receives a single VNet-routable IP address, but pods get their IPs from a separate, non-routable CIDR range that exists only within the cluster. When pods need to communicate outside the cluster, traffic goes through network address translation (NAT) and user-defined routes (UDRs) that you manage yourself. This fundamental separation is both kubenet\u0026rsquo;s defining feature and its core limitation.\u003c/p\u003e\n\u003cp\u003eKubenet maxes out at 400 nodes. For modern clusters, that\u0026rsquo;s a hard ceiling you\u0026rsquo;ll hit faster than you expect.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"strengths\"\u003e\u003ca href=\"/posts/aks-networking-clash/#strengths\" title=\"Strengths\"\u003eStrengths\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eThe appeal is genuine. Kubenet is IP efficient—you consume very few VNet IPs because pods sit in their own address space. 
It\u0026rsquo;s simple to understand and straightforward to configure, which makes it attractive for teams new to Kubernetes or environments where networking should stay uncomplicated. Operationally, that translates to lower cost and less day-to-day overhead.\u003c/p\u003e\n\u003cp\u003eThe downside? Isolation.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"limitations\"\u003e\u003ca href=\"/posts/aks-networking-clash/#limitations\" title=\"Limitations\"\u003eLimitations\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eBecause pods aren\u0026rsquo;t directly routable in the VNet, they remain isolated from your broader Azure networking ecosystem. NAT adds overhead and troubleshooting complexity. Integration with Azure networking features—Network Security Groups, Private Link, Azure Firewall—remains limited. For enterprise deployments or hybrid scenarios where your cluster needs to participate seamlessly in existing infrastructure, these limitations become real constraints.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"when-to-use-it\"\u003e\u003ca href=\"/posts/aks-networking-clash/#when-to-use-it\" title=\"When to use it\"\u003eWhen to use it\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003ekubenet works well in specific contexts: development and test environments where simplicity matters more than features, small clusters running non-critical workloads, or scenarios with minimal networking requirements. 
Beyond those cases, you\u0026rsquo;re better served exploring alternatives.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"azure-cni-enterprise-integration-comes-with-a-price\"\u003e\u003ca href=\"/posts/aks-networking-clash/#azure-cni-enterprise-integration-comes-with-a-price\" title=\"Azure CNI: Enterprise Integration Comes with a Price\"\u003eAzure CNI: Enterprise Integration Comes with a Price\u003c/a\u003e\u003c/h2\u003e\n\n\n\n\n\u003ch3 id=\"how-it-works-1\"\u003e\u003ca href=\"/posts/aks-networking-clash/#how-it-works-1\" title=\"How it works\"\u003eHow it works\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eAzure CNI (Container Networking Interface) represents a fundamental shift from kubenet. Instead of isolating pods in a separate address space, this model assigns each pod a direct, fully routable IP address from your VNet subnet. Pods become first-class participants in your Azure network, capable of direct communication with any VNet resource without NAT or additional routing rules. Traffic flows directly with minimal overhead, resulting in transparent and predictable networking.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"strengths-1\"\u003e\u003ca href=\"/posts/aks-networking-clash/#strengths-1\" title=\"Strengths\"\u003eStrengths\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eThe advantages become apparent in enterprise environments where network visibility matters. Pods hold genuine VNet addresses, so they participate fully in your security frameworks, policy enforcement, and monitoring. Network Security Groups apply directly to pods. Private Link connections work seamlessly. Azure Firewall can inspect traffic properly. Your monitoring tools see pods as native VNet resources. This transparency is invaluable in regulated industries or zero-trust architectures where every network flow must be visible and controllable. 
Performance is excellent too—no NAT overhead means direct, efficient communication.\u003c/p\u003e\n\u003cp\u003eThe trade-off is real: you need substantial VNet address space.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"limitations-1\"\u003e\u003ca href=\"/posts/aks-networking-clash/#limitations-1\" title=\"Limitations\"\u003eLimitations\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eAzure CNI has a substantial appetite for IP addresses. Every pod needs its own VNet IP, which exhausts address space quickly in larger clusters or with high pod density. A 100-node cluster with 200 pods per node consumes 20,000 pod IPs alone, so you need at least a /17 VNet subnet (32,768 addresses) just for pods. For organizations with limited IP space or managing many clusters in a constrained range, this becomes a genuine scaling constraint.\u003c/p\u003e\n\u003cp\u003eCommon mistakes with Azure CNI: Teams underestimate pod density and provision subnets too small. A /19 feels generous until you hit 250 pods/node on 50 nodes. Then you\u0026rsquo;re recreating the entire cluster. Plan your pod count ceiling carefully—don\u0026rsquo;t guess.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"when-to-use-it-1\"\u003e\u003ca href=\"/posts/aks-networking-clash/#when-to-use-it-1\" title=\"When to use it\"\u003eWhen to use it\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eChoose Azure CNI when network governance, compliance, and performance take priority over IP efficiency. Production workloads in regulated industries, hybrid environments, and zero-trust architectures all benefit from its full integration story. If your organization can accommodate the IP consumption and your workloads demand strong visibility, Azure CNI delivers consistently.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"how-it-works-2\"\u003e\u003ca href=\"/posts/aks-networking-clash/#how-it-works-2\" title=\"How it works\"\u003eHow it works\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eNodes still receive VNet IP addresses as in standard Azure CNI. 
But pods operate within a separate overlay network with its own CIDR range, decoupled from the VNet. Pod traffic routes through a lightweight overlay stack that handles encapsulation transparently. Despite this separation, full Azure CNI functionality remains available—pods retain integration benefits with Azure services and security constructs.\u003c/p\u003e\n\u003cp\u003eThe math changes dramatically. With traditional Azure CNI, a 1,000-node cluster at 200 pods/node needs roughly 200,000 VNet IPs for pods alone (1,000 nodes × 200 pods/node), which means a /14 VNet subnet. With Overlay, only the nodes draw from the VNet, so the same cluster needs about 1,000 VNet IPs. Pod addresses come from a private overlay CIDR that sits outside the VNet. That\u0026rsquo;s roughly a 200x reduction in VNet consumption compared to traditional CNI\u0026rsquo;s flat model.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"strengths-2\"\u003e\u003ca href=\"/posts/aks-networking-clash/#strengths-2\" title=\"Strengths\"\u003eStrengths\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eAzure CNI Overlay combines the best of both models. It maintains high IP efficiency similar to kubenet—run large numbers of pods without exhausting VNet address space. Simultaneously, it delivers full enterprise integration like Azure CNI—direct compatibility with Network Security Groups, Private Link, Azure Firewall, and monitoring solutions. Large-scale clusters work without complex subnet planning. Organizations with limited IP space or managing many clusters get a significant scaling advantage. Microsoft explicitly recommends this as the standard for new production clusters, reflecting the platform\u0026rsquo;s evolution.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"limitations-2\"\u003e\u003ca href=\"/posts/aks-networking-clash/#limitations-2\" title=\"Limitations\"\u003eLimitations\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eOverlay adds minor latency—plan for 100-200 microseconds extra per pod-to-external hop due to NAT translation. For latency-sensitive workloads (high-frequency trading, real-time gaming), this matters. 
Classic Azure CNI eliminates this cost entirely.\u003c/p\u003e\n\u003cp\u003eDebugging pod-to-external traffic is harder. You\u0026rsquo;ll need to understand SNAT translation. Classic Azure CNI shows pod IPs in network traces; Overlay hides them behind node IPs. Budget extra engineering for network troubleshooting. Most teams underestimate this operational cost.\u003c/p\u003e\n\u003cp\u003eRegional limitations remain: Windows Server 2019 pod support rolled out Q4 2024, but DCsv2 Confidential Computing VMs are unsupported on Overlay (use DCAsv5 instead). Check your region\u0026rsquo;s feature matrix before committing.\u003c/p\u003e\n\u003cp\u003eCommon mistakes: Forgetting that Overlay configuration can\u0026rsquo;t be changed post-deployment. Teams have recreated entire clusters after discovering pod density requirements too late. Finalize your pod count ceiling before cluster creation.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"when-to-use-it-2\"\u003e\u003ca href=\"/posts/aks-networking-clash/#when-to-use-it-2\" title=\"When to use it\"\u003eWhen to use it\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eChoose Azure CNI Overlay if: (1) Your cluster will exceed 1,000 nodes, (2) IP space is scarce, or (3) Pod density baseline exceeds 100 pods/node. For smaller clusters with abundant IP space, classic Azure CNI remains valid.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCritical operational consideration:\u003c/strong\u003e Overlay networking impacts your observability strategy. Direct pod IP logging doesn\u0026rsquo;t work. Your monitoring tools must track node IPs and SNAT mappings instead. Prometheus scrapes will show node targets, not pod targets. Container registries see pod IPs translate through node IPs. 
Budget extra engineering for network observability—this is where most teams get blindsided.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"putting-it-all-in-perspective-a-practical-comparison\"\u003e\u003ca href=\"/posts/aks-networking-clash/#putting-it-all-in-perspective-a-practical-comparison\" title=\"Putting It All in Perspective: A Practical Comparison\"\u003ePutting It All in Perspective: A Practical Comparison\u003c/a\u003e\u003c/h2\u003e\n\u003ctable class=\"striped\"\u003e\n\t\u003cthead\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003cth\u003eFeature\u003c/th\u003e\n\t\t\t\t\t\u003cth\u003ekubenet\u003c/th\u003e\n\t\t\t\t\t\u003cth\u003eAzure CNI\u003c/th\u003e\n\t\t\t\t\t\u003cth\u003eAzure CNI Overlay\u003c/th\u003e\n\t\t\t\u003c/tr\u003e\n\t\u003c/thead\u003e\n\t\u003ctbody\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003e\u003cstrong\u003eMax Nodes\u003c/strong\u003e\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003e400\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003e1,000+\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003e5,000\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003e\u003cstrong\u003ePod IP Source\u003c/strong\u003e\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003ePod CIDR\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eVNet subnet\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eOverlay CIDR\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003e\u003cstrong\u003eIP Efficiency\u003c/strong\u003e\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eHigh\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eLow (20,000+ IPs/100 nodes)\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eHigh (8,000 IPs/100 nodes)\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003e\u003cstrong\u003eRouting\u003c/strong\u003e\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eNAT + UDR\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eDirect\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eOverlay (SNAT 
egress)\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003e\u003cstrong\u003ePerformance\u003c/strong\u003e\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eGood (+latency)\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eExcellent\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eHigh (+100-200μs NAT)\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003e\u003cstrong\u003eAzure Integration\u003c/strong\u003e\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eLimited\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eFull\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eFull\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003e\u003cstrong\u003eComplexity\u003c/strong\u003e\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eLow\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eHigh\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eMedium\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\t\t\u003ctr\u003e\n\t\t\t\t\t\u003ctd\u003e\u003cstrong\u003eProduction-Ready?\u003c/strong\u003e\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eLegacy only\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003eYes (IP constraints)\u003c/td\u003e\n\t\t\t\t\t\u003ctd\u003e\u003cstrong\u003eYes (default)\u003c/strong\u003e\u003c/td\u003e\n\t\t\t\u003c/tr\u003e\n\t\u003c/tbody\u003e\n\u003c/table\u003e\n\n\n\n\n\u003ch2 id=\"making-the-right-choice-for-your-constraints\"\u003e\u003ca href=\"/posts/aks-networking-clash/#making-the-right-choice-for-your-constraints\" title=\"Making the Right Choice for Your Constraints\"\u003eMaking the Right Choice for Your Constraints\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eAs of Q4 2025, Microsoft recommends CNI Overlay for all new AKS clusters. Kubenet remains only for legacy migration scenarios. Traditional Azure CNI (flat model) is now positioned as \u0026ldquo;advanced use only.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eYour decision depends on your specific constraints. 
Here\u0026rsquo;s what that means practically:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLimited IP address space?\u003c/strong\u003e Overlay is your only option. A 500-node cluster with traditional CNI burns 125,000 VNet IPs (500 nodes × 250 pods/node). Overlay uses maybe 500 IPs for nodes, 8,000 for the private CIDR. That\u0026rsquo;s the difference between feasible and impossible.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRegulated industry requiring direct pod traceability?\u003c/strong\u003e Traditional Azure CNI gives you pod IPs you can trace end-to-end. Overlay requires you to reverse-engineer SNAT mappings. Compliance frameworks sometimes demand the former. Check your audit requirements before deciding.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDevelopment or proof-of-concept?\u003c/strong\u003e Kubenet is still reasonable here. Simplicity wins. Just don\u0026rsquo;t ship it to production.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNew production cluster with no prior constraints?\u003c/strong\u003e Overlay. Default assumption. End of discussion. The platform matured past the point where you need to second-guess this.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"the-path-forward-understanding-the-real-trade-offs\"\u003e\u003ca href=\"/posts/aks-networking-clash/#the-path-forward-understanding-the-real-trade-offs\" title=\"The Path Forward: Understanding the Real Trade-offs\"\u003eThe Path Forward: Understanding the Real Trade-offs\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eAKS networking boils down to this: kubenet is dead for production. Azure CNI works only if you have VNet space to burn. Overlay is the pragmatic default.\u003c/p\u003e\n\u003cp\u003eKubenet was the starting point in 2017. Azure CNI added enterprise features in 2019. But both forced uncomfortable choices: either accept a 400-node ceiling with poor observability, or reserve a /11 subnet that might bankrupt your IP planning. 
Neither worked for real clusters at scale.\u003c/p\u003e\n\u003cp\u003eOverlay changed that equation. Yes, you lose direct pod IP traceability. Yes, you add 100-200 microseconds of latency. But you get 5,000-node clusters with IP efficiency that makes sense. You get large clusters without complex subnet planning. You get a path forward that doesn\u0026rsquo;t require architectural compromise.\u003c/p\u003e\n\u003cp\u003eThe trade-off is honest: latency and debugging complexity for scalability and IP efficiency. For most organizations, that\u0026rsquo;s the right trade.\u003c/p\u003e\n\u003cp\u003eIf you\u0026rsquo;re building new infrastructure on AKS, start with Overlay. If you\u0026rsquo;re running the math on existing clusters and wondering whether to migrate, Overlay is probably cheaper than the subnet expansion you\u0026rsquo;re otherwise facing. Plan your observability around SNAT mappings from day one. Budget engineering time for network troubleshooting. But build forward knowing the constraint that has limited AKS clusters for five years is finally solved.\u003c/p\u003e\n","date_modified":"2026-05-26T10:22:03+02:00","date_published":"2025-12-03T11:45:00+01:00","id":"https://daily-devops.net/posts/aks-networking-clash/","language":"en","summary":"Azure CNI Overlay beats kubenet's 400-node ceiling and classic CNI's IP exhaustion. Compare all three AKS network models before the cluster locks in.","tags":["networking","azure","cloud","kubernetes","platform-engineering"],"title":"AKS Networking Clash: kubenet vs. CNI vs. CNI Overlay","url":"https://daily-devops.net/posts/aks-networking-clash/"}],"language":"en","title":"Platform Engineering for .NET \u0026 Azure Teams on Daily DevOps \u0026 .NET","version":"https://jsonfeed.org/version/1.1"}