Kubernetes Is Not a Platform Strategy

Kubernetes has transitioned from a technical option to an assumed default. In organizations and projects I’ve worked with, discussions no longer start with whether Kubernetes is appropriate. They start with migration timelines. I’ve sat through planning sessions where the question wasn’t “Should we use Kubernetes?” but rather “When can we have everything moved over?”

This shift isn’t driven by application requirements. It’s driven by narrative. Consulting decks and reference architectures present Kubernetes as a universal platform that absorbs governance, security, scalability, observability, recovery, and operational responsibility. The implicit promise: once your software runs on Kubernetes, the hard parts are handled. I’ve watched teams adopt this belief wholesale, only to discover the gaps six months into production.

That promise is incomplete. Kubernetes primarily addresses one phase: runtime orchestration. Most architectural risk, cost overruns, and operational failures occur before runtime (during design and delivery) or after runtime (when incidents happen and systems evolve). I’ve debugged production incidents where Kubernetes ran flawlessly while the system failed spectacularly because architectural problems existed upstream and downstream of container orchestration.

Treating Kubernetes as a lifecycle platform rather than a runtime component introduces complexity that stays invisible during planning and becomes unavoidable in production. The demos look clean. The reference architectures are elegant. Then you hit reality.

Two questions matter, and neither is whether Kubernetes works (it does, consistently, in its domain): where does its responsibility end, and can your organization handle what lies beyond those boundaries?

Kubernetes in the .NET Reality

Kubernetes clusters rarely host a single, clean workload type in practice. They become convergence points: ASP.NET Core APIs, background workers, event-driven processors, migrated Windows Services, and platform components all sharing infrastructure. I’ve inherited clusters running everything from modern microservices to decade-old .NET Framework services wrapped in Windows containers, all competing for the same resources.

For stateless, Linux-based ASP.NET Core services, Kubernetes is genuinely strong. Deployments are predictable. Rollouts are controlled. Health checks integrate cleanly. You implement a simple health endpoint:

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHealthChecks();
var app = builder.Build();

// Kubernetes liveness and readiness probes can target this endpoint.
app.MapHealthChecks("/health");
app.Run();

Then you deploy three replicas and Kubernetes does what you asked: it keeps exactly three running, rolling out updates without downtime, removing failed pods from traffic automatically. You push a new image and watch the update complete—no manual intervention, no traffic loss, no coordination overhead.

This is where Kubernetes works exactly as intended: the application exposes its state honestly, and the platform responds intelligently. Three replicas means three replicas, constantly. A pod fails, it gets replaced within seconds. A rolling update happens seamlessly because Kubernetes orchestrates the transition and the application cooperates through its health endpoint. The first time you watch this happen without manually managing anything, it feels like magic.

This experience—predictable, reliable, hands-off—becomes the template in your mind for how Kubernetes should work everywhere.

The mistake begins when this success gets generalized. I’ve seen this pattern repeatedly: success with stateless APIs leads to confidence that everything belongs in Kubernetes. Then the complexity arrives.

Governance: Structure Without Enforcement

Kubernetes offers namespaces, labels, and RBAC. These are primitives, not governance. Real enterprise governance requires enforceable policy, auditability, cost attribution, and environmental separation. In Azure-centric environments, these concerns traditionally live at the subscription, management group, and Azure Policy layer, where they’re auditable, mandatory, and enforced at the platform level.

Introducing Kubernetes adds a second governance plane. Without deliberate policy enforcement, clusters drift. I’ve seen production and experimental workloads coexist in the same cluster because namespace isolation felt sufficient. It wasn’t. Cost attribution becomes opaque. Who actually paid for that node pool? Which business unit owns this? When incidents happen, these questions waste critical time.

In one organization, we discovered experimental ML workloads running on production infrastructure because someone had kubectl access and “just needed to test something quickly.” The namespace separation existed. The policy enforcement didn’t.

Kubernetes doesn’t prevent this drift. It accelerates it by making deployment so frictionless that governance becomes an afterthought.

Identity: Kubernetes Stops Where Entra ID Starts

.NET applications rely on Entra ID (formerly Azure AD) for authentication, authorization, managed identities, and conditional access. Kubernetes has no native concept of enterprise identity. It doesn’t integrate with Entra ID’s policy layer, conditional access rules, or compliance tracking. This isn’t a limitation; it’s architectural reality.

Kubernetes RBAC governs access to cluster resources: who can deploy pods, create services, read secrets. But application identity—the identity your code runs under, the services it authenticates to, the permissions it holds—that’s entirely separate. Kubernetes facilitates the technical handshake (workload identity token exchange), but the authority making identity decisions lives outside the cluster in Entra ID. Your application integrates with Entra ID directly, not through Kubernetes.
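
In code, that direct integration is ordinary Azure SDK usage, not anything Kubernetes-specific. A minimal sketch, assuming the Azure.Identity and Azure.Storage.Blobs packages and a workload identity (or managed identity) already federated with Entra ID; the storage endpoint is a placeholder:

using Azure.Identity;
using Azure.Storage.Blobs;

// DefaultAzureCredential resolves whatever identity the environment provides:
// workload identity federation on AKS, a managed identity, or a developer login.
var credential = new DefaultAzureCredential();

// Placeholder endpoint: substitute the resource your workload actually calls.
var blobService = new BlobServiceClient(
    new Uri("https://examplestorage.blob.core.windows.net"),
    credential);

await foreach (var container in blobService.GetBlobContainersAsync())
{
    Console.WriteLine(container.Name);
}

Kubernetes supplies the token plumbing underneath that credential; the decision about what the identity may access is made entirely in Entra ID.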

This boundary is invisible until you’re three months into production and security asks about conditional access policies, device compliance rules, or audit trails. Kubernetes doesn’t track any of that. It can’t. The identity system is external, and Kubernetes merely provides the plumbing to connect to it.

I’ve worked with teams who expected Kubernetes to handle enterprise identity because it handled everything else. It doesn’t. That realization typically arrives when security reviews surface the integration gaps.

Networking: Where Kubernetes Abstraction Fails First

Networking is where Kubernetes myths collapse fastest. I’ve seen the most preventable production incidents here. Kubernetes introduces its own networking model, but it doesn’t replace enterprise networking. It operates inside it. This distinction matters when things go wrong.

In Azure-based architectures, your first line of defense exists outside the cluster:

  • Virtual networks and subnet isolation
  • User-defined routing (UDR)
  • Azure Firewall or Network Virtual Appliance (NVA)
  • Application Gateway or Front Door with Web Application Firewall (WAF)
  • Private endpoints and service endpoints

Ingress controllers route traffic. They don’t defend the network. They’re application-layer components running inside pods, not hardened network appliances.

Treating Kubernetes ingress as your security perimeter shifts responsibility from hardened network controls to application-level components that were never designed to absorb hostile traffic at scale. I’ve seen this assumption lead to security incidents where attackers bypassed ingress controllers by targeting services directly once they gained cluster access.

Azure CNI and IP Exhaustion

With Azure CNI, every pod consumes a real IP address from your virtual network subnet. Scaling pods means scaling IP consumption linearly. Poor subnet sizing surfaces late—usually in production, when teams suddenly can’t scale further. Kubernetes keeps scheduling pods until the network says no, and the resulting errors rarely point at the real cause.

This isn’t a Kubernetes failure. It’s a networking responsibility that Kubernetes exposes. I’ve debugged this scenario more times than I’d like to admit, always with the same root cause: network planning happened before anyone calculated peak pod counts under load.
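
A back-of-the-envelope calculation during design would have caught it. The figures below are hypothetical; substitute your own node counts, max-pods setting, and surge headroom:

// Hypothetical sizing inputs for an Azure CNI cluster.
const int nodeCount = 20;          // nodes at peak scale-out
const int maxPodsPerNode = 30;     // the AKS default for Azure CNI
const int surgeHeadroom = 40;      // spare IPs for rolling-update surge pods

// Each node consumes one IP for itself plus one per pod it can host.
int requiredIps = nodeCount * (maxPodsPerNode + 1) + surgeHeadroom;

Console.WriteLine($"IP addresses needed at peak: {requiredIps}"); // 660
// A /24 subnet yields roughly 251 usable addresses in Azure: nowhere near enough here.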

East-West Traffic and Lateral Movement

Kubernetes networking is flat by default. Every pod can reach every other pod within the cluster. Network policies are optional and frequently incomplete. In organizations without dedicated platform teams, they’re often absent entirely.

For multi-service .NET systems, this makes lateral movement trivial once any single pod is compromised. An attacker who gains access to a frontend pod can immediately probe backend services, database connections, and internal APIs. Kubernetes provides the mechanism (network policies) but doesn’t enforce discipline. I worked on an incident response where a compromised pod accessed 12 different internal services before we detected it. Network policies existed in the repository. They weren’t applied.

Egress Control

Ingress gets constant attention: WAF rules, TLS certificates, rate limiting. Egress almost never does. By default, all pods can reach the internet: any destination, any port. In regulated environments, that’s unacceptable. Egress control requires forced routing through Azure Firewall and explicit allow-listing of destinations.

Kubernetes has no native concept of allowed destinations. You build this external to the cluster, then spend weeks troubleshooting why perfectly valid application calls fail because someone forgot to allow-list a critical API endpoint.

Security: Responsibility Is Concentrated, Not Removed

Kubernetes provides security mechanisms. Almost none are enabled by default. A .NET application on Azure App Service benefits from opinionated defaults: automatic image scanning, encrypted secrets, preconfigured network isolation, integrated runtime monitoring.

In Kubernetes, every guarantee requires deliberate recreation:

  • Image provenance through admission controllers and policy enforcement
  • Secret handling through external secret stores (Azure Key Vault integration; see the sketch after this list)
  • Network segmentation through network policies and firewall rules
  • Runtime monitoring through service mesh sidecars or host-level agents
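
For the secret-handling item above, “external” means the application pulls secrets from Key Vault rather than relying on base64-encoded Kubernetes Secret objects. A minimal application-side sketch, assuming the Azure.Identity and Azure.Extensions.AspNetCore.Configuration.Secrets packages and a placeholder vault name:

using Azure.Identity;

var builder = WebApplication.CreateBuilder(args);

// Load configuration values from Key Vault at startup using the workload's
// Entra ID identity. "example-vault" is a placeholder name.
builder.Configuration.AddAzureKeyVault(
    new Uri("https://example-vault.vault.azure.net/"),
    new DefaultAzureCredential());

var app = builder.Build();
app.Run();

The Secrets Store CSI driver is the cluster-level alternative; either way, the source of truth lives outside the cluster.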

Each added controller or sidecar increases capability and attack surface simultaneously. I’ve reviewed Kubernetes configurations where security controls outnumbered application pods. The cluster became a security platform that happened to run some software.

Kubernetes doesn’t reduce security effort. It concentrates it into your platform team, assuming you have one.

CI/CD and Supply Chain: Kubernetes Consumes Trust

Kubernetes consumes artifacts. It doesn’t produce trust. CI pipelines, artifact promotion, image immutability, and signing decisions all happen long before Kubernetes schedules a pod. A broken supply chain can’t be repaired at runtime. If a malicious image makes it to your registry, Kubernetes will happily deploy it.

I’ve worked with a team who discovered their CI pipeline had been compromised for three weeks. Kubernetes deployed every malicious image perfectly—on schedule, with zero-downtime rolling updates. The orchestration worked flawlessly. The supply chain didn’t. Kubernetes enforces desired state but doesn’t validate how that state was produced. That validation is your responsibility in your build pipelines, artifact registries, and admission controllers.

Observability: Infrastructure Metrics Are Not Insight

Kubernetes emits metrics and logs: CPU usage per pod, memory consumption, network I/O. These describe platform health, not system behavior. .NET systems require application-level observability—distributed tracing across service boundaries, dependency tracking to external systems, structured logging with correlation IDs.

// Emit distributed traces for incoming ASP.NET Core requests and outbound HttpClient calls.
builder.Services.AddOpenTelemetry()
    .WithTracing(t =>
        t.AddAspNetCoreInstrumentation()
         .AddHttpClientInstrumentation());

Without integration into Azure Monitor and Application Insights, incidents become reconstruction exercises. I’ve sat in war rooms where Kubernetes dashboards stayed green—all pods healthy, all nodes operational—while users experienced cascading timeouts. Pod restarts hide underlying failures instead of surfacing them. A pod that crashes and restarts every 30 seconds looks “healthy” to Kubernetes if it passes health checks between crashes.
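
One mitigation is to make health checks report on the dependencies that actually fail, so a crash-and-restart loop shows up as an unhealthy endpoint instead of a green dashboard. A minimal sketch; the database probe is a hypothetical placeholder for whatever dependency matters in your system:

using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// A readiness check that fails when the critical dependency fails, instead of
// reporting "alive" in the quiet moments between crashes.
builder.Services.AddHealthChecks()
    .AddAsyncCheck("sql-dependency", async () =>
        await IsDatabaseReachableAsync()
            ? HealthCheckResult.Healthy()
            : HealthCheckResult.Unhealthy("SQL dependency unreachable"));

var app = builder.Build();
app.MapHealthChecks("/health/ready");
app.Run();

// Hypothetical probe: replace with a real connectivity check.
static Task<bool> IsDatabaseReachableAsync() => Task.FromResult(true);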

Observability requires design. You bring it, or you debug blind.

Scalability: Kubernetes Scales Pods, Not Systems

Kubernetes scales replicas, not architectures. Database contention, synchronous dependencies, external API limits—they all remain regardless of how many pod copies you create. Kubernetes can amplify bottlenecks just as effectively as it amplifies capacity.

I’ve watched auto-scaling create 50 pod replicas, all waiting for the same database connection pool that maxed out at 100 connections. More pods didn’t solve the problem—they made it worse by consuming resources while waiting.
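
The arithmetic behind that incident is worth spelling out. Each replica maintains its own connection pool, so pools multiply with replicas while the database’s connection ceiling stays fixed. A sketch with hypothetical numbers:

// Hypothetical figures: adjust for your own database tier and pool settings.
const int replicas = 50;                  // what the autoscaler created under load
const int maxPoolSizePerReplica = 100;    // e.g. "Max Pool Size=100" in the connection string
const int databaseConnectionLimit = 500;  // what the database tier actually allows

int worstCase = replicas * maxPoolSizePerReplica; // 5000
Console.WriteLine($"Potential connections: {worstCase}, database limit: {databaseConnectionLimit}");
// Scaling out multiplied connection demand tenfold past the ceiling; more replicas made contention worse, not better.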

Event-driven scaling improves this, but only with architectural redesign. Kubernetes enables the mechanism for elasticity—you can scale replicas based on external signals. But the architecture determines whether that mechanism translates into actual scalability. Scaling 50 pods won’t help if they’re all waiting on the same bottleneck. That’s a design problem, not an orchestration problem.
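
What that redesign typically looks like for .NET background work: intake decoupled behind a queue, each worker processing at a bounded concurrency, and replica count driven by queue depth through an external autoscaler such as KEDA (not shown here). A sketch assuming the Azure.Messaging.ServiceBus package and a placeholder queue name:

using Azure.Messaging.ServiceBus;
using Microsoft.Extensions.Hosting;

public sealed class OrderWorker : BackgroundService
{
    private readonly ServiceBusProcessor _processor;

    public OrderWorker(ServiceBusClient client)
    {
        // "orders" is a placeholder queue name. MaxConcurrentCalls bounds how hard
        // each replica pushes its downstream dependencies, regardless of replica count.
        _processor = client.CreateProcessor("orders", new ServiceBusProcessorOptions
        {
            MaxConcurrentCalls = 8
        });
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _processor.ProcessMessageAsync += async args =>
        {
            // Handle the message, then settle it so it isn't redelivered.
            await args.CompleteMessageAsync(args.Message, stoppingToken);
        };
        _processor.ProcessErrorAsync += _ => Task.CompletedTask; // log in a real worker

        await _processor.StartProcessingAsync(stoppingToken);
        // Processing continues on background threads; shutdown and disposal are omitted for brevity.
    }
}

Registered with builder.Services.AddHostedService<OrderWorker>() alongside a singleton ServiceBusClient, this moves the scaling signal from HTTP pressure to queue depth. That shift is the architectural change; Kubernetes only supplies the replica mechanics.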

Backup and Recovery: Kubernetes Stops Completely

Kubernetes restarts containers. It doesn’t restore systems. State lives outside the cluster in databases, message queues, caches, and storage accounts. Backup and recovery remain responsibilities of data platforms and operational processes. Kubernetes has no concept of business continuity or disaster recovery beyond “restart the pod.”

High availability masks failure. It doesn’t undo it. A corrupted database doesn’t care how many pod replicas exist or how fast Kubernetes can reschedule them. I’ve responded to incidents where Kubernetes performed perfectly—immediate failover, health-driven routing—while the underlying data corruption spread across all replicas.

Windows Containers on Kubernetes: A Strong Architectural Smell

Windows containers are supported but introduce slower startup times (minutes versus seconds), limited ecosystem support, and operational asymmetry—separate node pools, different update cadence, higher costs. They’re frequently used to avoid refactoring legacy workloads, turning Kubernetes into a compatibility layer rather than a platform.

I’ve seen .NET Framework applications from 2010 wrapped in Windows containers and deployed to Kubernetes because “we’re moving to cloud-native.” The workload hadn’t changed. The infrastructure complexity increased dramatically. They function, they complicate operations, and they rarely age well.

Every Windows container deployment I’ve reviewed eventually became a maintenance burden. The startup time alone makes scaling problematic. Windows licensing costs amplify infrastructure expenses. And the operational split between Linux and Windows node pools fragments your platform team’s expertise.

Cost and Organizational Economics

Kubernetes isn’t cost-neutral—a realization that typically arrives 3-6 months after initial deployment when finance asks why cloud costs doubled. It shifts cost visibility from infrastructure to organization: platform teams grow from 2 to 8 people, node pools sit idle waiting for burst capacity that happens twice a month, Windows nodes amplify costs through licensing and compute, observability instrumentation adds runtime overhead and egress costs.

Technical efficiency—improved resource utilization through bin-packing and scheduling—often comes at organizational expense: larger platform teams, slower iteration velocity (every change needs cluster-wide validation), distributed debugging complexity (which of the 15 services in the trace actually caused the timeout?).

The calculation isn’t universal. It depends on workload mix, team structure, organizational tolerance for operational complexity. For companies running 200+ microservices with dedicated SRE teams, Kubernetes pays dividends. For companies running 8 services with 3 developers, it’s often overhead.

Conclusion: Kubernetes Concentrates Architectural Responsibility

Kubernetes is powerful and, in specific scenarios, the right choice: stateless Linux-based APIs with clean 12-factor design, event-driven background workers that scale horizontally, organizations with dedicated platform teams who can absorb operational complexity, and standardized workload portfolios where 80%+ of applications fit predictable patterns.

Outside these boundaries, Kubernetes doesn’t remove responsibility. It concentrates it. The responsibilities I’ve outlined (governance, identity, networking, security, observability, backup) don’t disappear. They become explicit architectural decisions that someone on your team must own, implement, and maintain.

Kubernetes is not governance. That lives at the subscription, policy, and organizational level. It’s not identity. That authority is Entra ID. It’s not the security perimeter. That’s the network, the firewall, and the defense-in-depth controls you build around the cluster. It’s not backup and recovery. That responsibility belongs to data platforms and business continuity planning. It’s not observability. That’s an application design concern requiring deliberate instrumentation.

Kubernetes orchestrates workloads, and it does this extremely well.

From an architect’s perspective—someone who has designed, deployed, and maintained these systems in production—Kubernetes can be the most visible component of a hosting solution but never the whole solution. The promise that it absorbs the software lifecycle is marketing, not engineering reality.

That distinction isn’t theoretical. It’s operational reality I’ve experienced across multiple organizations, multiple industries, multiple failure modes.

The question isn’t whether Kubernetes works—it does, consistently, predictably, within its domain. The question is whether your organization can handle everything Kubernetes doesn’t do, and whether the complexity trade-off makes sense for your specific context, team capability, and workload characteristics.

Answer that question honestly before committing your platform strategy.
