AKS Networking Clash: kubenet vs. CNI vs. CNI Overlay

Selecting the right network model is one of the most consequential architectural decisions you will make when deploying a Kubernetes cluster on Azure Kubernetes Service (AKS). The choice ripples through nearly every aspect of your cluster’s lifecycle: how pods communicate, how efficiently you use your IP address space, which Azure services integrate cleanly with your workloads, your security posture, performance, and operational cost, and ultimately how well the infrastructure scales and stays flexible over time.

For many years, AKS administrators have largely found themselves choosing between two well-established options: kubenet and Azure CNI. Each brought distinct tradeoffs to the table. kubenet offered simplicity and IP efficiency at the cost of limited integration, while Azure CNI provided rich enterprise capabilities but introduced significant IP consumption challenges that required careful VNet planning. With the introduction of Azure CNI Overlay, Microsoft has addressed these historical limitations by adding a genuinely modern option that thoughtfully combines IP efficiency with comprehensive enterprise networking capabilities.

This article walks through a comprehensive, practical comparison of all three networking models. We’ll examine how each one works under the hood, explore the genuine strengths and limitations of each approach, and ultimately provide you with the guidance you need to make an informed decision about which model best suits your specific organizational requirements and technical constraints.

Why the Network Model Actually Matters

Your choice of network model influences practically every layer of your cluster. How pods receive IP addresses, how they communicate with each other and the VNet, performance and latency characteristics, security boundaries, and policy enforcement all hinge on this decision. So does your ability to integrate with Azure services, your scalability ceiling, your cluster density potential, and ultimately your VNet planning complexity.

Changing this decision later is difficult and sometimes impossible. It’s not a setting you adjust casually after launch. Getting it right from the start matters considerably.

kubenet: Simplicity at the Cost of Integration

Kubenet is effectively legacy for new projects. Microsoft maintains it for existing clusters, but no production workloads should start with it today.

How it works

kubenet is the simplest networking approach for AKS. Each node receives a single VNet-routable IP address, but pods get their IPs from a separate, non-routable CIDR range that exists only within the cluster. When pods need to communicate outside the cluster, traffic goes through network address translation (NAT) and user-defined routes (UDRs) that you manage yourself. This fundamental separation is both kubenet’s defining feature and its core limitation.
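
A minimal sketch of what this looks like from the Azure CLI, with hypothetical resource group, cluster name, and CIDR values; the essential pieces are --network-plugin kubenet and a pod CIDR that does not overlap any VNet range.

```bash
# Placeholder names and CIDRs; adjust to your environment.
az aks create \
  --resource-group rg-aks-dev \
  --name aks-kubenet-dev \
  --node-count 3 \
  --network-plugin kubenet \
  --pod-cidr 10.244.0.0/16 \
  --service-cidr 10.0.0.0/16 \
  --dns-service-ip 10.0.0.10 \
  --generate-ssh-keys
```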

Kubenet maxes out at 400 nodes. For modern clusters, that’s a hard ceiling you’ll hit faster than you expect.

Strengths

The appeal is genuine. Kubenet is IP efficient—you consume very few VNet IPs because pods sit in their own address space. It’s simple to understand and straightforward to configure, which makes it attractive for teams new to Kubernetes or environments where networking should stay uncomplicated. Operationally, that translates to lower cost and less day-to-day overhead.

The downside? Isolation.

Limitations

Because pods aren’t directly routable in the VNet, they remain isolated from your broader Azure networking ecosystem. NAT adds overhead and troubleshooting complexity. Integration with Azure networking features—Network Security Groups, Private Link, Azure Firewall—remains limited. For enterprise deployments or hybrid scenarios where your cluster needs to participate seamlessly in existing infrastructure, these limitations become real constraints.

When to use it

kubenet works well in specific contexts: development and test environments where simplicity matters more than features, small clusters running non-critical workloads, or scenarios with minimal networking requirements. Beyond those cases, you’re better served exploring alternatives.

Azure CNI: Enterprise Integration Comes with a Price

How it works

Azure CNI (Container Networking Interface) represents a fundamental shift from kubenet. Instead of isolating pods in a separate address space, this model assigns each pod a direct, fully routable IP address from your VNet subnet. Pods become first-class participants in your Azure network, capable of direct communication with any VNet resource without NAT or additional routing rules. Traffic flows directly with minimal overhead, resulting in transparent and predictable networking.
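
As a sketch, creating a cluster with classic (flat) Azure CNI looks like the following; the resource group, subnet, and --max-pods values are placeholders, and the subnet must already be sized for nodes plus pods.

```bash
# Placeholder names; the subnet must hold nodes x (max-pods + 1) reserved IPs.
SUBNET_ID=$(az network vnet subnet show \
  --resource-group rg-network \
  --vnet-name vnet-prod \
  --name snet-aks \
  --query id -o tsv)

az aks create \
  --resource-group rg-aks-prod \
  --name aks-cni-prod \
  --node-count 5 \
  --network-plugin azure \
  --vnet-subnet-id "$SUBNET_ID" \
  --max-pods 110 \
  --generate-ssh-keys
```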

Strengths

The advantages become apparent in enterprise environments where network visibility matters. Pods hold genuine VNet addresses, so they participate fully in your security frameworks, policy enforcement, and monitoring. Network Security Groups apply directly to pods. Private Link connections work seamlessly. Azure Firewall can inspect traffic properly. Your monitoring tools see pods as native VNet resources. This transparency is invaluable in regulated industries or zero-trust architectures where every network flow must be visible and controllable. Performance is excellent too—no NAT overhead means direct, efficient communication.

The trade-off is real: you need substantial VNet address space.

Limitations

Azure CNI has a substantial appetite for IP addresses. Every pod needs its own VNet IP, which exhausts address space quickly in larger clusters or with high pod density. A 100-node cluster configured for 200 pods per node reserves 20,000 pod IPs alone, which means at least a /17 subnet before you even count the nodes. For organizations with limited IP space or managing many clusters in a constrained range, this becomes a genuine scaling constraint.

Common mistakes with Azure CNI: Teams underestimate pod density and provision subnets too small. A /19 feels generous until you hit 250 pods/node on 50 nodes. Then you’re recreating the entire cluster. Plan your pod count ceiling carefully—don’t guess.
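
A quick sanity check before you create the cluster, using the hypothetical numbers above: flat Azure CNI reserves (max pods + 1) VNet IPs per node up front, whether or not those pods ever run.

```bash
# Back-of-the-envelope subnet sizing for flat Azure CNI.
NODES=50
MAX_PODS=250
RESERVED=$(( NODES * (MAX_PODS + 1) ))
echo "VNet IPs reserved up front: $RESERVED"
# 12550 -> a /19 (8192 addresses) is too small; you need at least a /18 (16384).
```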

When to use it

Choose Azure CNI when network governance, compliance, and performance take priority over IP efficiency. Production workloads in regulated industries, hybrid environments, and zero-trust architectures all benefit from its full integration story. If your organization can accommodate the IP consumption and your workloads demand strong visibility, Azure CNI delivers consistently.

Azure CNI Overlay: The Best of Both Worlds

How it works

Nodes still receive VNet IP addresses as in standard Azure CNI. But pods operate within a separate overlay network with its own CIDR range, decoupled from the VNet. Pod traffic routes through a lightweight overlay stack that handles encapsulation transparently. Despite this separation, full Azure CNI functionality remains available—pods retain integration benefits with Azure services and security constructs.

The math changes dramatically. With Overlay, a 1,000-node cluster consumes VNet addresses only for its nodes, so a subnet of roughly 1,000 IPs (around a /22) covers it; pods draw from a private overlay CIDR (each node receives a /24 slice of it by default) that costs no VNet space at all. Traditional Azure CNI for the same cluster at 250 pods per node would reserve roughly 250,000 VNet IPs, a subnet larger than a /15. That is a reduction of about two orders of magnitude in VNet consumption.
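
Creating an Overlay cluster is a small variation on the flat CNI command; this is a sketch with placeholder names. The VNet subnet only has to cover the nodes, and --pod-cidr is a private range that must not overlap the VNet.

```bash
# Placeholder subnet ID; only nodes consume addresses from it.
SUBNET_ID="/subscriptions/<sub-id>/resourceGroups/rg-network/providers/Microsoft.Network/virtualNetworks/vnet-prod/subnets/snet-aks-nodes"

az aks create \
  --resource-group rg-aks-prod \
  --name aks-overlay-prod \
  --node-count 5 \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --vnet-subnet-id "$SUBNET_ID" \
  --pod-cidr 192.168.0.0/16 \
  --generate-ssh-keys
```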

Strengths

Azure CNI Overlay combines the best of both models. It maintains high IP efficiency similar to kubenet—run large numbers of pods without exhausting VNet address space. Simultaneously, it delivers full enterprise integration like Azure CNI—direct compatibility with Network Security Groups, Private Link, Azure Firewall, and monitoring solutions. Large-scale clusters work without complex subnet planning. Organizations with limited IP space or managing many clusters get a significant scaling advantage. Microsoft explicitly recommends this as the standard for new production clusters, reflecting the platform’s evolution.

Limitations

Overlay adds minor latency: plan for 100-200 microseconds of extra cost per pod-to-external hop due to SNAT translation. For latency-sensitive workloads (high-frequency trading, real-time gaming), this matters. Classic Azure CNI avoids this cost entirely.

Debugging pod-to-external traffic is harder. You’ll need to understand SNAT translation. Classic Azure CNI shows pod IPs in network traces; Overlay hides them behind node IPs. Budget extra engineering for network troubleshooting. Most teams underestimate this operational cost.

Platform limitations remain: Windows node pools on Overlay require Windows Server 2022 (Windows Server 2019 isn’t supported), and DCsv2-series Confidential Computing VMs aren’t supported either (the documentation points to DCasv5-series as an alternative). Check the current AKS feature matrix before committing.

Common mistakes: Forgetting that Overlay configuration can’t be changed post-deployment. Teams have recreated entire clusters after discovering pod density requirements too late. Finalize your pod count ceiling before cluster creation.

When to use it

Choose Azure CNI Overlay if: (1) Your cluster will exceed 1,000 nodes, (2) IP space is scarce, or (3) Pod density baseline exceeds 100 pods/node. For smaller clusters with abundant IP space, classic Azure CNI remains valid.

Critical operational consideration: Overlay networking impacts your observability strategy. Inside the cluster nothing changes: pods still talk to each other on their overlay IPs, and an in-cluster Prometheus scrapes pod targets as usual. Outside the cluster, pod IPs disappear: external systems such as container registries, firewalls, and flow logs see connections coming from node IPs after SNAT, so your monitoring and audit tooling must correlate node IPs and SNAT mappings instead of logging pod IPs directly. Budget extra engineering for network observability; this is where most teams get blindsided.
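
A quick way to see the split on a running Overlay cluster: pod IPs come from the overlay CIDR, node internal IPs come from the VNet subnet, and external systems only ever see the latter after SNAT.

```bash
# Pod IPs are drawn from the overlay CIDR (e.g. 192.168.0.0/16).
kubectl get pods -A -o wide
# Node INTERNAL-IPs are drawn from the VNet subnet; these are what external logs show after SNAT.
kubectl get nodes -o wide
```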

Putting It All in Perspective: A Practical Comparison

| Feature | kubenet | Azure CNI | Azure CNI Overlay |
| --- | --- | --- | --- |
| Max Nodes | 400 | 1,000+ | 5,000 |
| Pod IP Source | Pod CIDR (non-routable) | VNet subnet | Overlay CIDR (non-routable) |
| IP Efficiency | High (nodes only use VNet IPs) | Low (20,000+ VNet IPs per 100 nodes at high density) | High (nodes only use VNet IPs) |
| Routing | NAT + UDR | Direct | Overlay, SNAT on egress |
| Performance | Good (NAT adds latency) | Excellent | High (+100-200μs SNAT on egress) |
| Azure Integration | Limited | Full | Full |
| Complexity | Low | High | Medium |
| Production-Ready? | Legacy only | Yes (watch IP consumption) | Yes (default) |

Making the Right Choice for Your Constraints

As of Q4 2025, Microsoft recommends CNI Overlay for all new AKS clusters. Kubenet remains only for legacy migration scenarios. Traditional Azure CNI (flat model) is now positioned as “advanced use only.”

Your decision depends on your specific constraints. Here’s what that means practically:

Limited IP address space? Overlay is your only realistic option. A 500-node cluster with traditional CNI at 250 pods per node burns 125,000 VNet IPs. Overlay uses roughly 500 VNet IPs for the nodes, and pod addresses come from a private overlay CIDR that consumes no VNet space at all. That’s the difference between feasible and impossible.

Regulated industry requiring direct pod traceability? Traditional Azure CNI gives you pod IPs you can trace end-to-end. Overlay requires you to reverse-engineer SNAT mappings. Compliance frameworks sometimes demand the former. Check your audit requirements before deciding.

Development or proof-of-concept? Kubenet is still reasonable here. Simplicity wins. Just don’t ship it to production.

New production cluster with no prior constraints? Overlay. Default assumption. End of discussion. The platform matured past the point where you need to second-guess this.

The Path Forward: Understanding the Real Trade-offs

AKS networking boils down to this: kubenet is dead for production. Azure CNI works only if you have VNet space to burn. Overlay is the pragmatic default.

Kubenet was AKS’s starting point, and Azure CNI added the enterprise integration story soon after. But both forced uncomfortable choices: either accept a 400-node ceiling with limited Azure integration, or reserve an enormous VNet subnet that wrecks your IP planning. Neither worked for real clusters at scale.

Overlay changed that equation. Yes, you lose direct pod IP traceability. Yes, you add 100-200 microseconds latency. But you get 5,000-node clusters with IP efficiency that makes sense. You get monitoring that doesn’t require reverse-engineering NAT tables. You get a path forward that doesn’t require architectural compromise.

The trade-off is honest: latency and debugging complexity for scalability and IP efficiency. For most organizations, that’s the right trade.

If you’re building new infrastructure on AKS, start with Overlay. If you’re running the math on existing clusters and wondering whether to migrate, Overlay is probably cheaper than the subnet expansion you’re otherwise facing. Plan your observability around SNAT mappings from day one. Budget engineering time for network troubleshooting. But build forward knowing the constraint that has limited AKS clusters for five years is finally solved.
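
For existing clusters, Azure documents an in-place update path to Overlay for certain configurations; the sketch below assumes a flat Azure CNI cluster and a new, non-overlapping pod CIDR, and expects node pools to be re-imaged during the change. Verify the currently supported upgrade paths and their caveats in the AKS documentation before planning around this.

```bash
# Sketch only: converting an existing flat Azure CNI cluster to Overlay.
# Expect node disruption; confirm supported paths and prerequisites in the AKS docs first.
az aks update \
  --resource-group rg-aks-prod \
  --name aks-cni-prod \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16
```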
