AKS Network Policies: The Security Layer Your Cluster Is Missing
Network segmentation is a fundamental security control for modern Kubernetes environments. AKS supports multiple networking models, including kubenet, Azure CNI, and Azure CNI Overlay. The networking model matters, but the decisive factor for enforcing isolation and compliance is the consistent application of network policies.
This article describes how network policies work in AKS, the available engines, practical examples, and recommended practices for enforcing a zero-trust posture within a cluster.
Why network policies matter
By default, Kubernetes allows all pod-to-pod communication, which simplifies operations but increases risk. Without network policies, an attacker or a compromised workload can move laterally, access internal services, exfiltrate data, or generate unintended traffic. Network policies let you express explicit allow rules, reducing the cluster attack surface and supporting compliance requirements.
AKS network policy engines
AKS offers two commonly used network policy implementations. Choose based on feature needs and operational constraints.
AKS also supports Cilium as a network policy engine and dataplane. Evaluate Cilium if you require an eBPF-based dataplane or L7 policy capabilities (see Microsoft Docs).
Azure Network Policies
- Native AKS integration.
- Requires Azure CNI (see Microsoft Docs: Use network policies in AKS).
- High performance and deep integration with Azure networking.
- Policies are enforced by Azure Network Policy Manager (NPM).
- Best suited for organizations that prefer a managed, Azure-native solution.
Calico Network Policies
- Open-source and widely adopted.
- Supports advanced features such as egress controls and global policies.
- Works with Azure CNI and kubenet (see Microsoft Docs: Use network policies in AKS).
- Suitable for complex architectures, multi-cloud deployments, or teams that need granular L3/L4 controls.
How network policies work
Network policies declare allowed traffic in terms of pod selectors, namespace selectors, ports, and protocols. A policy can specify ingress rules, egress rules, or both. Importantly, once any policy selects a pod, traffic in the directions that policy covers is denied unless explicitly allowed. That default-deny behavior is the basis for predictable and auditable isolation.
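For instance, a minimal sketch combining a namespaceSelector with a podSelector (all labels and the policy name here are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-platform-scrapers
  namespace: production
spec:
  podSelector:
    matchLabels:
      role: backend
  ingress:
    - from:
        # pods labeled role=scraper running in namespaces labeled team=platform
        - namespaceSelector:
            matchLabels:
              team: platform
          podSelector:
            matchLabels:
              role: scraper
      ports:
        - protocol: TCP
          port: 9090
```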
Note: the network policy engine is commonly set at cluster creation, and you can also enable or change it on an existing cluster. However, changing the network policy engine can trigger node-pool reimaging and temporary disruption. Both paths, using the placeholder names from this article:
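```bash
# set the engine at cluster creation
az aks create --network-plugin azure --network-policy azure

# switch an existing cluster to Calico (may reimage node pools)
az aks update --resource-group myRG --name myAKSCluster --network-policy calico
```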
Practical maintenance steps when changing network policies:
- Test the change in a staging cluster first. Example create command for a disposable test cluster:
```bash
az aks create -g myRG -n test-cluster --network-plugin azure --network-policy calico --node-count 1
```
- When rolling changes through production, update one node pool at a time and verify workloads before proceeding.
- Before making changes, cordon and drain affected nodes to allow graceful eviction:
```bash
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```
- After the update, validate workloads and then uncordon nodes:
```bash
kubectl uncordon <node-name>
```
Plan a maintenance window for these operations and automate the rollback or node-pool recreation path if validation fails.
Note: Kubernetes NetworkPolicy is an L3/L4 mechanism: it controls IP- and port-level access between pods and namespaces. For L7 (HTTP/FQDN) filtering you need an engine that explicitly supports L7 policies (for example, Cilium's L7 features) or a service-mesh / proxy-based approach.
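For illustration only, a sketch of an L7 rule using Cilium's CiliumNetworkPolicy CRD, applicable only when Cilium is the policy engine (the labels, path, and name are hypothetical):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-api
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      role: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            role: app
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              # only HTTP GET requests to /api/* are allowed
              - method: GET
                path: "/api/.*"
```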
Practical example: Allow only specific traffic
This policy allows only requests from pods labeled role=app to pods labeled role=backend on TCP port 8080 in the production namespace.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      role: backend
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: app
      ports:
        - protocol: TCP
          port: 8080
```
Without other allow rules, all other ingress traffic to the selected backend pods is blocked. This approach supports a least-privilege model for intra-cluster communication.
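To apply and inspect the policy, assuming the manifest is saved as allow-app-to-backend.yaml (a hypothetical filename):

```bash
kubectl apply -f allow-app-to-backend.yaml
kubectl -n production describe networkpolicy allow-app-to-backend
```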
How to validate policies
Quick validation steps you can run in a test cluster:
- Create a small test cluster with Calico enabled:
```bash
az aks create -g myRG -n test-calico --network-plugin azure --network-policy calico --node-count 1
```
- Deploy a client pod and a server that actually listens (busybox running sleep does not listen on any port, so use nginx), then verify connectivity:
```bash
kubectl run client --image=busybox --restart=Never -- sleep 3600
kubectl run server --image=nginx --restart=Never
kubectl get pods -o wide
kubectl exec -it client -- /bin/sh
# from inside the client pod, try to reach the server pod IP (replace <server-pod-ip>):
nc -zv <server-pod-ip> 80
```
- Apply your NetworkPolicy and repeat the test. Use `kubectl describe networkpolicy <name>` to inspect selectors and rules.
These steps are intended for validation only. Do not run them against production clusters.
CI validation snippet (example):
```bash
# apply the policy and run a quick connectivity check
kubectl apply -f mypolicy.yaml
kubectl run client --image=busybox --restart=Never -- sleep 3600
kubectl run server --image=nginx --restart=Never
kubectl wait --for=condition=Ready pod/client pod/server --timeout=60s
SERVER_IP=$(kubectl get pod -l run=server -o jsonpath='{.items[0].status.podIP}')
kubectl exec client -- nc -zv "$SERVER_IP" 80 || exit 1
```
CI security guidance:
- Prefer ephemeral test clusters created by the pipeline and destroyed after the run. If that is not possible, create a Kubernetes ServiceAccount with minimal RBAC instead of storing a full-cluster-admin KUBECONFIG in secrets.
- Use a least-privilege service principal or OIDC-based login for Azure authentication, and scope credentials to the smallest resource group or cluster role necessary. Avoid exposing long-lived admin credentials in CI secrets.
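A minimal sketch of OIDC-based Azure login in GitHub Actions, assuming a federated credential is already configured for the app registration (the secret names are placeholders holding IDs, not passwords):

```yaml
permissions:
  id-token: write   # required for the OIDC token exchange
  contents: read
steps:
  - uses: azure/login@v2
    with:
      client-id: ${{ secrets.AZURE_CLIENT_ID }}
      tenant-id: ${{ secrets.AZURE_TENANT_ID }}
      subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  - name: Get AKS credentials
    run: az aks get-credentials --resource-group myRG --name test-cluster
```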
Namespace isolation
Namespaces help organize workloads but do not enforce network isolation by themselves. Apply a policy that denies ingress to all pods unless explicitly allowed to implement namespace-level segmentation.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-namespace
spec:
  podSelector: {}
  ingress: []
```
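With that default-deny in place, traffic inside the namespace must be re-allowed explicitly. A minimal companion sketch (the policy name is illustrative) that permits ingress from pods in the same namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}   # any pod in this namespace
```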
Egress control
Outbound traffic is often overlooked, yet many compromises involve unfiltered egress. Use egress policies to permit only required external destinations. Example: allow DNS to a specific resolver.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-dns
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 8.8.8.8/32
      ports:
        - protocol: UDP
          port: 53
```
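Note that in-cluster DNS resolution goes through CoreDNS in kube-system, so a policy like the one above would block cluster DNS. A variant that instead permits cluster DNS, relying on the automatic kubernetes.io/metadata.name namespace label (the policy name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-cluster-dns
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # CoreDNS pods live in kube-system
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```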
Choosing the right engine
Feature comparison at a glance:
| Feature | Azure Network Policies | Calico | Cilium |
|---|---|---|---|
| AKS integration | Very good | Good | Good |
| Performance | High | High | High |
| Complexity | Low | Medium | Medium |
| Advanced egress | No | Yes | Yes |
| Global policies | No | Yes | Yes |
| Multi-cloud support | No | Yes | Yes |
Note on Cilium: Cilium provides an eBPF-based dataplane and supports advanced L7 features and cluster/global policy CRDs. Many of Cilium’s advanced capabilities rely on Linux eBPF support; feature parity on Windows nodes is limited. Check the AKS Cilium and Cilium docs for supported scenarios and any AKS-specific integration steps.
Recommendation: use Azure Network Policies if you need a managed, Azure-native solution and do not require advanced features. Choose Calico if you need advanced egress controls, global policies, or multi-cloud consistency, and consider Cilium when you need an eBPF dataplane or L7 policy capabilities.
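As a rough sketch, selecting the engine at creation time (the Cilium flags follow the current az CLI; verify the exact names in Microsoft Docs):

```bash
# Azure Network Policies
az aks create -g myRG -n aks-azure --network-plugin azure --network-policy azure

# Calico
az aks create -g myRG -n aks-calico --network-plugin azure --network-policy calico

# Azure CNI powered by Cilium
az aks create -g myRG -n aks-cilium --network-plugin azure --network-dataplane cilium --network-policy cilium
```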
Best practices
- Start with a default-deny posture. Block traffic first, then explicitly allow required flows.
- Organize policies per namespace to simplify governance and reduce accidental exposure.
- Version and test policies as part of CI pipelines. Tools such as Kyverno or Gatekeeper help validate and enforce policy changes before they reach production (see the Kyverno sketch after this list).
- Instrument and visualize traffic flows using Azure Monitor, Calico UI, or third-party observability tools. Visibility is critical for troubleshooting and verification.
- Combine network policies with Pod Security Standards to protect workloads and reduce risk at multiple layers.
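As one example of policy-as-code, a sketch closely following the add-networkpolicy sample from the Kyverno docs, which generates a default-deny NetworkPolicy in every new namespace:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-networkpolicy
spec:
  rules:
    - name: default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        # create a default-deny NetworkPolicy in each new namespace
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
```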
Author tip: Test policy changes in a disposable staging cluster and automate policy validation in CI pipelines. This reduces surprises during production rollouts and helps detect overly broad or blocking rules early.
Author note: I will be honest, the default behaviour surprised me when I first started working with AKS network policies, and it will probably surprise you too. My rule of thumb is simple: start small, test often, and iterate. If you take nothing else from this article, run the validation steps in a throwaway cluster; you will learn quickly what gets blocked and what does not.
Known limitations and version notes
- Windows node support and feature parity can differ from Linux; check the AKS Windows guidance for details. (See Microsoft Docs.)
- Some advanced Calico features may require specific Calico versions; refer to the Calico and AKS release notes before adopting L7 or global policy features.
Example CI workflow (GitHub Actions) implementing the validation snippet above:

```yaml
name: Validate NetworkPolicy
on: [push]
jobs:
  validate-policy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
      - name: Apply policy and test connectivity
        run: |
          # write the kubeconfig secret to a file; KUBECONFIG must point at a path
          echo "${{ secrets.KUBECONFIG }}" > "$RUNNER_TEMP/kubeconfig"
          export KUBECONFIG="$RUNNER_TEMP/kubeconfig"
          kubectl apply -f mypolicy.yaml
          kubectl run client --image=busybox --restart=Never -- sleep 3600
          kubectl run server --image=nginx --restart=Never
          kubectl wait --for=condition=Ready pod/client pod/server --timeout=60s
          SERVER_IP=$(kubectl get pod -l run=server -o jsonpath='{.items[0].status.podIP}')
          kubectl exec client -- nc -zv "$SERVER_IP" 80 || exit 1
```
A few quick, practical pointers: name your test namespace np-test, use labels like app=demo and role=backend, and store a least-privilege KUBECONFIG in your CI secrets (see the CI security guidance above). These small conventions make reproducible tests much easier.
Conclusion
Network policies are a foundational control for securing AKS clusters. They enable a zero-trust approach inside the cluster, reduce the attack surface, separate workloads, and allow precise control of inbound and outbound traffic. Whether you adopt Azure Network Policies or Calico, apply policies consistently, automate testing and deployment, and maintain visibility to ensure the cluster remains secure and auditable.
