Multi-AKS Cluster Networking & Hub-Spoke Topology

Your first production Azure Kubernetes Service (AKS) cluster often feels manageable for months, sometimes for years. Then demand grows and a second cluster appears. Regional resiliency might require it. Team isolation might require it. Compliance boundaries might require it.

The hard part is not creating cluster number two. The hard part is networking between clusters in a way your team can operate at 2 a.m.

This guide focuses on practical multi-cluster AKS networking: connectivity models, DNS (Domain Name System), ingress patterns, and the trade-offs that matter in production.

Why Single Clusters Hit Their Limits

Single-cluster architectures work until they stop being a sensible risk boundary. Three constraints usually force the move:

Scale ceilings. Azure CNI Overlay supports large clusters, with current AKS guidance documenting scale targets of up to 5,000 nodes per cluster. Verify current limits before making architecture decisions, because they evolve over time (AKS scale limits).

Failure domain isolation. Control plane failures are uncommon, but when they happen the impact is serious. Multi-cluster design contains incidents. A failure in cluster A should not automatically break cluster B.

Team and workload separation. Different compliance requirements, service level objectives, and release cadence often require separate clusters. Shared clusters can become an organizational bottleneck.

Once you commit to multiple clusters, networking becomes the core design problem. Services in cluster A need controlled access to cluster B. Shared infrastructure such as DNS, observability, and data platforms must stay reachable. This must still be simple enough to run day to day.

Two patterns handle most Azure multi-cluster scenarios: Virtual Network (VNet) peering and Private Link. Both are valid, but they solve different problems.

VNet Peering: Direct Layer 3 Connectivity

VNet peering creates bidirectional connectivity between virtual networks over the Azure backbone. Traffic stays private, latency is low, and throughput is high (Virtual network peering overview).

For multi-cluster AKS, peering allows direct IP connectivity between pods and services, assuming routing and policies allow it.

Use peering when:

  • Clusters are in the same region or a paired region
  • You need low latency between workloads
  • You move significant data volume between clusters
  • You want simple routing with minimal translation overhead

Peering limitations:

  • Address spaces cannot overlap
  • Peering is not transitive
  • Security controls must be correct on both sides
  • Cross-region transfer costs can become noticeable

Peering is still the default starting point for most environments because it is predictable and easy to reason about.

Private Link: Selective Service Exposure

Private Link exposes selected services through private endpoints. Instead of full network reachability, consumers connect only to what you explicitly publish (What is Azure Private Link).

In AKS, this is commonly used to expose internal services through an internal load balancer and Private Link Service. Consumer networks do not need full peering to the provider VNet.
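
A minimal sketch of this pattern, assuming the managed Azure cloud provider annotations for internal load balancers and Private Link Service creation (the service name and ports are illustrative):

```yaml
# Illustrative Service that asks AKS to create an internal load balancer
# and attach a Private Link Service to its frontend.
apiVersion: v1
kind: Service
metadata:
  name: orders-api            # hypothetical service name
  namespace: production
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-pls-create: "true"
    service.beta.kubernetes.io/azure-pls-name: "orders-api-pls"
spec:
  type: LoadBalancer
  selector:
    app: orders-api
  ports:
  - port: 443
    targetPort: 8443
```

Consumers then create a private endpoint targeting the generated Private Link Service from their own VNet; no peering to the provider network is required.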

Use Private Link when:

  • You need strict service-level exposure across boundaries
  • You cannot avoid overlapping IP ranges
  • You want narrow, auditable connectivity contracts
  • You want to reduce broad peering relationships

Private Link trade-offs:

  • Slightly higher latency than direct peering
  • More setup and lifecycle management
  • Service-specific by design, not full network connectivity
  • Endpoint cost accumulates as service count grows

If your goal is broad cluster-to-cluster communication, peering is simpler. If your goal is controlled service publishing, Private Link is often the better boundary.

Hub-Spoke Topology: Centralized Connectivity

Hub-spoke is the topology that usually wins once cluster count grows. Instead of a full mesh, each cluster VNet connects to a central hub.

         ┌─────────────┐
         │   Hub VNet  │
         │  (Shared)   │
         └──────┬──────┘
        ┌───────┼──────────┐
        │       │          │
    ┌───▼───┐ ┌─▼─────┐ ┌──▼────┐
    │Spoke A│ │Spoke B│ │Spoke C│
    │(Prod) │ │ (Dev) │ │(Stage)│
    └───────┘ └───────┘ └───────┘

Each spoke VNet hosts one AKS cluster. The hub carries shared services such as firewalling, gateway connectivity, DNS forwarding, and centralized observability.

Why Hub-Spoke Works

Simplified management. A full mesh requires N × (N − 1) / 2 peerings. Hub-spoke usually needs one peering per spoke.

Centralized policy enforcement. Spoke egress can pass through hub security controls. Policy, logging, and compliance become easier to govern.

Cost allocation clarity. Shared services stay in the hub. Team-owned workload costs stay in spokes. Chargeback becomes easier.

Failure domain separation. Spoke incidents are usually isolated. Hub incidents affect connectivity and must be treated as critical.

Practical Implementation with Terraform

This Terraform excerpt shows the core peering pattern:

# Hub VNet with shared services
module "hub_vnet" {
  source              = "./modules/vnet"
  name                = "hub-vnet"
  address_space       = ["10.0.0.0/16"]
  resource_group_name = azurerm_resource_group.hub.name
  location            = var.location
  
  subnets = {
    firewall = {
      address_prefixes = ["10.0.1.0/24"]
    }
    gateway = {
      address_prefixes = ["10.0.2.0/24"]
    }
    shared-services = {
      address_prefixes = ["10.0.10.0/24"]
    }
  }
}

# Spoke VNet for production AKS cluster
module "spoke_prod_vnet" {
  source              = "./modules/vnet"
  name                = "spoke-prod-vnet"
  address_space       = ["10.1.0.0/16"]
  resource_group_name = azurerm_resource_group.spoke_prod.name
  location            = var.location
  
  subnets = {
    aks-nodes = {
      address_prefixes = ["10.1.0.0/19"]
    }
  }
}

# Peering: Spoke to Hub
resource "azurerm_virtual_network_peering" "spoke_prod_to_hub" {
  name                         = "spoke-prod-to-hub"
  resource_group_name          = azurerm_resource_group.spoke_prod.name
  virtual_network_name         = module.spoke_prod_vnet.name
  remote_virtual_network_id    = module.hub_vnet.id
  allow_virtual_network_access = true
  allow_forwarded_traffic      = true
  allow_gateway_transit        = false
  use_remote_gateways          = true # requires a gateway deployed in the hub VNet
}

# Peering: Hub to Spoke
resource "azurerm_virtual_network_peering" "hub_to_spoke_prod" {
  name                         = "hub-to-spoke-prod"
  resource_group_name          = azurerm_resource_group.hub.name
  virtual_network_name         = module.hub_vnet.name
  remote_virtual_network_id    = module.spoke_prod_vnet.id
  allow_virtual_network_access = true
  allow_forwarded_traffic      = true
  allow_gateway_transit        = true
  use_remote_gateways          = false
}

Key configuration points:

  • allow_forwarded_traffic = true permits routing through the hub for spoke-to-spoke communication if needed
  • allow_gateway_transit = true (hub side) allows spokes to use hub’s VPN or ExpressRoute gateway
  • use_remote_gateways = true (spoke side) leverages hub gateway for on-premises connectivity
  • Address spaces must not overlap; plan your CIDR ranges before deployment
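
Peering alone does not force spoke egress through hub security controls; that usually requires a user-defined route on the AKS subnet. A hedged sketch, assuming a firewall in the hub (the IP, names, and the module's subnet_ids output are placeholders):

```hcl
# Route table forcing spoke egress through a hub firewall
resource "azurerm_route_table" "spoke_prod" {
  name                = "spoke-prod-routes"
  location            = var.location
  resource_group_name = azurerm_resource_group.spoke_prod.name

  route {
    name                   = "default-via-hub-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.0.1.4" # firewall private IP in the hub firewall subnet
  }
}

# Associate the route table with the AKS node subnet
resource "azurerm_subnet_route_table_association" "aks_nodes" {
  subnet_id      = module.spoke_prod_vnet.subnet_ids["aks-nodes"] # hypothetical module output
  route_table_id = azurerm_route_table.spoke_prod.id
}
```

Without this association, node traffic follows system routes and bypasses the hub firewall entirely.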

Hub-Spoke Trade-Offs

Latency. Spoke-to-spoke paths include an extra hop through the hub. Usually this is acceptable, but very latency-sensitive paths should be measured.

Hub as a critical dependency. If core hub components fail, cross-spoke and on-premises connectivity can fail with them. Critical environments should plan for redundancy.

Added infrastructure complexity. You now own central routing, firewalling, and gateway operations. For two or three clusters, direct peering may still be simpler.

Use hub-spoke when you have several clusters, need central governance, or depend on shared network services.

DNS Resolution Across Clusters

DNS is where many multi-cluster designs fail quietly. Connectivity may exist while name resolution does not.

The DNS Challenge

Each AKS cluster runs its own CoreDNS service. By default, it resolves cluster-local names such as .svc.cluster.local. Cross-cluster discovery needs explicit design.

You need answers to two questions:

  1. How does cluster A resolve service names from cluster B?
  2. How does this remain accurate as services change over time?

Approach 1: DNS Forwarding with Custom CoreDNS Configuration

You can extend CoreDNS to forward specific zones to resolvers in another cluster.

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  clusterb.server: |
    clusterb.local:53 {
        errors
        cache 30
        # resolver in cluster B; must be reachable from this cluster's network
        forward . 10.2.0.10
    }

This forwards queries for clusterb.local to the resolver in cluster B. Services become reachable by names such as service-name.namespace.svc.clusterb.local.

Limitations:

  • Manual configuration in each cluster
  • Resolver endpoints must stay reachable
  • Fragile if upstream DNS endpoints change
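
One way to keep that resolver endpoint reachable from a peered network is to expose cluster B's CoreDNS through an internal load balancer, since service ClusterIPs are generally not routable outside the cluster. A hedged sketch (the static IP is a placeholder and must come from the node subnet's range):

```yaml
# Internal load balancer in cluster B exposing CoreDNS to peered VNets
apiVersion: v1
kind: Service
metadata:
  name: coredns-internal
  namespace: kube-system
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  loadBalancerIP: 10.2.0.10
  selector:
    k8s-app: kube-dns   # label used by AKS CoreDNS pods
  ports:
  - name: dns-udp
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
```

The resulting frontend IP is what cluster A's forward rule should point at.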

Approach 2: External DNS with Shared Zone

A more scalable pattern is running ExternalDNS in each cluster and writing records into a shared Azure Private DNS zone.

apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.shared.internal
spec:
  type: LoadBalancer
  loadBalancerIP: 10.1.5.100
  ports:
  - port: 443
    targetPort: 8443

ExternalDNS creates records such as api.shared.internal and updates them as service endpoints change.

Benefits:

  • Automatic DNS management
  • Centralized control through Azure DNS
  • Works across clusters without manual forwarding rules

Trade-offs:

  • Requires ExternalDNS operations in every cluster
  • Adds a small DNS zone cost
  • Naming conventions are required to avoid collisions

For most production teams, this is the pragmatic default because it scales and removes manual DNS drift. In AKS, this model aligns well with Private DNS zone integration and standard cluster DNS behavior (AKS networking concepts).
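
A minimal sketch of the cluster-side configuration, assuming ExternalDNS's azure-private-dns provider; the resource group, zone, and owner ID are illustrative, and RBAC plus Azure credential setup are omitted:

```yaml
# Illustrative ExternalDNS deployment fragment writing into a shared
# Azure Private DNS zone
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
      - name: external-dns
        image: registry.k8s.io/external-dns/external-dns:v0.14.0
        args:
        - --source=service
        - --provider=azure-private-dns
        - --azure-resource-group=shared-dns-rg  # placeholder
        - --domain-filter=shared.internal       # the shared zone
        - --txt-owner-id=cluster-a              # unique per cluster to avoid record conflicts
```

A distinct --txt-owner-id per cluster matters: it is how each ExternalDNS instance claims its own records in the shared zone instead of fighting over them.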

Shared Ingress Architectures

You can expose multi-cluster services in two common ways: centralized ingress in a hub, or distributed ingress behind a global load balancer.

Centralized Ingress in Hub VNet

Run ingress in the hub VNet, for example with NGINX, Azure Application Gateway, or Envoy. External traffic enters once and is routed to spokes.

Advantages:

  • Single public IP for all clusters
  • Centralized TLS termination and certificate management
  • Simplified firewall rules (only hub ingress needs public exposure)

Limitations:

  • Hub becomes a bottleneck for all ingress traffic
  • Additional latency (traffic routes hub → spoke)
  • Hub failure impacts all clusters

Use centralized hub ingress when operational simplicity and unified policy enforcement outweigh performance concerns.

Distributed Ingress with Azure Front Door

Run ingress in each spoke and front it with Azure Front Door or Traffic Manager. Routing decisions can use health, latency, and geographic criteria (Azure Front Door overview).

Advantages:

  • High availability (cluster failures don’t take down all ingress)
  • Lower latency (traffic routes directly to closest cluster)
  • Scalable ingress capacity (not bottlenecked on hub)

Limitations:

  • Multiple public IPs to manage
  • Distributed certificate management (mitigated with cert-manager and Let’s Encrypt)
  • Requires global load balancer (Azure Front Door, Traffic Manager)

For high availability and regional resilience, distributed ingress is often the better long-term model.
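
As one concrete option, a Traffic Manager profile fronting per-cluster public ingress IPs can be sketched in Terraform; names, the target IP, and the health path are placeholders:

```hcl
# Performance-routed Traffic Manager profile fronting spoke ingress IPs
resource "azurerm_traffic_manager_profile" "ingress" {
  name                   = "multi-cluster-ingress"
  resource_group_name    = azurerm_resource_group.shared.name
  traffic_routing_method = "Performance"

  dns_config {
    relative_name = "multi-cluster-ingress" # must be globally unique
    ttl           = 60
  }

  monitor_config {
    protocol = "HTTPS"
    port     = 443
    path     = "/healthz"
  }
}

# One endpoint per spoke ingress; endpoint_location is required when
# using Performance routing with external endpoints.
resource "azurerm_traffic_manager_external_endpoint" "spoke_prod" {
  name              = "spoke-prod-ingress"
  profile_id        = azurerm_traffic_manager_profile.ingress.id
  target            = "20.0.0.10" # public IP of spoke prod ingress (placeholder)
  endpoint_location = var.location
}
```

Unhealthy endpoints are removed from DNS responses automatically, which is what makes per-cluster ingress failures survivable.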

Service Mesh Considerations: When Complexity Is Worth It

Service meshes such as Istio, Linkerd, or Consul can solve real problems, but they also add a major operational layer.

What Service Mesh Solves

Cross-cluster service discovery. Meshes can federate service catalogs, letting cluster A discover and route to services in cluster B without manual DNS configuration.

Traffic shifting and canary deployments. Route a percentage of traffic from cluster A to a new version in cluster B for testing before full cutover.

Mutual TLS and zero-trust networking. Encrypt all inter-service traffic and enforce identity-based policies across cluster boundaries.

Observability. Centralized metrics, tracing, and logging for requests flowing between clusters.

When Service Mesh Is Not Worth It

Most multi-cluster environments do not need a mesh on day one. Managing control planes, sidecar upgrades, and mesh debugging is expensive in terms of engineering time.

Consider service mesh only when:

  • You’re running 5+ clusters with complex inter-cluster traffic patterns
  • Zero-trust networking with mTLS is a hard requirement
  • Advanced traffic management (gradual rollouts, A/B testing across clusters) is core to your deployment strategy
  • Your team has service mesh expertise or dedicated platform engineering resources

Author note: in most organizations I have worked with, peering plus ExternalDNS plus standard ingress handled the majority of real requirements with far less cognitive load.

The Pragmatic Alternative: Keep the Baseline Simple

Before adding a mesh, validate whether baseline Kubernetes networking already meets your goals. Start with clean CIDR planning, network policies, ExternalDNS, and a proven ingress setup.
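
For example, a default-deny ingress policy per namespace with explicit allowances covers much of the isolation a mesh would otherwise provide. A minimal sketch; the labels, namespace, and the peer cluster's CIDR are illustrative:

```yaml
# Deny all ingress within the namespace by default...
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# ...then allow only traffic from the peer cluster's range to the API pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-cluster-b
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-service
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 10.2.0.0/16   # cluster B's VNet range (placeholder)
    ports:
    - port: 8443
      protocol: TCP
```

This requires a cluster network policy engine (for example Azure Network Policy or Cilium) to be enabled; the manifests are otherwise silently inert.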

This baseline is proven and easier to run. Add mesh capabilities only when a measurable requirement demands them.

Cost and Operational Simplicity

Multi-cluster architecture increases both spend and operational load. Design intentionally so cost and complexity stay proportional to business value.

Cost Drivers

Data transfer between regions. Cross-region peering incurs egress charges. High-volume replication paths can become a significant monthly cost. Validate current pricing in the Azure bandwidth and networking pricing pages before committing to traffic-heavy topologies (Azure bandwidth pricing).

Shared infrastructure. Hub-spoke designs require gateway, firewall, and DNS components. These costs usually scale with hub count, not spoke count.

Duplicated platform components. More clusters often mean duplicated logging, metrics, and ingress layers. Consolidate where this does not weaken isolation.

Operational Overhead

Configuration drift. More clusters create more drift opportunities. GitOps tools such as Flux or Argo CD help enforce consistency.

Upgrade coordination. Upgrading many clusters is not linear work. Standardize upgrade pipelines and validate in staging first.

Incident response. Cross-cluster incidents are harder to debug. Centralized logs and tracing are mandatory, not optional.

Balance isolation against complexity. Extra clusters without clear boundaries usually become operational debt.

Conclusion: Start Simple, Scale Deliberately

Multi-cluster AKS solves real problems: scale boundaries, failure isolation, and team autonomy. It also introduces networking complexity that is easy to underestimate.

For most teams, this sequence works well:

  • Start with peering and clean IP planning
  • Move to hub-spoke when cluster count or governance requirements grow
  • Use ExternalDNS for shared service discovery
  • Choose centralized or distributed ingress based on availability and latency goals

Service mesh can be valuable, but only when its capabilities are tied to concrete requirements that justify the overhead.

Design with the fewest moving parts that satisfy your constraints. Every extra layer raises troubleshooting effort and incident duration.

Build for your current scale, then add components when measurable pain proves the need. That is the operationally honest path.
