Container Registry & Image Security in AKS Deployments

Your Azure Kubernetes Service (AKS) cluster is running smoothly. Deployments are automated. Teams ship features daily. Everything looks secure, until you discover that a container image pulled from your Azure Container Registry contains a critical vulnerability that’s been actively exploited in the wild for weeks.

This isn’t hypothetical. Supply chain attacks targeting container registries have become a primary attack vector. An unvetted image in production can expose sensitive data, allow lateral movement within your cluster, or provide an entry point for ransomware. The worst part: the vulnerability might not be in your code at all. It could be in a base image dependency you didn’t even know you were using.

Container registry security isn’t optional. It’s foundational to your entire Kubernetes security posture. And ACR (Azure Container Registry) provides the tools you need to enforce it, if you configure them correctly.

Author note: I have seen teams invest heavily in runtime controls while treating the registry as a passive artifact store. That gap usually shows up during incident response, not during happy-path deployments.

Image Scanning & Vulnerability Management

The first line of defense is knowing what’s actually in your images before they reach production. Image scanning tools like Trivy, Microsoft Defender for Containers (formerly Azure Defender), and Anchore analyze container layers for known vulnerabilities (CVEs), malware, and configuration issues.

But scanning alone isn’t enough. You need a policy-based approach that blocks vulnerable images from being deployed.

Microsoft Defender for Containers

Microsoft Defender for Containers integrates with ACR and provides agentless vulnerability assessment for container images when the plan and required extension are enabled. In practice, you get push-triggered assessment plus recurring reassessment over time. Findings are surfaced as recommendations in Microsoft Defender for Cloud, with severity and remediation guidance.

The critical configuration: set up alerting and response workflows. A scan report that nobody reads is worthless. Configure alerts to notify your DevOps team when high or critical vulnerabilities are detected. Better yet, integrate with your CI/CD pipeline to fail builds that introduce new vulnerabilities above a defined threshold.
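Enabling the plan itself is a one-liner with the Azure CLI. This is a sketch at subscription scope; the plan name and tier shown are the current defaults, but check your subscription before relying on them:

```shell
# Enable the Microsoft Defender for Containers plan for the current
# subscription (covers registries and clusters in that subscription).
az security pricing create --name Containers --tier Standard

# Confirm the plan is active
az security pricing show --name Containers --query pricingTier --output tsv
```

From there, the alerting and CI/CD integration is configuration work in Defender for Cloud and your pipeline tooling, not code.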

Trivy Integration

Trivy is an open-source vulnerability scanner that’s lightweight, fast, and highly accurate. It scans container images, filesystem artifacts, and even Infrastructure as Code templates for vulnerabilities and misconfigurations.

Integrate Trivy into your CI/CD pipeline as a gate:

# Scan image before pushing to ACR
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:latest

# If vulnerabilities are found, build fails
# Otherwise, push to ACR
az acr login --name myregistry
docker tag myapp:latest myregistry.azurecr.io/myapp:latest
docker push myregistry.azurecr.io/myapp:latest

This approach ensures that only images meeting your security threshold reach your registry in the first place.

Image Signing & Verification

Vulnerability scanning tells you what’s in an image. Image signing tells you who built it and whether it’s been tampered with. This is supply chain security at its core.

Notation and Notary v2

Azure Container Registry supports Notary v2 signatures via the Notation CLI, which implements the CNCF Notary Project specification for signing and verifying container artifacts. When you sign an image, you’re cryptographically attesting that:

  1. The image was built by a trusted party (your CI/CD system)
  2. The image hasn’t been modified since it was signed
  3. The image meets specific criteria (e.g., passed security scans)

Here’s a practical workflow:

  1. In CI/CD: After building and scanning an image, sign it using Notation
  2. In ACR: Store signatures alongside the image
  3. In AKS: Use Azure Policy or admission controllers (OPA Gatekeeper, Kyverno) to verify signatures before allowing pod creation
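The signing half of that workflow can be sketched with the Notation CLI. The registry, repository, and tag below are placeholders; a real pipeline would sign by digest and use a key from Azure Key Vault (via the notation-azure-kv plugin) rather than a local test certificate:

```shell
# One-time setup: generate a self-signed test key and certificate.
# Production pipelines should use a CA-issued certificate stored in
# Azure Key Vault instead of a local test cert.
notation cert generate-test --default "example.io"

# Authenticate to the registry, then sign the image.
# Signing by digest (repo@sha256:...) is preferred over tags,
# since tags are mutable.
az acr login --name myregistry
notation sign myregistry.azurecr.io/myapp:v1.0.0

# Verification requires a trust policy naming the trusted certificate;
# once configured, this fails for unsigned or tampered artifacts.
notation verify myregistry.azurecr.io/myapp:v1.0.0
```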

Cosign Alternative

Cosign, part of the Sigstore project, is another popular option for image signing. It’s simpler to set up than Notary v2 and integrates well with Kubernetes admission controllers. The choice between Notation and Cosign often comes down to your broader toolchain: Notation if you’re heavily invested in Azure, Cosign if you prefer open-source portability.
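A minimal key-based Cosign flow looks similar; the key file names are Cosign's defaults and the image reference is illustrative:

```shell
# Generate a signing key pair (writes cosign.key / cosign.pub).
# Keyless signing via Sigstore's OIDC flow is also an option.
cosign generate-key-pair

# Sign the image in ACR (Cosign resolves the tag to a digest)
cosign sign --key cosign.key myregistry.azurecr.io/myapp:v1.0.0

# Verify: exits non-zero if the signature is missing or invalid
cosign verify --key cosign.pub myregistry.azurecr.io/myapp:v1.0.0
```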

RBAC for Registry Access

Who can push images to your registry? Who can pull them? These questions matter more than you might think.

Azure RBAC for ACR

ACR supports Azure role-based access control (RBAC) with granular permissions:

  • AcrPull: Read-only access to pull images
  • AcrPush: Ability to push and pull images
  • AcrDelete: Permission to delete images
  • Owner/Contributor: Full management rights

In practice, your setup should look like this:

  • CI/CD service principals: AcrPush role (can build and push images)
  • AKS node pools: AcrPull role via managed identity (can pull images for workloads)
  • Developers: No direct registry access (deployments go through CI/CD)
  • Security team: Reader role for auditing

This model ensures that production images flow through controlled pipelines, not from developer laptops.
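Wiring up those assignments can be sketched with the Azure CLI; the principal IDs and registry name are placeholders:

```shell
# Resolve the registry's resource ID once
ACR_ID=$(az acr show --name myregistry --query id --output tsv)

# CI/CD service principal: push and pull
az role assignment create \
  --assignee "<ci-cd-sp-object-id>" \
  --role AcrPush \
  --scope "$ACR_ID"

# Security team: read-only auditing access
az role assignment create \
  --assignee "<security-team-group-id>" \
  --role Reader \
  --scope "$ACR_ID"

# AKS pull access is handled separately via --attach-acr (shown below),
# which grants AcrPull to the cluster's managed identity.
```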

AKS Managed Identity for ACR Access

Instead of storing registry credentials in Kubernetes secrets, use AKS managed identity to grant pull access. This eliminates credential management overhead and reduces the risk of credential leakage.

# Attach ACR to AKS cluster using managed identity
az aks update \
  --name myakscluster \
  --resource-group myresourcegroup \
  --attach-acr myregistry

Now your AKS nodes can pull images from ACR without any credentials stored in the cluster.

Private Endpoints: Network Isolation

By default, Azure Container Registry is accessible over the public internet. Even with RBAC, this creates an unnecessary attack surface. Private endpoints solve this by routing registry traffic through your Azure virtual network.

Terraform Example: ACR with Private Endpoint

Here’s a practical Terraform configuration for deploying ACR with a private endpoint:

# Azure Container Registry with Premium SKU (required for private endpoints)
resource "azurerm_container_registry" "acr" {
  name                = "myacrregistry"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "Premium"
  admin_enabled       = false
  
  # Disable public network access
  public_network_access_enabled = false
}

# Private endpoint for ACR
resource "azurerm_private_endpoint" "acr_pe" {
  name                = "acr-private-endpoint"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  subnet_id           = azurerm_subnet.private_endpoints.id

  private_service_connection {
    name                           = "acr-connection"
    private_connection_resource_id = azurerm_container_registry.acr.id
    is_manual_connection           = false
    subresource_names              = ["registry"]
  }

  private_dns_zone_group {
    name                 = "acr-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.acr.id]
  }
}

# Private DNS zone for ACR
resource "azurerm_private_dns_zone" "acr" {
  name                = "privatelink.azurecr.io"
  resource_group_name = azurerm_resource_group.rg.name
}

# Link DNS zone to VNet
resource "azurerm_private_dns_zone_virtual_network_link" "acr" {
  name                  = "acr-dns-link"
  resource_group_name   = azurerm_resource_group.rg.name
  private_dns_zone_name = azurerm_private_dns_zone.acr.name
  virtual_network_id    = azurerm_virtual_network.vnet.id
}

With this configuration:

  1. ACR is only accessible from within your VNet (or peered VNets)
  2. DNS resolution automatically routes registry traffic through the private endpoint
  3. Public internet access to your registry is completely disabled

This is especially critical if your AKS cluster handles sensitive workloads or regulated data.
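You can sanity-check the setup from a VM inside the VNet; the registry name is illustrative:

```shell
# From inside the VNet: the name should resolve to a private IP
# (e.g., 10.x.x.x) via the privatelink.azurecr.io zone
nslookup myacrregistry.azurecr.io

# Diagnose connectivity and DNS routing to the registry
az acr check-health --name myacrregistry
```

If the name still resolves to a public IP from inside the VNet, the private DNS zone link is usually the culprit.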

Policy Enforcement with Gatekeeper

Scanning and signing only matter if you enforce them. Kubernetes admission controllers intercept pod creation requests and enforce policies before workloads are admitted to the cluster.

OPA Gatekeeper for Image Source Enforcement

Open Policy Agent (OPA) Gatekeeper is a common admission controller for policy enforcement in Kubernetes. The following example enforces that workloads only pull images from approved registries. It does not verify cryptographic signatures by itself:

# Gatekeeper ConstraintTemplate restricting images to allowed registries
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: acrverifiedimages
spec:
  crd:
    spec:
      names:
        kind: AcrVerifiedImages
      validation:
        openAPIV3Schema:
          type: object
          properties:
            allowedRegistries:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package acrverifiedimages

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not registry_allowed(container.image)
          msg := sprintf("Container image '%v' is not from an allowed registry", [container.image])
        }

        registry_allowed(image) {
          allowed := input.parameters.allowedRegistries[_]
          startswith(image, allowed)
        }
---
# Constraint to enforce ACR-only images
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: AcrVerifiedImages
metadata:
  name: require-acr-images
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces:
      - "production"
  parameters:
    allowedRegistries:
      - "myacrregistry.azurecr.io/"

This policy ensures that pods in the production namespace can only use images from your approved ACR instance. Any attempt to deploy an image from Docker Hub, a public registry, or an unknown source will be rejected at admission time.
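Assuming the constraint above is active, the rejection is visible at deploy time; the image names are illustrative:

```shell
# Denied: Docker Hub image in the production namespace.
# Gatekeeper rejects this with a message like:
#   admission webhook "validation.gatekeeper.sh" denied the request:
#   Container image 'nginx:latest' is not from an allowed registry
kubectl run web --image=nginx:latest --namespace production

# Admitted: image from the approved registry
kubectl run web --image=myacrregistry.azurecr.io/myapp:v1.0.0 --namespace production
```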

For signature verification, extend this pattern with Ratify and policy enforcement so signatures and trust policies are validated before admission. AKS Image Integrity also exists, but it is currently a preview feature with notable production limitations.

Common Mistakes with Policy Enforcement

  • Naming a registry-allowlist policy as “image verification” even though no signature verification happens
  • Enforcing policies only in production and leaving staging unrestricted
  • Enabling admission control without a break-glass process for incident response
  • Forgetting to version and test policy changes like application code

Multi-Region Replication: Distribution Strategy

If your AKS workloads span multiple Azure regions, you need a registry replication strategy that balances availability, performance, and cost.

Geo-Replication in ACR

ACR Premium SKU supports geo-replication, allowing you to maintain a single registry name while automatically replicating images to multiple Azure regions. This reduces latency for image pulls and provides failover capabilities.

# Enable geo-replication to multiple regions
az acr replication create \
  --registry myregistry \
  --location westeurope

az acr replication create \
  --registry myregistry \
  --location eastus

Now when an AKS cluster in West Europe pulls an image, it’s served from the local replica. If that replica becomes unavailable, ACR automatically fails over to another region.
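Listing the replicas confirms the rollout and each region's provisioning status:

```shell
# Show all replication regions for the registry
az acr replication list --registry myregistry --output table
```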

Replication Patterns

Single-region workloads: No replication needed. Keep it simple.

Multi-region with low traffic: Geo-replication provides good balance between availability and cost.

Multi-region with high traffic or strict latency requirements: Consider dedicated ACR instances per region with automated image promotion pipelines. This gives you more control over what images are available in each region and when they’re promoted.

Disaster recovery: Geo-replication is not a backup strategy. If an image is accidentally deleted, it’s deleted from all replicas. Implement immutability policies (supported in ACR Premium) to prevent accidental deletion of critical images.
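Locking a critical image version can be sketched as follows; the tag is illustrative:

```shell
# Lock the image: block both overwriting and deletion
az acr repository update \
  --name myregistry \
  --image myapp:v1.0.0 \
  --write-enabled false \
  --delete-enabled false

# Inspect the lock attributes
az acr repository show \
  --name myregistry \
  --image myapp:v1.0.0 \
  --query changeableAttributes
```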

Practical Implementation Checklist

If you’re implementing ACR security for an existing AKS deployment, here’s the order of operations I’d recommend:

  1. Enable Microsoft Defender for Containers on your ACR instance (quick win, no code changes)
  2. Set up RBAC to limit who can push/pull images (reduces blast radius)
  3. Integrate Trivy or equivalent scanning into your CI/CD pipeline (prevents new vulnerabilities from entering the registry)
  4. Configure private endpoints if your workloads are in a VNet (reduces attack surface)
  5. Implement image signing with Notation or Cosign (establishes trust boundary)
  6. Deploy Gatekeeper or Kyverno to enforce policies at admission time (prevents policy violations from reaching runtime)
  7. Enable geo-replication if needed (improves availability and performance)

This sequence minimizes risk while keeping deployments flowing. Don’t try to implement everything at once. Layered security is iterative.

What This Actually Prevents

Let’s ground this in real scenarios:

Scenario 1: Compromised base image
Your application uses a popular Node.js base image. A critical vulnerability is discovered (think Log4Shell, but for your stack). With vulnerability scanning enabled, you’re alerted within hours. With policy enforcement, the vulnerable image can’t be redeployed until it’s patched.

Scenario 2: Rogue developer
A developer bypasses the CI/CD pipeline and tries to deploy an unsigned image built on their laptop. With signature verification enforced via an admission controller, the deployment is rejected at admission time. Your cluster never runs unverified code.

Scenario 3: Supply chain attack
An attacker compromises your CI/CD pipeline and attempts to push a backdoored image to ACR. With RBAC properly configured, the service principal has limited scope. With private endpoints enabled, the attacker can’t even access your registry from outside your network.

Scenario 4: Accidental public exposure
A misconfiguration exposes your ACR to the public internet. With public network access disabled and private endpoints enforced, there’s no route to your registry from outside your VNet, so configuration mistakes don’t result in exposure.

These aren’t theoretical. They’re patterns I’ve seen in production environments that failed to implement registry security correctly.

Conclusion: Security Without Friction

The goal isn’t to lock down everything so tightly that deployments become painful. The goal is to build security into your workflow so seamlessly that it doesn’t slow down your teams.

Image scanning, signing, RBAC, private endpoints, and policy enforcement work together to create a defense-in-depth strategy. No single control is perfect. But layered together, they make successful attacks exponentially harder while keeping legitimate deployments fast.

Start with the quick wins: enable Defender for Containers, configure RBAC, integrate scanning into CI/CD. Then progressively layer on signing, private endpoints, and policy enforcement as your security maturity grows.

Your AKS cluster is only as secure as the images running inside it. Treat your container registry as the trust boundary it actually is.
