Platform Engineering Without Backstage: Pragmatic IDPs on Azure

Every platform engineering conference talk in the last two years has had a Backstage slide. A glossy catalogue screenshot, a scaffolder demo that creates a repo in four clicks, a knowing nod about “developer experience”. What the slide never shows is the six months the team spent building plugins, the Postgres instance somebody now babysits, the TechDocs theme nobody asked for, and the 0.4 of an engineer permanently assigned to chasing Backstage’s monthly release cadence.

There is no shame in any of this. Backstage is a serious project and serious teams run it well. The shame is treating it as the default (the thing you reach for on day one) when most teams could ship 80% of the value with a tenth of the effort and a fraction of the running cost. Backstage is a platform for building platforms. Most teams need a platform, not a platform-platform.

This post is the Internal Developer Platform (IDP) I keep building when nobody is forcing me to use Backstage. It is small, opinionated, runs on Azure plumbing you already pay for, and ships value in the first quarter instead of the third year.

What an IDP Actually Needs to Do

Before the tooling argument, the value list. An Internal Developer Platform exists to:

Lower the time to first deployment for a new service. Day one, not day thirty.
Make the right thing the easy thing. Security, observability, and cost defaults arrive for free, not as a checklist the team forgets.
Provide a catalogue so anyone can answer “who owns this?” without a Slack archaeology session at 02:00.
Stop the SRE from writing the same answer a third time. Every repeated question is a missing platform feature.

Everything else is decoration. If a feature doesn’t move one of those four numbers, skip it.

The Pragmatic Stack on Azure

Five components. None of them require Backstage. All of them you can stand up with tools your org already owns.

Golden-path repositories: skeleton projects for the shapes you support (API, worker, frontend), each pre-wired to the platform.
Reusable GitHub Actions workflows: a small set of workflow_call templates that do the right thing so teams don’t copy-paste CI.
Opinionated Bicep modules: one module per platform service (Container App, Azure Kubernetes Service (AKS) namespace, Key Vault, Storage), with sane defaults and clearly marked escape hatches.
A service catalogue: one YAML file per service repo, aggregated into a queryable, static site. No database.
Scaffolding: gh repo create --template is enough for most teams. A CLI comes later, if ever.

Golden-Path Repository: The Highest-Leverage Artefact

A new service starts as a repository generated from a template repo (a fresh copy with its own history, not a fork). Day one, the team has CI, CD, infrastructure-as-code, and platform integration already wired together.

payments-api/                      (created from my-org/template-dotnet-api)
├── .github/
│   └── workflows/
│       ├── ci.yml                 → calls the platform reusable workflow
│       └── cd.yml                 → calls the platform reusable workflow
├── infra/
│   └── main.bicep                 → consumes the platform Bicep module
├── src/                           → service code (the only part the team writes)
├── catalog-info.yaml              → one file, registers the service in the catalogue
├── .editorconfig                  → org formatting defaults
├── .gitignore
└── README.md                      → pre-filled: how to run, deploy, and page on-call

The discipline that makes this work: the team owns src/ and almost nothing else. The moment teams start hand-editing the workflow files or the Bicep, the golden path is dead and you’re back to snowflakes. Keep the platform-owned files thin pointers to versioned platform artefacts, and the template stays maintainable for years.
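One way to keep that discipline honest is a check that fails when a service repo’s workflow files stop being thin pointers. A hypothetical Python sketch, assuming the my-org/platform-workflows repository and @vN pinning used in this post’s examples; it uses line-based regular expressions rather than a YAML parser, so treat it as a tripwire, not a validator:

```python
import re

# Assumed conventions, taken from this post's examples: the platform repo is
# my-org/platform-workflows and releases are pinned by a major tag such as @v3.
PLATFORM_CALL = re.compile(
    r"^\s*uses:\s*my-org/platform-workflows/\.github/workflows/[\w.-]+\.ya?ml@v\d+\s*$"
)
ANY_USES = re.compile(r"^\s*(-\s*)?uses:")
# Inline steps or scripts mean the team has started writing its own CI.
INLINE_CI = re.compile(r"^\s*(-\s*)?(steps|run):")


def pointer_violations(workflow_text: str) -> list[str]:
    """Return problems found in one workflow file; an empty list means it is a thin pointer."""
    problems: list[str] = []
    calls = 0
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        if INLINE_CI.match(line):
            problems.append(f"line {lineno}: inline CI '{line.strip()}'")
        elif ANY_USES.match(line):
            calls += 1
            if not PLATFORM_CALL.match(line):
                problems.append(f"line {lineno}: '{line.strip()}' is not a pinned platform workflow")
    if calls == 0:
        problems.append("no call to a platform reusable workflow")
    return problems


cd_yml = """\
name: cd
on:
  push:
    branches: [main]
jobs:
  ship:
    uses: my-org/platform-workflows/.github/workflows/dotnet-api.yml@v3
    with:
      service: payments-api
      environment: production
    secrets: inherit
"""
print(pointer_violations(cd_yml))                          # a clean pointer
print(pointer_violations(cd_yml.replace("@v3", "@main")))  # drifted to an unpinned ref
```

Run as a required check in the template, a change like this blocks the merge instead of surfacing months later as a snowflake.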

Reusable Workflows: One Place to Improve Everyone’s CI

This is the highest-leverage line of code you will write all quarter. When you fix something in a reusable workflow (a vulnerable action version, a missing Software Bill of Materials (SBOM) step, a flaky cache), every service pinned to that major version picks up the fix on its next run. No PR to fifty repos. No migration guide nobody reads.

The platform owns the real workflow:

# my-org/platform-workflows/.github/workflows/dotnet-api.yml
name: dotnet-api

on:
  workflow_call:
    inputs:
      service:
        required: true
        type: string
      environment:
        required: false
        type: string
        default: staging

# The caller must grant at least these: a called workflow can only keep or reduce permissions.
permissions:
  id-token: write    # OIDC federation to Azure; no long-lived secrets in any repo
  contents: read

jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: "10.0.x"
      - run: dotnet restore
      - run: dotnet build --no-restore -c Release
      - run: dotnet test --no-build -c Release --logger trx
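      # SBOM + image build are platform defaults, not per-team decisions.
      - name: Generate SBOM
        uses: anchore/sbom-action@v0
        with:
          path: .
      - uses: azure/login@v2     # az acr build needs an authenticated Azure session
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - name: Build and push image
        run: |
          az acr build --registry ${{ vars.PLATFORM_ACR }} \
            --image ${{ inputs.service }}:${{ github.sha }} .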

  deploy:
    needs: build-test
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - uses: actions/checkout@v4
      - name: Deploy infra + app
        run: |
          az deployment group create \
            --resource-group rg-${{ inputs.service }}-${{ inputs.environment }} \
            --template-file infra/main.bicep \
            --parameters image=${{ vars.PLATFORM_ACR }}.azurecr.io/${{ inputs.service }}:${{ github.sha }}

The team’s entire deployment pipeline is then a dozen or so lines that nobody needs to understand:

# payments-api/.github/workflows/cd.yml (complete file)
name: cd
on:
  push:
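    branches: [main]
permissions:
  id-token: write   # the reusable workflow cannot request more than this file grants
  contents: read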
jobs:
  ship:
    uses: my-org/platform-workflows/.github/workflows/dotnet-api.yml@v3
    with:
      service: payments-api
      environment: production
    secrets: inherit

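Pin to @v3 and treat platform releases like a product: a changelog and semantic versions. Backward-compatible fixes move the v3 tag, so every caller picks them up on its next run; breaking changes ship as v4, which teams adopt with a one-line tag bump instead of being silently broken by @main.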

Bicep Modules: The Opinions, Codified

A Bicep module per platform service replaces hours of copy-paste and, more importantly, hours of arguing about defaults. Teams call the module with the three or four parameters that matter for their service. The platform team owns the parameters that matter for everyone: tags, managed identity, network posture, the things an auditor will ask about. This is the same argument I make in “Why Your Azure Portal Clicks Will Fail the Next Audit”: defaults that live in code are defaults you can prove.

// modules/container-app.bicep — one module, sane defaults, marked escape hatches
@description('Service name. Drives naming, tags, and identity.')
param service string

@description('Container image including registry and tag.')
param image string

param location string = resourceGroup().location

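// --- Escape hatches: teams override only what their service genuinely needs. ---
@allowed(['0.25', '0.5', '1.0', '2.0'])
param cpu string = '0.5'
param minReplicas int = 1
param maxReplicas int = 10

// Memory follows CPU rather than being a free parameter: on the Consumption
// workload profile, Container Apps only accepts fixed pairs (memory = 2 x CPU).
var memoryByCpu = {
  '0.25': '0.5Gi'
  '0.5': '1Gi'
  '1.0': '2Gi'
  '2.0': '4Gi'
}
var memory = memoryByCpu[cpu]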
// --- Platform-owned defaults teams do NOT get to weaken. ---
var tags = {
  service: service
  'managed-by': 'platform'
  'cost-center': 'engineering'
}

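// Shared environment owned by the platform team. It lives outside the service's
// resource group; 'rg-platform' is an assumed name, point it at wherever yours lives.
param platformResourceGroup string = 'rg-platform'

resource env 'Microsoft.App/managedEnvironments@2024-03-01' existing = {
  name: 'platform-aca-env'
  scope: resourceGroup(platformResourceGroup)
}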

resource identity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-${service}'
  location: location
  tags: tags
}

resource app 'Microsoft.App/containerApps@2024-03-01' = {
  name: service
  location: location
  tags: tags
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: { '${identity.id}': {} }
  }
  properties: {
    managedEnvironmentId: env.id
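    configuration: {
      ingress: { external: true, targetPort: 8080, transport: 'auto' }
      // Pull from the private registry with the service's own identity, not admin
      // credentials. The identity still needs AcrPull on the registry.
      registries: [
        { server: split(image, '/')[0], identity: identity.id }
      ]
    }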
    template: {
      containers: [
        { name: service, image: image, resources: { cpu: json(cpu), memory: memory } }
      ]
      scale: { minReplicas: minReplicas, maxReplicas: maxReplicas }
    }
  }
}

output fqdn string = app.properties.configuration.ingress.fqdn

The escape hatches are explicit and few. If a team needs to override something that isn’t a parameter, it’s a signal: either the default is wrong for everyone (fix the module), or the service is a genuine snowflake. Never allow a quiet fork of the module into a service repo. The day that happens, you have N modules to patch, not one.
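The same rule can run before deployment as a review step. A hypothetical sketch: it takes the overrides a team wants to pass plus the module’s escape hatches, accepts the ones that are parameters, and turns everything else into a signal for a conversation rather than a quiet fork. The escape-hatch set below is a subset of the module’s parameters above, and ingressExternal is a made-up override:

```python
def review_overrides(
    overrides: dict[str, object], escape_hatches: set[str]
) -> tuple[dict[str, object], list[str]]:
    """Split a team's module overrides into accepted values and review signals."""
    accepted: dict[str, object] = {}
    signals: list[str] = []
    for name in sorted(overrides):
        if name in escape_hatches:
            accepted[name] = overrides[name]
        else:
            # Not a parameter: either the default is wrong for everyone
            # (fix the module) or the service is a genuine snowflake.
            signals.append(f"'{name}' is not an escape hatch: fix the module or justify the snowflake")
    low, high = accepted.get("minReplicas"), accepted.get("maxReplicas")
    if isinstance(low, int) and isinstance(high, int) and low > high:
        signals.append(f"minReplicas ({low}) exceeds maxReplicas ({high})")
    return accepted, signals


hatches = {"cpu", "minReplicas", "maxReplicas"}
accepted, signals = review_overrides(
    {"cpu": "1.0", "maxReplicas": 30, "ingressExternal": False}, hatches
)
print(accepted)
print(signals)
```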

Service Catalogue: A YAML File and a Build Step

Skipping the catalogue is the most expensive shortcut on this list, and teams skip it first because it feels like paperwork. It’s the difference between a five-second answer and a forty-minute incident dig. You don’t need Postgres, a graph database, or Backstage to have one.

One file per service repo:

# payments-api/catalog-info.yaml
service: payments-api
owner: team-payments
tier: 1                       # 1 = revenue-critical; drives alerting + review policy
language: dotnet
hosting: container-apps
on_call: payments-oncall@example.com
links:
  repo: https://github.com/my-org/payments-api
  runbook: https://example.atlassian.net/wiki/payments-api
  dashboard: https://example.grafana.net/d/payments
depends_on:
  - postgres-payments
  - auth-api
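Because everything downstream reads these fields, it is worth rejecting a broken file in the service’s own CI rather than discovering it in the nightly build. A minimal Python sketch that checks one entry after it has been parsed from YAML (parsing is left to whatever YAML library you use); the required fields follow the example above, and insisting on a runbook link is this sketch’s own assumption:

```python
REQUIRED = ("service", "owner", "tier", "on_call")


def validate_entry(entry: dict) -> list[str]:
    """Return problems with one parsed catalog-info.yaml; an empty list means it is usable."""
    errors = [f"missing '{key}'" for key in REQUIRED if entry.get(key) in (None, "")]
    tier = entry.get("tier")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if tier is not None and (isinstance(tier, bool) or not isinstance(tier, int) or tier < 1):
        errors.append(f"tier must be a positive integer, got {tier!r}")
    links = entry.get("links") or {}
    if "runbook" not in links:
        errors.append("missing links.runbook")
    for name, url in links.items():
        if not str(url).startswith("https://"):
            errors.append(f"links.{name} is not an https URL")
    deps = entry.get("depends_on", [])
    if not isinstance(deps, list) or not all(isinstance(d, str) for d in deps):
        errors.append("depends_on must be a list of service names")
    return errors


payments = {
    "service": "payments-api", "owner": "team-payments", "tier": 1,
    "language": "dotnet", "hosting": "container-apps",
    "on_call": "payments-oncall@example.com",
    "links": {
        "repo": "https://github.com/my-org/payments-api",
        "runbook": "https://example.atlassian.net/wiki/payments-api",
        "dashboard": "https://example.grafana.net/d/payments",
    },
    "depends_on": ["postgres-payments", "auth-api"],
}
print(validate_entry(payments))
```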

A nightly job in the platform repo collects every one of those files across the org and emits a single static JSON file that the catalogue site reads. The whole “backend” is the GitHub search API, jq, and yq:

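# my-org/platform/.github/workflows/build-catalogue.yml
name: build-catalogue
on:
  schedule:
    - cron: "0 3 * * *"     # nightly; the org doesn't reorganise faster than that
  workflow_dispatch:
permissions:
  id-token: write           # OIDC login to Azure for the upload step
  contents: read
jobs:
  aggregate:
    runs-on: ubuntu-latest
    steps:
      - name: Collect every catalog-info.yaml in the org
        env:
          GH_TOKEN: ${{ secrets.CATALOG_READ_TOKEN }}
        run: |
          # Code search expects a term alongside the qualifiers ("service" is in every file),
          # and gh returns only 30 results unless --limit is raised.
          gh search code service --owner my-org --filename catalog-info.yaml \
            --limit 1000 --json path,repository \
          | jq -r '.[] | "\(.repository.nameWithOwner) \(.path)"' \
          | while read -r repo path; do
              echo '---'        # YAML document separator: one document per service
              gh api "repos/$repo/contents/$path" --jq '.content' | base64 -d
              echo              # guard against files without a trailing newline
            done \
          | yq ea '[.]' -o=json > catalog.json
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - name: Publish to the static catalogue site
        run: |
          az storage blob upload --account-name platformcatalogue --auth-mode login \
            --container-name '$web' --name catalog.json --file catalog.json \
            --content-type application/json --overwrite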
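The shell pipeline concatenates whatever it finds, so two repositories claiming the same service name (for example, a catalog-info.yaml copied into a new repo from the template and never edited) would both land in catalog.json. A minimal Python sketch of a stricter merge step over already-parsed entries, sorting by service name so the nightly output diffs cleanly; wiring it into the workflow is left open, and auth-api’s owner is an illustrative name:

```python
import json


def build_catalogue(entries: list[dict]) -> str:
    """Merge parsed catalog-info entries into the JSON array the catalogue site reads."""
    by_service: dict[str, dict] = {}
    for entry in entries:
        name = entry["service"]
        if name in by_service:
            # Fail the nightly run loudly instead of letting one entry shadow another.
            raise ValueError(f"duplicate catalogue entry for service {name!r}")
        by_service[name] = entry
    # Stable order keeps the published file's day-to-day diff readable.
    return json.dumps([by_service[name] for name in sorted(by_service)], indent=2)


entries = [
    {"service": "payments-api", "owner": "team-payments"},
    {"service": "auth-api", "owner": "team-identity"},
]
print(build_catalogue(entries))
```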

That catalog.json lands in an Azure Storage static website ($web container) that costs roughly the price of a coffee per year to host. The catalogue front-end is a single page that fetches the JSON and renders a searchable table. “Who owns payments-api? What’s the runbook? What does it depend on?” All answered in one query, by anyone, without a database to back up or a service to keep alive at 02:00.

From New Service to First Deploy: Under 30 Minutes

gh repo create --template. Fill in four lines of catalog-info.yaml. Write your logic. git push. The workflow builds, tests, and deploys with identity, tags, and labels already correct. The catalogue picks it up overnight. No ticket, no request to the platform team. The golden path is the request system.

When Backstage Actually Is the Right Answer

Backstage is the right answer when:

The platform team is five-plus people with genuine capacity to run Backstage as a product, including the upgrade treadmill.
The service count has outgrown a static table: hundreds of services, not dozens. At dozens, a YAML file and a static page answer every question Backstage would.
You need a dependency graph, not a list. When “what breaks if auth-api goes down?” needs a traversable graph across hundreds of nodes, a flat YAML catalogue stops being enough.
You’ll get real use from existing Backstage plugins for the third-party tools you already run, so the plugin ecosystem saves you building those integrations from scratch.

Two or three engineers and a few dozen services? Backstage will eat your runway. Pick the tool that matches your headcount, not the one that matched the headcount of the team whose conference talk you watched.

Where Teams Go Wrong

Teams build the platform they want to operate rather than the one developers need in order to ship.

Too much, too soon: UI before catalogue (the data model is the hard part; the UI is a weekend). CLI before template (gh repo create --template handles the first fifty services). Plugins for tools nobody uses yet. Speculative integration is speculative debt.

Not enough: the catalogue gets skipped because it feels like paperwork. Every incident without it costs you the forty minutes you “saved”. Documentation nobody can find at 02:00 is a graveyard. And the platform needs an on-call rota: no owner when it breaks means teams won’t depend on it. As I argue in “Your Incident Response Plan Is a Lie. Here’s How to Fix It.”, undefined ownership is the failure, not the missing tool.

If a feature doesn’t lower time-to-deploy, encode a good default, feed the catalogue, or kill a repeated SRE question: it waits.

What an IDP Cannot Fix

Tooling is downstream of organisation. Team dynamics at war, a production swamp nobody has time to drain, leadership that won’t give teams space to adopt the golden path: these are org problems, not tool problems. It’s the same trap I describe in “Kubernetes Is Not a Platform Strategy”. The tool is how you express the strategy, not a substitute for one.

Build one golden-path repo in two weeks: one service shape, CI, CD, Bicep, catalog-info.yaml. Adopt it on two real services, not pilots. Add the catalogue in an afternoon. Only ask whether you need Backstage at month three or four, once service count and team size give you an actual answer.

Earn the right to operate Backstage by first delivering value without it.
