Platform Engineering Without Backstage: Pragmatic IDPs on Azure
Every platform engineering conference talk in the last two years has had a Backstage slide. Glossy catalogue screenshot, a scaffolder demo that creates a repo in four clicks, a knowing nod about “developer experience”. What the slide never shows is the six months the team spent building plugins, the Postgres instance somebody now babysits, the TechDocs theme nobody asked for, and the 0.4 of an engineer permanently assigned to chasing Backstage’s two-week release cadence.
There is no shame in any of this. Backstage is a serious project and serious teams run it well. The shame is treating it as the default (the thing you reach for on day one) when most teams could ship 80% of the value with a tenth of the effort and a fraction of the running cost. Backstage is a platform for building platforms. Most teams need a platform, not a platform-platform.
This post is the Internal Developer Platform (IDP) I keep building when nobody is forcing me to use Backstage. It is small, opinionated, runs on Azure plumbing you already pay for, and ships value in the first quarter instead of the third year.
What an IDP Actually Needs to Do
Before the tooling argument, the value list. An Internal Developer Platform exists to:
- Lower the time to first deployment for a new service. Day one, not day thirty.
- Make the right thing the easy thing. Security, observability, and cost defaults arrive for free, not as a checklist the team forgets.
- Provide a catalogue so anyone can answer “who owns this?” without a Slack archaeology session at 02:00.
- Stop the SRE from writing the same answer a third time. Every repeated question is a missing platform feature.
Everything else is decoration. If a feature doesn’t move one of those four numbers, skip it.
The Pragmatic Stack on Azure
Five components. None of them require Backstage. All of them you can stand up with tools your org already owns.
- Golden-path repositories: skeleton projects for the shapes you support (API, worker, frontend), each pre-wired to the platform.
- Reusable GitHub Actions workflows: a small set of
workflow_calltemplates that do the right thing so teams don’t copy-paste CI. - Opinionated Bicep modules: one module per platform service (Container App, Azure Kubernetes Service (AKS) namespace, Key Vault, Storage), with sane defaults and clearly marked escape hatches.
- A service catalogue: one YAML file per service repo, aggregated into a queryable, static site. No database.
- Scaffolding:
gh repo create --templateis enough for most teams. A CLI comes later, if ever.
Golden-Path Repository: The Highest-Leverage Artefact
A new service starts as a fork of a template repo. Day one, the team has CI, CD, infrastructure-as-code, and platform integration already wired together.
payments-api/ (created from my-org/template-dotnet-api)
├── .github/
│ └── workflows/
│ ├── ci.yml → calls the platform reusable workflow
│ └── cd.yml → calls the platform reusable workflow
├── infra/
│ └── main.bicep → consumes the platform Bicep module
├── src/ → service code (the only part the team writes)
├── catalog-info.yaml → one file, registers the service in the catalogue
├── .editorconfig → org formatting defaults
├── .gitignore
└── README.md → pre-filled: how to run, deploy, and page on-call
The discipline that makes this work: the team owns src/ and almost nothing else. The moment teams start hand-editing the workflow files or the Bicep, the golden path is dead and you’re back to snowflakes. Keep the platform-owned files thin pointers to versioned platform artefacts, and the template stays maintainable for years.
Reusable Workflows: One Place to Improve Everyone’s CI
This is the highest-leverage line of code you will write all quarter. When you fix something in a reusable workflow (a vulnerable action version, a missing Software Bill of Materials (SBOM) step, a flaky cache), every service that calls it picks up the fix on its next run. No PR to fifty repos. No migration guide nobody reads.
The platform owns the real workflow:
# my-org/platform-workflows/.github/workflows/dotnet-api.yml
name: dotnet-api
on:
workflow_call:
inputs:
service:
required: true
type: string
environment:
required: false
type: string
default: staging
permissions:
id-token: write # OIDC federation to Azure — no long-lived secrets in any repo
contents: read
packages: write
jobs:
build-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: "10.0.x"
- run: dotnet restore
- run: dotnet build --no-restore -c Release
- run: dotnet test --no-build -c Release --logger trx
# SBOM + image build are platform defaults, not per-team decisions.
- name: Build and push image
run: |
az acr build --registry ${{ vars.PLATFORM_ACR }} \
--image ${{ inputs.service }}:${{ github.sha }} .
deploy:
needs: build-test
runs-on: ubuntu-latest
environment: ${{ inputs.environment }}
steps:
- uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ vars.AZURE_TENANT_ID }}
subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
- uses: actions/checkout@v4
- name: Deploy infra + app
run: |
az deployment group create \
--resource-group rg-${{ inputs.service }}-${{ inputs.environment }} \
--template-file infra/main.bicep \
--parameters image=${{ vars.PLATFORM_ACR }}.azurecr.io/${{ inputs.service }}:${{ github.sha }}
The team’s entire deployment pipeline is then twelve lines that nobody needs to understand:
# payments-api/.github/workflows/cd.yml (complete file)
name: cd
on:
push:
branches: [main]
jobs:
ship:
uses: my-org/platform-workflows/.github/workflows/dotnet-api.yml@v3
with:
service: payments-api
environment: production
secrets: inherit
Pin the @v3 and treat platform releases like a product: a changelog, semantic versions, so teams adopt fixes on a tag bump instead of being silently broken by @main.
Bicep Modules: The Opinions, Codified
A Bicep module per platform service replaces hours of copy-paste, and more importantly, replaces hours of arguing about defaults. Teams call the module with the three or four parameters that matter for their service. The platform team owns the parameters that matter for everyone: tags, managed identity, network posture, the things an auditor will ask about. This is the same argument I make in Why Your Azure Portal Clicks Will Fail the Next Audit: defaults that live in code are defaults you can prove.
// modules/container-app.bicep — one module, sane defaults, marked escape hatches
@description('Service name. Drives naming, tags, and identity.')
param service string
@description('Container image including registry and tag.')
param image string
param location string = resourceGroup().location
// --- Escape hatches: teams override only what their service genuinely needs. ---
@allowed(['0.25', '0.5', '1.0', '2.0'])
param cpu string = '0.5'
param memory string = '1Gi'
param minReplicas int = 1
param maxReplicas int = 10
// --- Platform-owned defaults teams do NOT get to weaken. ---
var tags = {
service: service
'managed-by': 'platform'
'cost-center': 'engineering'
}
resource env 'Microsoft.App/managedEnvironments@2024-03-01' existing = {
name: 'platform-aca-env'
}
resource identity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
name: 'id-${service}'
location: location
tags: tags
}
resource app 'Microsoft.App/containerApps@2024-03-01' = {
name: service
location: location
tags: tags
identity: {
type: 'UserAssigned'
userAssignedIdentities: { '${identity.id}': {} }
}
properties: {
managedEnvironmentId: env.id
configuration: {
ingress: { external: true, targetPort: 8080, transport: 'auto' }
}
template: {
containers: [
{ name: service, image: image, resources: { cpu: json(cpu), memory: memory } }
]
scale: { minReplicas: minReplicas, maxReplicas: maxReplicas }
}
}
}
output fqdn string = app.properties.configuration.ingress.fqdn
The escape hatches are explicit and few. If a team needs to override something that isn’t a parameter, it’s a signal: either the default is wrong for everyone (fix the module), or the service is a genuine snowflake. Never allow a quiet fork of the module into a service repo. The day that happens, you have N modules to patch, not one.
Service Catalogue: A YAML File and a Build Step
Skipping the catalogue is the most expensive shortcut on this list, and teams skip it first because it feels like paperwork. It’s the difference between a five-second answer and a forty-minute incident dig. You don’t need Postgres, a graph database, or Backstage to have one.
One file per service repo:
# payments-api/catalog-info.yaml
service: payments-api
owner: team-payments
tier: 1 # 1 = revenue-critical; drives alerting + review policy
language: dotnet
hosting: container-apps
on_call: payments-oncall@example.com
links:
repo: https://github.com/example/payments-api
runbook: https://example.atlassian.net/wiki/payments-api
dashboard: https://example.grafana.net/d/payments
depends_on:
- postgres-payments
- auth-api
A nightly job in the platform repo collects every one of those files across the org and emits a single static JSON the catalogue site reads. The whole “backend” is the GitHub search API, jq, and yq:
# my-org/platform/.github/workflows/build-catalogue.yml
on:
schedule:
- cron: "0 3 * * *" # nightly; the org doesn't reorganise faster than that
workflow_dispatch:
jobs:
aggregate:
runs-on: ubuntu-latest
steps:
- name: Collect every catalog-info.yaml in the org
env:
GH_TOKEN: ${{ secrets.CATALOG_READ_TOKEN }}
run: |
gh search code --owner my-org --filename catalog-info.yaml \
--json path,repository \
| jq -r '.[] | "\(.repository.nameWithOwner) \(.path)"' \
| while read -r repo path; do
gh api "repos/$repo/contents/$path" --jq '.content' | base64 -d
done \
| yq ea '[.]' -o=json > catalog.json
- name: Publish to the static catalogue site
run: |
az storage blob upload --account-name platformcatalogue \
--container-name '$web' --name catalog.json --file catalog.json --overwrite
That catalog.json lands in an Azure Storage static website ($web container) that costs roughly the price of a coffee per year to host. The catalogue front-end is a single page that fetches the JSON and renders a searchable table. “Who owns payments-api? What’s the runbook? What does it depend on?” All answered in one query, by anyone, without a database to back up or a service to keep alive at 02:00.
From New Service to First Deploy: Under 30 Minutes
gh repo create --template. Fill in four lines of catalog-info.yaml. Write your logic. git push. The workflow builds, tests, and deploys with identity, tags, and labels already correct. The catalogue picks it up overnight. No ticket, no request to the platform team. The golden path is the request system.
When Backstage Actually Is the Right Answer
Backstage is the right answer when:
- The platform team is five-plus people with genuine capacity to run Backstage as a product, including the upgrade treadmill.
- The service count is large enough that catalogue queries genuinely matter: hundreds of services, not dozens. At dozens, a YAML file and a static page answer every question Backstage would.
- You need a dependency graph, not a list. When “what breaks if
auth-apigoes down?” needs a traversable graph across hundreds of nodes, a flat YAML catalogue stops being enough. - You’ll get real use from existing Backstage plugins for proprietary in-house systems, where the plugin ecosystem saves you building integrations from scratch.
Two or three engineers and a few dozen services? Backstage will eat your runway. Pick the tool that matches your headcount, not the one that matched the headcount of the team whose conference talk you watched.
Where Teams Go Wrong
Teams build what they want to operate rather than what developers need to ship through.
Too much, too soon: UI before catalogue (the data model is the hard part; the UI is a weekend). CLI before template (gh repo create --template handles the first fifty services). Plugins for tools nobody uses yet. Speculative integration is speculative debt.
Not enough: The catalogue gets skipped because it feels like paperwork. Every incident without it costs you the forty minutes you “saved”. Documentation nobody can find at 02:00 is a graveyard. And the platform needs an on-call rota: no owner when it breaks means teams won’t depend on it. As I argue in Your Incident Response Plan Is a Lie. Here’s How to Fix It.: undefined ownership is the failure, not the missing tool.
If a feature doesn’t lower time-to-deploy, encode a good default, feed the catalogue, or kill a repeated SRE question: it waits.
What an IDP Cannot Fix
Tooling is downstream of organisation. Team dynamics at war, a production swamp nobody has time to drain, leadership that won’t give teams space to adopt the golden path: these are org problems, not tool problems. It’s the same trap I describe in Kubernetes Is Not a Platform Strategy. The tool is how you express the strategy, not a substitute for one.
What I Recommend
Build one golden-path repo in two weeks: one service shape, CI, CD, Bicep, catalog-info.yaml. Adopt it on two real services, not pilots. Add the catalogue in an afternoon. Only ask whether you need Backstage at month three or four, once service count and team size give you an actual answer.
Earn the right to operate Backstage by first delivering value without it.

Comments