Observability and Monitoring

Observability is the practice of understanding what’s happening inside complex distributed systems by examining their external outputs. Unlike traditional monitoring that asks “is this thing broken?”, observability enables teams to ask arbitrary questions about system behavior without predicting failure modes in advance.

This collection explores observability patterns for .NET applications, including structured logging with modern logging frameworks, distributed tracing across microservices, metrics collection and visualization, and correlation strategies that connect data across telemetry signals. Topics address the practical implementation of OpenTelemetry, integration with observability platforms, and strategies for managing telemetry data volumes.

The articles examine how to instrument applications effectively, weigh observability costs against benefits, and build systems that provide meaningful insights during incidents. The focus is on what makes a system truly observable and on practices that support rapid debugging in production environments.

Observability in AKS CNI Overlay: When Pod IPs Hide Behind Nodes

CNI Overlay masks pod IPs behind node IPs through SNAT for traffic leaving the cluster, which breaks traditional observability: network logs show nodes, application logs show pods. Without Container Insights, correlation IDs, and distributed tracing, you’re debugging blind. SNAT port exhaustion mimics network failures, and timestamp-based correlation is fragile. The cost of proper monitoring is trivial compared to debugging outbound connectivity at 3 AM without visibility.
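
To make the correlation ID idea concrete, here is a minimal sketch of attaching one ID to every log line for a request and forwarding it on outbound calls. It is written in Python for brevity rather than .NET, and every name in it (`correlation_id`, `build_logger`, `handle_request`, the `orders` logger, the `X-Correlation-ID` header) is an assumption chosen for illustration, not something taken from the article.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the correlation ID for the request currently being handled.
# The variable name and the "-" default are illustrative assumptions.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Copies the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emits one JSON object per log line so fields can be queried later."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", "-"),
        })


def build_logger(stream):
    """Builds a logger that writes structured lines to the given stream."""
    logger = logging.getLogger("orders")  # hypothetical service name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(CorrelationFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, incoming_id=None):
    """Reuses the caller's ID if present, otherwise generates a new one.

    Returns the headers to send on outbound calls so the next service
    logs the same ID. The header name is a common convention, assumed here.
    """
    cid = incoming_id or uuid.uuid4().hex
    token = correlation_id.set(cid)
    try:
        logger.info("calling payment service")
        return {"X-Correlation-ID": cid}
    finally:
        correlation_id.reset(token)
```

With the same ID in the application log and in the outbound request headers, a log line can be matched to a downstream call by an exact key rather than by comparing timestamps across sources.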

Audit Logging That Survives Your Next Security Incident

Your audit logs probably won’t survive a real security incident. Most implementations log too much, protect too little, and provide zero value when something breaks at 2 AM. Here’s how to fix that with structured logging that actually works.

Why Your Logging Strategy Fails in Production

Let me tell you what I’ve learned over the years from watching teams deploy logging strategies that looked great on paper and failed spectacularly at 3 AM when production burned.

It’s not that they didn’t know the theory. They’d read the Azure documentation. They’d seen the structured logging samples. They’d studied distributed tracing. The real problem was different: they knew what to do but had no idea why it mattered until production broke catastrophically.