Observability and Monitoring

Observability is the practice of understanding what’s happening inside complex distributed systems by examining their external outputs. Unlike traditional monitoring that asks “is this thing broken?”, observability enables teams to ask arbitrary questions about system behavior without predicting failure modes in advance.

This collection explores observability patterns for .NET applications, including structured logging with modern logging frameworks, distributed tracing across microservices, metrics collection and visualization, and correlation strategies that connect data across telemetry signals. Topics address the practical implementation of OpenTelemetry, integration with observability platforms, and strategies for managing telemetry data volumes.

The articles examine how to instrument applications effectively, balance observability costs with benefits, and build systems that provide meaningful insights during incidents. The focus is on understanding what makes systems truly observable and how to implement observability practices that support rapid debugging and system understanding in production environments.

Privacy Health Checks: Beyond Database Connectivity

Privacy Health Checks: Beyond Database Connectivity

Your health checks verify database connectivity every 30 seconds. Great. But do they know that 15% of your users have expired consents? Privacy compliance isn’t a documentation exercise—it’s an operational discipline. Same IHealthCheck interface, different questions. Two queries, one ratio, three possible outcomes. Here’s how to build privacy health checks that turn audit questions into dashboard demos.
Green Dashboard, Dead Application

Green Dashboard, Dead Application

Your application just crashed in production. Azure App Service kept routing traffic to the failing instance for ninety seconds. Users saw timeouts. Your monitoring dashboard stayed green because the web server responded with HTTP 200 while the database connection pool was exhausted.

I’ve watched this exact scenario play out at three different organizations in the past year. Each time, the post-mortem revealed the same root cause: health checks that verified the process was breathing without checking whether it could actually do its job. ISO/IEC 27001 Control A.17.2.1 exists precisely for this reason—availability is a security control, not an operational afterthought.

Observability in AKS CNI Overlay: When Pod IPs Hide Behind Nodes

Observability in AKS CNI Overlay: When Pod IPs Hide Behind Nodes

CNI Overlay masks pod IPs behind node IPs through SNAT, breaking traditional observability. Network logs show nodes, application logs show pods. Without Container Insights, correlation IDs, and distributed tracing, you’re debugging blind. SNAT port exhaustion mimics network failures, and timestamp-based correlation is fragile. The cost of proper monitoring is trivial compared to debugging outbound connectivity at 3 AM without visibility.
Audit Logging That Survives Your Next Security Incident

Audit Logging That Survives Your Next Security Incident

Your audit logs probably won’t survive a real security incident. Most implementations log too much, protect too little, and provide zero value when something breaks at 2 AM. Here’s how to fix that with structured logging that actually works.
Why Your Logging Strategy Fails in Production

Why Your Logging Strategy Fails in Production

Let me tell you what I’ve learned over the years from watching teams deploy logging strategies that looked great on paper and failed spectacularly at 3 AM when production burned.

It’s not that they didn’t know the theory. They’d read the Azure documentation. They’d seen the structured logging samples. They’d studied distributed tracing. The real problem was different: they knew what to do but had no idea why it mattered until production broke catastrophically.