Structured Logging Patterns That Actually Survive Production

Structured Logging Patterns That Actually Survive Production

The previous post was a tour of the ways ILogger lies to you. This one is the answer to the question I always get afterwards: fine, so what should the architecture actually look like? This is the structured logging setup I trust in production. Not the simplest one, not the cleverest one. The one that survives a rollout, a sink outage, and a Friday-evening incident review.

It is opinionated. It will not match the defaults that any ASP.NET template hands you. That is the point: the defaults are how we got into the mess in the first place.

The Mental Model: Logs Are Events, Not Strings

Every problem this post addresses dissolves the moment you stop treating logs as strings. A log entry is an event with:

  • A name (the message template),
  • A set of typed properties (the arguments),
  • A context (scopes, activity, baggage),
  • A severity,
  • And, ideally, an identity that lets you correlate it across services.

The string is a rendering. Useful for a human reading a single line, useless for the system asking what happened to request X?

What This Means in Code

// Bad: a log entry that throws away its structure.
_logger.LogInformation($"Order {orderId} accepted for user {userId}");

// Good: an event with two structured properties and a stable name.
_logger.LogInformation(
    "Order accepted: {OrderId} {UserId}",
    orderId, userId);

The second form gives you searchable, aggregatable properties. The first form is a string. Your sink does not know what is in it.

Pattern 1: LoggerMessage Source Generators on Every Hot Path

The [LoggerMessage] attribute and its source generator have been part of Microsoft.Extensions.Logging since .NET 6. If a log call sits in a request path, a loop, or a per-item handler, it deserves a LoggerMessage definition.

The cost is one extra method declaration. The benefit is correctness (no interpolation surprises), performance (zero allocation when the level is filtered), and refactor safety (the template is checked at compile time). This is how [LoggerMessage] looks in practice:

public static partial class OrderLog
{
    [LoggerMessage(
        EventId = 1001,
        Level = LogLevel.Information,
        Message = "Order accepted: {OrderId} {UserId}")]
    public static partial void OrderAccepted(
        this ILogger logger, string orderId, string userId);

    [LoggerMessage(
        EventId = 1002,
        Level = LogLevel.Warning,
        Message = "Payment retry {RetryCount} for order {OrderId}")]
    public static partial void PaymentRetry(
        this ILogger logger, int retryCount, string orderId);

    [LoggerMessage(
        EventId = 1003,
        Level = LogLevel.Error,
        Message = "Order {OrderId} failed: {Reason}")]
    public static partial void OrderFailed(
        this ILogger logger, string orderId, string reason, Exception exception);
}

Usage at the call site is clean and carries no overhead when Information is filtered out in production:

OrderLog.OrderAccepted(_logger, orderId, userId);
OrderLog.PaymentRetry(_logger, retryCount, orderId);
OrderLog.OrderFailed(_logger, orderId, reason, ex);

Two things worth emphasizing. First: event IDs matter. They give you a stable numeric contract that survives rewording of the message template, language changes, and log sink upgrades. Define them in a central constants class and own them the way you own an API version. Second: the source generator enforces that parameter names match template placeholders at compile time. You will not ship a mismatched log call that silently renders wrong.

For anything that is not called frequently, direct ILogger calls with message templates are fine. Reserve source generators for hot paths. But if you have not gone through your codebase and identified which paths are hot, start there.

Pattern 2: Scopes Are for Context, Not for Strings

Scopes that hold properties are useful. Scopes that hold strings are decoration. The distinction matters because the right scope format is what enables structured enrichment across log entries inside a logical operation.

Make every scope a property bag:

using (_logger.BeginScope(new Dictionary<string, object?>
{
    ["CorrelationId"] = correlationId,
    ["TenantId"] = tenantId,
    ["OrderId"] = orderId
}))
{
    // every log entry inside carries these three properties
    OrderLog.OrderAccepted(_logger, orderId, userId);
}

Now for the part the documentation buries: scope capture requires explicit opt-in per provider. The table below shows what each common provider does by default and what you need to change:

ProviderScopes by defaultWhat you need
ConsoleOffIncludeScopes: true in config, or AddSimpleConsole(o => o.IncludeScopes = true) in code
SerilogCaptured when using Serilog.Extensions.LoggingNo extra step; scope key-value pairs flow into Serilog event properties automatically
Application Insights (SDK)OnScope properties appear in customDimensions; no extra step for the ILogger bridge
OpenTelemetry (AddOpenTelemetry)OffSet IncludeScopes = true on OpenTelemetryLoggerOptions

The second thing you need to wire is ActivityTrackingOptions. The logging framework can automatically inject trace context from the current Activity into every log entry as an implicit scope. In ASP.NET Core 6 and later, SpanId, TraceId, and ParentId are enabled by default. Baggage and Tags are not:

builder.Logging.Configure(options =>
{
    options.ActivityTrackingOptions =
        ActivityTrackingOptions.SpanId
        | ActivityTrackingOptions.TraceId
        | ActivityTrackingOptions.ParentId
        | ActivityTrackingOptions.Baggage;
});

Combined with IncludeScopes on your provider, this is the configuration that makes TraceId appear in every log entry automatically. You do not write it manually. The framework reads it from the current Activity.

Verify scope capture before you depend on it in production: query your actual log aggregation system for a scope property on a request you know ran. If the property is missing, your scope configuration is not complete.

Pattern 3: One Correlation ID, Threaded End-to-End

A correlation ID that lives in one service is not a correlation ID. It is a local label. The point is propagation across process boundaries.

Before getting into the mechanism, the distinction between Tags and Baggage deserves a sentence: Activity.SetTag stores data locally on the current span. It does not cross service boundaries. Activity.AddBaggage stores data in the W3C baggage header, which HttpClient propagates automatically to downstream services in .NET 5 and later. If you are attaching a business correlation ID and expecting it to show up in downstream logs, it needs to be baggage, not a tag. This is exactly the kind of thing that looks right in development (both are visible locally) and fails silently in production (downstream services never receive the tag).

Here is how to thread a business correlation ID end-to-end over HTTP:

// Inbound HTTP middleware: read the business correlation ID from the request,
// attach it to the current Activity as baggage so it propagates outbound.
app.Use(async (context, next) =>
{
    var correlationId = context.Request.Headers["X-Correlation-Id"].FirstOrDefault()
        ?? Activity.Current?.GetBaggageItem("correlation.id")
        ?? Guid.NewGuid().ToString("N");

    Activity.Current?.AddBaggage("correlation.id", correlationId);

    // Also push it into the log scope for structured enrichment
    using (_logger.BeginScope(new Dictionary<string, object?>
    {
        ["CorrelationId"] = correlationId
    }))
    {
        context.Response.Headers["X-Correlation-Id"] = correlationId;
        await next();
    }
});

On the outbound side, HttpClient in .NET 5 and later propagates the W3C traceparent and baggage headers by default via DistributedContextPropagator. Downstream services receive both and can read the correlation ID from Activity.Current?.GetBaggageItem("correlation.id") without additional wiring. One caveat: if you add OpenTelemetry and configure a custom propagator that does not include BaggagePropagator, the baggage header is silently dropped. The default CompositeTextMapPropagator in the OTel .NET SDK includes both TraceContextPropagator and BaggagePropagator, so you get baggage propagation unless you replace it explicitly. If your downstream services are not receiving the correlation ID, check your propagator chain before blaming the middleware.

For message-bus scenarios (Service Bus, RabbitMQ, Kafka), the principle is the same but the transport layer differs. Write the correlation ID into message application properties when sending. Read it and call Activity.Current?.AddBaggage(...) when receiving, before processing begins. Do not regenerate the ID at the receiving end.

When this is wired correctly, your sink lets you trace a single logical operation across services using a single query. The trace already exists; you just need the field to be there.

If you prefer not to wire this manually, NetEvolve.Http.Correlation.AspNetCore handles the inbound/outbound header propagation and scope enrichment as a drop-in middleware, useful when you want the behavior without owning the implementation.

Pattern 4: Levels Are a Contract, Not a Preference

A team that uses LogInformation for everything pays for the sink to ingest noise and gets nothing useful in return. A team that uses levels deliberately produces a signal an SRE can configure once and trust. The six levels are a contract with your operations team:

LevelWhen to useConcrete example
TraceStep-by-step execution details for development debugging; never in production"Entering ApplyDiscount with rate {Rate} for tier {Tier}"
DebugUseful temporarily during troubleshooting; filtered off by default in production"Cache miss for product {ProductId}, fetching from database"
InformationState transitions with business meaning and long-term value"Order {OrderId} accepted for customer {CustomerId}"
WarningRecovered from an unexpected condition; worth reviewing"Payment retry {RetryCount} for order {OrderId}; provider returned {StatusCode}"
ErrorOperation failed; this request did not complete"Order {OrderId} processing failed; payment gateway returned {ErrorCode}"
CriticalSystem integrity at risk; page someone now"Database connection pool exhausted; all {PoolSize} connections in use"

The test for whether a call belongs at a given level: describe the event in one of those terms. If you cannot, the call is either at the wrong level or should not exist.

The level I see abused most in production is Warning. Teams treat it as a soft Error: they know something went wrong, they do not want to fire an alert, so they log at Warning and move on. Six months later they have 40,000 warnings per day, none of them reviewed, and the dashboard that was supposed to catch real issues is useless because nobody looks at it anymore. I have inherited this situation in two separate projects. In both cases the root cause was the same: no one ever asked “if this fires at 2am, does someone need to act?” That is the right question for Warning. If the answer is no, it belongs at Information. If the answer is yes, it belongs at Error.

Error has its own trap. I see teams log Error for caught and handled exceptions: a validation failure, a 404, a retry that eventually succeeded. Those are not errors in the operational sense. They do not represent a broken request. When your Error stream includes handled conditions, your alert threshold creeps up to absorb the noise, and then a real failure at 3am does not page anyone. The definition in the table is intentional: if the request completed, it is not an Error.

Production defaults: filter Trace and Debug at the sink level. Configure alerts on Error count per minute. Configure pages on Critical. Warning goes to a dashboard that someone reviews daily. Information is your audit trail.

If your Information volume is high enough to worry about cost, that is a signal you are over-logging at Information, not a signal to raise the minimum level globally. Fix the categorization first.

Pattern 5: The Sink Is Part of the Architecture

Most teams pick a sink, configure it once, and never look at it again until the bill arrives. The sink is part of the architecture. It dictates what your queries look like, how expensive your logs are, and how fast you can recover during an incident.

Here is a realistic comparison of the four most common production options:

SinkStructured fieldsWhat it demandsQuery example
Azure Application InsightsYes, in customDimensionsApplicationInsights SDK or Microsoft.Extensions.Logging.ApplicationInsights; ILogger bridge enabledtraces | where customDimensions.OrderId == "ord-123"
Grafana LokiYes, as labels and line attributesLoki sink (e.g., Serilog.Sinks.Grafana.Loki); low-cardinality fields as labels only{app="api"} | json | OrderId = "ord-123"
Elasticsearch / OpenSearchYes, mapped fieldsNEST or Serilog.Sinks.Elasticsearch; correct index mapping; template for new indicesField query on OrderId.keyword
SeqYes, first-class structured propertiesSerilog.Sinks.Seq or direct HTTP API; minimal configuration overheadOrderId = "ord-123"

The practical differences matter more than the table implies. Seq has the lowest time-to-useful-query for a small to medium team. Elasticsearch gives you the most query power but requires the most operational investment. Loki is cost-efficient at scale if you accept its query model. Application Insights integrates with Azure end-to-end and is the right choice if you are already in Azure and not optimizing for cost.

One thing the table does not capture: Loki’s label model will surprise you if you come from a Serilog background. I watched a team put order IDs, user IDs, and request IDs into Loki labels because that is where they put enrichers in Serilog. Labels in Loki are supposed to be low-cardinality: app name, environment, log level. Each unique combination of label values becomes a separate log stream. Dump high-cardinality identifiers into labels and you get stream explosion: thousands of streams, index bloat, ingestion pressure, and eventually you hit Loki’s per-tenant stream limits. High-cardinality fields belong in the log line as structured JSON, where | json | OrderId = "ord-123" extracts them at query time. The query model is not better or worse than Elasticsearch, but it is different enough that you need to design for it before you ship, not after you notice the ingestion errors.

Whichever you pick, tune it. Sample high-volume categories. Drop health check noise before it reaches the sink. Define what a queryable property is before you ship, and confirm it arrives as a field in your aggregation system. The test takes five minutes: find a production log entry written with a named parameter and check whether the parameter appears as a separate field. If it does not, you have a sink configuration problem, not a code problem.

Pattern 6: OpenTelemetry Logs Where You Can

The OpenTelemetry Logs signal in .NET is stable as of .NET 8, via the OpenTelemetry NuGet package and its AddOpenTelemetry() ILogger integration. The pitch is real: your application code stays bound to ILogger, the sink destination is configured in one place, and you can switch exporters without touching application code.

The setup for a new ASP.NET Core application is straightforward:

builder.Logging.AddOpenTelemetry(logging =>
{
    logging.IncludeScopes = true;
    logging.SetResourceBuilder(
        ResourceBuilder.CreateDefault().AddService("order-api"));
    logging.AddOtlpExporter(); // sends to collector on localhost:4317 by default
});

Greenfield services should use this path. You get unified tracing and logging correlation automatically. TraceId and SpanId appear in every log entry because the OTel SDK populates them from the active span. Your logs and traces are correlated without additional ActivityTrackingOptions configuration.

I did not always reach for OTel Logs by default. For several years my answer to “how should we set up logging?” started with Serilog. The ecosystem was mature, the sink selection was broad, the enricher model worked well with ILogger. The moment I reconsidered was not a single incident; it was the third time I had to explain to a team why their Serilog MinimumLevel configuration and their appsettings.json Logging:LogLevel section were fighting each other, and why traces and logs were living in separate, unlinked systems. Once I started building with OTel traces enabled from day one, the correlation just worked. Logs and traces shared the same TraceId without any extra wiring. That changed the default for new services.

For existing services with a Serilog investment, the calculus is different. Replacing Serilog with the OTel logs bridge is not free. The better path is usually to add an OTLP (OpenTelemetry Protocol) sink to your existing Serilog pipeline (Serilog.Sinks.OpenTelemetry) and export to a collector that fans out to whatever backends you need. That gives you the decoupling benefit without the migration cost.

The place where OTel Logs is not yet worth the overhead is if your sole destination is Application Insights. The Azure Monitor OTel exporter works, but the native Application Insights SDK still has more telemetry features and tighter Azure integration. Keep an eye on the Azure Monitor distro for .NET; the gap is narrowing.

What I Do Not Recommend

Logging the full request body. Capture the properties that matter using named parameters, not a JsonSerializer.Serialize call on the whole object. Full body serialization costs allocations on every request, risks logging PII you did not intend to expose, and produces log entries that are expensive to store and useless to query.

Catching exceptions just to log them. If a middleware or background service handler higher up logs the exception with full context, the inner catch { _logger.LogError(ex, ...); throw; } adds a duplicate entry and noise. Log once, at the right level, with the right context. Let exceptions propagate unless you genuinely handle them.

Custom log output formats. The point of structured logging is the structure. A custom string format, however readable it looks in a terminal, throws away the properties and gives you substring matching at query time. Use the JSON output your sink expects and build dashboards on fields.

Maintaining two minimum level configurations. The Serilog dual-config trap from the previous post bears repeating here: if you use UseSerilog(), the Logging:LogLevel section in appsettings.json is ignored. Pick one authoritative source and delete the other.

Comments

VG Wort