Security Tests That Prove Themselves
Your security tests pass. Great. But when did they actually run? Against which code version? Can you prove it wasn’t last Tuesday’s build you’re showing?
Most security testing lives in Word documents, Postman exports, and screenshot folders on SharePoint. The tests themselves might be perfectly valid. The problem is traceability: there’s no systematic link between test execution and the code being validated.
CLI-based security testing changes this equation. Instead of tests that produce reports, you build tests that prove themselves. Every execution generates structured logs with timestamps, correlation IDs, and commit hashes. The evidence trail isn’t something you create after the fact. It’s a byproduct of running the tests.
This approach works whether you’re preparing for compliance reviews or simply want confidence that your security controls actually function in the code you’re about to deploy.
The Documentation Problem
Recognize this pattern?
Security_Test_Report_Q2_2024.docx
✓ Authentication bypass: Tried /admin without token, got 401
✓ SQL injection: Tried ' OR '1'='1, got error message
✓ Rate limiting: Sent 10 requests, got rate limited
✓ Authorization: User A couldn't access User B's data
Evidence: Screenshots in SharePoint
Next scheduled test: Q3 2024
The tests are valid. The evidence isn’t.
No repeatability: Manual tests run differently each time. Regression goes undetected.
No correlation: Tests run quarterly. Code deploys daily. The gap between “tested” and “deployed” grows with every sprint.
No traceability: Which deployment fixed which vulnerability? That question requires digging through months of documentation.
No automation: Security validation waits for team availability instead of running with every build.
The fix isn’t better documentation. It’s tests that document themselves.
Building Self-Documenting Security Tests
The approach uses xUnit with ASP.NET Core’s WebApplicationFactory. This combination lets you test your application in-memory without deploying to actual infrastructure. More importantly, it integrates seamlessly with CI/CD pipelines that capture structured output.
The key insight: every test should validate a specific security boundary and produce output that links execution to the code version being tested. You’re not writing tests that generate reports. You’re writing tests that generate evidence.
The Core Pattern
Authentication boundaries are the natural starting point. They’re well-understood, frequently attacked, and straightforward to validate. A test for unauthenticated access checks three things: the response code, the presence of proper authentication headers, and the absence of sensitive information in error messages.
using System.Net;
using System.Net.Http.Headers;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;

public class SecurityTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly HttpClient _client;

    public SecurityTests(WebApplicationFactory<Program> factory)
        => _client = factory.CreateClient();

    [Fact]
    [Trait("Category", "Security")]
    public async Task ProtectedEndpoint_NoToken_Returns401()
    {
        var response = await _client.GetAsync("/api/users/profile");

        Assert.Equal(HttpStatusCode.Unauthorized, response.StatusCode);

        // A compliant 401 advertises how to authenticate
        Assert.NotEmpty(response.Headers.WwwAuthenticate);

        // Error responses must not leak internal details
        var content = await response.Content.ReadAsStringAsync();
        Assert.DoesNotContain("database", content, StringComparison.OrdinalIgnoreCase);
        Assert.DoesNotContain("stack", content, StringComparison.OrdinalIgnoreCase);
    }

    [Fact]
    [Trait("Category", "Security")]
    public async Task CrossUserAccess_Returns403()
    {
        _client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", "token-for-userA");

        // User A attempts to access User B's data
        var response = await _client.GetAsync("/api/users/userB-id/profile");

        Assert.Equal(HttpStatusCode.Forbidden, response.StatusCode);
    }

    [Theory]
    [InlineData("' OR '1'='1")]
    [InlineData("<script>alert('xss')</script>")]
    [Trait("Category", "Security")]
    public async Task SearchEndpoint_MaliciousInput_Sanitized(string payload)
    {
        var response = await _client.GetAsync(
            $"/api/search?q={Uri.EscapeDataString(payload)}");

        if (response.IsSuccessStatusCode)
        {
            // Reflected input must come back encoded or stripped, never executable
            var content = await response.Content.ReadAsStringAsync();
            Assert.DoesNotContain("<script>", content);
        }
    }
}
The [Trait("Category", "Security")] attribute enables filtering. You can run dotnet test --filter "Category=Security" to execute only security tests, which is useful for CI/CD pipelines where you want security validation as a separate gate.
What Makes This Self-Documenting
The test output itself becomes evidence. When tests run in CI/CD, the execution context (commit hash, build number, timestamp) gets captured automatically in the pipeline logs. You don’t need to generate separate reports. The test run is the report.
For explicit logging, add a helper that writes structured output:
using System.Text.Json;

public static class SecurityTestLog
{
    public static void Write(string testName, bool passed)
    {
        var entry = new
        {
            Timestamp = DateTimeOffset.UtcNow,
            TestName = testName,
            Result = passed ? "PASS" : "FAIL",
            // CI injects the commit hash; "local" marks developer-machine runs
            CommitSha = Environment.GetEnvironmentVariable("GITHUB_SHA") ?? "local"
        };

        Console.WriteLine($"SECURITY_TEST: {JsonSerializer.Serialize(entry)}");

        // Save the entry to a file or database if needed
        // ...
    }
}
This structured output gets captured in CI/CD logs, creating a searchable history of every security test execution across every deployment.
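A test can then report its own outcome. As a sketch (the try/finally wrapper and the test name are illustrative, not a fixed convention):

[Fact]
[Trait("Category", "Security")]
public async Task ProtectedEndpoint_NoToken_Returns401_Logged()
{
    var passed = false;
    try
    {
        var response = await _client.GetAsync("/api/users/profile");
        Assert.Equal(HttpStatusCode.Unauthorized, response.StatusCode);
        passed = true;
    }
    finally
    {
        // Emits one SECURITY_TEST line per execution, pass or fail
        SecurityTestLog.Write(nameof(ProtectedEndpoint_NoToken_Returns401_Logged), passed);
    }
}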
Running Tests in CI/CD
The real value emerges when these tests run on every commit. In a CI/CD pipeline, each test execution automatically captures the commit hash, build number, and timestamp. This context transforms test results from “tests passed” into “tests passed for commit abc123 at 2024-06-15T14:32:00Z.”
A minimal GitHub Actions workflow runs security tests and preserves results:
- name: Run Security Tests
  run: dotnet test --filter "Category=Security" --logger "trx" --results-directory ./TestResults
  env:
    # Actions sets GITHUB_SHA automatically; declared here for explicitness
    GITHUB_SHA: ${{ github.sha }}

- name: Upload Results
  uses: actions/upload-artifact@v4
  if: always() # failed runs are evidence too
  with:
    name: security-results-${{ github.run_number }}
    path: ./TestResults/**/*.trx
    retention-days: 90
The 90-day retention creates a historical record. When someone asks “was this tested before deployment?” you can point to specific artifacts linked to specific commits. The evidence exists independently of any documentation someone might have written.
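Retrieval is scriptable too. Assuming the artifact naming from the workflow above, the GitHub CLI can pull the evidence for a specific run (the run ID and run number here are placeholders):

gh run download 1234567890 --name security-results-142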
What Changes
Once security tests run in CI/CD with proper artifact retention, several things shift.
Regression detection becomes automatic. A vulnerability fixed in March stays fixed. If code changes reintroduce it in September, the test fails immediately rather than waiting for the next quarterly review.
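As a sketch, a fixed finding can be pinned with a dedicated regression test; the finding reference and endpoint below are hypothetical:

[Fact]
[Trait("Category", "Security")]
public async Task Finding_2024_03_Idor_Invoices_StaysFixed()
{
    _client.DefaultRequestHeaders.Authorization =
        new AuthenticationHeaderValue("Bearer", "token-for-userA");

    // Guards the IDOR fixed in March: User A must never read User B's invoice
    var response = await _client.GetAsync("/api/invoices/userB-invoice-id");

    Assert.Equal(HttpStatusCode.Forbidden, response.StatusCode);
}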
The “tested vs. deployed” gap closes. When tests run on every pull request, the code being deployed is the code that was tested. No more hoping that the security validation from three weeks ago still applies to today’s release.
Evidence generation becomes passive. You’re not creating compliance documentation. The documentation creates itself as a byproduct of running tests. Pipeline logs, test artifacts, and commit history combine into an evidence trail that’s harder to fabricate than a Word document.
Security testing scales with development velocity. The team deploys five times a day? Security tests run five times a day. No bottleneck waiting for security team availability.
This matters for compliance reviews, certainly. But it also matters for the simpler question: “Do our security controls actually work in the code we’re shipping?” Automated tests answer that question continuously, not quarterly.
Where to Start
Begin with authentication tests. They validate the most commonly attacked boundary and demonstrate the pattern clearly. Use WebApplicationFactory<TProgram> to test your ASP.NET Core application in-memory. This requires no deployed infrastructure and runs fast enough for CI/CD feedback loops.
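Two wiring details are easy to miss: WebApplicationFactory ships in the Microsoft.AspNetCore.Mvc.Testing NuGet package, and when the application uses top-level statements, the compiler-generated Program class is internal, so the test project can’t reference WebApplicationFactory<Program> directly. The documented fix is a one-line partial declaration at the bottom of Program.cs:

// Program.cs — exposes the implicit Program class to the test project
public partial class Program { }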
Organize tests with [Trait("Category", "Security")] from the start. This enables running security tests separately from unit tests, which is useful when you want security validation as a distinct pipeline gate. For teams seeking a cleaner approach, the open-source NetEvolve.Extensions.XUnit.V3 package provides standardized attributes such as [IntegrationTest] and [AcceptanceTest]; sibling packages offer the same attributes for NUnit, MSTest, and TUnit, all filterable with the same dotnet test --filter TestCategory=... syntax.
Configure artifact retention for at least 90 days. Shorter retention means you can’t demonstrate testing history during compliance reviews. Longer retention costs storage but provides deeper history.
Start small. Three or four authentication tests that run on every commit provide more value than 50 tests that run quarterly. The goal is continuous validation, not comprehensive coverage on day one.
The Shift
Manual testing produces documents. Automated testing produces evidence.
When someone asks “How do you verify security testing?” the answer changes. Instead of pointing to a quarterly report, you point to 847 test executions across 23 deployments, each linked to a specific commit and preserved in pipeline artifacts.
Security professionals still define what to test. The automation handles execution, logging, and retention. The result: security validation that runs continuously and proves itself without requiring anyone to write a report.