Mutation Testing
We intentionally inject bugs into our plugins to verify that the test suite catches them. The point isn't to make plugins fail; it's to prove that our tests work.
What is Mutation Testing?
Mutation testing systematically injects bugs (“mutations”) into source code to verify that the test suite detects them. If a test suite passes despite a bug being present, you've found a blind spot in your testing.
Think of it as testing the tests. A comprehensive test suite should catch every realistic bug we can inject.
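To make "testing the tests" concrete, here is a minimal C++ sketch of the core loop. It assumes a hypothetical per-sample `Processor` type and treats a test suite as a function that returns `true` when all checks pass; none of these names come from BPTS. A mutant that still passes the suite "survives" and marks a blind spot. A mutant that makes the suite fail is "killed."

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-sample processor signature, used only for illustration.
using Processor = std::function<float(float)>;

// A mutant is a deliberately broken copy of the real processor.
struct Mutant {
    std::string name;
    Processor process;
};

// A test suite returns true when all of its checks pass for a processor.
using TestSuite = std::function<bool(const Processor&)>;

// Runs the suite against every mutant and returns the names of survivors.
// A mutant is "killed" when the suite fails on it; a survivor is a blind spot.
std::vector<std::string> survivingMutants(const Processor& original,
                                          const std::vector<Mutant>& mutants,
                                          const TestSuite& suite) {
    // The unmutated code must pass first, otherwise failures prove nothing.
    assert(suite(original) && "suite must pass on the unmutated processor");
    std::vector<std::string> survivors;
    for (const auto& m : mutants) {
        if (suite(m.process)) survivors.push_back(m.name);
    }
    return survivors;
}

// Example of a weak suite: it only checks that outputs are finite.
bool finiteOnlySuite(const Processor& p) {
    for (float x : {-1.0f, -0.5f, 0.0f, 0.5f, 1.0f})
        if (!std::isfinite(p(x))) return false;
    return true;
}
```

With a DC-offset mutant and a gain mutant, `finiteOnlySuite` lets both survive, while a suite that compares outputs against the original kills both. The baseline assertion matters: a suite that already fails on unmutated code tells you nothing about the mutants.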
Recent Campaign: BwsPressure v1.0.0
- Date: January 30, 2026
- Target: PressureProcessor.cpp
- Mutations planned: 10
- Mutations completed: 2
- Gaps identified: 2
- Remaining: 8 pending
Identified Gaps
Gap #1: DC Offset Detection Missing (HIGH)
Mutation: Injected a +0.1 DC offset into the output buffer at PressureProcessor.cpp:2009.
Result: All 6 stress test cases (14 assertions) passed, including “DC Input Handling,” which should have failed.
Root cause: StressTests::validateBuffer() in StressTests.h:619–663 only checks for NaN, Inf, and clipping (values > maxAllowedValue, default 10.0). It does not measure DC offset at all. A +0.1 offset is not NaN, not Inf, and not greater than 10.0, so it passes.
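A minimal sketch of the checks described above, written as a hypothetical stand-in (`validateBufferSketch`) rather than the actual `StressTests::validateBuffer()` code. It assumes "clipping" means a sample magnitude above `maxAllowedValue`. Nothing in it looks at the buffer's average level, which is why a constant +0.1 offset sails through.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical re-creation of the checks described for
// StressTests::validateBuffer(): NaN, Inf, and clipping only.
// Assumes "clipping" means |sample| > maxAllowedValue.
bool validateBufferSketch(const std::vector<float>& buffer,
                          float maxAllowedValue = 10.0f) {
    assert(maxAllowedValue > 0.0f);
    for (float s : buffer) {
        if (std::isnan(s) || std::isinf(s)) return false;  // non-finite sample
        if (std::fabs(s) > maxAllowedValue) return false;  // "clipped" sample
    }
    return true;  // no check on the buffer's mean (DC) level
}
```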
Impact: DC offset can damage speakers, cause clicks and pops when the plugin is bypassed, and accumulate across processing chains. The gap affects every plugin that uses BPTS stress testing.
Fix status: DC offset measurement has been added to the BPTS framework roadmap (v2.2.0). It will verify that residual DC in the output stays below 0.001 (−60 dBFS).
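One way such a check could work, sketched under assumptions: DC offset is measured as the mean of the output buffer, and the 0.001 limit is applied to its absolute value. The function names are hypothetical and not part of BPTS. The `toDbfs` helper shows where −60 dBFS comes from: 20 · log10(0.001) = −60.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mean sample value of the buffer, i.e. its DC component.
double measureDcOffset(const std::vector<float>& buffer) {
    assert(!buffer.empty());
    double sum = 0.0;  // accumulate in double to limit rounding error
    for (float s : buffer) sum += s;
    return sum / static_cast<double>(buffer.size());
}

// Converts a linear amplitude to dBFS (1.0 == 0 dBFS; 0.0 gives -infinity).
double toDbfs(double linear) {
    return 20.0 * std::log10(std::fabs(linear));
}

// Hypothetical check: passes when residual DC is below maxDc (0.001 == -60 dBFS).
bool passesDcCheck(const std::vector<float>& buffer, double maxDc = 0.001) {
    return std::fabs(measureDcOffset(buffer)) < maxDc;
}
```

A plain mean is only reliable when the buffer spans many whole cycles of the lowest frequency in the test signal. A production check would likely average over a longer window or several buffers.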
Gap #2: Sample-Level Artifact Detection Missing (HIGH)
Mutation: Injected a 20 dB gain spike (10x amplitude) at sample 256 in PressureProcessor.cpp:2009.
Result: All 5 artifact tests (22 assertions), all 6 stress tests, and all 3 frequency tests passed. Zero failures were detected.
Root cause: Artifact tests check for parameter-induced artifacts (Mix smoothing, mode transitions) but don't analyze the output buffer for sudden level changes at the sample level. The 10x spike is still below the clipping threshold (10.0 default in validateBuffer), and no sample-by-sample gradient checking exists.
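The arithmetic behind this gap, assuming the test signal is normalized so that full scale is ±1.0 (the section does not state the signal level):

$$
G_{\mathrm{dB}} = 20\log_{10}\!\left(\frac{A_{\text{spike}}}{A_{\text{normal}}}\right) = 20\log_{10}(10) = 20\ \text{dB}
$$

$$
|x_{\text{spike}}| = 10\,|x[256]| \le 10 \times 1.0 = 10.0 \quad\Longrightarrow\quad |x_{\text{spike}}| \not> \texttt{maxAllowedValue} = 10.0
$$

So a 10x spike on any in-range sample lands at or below 10.0 and never trips a strictly-greater-than clipping check.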
Real-world bugs this gap would miss: filter instabilities causing impulses, saturation/clipping bugs introducing glitches, denormal handling failures causing spikes, buffer corruption or uninitialized memory reads, and race conditions causing sample dropouts.
Fix status: Sample-level discontinuity detection is on the roadmap (Q2 2026). It will flag sudden level changes of more than 6 dB between consecutive samples.
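A minimal sketch of a discontinuity detector, with hypothetical names and defaults. It deliberately deviates from a literal sample-to-previous-sample comparison: it compares each sample against the peak of the preceding window, because neighbouring samples of an ordinary sine differ by far more than 6 dB near every zero crossing. The 6 dB threshold becomes a linear ratio of 10^(6/20) ≈ 2.0, and a noise floor keeps tiny fluctuations in near-silence from being flagged.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns indices of samples whose magnitude exceeds the peak of the
// preceding `window` samples by more than `thresholdDb`.
// The first `window` samples are not checked (no history yet).
std::vector<std::size_t> findDiscontinuities(const std::vector<float>& buffer,
                                             std::size_t window = 64,
                                             float thresholdDb = 6.0f,
                                             float noiseFloor = 1.0e-3f) {
    assert(window > 0 && thresholdDb > 0.0f && noiseFloor > 0.0f);
    const float maxRatio = std::pow(10.0f, thresholdDb / 20.0f);  // 6 dB ~ 2.0x
    std::vector<std::size_t> hits;
    for (std::size_t n = window; n < buffer.size(); ++n) {
        // Envelope = peak of recent history, never below the noise floor.
        float envelope = noiseFloor;
        for (std::size_t k = n - window; k < n; ++k)
            envelope = std::max(envelope, std::fabs(buffer[k]));
        if (std::fabs(buffer[n]) > envelope * maxRatio) hits.push_back(n);
    }
    return hits;
}
```

With a 0.5-amplitude 1 kHz sine at 48 kHz and the Gap #2 mutation (a 10x spike at sample 256), this flags only sample 256. The window must cover at least one period of the lowest test frequency. The nested loop is O(N × window), which is acceptable for test buffers; a streaming version would track the window peak incrementally.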
Full Test Matrix
| # | Mutation | Test Suite | Status |
|---|---|---|---|
| 1 | DC Offset Injection | Artifact Tests | GAP |
| 2 | Gain Spike | Artifact Tests | GAP |
| 3 | Break Oversampling | Frequency Tests | Pending |
| 4 | Parameter Range | Characterization | Pending |
| 5 | APVTS Sync | GUI Tests | Pending |
| 6 | NaN Injection | BOL Gate | Pending |
| 7 | Latency Report | Integration Tests | Pending |
| 8 | Undo Manager | Integration Tests | Pending |
| 9 | Memory Leak | Marathon Tests | Pending |
| 10 | RT Violation | Artifact Tests | Pending |
Why We Publish Test Gaps
Most companies hide their testing weaknesses. We publish them because transparency matters more than appearing perfect.
These gaps don't mean the plugins have bugs. They mean our test framework has blind spots that we're actively fixing. The plugins passed every test we ran; we just discovered that we need better tests.
If you find a bug our tests missed, we genuinely want to know. That's how we improve the framework for future plugins.
Case Study: DC Offset Detection (January 2026)
Mutation testing revealed that our framework didn't validate DC offset, a critical gap because DC offset can damage speakers. We added the test within 48 hours, re-validated all plugins, and updated the testing standard. This is why we publish our methodology: finding gaps improves the process.