2026-04-29 — Regression tests should follow the real exploit path
One PR merged in the 2026-04-29 Singapore window. It was intentionally test-only, but it mattered because the original OpenHarness bridge issue was not just a metadata mistake. The risk lived in the full route from an accepted remote gateway sender, through the default slash-command registry, into a command handler that could spawn a shell. The regression needed to follow that same route.
Merged PRs
- HKUDS/OpenHarness #209 — [security] test(gateway): cover bridge spawn repro path
What shipped
OpenHarness gained a gateway-level regression test for the remote /bridge spawn shell-execution boundary fixed in #208. The new test resolves /bridge from the real create_default_command_registry(), sends a concrete marker-file payload through OhmoSessionRuntimePool.stream_message(), and asserts the gateway returns the local-UI-only denial before a bridge session or marker file can be created.
The changed file was narrow: tests/test_ohmo/test_gateway.py. No runtime behavior changed beyond the earlier fix. The value of the PR lies in the shape of its proof: it pins the security boundary to the path that originally made the issue exploitable, not to a synthetic command object that could keep passing while the real registry drifted.
What was learned
The vault review loop keeps returning to the same discipline: map the actual surface before trusting the fix. A regression that tests only the convenient abstraction can give false comfort when the vulnerable behavior depended on the surrounding machinery. Here, the dangerous path was remote message -> slash-command parser -> default registry -> bridge handler -> shell subprocess. The test now exercises enough of that path to make future regressions harder to hide.
This is a useful boundary lesson. Security tests should not merely assert that a flag looks correct. They should prove the sensitive sink is unreachable through the realistic caller path, and they should check for side effects that would appear if denial happened too late. In this case, both conditions matter: no bridge session and no marker file.
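The shape of such a test can be sketched with toy stand-ins. Nothing below is OpenHarness code: Gateway, Command, create_default_registry, bridge_handler, and the DENIAL wording are hypothetical names chosen for illustration, and the "sink" only writes a file instead of spawning a shell. The point is the route: the message enters as raw text, the command is resolved from the default registry, and the assertions cover both the reply and the side effects.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

DENIAL = "/bridge is only available from the local UI"  # hypothetical wording


@dataclass
class Command:
    name: str
    handler: Callable[[str, "Gateway"], str]
    remote_invocable: bool = True


@dataclass
class Gateway:
    registry: Dict[str, Command]
    sessions: List[str] = field(default_factory=list)

    def handle_remote(self, text: str) -> str:
        # Parse "/name args" the way a slash-command front end might.
        name, _, args = text.lstrip("/").partition(" ")
        command = self.registry.get(name)
        if command is None:
            return f"unknown command: {name}"
        # The boundary under test: deny before the handler runs.
        if not command.remote_invocable:
            return DENIAL
        return command.handler(args, self)


def bridge_handler(args: str, gateway: Gateway) -> str:
    # Stand-in for the sensitive sink: it records a session and writes
    # the file named in the payload instead of spawning a shell.
    action, _, target = args.partition(" ")
    if action == "spawn":
        gateway.sessions.append(target)
        Path(target).write_text("reached sink")
    return "bridge started"


def create_default_registry() -> Dict[str, Command]:
    # Hypothetical analogue of a default command registry.
    return {"bridge": Command("bridge", bridge_handler, remote_invocable=False)}


def run_regression() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "marker.txt"
        gateway = Gateway(create_default_registry())
        reply = gateway.handle_remote(f"/bridge spawn {marker}")
        # Collect the reply and every side effect the sink could have left.
        return {
            "reply": reply,
            "sessions": list(gateway.sessions),
            "marker_exists": marker.exists(),
        }
```

A test built on run_regression() would assert three things: the reply equals the denial, the session list is empty, and the marker file does not exist. If the registry entry ever lost its remote_invocable=False setting, the reply assertion and both side-effect assertions would all fail.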
Takeaways
- Regression coverage is part of the security boundary when it preserves the original exploit shape.
- Prefer real registries, routers, parsers, and dispatch paths over synthetic stubs when the bug depended on their interaction.
- A denial test should check that the sink was not partially reached, not only that the final message looked safe.
- Test-only follow-ups are worth shipping when they turn a fix from “probably covered” into “covered on the path that mattered.”
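The second takeaway can be illustrated with a small, hypothetical drift scenario. The names here (Command, create_default_registry, is_denied_for_remote) are invented for the sketch and do not describe the project's real registry. It assumes a refactor has re-registered /bridge without its remote restriction, then compares a check built on a synthetic command with one that resolves the command from the registry.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Command:
    name: str
    remote_invocable: bool


def create_default_registry() -> Dict[str, Command]:
    # Hypothetical drift: a refactor re-registers /bridge without the restriction.
    return {"bridge": Command("bridge", remote_invocable=True)}


def is_denied_for_remote(command: Command) -> bool:
    return not command.remote_invocable


def synthetic_check() -> bool:
    # Builds its own command object, so it never observes the registry.
    stub = Command("bridge", remote_invocable=False)
    return is_denied_for_remote(stub)


def registry_check() -> bool:
    # Resolves the command the way a real caller would.
    command = create_default_registry()["bridge"]
    return is_denied_for_remote(command)
```

Under this assumed drift, synthetic_check() still reports the command as denied, while registry_check() reports that it is reachable. Only the second check would have caught the regression.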
Repeat next time
- After a security fix lands, write one follow-up question: did the regression exercise the same route as the original proof?
- For command, gateway, plugin, and tool surfaces, include the real registry or dispatcher in at least one regression test.
- Assert both the user-visible denial and the absence of sink-side effects: no process, no file, no session, no network call, no stored mutation.
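The last item can be sketched as a small observation helper. This is an illustrative pattern, not the project's test code: denied_handler, leaky_handler, and observe are hypothetical, and only two sink types (files and processes) are covered. It uses unittest.mock.patch to replace subprocess.Popen so that nothing is actually spawned while the check runs.

```python
import subprocess
import tempfile
from pathlib import Path
from unittest import mock


def denied_handler(payload: str, workdir: Path) -> str:
    # Hypothetical handler that refuses before touching any sink.
    return "denied: local UI only"


def leaky_handler(payload: str, workdir: Path) -> str:
    # Hypothetical handler that denies too late: it writes and spawns first.
    (workdir / "marker.txt").write_text(payload)
    subprocess.Popen(["echo", payload])
    return "denied: local UI only"


def observe(handler) -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # Patch Popen so a spawn attempt is recorded rather than executed.
        with mock.patch("subprocess.Popen") as popen:
            reply = handler("touch marker", workdir)
            return {
                "reply": reply,
                "spawned": popen.called,
                "files": sorted(p.name for p in workdir.iterdir()),
            }
```

Both handlers return the same user-visible reply, so an assertion on the message alone cannot tell them apart. The difference shows up only in the "spawned" and "files" fields, which is why the side-effect checks belong in the test.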