P0 hot-fix · v0.4.2

skylakegrep 0.4.2 — P0 hot-fix: KeyError 'snippet' on every escalated query

TL;DR

Hot-fix. 0.4.0 / 0.4.1 shipped a production crash (KeyError: 'snippet') that fired on every CLI query that both escalated and received candidates from graph expansion. Users hit this whenever the cascade fell off the cheap path (~20 % of queries). Upgrading is mandatory.

The bug: 0.4.0's _expand_via_reference_graph() returned result dicts without a snippet field. The CLI's merge_results() reads result["snippet"] for every dict in the rerank pool. Crash. Hidden by tests that called cascade_search directly, never through the CLI's full path.
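
The failure mode can be reproduced in a few lines. The sketch below uses hypothetical stand-ins (`expand_without_snippet` and a simplified `merge_results`), not the real skylakegrep functions. It assumes only what is described above: the merge step reads `result["snippet"]` for every dict in the pool.

```python
# Hypothetical stand-ins for illustration; not the real skylakegrep code.

def expand_without_snippet(kept):
    """Mimic the 0.4.0 graph-expansion output: dicts with no "snippet" key."""
    return [{"path": p, "score": s} for p, s in kept]


def merge_results(pool):
    """Simplified merge: reads result["snippet"] for every dict in the pool."""
    return [(r["path"], r["score"], r["snippet"]) for r in pool]


kept = [("storage.py", 0.91)]
candidates = expand_without_snippet(kept)

# A test that stops here passes: the dicts look fine in isolation.
assert candidates[0]["path"] == "storage.py"

# The crash only appears once the dicts reach the merge step.
try:
    merge_results(candidates)
    crashed = False
except KeyError as exc:
    crashed = True
    missing_key = exc.args[0]
```

A test that asserts only on `candidates` never sees the problem, which matches how the bug stayed hidden.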

Compatibility

  • Python: unchanged — 3.9+
  • Index format: unchanged
  • Default behaviour for cheap-path queries: unchanged from 0.4.x
  • Escalated queries no longer crash — the only change

How it shipped past tests

This is the fourth receipt of the same anti-pattern in 0.3.x → 0.4.x:

Receipt      | What was tested                           | What was missed
0.3.0        | by-construction arguments                 | actual recall
0.4.0        | synthetic 3-file unit tests               | real-corpus behaviour
0.4.1        | direct cascade_search API call            | the CLI merge_results path that consumes the dicts
0.4.2 (this) | actual skygrep search CLI on a real index | (this is the fix)

Auto-memory rule added: feedback_test_through_actual_user_path.md — every release MUST exercise skygrep search (the user's actual entry point) on a real index, with at least one query that escalates.
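
One way such a release check could look is a small wrapper that runs the CLI as a subprocess and fails on any crash. This is a sketch, not the project's actual test harness: `run_cli_smoke` is a hypothetical helper, and the only CLI invocation assumed is `skygrep search <query>`, with no extra flags. The demo line uses the Python interpreter as a stand-in command so the sketch runs without skygrep installed.

```python
import subprocess
import sys


def run_cli_smoke(cmd, timeout=120):
    """Run one CLI command and fail loudly if it crashes.

    `cmd` is the full argument list, e.g. ["skygrep", "search", "<query>"].
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0 or "Traceback" in proc.stderr:
        raise AssertionError(
            f"{cmd!r} failed (exit {proc.returncode}):\n{proc.stderr}"
        )
    return proc.stdout


# Stand-in command so the sketch runs anywhere; a real release check would
# pass ["skygrep", "search", query] with a query known to escalate.
demo_out = run_cli_smoke([sys.executable, "-c", "print('ok')"])
```

In a real check, the query should be one known to leave the cheap path, since a cheap-path query would pass even with the bug present.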

What's in 0.4.2

Code (1 line of substance):

# storage.py:_expand_via_reference_graph()
# Result dict shape must match cli.merge_results expectations:
out_results = [
    {"path": p, "score": s, "start_line": 0, "end_line": 0,
     "language": "", "chunk": "", "snippet": ""}     # ← was missing
    for p, s in kept
]
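
A shape check at the boundary would catch the same class of bug for any future candidate source. The sketch below is an assumption-laden illustration: the key set is copied from the dict literal above, while `missing_keys` and `check_pool` are hypothetical helpers that do not exist in skylakegrep.

```python
# Key set copied from the fixed dict literal; the helpers are hypothetical.
REQUIRED_KEYS = frozenset(
    {"path", "score", "start_line", "end_line", "language", "chunk", "snippet"}
)


def missing_keys(result):
    """Return the required keys absent from one result dict, sorted."""
    return sorted(REQUIRED_KEYS.difference(result))


def check_pool(pool):
    """Raise ValueError naming the first malformed dict in a rerank pool."""
    for i, result in enumerate(pool):
        missing = missing_keys(result)
        if missing:
            raise ValueError(
                f"result {i} ({result.get('path')!r}) is missing {missing}"
            )
```

Calling a check like this just before the merge step turns a bare KeyError into an error that names the offending candidate and the missing fields.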

Verification: real CLI search query against the skylakegrep index, escalation path:

✓ 7.277s · quality=BEST
   path     : cosine-escalated-rerank (escalated to rerank)
   router   : fallback-rules → intent=semantic (0.60)
   evidence : σ-gap=0.0040 < τ=0.0060 (adaptive)
   pool     : 0 filename + 0 lexical · cascade
   index    : 21s ago ago · 66 files · L2 symbols + graph prior

No crash. Cascade and graph_expand both ran end-to-end. (Compare: the same query before the fix raised KeyError: 'snippet' at cli.py:92.)

Tests: 206 / 206 still pass — hot-fix only adds dict fields, no logic change.

Public OSS bench (Django + React + Tokio): pending verification

The published 30 / 30 OSS bench is not re-run in 0.4.2; it is deferred to 0.4.3. The bench wrapper (benchmarks/public_oss_bench.py) currently stalls on the first fixture, with no log output after === Django ===, and the cause is not yet diagnosed. The architectural invariant still holds: the 0.4.x changes are additive to the cascade rerank pool and cannot demote a candidate the cross-encoder ranks high. However, the user's explicit instruction is "never claim numbers without measurement", so 0.4.2 reports only what was actually verified: the hot-fix unblocks the CLI escalation path on a real index.

0.4.3 will:

  1. Diagnose and fix the bench stall.
  2. Run the full Django / React / Tokio 30-task bench.
  3. Publish the verified 30 / 30 (or whatever the real number is) as the 0.4.x recall figure.

If the real number is < 30 / 30, 0.4.x will be rolled back to 0.2.21 cascade + the hot-fix only — per the user's invariant "never ship something worse than the best previous version".

Why this took four releases to surface

The pattern: every test in 0.3.x / 0.4.x exercised one component in isolation, whether direct cascade_search calls, synthetic seed mappers, or unit-fixture graphs. None of them went through the CLI's merge_results(), the choke-point where every result dict's shape is consumed. A snippet field missing from one candidate source's dicts is invisible to those tests because they return their dicts to the test harness, not to merge_results().

The lesson, now recorded as a permanent memory rule: testing through internal APIs only verifies those APIs. The user's experience runs through skygrep search, and that is what the bench has to exercise.

Acknowledgments

The user caught the missing test surface directly: "if there's a critical issue, first stop the bleeding, then find the real problem and fix it; it absolutely cannot be worse than the best previous version" (translated from the original Chinese). This release stops the bleeding. The OSS bench investigation continues in 0.4.3.