release notes · v0.3.1 — principled rollback
skylakegrep 0.3.1 — principled rollback of graph-walk integration
This is a patch release: zero new features, one principled rollback. Production behaviour now strictly equals 0.2.21 — both cheap path and escalation path are unchanged from 0.2.21.
What happened
0.3.0 shipped a graph-walk retrieval substrate:
- SQLite tables for nodes + edges (4 cheap edge types)
- graph_walk.py: bounded Personalized PageRank
- query_seeds.py: cold-start 4-matcher seed mapper
- graph_substrate.py: index-time edge builder
- Cascade integration gated behind SKYGREP_GRAPH_WALK=1
The integration was claimed (in the 0.3.0 release notes) to satisfy two by-construction invariants — latency unchanged and accuracy non-regressing. Those claims were not measured.
After shipping, the first real-corpus end-to-end benchmark
(benchmarks/graph_walk_bench.py, 5 queries on
skylakegrep/src/) was run. Result:
- 2 / 5 queries hit the expected file (40 %).
- Walk latency was fine (p50 = 6.4 ms).
- But the right answer was correctly identified by the seed
mapper at 55 % seed mass (Q1:
graph_walk.py), then diluted out of the top-5 by the PPR walk's preset edge weights.
The root cause: 0.3.0 introduced 9+ preset hyperparameters:
match_filenames score_per_hit = 1.0
match_symbols score_per_hit = 1.5
match_path_tokens score_per_hit = 0.5
match_semantic threshold = 0.45
graph_substrate path_prox = 0.35
graph_walk α = 0.15
graph_walk eps = 1e-3
graph_walk max_visited = 200
graph_walk top_k_edges = 8
Each one a hand-chosen magic number. Each one a future tech-debt
liability. Together they fully violate docs/PRINCIPLES.md
Principle 1 — Understanding > Enumeration. A learned bandit on
top of nine handcrafted constants is not "intelligent retrieval";
it's a lookup table with extra steps.
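To make the dilution effect concrete, here is a minimal forward-push sketch of bounded Personalized PageRank. It is not the code in graph_walk.py: the graph layout, the function name bounded_ppr, and the toy file names are assumptions made for illustration. Only the parameter names and default values (α, eps, max_visited, top_k_edges) come from the list above.

```python
from collections import defaultdict


def bounded_ppr(graph, seeds, alpha=0.15, eps=1e-3, max_visited=200, top_k_edges=8):
    """Approximate Personalized PageRank by forward push (illustrative sketch).

    graph: {node: [(neighbour, weight), ...]}   (assumed layout)
    seeds: {node: seed mass}, normalised internally to sum to 1
    """
    total = sum(seeds.values())
    residual = defaultdict(float, {n: m / total for n, m in seeds.items()})
    rank = defaultdict(float)
    visited = set(residual)
    queue = list(residual)
    while queue:
        node = queue.pop()
        mass = residual[node]
        if mass < eps:
            continue  # too little mass left to be worth pushing
        residual[node] = 0.0
        # Keep only the heaviest out-edges, and respect the visit budget.
        edges = sorted(graph.get(node, []), key=lambda e: e[1], reverse=True)
        kept = []
        for nbr, weight in edges[:top_k_edges]:
            if nbr not in visited:
                if len(visited) >= max_visited:
                    continue
                visited.add(nbr)
            kept.append((nbr, weight))
        weight_sum = sum(w for _, w in kept)
        if weight_sum <= 0:
            rank[node] += mass  # nowhere to push: the node keeps all of it
            continue
        rank[node] += alpha * mass  # the node retains only an alpha share
        for nbr, weight in kept:
            residual[nbr] += (1 - alpha) * mass * weight / weight_sum
            if residual[nbr] >= eps:
                queue.append(nbr)
    return dict(rank)


# Toy graph: two seed files that both point at a shared hub.
graph = {"a.py": [("hub.py", 1.0)], "b.py": [("hub.py", 1.0)], "hub.py": []}
scores = bounded_ppr(graph, {"a.py": 0.55, "b.py": 0.45})
ranking = sorted(scores, key=scores.get, reverse=True)
# hub.py outranks a.py even though a.py held 55 % of the seed mass.
```

With α = 0.15 a seed keeps only 15 % of its own mass on each push, so whichever edge weights are chosen decide where the other 85 % lands. That is one plausible way a correctly seeded file can fall out of the top-5, though the actual cause in the 0.3.0 bench may differ in detail.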
User flagged it directly:
"How were these so-called weights tuned: are they preset, or tuned automatically? … Nothing can be preset, otherwise it becomes a hyperparameter. We can't have this many hyperparameters, because they accumulate technical debt."
What 0.3.1 changes
Three concrete moves. Detail in
benchmarks/release-0.3.0-graph-walk.md.
1. Cascade integration removed
_graph_walk_candidates is no longer called from cascade_search.
The SKYGREP_GRAPH_WALK env var is dead — toggling it has no
effect. The escalation path reverts to 0.2.21's pure
Round-A ∪ Round-C union.
2. Preset score-per-hit constants stripped from query_seeds.py
All four matchers (match_filenames, match_symbols,
match_path_tokens, match_semantic) now return raw token-hit
counts (or, for semantic, the cosine score itself which is
data-derived). No 1.0 / 1.5 / 0.5 ratios anywhere in the
seed mapper.
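As a rough illustration of what "raw token-hit counts" could look like, the sketch below counts how many distinct query tokens appear in each file's basename and returns that count unscaled. The function name match_filenames_raw, the tokenizer, and the return shape are assumptions for this example; the real matchers in query_seeds.py may tokenize and count differently.

```python
import re


def tokenize(text):
    """Lowercase and split on any non-alphanumeric run (assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def match_filenames_raw(query, paths):
    """Return {path: number of distinct query tokens found in the basename}.

    No score_per_hit multiplier is applied: the value is the raw count.
    """
    query_tokens = set(tokenize(query))
    hits = {}
    for path in paths:
        basename = path.rsplit("/", 1)[-1]
        count = len(query_tokens & set(tokenize(basename)))
        if count:
            hits[path] = count
    return hits


hits = match_filenames_raw(
    "graph walk latency",
    ["src/graph_walk.py", "src/query_seeds.py"],
)
```

Because the output is a plain count, any relative weighting between matchers would have to be derived downstream from data rather than baked into the matcher.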
3. path_prox edge weight derived from data
Instead of the preset 0.35 (earlier 0.7) constant, the weight is
now 1 / (1 + max(0, 8 - depth)) — depth-derived from the file's
path itself, no free parameter.
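The formula can be written out directly. In this sketch, depth is assumed to be the number of directory components above the file in a POSIX-style relative path; the release notes do not define it, and the helper name path_prox_weight is hypothetical.

```python
from pathlib import PurePosixPath


def path_prox_weight(path: str) -> float:
    """Weight = 1 / (1 + max(0, 8 - depth)), as stated in the notes.

    depth = number of directories above the file (an assumption).
    """
    depth = len(PurePosixPath(path).parts) - 1
    return 1.0 / (1.0 + max(0, 8 - depth))


shallow = path_prox_weight("src/graph_walk.py")       # depth 1
deep = path_prox_weight("a/b/c/d/e/f/g/h/leaf.py")    # depth 8
```

Under this reading, shallow files get small weights (1/8 at depth 1) and the weight saturates at 1.0 once depth reaches 8.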
What stays — substrate as primitive
The schema (graph_node, graph_edge tables), the modules
(graph_walk.py, query_seeds.py, graph_substrate.py), and the
20 unit tests all stay. They're useful primitives — just not
on the cascade critical path until weights can be derived from
corpus statistics rather than human picks.
When graph-walk reintegration is attempted again, the weights and walk parameters MUST come from one of:
- TF-IDF derived from the corpus: edge weight = log(N_files / N_files_containing_token)
- Learned per-query by the LLM router: α tuned by the same head that already classifies intent / scope / primary_token
- σ-adaptive walk stop: replace eps with the same Bayesian σ-evidence framing already used in cascade_search, i.e. eps_eff = max(eps_floor, k · σ(top_K_residuals))
None of these are implemented in 0.3.1; they're the design contract for the next attempt.
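Two of the three formulas in the contract are simple enough to sketch. The code below is not part of 0.3.1; the function names are hypothetical, σ is assumed to be the population standard deviation, and the notes do not say how eps_floor and k would be chosen.

```python
import math
import statistics


def idf_edge_weight(n_files: int, n_files_containing_token: int) -> float:
    """Edge weight = log(N_files / N_files_containing_token)."""
    if n_files_containing_token <= 0:
        raise ValueError("token must occur in at least one file")
    return math.log(n_files / n_files_containing_token)


def effective_eps(top_k_residuals, eps_floor: float, k: float) -> float:
    """eps_eff = max(eps_floor, k * sigma(top_K_residuals)).

    sigma is taken as the population standard deviation (an assumption);
    fewer than two residuals are treated as zero spread.
    """
    if len(top_k_residuals) < 2:
        sigma = 0.0
    else:
        sigma = statistics.pstdev(top_k_residuals)
    return max(eps_floor, k * sigma)


rare = idf_edge_weight(100, 10)      # token in 10 of 100 files
common = idf_edge_weight(100, 100)   # token in every file
tight = effective_eps([0.1, 0.1, 0.1], eps_floor=1e-3, k=2.0)
spread = effective_eps([0.0, 0.2], eps_floor=1e-3, k=1.0)
```

A token that appears in every file gets weight 0, so ubiquitous tokens contribute no edge strength. When the top residuals are nearly identical the floor applies; when they are spread out the stop threshold rises with σ.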
Compatibility
- Python: unchanged — 3.9+
- Default embedder / LLM router: unchanged
- Wheel surface: unchanged
- Index format: forward-compatible (graph tables stay; data written by 0.3.0 is preserved but not consumed)
- JSON output schema: unchanged
- Production behaviour identical to 0.2.21 (the v0.3.0 cascade integration is the only thing that changed; rolling it back returns to the prior baseline)
Bench numbers
- Public-OSS bench: identical to 0.2.21 (cascade unchanged).
- Real-corpus 5-query bench: cascade-only baseline (no graph
walk) is now the operating point; the substrate-integrated
2 / 5 result is the rollback receipt, documented in
benchmarks/release-0.3.0-graph-walk.md.
- Tests: 221 / 221 pass.
Auto-memory entry added
feedback_comprehensive_test_before_release.md — every release
must include a real-corpus end-to-end benchmark with concrete numbers,
not just unit tests. Unit tests passed in 0.3.0 (20 / 20 in
test_graph_walk.py); the integration still failed because unit
correctness is not end-to-end intelligence.
Acknowledgments
User caught the principle violation in two messages:
1. "Have you run tests and benchmarks to prove it? From now on, run a comprehensive test before every release": every release needs a comprehensive end-to-end test.
2. "Nothing can be preset … we can't have this many hyperparameters … the intelligence we want has to be generic enough": no preset hyperparameters; intelligence must be generic.
Both lessons are now in auto-memory and reflected in the rollback. This is the right kind of release: smaller, more honest, restores strict baseline behaviour, documents what didn't work and why.