release notes · v0.5.9
skylakegrep 0.5.9 — generic adaptive routing and scoped search performance
0.5.9 is a product-level routing release. It keeps the existing CLI and JSON surfaces compatible, but changes how skylakegrep plans a query before choosing retrieval lanes.
The core shift: a user query is no longer treated as one terminal intent. It can contain independent facets at the same time:
- scope — the folder / repo / workspace the user wants searched;
- target — the file, artifact, symbol, or document clue;
- metadata — created / opened / modified / size as either a full answer or a ranking constraint;
- answer depth — path-only, preview, evidence, or deeper semantic content.
The router can still use a small local model, but model output is not trusted blindly. Filesystem scope, filename evidence, lexical evidence, metadata facets, and semantic depth each have their own validation gate.
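The facet model above can be pictured as a small plan object. The following is a minimal sketch, not skylakegrep's actual internals: `QueryPlan`, `MetadataRole`, `AnswerDepth`, and `needs_content_lanes` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class MetadataRole(Enum):
    NONE = "none"          # no metadata clue in the query
    TERMINAL = "terminal"  # metadata is the full answer
    MODIFIER = "modifier"  # metadata only constrains ranking


class AnswerDepth(Enum):
    PATH_ONLY = 1
    PREVIEW = 2
    EVIDENCE = 3
    SEMANTIC = 4


@dataclass(frozen=True)
class QueryPlan:
    """Hypothetical plan: each facet is carried independently."""
    scope: Optional[Path]          # folder / repo / workspace, if named
    target: Optional[str]          # file, artifact, symbol, or document clue
    metadata_field: Optional[str]  # "created", "opened", "modified", "size"
    metadata_role: MetadataRole
    depth: AnswerDepth

    def needs_content_lanes(self) -> bool:
        # Content retrieval stays alive unless metadata is the whole answer.
        return self.metadata_role is not MetadataRole.TERMINAL


# A scoped query where creation time is only a ranking modifier.
plan = QueryPlan(
    scope=Path("PROJECT"),
    target="project brief",
    metadata_field="created",
    metadata_role=MetadataRole.MODIFIER,
    depth=AnswerDepth.PATH_ONLY,
)
print(plan.needs_content_lanes())
```

Keeping the facets as separate fields is what lets each one be validated by its own gate instead of collapsing the query into a single intent label.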
What changed
- Added a first-class query scope facet. Phrases such as `in CASE42 folder`, `inside Research Workspace`, and `在合同档案文件夹...` ("in the contract archive folder...") now resolve to a concrete local root before retrieval starts. Scope constrains every lane and prevents broad hidden / tool-directory sweeps from outranking the folder the user actually named.
- Scope clauses are stripped from the text that is sent to fast-intent, the LLM router, metadata analysis, and lexical gates. This prevents the folder name from becoming the primary filename token or semantic target.
- Metadata is now a plan facet, not a terminal intent by default. `show recently created files` remains a zero-semantic filesystem answer, while `show where project brief recently created in PROJECT folder` keeps filename / lexical / semantic retrieval alive and uses creation time only as a modifier.
- CJK and mixed-language scope handling is generic at the grammar layer: scope suffixes such as `文件夹` (folder), `目录` (directory), and `项目` (project) are recognized even when the query continues immediately after the suffix.
- Warm scoped semantic queries can finish from strong lexical evidence without waiting for expensive cascade / rerank. The fallback is scoped and evidence-based: a small result set plus snippet hits can satisfy the default human view even when the path vocabulary differs from the query.
- JSON / agent calls benefit from the same scoped lexical shortcut. A structured `--json --detail summary` call can now return the relevant code and document snippets directly instead of waiting for cascade when lexical evidence is already sufficient.
- Markdown, text, PDF, and docx content now participates more consistently in indexing. Text-like files are chunked directly; PDF/docx files use the existing bounded text extraction path.
- Filename lookup ranking now considers basename coverage, query-term coverage, hidden-path penalties, document/code extension priority, and metadata facets. This keeps concrete document anchors above unrelated generated or hidden files.
- Human output no longer says `sibling-folder semantic search` unless a sibling-folder pass actually ran. When lexical evidence is final, the UI says `keyword matches` and marks `cascade-skipped`.
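The scope extraction and stripping described in the first bullets can be sketched as follows. This is an illustrative approximation only: the real grammar is not shown in these notes, so the two regular expressions, the function name `split_scope`, and the sample CJK continuation `找发票` ("find invoices") are assumptions. Resolving the extracted name to a concrete local root is a separate step that is omitted here.

```python
import re
from typing import Optional, Tuple

# Hypothetical English pattern: "in <name> folder" / "inside <name> folder".
_EN_SCOPE = re.compile(
    r"\b(?:in|inside)\s+(?:the\s+)?(?P<name>.+?)\s+folder\b",
    re.IGNORECASE,
)

# Hypothetical CJK pattern: no word boundary is required after the suffix,
# so the query may continue immediately after it.
_CJK_SCOPE = re.compile(r"在(?P<name>.+?)(?:文件夹|目录|项目)")


def split_scope(query: str) -> Tuple[Optional[str], str]:
    """Return (scope_name, query_with_the_scope_clause_removed)."""
    for pattern in (_EN_SCOPE, _CJK_SCOPE):
        match = pattern.search(query)
        if match:
            remainder = query[:match.start()] + " " + query[match.end():]
            # Collapse whitespace left behind by the removed clause.
            return match.group("name"), " ".join(remainder.split())
    return None, query


print(split_scope("show where project brief recently created in PROJECT folder"))
print(split_scope("在合同档案文件夹找发票"))
print(split_scope("show recently created files"))
```

Only the stripped remainder would be passed on to routing and lexical gates, which is why the folder name cannot become the primary filename token. A pattern this simple will misread queries that contain an earlier, unrelated "in", so a production grammar would need stricter anchoring plus validation against the filesystem.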
Before vs after
The release benchmark uses only synthetic placeholder files and folders.
| Query shape | Previous behavior observed during validation | 0.5.9 behavior |
|---|---|---|
| Scoped filename lookup | Could wait behind broad scans or semantic setup | 0.397s, path=filename-lookup |
| Scoped semantic content | Could wait ~7 s because scope polluted routing | 0.338s, source=fast-intent |
| Metadata-only scoped query | Could fall into content search | 0.279s, path=metadata-created |
| Metadata modifier query | Could wait for LLM or be treated as metadata-only | 0.379s, metadata=created:modifier |
| Folder names with spaces | Could approach multi-second LLM routing | 0.401s, scoped fast plan |
| CJK filename scope | Could miss the folder scope | 0.370s, scoped filename lookup |
| CJK + English semantic scope | Could miss scope and search outside | 0.333s, scoped lexical evidence |
| Wrong-directory filename lookup | Keeps bounded proactive recovery | 0.372s, proactive filename result |
| Warm vocabulary-mismatch semantic query | Could wait 9-16 s for cascade/rerank | 0.453s, scoped lexical evidence |
| Warm JSON agent query | Could wait ~9.5 s | 0.877s, JSON rg-shortcut results |
Release benchmark
The 0.5.9 release gate ran a 12-case real CLI benchmark on a synthetic workspace:
OK filename_scope_simple 0.397s
OK semantic_scope_content 0.338s
OK metadata_terminal_scope 0.279s
OK metadata_modifier_scope 0.379s
OK scope_with_spaces_and_modifier 0.401s
OK cjk_filename_scope 0.370s
OK cjk_semantic_scope 0.333s
OK wrong_dir_proactive_no_scope 0.372s
OK json_filename_scope 0.334s
OK warm_semantic_vocab_mismatch 0.453s
OK warm_code_identifier_collision 0.616s
OK warm_json_semantic_agent_shape 0.877s
These cases cover cold-start, warm-index, filename, semantic, metadata terminal, metadata modifier, folder names with spaces, CJK / mixed language, wrong-directory proactive search, code identifier collisions, and JSON / LLM-agent output shape.
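For readers who want to post-process the gate output, the listing above can be parsed with a few lines of Python. This sketch assumes the `STATUS case_name seconds` line shape shown in these notes; the function name `parse_gate` and the 1.0 s budget are hypothetical and are not part of the release gate itself.

```python
import re

GATE_OUTPUT = """\
OK filename_scope_simple 0.397s
OK semantic_scope_content 0.338s
OK metadata_terminal_scope 0.279s
OK metadata_modifier_scope 0.379s
OK scope_with_spaces_and_modifier 0.401s
OK cjk_filename_scope 0.370s
OK cjk_semantic_scope 0.333s
OK wrong_dir_proactive_no_scope 0.372s
OK json_filename_scope 0.334s
OK warm_semantic_vocab_mismatch 0.453s
OK warm_code_identifier_collision 0.616s
OK warm_json_semantic_agent_shape 0.877s
"""

LINE = re.compile(r"^(?P<status>\w+) (?P<case>\w+) (?P<secs>\d+\.\d+)s$")


def parse_gate(output):
    """Return a list of (case, status, seconds) tuples."""
    rows = []
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append((match["case"], match["status"], float(match["secs"])))
    return rows


rows = parse_gate(GATE_OUTPUT)
slowest = max(rows, key=lambda row: row[2])
BUDGET_SECONDS = 1.0  # hypothetical per-case budget, for illustration only
over_budget = [case for case, _, secs in rows if secs > BUDGET_SECONDS]

print(len(rows), "cases; slowest:", slowest[0], slowest[2])
print("over budget:", over_budget)
```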
Verification
- Full test suite: `295 passed, 20 subtests passed`.
- Targeted routing suite: `122 passed, 3 subtests passed`.
- End-to-end synthetic CLI benchmark: `12 / 12 passed`.
- `git diff --check`: clean.
- Source privacy scan: required before build.
- Wheel/sdist privacy scan: required before upload.
Compatibility
- No CLI flag was removed.
- Required JSON fields are unchanged.
- No user action is required for existing project indexes. Text/PDF/docx coverage improves as indexes refresh or rebuild naturally.
- Router cache entries are keyed by the scope-stripped routing query, so stale folder-token routing decisions are not reused for scoped variants.
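The cache-keying rule in the last bullet can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class `RouterCache`, the helper `routing_cache_key`, the whitespace and case normalization, and the use of SHA-256 are not taken from skylakegrep's source.

```python
import hashlib
from typing import Dict, Optional


def routing_cache_key(stripped_query: str) -> str:
    """Key on the scope-stripped routing text (normalization is assumed)."""
    normalized = " ".join(stripped_query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class RouterCache:
    """Hypothetical in-memory cache of routing decisions."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def put(self, stripped_query: str, decision: str) -> None:
        self._entries[routing_cache_key(stripped_query)] = decision

    def get(self, stripped_query: str) -> Optional[str]:
        return self._entries.get(routing_cache_key(stripped_query))


cache = RouterCache()
cache.put("project brief", "filename-lookup")

# The same stripped text hits the cache regardless of which folder was named.
print(cache.get("Project  Brief"))
# Text that still contains a folder token produces a different key.
print(cache.get("project brief in CASE42 folder"))
```

Because lookups use only the stripped text, an entry written under a key that included folder tokens would never be matched by a scoped variant of the same query.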
Privacy note
All examples in this release use fictional placeholders such as CASE42,
PROJECT folder, project-report.pdf, and 合同档案. No real user prompt,
private filename, private folder name, local machine path, screenshot, or
document category is included in the release notes, tests, docs, wheel, or
sdist.