v0.5.13 · 100% local · PolyForm Noncommercial 1.0.0
Find anything on your machine.
Semantic search for code, PDFs, notes, and docs.
Fully offline. No cloud. No telemetry. No subscription.
Ask in plain English — or any of 100+ languages — get back the
right file and line range in about a second, even when
the working directory isn't the right project.
$skygrep "where does the auth token get refreshed?"
═══ auth/middleware.py:78-94 · score 0.91 · python
async def renew_session(req: Request):
# swap the access cookie when the refresh JWT is still valid
if req.cookies.get("rt") and access_expired(req):
return await refresh_token(claims, key)
$skygrep "the design doc on rate limiter rewrite"
═══ docs/rate-limiter-redesign.md:1-12 · score 0.87 · markdown
# Rate limiter redesign — Q3
We replace the leaky-bucket implementation with a sliding-window
log-based approach. [[wiki-link to design-2024-Q3]]
═══ designs/rate-limiter-2024Q3.pdf:p4 · score 0.83 · pdf
Throughput projections under sustained 5k QPS load...
$skygrep "我昨天写的 cascade 调度代码"   # "the cascade scheduling code I wrote yesterday"
═══ src/storage.py:847-892 · score 0.89 · python
def cascade_search(conn, query_embedding, *, ...):
# σ-adaptive early-exit threshold (MacKay/Williams)
tau_eff = max(CASCADE_TAU_FLOOR,
CASCADE_K_SIGMA * sigma_topK)
$cd /tmp/scratch && skygrep "where does the parallel umbrella dispatch?"
═══ ~/code/skylakegrep/src/cli.py:912-928 · score 0.86 · python · home-dir lazy hit
with ThreadPoolExecutor(max_workers=4) as ex:
# cascade ‖ filename_extend ‖ lazy_cwd ‖ lazy_cross_folder
cascade_fut = ex.submit(_run_cascade, query, db_path)
proactive_fut = ex.submit(_run_proactive, query, db_path)
return _stream_first_confident(cascade_fut, proactive_fut)
30 / 30 public-OSS recall · 37.74× less agent context vs rg · ~1.1 s wrong-path first answer · ~1 s warm queries · 100 % local · 45 releases shipped
Three ways people use it
Show, don't tell.
Three real-shaped queries that demonstrate the value across
content types — code, docs, multilingual.
CODE BY CONCEPT
Find code by what it does, not what it's called.
You ask in plain English. The semantic substrate
(bge-m3) bridges your phrasing to the actual
identifier — even when the function name uses different
words from the query.
$skygrep "where does session refresh logic live?"
→ auth/middleware.py:78 · renew_session()
No rg hit for "session refresh"; semantic retrieval bridges to renew_session() from the project index.
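The concept-to-identifier bridge is, at its core, cosine similarity over embeddings. A minimal illustration with toy vectors — in skylakegrep the vectors would come from bge-m3 via Ollama, and `best_chunk` is a hypothetical helper, not the tool's API:

```python
import math

def cosine(a, b):
    # cosine similarity between two dense vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_chunk(query_vec, chunks):
    # chunks: (location, embedding) pairs; highest cosine wins,
    # regardless of whether the identifier shares words with the query
    return max(chunks, key=lambda c: cosine(query_vec, c[1]))
```

This is why "session refresh" can land on renew_session(): the match happens in embedding space, not on the token string.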
CROSS-CONTENT
One query across code, PDFs, notes, and docs.
Markdown, PDF, Word, plain text — all indexed via the same
content-agnostic substrate. Your query searches all of
them at once, ranked by semantic relevance.
$skygrep "the design doc on rate limiter rewrite"
→ docs/rate-limiter.md · designs/q3-rewrite.pdf
Markdown link graph + PDF text-layer extraction in one cascade.
MULTILINGUAL · PRIVATE
Any language. Files never leave your laptop.
bge-m3 understands 100+ languages out of the
box. Index, retrieval, ranking, optional answer synthesis —
all run locally via Ollama. Zero network calls.
$skygrep "我昨天写的 cascade 调度代码"   # "the cascade scheduling code I wrote yesterday"
→ src/storage.py:847 · cascade_search()
Mixed Chinese / English query. Zero network. Audit-friendly.
Command examples
Ask for the depth you need.
The bare command is intentionally fast for location and concept lookup.
Add depth only when you need file contents, synthesis, or structured
context for another agent. These examples are command patterns: run them
from the relevant project root, or add --include /
--lexical-root when an agent already knows the scope.
LOCATION
Find the right file first.
Use the bare form for where is..., find..., and
which file... questions. The router can answer from filename,
metadata, lexical, or semantic evidence without making you pick a lane.
$skygrep "where is the project brief I edited recently?"
→ docs/project-brief.md · scoped metadata / filename evidence
Fast path: return the path, skip unnecessary content reading.
CONTENT
Show the source text that supports the answer.
Add --content when the next step depends on what the
file says, not only where the file is.
$skygrep --content --detail standard "what does the API migration plan say about rollback?"
→ docs/migration-plan.md:42-58 · relevant snippet
Good default for human review and agent context.
DEEP READ
Read more after narrowing the target.
Use --detail full only when you intentionally want a
deeper local read. Bare --detail "query" is accepted as
shorthand for --detail full "query". Pair it with
--include to avoid dumping unrelated context.
$skygrep --content --detail full --include "docs/migration-plan.md" "show the deployment steps"
→ extended extracted text from the selected file
Higher depth, still scoped and local.
AGENT / JSON
Give another LLM structured context.
Agents should prefer --json over scraping terminal
output. Include a scope when the caller knows one; this avoids broad
home-folder exploration and gives the next LLM compact, relevant
evidence. Add --answer only when you want local synthesis
instead of source evidence.
$skygrep --json --content --detail standard --include "src/**" "where is token refresh implemented?"
$skygrep --answer --content --include "docs/**" "summarize the payment retry policy"
→ compact records with paths, scores, snippets, and route metadata
Lower token cost and less ambiguity for Claude Code, Codex, OpenCode, and other agents.
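A sketch of what consuming that structured output might look like on the agent side. The record fields (`path`, `score`, `snippet`) are assumptions for illustration — check the `--json` schema of your installed version:

```python
import json

# Hypothetical record fields; skygrep's actual --json schema may differ.
def rank_hits(json_text, min_score=0.5):
    records = json.loads(json_text)
    hits = [(r["path"], r["score"]) for r in records if r["score"] >= min_score]
    return sorted(hits, key=lambda h: h[1], reverse=True)

sample = ('[{"path": "src/auth.py", "score": 0.91, "snippet": "..."},'
          ' {"path": "src/util.py", "score": 0.42, "snippet": "..."}]')
```

An agent would pipe the `--json` output into something like this and keep only high-confidence evidence in its context window.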
New in 0.5.x
Four qualitative leaps since 0.4.
The through-line: less ceremony from you, more
intelligence from the tool. No `skygrep index .` to
run, no need to be in the right folder, no silent stalls.
🚀 NO SETUP NEEDED
Just ask — no skygrep index ..
The first query in any fresh repo works. A background process
builds the semantic index while a rg fallback
handles your first turn; from the second query on, the full
cascade is online.
$cd /path/to/brand-new-project
$skygrep "how does auth handle expired tokens?"
→ src/auth/token.py:140 · refresh_or_redirect()
Cold-start vocabulary-mismatch hit-rate 0/10 → 4/10 over plain rg on the Django oracle bench (0.5.3, real-CLI verified).
🧭 WRONG FOLDER · NO PROBLEM
Smart from the wrong folder.
Run skygrep from /tmp and ask about a real
project. The router dispatches two retrieval lanes
in parallel; a proactive umbrella that searches
sibling roots in SKYGREP_PROACTIVE_DIRS can answer
before the cascade has time to run its first rerank.
$cd /tmp/scratch
$skygrep "where does the parallel umbrella dispatch?"
→ ~/code/skylakegrep/src/cli.py:912 · cascade ‖ proactive umbrella
Wrong-cwd discovery is bounded; use SKYGREP_PROACTIVE_DIRS or an explicit scope when an agent knows where to look.
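The two-lane dispatch can be sketched with stdlib futures. This is an illustrative pattern, not skylakegrep's actual dispatcher; the lane callables and the `confident` predicate are hypothetical:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def race_lanes(lanes, confident, timeout=30.0):
    # run every lane concurrently; return the first result that clears
    # the confidence bar, else the best-scoring result seen before timeout
    ex = ThreadPoolExecutor(max_workers=len(lanes))
    try:
        pending = {ex.submit(lane) for lane in lanes}
        deadline = time.monotonic() + timeout
        best = None
        while pending:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            done, pending = wait(pending, timeout=remaining,
                                 return_when=FIRST_COMPLETED)
            for fut in done:
                result = fut.result()
                if confident(result):
                    return result          # stream it without waiting for slower lanes
                if best is None or result["score"] > best["score"]:
                    best = result
        return best
    finally:
        ex.shutdown(wait=False)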
🧠 STREAMING ROUTING
Honest about what's pending.
Each query is classified by a local LLM router
(qwen2.5:3b) for intent / scope / primary token,
then dispatched to multiple lanes in parallel. Results land
tagged with the route they came from and the still-searching
status of the others — never silent.
$skygrep "the design doc on rate limiter rewrite"
├─ proactive umbrella · filename glob
│ cascade still searching
═══ docs/rate-limiter-redesign.md:1
Confidence-streaming: results stream as they're ready, with the route they came from. Each answer's provenance is auditable.
🔍 ROUTING TRANSPARENCY · 0.5.8+
The active router lane is visible by default; --explain adds the deeper why.
The first human-output line now shows the active router lane
(├─ route router: semantic · primary_token=... · conf=... · source=...).
Foreground semantic waits use a TTY-only rotating particle flow in the
narrow left workflow rail by default; set
SKYGREP_UI_ANIMATION=off to disable it.
The result rail stays compact and copyable. Set
SKYGREP_UI_RAIL=helix if you want the rail to use a denser
three-cell rotating particle field (• ·,
·•, · •, •·) and a slim
separator line instead of box connectors; the helix rail persists through
progress, result cards, and the final footer.
Interactive
terminals show Nerd Font step icons by default; set
SKYGREP_UI_ICONS=off to disable them. JSON and redirected
output stay stable.
Pass --explain when you need the deeper audit trail:
a one-sentence reason, per-result via: lines
showing which channel(s) contributed (cosine cascade · symbol
RRF · filename-lookup · ripgrep) and the score, and a
cascade-lane summary at the bottom with the
σ-adaptive evidence (gap=…, tau=…). No new model
calls, no extra retrieval — every signal was already in the
pipeline.
$skygrep "what does Project_Report say about retries"
├─ route router: semantic · primary_token="Project_Report" · conf=0.81 · source=fast-intent
├─ seed preliminary filename anchors + keyword matches (lazy semantic refinement starting)
═══ Project_Report.txt · bounded content preview
├─ embed lazy refinement continues in the same invocation
0.5.13: candidate recall keeps likely files visible before cascade, while content/agent calls get compact same-file support evidence.
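The `symbol RRF` channel named above refers to Reciprocal Rank Fusion. A generic sketch of the fusion step — the constant k=60 is the common literature default, not a documented skylakegrep value:

```python
def rrf_fuse(rankings, k=60):
    # rankings: one ordered list of doc ids per channel
    # (e.g. cosine-cascade order, tree-sitter symbol-match order)
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF only needs ranks, not comparable scores, which is what lets a lexical channel and a cosine channel vote on the same candidate list.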
Why skylakegrep
vs. ripgrep · mgrep · autodev-codebase · Cody.
What makes skylakegrep distinct in the local-search landscape — sized
against four named alternatives, not generic categories.
comparison matrix · 2026
| Capability | skylakegrep v0.5.13 | ripgrep (lexical baseline) | mgrep | autodev-codebase | Sourcegraph Cody (cloud commercial) |
| Find by concept, not just token | ✓ | ✗ | ✓ | ✓ | ✓ |
| Privacy — no data egress | ✓ | ✓ | ✗ cloud-backed | ✓ | ✗ cloud index |
| Content — multimodal | code · md · PDF · docx | text only | code · text · PDF · img | code-first | code-first |
| Setup | pip install | brew install | npm + Mixedbread acct | npm + Ollama | account + sub |
| Cost | $0 / mo | $0 / mo | sub + usage-based | $0 / mo | $20 – 100+ / mo |
| Multilingual queries (NL → code id) | bge-m3 native | n/a | cloud embedder | embedder-dep. | supported |
How it works
Router → two retrieval lanes race in parallel.
After the router classifies your intent, a σ-adaptive cosine
cascade and a parallel proactive umbrella search at the same
time — not one after the other. The first confident answer
streams to your terminal; ranked refinements arrive as later
lanes finish. All against your local Ollama + SQLite. Zero
network.
1
LLM router
qwen2.5:3b · ~50 ms
Classifies intent (filename / lexical /
semantic / mixed), scope (content / recency /
size), and primary token. Persistent SQLite
cache; same query never pays the LLM cost twice.
→
2
Cosine cascade
bge-m3 · 0.5 – 2 s
σ-adaptive cascade: high-confidence queries early-exit on
cheap file-mean cosine; uncertain ones escalate to HyDE +
cross-encoder rerank. Bayesian-evidence framing
(τ_eff = max(τ_floor, k · σ_topK)).
→
3
Proactive umbrella
filename_extend · lazy_cwd · lazy_cross_folder · streaming dispatcher
Runs concurrent with the cascade, not after.
Four tiers race at the same time: filename_extend
for fast filename matching, lazy_cwd for
auto-indexing the current folder, lazy_cross_folder
for sibling roots in SKYGREP_PROACTIVE_DIRS, and a
streaming dispatcher that posts the first confident answer.
Hard caps: 30 s cascade, 8 s cross-folder. Wrong-path queries
answer in ~1 s.
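The σ-adaptive early-exit from step 2 can be sketched as follows. The threshold form matches the formula above; the gap test and the default constants are illustrative assumptions, not skylakegrep's tuned values:

```python
import statistics

def should_early_exit(scores, tau_floor=0.30, k_sigma=2.0, top_k=10):
    # scores: file-mean cosine scores from the cheap first pass
    top = sorted(scores, reverse=True)[:top_k]
    sigma = statistics.pstdev(top) if len(top) > 1 else 0.0
    # tau_eff = max(tau_floor, k * sigma_topK)
    tau_eff = max(tau_floor, k_sigma * sigma)
    gap = top[0] - top[-1]          # evidence gap among top candidates
    # confident: skip HyDE + cross-encoder rerank
    return gap >= tau_eff
```

When the top candidates are tightly bunched, σ is small, the gap is small, and the query escalates; a clear leader produces a gap large enough to clear τ_eff and exit on the cheap pass.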
vs Elasticsearch
Different niche, different design.
Elasticsearch is a multi-tenant, TB-scale, distributed search engine
for data centers. skylakegrep is a single-user, single-machine,
zero-ops CLI for a developer asking their own laptop a question.
Both can be called "search engines"; they answer different problems.
capability matrix · skylakegrep 0.5.x vs Elasticsearch 8.x
| Capability | skylakegrep 0.5.x | Elasticsearch 8.x |
| Setup | python3 -m pip install --user skylakegrep; cold-start lazy auto-trigger | JVM, cluster, mappings, ingest pipeline, dense-vector plugin, reindex |
| Semantic retrieval | bge-m3 (1024-d, 100+ languages) via local Ollama, out of the box | Manual: pick embedder, pipeline, dimension, reindex |
| Intent understanding | qwen2.5:3b LLM router classifies intent / scope / primary_token per query | None natively; you write query DSL by hand |
| Code AST awareness | tree-sitter symbol channel, RRF-fused with cosine | None; code is plain text |
| Cold-start / wrong-folder | lazy_cwd + lazy_cross_folder 4-lane parallel umbrella, ~1.1 s | Empty index = 0 results |
| Why-this-matched explainability | --explain shows router rationale + channel breakdown + lane evidence | BM25 highlight only |
| Cross-file context | reference-graph PageRank tiebreak | None |
| Privacy / offline | 100 % local by design | Index can be local, but most embeddings are external API calls |
| Latency p95 (50k-file repo) | 0.3 – 1.1 s including LLM router | ms-level after you've paid the operational cost |
| Scale | single-machine, single-repo sweet spot | billions of docs, multi-shard, distributed |
| Multi-tenant / ACL | not designed for this | first-class |
| Aggregations / facets | not designed for this | first-class |
| Operational cost | zero (no daemon, no GC, no shard rebalance) | non-trivial (GC, heap, shard rebalance, monitoring) |
Installation
Install the CLI and a local embedding model.
Package install
python3 --version
python3 -m pip install --user skylakegrep
skygrep doctor
macOS often ships without a python command. Use
python3 -m pip so the package, script, and diagnostics all
refer to the same Python installation. If skygrep is not
on PATH after install, inspect python3 -m site --user-base,
which -a skygrep, and python3 -m pip show skylakegrep.
export PATH="$(python3 -m site --user-base)/bin:$PATH"
From source
git clone https://github.com/danielchen26/skylakegrep.git
cd skylakegrep
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
Ollama runtime
Install Ollama from
ollama.com
and pull at least one embedding model:
ollama pull bge-m3 # default; 1024-dim, multilingual, symmetric
# alternatives:
ollama pull mxbai-embed-large # 1024-dim, English/code only
ollama pull nomic-embed-text # 768-dim, smaller
Pull the default generation/router model separately. Ollama expects one
model per ollama pull command:
ollama pull qwen2.5:3b
Clean reinstall
skygrep setup --uninstall || true
python3 -m pip uninstall -y skylakegrep
rm -rf ~/.skylakegrep # optional: deletes only skygrep indexes/config
ollama rm bge-m3 # optional
ollama rm qwen2.5:3b # optional
python3 -m pip install --user --no-cache-dir skylakegrep
ollama pull bge-m3
ollama pull qwen2.5:3b
skygrep doctor
Register agent instructions once after install. The managed snippet
teaches Claude Code, Codex, OpenCode, Gemini CLI, and Cursor to choose
the right depth: bare skygrep for location/concept lookup,
--content for snippets, --detail full only
after narrowing, --answer for local synthesis, and
--json plus --include for machine-readable
agent calls when the scope is known. Re-running
skygrep setup refreshes the managed block when these
instructions improve. After an upgrade, normal skygrep
searches and skygrep doctor also refresh already-registered
managed blocks automatically; new integrations still require explicit
setup.
skygrep setup
Quickstart
Index a repository, then query it.
skygrep index /path/to/repo --reset
skygrep search "where is token refresh implemented?" -m 10
The first command walks the repository, chunks supported source files,
embeds each chunk via Ollama, and writes the result to
~/.skylakegrep/index.db. Subsequent runs re-index only
files whose modification time has changed and remove rows for files
that no longer exist.
For machine-readable output, add --json:
skygrep search "where is token refresh implemented?" -m 10 --json
Performance
30 / 30 fully-indexed · 31× denser agent context.
Reproducible against three popular codebases —
Django, React, Tokio.
30 hand-labelled questions, 10 per repo for the fully-indexed cascade.
0.5.13 adds an agent tool-context benchmark that measures whether one
structured skygrep --json --content call gives the next LLM
enough compact evidence compared with a raw rg agent that
runs multiple term searches. Every number on this page is reproducible
from public commands and generic task fixtures.
30 / 30
recall on public-OSS bench (fully-indexed cascade)
vs. 28 / 30 in 0.1.0; bge-m3 substrate broke the ceiling.
37.74×
less context vs. raw rg agent baseline
56 K tokens from skygrep vs. 2.13 M tokens from multi-term rg on the 0.5.13 agent-context benchmark.
31.27×
higher sufficiency density for LLM context
fewer calls and denser evidence; raw rg remains the recall ceiling when huge context is acceptable.
Honesty
What the data does and does NOT show.
- Sample sizes are measurement-grade, not study-grade. The agent tool-context benchmark is 8 generic tasks × 3 effort profiles; the multi-language benchmark is n = 30. Statistical power is improving, but still small.
- Latency is honest, not hidden. bge-m3 inference is slower than mxbai on Tokio (+57 % per query); the three-repo average is still −19 %. Tokio is the worst case, not the average.
- The published benchmark is not your benchmark. 30 hand-labelled queries on three public repos aren't your codebase, your queries, or your hardware. Reproduce locally.
- Some queries are out of scope. Metadata questions ("recent files", "largest files", "all files") are routed to a hint suggesting git log / find. skygrep is content search, not file management.