skylakegrep

v0.5.13 · 100% local · PolyForm Noncommercial 1.0.0

Find anything on your machine.

Semantic search for code, PDFs, notes, and docs. Fully offline. No cloud. No telemetry. No subscription. Ask in plain English — or any of 100+ languages — get back the right file and line range in about a second, even when the working directory isn't the right project.

30 / 30 public-OSS recall · 37.74× less agent context vs rg · ~1.1 s wrong-path first answer · ~1 s warm queries · 100% local · 45 releases shipped

Three ways people use it

Show, don't tell.

Three real-shaped queries that demonstrate the value across content types — code, docs, multilingual.

CODE BY CONCEPT

Find code by what it does, not what it's called.

You ask in plain English. The semantic substrate (bge-m3) bridges your phrasing to the actual identifier — even when the function name uses different words from the query.

$skygrep "where does session refresh logic live?"
auth/middleware.py:78 · renew_session()
No rg hit for "session refresh"; semantic retrieval bridges to renew_session() from the project index.
CROSS-CONTENT

One query across code, PDFs, notes, and docs.

Markdown, PDF, Word, plain text — all indexed via the same content-agnostic substrate. Your query searches all of them at once, ranked by semantic relevance.

$skygrep "the design doc on rate limiter rewrite"
docs/rate-limiter.md · designs/q3-rewrite.pdf
Markdown link graph + PDF text-layer extraction in one cascade.
MULTILINGUAL · PRIVATE

Any language. Files never leave your laptop.

bge-m3 understands 100+ languages out of the box. Indexing, retrieval, ranking, and optional answer synthesis all run locally via Ollama. Zero network calls.

$skygrep "我昨天写的 cascade 调度代码"
src/storage.py:847 · cascade_search()
Mixed Chinese / English query. Zero network. Audit-friendly.

Command examples

Ask for the depth you need.

The bare command is intentionally fast for location and concept lookup. Add depth only when you need file contents, synthesis, or structured context for another agent. These examples are command patterns: run them from the relevant project root, or add --include / --lexical-root when an agent already knows the scope.

LOCATION

Find the right file first.

Use the bare form for where is..., find..., and which file... questions. The router can answer from filename, metadata, lexical, or semantic evidence without making you pick a lane.

$skygrep "where is the project brief I edited recently?"
docs/project-brief.md · scoped metadata / filename evidence
Fast path: return the path, skip unnecessary content reading.
CONTENT

Show the source text that supports the answer.

Add --content when the next step depends on what the file says, not only where the file is.

$ skygrep --content --detail standard "what does the API migration plan say about rollback?"
docs/migration-plan.md:42-58 · relevant snippet
Good default for human review and agent context.
DEEP READ

Read more after narrowing the target.

Use --detail full only when you intentionally want a deeper local read. Bare --detail "query" is accepted as shorthand for --detail full "query". Pair it with --include to avoid dumping unrelated context.

$ skygrep --content --detail full --include "docs/migration-plan.md" "show the deployment steps"
→ extended extracted text from the selected file
Higher depth, still scoped and local.
AGENT / JSON

Give another LLM structured context.

Agents should prefer --json over scraping terminal output. Include a scope when the caller knows one; this avoids broad home-folder exploration and gives the next LLM compact, relevant evidence. Add --answer only when you want local synthesis instead of source evidence.

$ skygrep --json --content --detail standard --include "src/**" "where is token refresh implemented?"
$ skygrep --answer --content --include "docs/**" "summarize the payment retry policy"
→ compact records with paths, scores, snippets, and route metadata
Lower token cost and less ambiguity for Claude Code, Codex, OpenCode, and other agents.

New in 0.5.x

Four qualitative leaps since 0.4.

The through-line: less ceremony from you, more intelligence from the tool. No `skygrep index .` to run, no need to be in the right folder, no silent stalls.

🚀 NO SETUP NEEDED

Just ask; no `skygrep index .` required.

The first query in any fresh repo works. A background process builds the semantic index while an rg fallback handles your first turn; from the second query on, the full cascade is online.

$ cd /path/to/brand-new-project
$ skygrep "how does auth handle expired tokens?"
src/auth/token.py:140 · refresh_or_redirect()
Cold-start vocabulary-mismatch hit-rate 0/10 → 4/10 over plain rg on the Django oracle bench (0.5.3, real-CLI verified).
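
The shape of that fallback, sketched in Python. The subprocess commands are the documented CLI; the function and messages are illustrative, not skylakegrep's internals:

import subprocess

# Cold-start sketch: serve the first turn from lexical evidence while a
# detached worker builds the semantic index in the background.
def first_turn(query: str, repo: str) -> str:
    # Kick off background indexing; from the second query on, the cascade is online.
    subprocess.Popen(["skygrep", "index", repo],
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # Answer this turn with plain ripgrep, the fallback described above.
    rg = subprocess.run(["rg", "-n", "--max-count", "3", query, repo],
                        capture_output=True, text=True)
    return rg.stdout or "semantic index building; ask again in a moment"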
🧭 WRONG FOLDER · NO PROBLEM

Smart from the wrong folder.

Run skygrep from /tmp and ask about a real project. The router dispatches two retrieval lanes in parallel; a proactive umbrella that searches sibling roots in SKYGREP_PROACTIVE_DIRS can answer before the cascade has time to run its first rerank.

$ cd /tmp/scratch
$ skygrep "where does the parallel umbrella dispatch?"
~/code/skylakegrep/src/cli.py:912 · cascade ‖ proactive umbrella
Wrong-cwd discovery is bounded; use SKYGREP_PROACTIVE_DIRS or an explicit scope when an agent knows where to look.
🧠 STREAMING ROUTING

Honest about what's pending.

Each query is classified by a local LLM router (qwen2.5:3b) for intent / scope / primary token, then dispatched to multiple lanes in parallel. Results land tagged with the route they came from and the still-searching status of the others — never silent.

$skygrep "the design doc on rate limiter rewrite"
├─ proactive umbrella · filename glob
│ cascade still searching
═══ docs/rate-limiter-redesign.md:1
Confidence-streaming: results stream as they're ready, with the route they came from. Each answer's provenance is auditable.
🔍 ROUTING TRANSPARENCY · 0.5.8+

The active router lane is visible by default; --explain adds the deeper why.

The first human-output line now shows the active router lane (├─ route router: semantic · primary_token=... · conf=... · source=...). Foreground semantic waits use a TTY-only rotating particle flow in the narrow left workflow rail by default; set SKYGREP_UI_ANIMATION=off to disable it.

The result rail stays compact and copyable. Set SKYGREP_UI_RAIL=helix if you want the rail itself to use a denser three-cell rotating particle field (• ·, ·•, · •, •· ) plus a slim separator line instead of box connectors; it continues through progress, result cards, and the final footer. Interactive terminals show Nerd Font step icons by default; set SKYGREP_UI_ICONS=off to disable them. JSON and redirected output stay stable.

Pass --explain when you need the deeper audit trail: a one-sentence reason, per-result via: lines showing which channel(s) contributed (cosine cascade · symbol RRF · filename-lookup · ripgrep) and the score, and a cascade-lane summary at the bottom with the σ-adaptive evidence (gap=…, tau=…). No new model calls, no extra retrieval: every signal was already in the pipeline.

$skygrep "what does Project_Report say about retries"
├─ route router: semantic · primary_token="Project_Report" · conf=0.81 · source=fast-intent
├─ seed preliminary filename anchors + keyword matches (lazy semantic refinement starting)
═══ Project_Report.txt · bounded content preview
├─ embed lazy refinement continues in the same invocation
0.5.13: candidate recall keeps likely files visible before cascade, while content/agent calls get compact same-file support evidence.

Why skylakegrep

vs. ripgrep · mgrep · autodev-codebase · Cody.

What makes skylakegrep distinct in the local-search landscape — sized against four named alternatives, not generic categories.

comparison matrix · 2026

| | skylakegrep v0.5.13 | ripgrep (lexical baseline) | mgrep (Mixedbread · paid) | autodev-codebase (Ollama · OSS) | Sourcegraph Cody (cloud commercial) |
| --- | --- | --- | --- | --- | --- |
| Find by concept, not just token | ✓ | ✗ | ✓ | ✓ | ✓ |
| Privacy — no data egress | ✓ local | ✓ local | ✗ cloud-backed | ✓ local | ✗ cloud index |
| Content — multimodal | code · md · PDF · docx | text only | code · text · PDF · img | code-first | code-first |
| Setup | pip install | brew install | npm + Mixedbread acct | npm + Ollama | account + sub |
| Cost | $0 / mo | $0 / mo | sub + usage-based | $0 / mo | $20 – 100+ / mo |
| Multilingual queries (NL → code id) | bge-m3 native | n/a | cloud embedder | embedder-dep. | supported |

data: each project's public site / README · 2026-05 · positions are descriptive, not endorsements

How it works

Router → two retrieval lanes race in parallel.

After the router classifies your intent, a σ-adaptive cosine cascade and a parallel proactive umbrella search at the same time — not one after the other. The first confident answer streams to your terminal; ranked refinements arrive as later lanes finish. All against your local Ollama + SQLite. Zero network.

1 · LLM router · qwen2.5:3b · ~50 ms
Classifies intent (filename / lexical / semantic / mixed), scope (content / recency / size), and primary token. Persistent SQLite cache; same query never pays the LLM cost twice.
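
A minimal sketch of that cache-then-classify shape, assuming Ollama's local /api/generate endpoint; the table, prompt, and verdict format are illustrative, not skylakegrep's actual schema:

import hashlib, json, sqlite3, urllib.request

# Illustrative cache-then-classify router.
db = sqlite3.connect("router_cache.db")
db.execute("CREATE TABLE IF NOT EXISTS router (qhash TEXT PRIMARY KEY, verdict TEXT)")

def classify(query: str) -> dict:
    qhash = hashlib.sha256(query.encode()).hexdigest()
    row = db.execute("SELECT verdict FROM router WHERE qhash = ?", (qhash,)).fetchone()
    if row:  # persistent cache: the same query never pays the LLM cost twice
        return json.loads(row[0])
    payload = json.dumps({
        "model": "qwen2.5:3b",
        "prompt": "Return JSON with keys intent (filename|lexical|semantic|mixed), "
                  "scope, primary_token for this search query: " + query,
        "format": "json",
        "stream": False,
    }).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        verdict = json.loads(resp.read())["response"]
    db.execute("INSERT OR REPLACE INTO router VALUES (?, ?)", (qhash, verdict))
    db.commit()
    return json.loads(verdict)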
2 · Cosine cascade · bge-m3 · 0.5 – 2 s
σ-adaptive cascade: high-confidence queries early-exit on cheap file-mean cosine; uncertain ones escalate to HyDE + cross-encoder rerank. Bayesian-evidence framing (τ_eff = max(τ_floor, k · σ_topK)).
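
One plausible reading of the early-exit rule, consistent with the gap=… / tau=… evidence that --explain prints; τ_floor, k, and the gap test are illustrative tunables, not the shipped values:

import statistics

def should_early_exit(cosines: list[float], k: float = 2.0,
                      tau_floor: float = 0.05, top_k: int = 10) -> bool:
    # τ_eff = max(τ_floor, k · σ_topK): the threshold adapts to the spread
    # of the strongest file-mean cosine scores.
    top = sorted(cosines, reverse=True)[:top_k]
    if len(top) < 2:
        return True  # a single candidate is trivially confident
    tau_eff = max(tau_floor, k * statistics.pstdev(top))
    gap = top[0] - top[1]  # leader's margin over the runner-up
    # Confident queries exit on cheap cosine; uncertain ones escalate
    # to HyDE + cross-encoder rerank.
    return gap >= tau_eff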
3 · Proactive umbrella · filename_extend · lazy_cwd · lazy_cross_folder · streaming dispatcher
Runs concurrently with the cascade, not after it. Four tiers race at the same time: filename_extend for fast filename matching, lazy_cwd for auto-indexing the current folder, lazy_cross_folder for sibling roots in SKYGREP_PROACTIVE_DIRS, and a streaming dispatcher that posts the first confident answer. Hard caps: 30 s cascade, 8 s cross-folder. Wrong-path queries answer in ~1 s.
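
A minimal sketch of the racing-lanes shape in Python asyncio. The lane bodies are placeholders; only the lane names and the 30 s / 8 s caps come from the description above (the filename-lane cap is an assumption):

import asyncio

# Placeholder lanes; the real implementations do filename matching,
# auto-indexing, cross-folder search, and the full cascade.
async def filename_extend(q): await asyncio.sleep(0.05); return [f"fname hit for {q!r}"]
async def lazy_cwd(q): await asyncio.sleep(0.5); return [f"cwd hit for {q!r}"]
async def lazy_cross_folder(q): await asyncio.sleep(1.0); return [f"sibling hit for {q!r}"]
async def cosine_cascade(q): await asyncio.sleep(2.0); return [f"cascade hit for {q!r}"]

async def umbrella(query: str) -> None:
    results: asyncio.Queue = asyncio.Queue()

    async def run(name, coro, cap):
        try:  # per-lane hard caps: 30 s cascade, 8 s cross-folder
            await results.put((name, await asyncio.wait_for(coro, cap)))
        except asyncio.TimeoutError:
            await results.put((name, None))

    lanes = [("filename_extend", filename_extend(query), 8.0),
             ("lazy_cwd", lazy_cwd(query), 8.0),
             ("lazy_cross_folder", lazy_cross_folder(query), 8.0),
             ("cascade", cosine_cascade(query), 30.0)]
    tasks = [asyncio.create_task(run(n, c, cap)) for n, c, cap in lanes]
    for _ in lanes:  # stream the first confident answer, then later refinements
        name, hits = await results.get()
        if hits:
            print(f"├─ {name}: {hits[0]}")
    await asyncio.gather(*tasks)

asyncio.run(umbrella("where does the parallel umbrella dispatch?"))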

vs Elasticsearch

Different niche, different design.

Elasticsearch is a multi-tenant, TB-scale, distributed search engine for data centers. skylakegrep is a single-user, single-machine, zero-ops CLI for a developer asking their own laptop a question. Both can be called "search engines"; they answer different problems.

capability matrix · skylakegrep 0.5.x vs Elasticsearch 8.x

| | skylakegrep 0.5.x | Elasticsearch |
| --- | --- | --- |
| Setup | python3 -m pip install --user skylakegrep; cold-start lazy auto-trigger | JVM, cluster, mappings, ingest pipeline, dense-vector plugin, reindex |
| Semantic retrieval | bge-m3 (1024-d, 100+ languages) via local Ollama, out of the box | Manual: pick embedder, pipeline, dimension, reindex |
| Intent understanding | qwen2.5:3b LLM router classifies intent / scope / primary_token per query | None natively; you write query DSL by hand |
| Code AST awareness | tree-sitter symbol channel, RRF-fused with cosine | None; code is plain text |
| Cold-start / wrong-folder | lazy_cwd + lazy_cross_folder 4-lane parallel umbrella, ~1.1 s | Empty index = 0 results |
| Why-this-matched explainability | --explain shows router rationale + channel breakdown + lane evidence | BM25 highlight only |
| Cross-file context | reference-graph PageRank tiebreak | None |
| Privacy / offline | 100% local by design | Index can be local, but most embeddings are external API calls |
| Latency p95 (50k-file repo) | 0.3 – 1.1 s including LLM router | ms-level after you've paid the operational cost |
| Scale | single-machine, single-repo sweet spot | billions of docs, multi-shard, distributed |
| Multi-tenant / ACL | not designed for this | first-class |
| Aggregations / facets | not designed for this | first-class |
| Operational cost | zero (no daemon, no GC, no shard rebalance) | non-trivial (GC, heap, shard rebalance, monitoring) |
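
One row above deserves a gloss: the tree-sitter symbol channel is fused with the cosine channel by reciprocal rank fusion. A textbook RRF sketch (k = 60 is the conventional constant; skylakegrep's actual constants and weights aren't documented here):

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Standard reciprocal rank fusion: each channel contributes 1 / (k + rank)
    # per document; documents ranked well by several channels float to the top.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fused = rrf([cosine_ranked_paths, symbol_ranked_paths])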

Installation

Install the CLI and a local embedding model.

Package install

python3 --version
python3 -m pip install --user skylakegrep
skygrep doctor

macOS often ships without a python command. Use python3 -m pip so the package, script, and diagnostics all refer to the same Python installation. If skygrep is not on PATH after install, inspect python3 -m site --user-base, which -a skygrep, and python3 -m pip show skylakegrep.

export PATH="$(python3 -m site --user-base)/bin:$PATH"

From source

git clone https://github.com/danielchen26/skylakegrep.git
cd skylakegrep
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Ollama runtime

Install Ollama from ollama.com and pull at least one embedding model:

ollama pull bge-m3 # default; 1024-dim, multilingual, symmetric
# alternatives:
ollama pull mxbai-embed-large # 1024-dim, English/code only
ollama pull nomic-embed-text # 768-dim, smaller

Pull the default generation/router model separately. Ollama expects one model per ollama pull command:

ollama pull qwen2.5:3b

Clean reinstall

skygrep setup --uninstall || true
python3 -m pip uninstall -y skylakegrep
rm -rf ~/.skylakegrep        # optional: deletes only skygrep indexes/config
ollama rm bge-m3             # optional
ollama rm qwen2.5:3b         # optional
python3 -m pip install --user --no-cache-dir skylakegrep
ollama pull bge-m3
ollama pull qwen2.5:3b
skygrep doctor

Register agent instructions once after install. The managed snippet teaches Claude Code, Codex, OpenCode, Gemini CLI, and Cursor to choose the right depth: bare skygrep for location/concept lookup, --content for snippets, --detail full only after narrowing, --answer for local synthesis, and --json plus --include for machine-readable agent calls when the scope is known. Re-running skygrep setup refreshes the managed block when these instructions improve. After an upgrade, normal skygrep searches and skygrep doctor also refresh already-registered managed blocks automatically; new integrations still require explicit setup.

skygrep setup

Quickstart

Index a repository, then query it.

skygrep index /path/to/repo --reset
skygrep search "where is token refresh implemented?" -m 10

The first command walks the repository, chunks supported source files, embeds each chunk via Ollama, and writes the result to ~/.skylakegrep/index.db. Subsequent runs re-index only files whose modification time has changed and remove rows for files that no longer exist.
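
A sketch of that incremental pass, assuming a files(path, mtime) table; the real index.db schema is not documented here and the embedding step is elided:

import sqlite3
from pathlib import Path

db = sqlite3.connect("index.db")  # illustrative; the real store is ~/.skylakegrep/index.db
db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, mtime REAL)")

def reindex(path: Path) -> None:
    pass  # placeholder for chunking + embedding via Ollama

def incremental_pass(root: str) -> None:
    seen = set()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        seen.add(str(path))
        mtime = path.stat().st_mtime
        row = db.execute("SELECT mtime FROM files WHERE path = ?",
                         (str(path),)).fetchone()
        if row is None or row[0] != mtime:  # new or modified: re-embed
            reindex(path)
            db.execute("INSERT OR REPLACE INTO files VALUES (?, ?)",
                       (str(path), mtime))
    # remove rows for files that no longer exist on disk
    for (path,) in db.execute("SELECT path FROM files").fetchall():
        if path not in seen:
            db.execute("DELETE FROM files WHERE path = ?", (path,))
    db.commit()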

For machine-readable output, add --json:

skygrep search "where is token refresh implemented?" -m 10 --json

Performance

30 / 30 fully-indexed · 31× denser agent context.

Reproducible against three popular codebases — Django, React, Tokio. 30 hand-labelled questions, 10 per repo for the fully-indexed cascade. 0.5.13 adds an agent tool-context benchmark that measures whether one structured skygrep --json --content call gives the next LLM enough compact evidence compared with a raw rg agent that runs multiple term searches. Every number on this page is reproducible from public commands and generic task fixtures.

30 / 30
recall on public-OSS bench (fully-indexed cascade)
vs. 28 / 30 in 0.1.0; bge-m3 substrate broke the ceiling.
37.74×
less context vs. raw rg agent baseline
56 K tokens from skygrep vs. 2.13 M tokens from multi-term rg on the 0.5.13 agent-context benchmark.
31.27×
higher sufficiency density for LLM context
fewer calls and denser evidence; raw rg remains the recall ceiling when huge context is acceptable.

Honesty

What the data does and does NOT show.

  • Sample sizes are measurement-grade, not study-grade. The agent tool-context benchmark is 8 generic tasks × 3 effort profiles; the multi-language benchmark is n = 30. The samples are growing, but still too small for strong statistical claims.
  • Latency is honest, not hidden. bge-m3 inference is slower than mxbai on Tokio (+57 % per-query); the three-repo average is still −19 %. Tokio is the worst case, not the average.
  • The published benchmark is not your benchmark. 30 hand-labelled queries on three public repos isn't your codebase, your queries, your hardware. Reproduce locally.
  • Some queries are out of scope. Metadata questions ("recent files", "largest files", "all files") get routed to a hint suggesting git log / find. skygrep is content search, not file management.