v0.5.13 · 100% local · PolyForm Noncommercial 1.0.0
Find anything on your machine.
Semantic search for code, PDFs, notes, and docs.
Fully offline. No cloud. No telemetry. No subscription.
Ask in plain English — or any of 100+ languages — get back the
right file and line range in about a second, even when
the working directory isn't the right project.
$skygrep "where does the auth token get refreshed?"
═══ auth/middleware.py:78-94 · score 0.91 · python
async def renew_session(req: Request):
# swap the access cookie when the refresh JWT is still valid
if req.cookies.get("rt") and access_expired(req):
return await refresh_token(claims, key)
$skygrep "the design doc on rate limiter rewrite"
═══ docs/rate-limiter-redesign.md:1-12 · score 0.87 · markdown
# Rate limiter redesign — Q3
We replace the leaky-bucket implementation with a sliding-window
log-based approach. [[wiki-link to design-2024-Q3]]
═══ designs/rate-limiter-2024Q3.pdf:p4 · score 0.83 · pdf
Throughput projections under sustained 5k QPS load...
$skygrep "我昨天写的 cascade 调度代码"   # "the cascade scheduling code I wrote yesterday"
═══ src/storage.py:847-892 · score 0.89 · python
def cascade_search(conn, query_embedding, *, ...):
# σ-adaptive early-exit threshold (MacKay/Williams)
tau_eff = max(CASCADE_TAU_FLOOR,
CASCADE_K_SIGMA * sigma_topK)
$cd /tmp/scratch && skygrep "where does the parallel umbrella dispatch?"
═══ ~/code/skylakegrep/src/cli.py:912-928 · score 0.86 · python · home-dir lazy hit
with ThreadPoolExecutor(max_workers=4) as ex:
# cascade ‖ filename_extend ‖ lazy_cwd ‖ lazy_cross_folder
cascade_fut = ex.submit(_run_cascade, query, db_path)
proactive_fut = ex.submit(_run_proactive, query, db_path)
return _stream_first_confident(cascade_fut, proactive_fut)
30 / 30 public-OSS recall · 37.74× less agent context vs rg · ~1.1 s wrong-path first answer · ~1 s warm queries · 100 % local · 45 releases shipped
Three ways people use it
Show, don't tell.
Three real-shaped queries that demonstrate the value across
content types — code, docs, multilingual.
CODE BY CONCEPT
Find code by what it does, not what it's called.
You ask in plain English. The semantic substrate
(bge-m3) bridges your phrasing to the actual
identifier — even when the function name uses different
words from the query.
$skygrep "where does session refresh logic live?"
→ auth/middleware.py:78 · renew_session()
No rg hit for "session refresh"; semantic retrieval bridges to renew_session() from the project index.
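The concept-to-identifier bridge is, at its core, cosine similarity over embeddings. A minimal illustration with toy vectors — in skylakegrep the vectors would come from bge-m3 via Ollama, and `best_chunk` is a hypothetical helper, not the tool's API:

```python
import math

def cosine(a, b):
    # cosine similarity between two dense vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_chunk(query_vec, chunks):
    # chunks: (location, embedding) pairs; highest cosine wins,
    # regardless of whether the identifier shares words with the query
    return max(chunks, key=lambda c: cosine(query_vec, c[1]))
```

This is why "session refresh" can land on renew_session(): the match happens in embedding space, not on the token string.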
CROSS-CONTENT
One query across code, PDFs, notes, and docs.
Markdown, PDF, Word, plain text — all indexed via the same
content-agnostic substrate. Your query searches all of
them at once, ranked by semantic relevance.
$skygrep "the design doc on rate limiter rewrite"
→ docs/rate-limiter.md · designs/q3-rewrite.pdf
Markdown link graph + PDF text-layer extraction in one cascade.
MULTILINGUAL · PRIVATE
Any language. Files never leave your laptop.
bge-m3 understands 100+ languages out of the
box. Index, retrieval, ranking, optional answer synthesis —
all run locally via Ollama. Zero network calls.
$skygrep "我昨天写的 cascade 调度代码"   # "the cascade scheduling code I wrote yesterday"
→ src/storage.py:847 · cascade_search()
Mixed Chinese / English query. Zero network. Audit-friendly.
Command examples
Ask for the depth you need.
The bare command is intentionally fast for location and concept lookup.
Add depth only when you need file contents, synthesis, or structured
context for another agent. These examples are command patterns: run them
from the relevant project root, or add --include /
--lexical-root when an agent already knows the scope.
LOCATION
Find the right file first.
Use the bare form for where is..., find..., and
which file... questions. The router can answer from filename,
metadata, lexical, or semantic evidence without making you pick a lane.
$skygrep "where is the project brief I edited recently?"
→ docs/project-brief.md · scoped metadata / filename evidence
Fast path: return the path, skip unnecessary content reading.
CONTENT
Show the source text that supports the answer.
Add --content when the next step depends on what the
file says, not only where the file is.
$skygrep --content --detail standard "what does the API migration plan say about rollback?"
→ docs/migration-plan.md:42-58 · relevant snippet
Good default for human review and agent context.
DEEP READ
Read more after narrowing the target.
Use --detail full only when you intentionally want a
deeper local read. Bare --detail "query" is accepted as
shorthand for --detail full "query". Pair it with
--include to avoid dumping unrelated context.
$skygrep --content --detail full --include "docs/migration-plan.md" "show the deployment steps"
→ extended extracted text from the selected file
Higher depth, still scoped and local.
AGENT / JSON
Give another LLM structured context.
Agents should prefer --json over scraping terminal
output. Include a scope when the caller knows one; this avoids broad
home-folder exploration and gives the next LLM compact, relevant
evidence. Add --answer only when you want local synthesis
instead of source evidence.
$skygrep --json --content --detail standard --include "src/**" "where is token refresh implemented?"
$skygrep --answer --content --include "docs/**" "summarize the payment retry policy"
→ compact records with paths, scores, snippets, and route metadata
Lower token cost and less ambiguity for Claude Code, Codex, OpenCode, and other agents.
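A sketch of what consuming that structured output might look like on the agent side. The record fields (`path`, `score`, `snippet`) are assumptions for illustration — check the `--json` schema of your installed version:

```python
import json

# Hypothetical record fields; skygrep's actual --json schema may differ.
def rank_hits(json_text, min_score=0.5):
    records = json.loads(json_text)
    hits = [(r["path"], r["score"]) for r in records if r["score"] >= min_score]
    return sorted(hits, key=lambda h: h[1], reverse=True)

sample = ('[{"path": "src/auth.py", "score": 0.91, "snippet": "..."},'
          ' {"path": "src/util.py", "score": 0.42, "snippet": "..."}]')
```

An agent would pipe the `--json` output into something like this and keep only high-confidence evidence in its context window.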
New in 0.5.x
Four qualitative leaps since 0.4.
The through-line: less ceremony from you, more
intelligence from the tool. No `skygrep index .` to
run, no need to be in the right folder, no silent stalls.
🚀 NO SETUP NEEDED
Just ask — no skygrep index ..
The first query in any fresh repo works. A background process
builds the semantic index while a rg fallback
handles your first turn; from the second query on, the full
cascade is online.
$cd /path/to/brand-new-project
$skygrep "how does auth handle expired tokens?"
→ src/auth/token.py:140 · refresh_or_redirect()
Cold-start vocabulary-mismatch hit-rate 0/10 → 4/10 over plain rg on the Django oracle bench (0.5.3, real-CLI verified).
🧭 WRONG FOLDER · NO PROBLEM
Smart from the wrong folder.
Run skygrep from /tmp and ask about a real
project. The router dispatches two retrieval lanes
in parallel; a proactive umbrella that searches
sibling roots in SKYGREP_PROACTIVE_DIRS can answer
before the cascade has time to run its first rerank.
$cd /tmp/scratch
$skygrep "where does the parallel umbrella dispatch?"
→ ~/code/skylakegrep/src/cli.py:912 · cascade ‖ proactive umbrella
Wrong-cwd discovery is bounded; use SKYGREP_PROACTIVE_DIRS or an explicit scope when an agent knows where to look.
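The two-lane dispatch can be sketched with stdlib futures. This is an illustrative pattern, not skylakegrep's actual dispatcher; the lane callables and the `confident` predicate are hypothetical:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def race_lanes(lanes, confident, timeout=30.0):
    # run every lane concurrently; return the first result that clears
    # the confidence bar, else the best-scoring result seen before timeout
    ex = ThreadPoolExecutor(max_workers=len(lanes))
    try:
        pending = {ex.submit(lane) for lane in lanes}
        deadline = time.monotonic() + timeout
        best = None
        while pending:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            done, pending = wait(pending, timeout=remaining,
                                 return_when=FIRST_COMPLETED)
            for fut in done:
                result = fut.result()
                if confident(result):
                    return result          # stream it without waiting for slower lanes
                if best is None or result["score"] > best["score"]:
                    best = result
        return best
    finally:
        ex.shutdown(wait=False)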
🧠 STREAMING ROUTING
Honest about what's pending.
Each query is classified by a local LLM router
(qwen2.5:3b) for intent / scope / primary token,
then dispatched to multiple lanes in parallel. Results land
tagged with the route they came from and the still-searching
status of the others — never silent.
$skygrep "the design doc on rate limiter rewrite"
├─ proactive umbrella · filename glob
│ cascade still searching
═══ docs/rate-limiter-redesign.md:1
Confidence-streaming: results stream as they're ready, with the route they came from. Each answer's provenance is auditable.
🔍 ROUTING TRANSPARENCY · 0.5.8+
The active router lane is visible by default; --explain adds the deeper why.
The first human-output line now shows the active router lane
(├─ route router: semantic · primary_token=... · conf=... · source=...).
Foreground semantic waits use a TTY-only rotating particle flow in the
narrow left workflow rail by default; set
SKYGREP_UI_ANIMATION=off to disable it.
The result rail stays compact and copyable. Set
SKYGREP_UI_RAIL=helix if you want the rail to use a denser
three-cell rotating particle field (• ·,
·•, · •, •·) and a slim
separator line instead of box connectors; the helix rail persists through
progress, result cards, and the final footer.
Interactive
terminals show Nerd Font step icons by default; set
SKYGREP_UI_ICONS=off to disable them. JSON and redirected
output stay stable.
Pass --explain when you need the deeper audit trail:
a one-sentence reason, per-result via: lines
showing which channel(s) contributed (cosine cascade · symbol
RRF · filename-lookup · ripgrep) and the score, and a
cascade-lane summary at the bottom with the
σ-adaptive evidence (gap=…, tau=…). No new model
calls, no extra retrieval — every signal was already in the
pipeline.
$skygrep "what does Project_Report say about retries"
├─ route router: semantic · primary_token="Project_Report" · conf=0.81 · source=fast-intent
├─ seed preliminary filename anchors + keyword matches (lazy semantic refinement starting)
═══ Project_Report.txt · bounded content preview
├─ embed lazy refinement continues in the same invocation
0.5.13: candidate recall keeps likely files visible before cascade, while content/agent calls get compact same-file support evidence.
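The `symbol RRF` channel named above refers to Reciprocal Rank Fusion. A generic sketch of the fusion step — the constant k=60 is the common literature default, not a documented skylakegrep value:

```python
def rrf_fuse(rankings, k=60):
    # rankings: one ordered list of doc ids per channel
    # (e.g. cosine-cascade order, tree-sitter symbol-match order)
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF only needs ranks, not comparable scores, which is what lets a lexical channel and a cosine channel vote on the same candidate list.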
Why skylakegrep
vs. ripgrep · mgrep · autodev-codebase · Cody.
What makes skylakegrep distinct in the local-search landscape — sized
against four named alternatives, not generic categories.
comparison matrix · 2026
| Capability | skylakegrep v0.5.13 | ripgrep (lexical baseline) | mgrep | autodev-codebase | Sourcegraph Cody (cloud commercial) |
| Find by concept, not just token | ✓ | ✗ | ✓ | ✓ | ✓ |
| Privacy — no data egress | ✓ | ✓ | ✗ cloud-backed | ✓ | ✗ cloud index |
| Content — multimodal | code · md · PDF · docx | text only | code · text · PDF · img | code-first | code-first |
| Setup | pip install | brew install | npm + Mixedbread acct | npm + Ollama | account + sub |
| Cost | $0 / mo | $0 / mo | sub + usage-based | $0 / mo | $20 – 100+ / mo |
| Multilingual queries (NL → code id) | bge-m3 native | n/a | cloud embedder | embedder-dep. | supported |
How it works
Router → two retrieval lanes race in parallel.
After the router classifies your intent, a σ-adaptive cosine
cascade and a parallel proactive umbrella search at the same
time — not one after the other. The first confident answer
streams to your terminal; ranked refinements arrive as later
lanes finish. All against your local Ollama + SQLite. Zero
network.
1
LLM router
qwen2.5:3b · ~50 ms
Classifies intent (filename / lexical /
semantic / mixed), scope (content / recency /
size), and primary token. Persistent SQLite
cache; same query never pays the LLM cost twice.
→
2
Cosine cascade
bge-m3 · 0.5 – 2 s
σ-adaptive cascade: high-confidence queries early-exit on
cheap file-mean cosine; uncertain ones escalate to HyDE +
cross-encoder rerank. Bayesian-evidence framing
(τ_eff = max(τ_floor, k · σ_topK)).
→
3
Proactive umbrella
filename_extend · lazy_cwd · lazy_cross_folder · streaming dispatcher
Runs concurrent with the cascade, not after.
Four tiers race at the same time: filename_extend
for fast filename matching, lazy_cwd for
auto-indexing the current folder, lazy_cross_folder
for sibling roots in SKYGREP_PROACTIVE_DIRS, and a
streaming dispatcher that posts the first confident answer.
Hard caps: 30 s cascade, 8 s cross-folder. Wrong-path queries
answer in ~1 s.
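The σ-adaptive early-exit from step 2 can be sketched as follows. The threshold form matches the formula above; the gap test and the default constants are illustrative assumptions, not skylakegrep's tuned values:

```python
import statistics

def should_early_exit(scores, tau_floor=0.30, k_sigma=2.0, top_k=10):
    # scores: file-mean cosine scores from the cheap first pass
    top = sorted(scores, reverse=True)[:top_k]
    sigma = statistics.pstdev(top) if len(top) > 1 else 0.0
    # tau_eff = max(tau_floor, k * sigma_topK)
    tau_eff = max(tau_floor, k_sigma * sigma)
    gap = top[0] - top[-1]          # evidence gap among top candidates
    # confident: skip HyDE + cross-encoder rerank
    return gap >= tau_eff
```

When the top candidates are tightly bunched, σ is small, the gap is small, and the query escalates; a clear leader produces a gap large enough to clear τ_eff and exit on the cheap pass.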
vs Elasticsearch
Different niche, different design.
Elasticsearch is a multi-tenant, TB-scale, distributed search engine
for data centers. skylakegrep is a single-user, single-machine,
zero-ops CLI for a developer asking their own laptop a question.
Both can be called "search engines"; they answer different problems.
capability matrix · skylakegrep 0.5.x vs Elasticsearch 8.x
| Capability | skylakegrep 0.5.x | Elasticsearch 8.x |
| Setup | python3 -m pip install --user skylakegrep; cold-start lazy auto-trigger | JVM, cluster, mappings, ingest pipeline, dense-vector plugin, reindex |
| Semantic retrieval | bge-m3 (1024-d, 100+ languages) via local Ollama, out of the box | Manual: pick embedder, pipeline, dimension, reindex |
| Intent understanding | qwen2.5:3b LLM router classifies intent / scope / primary_token per query | None natively; you write query DSL by hand |
| Code AST awareness | tree-sitter symbol channel, RRF-fused with cosine | None; code is plain text |
| Cold-start / wrong-folder | lazy_cwd + lazy_cross_folder 4-lane parallel umbrella, ~1.1 s | Empty index = 0 results |
| Why-this-matched explainability | --explain shows router rationale + channel breakdown + lane evidence | BM25 highlight only |
| Cross-file context | reference-graph PageRank tiebreak | None |
| Privacy / offline | 100 % local by design | Index can be local, but most embeddings are external API calls |
| Latency p95 (50k-file repo) | 0.3 – 1.1 s including LLM router | ms-level after you've paid the operational cost |
| Scale | single-machine, single-repo sweet spot | billions of docs, multi-shard, distributed |
| Multi-tenant / ACL | not designed for this | first-class |
| Aggregations / facets | not designed for this | first-class |
| Operational cost | zero (no daemon, no GC, no shard rebalance) | non-trivial (GC, heap, shard rebalance, monitoring) |
Installation
Install the CLI and a local embedding model.
Package install
python3 --version
python3 -m pip install --user skylakegrep
skygrep doctor
macOS often ships without a python command. Use
python3 -m pip so the package, script, and diagnostics all
refer to the same Python installation. If skygrep is not
on PATH after install, inspect python3 -m site --user-base,
which -a skygrep, and python3 -m pip show skylakegrep.
export PATH="$(python3 -m site --user-base)/bin:$PATH"
From source
git clone https://github.com/danielchen26/skylakegrep.git
cd skylakegrep
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
Ollama runtime
Install Ollama from
ollama.com
and pull at least one embedding model:
ollama pull bge-m3 # default; 1024-dim, multilingual, symmetric
# alternatives:
ollama pull mxbai-embed-large # 1024-dim, English/code only
ollama pull nomic-embed-text # 768-dim, smaller
Pull the default generation/router model separately. Ollama expects one
model per ollama pull command:
ollama pull qwen2.5:3b
Clean reinstall
skygrep setup --uninstall || true
python3 -m pip uninstall -y skylakegrep
rm -rf ~/.skylakegrep # optional: deletes only skygrep indexes/config
ollama rm bge-m3 # optional
ollama rm qwen2.5:3b # optional
python3 -m pip install --user --no-cache-dir skylakegrep
ollama pull bge-m3
ollama pull qwen2.5:3b
skygrep doctor
Register agent instructions once after install. The managed snippet
teaches Claude Code, Codex, OpenCode, Gemini CLI, and Cursor to choose
the right depth: bare skygrep for location/concept lookup,
--content for snippets, --detail full only
after narrowing, --answer for local synthesis, and
--json plus --include for machine-readable
agent calls when the scope is known. Re-running
skygrep setup refreshes the managed block when these
instructions improve. After an upgrade, normal skygrep
searches and skygrep doctor also refresh already-registered
managed blocks automatically; new integrations still require explicit
setup.
skygrep setup
Quickstart
Index a repository, then query it.
skygrep index /path/to/repo --reset
skygrep search "where is token refresh implemented?" -m 10
The first command walks the repository, chunks supported source files,
embeds each chunk via Ollama, and writes the result to
~/.skylakegrep/index.db. Subsequent runs re-index only
files whose modification time has changed and remove rows for files
that no longer exist.
For machine-readable output, add --json:
skygrep search "where is token refresh implemented?" -m 10 --json
Performance
30 / 30 fully-indexed · 31× denser agent context.
Reproducible against three popular codebases —
Django, React, Tokio.
30 hand-labelled questions, 10 per repo for the fully-indexed cascade.
0.5.13 adds an agent tool-context benchmark that measures whether one
structured skygrep --json --content call gives the next LLM
enough compact evidence compared with a raw rg agent that
runs multiple term searches. Every number on this page is reproducible
from public commands and generic task fixtures.
30 / 30
recall on public-OSS bench (fully-indexed cascade)
vs. 28 / 30 in 0.1.0; bge-m3 substrate broke the ceiling.
37.74×
less context vs. raw rg agent baseline
56 K tokens from skygrep vs. 2.13 M tokens from multi-term rg on the 0.5.13 agent-context benchmark.
31.27×
higher sufficiency density for LLM context
fewer calls and denser evidence; raw rg remains the recall ceiling when huge context is acceptable.
Honesty
What the data does and does NOT show.
- Sample sizes are measurement-grade, not study-grade. The agent tool-context benchmark is 8 generic tasks × 3 effort profiles; the multi-language benchmark is n = 30. Statistical power is improving, but still small.
- Latency is honest, not hidden. bge-m3 inference is slower than mxbai on Tokio (+57 % per query); the three-repo average is still −19 %. Tokio is the worst case, not the average.
- The published benchmark is not your benchmark. 30 hand-labelled queries on three public repos aren't your codebase, your queries, or your hardware. Reproduce locally.
- Some queries are out of scope. Metadata questions ("recent files", "largest files", "all files") are routed to a hint suggesting git log / find. skygrep is content search, not file management.