Reference · CLI
Command cheatsheet — every command at a glance.
The bare form covers ~ 95 % of real-world use:
skygrep "<your question>". No subcommand, no
flags. The system auto-routes (LLM router → find /
rg / semantic cascade), auto-indexes on first query,
and auto-recovers when the embedder is upgraded (0.2.2+). The
subcommand form below is for power use, admin, and CI.
| Command | When to use | Example |
|---|---|---|
skygrep "<query>" (bare) |
Default. Just ask a question. Auto-indexes and auto-recovers. | skygrep "where is the auth refresh logic" |
skygrep search <query> |
Explicit form when you need flags (--top, --json, --agentic, …). |
skygrep search "session token" --top 20 --json |
skygrep doctor |
First-time troubleshooting. Probes Ollama, lists models, summarises the project index, checks LLM-CLI integrations. | skygrep doctor |
skygrep setup |
Register skygrep with detected LLM CLIs (Claude Code, Codex, OpenCode, Gemini CLI, Cursor) so agents prefer it and know which information depth to request. Existing managed snippets auto-refresh after upgrade. | skygrep setup · skygrep setup --uninstall |
skygrep stats |
Print chunk and file counts for the current project's index. | skygrep stats |
skygrep index [PATH] [--reset] |
Rarely needed. The first query auto-indexes; the 0.2.2 recovery worker handles embedder upgrades. Use --reset only when you actively want a clean rebuild from scratch. |
skygrep index . · skygrep index . --reset |
skygrep watch [PATH] --interval N |
Keep the index live in the background. Polls PATH every N seconds (default 5) and reindexes changed files. |
skygrep watch . · skygrep watch . -i 2 |
skygrep serve [--host H] [--port P] |
Daemon mode. Keeps the cross-encoder reranker and Ollama session warm in memory so warm queries land in 0.5–2 s. | skygrep serve --port 7878 |
skygrep enrich |
Advanced. Generate doc2query-style natural-language descriptions for each chunk (improves vocab-mismatch retrieval at index-time cost). | skygrep enrich |
First-time workflow (Python-safe, < 2 min)
# 0. confirm Python 3.9+ is available
python3 --version
# 1. install with the same Python that will own the CLI
python3 -m pip install --user skylakegrep
# 2. pull required models, one command per model
ollama pull bge-m3
ollama pull qwen2.5:3b
# 3. verify runtime, models, install path, and index state
skygrep doctor
# 4. register skygrep with your LLM CLI(s) — claude code / codex / etc.
skygrep setup
# 5. cd into any project (no manual index step needed)
cd ~/your-project
# 6. ask anything
skygrep "where does the cascade tau threshold get computed"
skygrep setup writes a managed instruction block that teaches
agents to choose the right output depth. Re-running setup refreshes that
block, and normal searches / skygrep doctor refresh older managed
blocks after an upgrade. New integrations still require explicit setup.
Information-depth examples
| Need | Command |
|---|---|
| Locate a file or folder quickly | skygrep "where is the project brief I edited recently?" |
| Show relevant snippets | skygrep --content --detail standard "what does the API migration plan say about rollback?" |
| Read deeper after narrowing | skygrep --content --detail full --include "docs/migration-plan.md" "show the deployment steps" |
| Synthesize a local answer | skygrep --answer --content "summarize the payment retry policy" |
| Feed an LLM/agent structured context | skygrep --json --content --detail standard --include "src/**" "where is token refresh implemented?" |
In 0.5.13, semantic and mixed queries also use an adaptive candidate
recall substrate before final cascade ranking. For agents, this means
--json --content --detail standard can return compact path
and evidence packs from include scope, indexed path tokens, symbols,
chunk text, and bounded ripgrep recall. The required contract is still
simple: use --include when the caller knows the scope, add
--content when the next step needs source text, and use
--detail full only after narrowing.
On macOS, python may not exist; use python3.
If the shell cannot find skygrep after install, inspect
python3 -m site --user-base, which -a skygrep,
and python3 -m pip show skylakegrep. The usual PATH fix is:
export PATH="$(python3 -m site --user-base)/bin:$PATH"
The first query auto-indexes in the background and
falls back to rg for the first ~ 1 s; subsequent
warm queries land in 0.5–2 s.
Clean uninstall / reinstall
# Remove LLM-CLI snippets written by `skygrep setup`.
skygrep setup --uninstall || true
# Remove the Python package from the Python that installed it.
python3 -m pip uninstall -y skylakegrep
# Optional: delete local indexes/config. This does not delete your files.
rm -rf ~/.skylakegrep
# Optional: remove downloaded Ollama models.
ollama rm bge-m3
ollama rm qwen2.5:3b
# Reinstall.
python3 -m pip install --user --no-cache-dir skylakegrep
ollama pull bge-m3
ollama pull qwen2.5:3b
skygrep doctor
Flags for skygrep search (and the bare form)
The full --help is on every install. Quick reference:
| Flag | Default | Effect |
|---|---|---|
-m, -n, --top | 5 | Number of final results. |
--json | off | Emit results as a JSON array; suppresses human formatting. |
--answer | off | Synthesize an answer from retrieved snippets via Ollama. |
--content / --no-content | on | Show or hide snippet bodies in human output. |
--detail brief|summary|standard|full | standard | How much body to show. full extracts content from concrete filename matches; semantic-depth filename anchors may show a bounded content preview in standard human output. Bare --detail "query" is accepted as shorthand for --detail full "query". |
--language | — | Restrict to one or more language keys; repeatable. |
--include | — | Glob; only paths matching at least one pattern are kept; repeatable. |
--exclude | — | Glob; paths matching any pattern are dropped; repeatable. |
--semantic-only | off | Skip lexical reranking; rank by cosine similarity alone. |
--agentic | off | Decompose the query into subqueries via Ollama before search. |
--max-subqueries | 3 | Upper bound on agentic subqueries. |
--hyde | off (cascade handles internally) | Force HyDE query rewriting (rarely needed; cascade escalation already uses HyDE for low-σ queries). |
--rerank | auto | Force or skip the cross-encoder rerank step. |
--cascade / --no-cascade | on | Use the σ-adaptive cascade. --no-cascade drops to chunk-level cosine. |
--lazy / --no-lazy | on (0.5.1+) | Cold-start auto-trigger: when the index is not yet built and ripgrep alone returns a path-token-weak result, a bounded LLM-routed lazy semantic tier fires and merges. The foreground path uses a small seed set, prunes hidden/vendor/dependency-cache trees before descent, and times out slow cross-folder exploration so background indexing can continue without blocking the user. Completely off when ripgrep is strong; --no-lazy opts out for benchmarking pure rg cold-start. |
--rg-shortcut / --no-rg-shortcut | on | Lexical pre-gate: small clustered rg result with path/token overlap returns directly, skipping the semantic cascade. Pass --no-rg-shortcut to force pure cascade for benchmarking. |
--filename-shortcut / --no-filename-shortcut | on (0.13+) | Filename-lookup pre-gate: queries that look like "where is foo file" / "find package.json" route to find -iname '*token*' and skip both content shortcuts. |
--explain / -x | off (0.5.8+) | The concise router lane now prints by default in human output (├─ route router: <intent> · primary_token=... · conf=... · source=...). --explain adds the one-sentence router reason, per-result via: lines showing which channel(s) contributed (cosine cascade · symbol RRF · filename-lookup · ripgrep) and the score, plus a ├─ cascade lane: summary with σ-adaptive evidence (gap=…, tau=…). All signals read straight from existing pipeline telemetry — no extra retrieval, no model calls. Use --json for machine-readable output without human routing/status lines. |
Environment variables (0.5.x)
| Variable | Default | Effect |
|---|---|---|
SKYGREP_DB_PATH | — | Override the project-scoped SQLite path. When set, --auto-index defaults to off so curated indexes aren't auto-mutated; pass --auto-index explicitly to opt in. |
SKYGREP_PROACTIVE_DIRS | — | Colon-separated absolute paths the proactive enhancers and lazy_explore_cross_folder walk when the in-cwd answer is weak. Replaces 0.5.2's hardcoded ~/Downloads:~/Desktop:~/Documents. Stale or missing dirs are silently dropped. Example: SKYGREP_PROACTIVE_DIRS="$HOME/repos:$HOME/work:/data/projects". |
SKYGREP_COLD_LAZY_TOTAL_BUDGET_S | 8.0 | Total foreground budget for cold-start lazy semantic work. Raise when you prefer waiting longer for a first semantic answer; lower when shell responsiveness matters more. |
SKYGREP_COLD_LAZY_CWD_BUDGET_S | 5.0 | Foreground budget for lazy semantic exploration inside the current project root. |
SKYGREP_COLD_LAZY_CROSS_BUDGET_S | 2.5 | Foreground budget for sibling / proactive cross-folder lazy exploration when cwd has no keyword or filename evidence. |
SKYGREP_COLD_LAZY_SEED_BUDGET | 12 | Maximum cwd seed files embedded in the first cold lazy pass. Higher can improve recall at higher latency. |
SKYGREP_COLD_CROSS_SEED_BUDGET | 3 | Maximum cross-folder seed files embedded in the foreground pass. |
SKYGREP_CROSS_FOLDER_CRAWL_BUDGET_S | 1.5 | Wall-clock crawl budget for cross-folder candidate roots before embedding starts. |
SKYGREP_CROSS_FOLDER_MAX_FILES_PER_ROOT | 2000 | Maximum files sampled from each proactive candidate root during foreground cross-folder exploration. |
SKYGREP_UI_ICONS | Nerd icons in interactive terminals, plain labels otherwise | Set to off to disable Nerd Font glyphs, or nerd to force them in captured output. Icons mark every workflow step (route, scan, seed, embed, done) and remain separate from the particle rail. |
SKYGREP_UI_RAIL | helix for interactive terminals, otherwise tree | Force the static result rail style. Use tree for the compact connector or helix to replace box connectors with a denser three-cell rotating particle field (• ·, ·•, · •, •· ) plus a slim separator line that continues through progress, result cards, and the final routing footer. |
SKYGREP_UI_ANIMATION | helix for interactive terminals | TTY-only rotating particle flow in the narrow left rail during foreground semantic waits, colored blue/cyan/white. Set to off to disable. Animation stays disabled for --json, redirected output, logs, and agent tool calls. |
SKYGREP_LLM_ROUTER_MODEL | $OLLAMA_HYDE_MODEL or qwen2.5:3b | Override which Ollama model the lazy cold-start dir-router uses. Larger models (e.g. qwen2.5:7b) typically improve hit-rate on long natural-language queries at higher latency. |
OLLAMA_URL | http://localhost:11434 | Where to reach the local Ollama daemon. |
OLLAMA_EMBED_MODEL | bge-m3 | Embedder for both warm cascade and 0.5.x lazy batch-embed. |
SKYGREP_AUTOSTART_OLLAMA | 1 (0.5.8+) | If 1 (default), skygrep search autostarts ollama serve in the background when Ollama is unreachable but the binary is on PATH, and polls until it answers (5 s budget). Set to 0 to disable and force the rule-based router fallback when Ollama is down. |
SKYGREP_OLLAMA_AUTOSTART_TIMEOUT | 5.0 (0.5.8+) | Seconds to wait for the autostarted ollama serve to answer the /api/tags probe before giving up and falling back to rule-based routing. |
SKYGREP_LLM_ROUTER_TIMEOUT_SECONDS | 8.0 (0.5.8+, was 0.5) | HTTP timeout for the LLM router's POST /api/generate call. Bumped from 0.5 s in 0.5.8 because cold qwen2.5:3b needs ~3 s to answer with format=json + num_predict=256; warm calls (keep_alive=-1 now actually working — see release notes) finish in 50–100 ms well under the new ceiling. Lower this only when running a much smaller router model. |
Reading the per-query telemetry footer (0.2.2+)
Every search prints a structured workflow footer so you can see which retrieval path answered your query and why:
╰─ done 0.42s · quality=BEST
path : cosine-cheap (high-confidence early-exit)
router : llm -> intent=mixed (0.83)
evidence : σ-gap=0.0820 ≥ τ=0.0050 (adaptive)
pool : 1 filename + 0 lexical · cascade
index : 20s ago · 36 files · L2 symbols + graph prior
Field-by-field:
path=— the retrieval strategy this query took. One of:cosine-cheap,cosine-escalated-rerank,rg-only,cascade-skipped. (The symbol-channel path will appear once the 0.3.x auto-router lands.)σ-gap=… → reason— Bayesian-evidence proxy that drove the cascade decision. High σ-gap = top-K candidates well-separated → cosine trusted, exit cheap. Low σ-gap = candidates tied → escalate to cross-encoder rerank.recovery=in-progress chunks=N/T coverage=N% ETA=Nm— only when the 0.2.2 recovery worker is active. Reads live from the metadata table.quality=BEST/quality=DEGRADED-recovery— at-a-glance "is this the full-quality semantic answer or partial during recovery".DEGRADEDonly means some files are still pending re-embed; your specific query may be unaffected.
Reference · configuration
Environment variables.
Configuration is read from environment variables on every invocation; there is no config file. Defaults assume a local Ollama server on the default port.
| Variable | Default | Effect |
|---|---|---|
OLLAMA_URL |
http://localhost:11434 |
Base URL of the Ollama server used for embeddings and (optional) generation. |
OLLAMA_EMBED_MODEL |
bge-m3 |
Embedding model name. Must be the same at index time and query time. |
OLLAMA_LLM_MODEL |
qwen2.5:3b |
Generation model used by --answer and --agentic. |
SKYGREP_DB_PATH |
~/.skylakegrep/index.db |
SQLite index location. Use distinct paths to keep per-project indexes. |
Switching OLLAMA_EMBED_MODEL after indexing produces a
dimension or semantic mismatch and will not return useful results.
Reindex with --reset when the embedding model changes.