CLI reference · skylakegrep

Reference · CLI

Command cheatsheet — every command at a glance.

The bare form covers ~ 95 % of real-world use: skygrep "<your question>". No subcommand, no flags. The system auto-routes (LLM router → find / rg / semantic cascade), auto-indexes on first query, and auto-recovers when the embedder is upgraded (0.2.2+). The subcommand form below is for power use, admin, and CI.

Command	When to use	Example
`skygrep "<query>"` (bare)	Default. Just ask a question. Auto-indexes and auto-recovers.	`skygrep "where is the auth refresh logic"`
`skygrep search <query>`	Explicit form when you need flags (`--top`, `--json`, `--agentic`, …).	`skygrep search "session token" --top 20 --json`
`skygrep doctor`	First-time troubleshooting. Probes Ollama, lists models, summarises the project index, checks LLM-CLI integrations.	`skygrep doctor`
`skygrep setup`	Register skygrep with detected LLM CLIs (Claude Code, Codex, OpenCode, Gemini CLI, Cursor) so agents prefer it and know which information depth to request. Existing managed snippets auto-refresh after upgrade.	`skygrep setup` · `skygrep setup --uninstall`
`skygrep stats`	Print chunk and file counts for the current project's index.	`skygrep stats`
`skygrep index [PATH] [--reset]`	Rarely needed. The first query auto-indexes; the 0.2.2 recovery worker handles embedder upgrades. Use `--reset` only when you actively want a clean rebuild from scratch.	`skygrep index .` · `skygrep index . --reset`
`skygrep watch [PATH] --interval N`	Keep the index live in the background. Polls `PATH` every `N` seconds (default 5) and reindexes changed files.	`skygrep watch .` · `skygrep watch . -i 2`
`skygrep serve [--host H] [--port P]`	Daemon mode. Keeps the cross-encoder reranker and Ollama session warm in memory so warm queries land in 0.5–2 s.	`skygrep serve --port 7878`
`skygrep enrich`	Advanced. Generate doc2query-style natural-language descriptions for each chunk (improves vocab-mismatch retrieval at index-time cost).	`skygrep enrich`

First-time workflow (Python-safe, < 2 min)

# 0. confirm Python 3.9+ is available
python3 --version

# 1. install with the same Python that will own the CLI
python3 -m pip install --user skylakegrep

# 2. pull required models, one command per model
ollama pull bge-m3
ollama pull qwen2.5:3b

# 3. verify runtime, models, install path, and index state
skygrep doctor

# 4. register skygrep with your LLM CLI(s) — claude code / codex / etc.
skygrep setup

# 5. cd into any project (no manual index step needed)
cd ~/your-project

# 6. ask anything
skygrep "where does the cascade tau threshold get computed"

skygrep setup writes a managed instruction block that teaches agents to choose the right output depth. Re-running setup refreshes that block, and normal searches / skygrep doctor refresh older managed blocks after an upgrade. New integrations still require explicit setup.

Information-depth examples

Need	Command
Locate a file or folder quickly	`skygrep "where is the project brief I edited recently?"`
Show relevant snippets	`skygrep --content --detail standard "what does the API migration plan say about rollback?"`
Read deeper after narrowing	`skygrep --content --detail full --include "docs/migration-plan.md" "show the deployment steps"`
Synthesize a local answer	`skygrep --answer --content "summarize the payment retry policy"`
Feed an LLM/agent structured context	`skygrep --json --content --detail standard --include "src/**" "where is token refresh implemented?"`

Option playbook for humans and agents

Choose options by the answer depth the next step needs. The best call is usually the shallowest call that can produce enough evidence.

Problem shape	Use	Why
Where is X? / Which file handles X?	`skygrep --agent-fast "where is token refresh implemented?"`	Path-only, high-recall anchors; cheap first pass for agents.
What does X say about Y?	`skygrep --agent-context "what does the migration plan say about rollback?"`	Fast first-pass snippets and line ranges without dumping full files. Re-run without the preset only if rerank is needed for ambiguity.
Read this known file/folder deeply	`skygrep --content --detail full --include "docs/migration-plan.md" "show the deployment steps"`	Full depth only after scope is known; avoids repo-wide context blowups.
Summarize / answer from local evidence	`skygrep --answer --content "summarize the payment retry policy"`	Retrieves evidence first, then synthesizes locally through Ollama.
An LLM/agent will consume this	`skygrep --agent-context --include "src/**" "where is token refresh implemented?"`	Machine-readable, compact, and scoped; do not scrape human terminal output.
Several implementation files may matter	`skygrep --agent-fast "where is request routing assembled?"`	Separate path discovery from file reading; then read returned candidates directly.
The query is broad or noisy	Add `--include`, `--exclude`, `--language`, or run from the relevant project root.	Scope is the largest latency and accuracy lever.
I need to audit routing	`skygrep --explain "why is this policy selected?"`	Shows router intent, contributing lanes, and cascade evidence.
I need exact regex output	Use `rg` directly.	`skygrep` is for natural-language search, not regex authoring.

Closed-loop agent policy: start with skygrep --agent-fast for implementation-location questions, use skygrep --agent-context when the next reasoning step needs source text, add --include whenever the scope is known, and reserve --detail full for a narrowed file/folder or parsed documents where direct reads are unavailable. --agent-context automatically fuses path tokens, symbols, bounded ripgrep recall, source-type priors, compact chunk evidence, and confidence summaries. Use --answer only when the user asked for synthesis; use bounded rg -l or targeted rg only for exact lexical/regex needs or raw grep output.

For repeated GPT / Cloud Code / Superconductor-style tool calls, keep the process warm with skygrep serve --port 7878 and call skygrep --agent-daemon --agent-fast ... or skygrep --agent-daemon --agent-context .... --agent-daemon uses SKYGREP_DAEMON_URL when set, otherwise http://127.0.0.1:7878, and falls back in-process if no daemon is running.

Agent presets default to rule-based routing, no cascade, and confidence-aware first-pass evidence for bounded latency. Low-confidence JSON includes agent_summary, why_ranked, and a scoped follow-up probe. Add --llm-router or --cascade only when ambiguity is worth paying a local model call or deeper semantic refinement.

In 0.5.15, this policy is now first-class CLI UX rather than only a documented playbook: --agent-fast expands to JSON path anchors, --agent-context expands to compact JSON snippets, and --agent-daemon makes daemon-first repeated calls explicit. The underlying router is unchanged: path-only probes return compact JSON anchors, snippet probes return bounded evidence for the next LLM step, and daemon-first repeated calls avoid paying process startup and model warmup on every tool invocation.

In 0.5.16, those agent presets are bounded in the runtime as well as in the documentation. Machine-readable agent calls default away from the local LLM router and cascade unless explicitly requested, use shorter model budgets, skip heavyweight foreground refresh lanes, keep background indexing alive, return top-8 compact evidence by default, and filter low-confidence JSON candidates before an LLM consumes them.

In 0.5.17, --agent-context closes the old manual-fallback gap by treating bounded ripgrep, path tokens, symbols, chunk text, source-type priors, and symbol anchors as one hybrid recall substrate. JSON callers get optional agent_summary, why_ranked, evidence_bundle, source_type, and evidence_terms fields while the top-level schema remains a list of result objects for backward compatibility.

In 0.5.13, semantic and mixed queries also use an adaptive candidate recall substrate before final cascade ranking. For agents, this means --json --content --detail standard can return compact path and evidence packs from include scope, indexed path tokens, symbols, chunk text, and bounded ripgrep recall. The required contract is still simple: use --include when the caller knows the scope, add --content when the next step needs source text, and use --detail full only after narrowing.

On macOS, python may not exist; use python3. If the shell cannot find skygrep after install, inspect python3 -m site --user-base, which -a skygrep, and python3 -m pip show skylakegrep. The usual PATH fix is:

export PATH="$(python3 -m site --user-base)/bin:$PATH"

The first query auto-indexes in the background and falls back to rg for the first ~ 1 s; subsequent warm queries land in 0.5–2 s.

Clean uninstall / reinstall

# Remove LLM-CLI snippets written by `skygrep setup`.
skygrep setup --uninstall || true

# Remove the Python package from the Python that installed it.
python3 -m pip uninstall -y skylakegrep

# Optional: delete local indexes/config. This does not delete your files.
rm -rf ~/.skylakegrep

# Optional: remove downloaded Ollama models.
ollama rm bge-m3
ollama rm qwen2.5:3b

# Reinstall.
python3 -m pip install --user --no-cache-dir skylakegrep
ollama pull bge-m3
ollama pull qwen2.5:3b
skygrep doctor

Flags for `skygrep search` (and the bare form)

The full --help is on every install. Quick reference:

Flag	Default	Effect
`-m`, `-n`, `--top`	5	Number of final results.
`--json`	off	Emit results as a JSON array; suppresses human formatting.
`--answer`	off	Synthesize an answer from retrieved snippets via Ollama.
`--content / --no-content`	on	Show or hide snippet bodies in human output.
`--detail brief\|summary\|standard\|full`	standard	How much body to show. `full` extracts content from concrete filename matches; semantic-depth filename anchors may show a bounded content preview in standard human output. Bare `--detail "query"` is accepted as shorthand for `--detail full "query"`.
`--language`	—	Restrict to one or more language keys; repeatable.
`--include`	—	Glob; only paths matching at least one pattern are kept; repeatable.
`--exclude`	—	Glob; paths matching any pattern are dropped; repeatable.
`--semantic-only`	off	Skip lexical reranking; rank by cosine similarity alone.
`--agentic`	off	Decompose the query into subqueries via Ollama before search.
`--max-subqueries`	3	Upper bound on agentic subqueries.
`--hyde`	off (cascade handles internally)	Force HyDE query rewriting (rarely needed; cascade escalation already uses HyDE for low-σ queries).
`--rerank`	auto	Force or skip the cross-encoder rerank step.
`--cascade / --no-cascade`	on	Use the σ-adaptive cascade. `--no-cascade` drops to chunk-level cosine.
`--lazy / --no-lazy`	on (0.5.1+)	Cold-start auto-trigger: when the index is not yet built and ripgrep alone returns a path-token-weak result, a bounded LLM-routed lazy semantic tier fires and merges. The foreground path uses a small seed set, prunes hidden/vendor/dependency-cache trees before descent, and times out slow cross-folder exploration so background indexing can continue without blocking the user. Completely off when ripgrep is strong; `--no-lazy` opts out for benchmarking pure rg cold-start.
`--rg-shortcut / --no-rg-shortcut`	on	Lexical pre-gate: small clustered rg result with path/token overlap returns directly, skipping the semantic cascade. Pass `--no-rg-shortcut` to force pure cascade for benchmarking.
`--filename-shortcut / --no-filename-shortcut`	on (0.13+)	Filename-lookup pre-gate: queries that look like "where is foo file" / "find package.json" route to `find -iname 'token'` and skip both content shortcuts.
`--explain / -x`	off (0.5.8+)	The concise router lane now prints by default in human output (`├─ route router: <intent> · primary_token=... · conf=... · source=...`). `--explain` adds the one-sentence router reason, per-result `via:` lines showing which channel(s) contributed (cosine cascade · symbol RRF · filename-lookup · ripgrep) and the score, plus a `├─ cascade lane:` summary with σ-adaptive evidence (`gap=…, tau=…`). All signals read straight from existing pipeline telemetry — no extra retrieval, no model calls. Use `--json` for machine-readable output without human routing/status lines.

Environment variables (0.5.x)

Variable	Default	Effect
`SKYGREP_DB_PATH`	—	Override the project-scoped SQLite path. When set, `--auto-index` defaults to off so curated indexes aren't auto-mutated; pass `--auto-index` explicitly to opt in.
`SKYGREP_PROACTIVE_DIRS`	—	Colon-separated absolute paths the proactive enhancers and `lazy_explore_cross_folder` walk when the in-cwd answer is weak. Replaces 0.5.2's hardcoded `~/Downloads:~/Desktop:~/Documents`. Stale or missing dirs are silently dropped. Example: `SKYGREP_PROACTIVE_DIRS="$HOME/repos:$HOME/work:/data/projects"`.
`SKYGREP_COLD_LAZY_TOTAL_BUDGET_S`	`8.0`	Total foreground budget for cold-start lazy semantic work. Raise when you prefer waiting longer for a first semantic answer; lower when shell responsiveness matters more.
`SKYGREP_COLD_LAZY_CWD_BUDGET_S`	`5.0`	Foreground budget for lazy semantic exploration inside the current project root.
`SKYGREP_COLD_LAZY_CROSS_BUDGET_S`	`2.5`	Foreground budget for sibling / proactive cross-folder lazy exploration when cwd has no keyword or filename evidence.
`SKYGREP_COLD_LAZY_SEED_BUDGET`	`12`	Maximum cwd seed files embedded in the first cold lazy pass. Higher can improve recall at higher latency.
`SKYGREP_COLD_CROSS_SEED_BUDGET`	`3`	Maximum cross-folder seed files embedded in the foreground pass.
`SKYGREP_CROSS_FOLDER_CRAWL_BUDGET_S`	`1.5`	Wall-clock crawl budget for cross-folder candidate roots before embedding starts.
`SKYGREP_CROSS_FOLDER_MAX_FILES_PER_ROOT`	`2000`	Maximum files sampled from each proactive candidate root during foreground cross-folder exploration.
`SKYGREP_UI_ICONS`	Nerd icons in interactive terminals, plain labels otherwise	Set to `off` to disable Nerd Font glyphs, or `nerd` to force them in captured output. Icons mark every workflow step (`route`, `scan`, `seed`, `embed`, `done`) and remain separate from the particle rail.
`SKYGREP_UI_RAIL`	`helix` for interactive terminals, otherwise `tree`	Force the static result rail style. Use `tree` for the compact connector or `helix` to replace box connectors with a denser three-cell rotating particle field (`• ·`, `·•`, `· •`, `•·` ) plus a slim separator line that continues through progress, result cards, and the final routing footer.
`SKYGREP_UI_ANIMATION`	`helix` for interactive terminals	TTY-only rotating particle flow in the narrow left rail during foreground semantic waits, colored blue/cyan/white. Set to `off` to disable. Animation stays disabled for `--json`, redirected output, logs, and agent tool calls.
`SKYGREP_LLM_ROUTER_MODEL`	`$OLLAMA_HYDE_MODEL` or `qwen2.5:3b`	Override which Ollama model the lazy cold-start dir-router uses. Larger models (e.g. `qwen2.5:7b`) typically improve hit-rate on long natural-language queries at higher latency.
`OLLAMA_URL`	`http://localhost:11434`	Where to reach the local Ollama daemon.
`OLLAMA_EMBED_MODEL`	`bge-m3`	Embedder for both warm cascade and 0.5.x lazy batch-embed.
`SKYGREP_AUTOSTART_OLLAMA`	`1` (0.5.8+)	If `1` (default), `skygrep search` autostarts `ollama serve` in the background when Ollama is unreachable but the binary is on PATH, and polls until it answers (5 s budget). Set to `0` to disable and force the rule-based router fallback when Ollama is down.
`SKYGREP_OLLAMA_AUTOSTART_TIMEOUT`	`5.0` (0.5.8+)	Seconds to wait for the autostarted `ollama serve` to answer the `/api/tags` probe before giving up and falling back to rule-based routing.
`SKYGREP_LLM_ROUTER_TIMEOUT_SECONDS`	`8.0` (0.5.8+, was `0.5`)	HTTP timeout for the LLM router's `POST /api/generate` call. Bumped from 0.5 s in 0.5.8 because cold qwen2.5:3b needs ~3 s to answer with `format=json + num_predict=256`; warm calls (`keep_alive=-1` now actually working — see release notes) finish in 50–100 ms well under the new ceiling. Lower this only when running a much smaller router model.
`SKYGREP_AGENT_ROUTER_TIMEOUT_S`	`1.5` (0.5.16+)	Router budget for JSON / `--agent-fast` / `--agent-context` / `--agent-mode deep` calls. If the local router model is slow, skygrep falls back instead of blocking the agent loop.
`SKYGREP_AGENT_MODEL_TIMEOUT_S`	`3.0` (0.5.16+)	Per-call local-model budget for machine-readable agent contexts.
`SKYGREP_AGENT_CASCADE_TIMEOUT_S`	`8.0` (0.5.16+)	Foreground cascade budget for machine-readable agent contexts. Background indexing can continue after the bounded response returns.
`SKYGREP_FOREGROUND_MODEL_TIMEOUT_S`	`12.0` (0.5.16+)	Interactive human-output local-model budget. Human searches can wait longer than agent tool calls because the terminal displays progress.
`SKYGREP_CASCADE_TIMEOUT_S`	`30.0` (0.5.16+)	Interactive human-output foreground cascade budget before reporting that background indexing will continue.
`SKYGREP_AGENT_MIN_EVIDENCE_SCORE`	`0.50` (0.5.16+)	Minimum score for machine-readable agent results. Raise it to reduce low-confidence context; lower it only when recall matters more than context cleanliness.

Reading the per-query telemetry footer (0.2.2+)

Every search prints a structured workflow footer so you can see which retrieval path answered your query and why:

╰─ done   0.42s · quality=BEST
   path     : cosine-cheap (high-confidence early-exit)
   router   : llm -> intent=mixed (0.83)
   evidence : σ-gap=0.0820 ≥ τ=0.0050 (adaptive)
   pool     : 1 filename + 0 lexical · cascade
   index    : 20s ago · 36 files · L2 symbols + graph prior

Field-by-field:

path= — the retrieval strategy this query took. One of: cosine-cheap, cosine-escalated-rerank, rg-only, cascade-skipped. (The symbol-channel path will appear once the 0.3.x auto-router lands.)
σ-gap=… → reason — Bayesian-evidence proxy that drove the cascade decision. High σ-gap = top-K candidates well-separated → cosine trusted, exit cheap. Low σ-gap = candidates tied → escalate to cross-encoder rerank.
recovery=in-progress chunks=N/T coverage=N% ETA=Nm — only when the 0.2.2 recovery worker is active. Reads live from the metadata table.
quality=BEST / quality=DEGRADED-recovery — at-a-glance "is this the full-quality semantic answer or partial during recovery". DEGRADED only means some files are still pending re-embed; your specific query may be unaffected.

Reference · configuration

Environment variables.

Configuration is read from environment variables on every invocation; there is no config file. Defaults assume a local Ollama server on the default port.

Variable	Default	Effect
`OLLAMA_URL`	`http://localhost:11434`	Base URL of the Ollama server used for embeddings and (optional) generation.
`OLLAMA_EMBED_MODEL`	`bge-m3`	Embedding model name. Must be the same at index time and query time.
`OLLAMA_LLM_MODEL`	`qwen2.5:3b`	Generation model used by `--answer` and `--agentic`.
`SKYGREP_DB_PATH`	`~/.skylakegrep/index.db`	SQLite index location. Use distinct paths to keep per-project indexes.

Note

Switching OLLAMA_EMBED_MODEL after indexing produces a dimension or semantic mismatch and will not return useful results. Reindex with --reset when the embedding model changes.