skylakegrep CLI reference

Reference · CLI

Command cheatsheet — every command at a glance.

The bare form covers ~ 95 % of real-world use: skygrep "<your question>". No subcommand, no flags. The system auto-routes (LLM router → find / rg / semantic cascade), auto-indexes on first query, and auto-recovers when the embedder is upgraded (0.2.2+). The subcommand form below is for power use, admin, and CI.

Command When to use Example
skygrep "<query>" (bare) Default. Just ask a question. Auto-indexes and auto-recovers. skygrep "where is the auth refresh logic"
skygrep search <query> Explicit form when you need flags (--top, --json, --agentic, …). skygrep search "session token" --top 20 --json
skygrep doctor First-time troubleshooting. Probes Ollama, lists models, summarises the project index, checks LLM-CLI integrations. skygrep doctor
skygrep setup Register skygrep with detected LLM CLIs (Claude Code, Codex, OpenCode, Gemini CLI, Cursor) so agents prefer it and know which information depth to request. Existing managed snippets auto-refresh after upgrade. skygrep setup · skygrep setup --uninstall
skygrep stats Print chunk and file counts for the current project's index. skygrep stats
skygrep index [PATH] [--reset] Rarely needed. The first query auto-indexes; the 0.2.2 recovery worker handles embedder upgrades. Use --reset only when you actively want a clean rebuild from scratch. skygrep index . · skygrep index . --reset
skygrep watch [PATH] --interval N Keep the index live in the background. Polls PATH every N seconds (default 5) and reindexes changed files. skygrep watch . · skygrep watch . -i 2
skygrep serve [--host H] [--port P] Daemon mode. Keeps the cross-encoder reranker and Ollama session warm in memory so warm queries land in 0.5–2 s. skygrep serve --port 7878
skygrep enrich Advanced. Generate doc2query-style natural-language descriptions for each chunk (improves vocab-mismatch retrieval at index-time cost). skygrep enrich

First-time workflow (Python-safe, < 2 min)

# 0. confirm Python 3.9+ is available
python3 --version

# 1. install with the same Python that will own the CLI
python3 -m pip install --user skylakegrep

# 2. pull required models, one command per model
ollama pull bge-m3
ollama pull qwen2.5:3b

# 3. verify runtime, models, install path, and index state
skygrep doctor

# 4. register skygrep with your LLM CLI(s) — claude code / codex / etc.
skygrep setup

# 5. cd into any project (no manual index step needed)
cd ~/your-project

# 6. ask anything
skygrep "where does the cascade tau threshold get computed"

skygrep setup writes a managed instruction block that teaches agents to choose the right output depth. Re-running setup refreshes that block, and normal searches / skygrep doctor refresh older managed blocks after an upgrade. New integrations still require explicit setup.

Information-depth examples

NeedCommand
Locate a file or folder quicklyskygrep "where is the project brief I edited recently?"
Show relevant snippetsskygrep --content --detail standard "what does the API migration plan say about rollback?"
Read deeper after narrowingskygrep --content --detail full --include "docs/migration-plan.md" "show the deployment steps"
Synthesize a local answerskygrep --answer --content "summarize the payment retry policy"
Feed an LLM/agent structured contextskygrep --json --content --detail standard --include "src/**" "where is token refresh implemented?"

In 0.5.13, semantic and mixed queries also use an adaptive candidate recall substrate before final cascade ranking. For agents, this means --json --content --detail standard can return compact path and evidence packs from include scope, indexed path tokens, symbols, chunk text, and bounded ripgrep recall. The required contract is still simple: use --include when the caller knows the scope, add --content when the next step needs source text, and use --detail full only after narrowing.

On macOS, python may not exist; use python3. If the shell cannot find skygrep after install, inspect python3 -m site --user-base, which -a skygrep, and python3 -m pip show skylakegrep. The usual PATH fix is:

export PATH="$(python3 -m site --user-base)/bin:$PATH"

The first query auto-indexes in the background and falls back to rg for the first ~ 1 s; subsequent warm queries land in 0.5–2 s.

Clean uninstall / reinstall

# Remove LLM-CLI snippets written by `skygrep setup`.
skygrep setup --uninstall || true

# Remove the Python package from the Python that installed it.
python3 -m pip uninstall -y skylakegrep

# Optional: delete local indexes/config. This does not delete your files.
rm -rf ~/.skylakegrep

# Optional: remove downloaded Ollama models.
ollama rm bge-m3
ollama rm qwen2.5:3b

# Reinstall.
python3 -m pip install --user --no-cache-dir skylakegrep
ollama pull bge-m3
ollama pull qwen2.5:3b
skygrep doctor

Flags for skygrep search (and the bare form)

The full --help is on every install. Quick reference:

FlagDefaultEffect
-m, -n, --top5Number of final results.
--jsonoffEmit results as a JSON array; suppresses human formatting.
--answeroffSynthesize an answer from retrieved snippets via Ollama.
--content / --no-contentonShow or hide snippet bodies in human output.
--detail brief|summary|standard|fullstandardHow much body to show. full extracts content from concrete filename matches; semantic-depth filename anchors may show a bounded content preview in standard human output. Bare --detail "query" is accepted as shorthand for --detail full "query".
--languageRestrict to one or more language keys; repeatable.
--includeGlob; only paths matching at least one pattern are kept; repeatable.
--excludeGlob; paths matching any pattern are dropped; repeatable.
--semantic-onlyoffSkip lexical reranking; rank by cosine similarity alone.
--agenticoffDecompose the query into subqueries via Ollama before search.
--max-subqueries3Upper bound on agentic subqueries.
--hydeoff (cascade handles internally)Force HyDE query rewriting (rarely needed; cascade escalation already uses HyDE for low-σ queries).
--rerankautoForce or skip the cross-encoder rerank step.
--cascade / --no-cascadeonUse the σ-adaptive cascade. --no-cascade drops to chunk-level cosine.
--lazy / --no-lazyon (0.5.1+)Cold-start auto-trigger: when the index is not yet built and ripgrep alone returns a path-token-weak result, a bounded LLM-routed lazy semantic tier fires and merges. The foreground path uses a small seed set, prunes hidden/vendor/dependency-cache trees before descent, and times out slow cross-folder exploration so background indexing can continue without blocking the user. Completely off when ripgrep is strong; --no-lazy opts out for benchmarking pure rg cold-start.
--rg-shortcut / --no-rg-shortcutonLexical pre-gate: small clustered rg result with path/token overlap returns directly, skipping the semantic cascade. Pass --no-rg-shortcut to force pure cascade for benchmarking.
--filename-shortcut / --no-filename-shortcuton (0.13+)Filename-lookup pre-gate: queries that look like "where is foo file" / "find package.json" route to find -iname '*token*' and skip both content shortcuts.
--explain / -xoff (0.5.8+)The concise router lane now prints by default in human output (├─ route router: <intent> · primary_token=... · conf=... · source=...). --explain adds the one-sentence router reason, per-result via: lines showing which channel(s) contributed (cosine cascade · symbol RRF · filename-lookup · ripgrep) and the score, plus a ├─ cascade lane: summary with σ-adaptive evidence (gap=…, tau=…). All signals read straight from existing pipeline telemetry — no extra retrieval, no model calls. Use --json for machine-readable output without human routing/status lines.

Environment variables (0.5.x)

VariableDefaultEffect
SKYGREP_DB_PATHOverride the project-scoped SQLite path. When set, --auto-index defaults to off so curated indexes aren't auto-mutated; pass --auto-index explicitly to opt in.
SKYGREP_PROACTIVE_DIRSColon-separated absolute paths the proactive enhancers and lazy_explore_cross_folder walk when the in-cwd answer is weak. Replaces 0.5.2's hardcoded ~/Downloads:~/Desktop:~/Documents. Stale or missing dirs are silently dropped. Example: SKYGREP_PROACTIVE_DIRS="$HOME/repos:$HOME/work:/data/projects".
SKYGREP_COLD_LAZY_TOTAL_BUDGET_S8.0Total foreground budget for cold-start lazy semantic work. Raise when you prefer waiting longer for a first semantic answer; lower when shell responsiveness matters more.
SKYGREP_COLD_LAZY_CWD_BUDGET_S5.0Foreground budget for lazy semantic exploration inside the current project root.
SKYGREP_COLD_LAZY_CROSS_BUDGET_S2.5Foreground budget for sibling / proactive cross-folder lazy exploration when cwd has no keyword or filename evidence.
SKYGREP_COLD_LAZY_SEED_BUDGET12Maximum cwd seed files embedded in the first cold lazy pass. Higher can improve recall at higher latency.
SKYGREP_COLD_CROSS_SEED_BUDGET3Maximum cross-folder seed files embedded in the foreground pass.
SKYGREP_CROSS_FOLDER_CRAWL_BUDGET_S1.5Wall-clock crawl budget for cross-folder candidate roots before embedding starts.
SKYGREP_CROSS_FOLDER_MAX_FILES_PER_ROOT2000Maximum files sampled from each proactive candidate root during foreground cross-folder exploration.
SKYGREP_UI_ICONSNerd icons in interactive terminals, plain labels otherwiseSet to off to disable Nerd Font glyphs, or nerd to force them in captured output. Icons mark every workflow step (route, scan, seed, embed, done) and remain separate from the particle rail.
SKYGREP_UI_RAILhelix for interactive terminals, otherwise treeForce the static result rail style. Use tree for the compact connector or helix to replace box connectors with a denser three-cell rotating particle field (• ·, ·•, · •, •· ) plus a slim separator line that continues through progress, result cards, and the final routing footer.
SKYGREP_UI_ANIMATIONhelix for interactive terminalsTTY-only rotating particle flow in the narrow left rail during foreground semantic waits, colored blue/cyan/white. Set to off to disable. Animation stays disabled for --json, redirected output, logs, and agent tool calls.
SKYGREP_LLM_ROUTER_MODEL$OLLAMA_HYDE_MODEL or qwen2.5:3bOverride which Ollama model the lazy cold-start dir-router uses. Larger models (e.g. qwen2.5:7b) typically improve hit-rate on long natural-language queries at higher latency.
OLLAMA_URLhttp://localhost:11434Where to reach the local Ollama daemon.
OLLAMA_EMBED_MODELbge-m3Embedder for both warm cascade and 0.5.x lazy batch-embed.
SKYGREP_AUTOSTART_OLLAMA1 (0.5.8+)If 1 (default), skygrep search autostarts ollama serve in the background when Ollama is unreachable but the binary is on PATH, and polls until it answers (5 s budget). Set to 0 to disable and force the rule-based router fallback when Ollama is down.
SKYGREP_OLLAMA_AUTOSTART_TIMEOUT5.0 (0.5.8+)Seconds to wait for the autostarted ollama serve to answer the /api/tags probe before giving up and falling back to rule-based routing.
SKYGREP_LLM_ROUTER_TIMEOUT_SECONDS8.0 (0.5.8+, was 0.5)HTTP timeout for the LLM router's POST /api/generate call. Bumped from 0.5 s in 0.5.8 because cold qwen2.5:3b needs ~3 s to answer with format=json + num_predict=256; warm calls (keep_alive=-1 now actually working — see release notes) finish in 50–100 ms well under the new ceiling. Lower this only when running a much smaller router model.

Reading the per-query telemetry footer (0.2.2+)

Every search prints a structured workflow footer so you can see which retrieval path answered your query and why:

╰─ done   0.42s · quality=BEST
   path     : cosine-cheap (high-confidence early-exit)
   router   : llm -> intent=mixed (0.83)
   evidence : σ-gap=0.0820 ≥ τ=0.0050 (adaptive)
   pool     : 1 filename + 0 lexical · cascade
   index    : 20s ago · 36 files · L2 symbols + graph prior

Field-by-field:

  • path= — the retrieval strategy this query took. One of: cosine-cheap, cosine-escalated-rerank, rg-only, cascade-skipped. (The symbol-channel path will appear once the 0.3.x auto-router lands.)
  • σ-gap=… → reason — Bayesian-evidence proxy that drove the cascade decision. High σ-gap = top-K candidates well-separated → cosine trusted, exit cheap. Low σ-gap = candidates tied → escalate to cross-encoder rerank.
  • recovery=in-progress chunks=N/T coverage=N% ETA=Nm — only when the 0.2.2 recovery worker is active. Reads live from the metadata table.
  • quality=BEST / quality=DEGRADED-recovery — at-a-glance "is this the full-quality semantic answer or partial during recovery". DEGRADED only means some files are still pending re-embed; your specific query may be unaffected.

Reference · configuration

Environment variables.

Configuration is read from environment variables on every invocation; there is no config file. Defaults assume a local Ollama server on the default port.

VariableDefaultEffect
OLLAMA_URL http://localhost:11434 Base URL of the Ollama server used for embeddings and (optional) generation.
OLLAMA_EMBED_MODEL bge-m3 Embedding model name. Must be the same at index time and query time.
OLLAMA_LLM_MODEL qwen2.5:3b Generation model used by --answer and --agentic.
SKYGREP_DB_PATH ~/.skylakegrep/index.db SQLite index location. Use distinct paths to keep per-project indexes.
Note

Switching OLLAMA_EMBED_MODEL after indexing produces a dimension or semantic mismatch and will not return useful results. Reindex with --reset when the embedding model changes.