Server data from the Official MCP Registry
Semantic code retrieval engine with hybrid search, context expansion, and 40+ language support.
ContextWeaver is a well-architected semantic code retrieval engine for AI assistants with generally sound security practices. The codebase demonstrates good error handling, proper environment variable management for sensitive credentials, and appropriate permission scoping for its purpose as a developer tool. Minor code quality findings around broad exception handling and input validation do not significantly impact the security posture. The server's permissions (file I/O, network for embeddings API, environment variables) align with its stated purpose of indexing codebases and retrieving context for LLMs. Supply chain analysis found 2 known vulnerabilities in dependencies (0 critical, 2 high severity). Package verification found 1 issue.
3 files analyzed · 8 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
This plugin requests these system permissions. Most are normal for its category.
Add this to your MCP configuration file:

```json
{
  "mcpServers": {
    "io-github-wchiway-contextweaver": {
      "args": [
        "-y",
        "@chiway/contextweaver"
      ],
      "command": "npx"
    }
  }
}
```

From the project's GitHub README.
ContextWeaver is a semantic retrieval engine purpose-built for AI coding assistants. It combines hybrid search (vector + lexical), intelligent context expansion, and token-aware packing to deliver precise, relevant, and context-complete code snippets to LLMs.
- Chunks carry two texts: `displayCode` for presentation and `vectorText` for embedding.
- Chunk offsets pass through `SourceAdapter.toCharOffset` before metadata is written, preventing multi-byte character slicing errors (v1.4.0+).
- The query cache key combines the normalized query, `projectId`, index version, and a search-config fingerprint. It therefore invalidates automatically after an index update or config change, and stale results are never returned.
- `contextweaver watch` watches the filesystem and triggers incremental indexing automatically, with debouncing (500 ms by default) and scan de-duplication (no concurrent scans).
- Statistics are available through the `contextweaver stats` CLI (with `--json`) and the MCP `stats` tool.
- Diagnostics flag a `pending_marks` backlog, missing vector rows, and more, with suggested fixes.
- Chunk text is sliced from `files.content` instead of being stored alongside the vectors, reducing index size by 30–50%.
- Migration state (`pending`/`done`/`aborted`) is persisted and rebuilt automatically during crash recovery.

```bash
# Global install
npm install -g @chiway/contextweaver
# Or with pnpm
pnpm add -g @chiway/contextweaver
# Create the config file (~/.contextweaver/.env)
contextweaver init
# Or the short alias
cw init
```
Edit `~/.contextweaver/.env` and fill in your API keys:

```bash
# Embedding API config (required)
EMBEDDINGS_API_KEY=your-api-key-here
EMBEDDINGS_BASE_URL=https://api.siliconflow.cn/v1/embeddings
EMBEDDINGS_MODEL=BAAI/bge-m3
EMBEDDINGS_MAX_CONCURRENCY=10
EMBEDDINGS_DIMENSIONS=1024
# Reranker config (required)
RERANK_API_KEY=your-api-key-here
RERANK_BASE_URL=https://api.siliconflow.cn/v1/rerank
RERANK_MODEL=BAAI/bge-reranker-v2-m3
RERANK_TOP_N=20
# Search parameters (optional, override built-in defaults)
CW_SEARCH_WVEC=0.6
CW_SEARCH_WLEX=0.4
CW_SEARCH_RERANK_TOP_N=10
CW_SEARCH_MAX_TOTAL_CHARS=48000
CW_SEARCH_VECTOR_TOP_K=80
CW_SEARCH_SMART_MAX_K=8
CW_SEARCH_IMPORT_FILES_PER_SEED=3
# Ignore patterns (optional, comma-separated)
# IGNORE_PATTERNS=.venv,node_modules
```
```bash
# Run from the codebase root
contextweaver index
# Specify a path
contextweaver index /path/to/your/project
# Force a full re-index
contextweaver index --force
# Watch for file changes and auto-index incrementally (Ctrl+C to stop)
contextweaver watch
# Specify a path and debounce window (ms)
contextweaver watch /path/to/project --debounce 800
```
On startup, `watch` runs one incremental scan over the whole project and then listens for filesystem events. Changes trigger a single de-duplicated scan once the debounce window elapses, and paths excluded by ignore rules never trigger a scan.
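The debounce and de-duplication behavior of `watch` can be sketched as follows. This is a minimal illustration, not the code in `src/scanner/watcher.ts`. The class name `DebouncedScanner`, the `scan` callback, and the `isIgnored` predicate are hypothetical, and filesystem event delivery is left out.

```typescript
// Hypothetical sketch: trailing-edge debounce plus "never run two scans at once".
type TimerHandle = ReturnType<typeof setTimeout>;

class DebouncedScanner {
  private timer: TimerHandle | undefined;
  private running = false;
  private rerunRequested = false;

  constructor(
    private readonly scan: () => Promise<void>, // incremental scan of the project
    private readonly debounceMs = 500, // default debounce window per the README
    private readonly isIgnored: (path: string) => boolean = () => false,
  ) {}

  /** Called for every filesystem event; a burst of events collapses into one scan. */
  onFsEvent(path: string): void {
    if (this.isIgnored(path)) return; // excluded paths never schedule a scan
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => {
      this.timer = undefined;
      void this.runScan();
    }, this.debounceMs);
  }

  /** Runs a scan unless one is in flight, in which case one follow-up is queued. */
  private async runScan(): Promise<void> {
    if (this.running) {
      this.rerunRequested = true;
      return;
    }
    this.running = true;
    try {
      do {
        this.rerunRequested = false;
        await this.scan();
      } while (this.rerunRequested);
    } finally {
      this.running = false;
    }
  }
}
```

Queuing one follow-up scan, rather than dropping the event, ensures that changes landing mid-scan are not missed. At the same time, no more than one scan ever runs at once.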
```bash
# Semantic search
cw search --information-request "How is the user authentication flow implemented?"
# With exact terms
cw search --information-request "Database connection logic" --technical-terms "DatabasePool,Connection"
```
The following commands are CLI mirrors of MCP tools, with zero Embedding API cost:
```bash
# List indexed files (supports glob / language / count filters)
contextweaver list-files --glob "src/**/*.ts" --language typescript --max-results 100
# Look up a symbol definition
contextweaver definition SearchService --hint-path src/search
# Look up symbol references
contextweaver references handleStats --exclude-definition
# Human-readable stats report
contextweaver stats
# JSON output (for scripting)
contextweaver stats --json
# Specify a project path
contextweaver stats --path /path/to/project
```
```bash
# Launch the MCP server (for use by Claude and other AI assistants)
contextweaver mcp
# Show LanceDB migration state
contextweaver migrate
# Clear the aborted state: wipe LanceDB and trigger a full rebuild.
# Use this when the Indexer refuses to write because sampling validation failed,
# then run index again.
contextweaver migrate --reset
# Specify a project path
contextweaver migrate --path /path/to/project
```
Add the following to your Claude Desktop config file:

```json
{
  "mcpServers": {
    "contextweaver": {
      "command": "contextweaver",
      "args": ["mcp"]
    }
  }
}
```
ContextWeaver exposes 5 MCP tools, following a layered design of "semantic retrieval first, structure browsing second":
| Tool | Purpose | Embedding cost |
|---|---|---|
| `codebase-retrieval` | Primary tool: hybrid semantic + exact-match retrieval | Yes |
| `list-files` | List indexed file structure (path/language/size) | No |
| `find-references` | Find heuristic text references to a symbol | No |
| `get-symbol-definition` | Find likely definition blocks for a symbol | No |
| `stats` | Index/search/health statistics | No |
`codebase-retrieval` parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `repo_path` | string | ✅ | Absolute path to the repository root |
| `information_request` | string | ✅ | The semantic intent in natural language |
| `technical_terms` | string[] | ❌ | Exact technical terms (class/function names, etc.) |
| `mode` | string | ❌ | Retrieval profile: `quick`, `balanced`, or `deep` |
| `include_globs` | string[] | ❌ | File glob allowlist applied after retrieval |
| `exclude_globs` | string[] | ❌ | File glob denylist applied after retrieval |
| `language` | string[] | ❌ | Language allowlist applied after retrieval |
| `max_total_chars` | number | ❌ | Per-call output budget in characters |
| `max_files` | number | ❌ | Maximum number of files returned after packing |
| `max_segments_per_file` | number | ❌ | Maximum non-contiguous segments per file |
| `return_debug` | boolean | ❌ | Include debug metadata in structured output |
| `low_confidence_behavior` | string | ❌ | Low-confidence handling: `return_top1`, `return_empty`, or `return_with_warning` |
| `output_format` | string | ❌ | Response format: `markdown`, `json`, or `both` |
`list-files` parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `repo_path` | string | ✅ | Absolute path to the repository root |
| `glob` | string | ❌ | Glob pattern to filter paths |
| `language` | string | ❌ | Language filter (matched against `files.language`) |
| `max_results` | number | ❌ | Max files to return (default 200) |
`find-references` parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `repo_path` | string | ✅ | Absolute path to the repository root |
| `symbol` | string | ✅ | Exact symbol name |
| `exclude_definition` | boolean | ❌ | Exclude chunks whose breadcrumb tail matches the symbol name |
| `max_results` | number | ❌ | Max references to return (default 50) |
`get-symbol-definition` parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `repo_path` | string | ✅ | Absolute path to the repository root |
| `symbol` | string | ✅ | Exact symbol name to resolve |
| `hint_path` | string | ❌ | Preferred path to disambiguate same-name definitions |
| `max_results` | number | ❌ | Max definitions to return (default 3) |
Note: `find-references` and `get-symbol-definition` are heuristic text lookups over indexed chunks, not compiler-accurate navigation. For exhaustive raw text matching, use `grep` outside MCP.
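To make "heuristic text lookup" concrete, here is a minimal sketch of a whole-word reference search over indexed chunks. It includes the `exclude_definition` rule from the parameter table above, which drops chunks whose breadcrumb tail equals the symbol. The `Chunk` shape and the `>` breadcrumb separator are assumptions for illustration, not ContextWeaver's actual schema.

```typescript
// Hypothetical chunk shape; the real index stores more metadata.
interface Chunk {
  filePath: string;
  breadcrumb: string; // assumed format: "outer > inner > name"
  text: string;
}

// Escape regex metacharacters so the symbol is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findReferences(
  chunks: Chunk[],
  symbol: string,
  excludeDefinition = false,
  maxResults = 50, // default from the parameter table
): Chunk[] {
  // Whole-word match: "handleStats" does not match "handleStatsExtra".
  const wordRe = new RegExp(`\\b${escapeRegExp(symbol)}\\b`);
  const hits: Chunk[] = [];
  for (const chunk of chunks) {
    if (!wordRe.test(chunk.text)) continue;
    if (excludeDefinition) {
      const tail = chunk.breadcrumb.split(">").pop()?.trim();
      if (tail === symbol) continue; // most likely the definition itself
    }
    hits.push(chunk);
    if (hits.length >= maxResults) break;
  }
  return hits;
}
```

A text match like this also reports hits inside comments and string literals, which is the kind of imprecision the note above warns about.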
`information_request` describes "what to do"; `technical_terms` filters "what it's called".

```mermaid
flowchart TB
    subgraph Interface["CLI / MCP Interface"]
        CLI[contextweaver CLI]
        MCP[MCP Server]
    end
    subgraph Search["SearchService"]
        QC[QueryCache<br/>LRU]
        VR[Vector Retrieval]
        LR[Lexical Retrieval]
        RRF[RRF Fusion + Rerank]
        QC -.cache hit.-> CP
        VR --> RRF
        LR --> RRF
    end
    subgraph Expand["Context Expansion"]
        GE[GraphExpander]
        CP[ContextPacker]
        GE --> CP
    end
    subgraph Storage["Storage Layer"]
        VS[(VectorStore<br/>LanceDB)]
        DB[(SQLite<br/>FTS5)]
    end
    subgraph Index["Indexing Pipeline"]
        CR[Crawler<br/>fdir] --> SS[SemanticSplitter<br/>Tree-sitter] --> IX[Indexer<br/>Batch Embedding]
    end
    Interface --> Search
    RRF --> GE
    Search <--> Storage
    Expand <--> Storage
    Index --> Storage
```
| Module | Responsibility |
|---|---|
| SearchService | Hybrid search core: coordinates vector/lexical recall, RRF fusion, rerank; integrates QueryCache |
| QueryCache | Per-project in-process LRU cache (v1.5.0+); a hit skips the entire retrieval pipeline |
| GraphExpander | Context expander: runs the E1/E2/E3 three-stage expansion strategy |
| ContextPacker | Context packer: segment merging and token budget control |
| ChunkContentLoader | Slices files.content by (path, start_index, end_index) (v1.4.0+) |
| VectorStore | LanceDB adapter; exposes pure vector operations only |
| Database (SQLite) | Metadata storage + FTS5 full-text index + statistics counters, schema_version=3 |
| Bootstrap | Cross-store init coordinator: pending_marks replay + LanceDB schema migration (v1.4.0+) |
| SemanticSplitter | AST semantic chunker (Tree-sitter); normalizes offsets to the UTF-16 character domain on write |
| Watcher | File-watch coordinator (v1.5.0+): debounce + scan de-duplication + ignore filtering |
| Stats | Statistics aggregation layer (v1.5.0+): combines index/search/health metrics |
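The `QueryCache` row suggests a design along these lines: a small LRU map whose key (normalized query + projectId + index version + search-config fingerprint) folds in everything that could change the answer. This is a sketch under assumptions. The whitespace-only query normalization, the `\u0000` separator, and the names `cacheKey` and `LruCache` are illustrative, not taken from `QueryCache.ts`.

```typescript
interface CacheKeyParts {
  query: string;
  projectId: string;
  indexVersion: number; // assumed to change on every index update
  configFingerprint: string; // assumed to change whenever search config changes
}

// Assumed normalization: trim and collapse whitespace only.
function normalizeQuery(q: string): string {
  return q.trim().replace(/\s+/g, " ");
}

function cacheKey(p: CacheKeyParts): string {
  return [normalizeQuery(p.query), p.projectId, String(p.indexVersion), p.configFingerprint].join("\u0000");
}

// Minimal LRU built on Map insertion order: the first key is the least recently used.
class LruCache<V> {
  private readonly map = new Map<string, V>();
  constructor(private readonly capacity: number) {}

  get(key: string): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key) as V;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}
```

Because the index version is part of the key, a re-index never has to purge the cache. Old entries simply stop matching and age out through LRU eviction.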
```text
~/.contextweaver/<projectId>/
├── index.db            # SQLite
│   ├── files           # File metadata + full content (content column, the only source for text slicing)
│   ├── files_fts       # External-content table, inverted index pointing to files
│   ├── chunks_fts      # Chunk-level inverted index, per-file wholesale replacement
│   ├── metadata        # schema_version / lancedb_migration_state / lock
│   ├── stats           # Cumulative index/search counters (v1.5.0+)
│   └── pending_marks   # Outbox: replayed when a vector_index_hash mark failed
└── vectors.lance/      # LanceDB chunks table (vectors + locating metadata only, no content)
```
Key invariants:

- The only source of chunk text is `files.content`. `ChunkContentLoader` slices it via `start_index`/`end_index`, the same source used for `displayCode`.
- The LanceDB migration state (`pending`/`done`/`aborted`) is persisted, with cross-process mutual exclusion via an advisory lock.

```text
contextweaver/
├── src/
│   ├── index.ts                  # CLI entry (init / index / watch / search / mcp / migrate / stats)
│   ├── config.ts                 # Config management (environment variables)
│   ├── defaultEnv.ts             # Default .env template
│   ├── cli/
│   │   └── mirrorCommands.ts     # CLI mirrors of MCP tools (list-files / definition / references)
│   ├── api/                      # External API wrappers
│   │   ├── embedding.ts          # Embedding API
│   │   └── reranker.ts           # Reranker API
│   ├── chunking/                 # Semantic chunking
│   │   ├── SemanticSplitter.ts   # AST semantic chunker
│   │   ├── SourceAdapter.ts      # Source adapter (UTF-16/UTF-8 domain normalization)
│   │   ├── LanguageSpec.ts       # Language spec definitions
│   │   ├── ParserPool.ts         # Tree-sitter parser pool
│   │   └── types.ts              # Chunking type definitions
│   ├── scanner/                  # File scanning
│   │   ├── index.ts              # Scan orchestration
│   │   ├── crawler.ts            # Filesystem traversal
│   │   ├── processor.ts          # File processing
│   │   ├── watcher.ts            # File-watch coordinator (v1.5.0+)
│   │   ├── filter.ts             # Filter rules
│   │   ├── hash.ts               # File hash
│   │   └── language.ts           # Language detection
│   ├── indexer/                  # Indexer
│   │   └── index.ts              # Three-stage transaction (LanceDB → FTS+outbox → SQLite mark)
│   ├── vectorStore/              # Vector storage
│   │   └── index.ts              # LanceDB adapter (pure vector operations)
│   ├── db/                       # Database
│   │   ├── index.ts              # SQLite + FTS5 + pending_marks + migration state machine + stats counters
│   │   └── bootstrap.ts          # Cross-store init coordinator (v1.4.0+)
│   ├── search/                   # Search service
│   │   ├── SearchService.ts      # Core search service (cache-integrated)
│   │   ├── QueryCache.ts         # Per-project LRU query cache (v1.5.0+)
│   │   ├── GraphExpander.ts      # Context expander
│   │   ├── ContextPacker.ts      # Context packer
│   │   ├── ChunkContentLoader.ts # Slices by (path, start_index, end_index) (v1.4.0+)
│   │   ├── fts.ts                # Full-text search (per-file wholesale replacement)
│   │   ├── config.ts             # Search default config + value bounds
│   │   ├── loadConfig.ts         # Env-var overrides + config fingerprint (v1.5.0+)
│   │   ├── types.ts              # Type definitions
│   │   ├── utils.ts              # Token-overlap scoring
│   │   └── resolvers/            # Multi-language import resolvers
│   │       ├── JsTsResolver.ts
│   │       ├── PythonResolver.ts
│   │       ├── GoResolver.ts
│   │       ├── JavaResolver.ts
│   │       ├── RustResolver.ts
│   │       ├── CppResolver.ts
│   │       └── CSharpResolver.ts
│   ├── stats/                    # Statistics aggregation layer (v1.5.0+)
│   │   └── index.ts              # Aggregates and renders index/search/health metrics
│   ├── mcp/                      # MCP server
│   │   ├── server.ts             # MCP server implementation (registers 5 tools)
│   │   ├── main.ts               # MCP entry
│   │   └── tools/
│   │       ├── index.ts               # Tool registry
│   │       ├── shared.ts              # Shared tool logic
│   │       ├── codebaseRetrieval.ts   # Code retrieval tool
│   │       ├── listFiles.ts           # File structure browsing (v1.5.0+)
│   │       ├── findReferences.ts      # Symbol reference lookup (v1.5.0+)
│   │       ├── getSymbolDefinition.ts # Symbol definition lookup (v1.5.0+)
│   │       └── stats.ts               # Statistics tool (v1.5.0+)
│   └── utils/                    # Utilities
│       ├── logger.ts             # Logging system
│       ├── encoding.ts           # Encoding detection
│       └── lock.ts               # File lock
├── tests/                        # Unit + integration tests (28 test files, 156 test cases)
│   ├── chunking/                 # SourceAdapter / chunking
│   ├── cli/                      # mirrorCommands
│   ├── db/                       # migration, outbox, advisory lock, index-version
│   ├── indexer/                  # transaction compensation, GC, aborted guard
│   ├── integration/              # real LanceDB end-to-end
│   ├── mcp/                      # list-files / find-references / get-symbol-definition / shared / tool registry
│   ├── scanner/                  # watcher / index-version
│   ├── search/                   # FTS, ChunkContentLoader, Packer, cache, loadConfig
│   ├── stats/                    # statistics aggregation
│   └── vectorStore/              # chunk_id de-duplication, sampling validation
├── package.json
└── tsconfig.json
```
| Variable | Required | Default | Description |
|---|---|---|---|
| `EMBEDDINGS_API_KEY` | ✅ | - | Embedding API key |
| `EMBEDDINGS_BASE_URL` | ✅ | - | Embedding API URL |
| `EMBEDDINGS_MODEL` | ✅ | - | Embedding model name |
| `EMBEDDINGS_MAX_CONCURRENCY` | ❌ | 10 | Embedding concurrency |
| `EMBEDDINGS_DIMENSIONS` | ❌ | 1024 | Vector dimensions |
| `RERANK_API_KEY` | ✅ | - | Reranker API key |
| `RERANK_BASE_URL` | ✅ | - | Reranker API URL |
| `RERANK_MODEL` | ✅ | - | Reranker model name |
| `RERANK_TOP_N` | ❌ | 20 | Rerank return count |
| `IGNORE_PATTERNS` | ❌ | - | Extra ignore patterns |
The following environment variables override built-in defaults. Out-of-range values are automatically clamped to the valid interval. When only one of `CW_SEARCH_WVEC`/`CW_SEARCH_WLEX` is set to a value x, the other is automatically set to 1 - x.
| Variable | Default | Bounds | Description |
|---|---|---|---|
| `CW_SEARCH_WVEC` | 0.6 | 0–1 | Vector weight (fusion stage) |
| `CW_SEARCH_WLEX` | 0.4 | 0–1 | Lexical weight (complements wVec) |
| `CW_SEARCH_RERANK_TOP_N` | 10 | 5–20 | Results kept after rerank |
| `CW_SEARCH_MAX_TOTAL_CHARS` | 48000 | 20000–80000 | Token budget (in chars, ~12k tokens) |
| `CW_SEARCH_VECTOR_TOP_K` | 80 | 40–200 | Vector recall candidates |
| `CW_SEARCH_SMART_MAX_K` | 8 | 5–15 | Smart TopK hard upper bound |
| `CW_SEARCH_IMPORT_FILES_PER_SEED` | 3 | 0–5 | E3 import files resolved per seed (0 disables cross-file expansion) |
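The clamping and weight-complement rules above fit in a few lines. This sketch takes the environment as a plain record so it runs outside Node. The helper names are hypothetical, and ignoring unparseable values is an assumption rather than documented behavior.

```typescript
type Env = Record<string, string | undefined>;

interface Bounds { min: number; max: number; }

const clamp = (x: number, b: Bounds): number => Math.min(b.max, Math.max(b.min, x));

// Returns the clamped value, or undefined when the variable is unset or not a number.
function readBounded(env: Env, name: string, b: Bounds): number | undefined {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? clamp(n, b) : undefined; // assumption: garbage is ignored
}

function resolveWeights(env: Env): { wVec: number; wLex: number } {
  const unit: Bounds = { min: 0, max: 1 };
  const vec = readBounded(env, "CW_SEARCH_WVEC", unit);
  const lex = readBounded(env, "CW_SEARCH_WLEX", unit);
  if (vec !== undefined && lex === undefined) return { wVec: vec, wLex: 1 - vec };
  if (lex !== undefined && vec === undefined) return { wVec: 1 - lex, wLex: lex };
  return { wVec: vec ?? 0.6, wLex: lex ?? 0.4 }; // both set, or neither (defaults)
}
```

For example, `CW_SEARCH_WLEX=1.5` is clamped to 1, which leaves the vector weight at 0.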
```typescript
interface SearchConfig {
  // === Recall ===
  vectorTopK: number;             // Vector recall candidates (default 80)
  vectorTopM: number;             // Vectors kept after dedup (default 60)
  ftsTopKFiles: number;           // FTS recall file count (default 20)
  lexChunksPerFile: number;       // Lexical chunks per file (default 2)
  lexTotalChunks: number;         // Total lexical chunks (default 40)
  // === Fusion ===
  rrfK0: number;                  // RRF smoothing constant (default 20)
  wVec: number;                   // Vector weight (default 0.6)
  wLex: number;                   // Lexical weight (default 0.4)
  fusedTopM: number;              // Candidates fed into rerank after fusion (default 60)
  // === Rerank ===
  rerankTopN: number;             // Results kept after rerank (default 10)
  maxRerankChars: number;         // Max chars per chunk sent to reranker (default 1000)
  maxBreadcrumbChars: number;     // Max chars for breadcrumb context (default 250)
  headRatio: number;              // Head/tail ratio when truncating (default 0.67)
  // === Expansion ===
  neighborHops: number;           // E1 neighbor hops (default 2)
  breadcrumbExpandLimit: number;  // E2 breadcrumb completions (default 3)
  importFilesPerSeed: number;     // E3 import files per seed (default 3)
  chunksPerImportFile: number;    // E3 chunks per import file (default 3)
  // === ContextPacker ===
  maxSegmentsPerFile: number;     // Max non-contiguous segments per file (default 3)
  maxTotalChars: number;          // Token budget (chars, default 48000)
  // === Smart TopK ===
  enableSmartTopK: boolean;       // Enable smart cutoff (default true)
  smartTopScoreRatio: number;     // Dynamic threshold ratio (default 0.5)
  smartTopScoreDeltaAbs: number;  // Max absolute drop from Top1 (default 0.25)
  smartMinScore: number;          // Absolute floor (default 0.25)
  smartMinK: number;              // Safe Harbor count (default 2)
  smartMaxK: number;              // Hard upper bound (default 8)
}
```
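The `smart*` fields suggest how the post-rerank cutoff works. The sketch below is one plausible reading of those parameters, not the code in `SearchService.ts`. It keeps results whose score clears the strictest of three thresholds: a ratio of the Top1 score, a maximum absolute drop from Top1, and an absolute floor. It always keeps the first `smartMinK` results as a safe harbor and never returns more than `smartMaxK`.

```typescript
interface SmartTopKConfig {
  smartTopScoreRatio: number;    // e.g. 0.5
  smartTopScoreDeltaAbs: number; // e.g. 0.25
  smartMinScore: number;         // e.g. 0.25
  smartMinK: number;             // e.g. 2
  smartMaxK: number;             // e.g. 8
}

/** `ranked` must be sorted by descending rerank score. */
function smartCutoff<T extends { score: number }>(ranked: T[], cfg: SmartTopKConfig): T[] {
  if (ranked.length === 0) return [];
  const top1 = ranked[0].score;
  // Assumed rule: the strictest of the three thresholds wins.
  const threshold = Math.max(
    top1 * cfg.smartTopScoreRatio,
    top1 - cfg.smartTopScoreDeltaAbs,
    cfg.smartMinScore,
  );
  const kept: T[] = [];
  for (let i = 0; i < ranked.length && kept.length < cfg.smartMaxK; i++) {
    if (i < cfg.smartMinK || ranked[i].score >= threshold) {
      kept.push(ranked[i]);
    } else {
      break; // scores only decrease from here, so nothing later can pass
    }
  }
  return kept;
}
```

With the defaults and a Top1 score of 0.9, the effective threshold is 0.65, so a third result scoring 0.6 is cut.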
ContextWeaver natively supports AST parsing for the following languages via Tree-sitter:
| Language | AST Parsing | Import Resolution | Extensions |
|---|---|---|---|
| TypeScript | ✅ | ✅ | .ts, .tsx |
| JavaScript | ✅ | ✅ | .js, .jsx, .mjs, .cjs |
| Python | ✅ | ✅ | .py |
| Go | ✅ | ✅ | .go |
| Java | ✅ | ✅ | .java |
| Rust | ✅ | ✅ | .rs |
| C | ✅ | ✅ | .c, .h |
| C++ | ✅ | ✅ | .cpp, .cc, .cxx, .hpp |
| C# | ✅ | ✅ | .cs |
Other languages fall back to line-based chunking and can still be indexed and searched normally.
```text
0. Bootstrap → pending_marks replay + LanceDB schema migration (first launch)
1. Crawler → traverse the filesystem, filter ignored items
2. Processor → read file content, compute hash
3. Splitter → AST parse, semantic chunking (offsets normalized to UTF-16 char domain)
4. Indexer → batch embedding
5. Stages 4-6 pseudo-transaction:
   ├─ LanceDB write (pre-delete (path, hash) to avoid duplicates → add → clear old versions)
   ├─ FTS + outbox single SQLite transaction (rolls back LanceDB on failure)
   └─ SQLite mark + clear outbox single transaction (outbox kept on failure, replayed next launch)
6. Trailing GC → clean up LanceDB orphan chunks (time budget 5s)
```
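Step 5's compensation logic is easier to follow as code. Below is a minimal sketch of the three-stage write for one file. It assumes hypothetical `VectorStore` and `MetaDb` interfaces whose method names are invented for illustration, with the LanceDB and SQLite calls abstracted away. A failure in the FTS + outbox transaction rolls back the LanceDB rows, and a failure in the mark transaction leaves the outbox row for replay.

```typescript
interface VectorRow { chunkId: string; path: string; hash: string; vector: number[]; }

// Hypothetical abstractions over LanceDB and SQLite.
interface VectorStore {
  deleteByPathHash(path: string, hash: string): Promise<void>;
  add(rows: VectorRow[]): Promise<void>;
  deleteOtherVersions(path: string, keepHash: string): Promise<void>;
}
interface MetaDb {
  transaction<T>(fn: () => T): T; // runs fn atomically; rethrows after rollback
  replaceFts(path: string, chunkTexts: string[]): void;
  addPendingMark(path: string, hash: string): void;
  setVectorIndexHash(path: string, hash: string): void;
  clearPendingMark(path: string): void;
}

async function commitFile(
  vs: VectorStore, db: MetaDb, path: string, hash: string,
  rows: VectorRow[], chunkTexts: string[],
): Promise<"done" | "pending"> {
  // Stage 1: LanceDB. Pre-deleting (path, hash) makes a retry idempotent.
  await vs.deleteByPathHash(path, hash);
  await vs.add(rows);
  await vs.deleteOtherVersions(path, hash);

  // Stage 2: FTS rows and the outbox entry commit together, or not at all.
  try {
    db.transaction(() => {
      db.replaceFts(path, chunkTexts);
      db.addPendingMark(path, hash);
    });
  } catch (err) {
    await vs.deleteByPathHash(path, hash); // compensate stage 1
    throw err;
  }

  // Stage 3: mark the file as indexed and clear its outbox entry.
  try {
    db.transaction(() => {
      db.setVectorIndexHash(path, hash);
      db.clearPendingMark(path);
    });
    return "done";
  } catch {
    return "pending"; // outbox row survives and is replayed on next launch
  }
}
```

Only the outbox makes stage 3 safe to fail. The file's vectors and FTS rows already exist, so replaying the mark later avoids re-embedding it.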
```text
1. Query Parse → parse the query, separate semantics from terms
2. Cache Lookup → return immediately on hit (v1.5.0+, key includes index version + config fingerprint)
3. Hybrid Recall → dual-channel vector + lexical recall
4. RRF Fusion → Reciprocal Rank Fusion
5. Rerank → cross-encoder reranking
6. Smart Cutoff → intelligent score cutoff
7. Graph Expand → neighbor/breadcrumb/import expansion
8. Context Pack → segment merging, token budget
9. Cache Store → write to cache (v1.5.0+)
10. Format Output → format and return to the LLM
```
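Step 4 names Reciprocal Rank Fusion, and `SearchConfig` exposes `rrfK0`, `wVec`, `wLex`, and `fusedTopM`. A common weighted form scores each chunk as the sum, over the channels that returned it, of w / (k0 + rank) with 1-based ranks. The sketch below assumes that form and the default values, which may not match ContextWeaver's exact formula.

```typescript
interface Fused { id: string; score: number; }

// Weighted RRF over two ranked id lists (best first). Ranks are 1-based.
function weightedRrf(
  vectorIds: string[],
  lexicalIds: string[],
  rrfK0 = 20,
  wVec = 0.6,
  wLex = 0.4,
  fusedTopM = 60,
): Fused[] {
  const scores = new Map<string, number>();
  const accumulate = (ids: string[], weight: number): void => {
    ids.forEach((id, index) => {
      const contribution = weight / (rrfK0 + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  };
  accumulate(vectorIds, wVec);
  accumulate(lexicalIds, wLex);
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, fusedTopM); // these candidates go on to the reranker
}
```

With vector ranks `[a, b, c]` and lexical ranks `[c, a]`, the fused order is `a, c, b`. Chunk `c` overtakes `b` because both channels found it. Working from ranks instead of raw scores also avoids having to calibrate vector similarities against full-text scores, which live on different scales.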
`list-files`, `find-references`, and `get-symbol-definition` do not call the Embedding API (v1.5.0+).

`contextweaver stats` outputs three sections:

- … `pending_marks`, language breakdown

When an abnormal migration state, a `pending_marks` backlog, or missing vector rows are detected, the report appends diagnostic warnings with the corresponding fix commands. The `--json` output maps to `StatsReport` for scripts and monitoring systems.
Log file location: `~/.contextweaver/logs/app.YYYY-MM-DD.log`

Set the log level:

```bash
# Enable debug logging
LOG_LEVEL=debug contextweaver search --information-request "..."
```
Symptom: `contextweaver index` fails with "LanceDB is in the aborted state, refusing to write to prevent schema pollution."
Cause: during the v1.4.0 upgrade, sampling validation found that more than 1% of the old LanceDB index's `display_code` values differ from the current `files.content`. This typically happens on legacy indexes whose chunk offsets used the UTF-8 byte domain.
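Byte-domain offsets break slicing because JavaScript strings are indexed in UTF-16 code units, while UTF-8 spends 1 to 4 bytes per code point. The two offsets agree only for pure ASCII text. The sketch below converts a UTF-8 byte offset into a UTF-16 index, the kind of normalization `SourceAdapter.toCharOffset` is described as performing. The function names and signatures here are hypothetical.

```typescript
// Number of UTF-8 bytes needed to encode one code point.
function utf8Length(codePoint: number): number {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

/** Converts a UTF-8 byte offset into `text` to a UTF-16 code-unit index. */
function utf8OffsetToUtf16Index(text: string, byteOffset: number): number {
  let bytes = 0;
  let index = 0;
  while (index < text.length && bytes < byteOffset) {
    const cp = text.codePointAt(index) as number;
    bytes += utf8Length(cp);
    index += cp > 0xffff ? 2 : 1; // astral code points use a surrogate pair
  }
  return index;
}
```

For `"héllo"`, byte offset 3 falls just after `é`, which takes 2 bytes, so the correct slice end is index 2. Slicing with the raw byte offset returns `"hél"`, one character too many. The error grows with every non-ASCII character before the offset, which is how byte-domain chunks drift away from `files.content`.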
Fix:

```bash
contextweaver migrate --reset   # Clear the LanceDB chunks table + reset state to done
contextweaver index             # Full rebuild (new schema)
```
You can also run contextweaver stats first to view diagnostic warnings and confirm the current migration state and pending_marks backlog.
If the MCP server is long-running and another terminal runs contextweaver index, the two processes contend for migration. v1.4.0 introduces an advisory lock with a 10-minute zombie threshold, automatically letting one process skip migration while the other completes it.
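The advisory lock with a zombie threshold reduces to a small decision rule. A process takes the lock if nobody holds it or if the holder's timestamp is older than 10 minutes; otherwise it skips migration. Below is a minimal sketch with a hypothetical `LockRecord` shape. In a real implementation, the read-check-write must run inside one SQLite write transaction (for example `BEGIN IMMEDIATE`) so that two processes cannot both see the lock as free.

```typescript
interface LockRecord {
  owner: string;      // e.g. a process id plus hostname
  acquiredAt: number; // milliseconds since epoch
}

const ZOMBIE_THRESHOLD_MS = 10 * 60 * 1000; // 10 minutes

type LockDecision =
  | { action: "migrate"; record: LockRecord } // write `record`, then run the migration
  | { action: "skip" };                       // another live process is migrating

function decideMigration(
  current: LockRecord | undefined,
  self: string,
  now: number,
  thresholdMs = ZOMBIE_THRESHOLD_MS,
): LockDecision {
  // A missing lock or a stale (zombie) one can be taken over.
  if (current === undefined || now - current.acquiredAt > thresholdMs) {
    return { action: "migrate", record: { owner: self, acquiredAt: now } };
  }
  return { action: "skip" };
}
```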
If the lock gets stuck (after kill -9), clear it manually:
```bash
sqlite3 ~/.contextweaver/<projectId>/index.db \
  "DELETE FROM metadata WHERE key = 'lancedb_migration_lock';"
```
If an FTS write succeeds but the `vector_index_hash` mark fails, v1.4.0 recovers through the `pending_marks` outbox: the mark is replayed automatically on the next launch, avoiding duplicate embeddings.
Confirm that incremental indexing has run (or enable `contextweaver watch` for automatic increments). The query cache key is bound to the index version, so old cache entries invalidate automatically after an index update and no manual clearing is needed.
This project is licensed under the MIT License.