Active · 2025
Atlas / Code Context MCP
MCP server that indexes codebases via Tree-sitter AST parsing and serves semantically relevant context to Claude Code, cutting token usage by 85–90%.
- 85–90% Token Reduction
- <100ms Query Latency
- 15+ MCP Tools
- ~1k lines/sec Indexing Speed
Architecture
- SQLite storage layer for symbol index, relationship graph, and error KB
- Tree-sitter AST parsing for Python, TypeScript, Go, Java
- FAISS vector search with text fallback for semantic queries
- Code knowledge graph with multi-granularity traversal (Method → Class → Module)
- Token-aware chunking (max 500 tokens/chunk, 4000 token budget per query)
- Pluggable LLM layer with content-hash caching (optional, gated by env flag)
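The token-aware chunking step above can be sketched roughly as follows. This is an illustrative minimal version, not Atlas's actual code: token counts use a crude whitespace heuristic (a real implementation would use the model's tokenizer), and the function names are hypothetical.

```python
MAX_CHUNK_TOKENS = 500   # max tokens per chunk (from the architecture notes)
QUERY_BUDGET = 4000      # total token budget per query

def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words."""
    return len(text.split())

def chunk_lines(lines: list[str], max_tokens: int = MAX_CHUNK_TOKENS) -> list[str]:
    """Greedily pack lines into chunks of at most max_tokens tokens."""
    chunks, current, current_tokens = [], [], 0
    for line in lines:
        t = count_tokens(line)
        if current and current_tokens + t > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += t
    if current:
        chunks.append("\n".join(current))
    return chunks

def select_within_budget(ranked_chunks: list[str], budget: int = QUERY_BUDGET) -> list[str]:
    """Take highest-ranked chunks until the per-query budget is spent."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        t = count_tokens(chunk)
        if used + t > budget:
            break
        selected.append(chunk)
        used += t
    return selected
```

Greedy packing keeps chunks aligned to line boundaries, so a retrieved chunk never splits a statement mid-line.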
Key Decisions
SQLite over PostgreSQL
Why: Zero-config, runs anywhere, no server process; matches developer-tooling UX expectations
Tradeoff: Only one writer at a time, but MCP servers are single-user anyway
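The zero-config storage layer this decision buys can be illustrated with the standard-library `sqlite3` module. The schema below is a hypothetical sketch of a symbol index plus relationship graph, not Atlas's actual tables:

```python
import sqlite3

# Hypothetical schema for the symbol index and relationship graph;
# table and column names are illustrative, not Atlas's actual schema.
conn = sqlite3.connect(":memory:")  # zero-config: a single file (or in-memory) DB
conn.executescript("""
CREATE TABLE symbols (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL,           -- 'function', 'class', 'module', ...
    file TEXT NOT NULL,
    start_line INTEGER,
    end_line INTEGER
);
CREATE INDEX idx_symbols_name ON symbols(name);

CREATE TABLE edges (              -- relationship graph: calls, imports, inherits
    src INTEGER REFERENCES symbols(id),
    dst INTEGER REFERENCES symbols(id),
    kind TEXT NOT NULL
);
""")

# Example: record that parse_file() calls read_source()
conn.execute("INSERT INTO symbols VALUES (1, 'parse_file', 'function', 'indexer.py', 10, 42)")
conn.execute("INSERT INTO symbols VALUES (2, 'read_source', 'function', 'io.py', 5, 20)")
conn.execute("INSERT INTO edges VALUES (1, 2, 'calls')")

# Who does parse_file call?
callees = conn.execute(
    "SELECT s.name FROM edges e JOIN symbols s ON s.id = e.dst "
    "WHERE e.src = ? AND e.kind = 'calls'",
    (1,),
).fetchall()
```

Everything lives in one file next to the indexed repo; there is no server to install or configure, which is exactly the UX argument above.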
LLM features optional (ATLAS_LLM_ENABLED flag)
Why: Core value is AST-based context extraction, not LLM summaries
Tradeoff: Some features (e.g., code summarization) degrade without an LLM
Technologies
Python · MCP · FAISS · Tree-sitter · SQLite
What I Learned
- Tree-sitter AST parsing handles edge cases (decorators, nested classes, multiline signatures) that regex misses entirely.
- Graph-aware retrieval (traversing call chains and imports) surfaces more relevant context than pure vector similarity.
- An error knowledge base that logs fixes alongside errors turned out to be surprisingly useful for recurring issues.
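The graph-aware retrieval lesson above can be sketched as a bounded breadth-first walk over the call/import graph: starting from a seed symbol, collect everything within a few hops as candidate context. The graph shape and symbol names here are hypothetical.

```python
from collections import deque

def related_symbols(graph: dict[str, list[str]], seed: str, max_depth: int = 2) -> list[str]:
    """BFS over outgoing edges, returning symbols within max_depth hops of seed."""
    seen = {seed}
    order: list[str] = []
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order

# Example call graph: handler -> {parser, db}, parser -> lexer, lexer -> unicode_tables
calls = {
    "handler": ["parser", "db"],
    "parser": ["lexer"],
    "lexer": ["unicode_tables"],
}
```

Unlike pure vector similarity, this pulls in `lexer` because `parser` calls it, even if `lexer` shares no surface vocabulary with the query.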