v0.1.0 · 54 tasks · 5,599 lines of Go

Modular RAG Plugin

Point at a directory. Index it. Query it. Six composable modules that adapt to what you're asking. Built for Cicerone, works standalone.

How It Works

Six modules, one pipeline. Each module can be enabled or disabled independently.

🔍
Search
📚
Retrieve
🧠
Memory
🔀
Fusion
Adapt
🎯
Predict

Search Module

Classifies your query as factual, comparative, procedural, or explanatory. Sets top_k and output format based on type.

factual → 3 results · comparative → 6 · procedural → 5 · explanatory → 4

Retrieve Module

Queries ChromaDB with the search strategy. Embeds documents server-side using all-MiniLM-L6-v2 (384-dim cosine vectors). No local ONNX needed.

Pure Go · ChromaDB handles embedding · HTTP REST API

Memory Module

LRU cache of the last 100 queries. Recalls similar past interactions to provide context continuity across conversations.

100-query cache · case-insensitive matching · thread-safe

Fusion Module

Deduplicates results by source document (keeps best chunk per doc) and sorts by relevance. Reduces noise from repeated sources.

Dedup by source · sort by cosine distance · best-first

Adapt Module

Selects output format per query type. Factual → concise top-3. Comparative → grouped by category. Procedural → ordered steps. Explanatory → narrative.

4 formats · auto-selected · configurable

Predict Module

Generates the final answer with source citations, relevance scores, and a pipeline trace footer showing which modules ran.

Citations · relevance % · module trace · fallback on failure

Proven Performance

Tested against Naive and Advanced RAG with 256 documents across 6 query categories.

0.5447
Avg Best Distance (Modular)
48ms
Avg Latency
+13%
Vague Query Improvement
6
Composable Modules
Experiment Results (256 docs, 13 queries, 6 categories)
┌──────────────┬──────────┬──────────────┬──────────────┬──────────────┬──────────┬──────────┐ │ Architecture │ Factual │ Comparative │ Procedural │ Explanatory │ Vague │ Overall │ ├──────────────┼──────────┼──────────────┼──────────────┼──────────────┼──────────┼──────────┤ │ Modular │ 0.5972 │ 0.5119 │ 0.5521 │ 0.53750.42480.5447 │ │ Advanced │ 0.6038 │ 0.54170.5650 │ 0.5375 │ 0.4393 │ 0.5415 │ │ Naive │ 0.5972 │ 0.5207 │ 0.5685 │ 0.5375 │ 0.4385 │ 0.5325 │ └──────────────┴──────────┴──────────────┴──────────────┴──────────────┴──────────┴──────────┘ Lower distance = better relevance. Modular wins overall, especially on vague queries.

Full experiment data at rag-demos repository

Built for Production

Circuit breakers, query caching, graceful degradation. Your chat doesn't die when ChromaDB does.

Circuit Breaker

3 consecutive ChromaDB failures → 30s cooldown. Half-open recovery test. Your LLM keeps working without context.

Query Cache

LRU cache with 5-minute TTL. Repeated queries skip the pipeline entirely. 100 query capacity.

Partial Results

If any module fails, the pipeline logs it and continues. You get the best answer possible, not an error.