Point at a directory. Index it. Query it. Six composable modules that adapt to what you're asking. Built for Cicerone, works standalone.
Six modules, one pipeline. Each module can be enabled or disabled independently.
Classifies your query as factual, comparative, procedural, or explanatory. Sets top_k and output format based on type.
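A minimal sketch of how such a classifier could work, assuming simple keyword rules. The `QueryPlan` type, the regex rules, and every `top_k` value except the factual top-3 are hypothetical and chosen only for illustration; the real module may classify quite differently.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPlan:
    query_type: str
    top_k: int
    output_format: str


# Hypothetical settings; only "factual -> top 3" is taken from the text.
PLANS = {
    "factual": QueryPlan("factual", 3, "concise"),
    "comparative": QueryPlan("comparative", 8, "grouped"),
    "procedural": QueryPlan("procedural", 6, "steps"),
    "explanatory": QueryPlan("explanatory", 5, "narrative"),
}

# Illustrative keyword rules, checked in order; the first match wins.
RULES = [
    ("comparative", re.compile(r"\b(vs|versus|compare|difference between)\b", re.I)),
    ("procedural", re.compile(r"\b(how do i|how to|install|set up|configure)\b", re.I)),
    ("explanatory", re.compile(r"\b(why|explain|how does|what causes)\b", re.I)),
]


def classify(query: str) -> QueryPlan:
    """Return the plan for the first matching rule, defaulting to factual."""
    for query_type, pattern in RULES:
        if pattern.search(query):
            return PLANS[query_type]
    return PLANS["factual"]
```

Rule order matters in this sketch: "how do I" is tested before "how does", so setup questions land in procedural rather than explanatory.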
Queries ChromaDB using the chosen search strategy. Documents are embedded server-side with all-MiniLM-L6-v2 (384-dimensional vectors, cosine similarity). No local ONNX needed.
LRU cache of the last 100 queries. Recalls similar past interactions to provide context continuity across conversations.
Deduplicates results by source document (keeps best chunk per doc) and sorts by relevance. Reduces noise from repeated sources.
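The deduplication step can be sketched as follows. The `source`, `text`, and `score` field names are assumptions, as is the convention that a higher score means a more relevant chunk.

```python
def dedupe_by_source(chunks):
    """Keep the best-scoring chunk per source, sorted by descending score.

    Each chunk is assumed to be a dict with "source", "text" and "score"
    keys, where a higher score means more relevant.
    """
    best = {}
    for chunk in chunks:
        current = best.get(chunk["source"])
        if current is None or chunk["score"] > current["score"]:
            best[chunk["source"]] = chunk
    return sorted(best.values(), key=lambda c: c["score"], reverse=True)
```

If the retriever reports distances instead of similarities, the comparison and the sort direction would both need to flip.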
Selects output format per query type. Factual → concise top-3. Comparative → grouped by category. Procedural → ordered steps. Explanatory → narrative.
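One way to express this mapping is a dispatch table keyed by query type. The sketch below is illustrative only: the formatter names, the `text` and `category` fields, and the exact output layout are assumptions, and "ordered steps" is approximated by numbering results in rank order.

```python
def format_factual(results):
    # Concise: only the top three results.
    return "\n".join(f"- {r['text']}" for r in results[:3])


def format_comparative(results):
    # Group texts under their category, preserving first-seen order.
    groups = {}
    for r in results:
        groups.setdefault(r.get("category", "uncategorized"), []).append(r["text"])
    lines = []
    for category, texts in groups.items():
        lines.append(f"{category}:")
        lines.extend(f"  - {t}" for t in texts)
    return "\n".join(lines)


def format_procedural(results):
    # Number the results as steps, in the order they were ranked.
    return "\n".join(f"{i}. {r['text']}" for i, r in enumerate(results, start=1))


def format_explanatory(results):
    # Narrative: join the texts into a single paragraph.
    return " ".join(r["text"] for r in results)


FORMATTERS = {
    "factual": format_factual,
    "comparative": format_comparative,
    "procedural": format_procedural,
    "explanatory": format_explanatory,
}


def format_results(query_type, results):
    """Pick the formatter for the query type, falling back to factual."""
    return FORMATTERS.get(query_type, format_factual)(results)
```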
Generates the final answer with source citations, relevance scores, and a pipeline trace footer showing which modules ran.
Tested against Naive and Advanced RAG with 256 documents across 6 query categories.
Full experiment data is available in the rag-demos repository.
Circuit breakers, query caching, graceful degradation. Your chat doesn't die when ChromaDB does.
3 consecutive ChromaDB failures → 30s cooldown, then a half-open recovery test. Your LLM keeps working without retrieved context.
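The breaker behaviour described above could look roughly like this. It is a sketch under stated assumptions, not the project's implementation: the `CircuitBreaker` class, its method names, and the injectable `clock` are hypothetical, while the threshold of 3 and the 30-second cooldown come from the text.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures, then probes once the cooldown ends."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the cooldown can be tested without sleeping
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            return "half-open"
        return "open"

    def call(self, fn, fallback):
        """Run fn unless the breaker is open; return fallback on skip or failure."""
        if self.state == "open":
            return fallback  # skip the backend entirely during the cooldown
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed half-open probe, or reaching the threshold, (re)starts the cooldown.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback
        # Success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap the retrieval step, for example `breaker.call(lambda: retrieve(query), fallback=[])` with a hypothetical `retrieve` function, so the LLM receives an empty context instead of an exception while ChromaDB is down.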
LRU cache with 5-minute TTL. Repeated queries skip the pipeline entirely. 100-query capacity.
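A cache with these properties can be sketched with an `OrderedDict`. The `QueryCache` name and its `get`/`put` interface are assumptions, as is keying on the exact query string; the capacity of 100 and the 300-second TTL are the figures quoted above.

```python
import time
from collections import OrderedDict


class QueryCache:
    """LRU cache whose entries also expire after a fixed time-to-live."""

    def __init__(self, capacity=100, ttl_seconds=300.0, clock=time.monotonic):
        self.capacity = capacity
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # query -> (stored_at, answer)

    def get(self, query):
        """Return the cached answer, or None on a miss or an expired entry."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        stored_at, answer = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._entries[query]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return answer

    def put(self, query, answer):
        self._entries[query] = (self._clock(), answer)
        self._entries.move_to_end(query)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
```

On a hit the caller returns the cached answer and never runs the pipeline; on a miss it runs the pipeline and stores the result with `put`.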
If any module fails, the pipeline logs it and continues. You get the best answer possible, not an error.
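The log-and-continue behaviour, together with per-module enabling, might be structured like the sketch below. The `run_pipeline` function, the shared `state` dict, and the trace strings are hypothetical names used for illustration.

```python
import logging

logger = logging.getLogger("pipeline")


def run_pipeline(query, modules, enabled=None):
    """Run (name, step) pairs in order, skipping disabled or failing modules.

    Each step takes the state dict and returns an updated one. A failing
    step is logged and skipped, and later steps see the last good state.
    """
    state = {"query": query}
    trace = []
    for name, step in modules:
        if enabled is not None and name not in enabled:
            trace.append(f"{name}: skipped")
            continue
        try:
            # Pass a copy so a step that fails midway cannot leave partial edits behind.
            state = step(dict(state))
            trace.append(f"{name}: ok")
        except Exception:
            logger.exception("module %s failed; continuing", name)
            trace.append(f"{name}: failed")
    return state, trace
```

Later steps need to tolerate missing keys (for example by reading `state.get("results", [])`), since an earlier module may not have run. The returned trace is the kind of information a pipeline trace footer could display.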