Quivr: A Python framework for flexible retrieval-augmented generation pipelines

Quivr tackles one of the recurring challenges in building retrieval-augmented generation (RAG) systems: how to separate the mechanics of retrieval from the application logic in a way that’s both flexible and developer-friendly. Instead of bundling everything into a monolithic pipeline, Quivr abstracts the RAG process behind a simple Brain API while pushing the retrieval strategy into external YAML-defined workflows. This architectural choice lets you focus on your product code without being locked into a single retrieval method.

What Quivr does and how it works

At its core, Quivr is an open-source Python framework that brands itself as a “second brain” — an embeddable RAG engine designed for developers who want to build question-answering or knowledge-based applications without wrestling with pipeline infrastructure.

It exposes a straightforward API centered around a Brain abstraction. You ingest your data files into a Brain instance and then query it with natural language questions. Under the hood, this process involves several key stages: file ingestion, embedding generation, retrieval, reranking, and finally, LLM-powered generation of answers.
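To make those stages concrete, here is a minimal, dependency-free sketch of ingestion, embedding, retrieval, and prompt assembly. It is not Quivr's implementation: the function names (`embed`, `ingest`, `retrieve`, `build_prompt`) are hypothetical, the "embedding" is a plain bag-of-words count rather than a learned model, reranking is omitted, and no LLM is called.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ingest(documents: list[str]) -> list[tuple[str, Counter]]:
    """Ingestion + embedding: store each chunk next to its vector."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Generation input: the prompt an LLM would receive (no model is called here)."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


index = ingest([
    "Gold is a dense yellow metal.",
    "Python is a programming language.",
    "Paris is the capital of France.",
])
top_chunks = retrieve(index, "What is gold?", k=1)
prompt = build_prompt("What is gold?", top_chunks)
print(prompt)
```

A framework like Quivr hides these steps behind its API; the sketch only shows what kind of work each stage does.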

The architecture is opinionated but pluggable. It supports multiple LLM backends out of the box — OpenAI, Anthropic, Mistral, and even local models through Ollama — making it adaptable to different deployment preferences or cost considerations.

File ingestion is flexible, too. Quivr integrates with Megaparse to parse a range of file formats and also accepts custom parsers, so you’re not limited to plain text. This matters for real-world applications where your knowledge base might include PDFs, markdown, or other structured documents.
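One common way to make parsers pluggable is a registry keyed by file extension. The sketch below illustrates that general pattern only; `register`, `PARSERS`, and `parse` are hypothetical names and do not reflect how Quivr or Megaparse register parsers.

```python
from pathlib import Path
from typing import Callable

# A parser turns raw file text into plain text ready for chunking.
Parser = Callable[[str], str]
PARSERS: dict[str, Parser] = {}


def register(extension: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser for one file extension."""
    def decorator(func: Parser) -> Parser:
        PARSERS[extension.lower()] = func
        return func
    return decorator


@register(".txt")
def parse_text(raw: str) -> str:
    return raw.strip()


@register(".md")
def parse_markdown(raw: str) -> str:
    """Drop leading '#' heading markers and blank lines, keep the text."""
    lines = [line.lstrip("#").strip() for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)


def parse(filename: str, raw: str) -> str:
    """Dispatch to the parser registered for the file's extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"no parser registered for {suffix!r}")
    return PARSERS[suffix](raw)


parsed = parse("notes.md", "# Title\n\nSome body text.\n")
print(parsed)
```

Supporting a new format then means registering one more function, with no change to the code that calls `parse`.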

The standout architectural aspect is the use of YAML-defined retrieval workflows. These workflows are represented as node graphs with nodes responsible for history filtering, query rewriting, retrieval, and generation stages. This externalizes the retrieval strategy, so you can swap or tweak how you fetch and rank information without changing your Python application code.
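The node-graph idea can be sketched without any framework. The dict below mirrors what a YAML loader might produce for such a workflow; the node names (`filter_history`, `rewrite`, `retrieve`, `generate`) and the `START`/`END` markers are assumptions based on the stages named above, not a copy of Quivr's schema, and `execution_order` is a hypothetical helper.

```python
# Hypothetical workflow, written as the Python dict a YAML loader might produce.
workflow = {
    "name": "standard RAG",
    "nodes": [
        {"name": "START", "edges": ["filter_history"]},
        {"name": "filter_history", "edges": ["rewrite"]},
        {"name": "rewrite", "edges": ["retrieve"]},
        {"name": "retrieve", "edges": ["generate"]},
        {"name": "generate", "edges": ["END"]},
    ],
}


def execution_order(config: dict) -> list[str]:
    """Follow the first edge of each node from START until END is reached."""
    edges = {node["name"]: node["edges"] for node in config["nodes"]}
    order: list[str] = []
    seen: set[str] = set()
    current = "START"
    while current != "END":
        if current in seen:
            raise ValueError(f"cycle detected at node {current!r}")
        if current not in edges:
            raise ValueError(f"node {current!r} is not defined")
        seen.add(current)
        if current != "START":
            order.append(current)
        current = edges[current][0]
    return order


stages = execution_order(workflow)

# Changing strategy means editing data, not code: this variant skips query rewriting.
no_rewrite = {
    "name": "no rewrite",
    "nodes": [
        {"name": "START", "edges": ["filter_history"]},
        {"name": "filter_history", "edges": ["retrieve"]},
        {"name": "retrieve", "edges": ["generate"]},
        {"name": "generate", "edges": ["END"]},
    ],
}
shorter = execution_order(no_rewrite)
print(stages, shorter)
```

The application code that calls `execution_order` is identical for both workflows, which is the point of keeping the strategy in configuration.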

Additionally, Quivr supports Cohere reranking for refining retrieval results, which adds an extra layer of precision in selecting relevant documents before generation.
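Reranking is a second pass over the retrieved candidates: a more precise scorer reorders them and only the best few reach the LLM. The sketch below shows that shape with a toy word-overlap scorer standing in for a hosted reranking model; `rerank` and `overlap_scorer` are hypothetical names, and no Cohere API is called.

```python
from typing import Callable

# A scorer rates how relevant a chunk is to a question (higher is better).
Scorer = Callable[[str, str], float]


def overlap_scorer(question: str, chunk: str) -> float:
    """Toy relevance score: fraction of question words present in the chunk."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def rerank(question: str, candidates: list[str], scorer: Scorer, top_n: int) -> list[str]:
    """Score every candidate, sort best-first, and keep the top_n."""
    scored = [(scorer(question, chunk), chunk) for chunk in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]


# Candidates in the order a first-pass retriever happened to return them.
candidates = [
    "gold jewellery prices rose this year",
    "silver has a lower melting point than copper",
    "the melting point of gold is 1064 degrees celsius",
]
best = rerank("gold melting point", candidates, overlap_scorer, top_n=2)
print(best)
```

In a real pipeline the scorer would be a model call, which is why the reranker only sees the short candidate list rather than the whole corpus.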

What makes Quivr’s architecture interesting

The most architecturally interesting decision Quivr makes is compressing the RAG pipeline into two calls, Brain.from_files() for ingestion and brain.ask() for querying, while externalizing the retrieval logic into a YAML workflow. This separation of concerns is uncommon among RAG frameworks, where retrieval strategies are often hardcoded or buried within the pipeline.

This design offers a clear tradeoff. On one hand, it improves developer experience by allowing rapid iteration on retrieval workflows without touching application logic. On the other, it introduces complexity in managing and understanding YAML configurations, which may have a learning curve, especially for teams unfamiliar with declarative workflow definitions.

The codebase is surprisingly clean for a project with nearly 40,000 stars. The Brain API is minimalistic yet powerful, encapsulating ingestion, querying, and context management. The pluggable design extends to file parsers and LLM providers, which means you can customize parts of the system without forking the whole repo.

Another strength is the multi-LLM backend support. By abstracting over different providers, Quivr lets you switch models or run local LLMs with minimal friction. This flexibility is essential as the LLM landscape evolves rapidly and cost-efficiency becomes a key factor.
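Abstracting over providers usually comes down to coding against one small interface. The sketch below shows that pattern with Python's `typing.Protocol`; `ChatModel`, `EchoModel`, `make_model`, and the provider keys are all hypothetical, and the stand-in model returns a canned string instead of calling any real API.

```python
from typing import Callable, Protocol


class ChatModel(Protocol):
    """The only thing application code needs from any provider."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for a real provider client; echoes the prompt with a label."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


# Hypothetical provider table: one factory per backend.
PROVIDERS: dict[str, Callable[[], ChatModel]] = {
    "hosted": lambda: EchoModel("hosted"),
    "local": lambda: EchoModel("local"),
}


def make_model(provider: str) -> ChatModel:
    """Look up a backend by name so callers never import a provider directly."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}")
    return PROVIDERS[provider]()


def answer(model: ChatModel, question: str) -> str:
    """Application code depends on the ChatModel interface only."""
    return model.complete(question)


reply = answer(make_model("local"), "What is gold?")
print(reply)
```

Switching from a hosted model to a local one then becomes a one-word configuration change rather than a code change.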

The integration with Megaparse for file parsing and Cohere for reranking shows a pragmatic approach: use specialized tools where they excel rather than building everything in-house.

The cost of that flexibility is that you only get the full benefit once you understand the YAML format and the node graph concepts behind it, which could be a barrier for those wanting a drop-in RAG solution with minimal configuration.

Quick start with Quivr

Getting started with Quivr is straightforward. The official quickstart boils down to two steps:

Install the core package with pip:

pip install quivr-core

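Run a minimal example that creates a Brain from a temporary text file and asks it a question. This assumes an API key for your LLM provider (for example OPENAI_API_KEY for OpenAI) is already set in the environment: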

import tempfile
from quivr_core import Brain

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(mode="w", suffix=".txt") as temp_file:
        temp_file.write("Gold is a liquid of blue-like colour.")
        temp_file.flush()

        brain = Brain.from_files(
            name="test_brain",
            file_paths=[temp_file.name],
        )

        answer = brain.ask(
            "what is gold? answer in french"
        )
        print("answer:", answer)

This example highlights the simplicity of the Brain API: ingestion and querying take a few lines, while parsing, embedding, retrieval, and generation happen behind the scenes.

When to consider Quivr

Quivr is ideal if you want a flexible, pluggable RAG engine that lets you experiment with different retrieval strategies without rewriting your app code. Its multi-LLM support and file parser plugins make it suitable for developers building knowledge-intensive apps that ingest diverse data types.

However, if you prefer a turnkey RAG library with minimal configuration, or you want to avoid managing YAML workflows, Quivr might feel heavyweight. Teams new to declarative pipeline definitions should budget time for the learning curve.

Overall, Quivr presents a compelling balance between flexibility and simplicity for developers who want control over their retrieval workflows while still benefiting from a clean, minimal API. The separation of application code from retrieval strategy is worth understanding, especially if you plan to evolve your search logic over time without disruptive code changes.


→ GitHub Repo: QuivrHQ/quivr ⭐ 39,129 · Python

Noureddine RAMDI / Quivr: A Python framework for flexible retrieval-augmented generation pipelines
