SwarmVault: a local-first knowledge compiler with contradiction detection and hybrid search

SwarmVault tackles the persistent challenge of building a reliable, local-first knowledge base that integrates diverse sources like code, docs, and transcripts into a unified, queryable wiki. What sets it apart is its production-ready CLI implementation of Karpathy’s LLM Wiki pattern, combined with a layered architecture and a focus on preventing knowledge hallucination through contradiction detection and approval workflows.

architecture and core functionality of SwarmVault

At its core, SwarmVault is a TypeScript-based CLI tool that transforms raw input files (documentation, source code, transcripts, and more) into a persistent Markdown wiki enriched with a typed knowledge graph. The architecture separates concerns into three layers:

Immutable sources layer: where raw input files live unchanged.
Wiki layer: LLM-generated wiki pages that synthesize information from sources.
Typed schema layer: a co-evolving schema that captures concepts, entities, and their relationships.

This separation helps maintain data integrity and supports incremental updates, because raw sources are never rewritten and the generated pages can be rebuilt from them. Under the hood, SwarmVault uses a hybrid search engine that combines SQLite full-text search (FTS) with semantic embeddings, enabling both keyword-based and vector similarity queries.

SwarmVault supports over 30 input formats and integrates with 16+ AI agents through a Model Context Protocol (MCP) server. It can operate fully offline using a heuristic provider, which is important for privacy-conscious or disconnected environments.

Additional features include watch mode with git hooks to auto-update the vault, approval queues that stage changes for review, and an agent task ledger for tracking agent interactions. Knowledge edges are also tagged as extracted, inferred, or ambiguous. Together, these features are meant to keep LLM hallucinations from compounding as the vault grows.
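The staging idea can be sketched in a few lines of TypeScript. This is an illustration only, not SwarmVault’s actual implementation: the names `Confidence`, `ProposedEdge`, and `ApprovalQueue` are hypothetical, and the rule that only `extracted` edges skip review is an assumption made for the example.

```typescript
// Hypothetical types; SwarmVault's real data model may differ.
type Confidence = "extracted" | "inferred" | "ambiguous";

interface ProposedEdge {
  from: string;
  relation: string;
  to: string;
  confidence: Confidence;
}

class ApprovalQueue {
  private pending: ProposedEdge[] = [];
  private accepted: ProposedEdge[] = [];

  // Assumption: edges read directly from a source are accepted at once,
  // while inferred or ambiguous edges wait for a reviewer.
  propose(edge: ProposedEdge): void {
    if (edge.confidence === "extracted") {
      this.accepted.push(edge);
    } else {
      this.pending.push(edge);
    }
  }

  // Move the pending edge at `index` into the accepted set.
  // Returns false when there is no pending edge at that position.
  approve(index: number): boolean {
    const edge = this.pending[index];
    if (edge === undefined) return false;
    this.pending.splice(index, 1);
    this.accepted.push(edge);
    return true;
  }

  // Drop the pending edge at `index` without accepting it.
  reject(index: number): boolean {
    if (this.pending[index] === undefined) return false;
    this.pending.splice(index, 1);
    return true;
  }

  pendingCount(): number {
    return this.pending.length;
  }

  acceptedCount(): number {
    return this.accepted.length;
  }
}
```

The point of the design is that nothing inferred reaches the accepted set without an explicit `approve` call, so a wrong guess cannot quietly become the basis for later guesses.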

technical strengths and tradeoffs

What distinguishes SwarmVault is its focus on knowledge correctness and maintainability, not just aggregation. By tagging edges in the knowledge graph with provenance and confidence levels, the system enables contradiction detection and staged approval before new concepts land in the main wiki. This explicit treatment of ambiguity is rare in similar knowledge base tools.
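To make the idea concrete, here is a minimal TypeScript sketch of contradiction detection over provenance-tagged edges. It is not taken from the SwarmVault codebase: the `Edge` shape, the `findContradictions` function, and the definition of a contradiction (two different values for a relation that should have only one) are all assumptions chosen for illustration.

```typescript
// Hypothetical edge shape; field names are illustrative, not SwarmVault's schema.
type Provenance = "extracted" | "inferred" | "ambiguous";

interface Edge {
  subject: string;
  relation: string;
  object: string;
  provenance: Provenance;
  source: string; // file the claim came from
}

interface Contradiction {
  subject: string;
  relation: string;
  claims: Edge[];
}

// Assumption: a contradiction is two or more different objects asserted for
// the same subject under a relation that should have exactly one value.
function findContradictions(edges: Edge[], singleValued: Set<string>): Contradiction[] {
  // Group edges by (subject, relation), ignoring multi-valued relations.
  const groups = new Map<string, Edge[]>();
  for (const edge of edges) {
    if (!singleValued.has(edge.relation)) continue;
    const key = JSON.stringify([edge.subject, edge.relation]);
    const group = groups.get(key) ?? [];
    group.push(edge);
    groups.set(key, group);
  }

  // A group is contradictory when it holds more than one distinct object.
  const result: Contradiction[] = [];
  for (const claims of groups.values()) {
    const distinctObjects = new Set(claims.map((c) => c.object));
    const first = claims[0];
    if (first !== undefined && distinctObjects.size > 1) {
      result.push({ subject: first.subject, relation: first.relation, claims });
    }
  }
  return result;
}
```

Because each conflicting claim keeps its `provenance` and `source`, a reviewer can see at a glance whether an inferred edge is disagreeing with an extracted one and decide which to keep.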

The hybrid search approach balances the speed and familiarity of SQLite FTS with the semantic power of embeddings, meaning queries can retrieve relevant content even if keywords don’t match exactly. This is a practical tradeoff that keeps search efficient while improving relevance.
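The blending of the two signals can be illustrated with a small, self-contained TypeScript sketch. It makes several simplifying assumptions: the keyword score is a plain term-overlap fraction rather than the BM25 ranking that SQLite’s FTS5 extension offers, the embeddings are hand-written toy vectors, and the `hybridSearch` function and its `alpha` weight are hypothetical names rather than SwarmVault’s API.

```typescript
interface Doc {
  id: string;
  text: string;
  embedding: number[]; // toy vector; a real system would get this from a model
}

// Stand-in for a full-text score: the fraction of query terms found in the text.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return 0;
  const words = new Set(text.toLowerCase().split(/\W+/));
  const hits = terms.filter((t) => words.has(t)).length;
  return hits / terms.length;
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Weighted blend of both signals, best match first.
// alpha = 1 is keyword-only, alpha = 0 is vector-only.
function hybridSearch(
  query: string,
  queryEmbedding: number[],
  docs: Doc[],
  alpha = 0.5,
): { id: string; score: number }[] {
  return docs
    .map((d) => ({
      id: d.id,
      score:
        alpha * keywordScore(query, d.text) +
        (1 - alpha) * cosine(queryEmbedding, d.embedding),
    }))
    .sort((x, y) => y.score - x.score);
}
```

With this blend, a document that shares no words with the query can still rank above unrelated documents as long as its embedding is close to the query’s, which is the behavior the paragraph above describes.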

The typed knowledge graph introduces formalism uncommon in typical Markdown wikis. It allows richer queries and relationships between entities, which can be crucial in complex knowledge domains. However, this also increases complexity for users unfamiliar with schema design.
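As a rough illustration of what typing buys, the TypeScript sketch below models a tiny graph with fixed node and relation kinds and runs a one-hop query over it. The kinds (`concept`, `entity`, `source`), the relation names, and the `neighbors` function are hypothetical; an actual vault schema would define its own vocabulary.

```typescript
// Hypothetical node and edge kinds; a real vault schema would define its own.
type NodeKind = "concept" | "entity" | "source";
type RelationKind = "mentions" | "depends_on" | "defines";

interface GraphNode {
  id: string;
  kind: NodeKind;
  title: string;
}

interface GraphEdge {
  from: string;
  to: string;
  relation: RelationKind;
}

interface Graph {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

// Return nodes of `kind` reachable from `startId` in one hop via `relation`.
function neighbors(
  graph: Graph,
  startId: string,
  relation: RelationKind,
  kind: NodeKind,
): GraphNode[] {
  const targetIds = new Set(
    graph.edges
      .filter((e) => e.from === startId && e.relation === relation)
      .map((e) => e.to),
  );
  return graph.nodes.filter((n) => n.kind === kind && targetIds.has(n.id));
}
```

Because `relation` and `kind` are union types, a typo such as `"depnds_on"` is rejected at compile time, which a free-form Markdown link cannot offer. The cost is the one noted above: someone has to design and maintain that vocabulary.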

Supporting 16+ agent integrations via MCP means SwarmVault can fit into diverse AI ecosystems, but it also requires users to manage agent configuration and compatibility. The offline heuristic provider is a useful fallback, though it is less capable than cloud-based LLMs.

Overall, the codebase is clean for a TypeScript CLI tool of this scope. The layered architecture enforces a clear separation of concerns, and the approval queue mechanism is well thought out, though it adds operational overhead because staged changes must be reviewed before they land.

quick start with SwarmVault

SwarmVault requires Node >=24. Install the CLI globally with npm:

npm install -g @swarmvaultai/cli

Verify the install:

swarmvault --version

Update to the latest published release:

npm install -g @swarmvaultai/cli@latest

The global CLI includes the graph viewer workflow and MCP server flow. End users do not need to install @swarmvaultai/viewer separately.

A typical vault structure looks like this:

my-vault/
├── swarmvault.schema.md       user-editable vault instructions
├── raw/                       immutable source files and localized assets
├── wiki/                      compiled wiki: sources, concepts, entities, code, outputs, graph
├── state/                     graph.json, retrieval/, embeddings, sessions, approvals
├── .obsidian/                 optional Obsidian workspace config
└── agent/                     generated agent-facing helpers
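
As a small illustration of this layout, the TypeScript sketch below reports which expected top-level entries are missing from a directory listing. It is a hypothetical helper, not part of the SwarmVault CLI, and treating every entry except `.obsidian/` as required is an assumption made for the example.

```typescript
// Entries taken from the layout above. Treating everything except
// .obsidian/ as required is an assumption made for this sketch.
const REQUIRED_ENTRIES = [
  "swarmvault.schema.md",
  "raw",
  "wiki",
  "state",
  "agent",
] as const;

// Given the names found at the top level of a vault directory,
// return the expected entries that are missing, in layout order.
function missingVaultEntries(present: string[]): string[] {
  const seen = new Set(present);
  return REQUIRED_ENTRIES.filter((name) => !seen.has(name));
}
```

The function takes a plain list of names so it stays independent of any filesystem API; the list could come from a directory read in whatever runtime is at hand.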

To set up agents and MCP integrations, you can run commands like:

swarmvault install --agent claude --hook    # Claude Code + graph-first hook
swarmvault install --agent codex --hook     # Codex + graph-first hook
swarmvault install --agent cursor           # Cursor
swarmvault install --agent copilot --hook   # GitHub Copilot CLI + hook
# ... other agents

This setup lets coding agents query the vault’s knowledge graph directly, and the --hook variants add a graph-first hook for agents that support one.

why SwarmVault matters

Most knowledge base tools focus either on static documentation or on simple Markdown aggregation. SwarmVault’s insistence on a typed knowledge graph combined with contradiction detection and an approval workflow addresses a real pain point: how to keep LLM-augmented knowledge bases accurate and trustworthy over time.

The architecture’s separation of sources, wiki, and schema layers means you can track and manage changes systematically rather than drowning in LLM-generated content. This matters because a hallucinated edge in a knowledge graph does not stay isolated: later inferences build on it, and the error spreads into downstream tasks.

The hybrid search system is a good balance between classical FTS and semantic search, which keeps performance manageable while improving recall.

On the flip side, the system isn’t trivial to set up or run for casual users. Node 24+ is required, and managing agent integrations requires some technical know-how. The complexity of the typed schema layer might be a hurdle for teams without a dedicated knowledge engineering role.

Still, for teams building knowledge-driven AI workflows or running multi-agent environments, SwarmVault’s tooling and patterns offer a solid foundation.

verdict

SwarmVault is a well-engineered, local-first knowledge compiler focused on integrity and maintainability. It’s best suited for AI practitioners and teams who need a persistent, queryable knowledge base that integrates with multiple agents and supports offline operation.

Its explicit contradiction detection and staged approval workflows go beyond most wikis and RAG systems, addressing the real-world challenge of managing hallucination in LLM-augmented knowledge.

The tradeoff is complexity: it requires Node 24+, some familiarity with agent integrations, and a willingness to adopt a typed schema approach. If your use case demands accuracy and auditability in knowledge workflows, SwarmVault is worth exploring.


→ GitHub Repo: swarmclawai/swarmvault ⭐ 349 · TypeScript

Noureddine RAMDI / SwarmVault: a local-first knowledge compiler with contradiction detection and hybrid search
