Cavemem: deterministic compression and local memory for AI coding assistants

Cavemem tackles a persistent problem in AI coding assistants: how to efficiently store and retrieve session memory without bloating context windows with verbose prose. The repository delivers a local-first, persistent memory layer that compresses and indexes session observations captured through IDE hooks, enabling agents like Claude Code, Cursor, and Codex to query their own memory history with precision and minimal overhead.

How cavemem captures and stores session memory

At its core, cavemem is built around deterministic compression using a so-called “caveman grammar.” The grammar strips English prose from captured session data, reducing token counts by about 75%, while preserving code snippets, file paths, and identifiers byte-for-byte. This lossless-for-code compression keeps natural-language chatter out of the context window without sacrificing the code context AI assistants need.
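To make the idea concrete, here is a minimal sketch of prose-only compression. It is not cavemem's actual grammar: the stopword list, the `PROTECTED` pattern, and the `compress` function are all assumptions made for illustration. It treats backtick spans and path-like tokens as protected and drops filler words everywhere else.

```typescript
// Illustrative only: NOT cavemem's real grammar. Assumes code appears in
// backticks and that paths look like "dir/file.ext".
const STOPWORDS = new Set([
  "a", "an", "the", "is", "are", "was", "were", "be", "been", "to", "of",
  "that", "this", "it", "and", "then", "so", "very", "just", "really",
  "i", "we", "you", "in", "on", "for", "with",
]);

// Inline code spans or path-like tokens; these are copied verbatim.
const PROTECTED = /`[^`]*`|[\w.-]*\/[\w./-]+/;

// Drop filler words from a prose fragment; the same input always yields
// the same output, which is what makes the scheme deterministic.
function compressProse(prose: string): string {
  return prose
    .split(/\s+/)
    .filter((word) => {
      const bare = word.toLowerCase().replace(/[^a-z0-9]/g, "");
      return bare.length > 0 && !STOPWORDS.has(bare);
    })
    .join(" ");
}

function compress(text: string): string {
  const parts: string[] = [];
  const re = new RegExp(PROTECTED.source, "g");
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    const prose = compressProse(text.slice(last, match.index));
    if (prose) parts.push(prose);
    parts.push(match[0]); // protected span, unchanged
    last = match.index + match[0].length;
  }
  const tail = compressProse(text.slice(last));
  if (tail) parts.push(tail);
  return parts.join(" ");
}

const sample =
  "I just fixed the bug in `parseConfig()` and then updated src/utils/config.ts to be safe.";
console.log(compress(sample));
```

On the sample sentence, the code span and the file path come through untouched while the filler words around them are removed. A real grammar would need far more careful rules for what counts as code.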

The system hooks into IDEs synchronously, capturing session observations and writing them directly into a local SQLite database. The database is extended with FTS5 (full-text search) and a vector index to support hybrid queries that mix BM25 and cosine similarity. These indexes back three progressive retrieval tools (search, timeline, and get_observations) that form the backbone of cavemem’s retrieval capabilities.
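One way to read "progressive" is that each tool returns a little more detail than the previous one, so an agent only pays for full observations it actually needs. The sketch below models that reading with an in-memory array. The tool names come from the article, but the `Observation` shape, the method signatures, and the substring matching are hypothetical stand-ins and not cavemem's real schemas or its SQLite-backed search.

```typescript
// Hypothetical shapes: cavemem's real tool schemas may differ.
interface Observation {
  id: number;
  ts: number; // capture time, e.g. epoch milliseconds
  summary: string;
  body: string;
}

interface Hit {
  id: number;
  summary: string;
}

class MemoryIndex {
  private readonly rows: Observation[];

  constructor(rows: Observation[]) {
    this.rows = rows;
  }

  // Step 1: cheap hits (id + one-line summary only).
  // Substring match stands in for the real full-text/vector search.
  search(query: string, limit = 5): Hit[] {
    const q = query.toLowerCase();
    return this.rows
      .filter((r) => r.body.toLowerCase().includes(q))
      .slice(0, limit)
      .map((r) => ({ id: r.id, summary: r.summary }));
  }

  // Step 2: what happened just before and after one hit.
  timeline(id: number, radius = 1): Hit[] {
    const sorted = this.rows.slice().sort((a, b) => a.ts - b.ts);
    const at = sorted.findIndex((r) => r.id === id);
    if (at === -1) return [];
    return sorted
      .slice(Math.max(0, at - radius), at + radius + 1)
      .map((r) => ({ id: r.id, summary: r.summary }));
  }

  // Step 3: full bodies, only for the ids the agent decided it needs.
  getObservations(ids: number[]): Observation[] {
    const wanted = new Set(ids);
    return this.rows.filter((r) => wanted.has(r.id));
  }
}

const index = new MemoryIndex([
  { id: 1, ts: 100, summary: "edit auth", body: "changed auth/login.ts token check" },
  { id: 2, ts: 200, summary: "run tests", body: "npm test failed in session.test.ts" },
  { id: 3, ts: 300, summary: "fix auth test", body: "auth token expiry off by one" },
]);

const hits = index.search("auth");
const context = index.timeline(hits[0].id);
const full = index.getObservations([hits[hits.length - 1].id]);
console.log(hits, context, full);
```

The design point is that the first two calls return only ids and summaries, so the expensive payload is fetched last and selectively.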

The synchronous hooks complete in under 150 milliseconds, which matters for developer experience (DX) because any lag in an IDE is disruptive. To generate embeddings for vector search, cavemem spawns a background worker that runs only when needed and exits on its own when idle, avoiding the overhead of a persistent daemon.

A local web viewer accessible at http://127.0.0.1:37777 provides a read-only interface to browse the stored memory, which is helpful for debugging and manual inspection.

What sets cavemem’s memory compression and retrieval apart

The standout feature is the deterministic caveman grammar compression. Unlike common compression algorithms that treat code and prose alike, cavemem’s approach is specialized: it aggressively compresses prose tokens while guaranteeing that code and file paths remain intact byte-for-byte. This means the semantic integrity of code is preserved without inflating token counts with natural language noise.

This tradeoff is significant. Many AI memory systems rely on lossy compression or indiscriminate chunking, which risks losing code fidelity or polluting context windows with irrelevant text. Cavemem’s compression happens at the storage boundary, not at inference time, so queries to the AI assistant benefit from a clean, compact memory representation.

The use of a local SQLite database with FTS5 and a vector index is a pragmatic choice. SQLite is battle-tested, zero-dependency, and fast enough for local use, while FTS5 provides robust full-text search capabilities. Adding a vector index enables semantic similarity search, combining BM25 (lexical) and cosine similarity (embedding-based) re-ranking for more accurate retrieval.
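The combination of lexical and embedding-based scores can be sketched as a weighted blend. This is an assumed formulation for illustration, not cavemem's documented scoring: the `Candidate` shape, the `rerank` function, the `alpha` weight, and the max-normalization are all hypothetical. It also assumes BM25 scores have already been converted to higher-is-better (SQLite's FTS5 `bm25()` function reports better matches as numerically lower values, so real code would flip the sign first).

```typescript
// Hypothetical candidate shape; bm25 is assumed to be higher-is-better here.
interface Candidate {
  id: number;
  bm25: number;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Blend a normalized lexical score with semantic similarity.
// alpha = 1 ranks purely by BM25; alpha = 0 ranks purely by cosine.
function rerank(
  candidates: Candidate[],
  queryEmbedding: number[],
  alpha = 0.5,
): { id: number; score: number }[] {
  const maxBm25 = Math.max(0, ...candidates.map((c) => c.bm25));
  return candidates
    .map((c) => {
      const lexical = maxBm25 > 0 ? c.bm25 / maxBm25 : 0;
      const semantic = cosine(c.embedding, queryEmbedding);
      return { id: c.id, score: alpha * lexical + (1 - alpha) * semantic };
    })
    .sort((a, b) => b.score - a.score);
}

const candidates: Candidate[] = [
  { id: 1, bm25: 10, embedding: [1, 0] }, // strong keyword match, unrelated meaning
  { id: 2, bm25: 5, embedding: [0, 1] },  // weaker keyword match, same meaning as query
];
const ranked = rerank(candidates, [0, 1]);
console.log(ranked);
```

With equal weighting, the second candidate wins because its embedding matches the query exactly, even though its keyword score is lower. Setting `alpha` to 1 restores the pure lexical order.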

The synchronous design of the hooks and the auto-spawning background worker balance responsiveness and asynchronous processing. Writing observations synchronously ensures no data loss in the hot path, while embedding generation is offloaded to a transient worker process, reducing resource consumption and complexity.
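The lifecycle described here (start on demand, drain pending work, exit after a quiet period) can be modeled in a few lines. The sketch below is a simplified in-process model with an injected clock, written only to show the state transitions. The real worker is a separate process, and the `EmbeddingWorker` class, its method names, and the idle threshold are assumptions, not cavemem's code.

```typescript
// Hypothetical model of the worker lifecycle; times are passed in explicitly
// so the behaviour is easy to follow and test.
class EmbeddingWorker {
  running = false;
  readonly embedded: number[] = []; // ids that have been "embedded"
  private queue: number[] = [];
  private lastActive = 0;
  private readonly idleMs: number;

  constructor(idleMs: number) {
    this.idleMs = idleMs;
  }

  // Called by a hook after its synchronous write; starts the worker lazily.
  notify(observationId: number, now: number): void {
    this.queue.push(observationId);
    if (!this.running) {
      this.running = true;
      this.lastActive = now;
    }
  }

  // One loop iteration: drain pending work, or exit once idle long enough.
  tick(now: number): void {
    if (!this.running) return;
    if (this.queue.length > 0) {
      this.embedded.push(...this.queue.splice(0));
      this.lastActive = now;
    } else if (now - this.lastActive >= this.idleMs) {
      this.running = false; // self-exit; the next notify() restarts it
    }
  }
}

const worker = new EmbeddingWorker(1000);
worker.notify(1, 0);  // first hook fires: worker starts
worker.tick(10);      // drains the queue
worker.tick(500);     // idle, but not for long enough to exit
const stillRunning = worker.running;
worker.tick(1010);    // idle for 1000 ms: worker exits
const exited = !worker.running;
console.log(stillRunning, exited, worker.embedded);
```

Because the hook only appends to a queue and returns, the write path stays fast regardless of how long embedding takes.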

On the flip side, this design may not scale as well in multi-user or distributed environments since it’s heavily local-first and single-instance oriented. The SQLite storage, while simple and robust, could become a bottleneck with very large memory footprints or concurrent access scenarios.

Getting started with cavemem

Cavemem provides a straightforward installation and usage process via npm, making it accessible for developers already working in JavaScript/TypeScript environments.

npm install -g cavemem
cavemem install                    # Claude Code
cavemem install --ide cursor       # cursor | gemini-cli | opencode | codex
cavemem status                     # see wiring + embedding backfill
cavemem viewer                     # open http://127.0.0.1:37777

No daemon needs to be started manually. The hooks write synchronously in under 150 ms, and the background worker for embedding generation auto-spawns on the first hook and exits on its own when idle. If you prefer, you can disable auto-start of the embedding worker with:

cavemem config set embedding.autoStart false

This setup keeps the developer experience smooth and low-friction.

Who cavemem is for and its caveats

Cavemem is well-suited for developers building or integrating AI coding assistants who want a robust, local-first memory system that doesn’t compromise on code fidelity or introduce heavy infrastructure dependencies.

Its deterministic compression approach is particularly valuable if your use case involves lots of code and file paths mixed with natural language observations, as it effectively reduces noise in the stored memory.

However, cavemem’s local-first SQLite backend and synchronous hooks imply it’s primarily designed for single-user desktop environments rather than distributed or cloud-scale deployments. If your project requires multi-user concurrency or massive memory scale, this might be a limiting factor.

Because the hooks are synchronous, any latency they add is felt directly in the IDE, although the reported sub-150 ms write times suggest this is well managed.

Overall, cavemem offers a pragmatic, technically sound approach to a tricky problem in AI coding assistants: persistent, compressed memory storage that preserves code integrity and enables precise retrieval without infrastructure overhead.

If you’re working with Claude Code, Cursor, or similar agents and want to experiment with local persistent memory that’s tightly integrated and lossless for code, cavemem is worth exploring.


→ GitHub Repo: JuliusBrussee/cavemem ⭐ 433 · TypeScript

Noureddine RAMDI / Cavemem: deterministic compression and local memory for AI coding assistants
