Stash: a shared agent memory with no server-side LLM calls

Stash captures coding agent sessions and aggregates them into a shared, queryable knowledge base for a team, without making any LLM calls on the server side. This design sidesteps the privacy, cost, and trust problems common to multi-agent collaboration tools.

how stash captures and stores coding agent sessions

Stash is a Python-based CLI tool that hooks into popular coding agents like Claude Code, Cursor, Codex, and Gemini CLI. These hooks automatically upload session transcripts to a shared server, which acts as a dumb storage layer for the raw transcript corpus.
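
To make the hook flow concrete, here is a minimal Python sketch of what a session-end hook might do. It is an illustration under stated assumptions, not Stash’s actual code: the server URL, the route, the payload fields, and the function names `build_upload_request` and `on_session_end` are all hypothetical.

```python
import json
import urllib.request

# Hypothetical server URL and route; the real Stash API may differ.
STASH_URL = "http://localhost:8000/api/transcripts"


def build_upload_request(session_id: str, agent: str, transcript: str) -> urllib.request.Request:
    """Package a raw session transcript as a JSON POST request."""
    payload = {
        "agent": agent,            # e.g. "claude-code"
        "session_id": session_id,  # identifier chosen by the hook
        "transcript": transcript,  # raw transcript text, stored as-is
    }
    return urllib.request.Request(
        STASH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def on_session_end(session_id: str, agent: str, transcript: str) -> int:
    """Hook entry point: upload the transcript and return the HTTP status."""
    request = build_upload_request(session_id, agent, transcript)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Note that nothing in the request asks the server to interpret the transcript. In this sketch the server only has to persist the payload, which matches the "dumb storage layer" role described above.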

Instead of running its own LLM calls, Stash delegates all curation, search, and knowledge extraction to the client side, where the coding agent’s existing API keys are used. The server therefore never forwards your transcripts to an LLM provider and never incurs LLM costs; it simply stores and serves transcripts.
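
A rough sketch of what client-side curation could look like: the client pulls transcripts from the server, ranks them locally, and builds a prompt for the agent’s own LLM call. The `Transcript` type, the keyword-overlap ranking, and the prompt wording are assumptions chosen for illustration; Stash’s real search may work quite differently.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    session_id: str
    text: str


def rank_transcripts(query: str, transcripts: list[Transcript], top_k: int = 3) -> list[Transcript]:
    """Rank transcripts by how many distinct query terms they contain."""
    terms = set(query.lower().split())
    scored = []
    for transcript in transcripts:
        words = set(transcript.text.lower().split())
        score = len(terms & words)
        if score > 0:
            scored.append((score, transcript))
    # Stable sort: transcripts with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [transcript for _, transcript in scored[:top_k]]


def build_prompt(query: str, hits: list[Transcript]) -> str:
    """Assemble context for an LLM call that the agent makes with its own local API key."""
    context = "\n\n".join(f"[{t.session_id}]\n{t.text}" for t in hits)
    return f"Using these past sessions:\n\n{context}\n\nAnswer: {query}"
```

The design choice shows in the last function: the prompt is built on the client, so the LLM call that consumes it is billed to the user’s own key rather than to the server.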

Under the hood, the architecture is modular and agent-agnostic. The server is self-hosted via Docker Compose, with optional support for S3-compatible storage and third-party embedding providers for richer search capabilities.

This split of responsibilities — dumb server, smart client — is rare in this space but solves a real problem: how to build shared agent memory without trusting third-party servers with proprietary code or paying for server-side API usage.

why stash’s client-side model stands out

The core strength of Stash is an architecture in which the server makes no LLM calls. Most team knowledge tools either run LLM calls on the server or require you to trust an external SaaS with your codebase. Stash avoids both.

This design means your API keys remain local to your coding agent, and all semantic search or knowledge graph curation happens within your agent’s context. The server is effectively a dumb store and index, minimizing its attack surface and cost.

The codebase is surprisingly clean for a project juggling multiple agent integrations and complex session management. Its hook-based approach plugs into each agent’s lifecycle events to capture transcripts automatically, reducing friction.

One tradeoff is that every user needs valid, configured API keys for their agent. The server itself doesn’t provide a fully managed experience: you are responsible for self-hosting and for managing environment variables, including secrets.

Another limitation is that all intelligence runs client-side, which can add latency or complexity to client implementations. That said, the project’s internal tests reportedly show a 49% speedup on long-running Claude Code instances, which suggests that session handling is efficient.

quick start with stash

Run this in a terminal:

bash -c "$(curl -fsSL https://raw.githubusercontent.com/Fergana-Labs/stash/main/install.sh)"

Then try it: ask your coding agent if it has access to Stash.

self-hosting stash

To self-host, run Docker Compose on infrastructure of your choice:

git clone https://github.com/Fergana-Labs/stash.git
cd stash
cp .env.example .env          # fill in credentials + API keys

From there, follow the README to configure environment variables and start the server with Docker Compose.
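
Since a half-filled `.env` is an easy mistake at this step, a small check like the following can flag keys that were left empty. This is a hedged sketch, not part of Stash: the helper names are made up, the parser handles only simple `KEY=VALUE` lines, and the variable names in the usage example are hypothetical rather than taken from the project’s `.env.example`.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and one layer of quoting.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> list[str]:
    """Return the keys whose values are still empty, in file order."""
    return [key for key, value in parse_env(text).items() if not value]


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    sample = "# storage\nDATABASE_URL=postgres://localhost/stash\nS3_BUCKET=\n"
    print(missing_keys(sample))
```

In practice you would read the text from your own `.env` file and compare the result against whatever the README marks as required.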

This self-hosted server handles transcript storage and optional embedding indexing but makes no LLM calls itself.
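
To illustrate what embedding indexing involves without any LLM call, here is a minimal in-memory index that ranks stored vectors by cosine similarity. It assumes the vectors were already produced by some embedding provider; the `EmbeddingIndex` class and its methods are hypothetical, and the article does not describe how Stash implements its index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EmbeddingIndex:
    """Stores (session_id, vector) pairs and answers nearest-neighbor queries."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, session_id: str, vector: list[float]) -> None:
        self._items.append((session_id, vector))

    def search(self, query_vector: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        """Return up to top_k (session_id, similarity) pairs, most similar first."""
        scored = [(sid, cosine(query_vector, vec)) for sid, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

A lookup like this is plain arithmetic over stored vectors, which is why an index can live on the server without turning it into an LLM caller.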

verdict

Stash is a solid option if you want to build a shared coding agent memory across your team without trusting a third-party server with your code or incurring server-side LLM costs. Its client-side LLM invocation model is clever and practical, especially for private or sensitive repos.

That said, it assumes some comfort with Docker Compose and environment management. The reliance on client API keys means it’s best suited for teams already invested in coding agents like Claude Code or Cursor.

If you want a fully managed, server-side LLM knowledge system or a simpler turnkey solution, this might not be the best fit. But for teams prioritizing privacy, cost control, and extensibility, Stash’s architecture offers a compelling tradeoff worth exploring.


→ GitHub Repo: Fergana-Labs/stash ⭐ 86 · Python

Noureddine RAMDI / Stash: a shared agent memory with no server-side LLM calls
