Noureddine RAMDI / LycheeMemory: a lightweight semantic long-term memory framework for LLM agents

Created Mon, 04 May 2026 10:23:02 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

LycheeMem/LycheeMem

LycheeMemory tackles a common pain point for developers building large language model (LLM) agents: how to maintain structured, efficient long-term memory that scales without ballooning token usage or operational complexity. Where many agent memory frameworks depend on heavyweight graph databases, LycheeMemory relies on semantic memory consolidation and adaptive retrieval. On the PinchBench benchmark, compared with OpenClaw’s native memory, it cut token consumption by about 71% and cost by about 55% while improving task scores by roughly 6%.

what lycheememory does and its architecture

At its core, LycheeMemory is a Python framework that provides long-term conversational memory for LLM agents. It organizes memory as a semantic knowledge base backed by SQLite and LanceDB, an embedded vector database, which gives it compact and efficient vector search without a heavier graph database such as Neo4j. The system exposes both a REST API and MCP (Model Context Protocol) endpoints, so it can be integrated with a range of agent frameworks.
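To make the storage idea concrete, here is a minimal sketch of a memory store that keeps records in SQLite and ranks them by vector similarity. It is not LycheeMemory's schema: the memories table, the three-dimensional toy embeddings, and the helper names are hypothetical, and a brute-force cosine scan stands in for LanceDB's vector index. A real deployment would get its embeddings from an embedding model.

```python
import json
import math
import sqlite3
from typing import List

# Hypothetical schema for illustration; LycheeMemory's real tables may differ.
def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding TEXT NOT NULL)"
    )
    return conn

def add_memory(conn: sqlite3.Connection, text: str, embedding: List[float]) -> None:
    # Embeddings are stored as JSON text; a vector DB would store them natively.
    conn.execute(
        "INSERT INTO memories (text, embedding) VALUES (?, ?)",
        (text, json.dumps(embedding)),
    )

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(conn: sqlite3.Connection, query: List[float], k: int = 2) -> List[str]:
    # Brute-force scan standing in for LanceDB's approximate nearest-neighbour search.
    rows = conn.execute("SELECT text, embedding FROM memories").fetchall()
    scored = [(cosine(query, json.loads(emb)), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

conn = open_store()
add_memory(conn, "User prefers Python for backend work", [0.9, 0.1, 0.0])
add_memory(conn, "User's cat is named Miso", [0.0, 0.2, 0.9])
add_memory(conn, "User deploys services with Docker", [0.7, 0.3, 0.1])
print(search(conn, [0.8, 0.2, 0.0], k=2))
# → ['User prefers Python for backend work', 'User deploys services with Docker']
```

Keeping both the records and the vectors in embedded, file-based stores is what lets a setup like this run without a separate database server.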

The architecture emphasizes adaptive memory management through automatic turn mirroring and boundary consolidation. Turn mirroring copies conversation turns into memory to build a richer semantic context, while consolidation merges related memory segments to keep the memory footprint compact and relevant. A visual memory module lets you inspect and interact with the stored memory, which helps with debugging and with understanding the agent’s knowledge state.
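The description above does not spell out how turn mirroring is implemented. One plausible reading, sketched here under that assumption, is a session wrapper that copies every user and assistant turn into a long-term memory sink as it happens, so the short-term transcript can be trimmed to a working-memory window without losing anything. MirroredSession, window, and memory_sink are hypothetical names, and a lambda stands in for the LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


@dataclass
class MirroredSession:
    """Hypothetical wrapper that mirrors every turn into a long-term memory sink."""

    llm: Callable[[str], str]  # stand-in for a real LLM call
    window: int = 4  # short-term turns kept in the prompt transcript
    transcript: List[Turn] = field(default_factory=list)
    memory_sink: List[Turn] = field(default_factory=list)

    def _record(self, role: str, text: str) -> None:
        turn = (role, text)
        self.transcript.append(turn)
        self.memory_sink.append(turn)  # mirrored copy, never trimmed here
        # Trim the working transcript; the mirrored copy survives for consolidation.
        overflow = len(self.transcript) - self.window
        if overflow > 0:
            del self.transcript[:overflow]

    def chat(self, user_text: str) -> str:
        self._record("user", user_text)
        reply = self.llm(user_text)
        self._record("assistant", reply)
        return reply


session = MirroredSession(llm=lambda text: f"echo: {text}", window=2)
session.chat("I deploy on Fridays")
session.chat("I use Python 3.11")
print(len(session.transcript), len(session.memory_sink))  # → 2 4
```

The point of the design is that the caller never writes to memory explicitly: mirroring happens as a side effect of chatting.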

LycheeMemory ships native plugins for popular agent platforms such as OpenClaw, Hermes, and Claude Code, which makes it straightforward to add to existing agent pipelines. The stack is entirely Python, with a CLI-launched backend server that listens on port 8000, and it works with common LLM providers such as OpenAI and Gemini.
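Because the backend speaks HTTP on port 8000, any agent that can send a JSON request can talk to it. The sketch below only builds such a request with the standard library and does not send it. The /memory/search route and the query and top_k payload fields are hypothetical placeholders; check the repository's API documentation for the real routes and for any JWT authentication header the server may expect.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # port from the article; local host assumed


def build_search_request(query: str, top_k: int = 5) -> urllib.request.Request:
    # Hypothetical route and payload shape, for illustration only.
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/memory/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("What does the user deploy with?", top_k=3)
print(req.get_method(), req.full_url)  # → POST http://localhost:8000/memory/search

# Against a running server, the request would be sent like this:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.loads(resp.read()))
```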

technical strengths and tradeoffs

What distinguishes LycheeMemory is its measured efficiency. On PinchBench, it achieved a ~6% improvement in task performance over OpenClaw’s native memory implementation while reducing token consumption by ~71% and cost by ~55%. This matters because many long-term memory systems add token overhead as they store more context, whereas LycheeMemory’s semantic consolidation shrinks the context the agent has to consume.
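The two headline percentages imply something worth noticing: cost falls less than token count. The arithmetic below applies the reported reductions to an illustrative baseline of 1M tokens costing $10.00. Those baseline numbers are made up for the example and are not PinchBench figures.

```python
def apply_savings(baseline_tokens: int, baseline_cost: float,
                  token_cut: float = 0.71, cost_cut: float = 0.55):
    """Apply relative reductions and report the change in effective price per token."""
    tokens = baseline_tokens * (1 - token_cut)
    cost = baseline_cost * (1 - cost_cut)
    # Price per token after the change, relative to the baseline price per token.
    price_ratio = (cost / tokens) / (baseline_cost / baseline_tokens)
    return round(tokens), round(cost, 2), round(price_ratio, 2)


# Illustrative baseline only: 1M tokens costing $10.00.
print(apply_savings(1_000_000, 10.00))  # → (290000, 4.5, 1.55)
```

Whatever the baseline, the ratio is 0.45 / 0.29 ≈ 1.55, so the average price per token consumed ends up about 55% higher than in the baseline run. The reported numbers don't say why; a different mix of input and output tokens, or extra model calls for consolidation, would be consistent with it, but that is speculation.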

The choice of SQLite combined with LanceDB for semantic vector storage is a deliberate tradeoff that favors simplicity and lightweight operation over the power of a graph database. It makes the project more accessible and lowers the operational burden, but it may limit scalability or the advanced graph queries that more complex agent memories might need.

The automatic turn mirroring feature replicates conversational context without manual intervention, reducing developer effort while improving semantic retrieval quality. Boundary consolidation keeps memory size manageable, but how aggressively context is merged is a tradeoff: over-merging can reduce recall precision in edge cases.
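The consolidation tradeoff can be illustrated with a toy boundary rule: walk the turns in order and fold each one into the current memory block when its embedding is similar enough to the previous turn, otherwise start a new block. This is a hypothetical rule with hand-written two-dimensional embeddings, not LycheeMemory's actual consolidation algorithm; consolidate and threshold are illustrative names.

```python
import math
from typing import List, Tuple

Segment = Tuple[str, List[float]]  # (text, toy embedding)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def consolidate(segments: List[Segment], threshold: float) -> List[str]:
    """Hypothetical rule: compare each segment with the last one in the current block."""
    if not segments:
        return []
    groups = [[segments[0]]]
    for seg in segments[1:]:
        if cosine(groups[-1][-1][1], seg[1]) >= threshold:
            groups[-1].append(seg)  # similar enough: fold into the current block
        else:
            groups.append([seg])  # boundary: start a new memory block
    return [" / ".join(text for text, _ in group) for group in groups]


turns = [
    ("Deploys with Docker", [0.9, 0.1]),
    ("Uses Kubernetes in prod", [0.8, 0.3]),
    ("Has a cat named Miso", [0.1, 0.9]),
]
for t in (0.98, 0.9, 0.4):
    print(t, consolidate(turns, t))
```

At 0.98 nothing is merged and three blocks remain; at 0.9 the two infrastructure facts merge; at 0.4 the unrelated cat fact is folded into the same block, which is the kind of over-merging that can hurt recall precision.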

The codebase is surprisingly clean, modular, and pragmatic. Configuring it through environment variables and running a CLI-launched server is a solid pattern for local or cloud deployment. However, because it is built in Python and expects a compatible LLM API, it is best suited to teams already comfortable with Python and those provider ecosystems.

quick start

prerequisites

  • Python 3.9+
  • An LLM API key (OpenAI, Gemini, or any litellm-compatible provider)

installation

You can install LycheeMemory directly via pip:

pip install lycheemem

Once it is installed, start the backend server with the CLI:

lycheemem-cli

For development or if you prefer to run from source:

git clone https://github.com/LycheeMem/LycheeMem.git
cd LycheeMem
pip install -e .

configuration

Create a .env file in your working directory and fill in your values. The .env.example file is a full template covering session/user database paths, JWT settings, and working-memory thresholds. The most important values are the LLM API key and the backend settings.
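A minimal sketch of this configuration pattern: read KEY=VALUE pairs from .env into the environment, then build a typed config object from it. Only OPENAI_API_KEY is a conventional provider variable; SESSION_DB_PATH, WM_MAX_TOKENS, and their defaults are hypothetical stand-ins for the session-path and working-memory-threshold settings, so take the real names from .env.example. In practice the python-dotenv package's load_dotenv() handles the loading step more robustly.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def load_dotenv_file(path: str = ".env") -> None:
    """Minimal KEY=VALUE loader; variables already set in the environment win."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


@dataclass(frozen=True)
class MemoryConfig:
    api_key: str
    session_db_path: str
    working_memory_max_tokens: int


def config_from_env() -> MemoryConfig:
    # Hypothetical variable names except OPENAI_API_KEY; the API key is required,
    # so a missing value raises KeyError instead of failing later at request time.
    return MemoryConfig(
        api_key=os.environ["OPENAI_API_KEY"],
        session_db_path=os.environ.get("SESSION_DB_PATH", "data/sessions.db"),
        working_memory_max_tokens=int(os.environ.get("WM_MAX_TOKENS", "4000")),
    )
```

Typical use is to call load_dotenv_file() once at startup and pass the resulting config_from_env() object to the rest of the application.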

If you use OpenClaw, you can quickly install the LycheeMemory plugin and restart the gateway:

openclaw plugins install "/path/to/LycheeMem/openclaw-plugin"
openclaw gateway restart

See openclaw-plugin/INSTALL_OPENCLAW.md for the full setup guide.

verdict

LycheeMemory offers a practical, lightweight semantic memory solution for LLM agents that need to manage long-term conversational context efficiently. Its combination of SQLite and LanceDB for semantic storage is a sensible tradeoff for teams that want to avoid heavyweight graph databases while benefiting from semantic retrieval.

The token and cost savings reported on PinchBench make it an appealing choice for developers looking to cut API usage and operating costs. Native plugins for popular agent platforms lower integration friction.

On the downside, the framework assumes some familiarity with Python and backend server management. Its scalability beyond moderate workloads isn’t clearly documented, and how aggressively it consolidates memory may need tuning for specific applications.

Overall, if you’re building LLM agents and want a memory system that is efficient, well-structured, and demonstrably reduces token use without adding complexity, LycheeMemory is worth exploring. It solves a real problem without unnecessary bloat, and the codebase is accessible enough to customize or extend in production environments.


→ GitHub Repo: LycheeMem/LycheeMem ⭐ 237 · Python