kvcached: a plugin cache for SGLang and vLLM Python environments

kvcached is a caching layer that plugs into two popular Python-based large language model (LLM) runtimes, SGLang and vLLM. It aims to improve performance and efficiency for applications built on these runtimes, which are increasingly common in AI-driven systems.

What kvcached does and how it integrates

At its core, kvcached is a Python package designed to work as a plugin with existing LLM runtimes. It supports SGLang (tested with v0.5.10) and vLLM (tested with v0.19.0), two LLM inference and serving frameworks with Python interfaces.

Architecturally, kvcached is a caching layer that runs inside these runtimes rather than a standalone cache server or service. It is embedded in the LLM runtime's lifecycle and manages the key-value (KV) cache: the intermediate attention state the runtime keeps so that it does not have to recompute it for every generated token.
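To make "caching intermediate computations" concrete, here is a toy sketch of the generic KV-cache idea in autoregressive decoding. It is not kvcached's code or API: `project()` is a hypothetical stand-in for a model's key/value projection, and the sketch only counts how many projections each approach performs.

```python
from typing import List, Tuple


# Hypothetical stand-in for a model's key/value projection of one token.
# Toy illustration only; NOT kvcached's implementation.
def project(token: int) -> Tuple[int, int]:
    return token * 2, token * 3


def projections_without_cache(tokens: List[int]) -> int:
    """Recompute (key, value) for the whole prefix at every decoding step."""
    count = 0
    for step in range(1, len(tokens) + 1):
        for tok in tokens[:step]:
            project(tok)
            count += 1
    return count


class ToyKVCache:
    """Stores per-token (key, value) pairs so each token is projected once."""

    def __init__(self) -> None:
        self.keys: List[int] = []
        self.values: List[int] = []
        self.projections = 0

    def step(self, token: int) -> Tuple[List[int], List[int]]:
        k, v = project(token)  # only the newest token is projected
        self.projections += 1
        self.keys.append(k)
        self.values.append(v)
        return self.keys, self.values  # full history, reused by attention


def projections_with_cache(tokens: List[int]) -> int:
    cache = ToyKVCache()
    for tok in tokens:
        cache.step(tok)
    return cache.projections


if __name__ == "__main__":
    prompt = [1, 2, 3, 4, 5]
    print(projections_without_cache(prompt))  # 15 = 1 + 2 + 3 + 4 + 5
    print(projections_with_cache(prompt))     # 5
```

Without a cache the work grows quadratically with sequence length; with one it grows linearly, at the cost of holding all cached keys and values in memory. Managing that memory is the kind of job a KV-cache layer takes on.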

Supported Python versions range from 3.9 to 3.13, which covers most modern Python environments. This compatibility ensures that users working with recent Python releases can adopt kvcached without needing to downgrade or adjust their setup.

For deployment, the project publishes prebuilt Docker images that bundle kvcached with a specific LLM engine (SGLang or vLLM). This is practical for developers and teams who want to run the cache alongside their LLM runtime without managing dependencies by hand.

Technical strengths and tradeoffs

The standout feature of kvcached is its dual compatibility with both SGLang and vLLM. Each backend has its own Docker image, indicating that the project maintains tailored builds and integration paths for these different runtimes. This flexibility is valuable in the AI tooling landscape where no single LLM runtime dominates.

The plugin approach means kvcached can be seamlessly introduced into existing workflows that use SGLang or vLLM, enhancing them without requiring major architectural changes. This design favors developer experience (DX) and incremental adoption.

However, this integration focus is also a limitation. kvcached is not designed as a universal cache solution for any Python project or LLM runtime; it specifically targets these two ecosystems. That narrows its applicability but also allows it to be optimized for these environments.

The project also provides a developer-oriented Docker image that bundles the cache with tools for debugging and development, so contributors get a ready-made environment, not only end users.

Quick start with kvcached

The README offers succinct installation instructions that work for both PyPI and Docker users, making it straightforward to get started.

Prerequisites

Python 3.9 to 3.13
Either SGLang v0.5.10 or vLLM v0.19.0 installed
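Before installing, a short script can confirm the prerequisites listed above. This is a minimal sketch using only the standard library (`importlib.metadata`); the helper names are hypothetical, and it assumes the engines are installed under their PyPI distribution names `sglang` and `vllm`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Version numbers come from the article; helper names are hypothetical.
SUPPORTED_PYTHON = ((3, 9), (3, 13))
TESTED_ENGINES = {"sglang": "0.5.10", "vllm": "0.19.0"}


def python_supported(info: Optional[Tuple[int, ...]] = None) -> bool:
    """True if (major, minor) falls within the supported Python range."""
    if info is None:
        info = tuple(sys.version_info[:2])
    low, high = SUPPORTED_PYTHON
    return low <= tuple(info[:2]) <= high


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def check_engines() -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each engine to (installed version, exact match with tested version).

    Exact string match: a local build suffix such as "+cu128" counts as a mismatch.
    """
    result = {}
    for name, tested in TESTED_ENGINES.items():
        found = installed_version(name)
        result[name] = (found, found == tested)
    return result


if __name__ == "__main__":
    print("Python supported:", python_supported())
    for name, (found, matches) in check_engines().items():
        print(f"{name}: installed={found} matches_tested={matches}")
```

Only one engine needs to be present; a `None` for the other is expected.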

Install from PyPI

pip install kvcached --no-build-isolation

Using Docker

The project provides Docker images for both backends:

docker pull ghcr.io/ovg-project/kvcached-sglang:latest   # kvcached-v0.1.5-sglang-v0.5.10
docker pull ghcr.io/ovg-project/kvcached-vllm:latest     # kvcached-v0.1.5-vllm-v0.19.0
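The `latest` tags move over time, while the comments on the pull commands name the exact kvcached and engine versions they currently correspond to. The sketch below builds either reference so a deployment can be pinned. It assumes, without confirmation from the source, that the version strings in those comments are also published as image tags; check the registry before relying on them. `image_ref` is a hypothetical helper.

```python
from typing import Optional

REGISTRY = "ghcr.io/ovg-project"  # from the docker pull commands above
BACKENDS = ("sglang", "vllm")


def image_ref(backend: str,
              kvcached_version: Optional[str] = None,
              engine_version: Optional[str] = None) -> str:
    """Build an image reference, falling back to :latest when versions are omitted.

    ASSUMPTION: the pinned tag format mirrors the comments in the article
    (kvcached-v<X>-<backend>-v<Y>); it is not confirmed to exist as a tag.
    """
    if backend not in BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    repo = f"{REGISTRY}/kvcached-{backend}"
    if kvcached_version is None or engine_version is None:
        return f"{repo}:latest"
    return f"{repo}:kvcached-v{kvcached_version}-{backend}-v{engine_version}"


if __name__ == "__main__":
    # Print pull commands that could be pasted into a shell.
    print("docker pull", image_ref("sglang", "0.1.5", "0.5.10"))
    print("docker pull", image_ref("vllm", "0.1.5", "0.19.0"))
```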

For development purposes, there’s an all-in-one Docker image:

docker pull ghcr.io/ovg-project/kvcached-dev:latest

These commands fetch the images; from there you can run the cache plugin with the matching LLM runtime, whether for production or for testing and development.

Verdict

kvcached is a practical, focused cache plugin that fits neatly into the workflows of developers using either SGLang or vLLM Python environments. Its main strength is offering caching optimizations tailored to these runtimes, backed by clear installation paths including PyPI and Docker images.

That said, its scope is deliberately narrow; it is not a general-purpose caching solution, nor does it support other LLM runtimes or frameworks out of the box. If your work centers on SGLang or vLLM, kvcached can boost performance with minimal fuss.

For teams or projects outside these specific ecosystems, the project’s value diminishes, and other caching or LLM tooling might be more suitable. Overall, kvcached is worth exploring if you are invested in these LLM runtimes and want to add an efficient cache layer without rearchitecting your stack.


→ GitHub Repo: ovg-project/kvcached ⭐ 903 · Python

Noureddine RAMDI / kvcached: a plugin cache for SGLang and vLLM Python environments