Kitaru: a durable runtime for autonomous AI agents with checkpointed execution

Kitaru tackles a real operational headache in autonomous AI agent workflows: state loss and token waste when agents crash or time out mid-execution. Most agent frameworks restart from scratch on failure, forcing you to rerun every step and pay for the same tokens again. Kitaru’s durable execution primitives solve this by checkpointing progress, so you can fix bugs, replay flows, and resume without recomputing completed steps.

What Kitaru is and how it works

Kitaru is a self-hosted, framework-agnostic runtime layer designed to sit between your autonomous AI agent harnesses and your platform governance. It doesn’t impose a specific agent framework or model; instead, it wraps the SDKs teams already use, such as LangGraph, PydanticAI, or the Claude SDK. You keep full control over your agents and models and gain execution management on top of them.

Under the hood, Kitaru provides durable execution primitives essential for production-grade AI orchestration:

Checkpointing: save intermediate states of your agent’s execution.
Replay and resume: restart flows from the last checkpoint instead of from zero.
Versioned deployments: support multiple flow versions and upgrades without losing history.
Isolated step execution: run steps independently for safer error handling and debugging.
Durable memory: persist agent memory across runs for continuity.

The core API is Python-first and minimalistic, using just two decorators: @flow to define a workflow and @checkpoint to mark durable steps. This avoids complex graph DSLs and keeps developer experience straightforward.
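To make the two-decorator shape concrete, here is a minimal sketch in plain Python. It does not import Kitaru: the `flow` and `checkpoint` decorators below are toy stand-ins defined in the snippet itself, and the in-memory `_STORE`, the `CALLS` counter, and the step functions are all hypothetical. Kitaru’s real decorators, their import path, and their options may differ, so treat this only as an illustration of the idea that a checkpointed step records its output and a flow composes such steps.

```python
import functools

# Hypothetical in-memory store standing in for durable storage.
_STORE = {}


def checkpoint(func):
    """Toy stand-in for a durable step: remember the result per step name and arguments."""
    @functools.wraps(func)
    def wrapper(*args):
        key = (func.__name__, args)
        if key not in _STORE:
            _STORE[key] = func(*args)  # run the step only if no result is recorded
        return _STORE[key]
    return wrapper


def flow(func):
    """Toy stand-in for a workflow entry point; here it simply passes the call through."""
    @functools.wraps(func)
    def wrapper(*args):
        return func(*args)
    return wrapper


CALLS = {"research": 0}  # counts how often the "expensive" step really runs


@checkpoint
def research(topic):
    CALLS["research"] += 1  # stands in for a costly model call
    return f"notes on {topic}"


@checkpoint
def draft(notes):
    return f"draft based on {notes}"


@flow
def write_report(topic):
    return draft(research(topic))


first = write_report("durable agents")
second = write_report("durable agents")  # served from recorded step outputs
```

Running the flow twice with the same input executes `research` only once in this sketch, which is the behavior the decorator pair is meant to convey.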

Kitaru is deployable on local machines, Kubernetes clusters, or cloud platforms like AWS, GCP, and Azure. Artifacts are stored in user-owned object storage, ensuring data sovereignty.

What sets Kitaru apart technically

The defining feature of Kitaru is its focus on durable execution to fix the “agent state loss” problem. When an AI agent pod crashes or the process times out mid-flow, most agent frameworks restart from scratch, replaying the entire workflow and burning tokens unnecessarily.

Kitaru’s checkpoint-based design means each step can persist its output. After a failure, the runtime can resume from the last checkpoint, so completed steps are not recomputed and their API calls are not repeated. This is a practical win for production systems where reliability and cost-efficiency matter.
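The resume behavior described above can be sketched with a small file-backed store. This is an assumption-laden illustration, not Kitaru’s implementation: `FileCheckpoints`, `run_step`, the JSON file layout, and the `plan` and `execute` steps are hypothetical names invented for the example. The first run crashes in the second step; the second run reloads the file, as a restarted process would, and skips the step that already finished.

```python
import json
import os
import tempfile


class FileCheckpoints:
    """Hypothetical checkpoint store: one JSON file mapping step names to outputs."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path) as fh:
                self.data = json.load(fh)

    def run_step(self, name, func, *args):
        if name in self.data:
            return self.data[name]  # step already completed in an earlier run
        result = func(*args)
        self.data[name] = result
        with open(self.path, "w") as fh:
            json.dump(self.data, fh)  # persist only after the step succeeds
        return result


model_calls = []  # records every simulated model call


def plan(task):
    model_calls.append("plan")
    return f"plan for {task}"


def execute(plan_text, fail):
    model_calls.append("execute")
    if fail:
        raise RuntimeError("simulated crash")
    return f"result of {plan_text}"


def run_flow(path, task, fail):
    store = FileCheckpoints(path)  # a fresh load simulates a new process
    plan_text = store.run_step("plan", plan, task)
    return store.run_step("execute", execute, plan_text, fail)


path = os.path.join(tempfile.mkdtemp(), "checkpoints.json")
try:
    run_flow(path, "summarize logs", fail=True)  # crashes in the second step
except RuntimeError:
    pass
result = run_flow(path, "summarize logs", fail=False)  # resumes after "plan"
```

Across both runs, `plan` is called once and `execute` twice: only the step that failed is paid for again.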

The tradeoff is that Kitaru doesn’t provide a built-in agent or model harness. Users must integrate their existing agents and SDKs. This increases flexibility but requires more setup and integration effort upfront.

The codebase is clean and pragmatic, focusing on minimal dependencies and a simple API surface. By relying on decorators rather than DSLs or complex orchestration definitions, it lowers the barrier to adoption for Python developers familiar with function decorators.

One limitation is that Kitaru’s durability features mainly shine in workflows with discrete, checkpointable steps. Highly dynamic or unstructured agent flows may require adaptation to fit this model.
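One way to adapt an open-ended agent loop to this model is to treat each turn as its own checkpointable step, keyed by its position in the loop. The sketch below is a generic pattern under stated assumptions, not something taken from Kitaru: `run_agent_loop`, the `turn-N` key scheme, the dict used as a store, and the `decide` function are all hypothetical.

```python
def run_agent_loop(store, decide, max_turns=10):
    """Run an open-ended loop, recording each turn's action under an indexed key."""
    history = []
    for turn in range(max_turns):
        key = f"turn-{turn}"
        if key not in store:
            store[key] = decide(history)  # computed at most once per turn
        action = store[key]
        history.append(action)
        if action == "done":
            break
    return history


calls = []  # records how often the "model" is actually consulted


def decide(history):
    calls.append(len(history))  # stands in for a model call
    return "done" if len(history) >= 2 else f"search-{len(history)}"


store = {}  # stand-in for durable storage
first = run_agent_loop(store, decide)
replayed = run_agent_loop(store, decide)  # replays recorded turns, no new calls
```

The number of turns is not known in advance, yet every turn still has a stable key, so a replay walks the recorded actions instead of asking the model again.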

Quick start with Kitaru

Kitaru’s installation and setup are straightforward. The Python package is available on PyPI and can be installed with pip or uv (a faster installer recommended by the maintainers).

pip install kitaru

uv pip install kitaru

If you want to wrap a PydanticAI agent, install the adapter extra:

uv pip install "kitaru[pydantic-ai]"

For a local server with dashboard and REST API, install the local extra and log in:

uv pip install "kitaru[local]"
kitaru login
kitaru status

To connect to a remote Kitaru server, use:

kitaru login https://my-server.example.com
kitaru status

Initialize your project with:

kitaru init

Writing your first flow involves decorating Python functions with @flow and @checkpoint to mark durable steps. This API simplicity makes it easy to integrate Kitaru into existing projects without heavy refactoring.

Verdict

Kitaru is a solid choice if you’re building autonomous AI agents and need reliable execution with state preservation. Its checkpointing and replay capabilities address a pain point many frameworks ignore: costly restarts and token waste on failure.

The framework-agnostic approach means you’re not locked into a specific agent SDK or model, allowing flexibility for teams with established tooling. However, this also means you must handle integration and orchestration details yourself.

For teams running production AI workflows where cost, reliability, and state durability matter, Kitaru offers a pragmatic runtime layer worth exploring. If you prefer an all-in-one agent framework with built-in models and routing, this might not be the best fit.

Overall, Kitaru’s design is clean, practical, and focused on solving a real-world operational problem. Its simplicity and deployment flexibility make it a tool to consider for durable AI agent orchestration.

→ GitHub Repo: zenml-io/kitaru ⭐ 135 · Python

Noureddine RAMDI / Kitaru: a durable runtime for autonomous AI agents with checkpointed execution
