Noureddine RAMDI / esp-claw: running a full AI agent loop on ESP32 edge devices

Created Tue, 05 May 2026 13:37:39 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

espressif/esp-claw

Running a full AI agent loop on a $3 ESP32 chip might sound far-fetched, but esp-claw pulls it off by combining edge AI, dynamic scripting, and standardized device communication. It transforms passive IoT devices into active decision-making agents, all while keeping memory and processing local to preserve privacy and reduce latency. This framework offers a glimpse at what AI on constrained hardware can look like beyond cloud-dependent setups.

what esp-claw does and its architecture

esp-claw is an edge AI agent framework developed by Espressif for ESP32-series chips, notably the ESP32-S3. Written in C, it implements a complete agent loop on-device, including sensing inputs, making decisions based on natural language conversation, and executing actions. This is done without offloading critical logic to the cloud, a notable departure from typical AI integrations.

At its core, esp-claw enables Chat Coding — users define device behavior through conversational interaction rather than traditional programming. The framework supports dynamic Lua scripts that can be loaded at runtime, allowing customizable behaviors tailored to specific use cases.

The architecture revolves around several key components:

  • Edge AI agent runtime: Runs the full agent loop locally with millisecond-level response times.
  • Dynamic Lua scripting: Allows user-defined behaviors and extensions without recompiling firmware.
  • Structured local memory: Maintains privacy by keeping context and state on-device.
  • MCP protocol support: Implements the Model Context Protocol for standardized device communication and interoperability.
  • IM platform integration: Supports Telegram, QQ, Feishu, and WeChat for device control via instant messaging.
  • LLM API integration: Works with OpenAI-style APIs as well as Anthropic-style APIs, supporting models like GPT, Claude, Qwen, and DeepSeek.

This stack is optimized for constrained hardware, balancing flexibility, privacy, and responsiveness. The repo also includes support for multiple ESP32-S3-based development boards, with online flashing and configuration via browser to simplify onboarding.

technical design strengths and tradeoffs

The most striking aspect of esp-claw is how it fits an AI agent runtime, natural language interaction, and dynamic scripting into the limited resources of the ESP32 platform. The codebase is written in C with an emphasis on efficiency and minimal dependencies.

Lua scripting is a smart choice here: it’s lightweight, embeddable, and enables users to extend device behavior dynamically. Scripts can be updated without rebuilding firmware, making development and iteration faster. This design favors flexibility at the edge but trades off some performance compared to native code.

Structured local memory management is crucial for privacy and context retention. esp-claw avoids cloud dependency by keeping conversation history and decision context on-device. This is a double-edged sword: it enhances privacy and reduces latency but requires careful memory management and limits the complexity of context that can be stored.

The MCP protocol integration standardizes communication, enabling interoperability with other MCP-compliant devices and services. This choice positions esp-claw well for future extensibility and ecosystem participation. However, MCP is still emerging, so adoption and tooling might be limited.

Supporting multiple instant messaging platforms broadens control options, but maintaining these integrations across different APIs can be cumbersome and brittle given platform changes.

The framework supports both OpenAI-style and Anthropic-style LLM APIs and recommends models with strong instruction-following and tool-use capabilities, such as gpt-5.4 and claude4.6-sonnet. This enables self-programming agents that can generate and adapt Lua code dynamically. The tradeoff is a dependency on external LLM services for advanced capabilities, which can impact offline usability and introduce latency.

Despite these constraints, esp-claw boasts response times as fast as milliseconds, which is impressive for an AI agent running on a microcontroller.

quick start

ESP-Claw already supports multiple ESP32-S3-based development boards, including breadboards, M5Stack CoreS3, and more. Supported boards in ./application/edge_agent/boards/ can be flashed online directly: configuration and flashing are done entirely in the browser, with no need to compile firmware locally or install a development environment first.

You can also build ESP-Claw locally. Please refer to the local build documentation for board adaptation, building, and flashing. Boards not listed above, as well as chips like the ESP32-P4, can also be supported through local builds and flashing.

You can find practical examples in our documentation.

Supported Platforms

LLM: ESP-Claw now supports both OpenAI-style APIs and Anthropic-style APIs. It natively supports GPT models from OpenAI, Qwen models from Alibaba Cloud Bailian, Claude models from Anthropic, DeepSeek models from DeepSeek API, and also supports custom endpoints.

ESP-Claw’s self-programming capability depends on models with strong tool use and instruction-following ability. We recommend gpt-5.4, qwen3.6-plus, claude4.6-sonnet, deepseek-v4-pro or models with comparable capability.

IM: ESP-Claw supports Telegram, QQ, Feishu, and WeChat, and can be extended further.

verdict

esp-claw is a compelling example of how far edge AI has come, packing a full conversational AI agent loop into tiny, low-cost ESP32 chips. It’s particularly relevant for developers building privacy-conscious IoT devices that need local decision-making and natural language interaction without cloud dependency.

That said, it’s not a silver bullet. Its reliance on external LLM APIs for advanced self-programming means it still depends on cloud services for some capabilities. The hardware constraints impose limits on context complexity and script performance.

If you’re building embedded AI devices and want a flexible, chat-driven framework with standardized communication and dynamic scripting, esp-claw is worth exploring. Just be ready for the tradeoffs and the need to integrate with external LLM providers.

The code is surprisingly clean and efficient for the scope, and the browser-based flashing experience lowers the barrier to entry.

This repo showcases what edge AI agents might look like in the near future — small, local, conversational, and interoperable with broader device ecosystems.


→ GitHub Repo: espressif/esp-claw ⭐ 915 · C