Be More Agent tackles a common pain point in local voice AI: running a fully offline conversational agent on Raspberry Pi while handling the hardware quirks that often break audio pipelines. Its standout feature is hardware-aware audio resampling logic that auto-detects the microphone's sample rate and prevents ALSA errors, addressing a practical edge case that many local voice agent tutorials ignore.
What Be More Agent does and how it is built
Be More Agent is an offline-first conversational AI agent framework designed specifically for Raspberry Pi 4 and 5 hardware. It integrates several open source components to enable voice interaction without relying on cloud services, making it suitable for hobbyist projects or privacy-sensitive applications.
At its core, the project uses OpenWakeWord for wake word detection, enabling the system to listen for a custom wake phrase. For speech-to-text, it leverages Whisper.cpp, a local port of OpenAI’s Whisper model optimized for CPU inference. Ollama is used as the large language model backend, running local LLM inference with models like gemma:2b or moondream. Voice synthesis is handled by Piper TTS, producing spoken responses on-device.
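To make the LLM step concrete, here is a minimal sketch of how a transcribed utterance could be sent to a local Ollama server over its HTTP API. It assumes Ollama is listening on its default port (11434) and uses the `/api/generate` endpoint with streaming turned off. The function names `build_generate_request` and `ask_ollama` are illustrative and are not taken from agent.py, which may talk to Ollama differently.

```python
import json
import urllib.request

# Ollama's default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "gemma:2b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt: str, model: str = "gemma:2b", timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With "stream": False the server answers with a single JSON object
        # whose "response" field holds the generated text.
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running and the model pulled, `ask_ollama("Say hello in five words.")` returns the model's reply as a string. Passing `model="moondream"` targets the other model mentioned above.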
The architecture follows a state-machine pattern that cycles through agent states: Idle, Listening, Thinking, and Speaking. These states drive both the logic of the agent and the reactive GUI face animations, which are customizable through a modular asset system using PNG sequences and WAV sound files. This design provides an engaging user experience tied directly to the agent’s internal states.
The repo is structured around a single main script agent.py and includes a setup.sh installer script to automate dependency installation, hardware configuration, and environment setup. It targets Raspberry Pi OS running on Pi 4 (minimum 4GB RAM) or Pi 5, with peripherals including a USB microphone, speaker, LCD screen, and optionally the Raspberry Pi camera module.
Additional features include integration with DuckDuckGo search as a fallback for real-time web queries, and hardware-aware audio resampling to address sample rate mismatches common with ALSA on Raspberry Pi. This resampling logic detects the microphone’s sample rate and adjusts audio accordingly, preventing runtime errors and improving reliability.
What sets Be More Agent apart: hardware-aware audio handling and modular state machine
The most distinctive technical aspect of Be More Agent is its hardware-aware audio resampling. On Raspberry Pi, ALSA often fails when the sample rate an application requests does not match what the microphone hardware actually supports, causing errors and unstable audio streams. Many local voice agent projects gloss over this or require manual configuration.
Be More Agent’s approach is to detect the actual sample rate of the microphone hardware and apply resampling automatically in the audio pipeline. This prevents ALSA errors and ensures smooth, continuous audio capture regardless of the microphone used. It’s a practical solution born from real-world constraints and testing on Pi hardware.
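The detect-then-resample idea can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: it assumes the microphone's native rate comes from a device-info dict shaped like the one `sounddevice.query_devices(kind="input")` returns, it targets 16 kHz because that is what Whisper-family models expect, and it uses plain linear interpolation where a real pipeline would likely use a filtered resampler. The names `native_rate` and `resample_linear` are hypothetical.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # Whisper-family speech models expect 16 kHz mono audio


def native_rate(device_info: dict) -> int:
    """Read the native sample rate from a sounddevice-style device-info dict.

    Assumption: the dict has a "default_samplerate" key, as returned by
    sounddevice.query_devices(kind="input").
    """
    return int(device_info["default_samplerate"])


def resample_linear(
    samples: Sequence[float], src_rate: int, dst_rate: int = TARGET_RATE
) -> List[float]:
    """Resample mono audio by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = min(i * step, last)  # fractional position in the source signal
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

The point of the design is to open the capture stream at whatever rate the hardware reports and convert afterwards, rather than asking ALSA for a rate the device cannot deliver.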
The agent’s logic is implemented as a state machine managing distinct states: Idle (waiting for wake word), Listening (capturing speech), Thinking (processing LLM inference), and Speaking (playing TTS output). This separation makes the flow explicit and easier to maintain, and ties directly into the reactive GUI animations that provide immediate visual feedback.
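The four-state cycle described above can be sketched as a small transition table. The state names come from the article. The event names (`wake_word`, `speech_end`, `reply_ready`, `playback_done`) and the `Agent` class are hypothetical, chosen only for illustration; agent.py may organize its transitions differently.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # waiting for the wake word
    LISTENING = auto()  # capturing speech
    THINKING = auto()   # running LLM inference
    SPEAKING = auto()   # playing TTS output


# Hypothetical event names; only these (state, event) pairs cause a transition.
TRANSITIONS = {
    (State.IDLE, "wake_word"): State.LISTENING,
    (State.LISTENING, "speech_end"): State.THINKING,
    (State.THINKING, "reply_ready"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.IDLE,
}


class Agent:
    def __init__(self, on_change=None):
        self.state = State.IDLE
        # Callback invoked on every transition, e.g. to swap the GUI animation.
        self.on_change = on_change or (lambda old, new: None)

    def handle(self, event: str) -> State:
        """Apply an event; events not valid in the current state are ignored."""
        nxt = TRANSITIONS.get((self.state, event))
        if nxt is None:
            return self.state
        old, self.state = self.state, nxt
        self.on_change(old, nxt)
        return self.state
```

Routing every transition through one `on_change` hook is one way the GUI can stay in sync with the agent: the face animation changes exactly when the state does.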
The modular asset system for GUI and sounds allows users to replace character animations and sounds easily. This adds extensibility and personalization, which is rare in open source voice agents.
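As a rough illustration of how such an asset system can work, the sketch below maps each agent state to a sorted list of PNG frames. The directory layout (one subfolder per state under an assets root) and the name `load_frame_paths` are assumptions made for this example; check the repo for the layout it actually expects.

```python
from pathlib import Path
from typing import Dict, List

# Assumed layout: <assets_root>/<state>/*.png, one folder per agent state.
STATES = ("idle", "listening", "thinking", "speaking")


def load_frame_paths(assets_root: Path) -> Dict[str, List[Path]]:
    """Map each state to its PNG frames, sorted by file name."""
    frames = {}
    for state in STATES:
        state_dir = assets_root / state
        # A missing folder yields an empty animation instead of an error.
        frames[state] = sorted(state_dir.glob("*.png")) if state_dir.is_dir() else []
    return frames
```

Because frames are ordered by file name, zero-padded names such as `frame_01.png`, `frame_02.png` keep the animation in sequence. Swapping a character then amounts to replacing the files in each folder.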
The code is surprisingly clean for a hobbyist-focused project with hardware dependencies. The single main script design keeps the entry point simple while the setup script automates complex environment setup steps. However, the tradeoff is that the entire agent logic lives in one file, which could challenge maintainability if the project grows.
Local LLM inference with Ollama means the agent can operate fully offline but is limited by the capacity of the chosen models (gemma:2b is relatively small). The tradeoff here is between privacy and latency versus the quality and complexity of responses. For many hobbyists and privacy-conscious users, this is acceptable.
Quick start
🛠️ Hardware requirements
- Raspberry Pi 5 (recommended) or Pi 4 (4GB RAM minimum)
- USB Microphone & Speaker
- LCD Screen (DSI or HDMI)
- Raspberry Pi Camera Module
🚀 Installation
1. Prerequisites
Ensure your Raspberry Pi OS is up to date.
sudo apt update && sudo apt upgrade -y
sudo apt install git -y
2. Install Ollama
This agent relies on Ollama to run the brain.
curl -fsSL https://ollama.com/install.sh | sh
Pull the required models:
ollama pull gemma:2b
ollama pull moondream
3. Clone & setup
git clone https://github.com/brenpoly/be-more-agent.git
cd be-more-agent
chmod +x setup.sh
./setup.sh
The setup script will install system libraries, create necessary folders, download Piper TTS, and set up the Python virtual environment.
4. Configure the wake word
The setup script downloads a default wake word (“Hey Jarvis”). To use your own:
- Train a model at OpenWakeWord.
- Place the .onnx file in the root folder.
- Rename it to wakeword.onnx.
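A forgotten rename is an easy mistake at this step, so a small pre-flight check can help before launching the agent. The helper below is a hypothetical convenience, not part of the repo. It only assumes what the steps above state: the model lives in the project root and is named wakeword.onnx.

```python
from pathlib import Path


def find_wakeword_model(project_root: Path, name: str = "wakeword.onnx") -> Path:
    """Return the wake word model path, or raise with a hint about what was found."""
    model = project_root / name
    if model.is_file():
        return model
    # List other .onnx files so a missed rename is easy to spot.
    others = sorted(p.name for p in project_root.glob("*.onnx"))
    hint = f" Found other .onnx files: {others}; rename one to {name}." if others else ""
    raise FileNotFoundError(f"{model} not found.{hint}")
```

Calling `find_wakeword_model(Path("."))` from the repo root either returns the path or explains which .onnx files are present under a different name.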
5. Run the agent
source venv/bin/activate
python agent.py
Verdict
Be More Agent is a practical, well-structured project for hobbyists or privacy-conscious developers wanting to run a fully offline conversational AI agent on Raspberry Pi hardware. Its hardware-aware audio resampling addresses a real pain point with ALSA on Pi, improving stability and user experience.
The tradeoff is the limited power of local LLMs compared to cloud offerings and the single-file codebase that might not scale well for complex extensions. Still, the modular asset system and reactive UI animations provide a nice touch for user engagement.
If you have a Pi 4 or 5, a USB microphone, and some patience for setup, this repo offers an accessible way to explore local voice AI without cloud dependencies. It’s worth understanding for anyone interested in embedded AI agents or offline voice interaction at the edge.
The project is a solid base to build on, especially if you want to add custom wake words, swap models, or improve the GUI, but expect to handle some Raspberry Pi-specific quirks along the way.
→ GitHub Repo: brenpoly/be-more-agent ⭐ 776 · Python