OpenUtter tackles a niche but practical problem: automating Google Meet participation to capture live captions and generate transcripts in real time. It does this by running a headless Chromium browser controlled by Playwright, packaged as a skill for the OpenClaw AI agent framework. The result is a ‘ghost participant’ that can join meetings either as a guest or as an authenticated Google user, read the live captions by observing the DOM, deduplicate the text, and save timestamped transcripts to disk. It also supports on-demand screenshots and can post status updates through OpenClaw’s messaging integration.
what openutter does and its architecture
OpenUtter is a TypeScript project designed as an installable skill for OpenClaw, an AI agent framework. It uses Playwright to launch and control a Chromium instance in either headless or headed mode. The bot joins Google Meet either as a guest, waiting in the lobby with retry logic, or by signing in with Google credentials to join directly. Once in the meeting, it turns on live captions and observes the DOM elements where the captions appear.
The core functionality is capturing these captions in real time, deduplicating repeated or overlapping text (common in live captions), and producing a clean, timestamped transcript file on disk. This is useful for meeting documentation, accessibility, and downstream AI analysis.
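As an illustration of the timestamped output, here is a minimal TypeScript sketch that renders captured captions as transcript lines. The `CaptionEntry` shape, the speaker field, and the `[HH:MM:SS] Speaker: text` layout are assumptions for this example; the article does not specify OpenUtter’s actual file format.

```typescript
// Hypothetical shape of one finalized caption; OpenUtter's own types may differ.
interface CaptionEntry {
  speaker: string; // assumed: the name Meet displays next to the caption
  text: string;
  capturedAt: Date;
}

// Format a millisecond offset from the meeting start as HH:MM:SS (negative clamps to zero).
function formatOffset(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  return [hours, minutes, seconds].map((n) => String(n).padStart(2, "0")).join(":");
}

// Render one caption as a transcript line, timestamped relative to meetingStart.
function formatTranscriptLine(entry: CaptionEntry, meetingStart: Date): string {
  const offset = entry.capturedAt.getTime() - meetingStart.getTime();
  return `[${formatOffset(offset)}] ${entry.speaker}: ${entry.text.trim()}`;
}

// Build the full file contents: one newline-terminated line per caption.
function renderTranscript(entries: readonly CaptionEntry[], meetingStart: Date): string {
  return entries.map((e) => formatTranscriptLine(e, meetingStart) + "\n").join("");
}
```

Persisting the transcript could then be a matter of appending each line with Node’s `fs.appendFileSync`, so a crash mid-meeting still leaves everything captured so far on disk.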
OpenUtter also supports taking screenshots on demand, which can be useful for visual context or debugging. Integration with OpenClaw’s messaging channels means it can send status updates, making it fit naturally into broader AI workflows.
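Playwright’s `page.screenshot({ path })` writes an image of the current page, so an on-demand screenshot can be as small as the sketch below. `ScreenshotCapable` is a hypothetical interface covering just that one method (a real Playwright `Page` satisfies it structurally), and the `meet-<timestamp>.png` naming scheme is an assumption, not necessarily what OpenUtter uses.

```typescript
// The single Playwright Page method this sketch relies on.
interface ScreenshotCapable {
  screenshot(options: { path: string; fullPage?: boolean }): Promise<unknown>;
}

// Build a filesystem-safe, timestamped file name (":" is not allowed on Windows).
function screenshotPath(dir: string, now: Date): string {
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  return `${dir.replace(/\/+$/, "")}/meet-${stamp}.png`;
}

// Capture the current meeting view as a PNG and return where it was written.
async function captureScreenshot(
  page: ScreenshotCapable,
  dir: string,
  now: Date = new Date(),
): Promise<string> {
  const path = screenshotPath(dir, now);
  await page.screenshot({ path, fullPage: false });
  return path;
}
```

Returning the path makes it easy to attach the file to a status update afterwards.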
Under the hood, the installer takes care of dependencies: it auto-installs Chromium via Playwright and, when run as root on Linux, attempts to install the required system runtime dependencies. This automation reduces setup friction.
technical strengths and design tradeoffs
One standout aspect is the use of Playwright’s browser automation to drive Google Meet’s web UI. Rather than relying on official APIs (which may be limited or unavailable for some features), it reads the captions directly from the DOM. The cost is fragility: DOM selectors break whenever Google changes Meet’s UI, and the bot breaks with them. That tradeoff is acceptable for a skill meant to be updated and maintained alongside OpenClaw.
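A common way to soften selector fragility, not necessarily the approach OpenUtter takes, is to keep an ordered list of candidate selectors and use the first one that currently matches. In this sketch the probe function is injected; with Playwright it could be `(sel) => page.locator(sel).count()`. `resolveSelector` is a hypothetical helper, and any selectors passed to it are the caller’s own, since the sketch knows nothing about Meet’s real markup.

```typescript
// Returns how many elements match a selector right now.
// With Playwright: (sel) => page.locator(sel).count()
type SelectorProbe = (selector: string) => Promise<number>;

// Try candidate selectors in priority order; return the first that matches, or null.
async function resolveSelector(
  candidates: readonly string[],
  probe: SelectorProbe,
): Promise<string | null> {
  for (const selector of candidates) {
    try {
      if ((await probe(selector)) > 0) return selector;
    } catch {
      // An invalid or outdated selector should not abort the search.
    }
  }
  return null;
}
```

When every candidate fails, returning `null` lets the caller report a clear "captions container not found" status instead of silently recording nothing, which makes a Meet UI change easy to spot.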
The transcript deduplication mechanism is a key technical detail: live captions often repeat or partially overlap, so the skill tracks text changes with timestamps to produce a clean, non-redundant output. This improves transcript quality significantly over naive text dumps.
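The deduplication idea can be sketched as follows. This is a simplified model under stated assumptions, not OpenUtter’s implementation: it assumes each DOM read yields the current text of one caption, that a caption usually grows by appending words, and it uses plain prefix comparison to decide whether a reading extends the current utterance or starts a new one. `CaptionDeduper`, `CaptionSnapshot`, and `Segment` are hypothetical names.

```typescript
// One caption reading taken from the DOM (hypothetical shape).
interface CaptionSnapshot {
  text: string;
  at: number; // epoch milliseconds when the text was read
}

// A deduplicated transcript segment.
interface Segment {
  text: string;
  startedAt: number; // when this utterance first appeared
  updatedAt: number; // when it last grew
}

// Collapse whitespace so re-renders that differ only in spacing compare equal.
function normalizeCaption(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

class CaptionDeduper {
  private pending: Segment | null = null;
  private readonly done: Segment[] = [];

  push(snapshot: CaptionSnapshot): void {
    const text = normalizeCaption(snapshot.text);
    if (text === "") return;
    const current = this.pending;
    if (current === null) {
      this.pending = { text, startedAt: snapshot.at, updatedAt: snapshot.at };
      return;
    }
    // Same text again, or a stale shorter render of it: nothing new to record.
    if (current.text.startsWith(text)) return;
    // The caption grew in place: replace the pending text rather than append.
    if (text.startsWith(current.text)) {
      current.text = text;
      current.updatedAt = snapshot.at;
      return;
    }
    // Unrelated text: the previous utterance is finished.
    this.done.push(current);
    this.pending = { text, startedAt: snapshot.at, updatedAt: snapshot.at };
  }

  // Finalize the pending segment and return all segments so far.
  flush(): Segment[] {
    if (this.pending !== null) {
      this.done.push(this.pending);
      this.pending = null;
    }
    return this.done.slice();
  }
}
```

The prefix test is deliberately naive: if Meet revises a word in the middle of a caption, this sketch treats the revised text as a new segment. A more forgiving version could compare readings with a similarity threshold instead of an exact prefix.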
The retry logic for guest lobby handling is another practical detail. Guests in Google Meet wait in a lobby until someone in the meeting admits them; automating the waiting and retries means nobody has to watch over or restart the bot while that happens, which makes it more robust in real meetings.
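Below is a minimal sketch of lobby waiting with retries and capped exponential backoff. `waitForAdmission` and its options are hypothetical; the article does not say which retry policy OpenUtter uses. The admission check and the sleep function are injected so the logic can be tested without a browser.

```typescript
interface AdmissionOptions {
  maxAttempts: number; // total number of checks before giving up
  initialDelayMs: number; // wait after the first failed check
  backoffFactor: number; // delay multiplier after each failed check
  maxDelayMs: number; // upper bound on any single wait
  sleep: (ms: number) => Promise<void>; // injected so tests need not really wait
}

interface AdmissionResult {
  admitted: boolean;
  attempts: number;
}

// Poll isAdmitted until it reports true or attempts run out, backing off between checks.
async function waitForAdmission(
  isAdmitted: () => Promise<boolean>,
  opts: AdmissionOptions,
): Promise<AdmissionResult> {
  let delay = opts.initialDelayMs;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    if (await isAdmitted()) return { admitted: true, attempts: attempt };
    if (attempt < opts.maxAttempts) {
      await opts.sleep(delay);
      delay = Math.min(delay * opts.backoffFactor, opts.maxDelayMs);
    }
  }
  return { admitted: false, attempts: opts.maxAttempts };
}
```

With Playwright, `isAdmitted` might check for some in-call element via `page.locator(...).count()`, and `sleep` can be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`. Capping the delay keeps the bot responsive if the host admits it late in a long wait.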
The dual support for guest and authenticated joins offers flexibility: guest mode is easier to set up but requires being admitted from the lobby, while authenticated mode needs a Google login but can bypass the lobby. This is a sensible design that accommodates different setups.
OpenUtter also provides headed debug modes that open a visible browser window, so developers can watch the bot’s behavior during development or troubleshooting. This improves the developer experience.
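Playwright switches between headless and headed Chromium through the `headless` launch option, and `slowMo` slows each operation so a developer can follow along. The sketch below maps environment variables to those options; `OPENUTTER_HEADED` and `OPENUTTER_SLOWMO` are hypothetical names, not flags documented for OpenUtter.

```typescript
// The subset of Playwright's LaunchOptions this sketch sets.
interface DebugLaunchOptions {
  headless: boolean;
  slowMo?: number; // slows each Playwright operation by this many milliseconds
}

// Hypothetical env vars: OPENUTTER_HEADED=1|true shows the browser window,
// OPENUTTER_SLOWMO=<ms> slows actions down, applied only in headed mode.
function launchOptionsFromEnv(env: Record<string, string | undefined>): DebugLaunchOptions {
  const headedFlag = env["OPENUTTER_HEADED"];
  const headed = headedFlag === "1" || headedFlag === "true";
  const options: DebugLaunchOptions = { headless: !headed };
  const slowMo = Number(env["OPENUTTER_SLOWMO"]);
  if (headed && Number.isFinite(slowMo) && slowMo > 0) {
    options.slowMo = slowMo;
  }
  return options;
}
```

The result can be passed straight to `chromium.launch(...)` from `playwright-core`, for example `chromium.launch(launchOptionsFromEnv(process.env))`, so production runs stay headless unless the variable is set.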
The tradeoffs come down to maintenance overhead from UI changes and the complexity of browser automation dependencies. Running a full Chromium instance per meeting also consumes considerably more CPU and memory than an API-based approach would, which matters in large-scale deployments.
quick start with openutter
Installation is straightforward, leveraging the Node.js ecosystem:
npx openutter
This installs the OpenUtter skill into the default OpenClaw skills directory ~/.openclaw/skills/openutter. You can specify a custom directory with --target-dir if needed:
npx openutter --target-dir ./skills/openutter
During installation, OpenUtter attempts to install Chromium via Playwright and verifies it can launch correctly. On Linux, it also tries to auto-install necessary runtime dependencies when run as root. If automatic installation fails, it prints the exact commands to run manually.
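The install-time behavior can be summarized as a small decision function. This is a hypothetical sketch, not OpenUtter’s installer: it assumes the `playwright-core` CLI accepts `install-deps chromium` for the system libraries, as the `playwright` CLI does, and models "run as root" as a POSIX user id of 0.

```typescript
interface SetupStep {
  description: string;
  command: string;
  autoRun: boolean; // false: print the command for the user instead of running it
}

// Hypothetical planner mirroring the behavior described above; not OpenUtter's code.
function planChromiumSetup(platform: string, uid: number | undefined): SetupStep[] {
  const steps: SetupStep[] = [
    { description: "download Chromium", command: "npx playwright-core install chromium", autoRun: true },
  ];
  if (platform === "linux") {
    steps.push({
      description: "install Chromium's system libraries",
      command: "npx playwright-core install-deps chromium",
      // Installing OS packages needs root; otherwise fall back to printing the command.
      autoRun: uid === 0,
    });
  }
  return steps;
}
```

In Node, the inputs would come from `process.platform` and `process.getuid?.()` (undefined on Windows). Steps with `autoRun` set could be executed with `child_process.execSync` and the rest printed for the user; a launch-and-close check such as `await (await chromium.launch()).close()` would then confirm that Chromium actually starts.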
If you prefer to install Chromium manually, you can run:
npx playwright-core install chromium
Once installed, OpenUtter can join meetings as a guest by default. Authenticated joins require additional setup with Google credentials.
verdict
OpenUtter is a practical and well-scoped project that fills a specific gap: capturing Google Meet captions in real time through browser automation integrated with an AI agent framework. Its design balances flexibility (guest vs authenticated modes), robustness (retry and lobby handling), and developer experience (debug modes, automated dependency installation).
The main limitations are the inherent brittleness of browser-based automation in the face of UI changes and the resource overhead of running Chromium instances. As a result, it requires ongoing maintenance and may not scale easily to large numbers of concurrent meetings.
For developers working within the OpenClaw ecosystem or those looking to extend AI agent capabilities into video conferencing automation, OpenUtter offers a solid foundation. Its code quality and pragmatic design reflect hands-on experience with Playwright and browser automation challenges.
If you need to capture Google Meet captions programmatically and want integration with AI workflows, OpenUtter is worth exploring with an understanding of its operational tradeoffs.
→ GitHub Repo: sumansid/openutter ⭐ 188 · TypeScript