Noureddine RAMDI / OpenUtter: a headless Google Meet bot skill for the OpenClaw AI agent framework

Created Sat, 23 May 2026 20:41:14 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

sumansid/openutter

OpenUtter tackles a niche but practical problem: joining Google Meet calls automatically to capture live captions and produce transcripts in real time. It runs a headless Chromium browser controlled via Playwright and is packaged as a skill for the OpenClaw AI agent framework. The result is a ‘ghost participant’ that can join meetings either as a guest or as an authenticated Google user, read live captions by observing the DOM, deduplicate the text, and save timestamped transcripts to disk. It also supports on-demand screenshots and messaging integration for status updates.

what openutter does and its architecture

OpenUtter is a TypeScript project designed as an installable skill for OpenClaw, an AI agent framework. It uses Playwright to launch and control a Chromium browser instance in either headless or headed mode. The bot joins Google Meet meetings as a guest (handling lobby waiting with retry logic) or can authenticate with Google credentials for direct join. Once in the meeting, it enables live captions and observes the DOM elements where captions appear.
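The DOM-observation step can be illustrated with a minimal sketch. This is not OpenUtter's actual code: the `PageLike` interface, the `watchCaptions` and `buildObserverScript` helpers, the `onCaptionText` callback name, and any selector passed in are all hypothetical. The sketch assumes a page object with `exposeFunction` and `evaluate` methods of the kind Playwright's `Page` provides, and it uses a standard `MutationObserver` inside the page to forward caption text back to Node.

```typescript
// Structural subset of a Playwright-style page object. Declared locally so the
// sketch type-checks without Playwright installed (an assumption of this sketch).
interface PageLike {
  exposeFunction(name: string, callback: (text: string) => void): Promise<void>;
  evaluate(script: string): Promise<unknown>;
}

// Builds the script that runs inside the page. The selector is embedded with
// JSON.stringify so quotes inside it cannot break the script.
function buildObserverScript(selector: string): string {
  return `(() => {
    const region = document.querySelector(${JSON.stringify(selector)});
    if (!region) return false; // caption region is not on screen yet
    new MutationObserver(() => {
      window.onCaptionText(region.textContent ?? "");
    }).observe(region, { childList: true, subtree: true, characterData: true });
    return true;
  })()`;
}

// Registers the callback, installs the observer, and reports whether the
// caption region was found.
async function watchCaptions(
  page: PageLike,
  selector: string,
  onCaptionText: (text: string) => void
): Promise<boolean> {
  await page.exposeFunction("onCaptionText", onCaptionText);
  return (await page.evaluate(buildObserverScript(selector))) === true;
}
```

With Playwright, the `page` argument would typically come from `chromium.launch()` followed by `browser.newPage()`. The real caption selector depends on Google Meet's current markup, which is why it is left as a parameter here.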

The core functionality revolves around capturing these captions in real time, deduplicating repeated or overlapping text (common in live captions), and producing a clean, timestamped transcript file saved on disk. This is particularly useful for meeting documentation, accessibility, or AI analysis downstream.

OpenUtter also supports taking screenshots on demand, which can be useful for visual context or debugging. Integration with OpenClaw’s messaging channels means it can send status updates, making it fit naturally into broader AI workflows.

Under the hood, the project automates dependency setup: it installs Chromium via Playwright and, when run as root on Linux, attempts to install the required runtime dependencies. This automation reduces setup friction.

technical strengths and design tradeoffs

One standout aspect is the use of Playwright’s browser automation to interact with Google Meet’s web UI. This approach avoids official APIs (which might be limited or non-existent for certain features) and instead reads captions directly from the DOM. The cost is fragility: DOM selectors can break whenever Google changes the Meet UI, and the bot breaks with them. This tradeoff is acceptable for a skill meant to be updated and maintained alongside OpenClaw.

The transcript deduplication mechanism is a key technical detail: live captions often repeat or partially overlap, so the skill tracks text changes with timestamps to produce a clean, non-redundant output. This improves transcript quality significantly over naive text dumps.
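One way such deduplication could work is sketched below. It is an illustration under stated assumptions, not OpenUtter's implementation: the `CaptionDeduplicator` class and `TranscriptEntry` type are hypothetical names, and the sketch assumes that a caption snapshot from the same speaker which extends the previous text is a revision of the same utterance rather than a new one.

```typescript
// One transcript entry; the timestamp records when the utterance first appeared.
interface TranscriptEntry {
  timestamp: string; // ISO 8601
  speaker: string;
  text: string;
}

// Collapses a stream of caption snapshots into non-redundant entries.
class CaptionDeduplicator {
  private entries: TranscriptEntry[] = [];

  // Feed one caption snapshot. `now` is injectable so the logic stays testable.
  push(speaker: string, rawText: string, now: Date = new Date()): void {
    const text = rawText.replace(/\s+/g, " ").trim();
    if (text === "") return;

    const last = this.entries[this.entries.length - 1];
    if (last !== undefined && last.speaker === speaker) {
      // Exact repeat or an older, shorter snapshot: nothing new to record.
      if (last.text.startsWith(text)) return;
      // Growing caption: update in place and keep the original timestamp.
      if (text.startsWith(last.text)) {
        last.text = text;
        return;
      }
    }
    this.entries.push({ timestamp: now.toISOString(), speaker, text });
  }

  // Render entries as "[timestamp] speaker: text" lines.
  toLines(): string[] {
    return this.entries.map((e) => `[${e.timestamp}] ${e.speaker}: ${e.text}`);
  }
}
```

Feeding the snapshots "Hello", "Hello every", and "Hello everyone" from one speaker yields a single entry carrying the timestamp of the first snapshot. A prefix check is deliberately simple; captions that are corrected mid-sentence rather than extended would need a fuzzier comparison.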

The retry logic for guest lobby handling is another practical detail. Google Meet guests often wait in a lobby until a participant admits them. Automating the wait and the retries reduces manual intervention on the bot’s side and makes it more robust in real meeting environments.
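A lobby retry loop of this kind might look like the sketch below. The `JoinResult` states, the `joinWithRetry` function, and the `attemptJoin` callback are hypothetical names introduced for illustration; how OpenUtter actually detects admission or rejection is not described in the source.

```typescript
// Possible outcomes of one join attempt (assumed for this sketch).
type JoinResult = "joined" | "waiting" | "denied";

interface RetryOptions {
  maxAttempts: number;
  delayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `attemptJoin` until it reports "joined", the host denies entry,
// or the attempt budget is used up. Resolves to true only on "joined".
async function joinWithRetry(
  attemptJoin: () => Promise<JoinResult>,
  { maxAttempts, delayMs, sleep = defaultSleep }: RetryOptions
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await attemptJoin();
    if (result === "joined") return true;
    if (result === "denied") return false; // retrying will not help
    if (attempt < maxAttempts) await sleep(delayMs);
  }
  return false; // still waiting after the final attempt
}
```

Treating "denied" as terminal avoids repeatedly knocking on a meeting whose host has already declined the request, while "waiting" simply consumes one attempt.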

The dual support for guest and authenticated joins offers flexibility: guest mode is easier to set up but may require lobby admittance, while authenticated mode needs Google login but can bypass the lobby. This is a considerate design accommodating different user needs.

The project also provides headed debug modes, so you can watch the browser while developing or troubleshooting. This improves the developer experience.

The tradeoffs come down to maintenance overhead from UI changes and the complexity of browser automation dependencies. Running Chromium instances also consumes more resources than an API-based approach would, which may matter in large-scale deployments.

quick start with openutter

Installation is a single npx command:

npx openutter

This installs the OpenUtter skill into the default OpenClaw skills directory, ~/.openclaw/skills/openutter. You can specify a custom directory with --target-dir if needed:

npx openutter --target-dir ./skills/openutter

During installation, OpenUtter attempts to install Chromium via Playwright and verifies it can launch correctly. On Linux, it also tries to auto-install necessary runtime dependencies when run as root. If automatic installation fails, it prints the exact commands to run manually.

If you prefer to install Chromium manually, you can run:

npx playwright-core install chromium

Once installed, OpenUtter can join meetings as a guest by default. Authenticated joins require additional setup with Google credentials.

verdict

OpenUtter is a practical and well-scoped project that fills a specific gap: capturing Google Meet captions in real time through browser automation integrated with an AI agent framework. Its design balances flexibility (guest vs authenticated modes), robustness (retry and lobby handling), and developer experience (debug modes, automated dependency installation).

The main limitations are the inherent brittleness of browser-based automation against UI changes and the resource overhead of running Chromium instances. The project therefore requires ongoing maintenance and may not scale easily to very large numbers of concurrent meetings.

For developers working within the OpenClaw ecosystem or those looking to extend AI agent capabilities into video conferencing automation, OpenUtter offers a solid foundation. Its code quality and pragmatic design reflect hands-on experience with Playwright and browser automation challenges.

If you need to capture Google Meet captions programmatically and want integration with AI workflows, OpenUtter is worth exploring with an understanding of its operational tradeoffs.


→ GitHub Repo: sumansid/openutter ⭐ 188 · TypeScript