Noureddine RAMDI / mcp-selenium: structured browser automation for AI agents via MCP

Created Sat, 23 May 2026 20:41:14 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

angiejones/mcp-selenium

mcp-selenium is a pragmatic bridge between LLM-powered AI agents and real browser automation. Instead of having agents generate brittle Selenium scripts, the project exposes Selenium WebDriver as a set of well-defined, typed tools and resources via the Model Context Protocol (MCP). Agents invoke structured commands such as start_browser or take_screenshot with precise semantics rather than emitting raw script text, which improves reliability and the developer experience.

What mcp-selenium does and how it works

At its core, mcp-selenium is a Node.js MCP server that manages a single WebDriver browser session and exposes Selenium’s capabilities as MCP tools. The MCP interface includes actionable tools such as start_browser, navigate, interact, execute_script, and take_screenshot, each with clearly defined parameters and sensible defaults.
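To make the tool interface concrete, here is a minimal sketch of the JSON-RPC 2.0 messages an MCP client would send to call tools. The tools/call method and its name/arguments parameters come from the MCP specification; the tool names come from the list above. The argument names (browser, url) are assumptions for illustration and may not match the server's actual schema.

```javascript
// Sketch of MCP "tools/call" requests as plain JSON-RPC 2.0 objects.
// Assumption: the argument names "browser" and "url" are illustrative only.
let nextId = 1;

// Build one tools/call request for the named tool.
function toolCall(name, args = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const messages = [
  toolCall("start_browser", { browser: "chrome" }),
  toolCall("navigate", { url: "https://example.com" }),
  toolCall("take_screenshot"),
];

// Print each request the way it would travel over the wire.
for (const message of messages) {
  console.log(JSON.stringify(message));
}
```

In practice an MCP client library builds these messages for you; the point is only that each call names a tool and passes a structured arguments object instead of script text.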

The server supports multiple browsers out of the box (Chrome, Firefox, Edge, and Safari), which makes it usable across common environments. It also enables the WebDriver BiDi (bidirectional) protocol automatically, which is what makes diagnostics such as console logs, JavaScript errors, and network events available during a session.

Two MCP resources provide read-only views into the browser context: one reports the current browser session status, and the other provides a snapshot of the page’s accessibility tree. This information helps AI agents understand page structure and state without brittle scraping or guesswork.
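As a rough illustration of why an accessibility snapshot is useful to an agent, the sketch below walks a tree and lists the elements a user could interact with. The snapshot shape ({ role, name, children }) and the sample data are hypothetical; the real resource may use different field names.

```javascript
// Hypothetical snapshot shape: { role, name, children }.
// This is sample data, not output captured from mcp-selenium.
const snapshot = {
  role: "document",
  name: "Login",
  children: [
    { role: "heading", name: "Sign in", children: [] },
    {
      role: "form",
      name: "",
      children: [
        { role: "textbox", name: "Email", children: [] },
        { role: "textbox", name: "Password", children: [] },
        { role: "button", name: "Sign in", children: [] },
      ],
    },
  ],
};

const INTERACTIVE_ROLES = new Set(["button", "link", "textbox", "checkbox"]);

// Depth-first walk that collects nodes whose role is interactive.
function findInteractive(node, found = []) {
  if (INTERACTIVE_ROLES.has(node.role)) {
    found.push({ role: node.role, name: node.name });
  }
  for (const child of node.children ?? []) {
    findInteractive(child, found);
  }
  return found;
}

const interactive = findInteractive(snapshot);
console.log(interactive);
```

Working from roles and accessible names gives the agent targets that tend to survive markup changes better than CSS selectors do.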

Under the hood, the architecture is straightforward: a singleton WebDriver instance handles all commands. This design simplifies session management but also means concurrent interactions are serialized through the single session.
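The serialization this implies can be sketched with a promise chain that runs one command at a time. This is an illustration of the general pattern, not the project's implementation; SerialQueue and fakeCommand are hypothetical names, and the fake commands stand in for real WebDriver calls.

```javascript
// Minimal sketch: commands for one shared session run strictly in order.
class SerialQueue {
  constructor() {
    this.tail = Promise.resolve();
  }

  // Run task only after every previously queued task has settled.
  run(task) {
    const result = this.tail.then(task);
    // Swallow the rejection on the chain so later tasks still run;
    // the caller still sees it through the returned promise.
    this.tail = result.catch(() => {});
    return result;
  }
}

const queue = new SerialQueue();
const order = [];

// Stand-in for a WebDriver command that takes `ms` milliseconds.
function fakeCommand(name, ms) {
  return () =>
    new Promise((resolve) => {
      order.push(`start ${name}`);
      setTimeout(() => {
        order.push(`end ${name}`);
        resolve(name);
      }, ms);
    });
}

// The slower command is queued first and still finishes first.
const done = Promise.all([
  queue.run(fakeCommand("navigate", 20)),
  queue.run(fakeCommand("take_screenshot", 5)),
]).then((results) => {
  console.log(order.join(", "));
  return results;
});
```

The cost is visible in the same sketch: a slow command blocks everything queued behind it, which is the concurrency limit discussed later.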

The stack is pure JavaScript/Node.js, relying on Selenium WebDriver bindings. The MCP protocol acts as the communication layer between AI clients (Claude Code, Goose, Cursor, Windsurf, etc.) and the browser automation server.

Technical strengths and design tradeoffs

One of the key strengths of mcp-selenium is its structured, typed interface for browser automation. Traditional Selenium usage often involves generating and maintaining brittle scripts that can break with minor UI changes. Here, AI agents operate at a higher abstraction level by invoking specific tools with typed parameters, which reduces error surface and clarifies intent.
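A small sketch shows how typed parameters reduce the error surface: arguments are checked against a declared schema before anything touches the browser. The schema, its field names, and the validateArgs helper are hypothetical and hand-rolled for illustration; the server's actual validation mechanism may differ.

```javascript
// Hypothetical parameter schema for a "navigate"-style tool.
const navigateSchema = {
  url: { type: "string", required: true },
  timeout: { type: "number", required: false, default: 10000 },
};

// Check args against the schema, fill in defaults, and throw on any mismatch.
function validateArgs(schema, args) {
  const result = {};
  for (const [key, rule] of Object.entries(schema)) {
    const value = args[key] ?? rule.default;
    if (value === undefined) {
      if (rule.required) {
        throw new TypeError(`missing required parameter "${key}"`);
      }
      continue;
    }
    if (typeof value !== rule.type) {
      throw new TypeError(`parameter "${key}" must be a ${rule.type}`);
    }
    result[key] = value;
  }
  // Reject parameters the schema does not declare.
  for (const key of Object.keys(args)) {
    if (!(key in schema)) {
      throw new TypeError(`unknown parameter "${key}"`);
    }
  }
  return result;
}

const validated = validateArgs(navigateSchema, { url: "https://example.com" });
console.log(validated);
```

A malformed call fails immediately with a message the agent can act on, rather than surfacing later as an opaque browser error.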

Automatically enabling WebDriver BiDi is what sets mcp-selenium apart. It provides diagnostic data streams (console logs, JS errors, network events) that are valuable for debugging complex automated interactions and for giving agents richer context about page state.
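One way a server could make such event streams readable on demand is to hold them in a bounded buffer. The sketch below is an assumption-laden illustration: DiagnosticsBuffer, the event shape ({ type, text, at }), and the type names are hypothetical and are not taken from mcp-selenium or the BiDi specification.

```javascript
// Sketch of a bounded buffer for diagnostic events.
class DiagnosticsBuffer {
  constructor(limit = 100) {
    this.limit = limit;
    this.events = [];
  }

  // Store one event, dropping the oldest once the buffer is full.
  record(type, text) {
    this.events.push({ type, text, at: Date.now() });
    if (this.events.length > this.limit) {
      this.events.shift();
    }
  }

  // Return buffered events of one type, oldest first.
  ofType(type) {
    return this.events.filter((event) => event.type === type);
  }
}

// With a limit of 3, the fourth record evicts the first.
const diagnostics = new DiagnosticsBuffer(3);
diagnostics.record("console", "app booted");
diagnostics.record("error", "TypeError: x is undefined");
diagnostics.record("network", "GET /api/user 500");
diagnostics.record("console", "retrying request");

console.log(diagnostics.ofType("console").map((event) => event.text));
```

Capping the buffer keeps memory flat during long sessions, at the price of losing the oldest events.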

Supporting multiple browsers with a consistent MCP interface is also a practical plus. Agents don’t have to deal with browser-specific quirks directly; the MCP server abstracts those details away.

However, the singleton WebDriver session model is a tradeoff. While it simplifies design and resource usage, it limits concurrency and might not scale well for use cases requiring multiple parallel browser sessions. Also, the MCP resources for session status and accessibility are read-only snapshots, which constrain real-time interactivity and might require polling or additional logic on the client side.
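The client-side polling mentioned above might look like the following sketch. pollUntil and readStatus are hypothetical names; readStatus stands in for reading the session-status resource, and its return shape ({ active }) is assumed.

```javascript
// Re-read a snapshot until a condition holds or a timeout expires.
async function pollUntil(read, isReady, { intervalMs = 50, timeoutMs = 2000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await read();
    if (isReady(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error("timed out waiting for condition");
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Fake reader: reports an active session on the third read.
let reads = 0;
async function readStatus() {
  reads += 1;
  return { active: reads >= 3 };
}

const ready = pollUntil(readStatus, (status) => status.active, { intervalMs: 10 })
  .then((status) => {
    console.log(`ready after ${reads} reads`, status);
    return status;
  });
```

The interval and timeout are the knobs: a short interval notices changes sooner but issues more reads against the single session.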

The codebase is surprisingly clean and focused. The core logic is contained in a few files centered around the MCP server and WebDriver command translation. The lack of extraneous dependencies keeps the footprint minimal.

Quick start

mcp-selenium offers multiple ways to integrate with popular MCP clients. Installation is mostly a one-liner via npx. Here are example commands from the official README:

# Using Goose CLI
goose session --with-extension "npx -y @angiejones/mcp-selenium@latest"

# Using Claude Code
claude mcp add selenium -- npx -y @angiejones/mcp-selenium@latest

For clients like Cursor or Windsurf, configure the MCP server with JSON:

{
  "mcpServers": {
    "selenium": {
      "command": "npx",
      "args": ["-y", "@angiejones/mcp-selenium@latest"]
    }
  }
}

You can also clone and install manually:

git clone https://github.com/angiejones/mcp-selenium.git
cd mcp-selenium
npm install

Or install globally:

npm install -g @angiejones/mcp-selenium
mcp-selenium

These options cover most typical developer workflows, from quick experimentation to full local development.

Verdict

mcp-selenium is a practical, focused project that addresses a real pain point: how to get AI agents to interact reliably with web browsers without brittle script generation. Its structured MCP tool interface and multi-browser support make it a solid choice for developers building AI agents that need dependable browser automation.

That said, its singleton session model restricts concurrency and might be a bottleneck for high-scale or parallel scenarios. The read-only resources provide useful context but have limited interactivity, so more complex workflows may need additional client-side logic.

Overall, if you’re building AI agents that require robust browser control and want a cleaner, typed interface to Selenium through MCP, this repo is worth exploring. The code is approachable, the architecture is sensible, and the diagnostics enabled by WebDriver BiDi are a meaningful addition to the tooling.

For teams focused on multi-agent concurrency or requiring richer session management, expect to extend or adapt the core architecture accordingly. But as a foundation, mcp-selenium offers a well-executed bridge between AI-driven reasoning and browser automation capabilities.


→ GitHub Repo: angiejones/mcp-selenium ⭐ 416 · JavaScript