Noureddine RAMDI / Surf: Connecting OpenAI's Computer Use API to a Cloud Virtual Desktop with Real-Time Streaming

Created Sat, 23 May 2026 20:41:14 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

e2b-dev/surf

Surf is a practical example of bringing OpenAI’s Computer Use capabilities into a real, sandboxed Linux desktop hosted in the cloud. It routes natural-language instructions through an AI agent, turns them into actual input events on a virtual machine, and shows the whole interaction through a UI streamed in real time. This is more than a simulation or a command-line experiment: Surf lets you watch the AI operate a real desktop, with clicks, typing, and scrolling, all orchestrated by a small Next.js application.

What Surf does and how it is built

Surf is a web application built with Next.js using TypeScript that acts as a bridge between OpenAI’s Computer Use API and E2B’s cloud-based virtual desktop sandbox. The core idea is to enable an AI agent to control a Linux desktop environment remotely by interpreting natural language instructions and converting them into direct desktop actions.

The backend exposes a single API endpoint /api/chat which manages the entire interaction loop. This endpoint handles incoming chat messages, forwards them to OpenAI’s Computer Use API for tool-calling, and then sends the resulting desktop commands to the E2B sandbox through the @e2b/desktop SDK. The SDK abstracts the communication with the virtual machine, enabling actions such as mouse clicks, keyboard input, and scrolling.
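The translation step described above, from a model-issued action to a desktop command, can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not Surf's actual code: the `DesktopLike` interface and its method names are hypothetical stand-ins for the @e2b/desktop SDK, and the action shapes are a simplified subset modeled on the kinds of actions a computer-use model emits.

```typescript
// Hypothetical, minimal view of a desktop controller. The real @e2b/desktop
// SDK has its own method names and signatures; check its docs before use.
interface DesktopLike {
  click(x: number, y: number, button: "left" | "right"): Promise<void>;
  typeText(text: string): Promise<void>;
  pressKeys(keys: string[]): Promise<void>;
  scroll(x: number, y: number, deltaY: number): Promise<void>;
}

// Simplified subset of the action shapes a computer-use model can emit.
type ComputerAction =
  | { type: "click"; x: number; y: number; button: "left" | "right" }
  | { type: "type"; text: string }
  | { type: "keypress"; keys: string[] }
  | { type: "scroll"; x: number; y: number; scroll_y: number };

// Translate one model action into one desktop call and return a short
// human-readable description that a chat UI could display.
async function dispatchAction(
  desktop: DesktopLike,
  action: ComputerAction,
): Promise<string> {
  switch (action.type) {
    case "click":
      await desktop.click(action.x, action.y, action.button);
      return `click ${action.button} at (${action.x}, ${action.y})`;
    case "type":
      await desktop.typeText(action.text);
      return `type ${JSON.stringify(action.text)}`;
    case "keypress":
      await desktop.pressKeys(action.keys);
      return `press ${action.keys.join("+")}`;
    case "scroll":
      await desktop.scroll(action.x, action.y, action.scroll_y);
      return `scroll by ${action.scroll_y} at (${action.x}, ${action.y})`;
  }
}
```

Keeping the dispatcher behind an interface like this means the mapping can be exercised with a fake desktop in tests, without starting a sandbox.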

On the frontend, Surf presents a chat interface alongside a live view of the virtual desktop. Server-Sent Events (SSE) stream each AI action back to the browser as it happens, which makes the agent’s decision-making observable at every step of its control loop.
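To make the SSE side concrete, here is a minimal sketch of how a route handler could serialize agent events into the `text/event-stream` wire format using only web-standard APIs. The event names and payload shape are assumptions for illustration; Surf's own event schema may differ.

```typescript
// Serialize one event into the SSE wire format: an "event:" line, a "data:"
// line, then a blank line that terminates the message. JSON.stringify without
// an indent argument never emits raw newlines, so one data line is enough.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Wrap an async sequence of events in a streaming HTTP response.
// Hypothetical helper: a route handler could return this Response directly.
function sseResponse(
  events: AsyncIterable<{ event: string; data: unknown }>,
): Response {
  const encoder = new TextEncoder();
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      // Push each event to the client as soon as it is produced.
      for await (const e of events) {
        controller.enqueue(encoder.encode(formatSseEvent(e.event, e.data)));
      }
      controller.close();
    },
  });
  return new Response(body, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```

Because the response body is an ordinary stream, nothing beyond plain HTTP is needed on the server, which is the main appeal of SSE over a WebSocket setup.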

The stack is fairly straightforward:

  • Frontend: Next.js + React for UI and real-time rendering
  • Backend: Next.js API routes handling chat and AI orchestration
  • SDK: @e2b/desktop for sandbox interaction
  • Streaming: Server-Sent Events for pushing desktop state and agent actions

This setup creates a compact but functional scaffold for integrating frontier computer-use models with sandboxed execution environments.

Technical strengths and design tradeoffs

One of the main technical strengths of Surf lies in its real-time streaming architecture. Using Server-Sent Events to push AI actions and desktop state back to the client avoids the complexity of full WebSocket implementations while still providing near-instant updates. This choice simplifies the backend code and reduces infrastructure overhead.

The backend’s single /api/chat endpoint encapsulates the entire AI interaction loop, which keeps the codebase focused and easier to maintain. The orchestration between OpenAI’s tool-calling model and the sandbox SDK is cleanly separated, which aids in debugging and potential extension.
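The interaction loop that such an endpoint encapsulates can be sketched as follows. This is an illustrative outline, not the repository's implementation: `LoopDeps` and all of its members are hypothetical names, and the model call and sandbox calls are injected so the loop itself stays independent of any particular SDK.

```typescript
// One step from the model: either an action to run or a final answer.
type ModelStep =
  | { kind: "action"; action: string }
  | { kind: "done"; message: string };

// Hypothetical dependencies, injected so the loop can be tested with fakes.
interface LoopDeps {
  screenshot(): Promise<string>;                     // capture the desktop
  nextStep(screenshot: string): Promise<ModelStep>;  // ask the model what to do
  execute(action: string): Promise<void>;            // run the action in the sandbox
  emit(event: string, data: unknown): void;          // report progress to the UI
}

// Observe, decide, act, and repeat until the model says it is done or the
// step budget runs out.
async function runAgentLoop(deps: LoopDeps, maxSteps = 20): Promise<string> {
  for (let step = 0; step < maxSteps; step++) {
    const shot = await deps.screenshot();
    const next = await deps.nextStep(shot);
    if (next.kind === "done") {
      deps.emit("done", next.message);
      return next.message;
    }
    deps.emit("action", next.action);
    await deps.execute(next.action);
  }
  deps.emit("error", "step limit reached");
  throw new Error(`Agent did not finish within ${maxSteps} steps`);
}
```

The explicit step limit is a design choice for the sketch: it bounds both API cost and sandbox time if the model never reports completion.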

However, this minimalist approach has tradeoffs. With the whole interaction loop running behind one route, heavy concurrent usage could turn that route into a bottleneck. SSE is simpler than WebSockets but carries data only from server to client, so anything the user sends during a run needs a separate HTTP request. Reliance on the @e2b/desktop SDK also ties the system to the E2B sandbox environment, which might not suit every deployment scenario.
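The one-directional nature of SSE shapes the client as well. The browser's built-in `EventSource` only issues GET requests, so a client that posts a chat message and reads events from the same response has to parse the stream itself. The sketch below shows one way to do that; the URL, the request body, and the assumption that the endpoint streams its reply in SSE format are illustrative, and the parser handles only `event:` and `data:` fields with `\n` line endings.

```typescript
interface SseMessage {
  event: string;
  data: string;
}

// Incremental parser: feed it decoded text chunks, get back the messages
// completed so far. Partial messages stay buffered until their blank line.
function createSseParser(): (chunk: string) => SseMessage[] {
  let buffer = "";
  return function feed(chunk: string): SseMessage[] {
    buffer += chunk;
    const messages: SseMessage[] = [];
    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const block = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      let event = "message"; // default event name when none is given
      const dataLines: string[] = [];
      for (const line of block.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
      }
      if (dataLines.length > 0) messages.push({ event, data: dataLines.join("\n") });
      boundary = buffer.indexOf("\n\n");
    }
    return messages;
  };
}

// Send one message upstream with POST, then read the streamed reply.
async function streamChat(
  url: string,
  body: unknown,
  onMessage: (m: SseMessage) => void,
): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const feed = createSseParser();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const m of feed(decoder.decode(value, { stream: true }))) onMessage(m);
  }
}
```

A call such as `streamChat("/api/chat", { messages }, handleEvent)` would then deliver each agent action to `handleEvent` as it arrives, with any further user input sent as a new request.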

From a code quality perspective, the TypeScript codebase appears idiomatic and well-structured, with clear separation between frontend and backend concerns. The live streaming of desktop actions makes the AI agent’s process transparent — a valuable feature for debugging and user trust.

Quick start

Getting Surf up and running involves a few straightforward steps, assuming you have the required API keys for E2B and OpenAI.

# Clone the repository
git clone https://github.com/e2b-dev/surf
cd surf

# Install dependencies
npm install

# Set up environment variables
# Copy the example file, then fill in your E2B and OpenAI API keys
cp .env.example .env.local

# Start the development server
npm run dev

After starting the server, open your browser at http://localhost:3000 to interact with the AI-controlled desktop sandbox.

This quick setup makes it easy to experiment with the system locally, provided you have your API credentials ready.
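Missing credentials are a common cause of a failed first run, so a small startup check can help. The sketch below assumes the keys are named `E2B_API_KEY` and `OPENAI_API_KEY`, the names the E2B and OpenAI SDKs conventionally read; confirm the exact names against the repository's .env.example.

```typescript
// Assumed variable names; confirm against the repository's .env.example.
const REQUIRED_KEYS = ["E2B_API_KEY", "OPENAI_API_KEY"] as const;

// Return the names of required keys that are unset or blank.
function missingEnvKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js context this could be called as `missingEnvKeys(process.env)` at startup, failing with a clear message when the returned list is not empty.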

Verdict

Surf is a practical and instructive reference implementation for connecting OpenAI’s Computer Use API to a real virtual desktop sandbox. Its small codebase and clear architecture make it a good starting point for developers interested in AI-driven remote desktop automation.

It’s particularly relevant for those exploring AI agents that operate in real, sandboxed environments rather than simulated ones. Streaming agent actions over SSE is a neat solution for observability, though it could face scaling challenges in production.

Limitations include its dependency on E2B’s cloud sandbox and the single API endpoint design, which might require re-architecting for large-scale or diverse environments. Still, as a minimal viable scaffold, Surf succeeds in showing how to bridge frontier AI computer-use capabilities with sandboxed execution.

If you’re looking to experiment with AI agents controlling Linux desktops in the cloud, Surf offers a clear, hands-on codebase to learn from and build upon.


→ GitHub Repo: e2b-dev/surf ⭐ 790 · TypeScript