Surf: Connecting OpenAI's Computer Use API to a Cloud Virtual Desktop with Real-Time Streaming

Surf is a practical example of running OpenAI’s Computer Use capabilities against a real, sandboxed Linux desktop hosted in the cloud. It routes natural language instructions through an AI agent and turns them into actual input events on a virtual machine, with the whole interaction visible in a live, streamed UI. This is not a simulation or a command-line experiment: Surf lets you watch the AI operate a real desktop, complete with clicks, typing, and scrolling, all orchestrated by a small Next.js application.

What Surf does and how it is built

Surf is a web application built with Next.js using TypeScript that acts as a bridge between OpenAI’s Computer Use API and E2B’s cloud-based virtual desktop sandbox. The core idea is to enable an AI agent to control a Linux desktop environment remotely by interpreting natural language instructions and converting them into direct desktop actions.

The backend exposes a single API endpoint /api/chat which manages the entire interaction loop. This endpoint handles incoming chat messages, forwards them to OpenAI’s Computer Use API for tool-calling, and then sends the resulting desktop commands to the E2B sandbox through the @e2b/desktop SDK. The SDK abstracts the communication with the virtual machine, enabling actions such as mouse clicks, keyboard input, and scrolling.
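The translation step described above, from a model-proposed action to a desktop call, can be sketched roughly as follows. This is a minimal sketch, not Surf’s actual code: the `DesktopAction` shapes, the `Desktop` interface, and its method names are hypothetical stand-ins, and the real Computer Use payloads and @e2b/desktop method signatures may differ. A recording fake replaces the sandbox so the example runs on its own.

```typescript
// Hypothetical action shapes; the real Computer Use API payloads may differ.
type DesktopAction =
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "scroll"; x: number; y: number; deltaY: number };

// Hypothetical interface standing in for the sandbox SDK (names are assumptions).
interface Desktop {
  click(x: number, y: number): Promise<void>;
  typeText(text: string): Promise<void>;
  scroll(x: number, y: number, deltaY: number): Promise<void>;
}

// Translate one model-proposed action into exactly one desktop call and
// return a short label that could be shown in the chat UI.
async function dispatchAction(desktop: Desktop, action: DesktopAction): Promise<string> {
  switch (action.type) {
    case "click":
      await desktop.click(action.x, action.y);
      return `click(${action.x}, ${action.y})`;
    case "type":
      await desktop.typeText(action.text);
      return `type(${JSON.stringify(action.text)})`;
    case "scroll":
      await desktop.scroll(action.x, action.y, action.deltaY);
      return `scroll(${action.x}, ${action.y}, ${action.deltaY})`;
  }
}

// In-memory fake that records calls, so the sketch runs without a sandbox.
class RecordingDesktop implements Desktop {
  readonly calls: string[] = [];
  async click(x: number, y: number): Promise<void> {
    this.calls.push(`click:${x},${y}`);
  }
  async typeText(text: string): Promise<void> {
    this.calls.push(`type:${text}`);
  }
  async scroll(x: number, y: number, deltaY: number): Promise<void> {
    this.calls.push(`scroll:${x},${y},${deltaY}`);
  }
}
```

Using a discriminated union for actions means the compiler flags any action type the dispatcher forgets to handle, which is useful when the model’s action vocabulary grows.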

On the frontend, Surf presents a chat interface alongside a live view of the virtual desktop. Server-Sent Events (SSE) stream each AI action back to the browser as it happens, so the chat panel stays in step with what the desktop view shows. This architecture makes the agent’s decision-making process observable, providing transparency into each step of its control loop.
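For readers unfamiliar with SSE, each message on the wire is just a few lines of text ending in a blank line. The sketch below shows how one agent event might be serialized; the `AgentEvent` shape and the `formatSseMessage` helper are illustrative assumptions, not names taken from the Surf codebase.

```typescript
// Hypothetical event payload; the field names are illustrative only.
interface AgentEvent {
  kind: "action" | "reasoning" | "done";
  detail: string;
}

// Serialize one event in the SSE wire format: an "event:" line naming the
// event type, a "data:" line with the payload, and a blank line that ends
// the message.
function formatSseMessage(event: AgentEvent): string {
  // JSON.stringify escapes newlines, so the payload fits on one data line.
  return `event: ${event.kind}\ndata: ${JSON.stringify(event)}\n\n`;
}

const frame = formatSseMessage({ kind: "action", detail: "click(10, 20)" });
console.log(frame);
```

A server sends such frames over a response whose `Content-Type` is `text/event-stream`, writing one frame per agent step and keeping the connection open until the task ends.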

The stack is fairly straightforward:

Frontend: Next.js + React for UI and real-time rendering
Backend: Next.js API routes handling chat and AI orchestration
SDK: @e2b/desktop for sandbox interaction
Streaming: Server-Sent Events for pushing desktop state and agent actions

This setup creates a compact but functional scaffold for integrating frontier computer-use models with sandboxed execution environments.

Technical strengths and design tradeoffs

One of the main technical strengths of Surf lies in its real-time streaming architecture. Using Server-Sent Events to push AI actions and desktop state back to the client avoids the complexity of full WebSocket implementations while still providing near-instant updates. This choice simplifies the backend code and reduces infrastructure overhead.
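Part of what keeps SSE simpler than WebSockets is that the client side only has to split a text stream on blank lines. The browser’s built-in `EventSource` does this for GET requests; when the stream is the response to a POST, the client typically reads the response body and parses it by hand. Whether Surf does exactly this is an assumption. The `SseParser` class below is a hypothetical, simplified sketch that handles only `\n` line endings and the `event:` and `data:` fields.

```typescript
interface SseMessage {
  event: string;
  data: string;
}

// Incremental parser: feed it text chunks as they arrive from the network.
// It returns the messages completed so far and buffers any partial message.
// Simplification: assumes "\n" line endings and ignores "id:" and "retry:".
class SseParser {
  private buffer = "";

  feed(chunk: string): SseMessage[] {
    this.buffer += chunk;
    const messages: SseMessage[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const raw = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      let event = "message"; // default event name when no "event:" line is sent
      const dataLines: string[] = [];
      for (const line of raw.split("\n")) {
        if (line.startsWith("event:")) {
          event = line.slice(6).trim();
        } else if (line.startsWith("data:")) {
          // Drop the single optional space after the colon.
          dataLines.push(line.slice(5).replace(/^ /, ""));
        }
      }
      if (dataLines.length > 0) {
        messages.push({ event, data: dataLines.join("\n") });
      }
      boundary = this.buffer.indexOf("\n\n");
    }
    return messages;
  }
}
```

Because network chunks can split a message anywhere, the parser only emits a message once its terminating blank line has arrived; everything before that stays in the buffer.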

The backend’s single /api/chat endpoint encapsulates the entire AI interaction loop, which keeps the codebase focused and easier to maintain. The orchestration between OpenAI’s tool-calling model and the sandbox SDK is cleanly separated, which aids in debugging and potential extension.
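One way to picture that separation is a loop that knows nothing about OpenAI or E2B and only talks to two small interfaces. The sketch below is an assumption about the general shape of such a loop, not a copy of Surf’s implementation: `Model`, `Sandbox`, `runAgentLoop`, and the `maxSteps` budget are all hypothetical names, and scripted stand-ins replace the real services.

```typescript
type Action =
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string };

// Hypothetical seams: one for the model, one for the sandbox.
interface Model {
  // Returns the next action, or null when the model considers the task done.
  nextAction(instruction: string, history: Action[]): Promise<Action | null>;
}
interface Sandbox {
  execute(action: Action): Promise<void>;
}

type LoopEvent =
  | { kind: "action"; action: Action }
  | { kind: "done"; steps: number }
  | { kind: "step_limit"; steps: number };

// Ask the model for an action, execute it, report it, and repeat until the
// model stops or the step budget runs out.
async function runAgentLoop(
  model: Model,
  sandbox: Sandbox,
  instruction: string,
  onEvent: (event: LoopEvent) => void,
  maxSteps: number = 10,
): Promise<void> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = await model.nextAction(instruction, history);
    if (action === null) {
      onEvent({ kind: "done", steps: history.length });
      return;
    }
    await sandbox.execute(action);
    history.push(action);
    onEvent({ kind: "action", action });
  }
  onEvent({ kind: "step_limit", steps: history.length });
}

// Scripted stand-ins so the sketch runs without any external service.
class ScriptedModel implements Model {
  private readonly script: Action[];
  constructor(script: Action[]) {
    this.script = script;
  }
  async nextAction(_instruction: string, history: Action[]): Promise<Action | null> {
    return this.script[history.length] ?? null;
  }
}
class LoggingSandbox implements Sandbox {
  readonly executed: Action[] = [];
  async execute(action: Action): Promise<void> {
    this.executed.push(action);
  }
}
```

In a route handler, `onEvent` would be the place to write each event to the SSE response, and the step budget guards against an agent that never decides it is finished.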

However, this minimalistic approach has tradeoffs. The single API route could become a bottleneck under heavy concurrent usage. SSE, while simpler, flows only from server to client, so anything the browser sends mid-session needs a separate HTTP request, which can limit interactivity compared to WebSockets. Reliance on the @e2b/desktop SDK also ties the system to the E2B sandbox environment, which might not suit all deployment scenarios.

From a code quality perspective, the TypeScript codebase appears idiomatic and well-structured, with clear separation between frontend and backend concerns. The live streaming of desktop actions makes the AI agent’s process transparent — a valuable feature for debugging and user trust.

Quick start

Getting Surf up and running involves a few straightforward steps, assuming you have the required API keys for E2B and OpenAI.

# Clone the repository
git clone https://github.com/e2b-dev/surf
cd surf

# Install dependencies
npm install

# Set up environment variables
# Copy the example file, then fill in your E2B and OpenAI API keys
cp .env.example .env.local

# Start the development server
npm run dev

After starting the server, open your browser at http://localhost:3000 to interact with the AI-controlled desktop sandbox.

This quick setup makes it easy to experiment with the system locally, provided you have your API credentials ready.
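Missing credentials are the most likely reason a first run fails, so a small startup check can save some guessing. The sketch below is hypothetical and not part of Surf; the variable names `E2B_API_KEY` and `OPENAI_API_KEY` are assumptions and should be confirmed against the repository’s .env.example.

```typescript
// Assumed variable names; confirm against the repository's .env.example.
const REQUIRED_KEYS = ["E2B_API_KEY", "OPENAI_API_KEY"] as const;

// Return the names of required variables that are unset or blank.
function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => (env[key] ?? "").trim() === "");
}

// Example with a literal object; a real check would pass process.env.
const missing = missingKeys({ E2B_API_KEY: "example-key", OPENAI_API_KEY: "" });
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the check easy to test.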

Verdict

Surf is a practical and instructive reference implementation for connecting OpenAI’s Computer Use API to a real virtual desktop sandbox. Its small codebase and clear architecture make it a good starting point for developers interested in AI-driven remote desktop automation.

It’s particularly relevant for those exploring AI agents that operate in real, sandboxed environments rather than simulated ones. The real-time streaming of agent actions via SSE is a neat solution for observability, though it could face scaling challenges in production.

Limitations include its dependency on E2B’s cloud sandbox and the single API endpoint design, which might require re-architecting for large-scale or diverse environments. Yet, as a minimal viable scaffolding, Surf succeeds in showing how to bridge frontier AI computer-use capabilities with sandboxed execution.

If you’re looking to experiment with AI agents controlling Linux desktops in the cloud, Surf offers a clear, hands-on codebase to learn from and build upon.

→ GitHub Repo: e2b-dev/surf ⭐ 790 · TypeScript

Noureddine RAMDI / Surf: Connecting OpenAI's Computer Use API to a Cloud Virtual Desktop with Real-Time Streaming
