Noureddine RAMDI / Inside html-video: a flexible HTML-to-video rendering meta-layer with AI coding agents

Created Sat, 06 Jun 2026 19:20:10 +0000 Modified Sat, 06 Jun 2026 19:20:19 +0000

nexu-io/html-video

html-video tackles the challenge of turning HTML content into fully rendered MP4 videos on your local machine, leveraging AI coding agents to automate video description and assembly. Instead of locking you into a single rendering engine or cloud service, it acts as a meta-layer that orchestrates multiple rendering engines and coding agents, providing a flexible, extensible platform for video creation.

What html-video does and how it works

At its core, html-video is an open-source framework that takes a video description, a pasted article, or a GitHub link and hands that input to a locally running coding agent, which generates a video rendered entirely on your machine. It abstracts over multiple HTML-to-video rendering engines and, separately, auto-detects 13 different coding agents, including Claude Code, Cursor, Copilot CLI, and Gemini CLI.

The architecture centers around a content-graph storyboard intermediate representation (IR) that models video frames as nodes and edges in a directed graph. This IR is topologically sorted to drive multi-frame video assembly, enabling complex video narratives composed of multiple scenes or frames.
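To make the topological-sort step concrete, here is a minimal sketch of how a frame graph could be ordered with Kahn's algorithm. The `Frame` and `Storyboard` shapes and the `orderFrames` name are assumptions made for illustration; the article does not show the actual IR types used in html-video.

```typescript
// Hypothetical storyboard shape; field names are illustrative and are not
// taken from the html-video source.
interface Frame {
  id: string;
  html: string;
}

interface Storyboard {
  frames: Frame[];
  // An edge [a, b] means frame a must be rendered before frame b.
  edges: Array<[string, string]>;
}

// Kahn's algorithm: repeatedly emit frames that have no remaining predecessors.
function orderFrames(board: Storyboard): Frame[] {
  const byId = new Map<string, Frame>();
  const indegree = new Map<string, number>();
  const successors = new Map<string, string[]>();
  for (const frame of board.frames) {
    byId.set(frame.id, frame);
    indegree.set(frame.id, 0);
    successors.set(frame.id, []);
  }
  for (const [from, to] of board.edges) {
    if (!byId.has(from) || !byId.has(to)) {
      throw new Error(`edge references unknown frame: ${from} -> ${to}`);
    }
    successors.get(from)!.push(to);
    indegree.set(to, indegree.get(to)! + 1);
  }
  // Seed the queue in declaration order so ties resolve deterministically.
  const queue = board.frames
    .filter((frame) => indegree.get(frame.id) === 0)
    .map((frame) => frame.id);
  const ordered: Frame[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    ordered.push(byId.get(id)!);
    for (const next of successors.get(id)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // Any frame left unemitted sits on a cycle, so no valid order exists.
  if (ordered.length !== board.frames.length) {
    throw new Error("storyboard contains a cycle");
  }
  return ordered;
}
```

Given frames declared as outro, intro, body with edges intro to body and body to outro, this returns them as intro, body, outro. Seeding the queue in declaration order is one simple way to keep the output deterministic when several frames are ready at once.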

Rendering itself is handled by a pluggable adapter architecture. The default engine, Hyperframes, drives headless Chromium programmatically and encodes its output with ffmpeg (libx264), producing MP4 files with no per-render fees. The pairing balances compatibility and quality: Chromium provides full HTML and CSS rendering fidelity, and ffmpeg provides robust video encoding.

The adapters are designed to allow other engines like Remotion, Motion Canvas, or Manim to plug in later without disrupting the agent loop or storyboard logic. This decouples video content generation from rendering specifics, providing a clean separation of concerns.

The repo comes with 21 curated HTML/CSS templates licensed for reuse, which serve as starting points or examples for video composition. It also includes a local browser-based studio application accessible at http://127.0.0.1:3071, where you can interactively pick templates, chat with your coding agent, edit text on individual frames, add soundtracks, and export final MP4 files.

The pluggable rendering engine adapter and content-graph storyboard IR

The standout technical design is the single render(input, ctx) contract that all rendering engines implement. This contract shields the agent loop from the details of how video frames are composed and encoded.
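A rough sketch of what such a contract could look like in TypeScript follows. Only the `render(input, ctx)` shape comes from the project description; `RenderInput`, `RenderContext`, `RenderResult`, `EngineAdapter`, and the dry-run adapter are hypothetical names invented here to illustrate the idea.

```typescript
// All type and field names below are hypothetical; only the
// render(input, ctx) call shape is taken from the article.
interface RenderInput {
  frames: { id: string; html: string; durationMs: number }[];
}

interface RenderContext {
  outputPath: string;
  fps: number;
}

interface RenderResult {
  engine: string;
  outputPath: string;
  frameCount: number;
}

interface EngineAdapter {
  name: string;
  render(input: RenderInput, ctx: RenderContext): Promise<RenderResult>;
}

// Number of video frames needed to cover the storyboard at a given frame rate.
function countFrames(input: RenderInput, fps: number): number {
  const totalMs = input.frames.reduce((sum, f) => sum + f.durationMs, 0);
  return Math.round((totalMs / 1000) * fps);
}

// A stand-in adapter that performs no real rendering; it only reports what
// a real engine would have been asked to produce.
const dryRunAdapter: EngineAdapter = {
  name: "dry-run",
  async render(input, ctx) {
    return {
      engine: "dry-run",
      outputPath: ctx.outputPath,
      frameCount: countFrames(input, ctx.fps),
    };
  },
};

// The caller depends only on the interface, so adapters are interchangeable.
async function renderWith(
  adapter: EngineAdapter,
  input: RenderInput,
  ctx: RenderContext,
): Promise<RenderResult> {
  return adapter.render(input, ctx);
}
```

Because `renderWith` never inspects which adapter it received, swapping the dry-run adapter for a Chromium-backed or Remotion-backed one would not require touching the calling code, which is the decoupling the contract is meant to provide.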

Under the hood, the content-graph storyboard IR is a graph structure where nodes represent frames or scenes and edges represent transitions or dependencies. This model allows the system to handle complex, multi-frame videos with deterministic ordering and composability.

Each rendering adapter consumes this IR and maps it to the specific rendering engine’s API or capabilities. For example, the Hyperframes engine renders each storyboard frame by loading the corresponding HTML/CSS in headless Chromium, capturing frames, and then feeding them to ffmpeg for video encoding.
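The encoding half of that pipeline can be pictured as building an ffmpeg command line over a directory of captured frames. The sketch below is an assumption about how such a step might be wired, not the way Hyperframes actually invokes ffmpeg; the `buildFfmpegArgs` name and the `frame_%05d.png` naming scheme are made up for the example, while the flags themselves are standard ffmpeg options.

```typescript
// Build an ffmpeg argument list that encodes numbered PNG frames into an
// H.264 MP4. The frame naming pattern is an assumption for this sketch.
function buildFfmpegArgs(frameDir: string, fps: number, outputPath: string): string[] {
  // Written as !(fps > 0) so that NaN is rejected as well.
  if (!(fps > 0)) {
    throw new Error("fps must be a positive number");
  }
  return [
    "-y",                                // overwrite the output file if present
    "-framerate", String(fps),           // input frame rate of the image sequence
    "-i", `${frameDir}/frame_%05d.png`,  // numbered frames: frame_00001.png, ...
    "-c:v", "libx264",                   // H.264 encoder named in the article
    "-pix_fmt", "yuv420p",               // pixel format most players can decode
    outputPath,
  ];
}
```

In a Node process the resulting array could be passed to `child_process.spawn("ffmpeg", args)`. Keeping the argument construction in a pure function makes it easy to inspect or test without launching ffmpeg.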

This pluggable pattern enables adding new rendering engines without rewriting the core logic that manages agent interactions or storyboard assembly. It also means the ecosystem can evolve with new engines as they become available.

On the coding agent side, the system auto-detects 13 different agents installed locally and routes video description prompts accordingly. This multi-agent detection and routing is crucial for local-first workflows, avoiding cloud dependencies and API key management.
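One plausible way to detect locally installed agents is to look for their executables on the PATH. The sketch below illustrates that idea only; the article does not say how html-video performs detection, and the binary names in `CANDIDATE_AGENTS` are assumptions that may not match what the project probes for.

```typescript
// Hypothetical agent table: the labels come from the article, but the binary
// names are assumptions made for this sketch.
const CANDIDATE_AGENTS: { label: string; binary: string }[] = [
  { label: "Claude Code", binary: "claude" },
  { label: "Gemini CLI", binary: "gemini" },
];

// Return the labels of agents whose binary is found in any PATH directory.
// The existence check is injected so the function stays pure and testable.
function detectAgents(
  pathEnv: string,
  exists: (candidate: string) => boolean,
  separator: string = ":",
): string[] {
  const dirs = pathEnv.split(separator).filter((dir) => dir.length > 0);
  return CANDIDATE_AGENTS
    .filter((agent) => dirs.some((dir) => exists(`${dir}/${agent.binary}`)))
    .map((agent) => agent.label);
}
```

In a real Node program, `pathEnv` would typically be `process.env.PATH`, `exists` would be `fs.existsSync`, and `separator` would be `path.delimiter`. The doctor command shown in the quick start reports what the project itself detects.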

The repo also supports an optional AI-generated soundtrack feature via MiniMax, adding audio to videos generated from the storyboard.

Quick start

pnpm install
pnpm -r build
node packages/cli/dist/bin.js studio    # opens the studio at http://127.0.0.1:3071

Once the studio is running in your browser, you can pick one of the 21 templates, describe a video in natural language, or paste a link. The coding agent then generates the storyboard frames. From there you can edit the text of each frame, add a soundtrack, and export to MP4.

The CLI also includes utilities like:

node packages/cli/dist/bin.js doctor                 # detect installed agents + engines
node packages/cli/dist/bin.js search-templates --intent "github stars race" --top 3

These commands help you inspect your environment and find templates suited to your video concept.

Verdict

html-video offers a thoughtful architecture combining AI coding agents, a flexible content-graph IR, and a pluggable rendering engine adapter system that runs fully locally. For developers who want to experiment with AI-driven video generation without relying on cloud services, it presents a compelling option.

The tradeoff is that the system needs a fair amount of local resources, since headless Chromium and ffmpeg can both be heavy, and rendering performance depends on your machine. And although it detects many agents, the quality of the generated video depends on the capabilities of whichever coding agent is installed.

Its modularity and clean separation between agent logic, storyboard representation, and rendering engine make it worth exploring if you want to build or customize video workflows powered by AI. However, if you need large-scale distributed rendering or cloud-based scalability, this local-first approach may feel limiting.

In sum, html-video is a solid foundation for developers interested in AI-assisted video production workflows using familiar web technologies, with a clear, extensible architecture and practical tooling to get started quickly.


→ GitHub Repo: nexu-io/html-video ⭐ 1,286 · HTML