AI video generation pipelines often hit a wall when it comes to maintaining visual consistency across scenes. Characters, props, and locations tend to shift appearance in subtle yet jarring ways, breaking immersion and complicating the production process. FlowKit tackles this head-on — it’s a Python system that automates end-to-end AI video generation using Google’s Flow API, with a clever architectural twist that keeps visuals consistent throughout a project.
automating AI video generation with a Chrome extension and FastAPI backend
FlowKit is a standalone Python system designed to automate the entire pipeline of AI-generated videos, from scripting to YouTube-ready output. It leverages the Google Flow API to generate video content but handles the quirks of authentication and API access via a Chrome extension (Manifest V3). This extension acts as a browser bridge — it manages authentication, solves reCAPTCHAs, and proxies requests to labs.google.com.
The backend is implemented using FastAPI and SQLite, which orchestrate the pipeline and maintain state. Communication with the Chrome extension happens over WebSockets, enabling real-time command and response coordination during video generation.
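The source does not document the message format used on the WebSocket, so the following is only a sketch of how command and response coordination of this kind is commonly done: each command carries an ID, and the reply with the same ID resolves a waiting future. The `ExtensionBridge` class, the `id`/`action`/`payload`/`result` fields, and the `generate_image` action name are all hypothetical, and an `asyncio.Queue` stands in for the real socket so the example runs with the standard library alone.

```python
import asyncio
import itertools
import json


class ExtensionBridge:
    """Matches commands sent to the extension with its replies (hypothetical protocol)."""

    def __init__(self, send):
        self._send = send              # coroutine function that writes one text frame
        self._ids = itertools.count(1)
        self._pending = {}             # request id -> Future awaiting the reply

    async def request(self, action, payload, timeout=5.0):
        """Send one command and wait for the reply carrying the same id."""
        req_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[req_id] = future
        frame = {"id": req_id, "action": action, "payload": payload}
        await self._send(json.dumps(frame))
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(req_id, None)

    def on_message(self, raw):
        """Call this for every frame received from the extension."""
        message = json.loads(raw)
        future = self._pending.get(message.get("id"))
        if future is not None and not future.done():
            future.set_result(message.get("result"))


async def demo():
    outbox = asyncio.Queue()           # stand-in for the WebSocket connection
    bridge = ExtensionBridge(outbox.put)

    async def fake_extension():
        # Pretend to be the browser side: read one command, echo its action back.
        frame = json.loads(await outbox.get())
        reply = {"id": frame["id"], "result": {"echo": frame["action"]}}
        bridge.on_message(json.dumps(reply))

    task = asyncio.create_task(fake_extension())
    reply = await bridge.request("generate_image", {"prompt": "a red kite"})
    await task
    return reply
```

Running `asyncio.run(demo())` drives one round trip. In a FastAPI app the `send` argument would plausibly be the WebSocket's text-send method and `on_message` would be fed from its receive loop, but that wiring is an assumption rather than something taken from the FlowKit code.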
The typical pipeline FlowKit executes includes:
- Story breakdown into scenes
- Generation of entity reference images (characters, locations, props) that remain consistent across scenes
- Scene composition combining these entities with action prompts
- Generation of 8-second video clips per scene
- Text-to-speech narration synchronized with clips
- Video concatenation into a final product
- Thumbnail generation
- YouTube upload with SEO metadata
The pipeline separates appearance descriptions (used exclusively for reference image generation) from action prompts (used for scene composition). By generating reference images once and reusing them, FlowKit avoids the common AI video pitfall where characters or props look different from one scene to the next.
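One way to picture the split between appearance and action is as two record types that only meet when a scene is composed. This is an illustrative sketch, not FlowKit's actual data model: the `Entity` and `Scene` classes, their field names, and `build_scene_request` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    name: str
    kind: str                                # "character", "location", or "prop"
    appearance: str                          # used only to generate the reference image
    reference_image: Optional[str] = None    # path or ID, filled in once generated


@dataclass
class Scene:
    action: str                              # what happens; no appearance details
    entity_names: list[str] = field(default_factory=list)


def build_scene_request(scene, entities):
    """Combine the action prompt with already-generated reference images."""
    refs = []
    for name in scene.entity_names:
        entity = entities[name]
        if entity.reference_image is None:
            raise ValueError(f"no reference image for {name!r}")
        refs.append(entity.reference_image)
    # The appearance text is deliberately left out: the image carries the look.
    return {"prompt": scene.action, "reference_images": refs}
```

The point of the design is that the scene prompt never restates what a character looks like, so there is no second description for the model to reinterpret.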
solving visual consistency with a reference image system
The standout technical aspect of FlowKit is its reference image system. Most AI video pipelines generate each scene independently, often resulting in inconsistent visuals. FlowKit’s approach is to first generate stable visual references for entities — characters, locations, and props — that serve as anchors throughout the video.
Because each reference is generated once, the pipeline can reuse the same character or prop images across multiple scenes while varying only the action or background. This addresses a fundamental challenge in AI video workflows: maintaining a coherent visual identity over dozens of scenes without manual intervention.
The tradeoff here is complexity. The system must carefully orchestrate calls to the Google Flow API and the Chrome extension proxy to ensure references are generated once and reused correctly. This adds orchestration overhead and requires a robust backend to manage state, but the payoff is a more professional, consistent output.
Code quality reflects practical engineering — the FastAPI backend is modular and focuses on the orchestration layer, while the Chrome extension handles browser-specific tasks. The SQLite database keeps track of references and scene metadata, making the system self-contained.
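The "generate once, reuse everywhere" rule maps naturally onto a get-or-create lookup backed by SQLite. The sketch below shows the idea with the standard `sqlite3` module; the `entity_reference` table, its columns, and the `generate` callback are assumptions made for illustration and are not taken from FlowKit's schema.

```python
import sqlite3

# Hypothetical schema: one stored reference image per (project, entity name).
SCHEMA = """
CREATE TABLE IF NOT EXISTS entity_reference (
    project TEXT NOT NULL,
    name    TEXT NOT NULL,
    image   TEXT NOT NULL,
    PRIMARY KEY (project, name)
);
"""


def get_or_create_reference(conn, project, name, generate):
    """Return the stored reference image, generating it only on first use."""
    row = conn.execute(
        "SELECT image FROM entity_reference WHERE project = ? AND name = ?",
        (project, name),
    ).fetchone()
    if row is not None:
        return row[0]
    image = generate(name)            # the expensive call happens once per entity
    with conn:                        # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO entity_reference (project, name, image) VALUES (?, ?, ?)",
            (project, name, image),
        )
    return image
```

Opening the database with `sqlite3.connect(":memory:")` and running `conn.executescript(SCHEMA)` is enough to try it. The primary key on `(project, name)` also guards against a duplicate row if two callers race to create the same reference.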
Performance-wise, video generation times range from 2 to 5 minutes per video, with 8-second clips as the base unit. The system has been demonstrated on projects of up to 50 scenes, including a 25-scene F-15E project, and offers a 4K upscale option.
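The scene counts translate directly into footage length. As a rough worked example, assuming every scene yields exactly one 8-second clip and nothing is trimmed during concatenation (the helper name is hypothetical):

```python
CLIP_SECONDS = 8  # base clip length quoted above


def footage_length(scene_count, clip_seconds=CLIP_SECONDS):
    """Return total seconds and an m:ss string for a given number of scenes."""
    total = scene_count * clip_seconds
    minutes, seconds = divmod(total, 60)
    return total, f"{minutes}:{seconds:02d}"


print(footage_length(25))  # (200, '3:20')
print(footage_length(50))  # (400, '6:40')
```

Under those assumptions the 25-scene project comes to about 3 minutes 20 seconds of footage, and a 50-scene project to about 6 minutes 40 seconds.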
quick start
The project provides a one-command setup script that checks and installs all prerequisites, including Python 3.10+, pip, ffmpeg, ffprobe, and Chrome. It creates a virtual environment and installs dependencies.
./setup.sh
Windows: Use WSL (wsl --install) or Git Bash. All bash scripts and commands assume a Unix shell.
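The contents of setup.sh are not shown here, but the kind of prerequisite check it is described as doing can be approximated in a few lines of Python. This is a sketch, not a port of the script: the function name is hypothetical, and pip and Chrome are left out because the Chrome executable name differs between platforms.

```python
import shutil
import sys

REQUIRED_TOOLS = ("ffmpeg", "ffprobe")
MIN_PYTHON = (3, 10)


def missing_prerequisites(which=shutil.which, version=None):
    """List the prerequisites that are not satisfied on this machine."""
    version = sys.version_info[:2] if version is None else version
    problems = []
    if version < MIN_PYTHON:
        problems.append("python>=3.10")
    # shutil.which returns None when a command is not found on PATH.
    problems.extend(tool for tool in REQUIRED_TOOLS if which(tool) is None)
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the current interpreter and PATH; the two parameters exist only so the check can be exercised without touching the real environment.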
For manual setup, the README instructs:
pip install -r requirements.txt
Running the system requires additional setup for the TTS engine OmniVoice:
pip install torch==2.8.0 torchaudio==2.8.0 # or +cu128 for NVIDIA
pip install omnivoice
python3 -c "from omnivoice import OmniVoice; print('OK')"
If OmniVoice is installed in a separate virtual environment, you must point to it with:
export TTS_PYTHON_BIN=/path/to/omnivoice-venv/bin/python3
Full installation details can be found in skills/fk-gen-tts-template.md.
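How FlowKit consumes TTS_PYTHON_BIN internally is not described here. A plausible reading, sketched below under that assumption, is that the variable names the interpreter used to launch OmniVoice and that the current interpreter is used when it is unset. Both helper functions are hypothetical.

```python
import os
import sys


def tts_python(env=None):
    """Pick the interpreter for TTS: the override if set, else the current one."""
    env = os.environ if env is None else env
    return env.get("TTS_PYTHON_BIN") or sys.executable


def import_check_command(env=None):
    """Build the same import check shown above, aimed at the chosen interpreter."""
    return [tts_python(env), "-c", "from omnivoice import OmniVoice; print('OK')"]
```

Passing the result of `import_check_command()` to `subprocess.run` would confirm that OmniVoice is importable from whichever environment the variable points to.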
verdict
FlowKit targets developers and researchers interested in automated AI video generation with consistent visual storytelling. Its architecture is opinionated and practical, balancing automation with the tricky challenge of maintaining visual coherence.
The Chrome extension bridge is a clever workaround for the Google Flow API’s authentication and reCAPTCHA hurdles, but it also adds a layer of integration complexity that might be a barrier for some.
The reference image system is the real technical highlight, solving a problem that trips up most AI video workflows. However, this comes with orchestration overhead and a dependency on Google’s platform stability.
Overall, FlowKit is worth exploring if you need an end-to-end pipeline that produces visually consistent AI videos at scale. It’s not a plug-and-play product but a solid foundation with practical engineering choices. If your use case demands visual consistency across many scenes and you can handle some setup complexity, this repo provides a compelling starting point.
→ GitHub Repo: crisng95/flowkit ⭐ 324 · Python