SupoClip: self-hostable AI-powered video clipping with multi-LLM backend abstraction

SupoClip offers a practical approach to turning long-form videos into highlight clips using AI, all within a self-hostable environment. Unlike many cloud-dependent services, it lets users run the pipeline locally or on their own infrastructure, avoiding usage limits and watermarks. The key technical feature is its multi-LLM abstraction layer, which supports Google Gemini, OpenAI, Anthropic, or a local Ollama instance for clip selection and captioning. This lets users choose between cloud APIs and local models; the pipeline becomes fully local only once transcription is also handled by a local model.

What SupoClip does and its architecture

At its core, SupoClip is an AI-driven video clipper designed as an open-source alternative to OpusClip. Its main function is to extract highlight clips from longer videos by first transcribing the audio and then applying large language models (LLMs) to select and caption the most relevant segments.
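The transcribe-then-select flow can be illustrated with a minimal sketch. It assumes the transcript arrives as timestamped segments and that the LLM answers with segment index ranges plus a caption; the names `Segment`, `Clip`, and `build_clips`, and the shape of `picks`, are hypothetical and not taken from the SupoClip codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One timestamped transcript utterance (times in seconds)."""
    start: float
    end: float
    text: str


@dataclass(frozen=True)
class Clip:
    """A highlight clip cut from the source video (times in seconds)."""
    start: float
    end: float
    caption: str


def build_clips(segments, picks, max_len=60.0):
    """Turn LLM picks into clip boundaries.

    `picks` is a list of (first_index, last_index, caption) tuples, an
    assumed shape for the model's answer. Picks that point outside the
    transcript or exceed `max_len` seconds are skipped.
    """
    clips = []
    for first, last, caption in picks:
        if not (0 <= first <= last < len(segments)):
            continue  # ignore indices that do not exist in the transcript
        start, end = segments[first].start, segments[last].end
        if end - start <= max_len:
            clips.append(Clip(start, end, caption))
    return clips


transcript = [
    Segment(0.0, 4.2, "Welcome back to the show."),
    Segment(4.2, 15.8, "Here is the one trick that doubled our signups."),
    Segment(15.8, 31.0, "We removed the pricing page entirely."),
    Segment(31.0, 95.0, "Now a long tangent about office furniture."),
]
picks = [
    (1, 2, "The trick that doubled signups"),  # valid, 26.8 s long
    (3, 3, "Office furniture"),                # too long, skipped
    (2, 9, "Out of range"),                    # bad index, skipped
]
clips = build_clips(transcript, picks)
```

Snapping clip boundaries to transcript timestamps, rather than trusting raw times from the model, keeps cuts aligned with where speech actually starts and stops.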

The stack is a modern full-stack setup orchestrated with Docker Compose. The backend is built on FastAPI, a high-performance Python web framework suited for async operations and API development. FastAPI handles transcription jobs, interacts with the LLM backends, and manages clip generation workflows.

For data storage, SupoClip uses PostgreSQL as the main relational database, supported by Redis for caching and task queue management, which is typical for asynchronous background processing in Python web apps. The frontend is implemented with Next.js, a React-based framework offering server-side rendering and smooth client-side navigation, providing a responsive and user-friendly interface.
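The background-processing pattern mentioned above can be sketched without any external services. This example uses an in-process `asyncio.Queue` as a stand-in for a Redis-backed queue; the `worker` and `run_jobs` names and the job ids are hypothetical, and it makes no claim about how SupoClip's own workers are implemented.

```python
import asyncio


async def worker(queue, results):
    """Pull job ids off the queue until a None stop signal arrives."""
    while True:
        job_id = await queue.get()
        if job_id is None:
            queue.task_done()
            break
        await asyncio.sleep(0)  # placeholder for transcription and clipping work
        results[job_id] = "done"
        queue.task_done()


async def run_jobs(job_ids, n_workers=2):
    """Enqueue jobs, process them with a small worker pool, return statuses."""
    queue = asyncio.Queue()
    results = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    for job_id in job_ids:
        queue.put_nowait(job_id)
    for _ in workers:
        queue.put_nowait(None)  # one stop signal per worker
    await asyncio.gather(*workers)
    return results


statuses = asyncio.run(run_jobs(["video-1", "video-2", "video-3"]))
```

In a deployed stack the queue would live in Redis so that the web process and the workers can run in separate containers; the in-process queue here only shows the producer and worker roles.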

The transcription service is powered by AssemblyAI, which provides detailed audio transcription with timestamps. For AI analysis and clip selection, SupoClip abstracts over multiple LLM providers: Google Gemini, OpenAI, Anthropic, and Ollama. Ollama is a local model runtime rather than a hosted API, so the analysis step can run without external API calls; the pipeline as a whole is fully local only once the transcription step is local too.

This architecture balances cloud service convenience with local-control flexibility. The Docker Compose orchestration simplifies deployment by bundling all components — backend, frontend, database, and caching layers — into a single manageable stack.

Multi-LLM abstraction and design tradeoffs

What distinguishes SupoClip is its multi-LLM abstraction layer. This is a deliberate design choice that allows the system to interface with different large language models through a unified API. It means you can switch between cloud providers or a local Ollama instance without changing the core logic.

This abstraction is significant because it offers a migration path away from costly or rate-limited cloud APIs towards a local model that preserves data privacy and reduces operational costs. However, the tradeoff is complexity in maintaining adapter code for each backend and potential differences in output quality or latency.
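A common way to build this kind of layer is an adapter interface plus a small registry. The sketch below is an assumption about the general pattern, not SupoClip's actual code: `LLMProvider`, `FakeProvider`, `PROVIDERS`, `get_provider`, and `caption_clip` are hypothetical names, and the fake adapter only echoes the prompt where a real one would call the vendor SDK or a local Ollama server.

```python
from typing import Callable, Dict, Protocol


class LLMProvider(Protocol):
    """The single interface the core logic depends on."""

    def complete(self, prompt: str) -> str: ...


class FakeProvider:
    """Stand-in adapter; a real adapter would call the provider's API here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Registry mapping a configured provider name to an adapter factory.
PROVIDERS: Dict[str, Callable[[], LLMProvider]] = {
    "google": lambda: FakeProvider("google"),
    "openai": lambda: FakeProvider("openai"),
    "anthropic": lambda: FakeProvider("anthropic"),
    "ollama": lambda: FakeProvider("ollama"),
}


def get_provider(name: str) -> LLMProvider:
    """Look up an adapter by name, failing loudly on unknown providers."""
    try:
        factory = PROVIDERS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name!r}") from None
    return factory()


def caption_clip(provider: LLMProvider, transcript_text: str) -> str:
    """Core logic: written once, unaware of which backend is in use."""
    return provider.complete(f"Write a short caption for: {transcript_text}")


caption = caption_clip(get_provider("ollama"), "We removed the pricing page.")
```

Switching backends then means changing the name passed to `get_provider`, which is where the maintenance cost shows up: each entry in the registry is an adapter that has to track its provider's API.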

The codebase is surprisingly clean in separating concerns between transcription, AI analysis, and video processing. The use of FastAPI’s dependency injection and async features keeps the backend responsive even under concurrent workloads. The frontend Next.js app consumes the backend APIs efficiently, providing a smooth user experience.

Automated testing is well integrated with pytest for backend unit tests, Vitest for frontend components, and Playwright for end-to-end scenarios. This test coverage is important for a project that handles asynchronous jobs and multi-service orchestration.

One limitation is the reliance on AssemblyAI for transcription in the default setup, which is a third-party cloud API. While Ollama covers LLM inference locally, transcription remains a cloud dependency unless users swap in a local transcription model, which is documented but requires extra setup. This is a common tradeoff in AI media pipelines where local speech-to-text models still lag behind cloud services in accuracy and speed.

The project also includes Stripe integration hooks for monetization, indicating plans or support for paid hosted versions. However, the self-hosted version is fully functional without watermarks or usage caps.

Quick start

Prerequisites

Docker and Docker Compose
An AssemblyAI API key (for transcription)
An LLM provider for AI analysis - OpenAI, Google, Anthropic, or Ollama

1. Clone and Configure

git clone https://github.com/FujiwaraChoki/supoclip.git
cd supoclip

Create a .env file in the root directory:


# Option D: Ollama (local/self-hosted)

# OLLAMA_BASE_URL=  # Optional; defaults to localhost locally, host.docker.internal in Docker
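The default described in the comment above can be sketched as a small resolver. This is a guess at the behavior, not the project's code: `resolve_ollama_base_url` is a hypothetical helper, checking for `/.dockerenv` is a common heuristic for detecting a container rather than something the source confirms, and 11434 is Ollama's default port.

```python
import os

OLLAMA_PORT = 11434  # Ollama's default listening port


def resolve_ollama_base_url(env=None, in_docker=None):
    """Return the Ollama base URL, preferring an explicit OLLAMA_BASE_URL.

    `env` defaults to os.environ; `in_docker` can be passed explicitly,
    otherwise it is guessed from the presence of /.dockerenv (a heuristic).
    """
    env = os.environ if env is None else env
    explicit = env.get("OLLAMA_BASE_URL", "").strip()
    if explicit:
        return explicit.rstrip("/")
    if in_docker is None:
        in_docker = os.path.exists("/.dockerenv")
    host = "host.docker.internal" if in_docker else "localhost"
    return f"http://{host}:{OLLAMA_PORT}"


local_url = resolve_ollama_base_url({}, in_docker=False)
docker_url = resolve_ollama_base_url({}, in_docker=True)
```

Inside a container, `localhost` refers to the container itself, which is why a Dockerized backend needs `host.docker.internal` to reach an Ollama server running on the host machine.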

Local Development (Without Docker)

See CLAUDE.md for detailed development instructions.

Verdict

SupoClip is a solid option for developers and teams looking to run AI-powered video clipping pipelines locally or in private infrastructure. Its multi-LLM abstraction provides flexibility to choose cloud APIs or local inference with Ollama, which is a noteworthy feature for privacy-conscious setups.

The project is well-structured with a modern stack and good test coverage, making it approachable for those comfortable with Docker, FastAPI, and React. However, the dependency on AssemblyAI for transcription by default is a limitation for truly offline use, requiring extra work to swap in local transcription models.

If you want to avoid usage limits and watermarks of commercial services and have control over your data and costs, SupoClip is worth exploring. It is not a plug-and-play consumer app but a developer-friendly platform that balances cloud convenience with local autonomy.


→ GitHub Repo: FujiwaraChoki/supoclip ⭐ 640 · Python

Noureddine RAMDI / SupoClip: self-hostable AI-powered video clipping with multi-LLM backend abstraction