Extracting viral-ready vertical clips from long-form videos is a tough problem. It’s not just about cutting at fixed intervals or matching keywords — the clip has to hook the viewer immediately, maintain coherence, carry emotional weight, and deliver value. claude-shorts tackles this challenge with a 10-step pipeline that combines GPU-accelerated transcription, AI-driven segment scoring, and smart video rendering with adaptive framing.
What claude-shorts does and its architecture
claude-shorts is a Claude Code skill designed to automatically generate vertical video clips optimized for platforms like TikTok, Instagram Reels, and YouTube Shorts from longer videos. At its core, it orchestrates a 10-step pipeline that extracts segments that are not only semantically interesting but also visually and audibly clean for viral potential.
The pipeline begins with faster-whisper, a GPU-accelerated Whisper implementation, producing word-level timestamped transcriptions. This transcription is critical for precise audio-aware boundary snapping — ensuring that clip cuts align with natural sentence endings and silent moments rather than awkward mid-word.
Next, Claude (the LLM) scores candidate segments in five dimensions: hook strength, coherence, emotion, value density, and payoff. Unlike heuristic or keyword-based approaches, this scoring reflects narrative arc understanding, making the clip selection more nuanced and context-aware.
For video rendering, claude-shorts uses Remotion v4 combined with React 19. Remotion is a React-based video rendering framework, and here it’s used for single-pass rendering of vertical 1080x1920 videos with animated, TikTok-style captions. The captions are synchronized with the transcription timestamps, enhancing viewer engagement.
Content type detection is powered by MediaPipe, which enables adaptive reframing strategies:
- Face tracking for talking-head videos
- Cursor tracking for screen recordings
- Dominant speaker tracking for podcasts
This content-aware reframing helps maintain the focus of the video in vertical format without awkward cropping.
Finally, platform-specific FFmpeg encoding pipelines handle export to TikTok, Instagram Reels, and YouTube Shorts, respecting each platform’s technical specifications.
The repo is split between Python code for transcription, scoring, and video processing orchestration, and a Remotion/React frontend for rendering. The setup assumes a Unix-like environment with FFmpeg installed and recommends a NVIDIA GPU for CUDA-accelerated transcription and NVENC encoding.
What sets claude-shorts apart: AI scoring and audio-aware boundary snapping
Most video clipping tools rely on simple heuristics, like keyword spotting or fixed time windows. claude-shorts stands out because it uses a large language model (Claude) to semantically score video segments along multiple dimensions that matter for virality and viewer retention.
This is combined with an audio-aware boundary snapping mechanism that ensures cuts don’t break words or sentences abruptly. The boundary snapping aligns cuts to word boundaries and silent points, which improves the natural flow of the clip.
The tradeoff here is complexity and dependency on GPU resources. GPU-accelerated faster-whisper transcription is a heavy dependency but necessary for word-level timestamps and speed. Similarly, relying on Claude for scoring introduces external API or local LLM complexity.
The code quality is pragmatic. The Python side uses well-known libraries like numpy, mediapipe, and opencv-python for core processing tasks. The Node.js side is clean, using modern React 19 with Remotion v4 and Zod for runtime validation. The setup scripts automate virtual environment creation and dependency installation, improving DX.
One limitation to note is platform dependency: the tool requires Unix tools like bash and FFmpeg, and on Windows, WSL 2 is mandatory. This restricts out-of-the-box Windows support.
Quick start
The README provides a clear set of prerequisites and installation steps:
## Prerequisites
- **FFmpeg** (system package)
- **Python 3.10+**
- **Node.js 18+**
- **Claude Code** (CLI)
- **NVIDIA GPU** recommended (for CUDA transcription + NVENC encoding)
# Install Python + Node.js dependencies
bash setup.sh
# Install as a Claude Code skill
bash install.sh
On Windows, WSL 2 is required due to Unix tooling dependencies.
The setup.sh script handles:
- Creating a Python virtual environment (reusing if already present)
- Installing Python dependencies including faster-whisper, mediapipe, numpy, opencv-python, and PyTorch
- Running
npm installinside theremotion/directory - System dependency checks for FFmpeg and jq
This means the project is largely plug-and-play after prerequisites are met, with a smooth installation process.
verdict
claude-shorts is a thoughtfully engineered tool for creators and developers aiming to automate vertical short clip production from long-form videos with a semantic and audiovisual quality focus.
Its strengths lie in the unique use of LLM-based segment scoring combined with audio-aware boundary snapping and adaptive content reframing. This makes the clips more natural, engaging, and platform-optimized.
The tradeoffs include dependency on NVIDIA GPUs for optimal performance and Unix-like environments, which may limit accessibility for some users. The codebase is split between Python and React-based Node.js — a common stack but requiring familiarity with both.
If you work with video content and want to reduce manual clipping effort while improving clip quality using AI, claude-shorts is worth a look. It’s especially relevant if you have access to GPU resources and can handle the setup environment.
Overall, this repo solves a real problem with a practical, well-architected approach that balances modern AI techniques with video processing realities.
Related Articles
- how awesome-claude-skills turns claude into a real-world action agent — Awesome Claude Skills is a modular Python framework that empowers Claude to perform real-world actions by integrating wi
- Hugging Face Transformers: a unified API for state-of-the-art AI models across modalities — Hugging Face Transformers offers a unified Python API to access over 1 million pretrained AI models for text, vision, an
→ GitHub Repo: AgriciDaniel/claude-shorts ⭐ 78 · Python