Edit Mind tackles a familiar pain point for anyone managing large video libraries: searching videos by what they contain rather than by filename or basic metadata. It processes videos locally with multiple AI models, transcribing audio and detecting objects and faces in frames, then embeds these insights into a vector database for natural language search. The real technical challenge is orchestrating these pipelines efficiently and reliably, which Edit Mind addresses with a background job service that coordinates the AI tasks and stores results in ChromaDB.
What Edit Mind does and how it’s built
Edit Mind is a local-first video knowledge base designed to index and semantically search your video libraries using multi-modal AI analysis. It combines object detection (YOLO), face recognition (DeepFace), and speech transcription (Whisper) to extract rich, scene-level metadata from videos.
Under the hood, it stores semantic embeddings in ChromaDB, a vector database optimized for similarity search, enabling natural language queries over video content. For relational data, such as video metadata and indexing status, it uses PostgreSQL with Prisma ORM.
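To make the idea of similarity search over embeddings concrete, here is a minimal sketch in plain Python. It is not ChromaDB's API and not Edit Mind's code: the scene identifiers and the three-dimensional vectors are invented for illustration, and real embeddings come from a model and have hundreds of dimensions. The ranking principle, comparing a query vector against stored vectors by cosine similarity, is the same one vector databases commonly apply.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def search(index, query_vector, top_k=2):
    """Return the top_k (scene_id, score) pairs, most similar first."""
    scored = [
        (cosine_similarity(vector, query_vector), scene_id)
        for scene_id, vector in index.items()
    ]
    scored.sort(reverse=True)
    return [(scene_id, score) for score, scene_id in scored[:top_k]]


# Hypothetical scene embeddings (assumption: toy 3-d vectors, made up here).
SCENE_INDEX = {
    "beach.mp4@00:12": [0.9, 0.1, 0.0],
    "office.mp4@03:40": [0.1, 0.9, 0.2],
    "dog_park.mp4@01:05": [0.7, 0.0, 0.6],
}

# Hypothetical embedding of a natural language query.
results = search(SCENE_INDEX, [1.0, 0.0, 0.1], top_k=2)
for scene_id, score in results:
    print(f"{scene_id}: {score:.3f}")
```

In a real deployment the vector database handles storage, indexing, and approximate nearest-neighbor lookup, so a linear scan like this one is only useful for building intuition.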
Architecturally, Edit Mind is organized as a Docker Compose monorepo leveraging pnpm workspaces. It cleanly separates concerns into three main components:
- A React Router v7 web frontend for browsing and querying indexed videos.
- A Node.js/Express background job service managing AI pipelines and queueing tasks with BullMQ.
- A Python ML service running the AI models (YOLO, DeepFace, Whisper) for multi-modal analysis.
The system supports GPU acceleration via CUDA Docker Compose profiles and integrates with Ollama or Google Gemini for advanced NLP tasks. This stack shows a pragmatic use of containerization and microservices to manage complex AI workflows locally.
How the background job service orchestrates multi-modal AI pipelines
The standout technical feature of Edit Mind is its Node.js background job service. This service acts as the conductor for the multi-modal AI pipelines, orchestrating the processing stages required to turn raw video into searchable semantic data.
When a new video is added, it enqueues tasks to:
- Extract audio and run Whisper for transcription.
- Sample video frames and run YOLO to detect objects in each frame.
- Perform face recognition on detected faces using DeepFace.
Each of these tasks runs asynchronously, coordinated through BullMQ queues that manage concurrency and retries. Once processing completes, the service generates embeddings from the combined metadata (transcriptions, object labels, and face identifiers) and stores them in ChromaDB.
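The flow described above can be sketched with Python's asyncio as a stand-in for a job queue. This is an illustration under stated assumptions, not Edit Mind's implementation: BullMQ is a Node.js library backed by Redis, the three stage functions here are stubs that return canned data, `with_retries` and `index_video` are hypothetical names, and running all three stages concurrently is an assumption (the real pipeline may sequence them differently).

```python
import asyncio


async def with_retries(task_factory, attempts=3, delay=0.0):
    """Await task_factory(), retrying on any exception up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return await task_factory()
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                # Linear backoff between attempts (zero by default here).
                await asyncio.sleep(delay * attempt)
    raise RuntimeError(f"task failed after {attempts} attempts") from last_error


# Stub stages standing in for Whisper, YOLO, and DeepFace calls.
async def transcribe(video):
    return "a dog runs on the beach"


async def detect_objects(video):
    return ["dog", "ball"]


face_calls = {"count": 0}


async def recognize_faces(video):
    # Fail on the first call to show the retry path.
    face_calls["count"] += 1
    if face_calls["count"] == 1:
        raise ConnectionError("ML service temporarily unavailable")
    return ["person_1"]


async def index_video(video):
    """Run the three stages concurrently, then merge their output."""
    transcript, objects, faces = await asyncio.gather(
        with_retries(lambda: transcribe(video)),
        with_retries(lambda: detect_objects(video)),
        with_retries(lambda: recognize_faces(video)),
    )
    # This combined text is what would be embedded and stored.
    text = (
        f"{transcript} | objects: {', '.join(objects)} "
        f"| faces: {', '.join(faces)}"
    )
    return {"video": video, "text": text}


document = asyncio.run(index_video("holiday.mp4"))
print(document["text"])
```

A real queue adds what this sketch leaves out: jobs persisted outside the process, concurrency limits per worker, and visibility into failed jobs.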
This design decouples compute-heavy AI inference from the frontend, enabling responsive UI interactions and scalable processing. The Python ML service encapsulates the AI models, while the Node.js service handles queue management, error handling, and data persistence.
The tradeoff here is complexity: managing distributed asynchronous jobs and ensuring data consistency across services requires robust error handling and monitoring. However, this separation improves maintainability and allows leveraging specialized languages for each role (Python for ML, Node.js for backend orchestration).
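One way to keep a video's indexing status consistent across asynchronous stages is to derive it from the per-stage outcomes rather than updating it by hand. The sketch below shows that idea only; the stage names, the `StageState` enum, and the status strings are assumptions made for illustration and are not taken from Edit Mind's schema.

```python
from enum import Enum


class StageState(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


def video_status(stages):
    """Derive one status for a video from its per-stage states.

    A video counts as indexed only when every stage is done, so a
    partially processed video is never reported as searchable.
    """
    states = set(stages.values())
    if StageState.FAILED in states:
        return "failed"
    if states == {StageState.DONE}:
        return "indexed"
    return "processing"


# Hypothetical stage names for a single video.
in_progress = {
    "transcription": StageState.DONE,
    "object_detection": StageState.PENDING,
    "face_recognition": StageState.PENDING,
}
print(video_status(in_progress))
```

Because the status is a pure function of the stage records, a retry that flips one stage from failed to done updates the overall status without any extra bookkeeping.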
Quick start
Edit Mind uses Docker Compose to run everything in containers.
Desktop app option
If you prefer not to deal with Docker or terminal setup, there is a commercial desktop app with a one-click installer for macOS and Windows. It supports additional features such as DaVinci Resolve and Final Cut Pro integration, and it can use Apple GPUs, which Docker containers cannot.
Self-hosted setup
To get Edit Mind running locally with Docker Compose, follow these steps:
mkdir edit-mind
cd edit-mind
Configure Docker to share your media folder:
- On macOS/Windows: open Docker Desktop → Settings → Resources → File Sharing, add the path where your videos are stored, and apply.
- On Linux, file sharing is typically enabled by default.
Configure environment variables with two files:
- .env for your personal config (required)
- .env.system for system defaults (required)
Copy the example .env file and customize it to your setup.
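As a rough illustration of how a defaults file and a personal file can be layered, the sketch below parses two env-style texts and lets the personal values win. Both the precedence (personal config overriding system defaults) and the variable names are assumptions made for this example. They are not Edit Mind's documented keys or behavior, so check the project's example files for the real ones.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def merge_env(system_text, user_text):
    """System defaults first, then personal values override them (assumed order)."""
    merged = parse_env(system_text)
    merged.update(parse_env(user_text))
    return merged


# Hypothetical contents; these keys are invented for illustration.
SYSTEM_DEFAULTS = "PORT=3000\nUSE_GPU=false\n"
PERSONAL_CONFIG = "# my machine\nUSE_GPU=true\nMEDIA_PATH=\"/videos\"\n"

merged = merge_env(SYSTEM_DEFAULTS, PERSONAL_CONFIG)
print(merged)
```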
This setup encapsulates all components in containers, including the React frontend, Node.js job service, and Python ML service, with GPU acceleration optionally enabled if your system supports it.
Who should consider Edit Mind
Edit Mind is tailored for developers and technical users who want a local-first, privacy-preserving video knowledge base that runs AI pipelines directly on their hardware. It’s especially relevant if you manage large video collections and need semantic search capabilities beyond filename or tag matching.
The tradeoff is the complexity of managing multi-container Docker Compose setups and the current pre-v1.0 status, meaning active development and potential instability. GPU acceleration requires compatible hardware and additional Docker configuration.
The background job orchestration model is a solid example for anyone building multi-modal AI pipelines requiring asynchronous task management and semantic vector storage.
If you’re looking for a turnkey cloud SaaS solution, this is not it. But if you want hands-on control over your video indexing and semantic search, Edit Mind is worth exploring.
→ GitHub Repo: IliasHad/edit-mind ⭐ 1,336 · TypeScript