red-run is not your typical pentest toolkit. It orchestrates multiple AI agents built on Claude Code to automate offensive security assessments end to end, from reconnaissance to post-exploitation. What sets it apart is a multi-agent architecture in which persistent “domain teammates” run in parallel tmux panes, each accumulating context and handing findings off to other specialized agents. The operator watches it all unfold live, approving semantic routing decisions and redirecting agents on the fly.
an AI-driven offensive security assessment framework
At its core, red-run is a Python-based toolkit designed to automate the full kill chain in offensive security using AI agents. It uses Claude Code as the underlying agent framework, orchestrating teams specialized for recon, initial access, lateral movement, privilege escalation, and post-access tasks.
The architecture revolves around multiple key components:
Persistent domain teammates: Each teammate is a Claude Code agent specializing in a domain (e.g., web, AD, shell). They run in split tmux panes, maintaining state and context across tasks rather than being ephemeral.
MCP servers: These are Model Context Protocol (MCP) servers that expose pentesting tools and capabilities to the agents, such as nmap scanning, shell access, browser automation, skill routing, and state management.
Semantic skill routing: Using ChromaDB for retrieval-augmented generation (RAG), the orchestrator semantically routes tasks to the appropriate skill or teammate. Routing decisions are presented in a dashboard for operator approval, giving control and transparency.
Persistent engagement state: The entire engagement’s context and state persist in an SQLite database, including compaction of context to keep things manageable over long assessments.
Browser-based dashboard: A real-time UI shows access chains, credentials, and event timelines through server-sent events (SSE), allowing the operator to monitor and interact with the running agents.
Multiple orchestrator variants: Besides the default shell-server orchestrator, red-run supports a CTF mode and a legacy subagent mode, with DLP-safe and training modes planned.
Optional C2 integration: It integrates with Sliver for command-and-control (C2) support, but also allows custom MCP server-based C2 backends.
Under the hood, the system is designed for Linux VMs equipped with pentest tools, Docker, and Claude Code. The installer sets up everything from MCP servers to teammate templates and indexes the skills directory into ChromaDB.
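To make the operator-approved routing idea concrete, here is a minimal sketch in plain Python. It is not red-run's code: the skill names and descriptions are hypothetical, and a bag-of-words cosine similarity stands in for the ChromaDB embedding search the project uses. The point is only the shape of the flow: score skills against a task, propose the best match, and dispatch only if the operator approves.

```python
import math
import re
from collections import Counter

# Hypothetical skill catalog; names and descriptions are illustrative only.
SKILLS = {
    "web-sqli": "test web application forms and parameters for sql injection",
    "ad-kerberoast": "request kerberos service tickets in active directory and crack them",
    "linux-privesc": "enumerate linux host for privilege escalation via sudo and suid binaries",
}


def vectorize(text):
    """Bag-of-words term counts: a crude stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propose_route(task, skills=SKILLS):
    """Return (skill_name, score) for the best-matching skill."""
    query = vectorize(task)
    scored = [(name, cosine(query, vectorize(desc))) for name, desc in skills.items()]
    return max(scored, key=lambda pair: pair[1])


def route(task, approve):
    """Dispatch only if the operator callback approves the proposal."""
    name, score = propose_route(task)
    return name if approve(task, name, score) else None
```

Calling `route("crack kerberos tickets in active directory", approve)` proposes the hypothetical `ad-kerberoast` skill and returns it only when `approve` returns a truthy value. In a real deployment the approval callback would be the dashboard prompt rather than a Python function.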
multi-agent orchestration with persistent teammates and semantic routing
What distinguishes red-run technically is how it implements a real-world multi-agent system tailored for offensive security workflows.
Persistent teammates in tmux: Unlike ephemeral LLM calls or stateless scripts, teammates hold ongoing context in their pane. This means a web teammate that discovers domain credentials can hand them off to an Active Directory teammate, enabling chained operations.
Operator-in-the-loop routing: The orchestrator doesn’t blindly dispatch tasks. Instead, it uses semantic search over indexed skills with ChromaDB to propose routing decisions, then waits for operator approval. This balances automation with human oversight.
State persistence and compaction: Engagement state is kept in SQLite with mechanisms to compact context intelligently, a necessary tradeoff to handle long-running assessments without losing crucial info.
Shell-server sharing: The shell-server MCP runs as a persistent SSE service accessible by all teammates, enabling shared sessions and visibility — a practical design for collaborative pentesting.
Extensibility: The MCP server model and skill-router are designed to be modular, making it easier to add new pentesting tools or skills.
The tradeoff here is complexity and footprint. You need a Linux VM with multiple dependencies, Docker, and a Claude Code environment, plus operator expertise to manage the multi-agent system effectively.
The codebase is not reviewed in depth here, but its structure is modular enough to allow semantic indexing of skills at runtime. The split tmux pane model is a pragmatic choice for live operator interaction.
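The state persistence and compaction point is easier to picture with a small example. The sketch below uses Python's built-in `sqlite3` module with an invented `events` table; the schema, function names, and the string-joining "summary" are all assumptions for illustration, not red-run's actual schema or compaction logic (which presumably summarizes with the model rather than concatenating text).

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) a hypothetical engagement-state database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            teammate TEXT NOT NULL,
            kind TEXT NOT NULL,
            body TEXT NOT NULL
        )
        """
    )
    return conn


def record(conn, teammate, body, kind="finding"):
    """Append one event for a teammate."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events (teammate, kind, body) VALUES (?, ?, ?)",
            (teammate, kind, body),
        )


def compact(conn, teammate, keep_last=2):
    """Fold all but the newest keep_last findings into one summary row."""
    rows = conn.execute(
        "SELECT id, body FROM events "
        "WHERE teammate = ? AND kind = 'finding' ORDER BY id",
        (teammate,),
    ).fetchall()
    old = rows[:-keep_last] if keep_last else rows
    if not old:
        return 0
    # Placeholder summary: a real system would condense this text.
    summary = "; ".join(body for _, body in old)
    with conn:
        conn.executemany(
            "DELETE FROM events WHERE id = ?", [(row_id,) for row_id, _ in old]
        )
        conn.execute(
            "INSERT INTO events (teammate, kind, body) VALUES (?, 'summary', ?)",
            (teammate, summary),
        )
    return len(old)
```

The design choice worth noting is that compaction runs inside a single transaction: the old rows are deleted and the summary is written together, so a crash mid-compaction cannot lose findings.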
quick start with red-run
The installation process is straightforward but assumes familiarity with Linux pentesting environments. Here are the exact commands from the repo:
./install.sh # Symlink-based (edits reflect immediately)
./install.sh --copy # Copy-based (standalone machines)
./uninstall.sh # Remove everything
After installing, verify dependencies with:
bash preflight.sh
Then launch the shell-server:
./run.sh # shell-server only (default)
For C2 integration with Sliver, run the config wizard before launch:
bash config.sh # select C2 backend, generate operator configs
./run.sh # starts C2 daemon + MCP automatically
The shell-server listens on 127.0.0.1:8022 via SSE, shared across all teammates.
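Since both the shell-server and the dashboard rely on SSE, it helps to see what consuming such a stream involves. The following is a minimal parser for the standard `text/event-stream` wire format, written against the format itself rather than against red-run: the event names and payloads in the usage note are invented, and nothing here reflects the actual messages the shell-server emits.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the buffered event, if any.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            # Comment line, commonly used as a keep-alive.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)  # multiple data lines are joined with newlines
```

Feeding it lines such as `event: credential`, `data: {"user": "svc_sql"}`, and a blank line yields one `("credential", ...)` tuple; events without an `event:` field fall back to the default type `message`.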
verdict
red-run is a technically ambitious toolkit for AI-driven offensive security assessments that stands out for its multi-agent orchestration model with persistent teammates and semantic routing. It’s best suited for pentesters and red teamers comfortable with Linux pentesting environments who want to experiment with AI automation beyond simple scripted tooling.
The tradeoff is complexity: setting up and managing multiple AI agents in parallel with operator-in-the-loop control requires operational discipline and infrastructure readiness. It’s not a turnkey solution for casual users or quick scans.
Still, the approach provides a valuable template for how AI agents can be orchestrated in parallel with persistent context and human oversight in a real-world security use case. Worth exploring if you want a hands-on example of multi-agent AI orchestration combined with pentest tooling integration.
→ GitHub Repo: blacklanternsecurity/red-run ⭐ 197 · Python