Penetration testing often involves juggling a series of diverse security tools and interpreting a flood of raw output. watchtower takes a different approach by orchestrating these tools through a multi-agent LangGraph workflow, dynamically chaining actions based on intermediate results. This design aims to automate and streamline pentesting using AI-driven planning and analysis.
how watchtower automates penetration testing with a langgraph multi-agent architecture
At its core, watchtower is a Python command-line tool that implements a Planner-Worker-Analyst architecture to run automated penetration tests. The Planner sets the overall testing strategy: which tools to run, in what order, and what to do next based on their results.
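As a rough illustration of the Planner's job, here is a rule-based stand-in for the LLM call. It looks at which tools have finished and what they reported, then proposes the next step. The `Finding` and `PlanState` shapes and the nmap → httpx → nuclei chaining rules are hypothetical and not taken from watchtower's code. In the real tool, an LLM makes this decision.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    # Hypothetical finding shape, for illustration only.
    tool: str
    kind: str   # e.g. "open_port", "live_http"
    value: str


@dataclass
class PlanState:
    target: str
    completed: set[str] = field(default_factory=set)
    findings: list[Finding] = field(default_factory=list)


def plan_next(state: PlanState) -> list[str]:
    """Rule-based stand-in for the LLM Planner (hypothetical rules).

    Returns the tools to run next, based on what has already run
    and what earlier tools reported.
    """
    kinds = {f.kind for f in state.findings}
    next_tools: list[str] = []
    if "nmap" not in state.completed:
        next_tools.append("nmap")    # start with port discovery
    if "open_port" in kinds and "httpx" not in state.completed:
        next_tools.append("httpx")   # probe web services on open ports
    if "live_http" in kinds and "nuclei" not in state.completed:
        next_tools.append("nuclei")  # template-scan only live HTTP targets
    return next_tools


state = PlanState(target="https://www.example.com")
print(plan_next(state))  # ['nmap']
state.completed.add("nmap")
state.findings.append(Finding("nmap", "open_port", "443/tcp"))
print(plan_next(state))  # ['httpx']
```

Replacing the `if` chain with a model call that returns tool names is what makes the chaining "dynamic". The surrounding state handling stays the same.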
The Worker agent executes 23 integrated security tools by invoking their CLI binaries as subprocesses. These include popular tools such as nmap, nuclei, and httpx, which must be installed on the host system and available on the PATH. The subprocess approach avoids reimplementing tool functionality but introduces a dependency on external binaries.
The Analyst agent filters and refines the raw output from the Worker into structured, meaningful findings. It aims to reduce noise and false positives before persisting results.
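The Analyst relies on an LLM, but much of the noise reduction can be pictured as a deterministic pre-filter. The sketch assumes a hypothetical raw-finding dict with `tool`, `title`, `location`, and `severity` keys. It drops items below a severity threshold and collapses duplicates reported by more than one tool.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    tool: str
    title: str
    location: str
    severity: str


def analyze(raw: list[dict], min_severity: str = "low") -> list[Finding]:
    """Deterministic pre-filter in the spirit of the Analyst (hypothetical).

    Drops findings below min_severity, merges duplicates that several tools
    report for the same title and location, and sorts the rest by severity.
    """
    threshold = SEVERITY_RANK[min_severity]
    seen: set[tuple[str, str]] = set()
    kept: list[Finding] = []
    for item in raw:
        severity = str(item.get("severity", "info")).lower()
        if SEVERITY_RANK.get(severity, 0) < threshold:
            continue  # informational noise
        title = str(item.get("title", "")).strip()
        location = str(item.get("location", "")).strip()
        key = (title.lower(), location)
        if not title or key in seen:
            continue  # empty record, or duplicate from another tool
        seen.add(key)
        kept.append(Finding(str(item.get("tool", "?")), title, location, severity))
    # Highest severity first; the sort is stable, so ties keep input order.
    kept.sort(key=lambda f: SEVERITY_RANK.get(f.severity, 0), reverse=True)
    return kept
```

Running a cheap filter like this before any LLM call also shrinks the prompt, which reduces both cost and the chance of the model fixating on noise.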
Under the hood, watchtower uses SQLite to persist state across runs, including intermediate findings and the evolving knowledge graph of the pentesting session. This enables resuming tests and generating comprehensive reports from accumulated data.
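Here is a minimal persistence layer along these lines, using Python's built-in `sqlite3`. The two-table schema and the helper names are assumptions for illustration. watchtower's real schema, including how it stores the knowledge graph, is not covered in this overview.

```python
import sqlite3

# Hypothetical schema: one row per run, findings keyed to that run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY,
    target TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'running'
);
CREATE TABLE IF NOT EXISTS findings (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    tool TEXT NOT NULL,
    severity TEXT NOT NULL,
    title TEXT NOT NULL,
    UNIQUE (session_id, tool, title)
);
"""


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def start_or_resume(conn: sqlite3.Connection, target: str) -> int:
    """Reuse the latest unfinished session for this target, else start one."""
    row = conn.execute(
        "SELECT id FROM sessions WHERE target = ? AND status = 'running' "
        "ORDER BY id DESC LIMIT 1",
        (target,),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute("INSERT INTO sessions (target) VALUES (?)", (target,))
    conn.commit()
    return cur.lastrowid


def save_finding(conn: sqlite3.Connection, session_id: int, tool: str,
                 severity: str, title: str) -> None:
    # INSERT OR IGNORE keeps re-runs after a resume from duplicating rows.
    conn.execute(
        "INSERT OR IGNORE INTO findings (session_id, tool, severity, title) "
        "VALUES (?, ?, ?, ?)",
        (session_id, tool, severity, title),
    )
    conn.commit()


def load_findings(conn: sqlite3.Connection, session_id: int) -> list[tuple[str, str, str]]:
    return conn.execute(
        "SELECT tool, severity, title FROM findings WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
```

A report generator would then read `load_findings(...)` for a session rather than re-running any scanner.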
The tool supports parallel reconnaissance by running multiple security tools concurrently, which shortens the overall test. It also truncates long tool output so that neither the user nor the LLM is flooded with irrelevant data.
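Concurrent execution and truncation can be combined with `asyncio`. This sketch assumes a semaphore to cap how many scanners run at once and a hypothetical head-plus-tail truncation policy. watchtower's actual limits and truncation strategy may differ.

```python
import asyncio


def truncate_output(text: str, max_chars: int = 4000, keep_tail: int = 1000) -> str:
    """Keep the head and tail of long output; assumes keep_tail < max_chars."""
    if len(text) <= max_chars:
        return text
    head = text[: max_chars - keep_tail]
    tail = text[-keep_tail:] if keep_tail else ""
    omitted = len(text) - len(head) - len(tail)
    return f"{head}\n... [{omitted} characters truncated] ...\n{tail}"


async def run_one(argv: list[str], timeout: float) -> str:
    try:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
        )
    except FileNotFoundError:
        return f"{argv[0]}: not found on PATH"
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the killed process
        return f"{argv[0]}: timed out after {timeout}s"
    return truncate_output(out.decode(errors="replace"))


async def run_parallel(commands: dict[str, list[str]], timeout: float = 300.0,
                       max_concurrency: int = 4) -> dict[str, str]:
    """Run named commands concurrently, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(name: str, argv: list[str]) -> tuple[str, str]:
        async with sem:
            return name, await run_one(argv, timeout)

    pairs = await asyncio.gather(*(guarded(n, a) for n, a in commands.items()))
    return dict(pairs)
```

Keeping the tail as well as the head matters because many scanners print their summary last. The concurrency cap stops a large tool list from saturating the network or the target.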
Watchtower is designed to be LLM-agnostic, supporting OpenAI, Gemini, OpenRouter, or any OpenAI-compatible endpoint. This flexibility allows users to choose their preferred large language model provider for the AI-driven planning and analysis.
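OpenAI, OpenRouter, and Gemini (through its OpenAI-compatibility layer) all accept the same `POST {base_url}/chat/completions` request. Provider independence can therefore come down to swapping a base URL, a key, and a model name. The helper below only builds the request. The provider table and the `build_chat_request` name are illustrative assumptions, not watchtower's configuration.

```python
import json
import urllib.request

# Base URLs of OpenAI-compatible chat APIs (illustrative table).
PROVIDER_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai",
}


def build_chat_request(provider_or_url: str, api_key: str, model: str,
                       messages: list[dict]) -> urllib.request.Request:
    """Build an OpenAI-style chat request; unknown names are treated as base URLs."""
    base = PROVIDER_BASE_URLS.get(provider_or_url, provider_or_url).rstrip("/")
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` returns JSON whose reply text sits at `choices[0].message.content`, whichever provider served it.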
Reports are generated as PDFs from the stored findings, providing a professional summary of the testing results.
the technical strengths and tradeoffs of watchtower’s approach
The standout feature of watchtower is its Planner-Worker-Analyst pattern, implemented as a LangGraph multi-agent system. This architecture separates concerns cleanly: planning strategy, executing tools, and analyzing results. That modularity makes it easier to add new tools or swap AI providers.
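To make the control flow concrete, here is the Planner → Worker → Analyst loop as a tiny plain-Python state machine. It deliberately avoids the LangGraph API. In LangGraph, the same shape would be declared with `StateGraph.add_node`, `add_edge`, and `add_conditional_edges`, with `max_steps` playing the role of the recursion limit. The fixed tool order and simulated outputs are placeholders.

```python
from typing import Callable, TypedDict


class State(TypedDict):
    pending: list[str]    # tools the planner wants to run next
    raw: dict[str, str]   # tool -> raw output from the worker
    findings: list[str]   # analyst-approved findings
    done: set[str]        # tools already executed


def planner(state: State) -> State:
    order = ["nmap", "httpx", "nuclei"]  # hypothetical fixed strategy
    state["pending"] = [t for t in order if t not in state["done"]][:1]
    return state


def worker(state: State) -> State:
    for tool in state["pending"]:
        state["raw"][tool] = "simulated output"  # a real Worker would spawn the tool
        state["done"].add(tool)
    return state


def analyst(state: State) -> State:
    for tool in state["pending"]:
        state["findings"].append(f"[{tool}] {state['raw'][tool]}")
    state["pending"] = []
    return state


def route_after_planner(state: State) -> str:
    # Conditional edge: stop once the planner has nothing left to run.
    return "worker" if state["pending"] else "end"


def run_graph(state: State, max_steps: int = 20) -> State:
    nodes: dict[str, Callable[[State], State]] = {
        "planner": planner, "worker": worker, "analyst": analyst,
    }
    edges = {"worker": "analyst", "analyst": "planner"}  # fixed edges
    current = "planner"
    for _ in range(max_steps):
        state = nodes[current](state)
        current = route_after_planner(state) if current == "planner" else edges[current]
        if current == "end":
            return state
    raise RuntimeError("step limit reached")
```

Because each node only reads and writes the shared state, any node can be swapped out independently. An LLM-backed `planner`, for example, leaves `worker` and `analyst` unchanged.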
Using subprocess wrappers for the 23 security tools lets watchtower leverage battle-tested scanners without reimplementing them. However, the toolchain then depends heavily on the host environment having these binaries installed and correctly configured. The tool does detect missing tools and disables them in the interactive UI, which is a practical developer-experience touch.
Persisting state in SQLite is a pragmatic choice that balances simplicity and durability. It avoids the complexity of a distributed state store, but SQLite can become a bottleneck if the tool is pushed beyond a single host or into highly concurrent workloads.
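When more than one process needs the same database (for instance, a scan writing findings while a report is generated), SQLite's write-ahead log helps. Here is a minimal sketch with a hypothetical `open_shared_store` helper. WAL lets readers proceed alongside one writer but still allows only a single writer at a time, which is why SQLite remains a single-host choice.

```python
import sqlite3


def open_shared_store(path: str, busy_timeout_s: float = 5.0) -> sqlite3.Connection:
    # timeout= makes a connection wait for a lock instead of failing
    # immediately with "database is locked".
    conn = sqlite3.connect(path, timeout=busy_timeout_s)
    # WAL: readers do not block the writer and vice versa (file databases only).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL (got {mode!r})")
    conn.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL
    return conn
```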
Parallel execution of tools is a key performance gain, but concurrency management and output aggregation add complexity to the codebase. The repo appears to handle this cleanly, with subprocess management abstracted in the Worker.
The LLM-agnostic design is a major plus. It means the core logic is not tightly coupled to a single AI provider, which future-proofs the project as new models or providers emerge.
The codebase targets Python 3.11+ and uses async constructs for concurrency. The CLI is interactive by default but also offers a headless mode, enabled by a flag, for CI/CD integration.
One limitation is the dependency on external CLI tools and their correct installation, which might be a barrier for newcomers or in restricted environments. Also, the AI-driven orchestration depends on the quality and cost of the chosen LLM provider, which is an external factor.
getting started with watchtower
The project provides a clear quick start guide. After cloning the repo, you create a Python virtual environment, activate it, and install dependencies from requirements.txt.
Installing the actual security tools requires running the included install_tools.sh script, which attempts to install all required binaries. Missing tools are detected at runtime and skipped.
Configuration is managed through a .env file where you set your API key for one of the supported LLM providers (OpenAI, Gemini, or OpenRouter). You can also customize the model names if desired.
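The configuration step can be sketched as a small `.env` parser plus provider selection. The variable names (`OPENAI_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`) and the precedence order are assumptions, so check the repository's example `.env` for the real keys. In practice, the `python-dotenv` package (`load_dotenv`, `dotenv_values`) handles parsing more robustly.

```python
# Assumed variable names and precedence; not confirmed from watchtower's source.
PROVIDER_KEYS = [
    ("openai", "OPENAI_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
]


def parse_dotenv(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips quotes."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop matching surrounding quotes
        env[key] = value
    return env


def pick_provider(env: dict[str, str]) -> tuple[str, str]:
    """Return (provider, api_key) for the first configured provider."""
    for provider, var in PROVIDER_KEYS:
        if env.get(var):
            return provider, env[var]
    names = ", ".join(var for _, var in PROVIDER_KEYS)
    raise RuntimeError(f"no LLM API key set; add one of: {names}")
```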
Running the tool requires specifying a target URL or IP with the -t flag. The interactive CLI then lets you pick which tools to enable for that run, highlighting the ones detected on your system.
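The tool picker boils down to mapping a user's answer onto installed tools. This hypothetical `parse_selection` accepts 1-based indices such as `1,3`, or `all`, and drops tools that were not detected. watchtower's actual prompt may behave differently.

```python
def render_menu(tools: list[str], available: set[str]) -> str:
    """Numbered menu that flags tools missing from this system."""
    return "\n".join(
        f"{i}. {name}" + ("" if name in available else "  (not installed)")
        for i, name in enumerate(tools, start=1)
    )


def parse_selection(answer: str, tools: list[str], available: set[str]) -> list[str]:
    """Turn '1,3' or 'all' (or empty input) into installed tool names."""
    answer = answer.strip().lower()
    if answer in ("", "all"):
        return [t for t in tools if t in available]
    chosen: list[str] = []
    for part in answer.split(","):
        part = part.strip()
        if not part.isdigit() or not 1 <= int(part) <= len(tools):
            raise ValueError(f"invalid choice: {part!r}")
        name = tools[int(part) - 1]
        if name in available and name not in chosen:  # skip missing and repeats
            chosen.append(name)
    return chosen
```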
For example, to run a scan against https://www.example.com:
python -m watchtower.main -t https://www.example.com
To run in headless mode without the interactive prompt, use:
python -m watchtower.main --skip-ask-tools -t https://www.example.com
This makes it suitable for integration into automated CI/CD pipelines.
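Here is a sketch of how the two documented switches, `-t` and `--skip-ask-tools`, might map onto `argparse`. Only those two flags come from the project's instructions. `build_parser`, `resolve_tools`, and the injected `ask` callback are hypothetical. The callback design keeps the headless path from ever blocking on input in CI.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    # Only -t and --skip-ask-tools come from the docs; everything else is illustrative.
    parser = argparse.ArgumentParser(prog="watchtower")
    parser.add_argument("-t", dest="target", required=True, help="target URL or IP")
    parser.add_argument("--skip-ask-tools", action="store_true",
                        help="headless mode: do not prompt for tool selection")
    return parser


def resolve_tools(args: argparse.Namespace, installed: list[str],
                  ask: Callable[[list[str]], list[str]]) -> list[str]:
    if args.skip_ask_tools:
        return list(installed)  # headless: use every detected tool, never prompt
    return ask(installed)       # interactive: let the user choose
```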
verdict: who should use watchtower and what to expect
watchtower is a solid tool for security professionals and pentesters who want to automate and orchestrate multi-tool workflows using AI. Its multi-agent LangGraph architecture is worth understanding even if you don’t adopt it outright.
The dependency on external CLI tools means it fits best in environments where installing and maintaining these binaries is manageable—typically Linux or macOS. Windows users can use WSL2 for compatibility.
The LLM-agnostic design is forward-looking, but the AI’s effectiveness will depend on your provider and model choices.
While the tool is well suited for automated pentesting pipelines and CI/CD integration, expect a learning curve around environment setup and understanding the Planner-Worker-Analyst flow.
Overall, watchtower fills a niche for AI-driven pentesting orchestration with a clear, modular architecture and practical tradeoffs. It’s a good base for anyone looking to build smarter security workflows or learn about multi-agent LangGraph patterns in real-world tooling.
→ GitHub Repo: fzn0x/watchtower ⭐ 126 · Python