The landscape of IP threat intelligence is crowded with sources, each with its own licensing terms and update mechanism. ipblocklist tackles this head-on by aggregating over 30 threat intelligence feeds into two practical blocklists: one for inbound malicious IPs and another for outbound command-and-control destinations. What sets this project apart is its explicit handling of licensing tradeoffs: the aggregation code is MIT licensed, while the compiled data carries the more restrictive CC BY-NC-SA 4.0 license because of upstream source restrictions. This clear boundary between code and data is a useful model for anyone working with threat intel aggregation.
What ipblocklist does and how it is built
At its core, ipblocklist is a Python-based pipeline that fetches, aggregates, and curates IP blocklists from more than 30 different threat intelligence providers. The goal is to produce two clean, actionable lists:
inbound.txt: IP addresses and networks known to initiate malicious connections, such as spam sources, scanners, brute force attackers, and exploit sources. This list is aimed at filtering incoming traffic at the firewall’s WAN or INPUT chain.
outbound.txt: IPs that are known Command & Control servers, botnet controllers, malware drop sites, and phishing hosts. This list is intended for outbound filtering on the LAN or OUTPUT chain to prevent compromised devices from reaching harmful destinations.
The aggregation pipeline runs every two hours via GitHub Actions. According to the README, the workflow takes a long time to complete, so the author runs it on a self-hosted runner instead of a GitHub-hosted one. This is a practical tradeoff: a long job repeated every two hours is a poor fit for the time and usage limits that apply to hosted runners.
The architecture follows a simple but effective pattern: the Python scripts fetch raw blocklists, normalize and deduplicate entries, and then generate the two output files. Notably, the pipeline excludes IPs belonging to major public DNS resolvers to avoid false positives, a detail often overlooked in similar projects.
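To make the normalize-and-deduplicate step concrete, here is a minimal sketch using only Python's standard `ipaddress` module. It is not the project's actual code: the function names `normalize_feed` and `aggregate`, the comment markers handled, and the IPv4-only filter are all assumptions made for illustration.

```python
import ipaddress


def normalize_feed(lines):
    """Parse raw feed lines into IPv4 networks, skipping comments and junk.

    Hypothetical helper: real feeds vary in format, and the project may
    handle them differently.
    """
    networks = []
    for line in lines:
        # Strip inline comments; '#' and ';' are assumed comment markers.
        entry = line.split("#", 1)[0].split(";", 1)[0].strip()
        if not entry:
            continue
        try:
            # strict=False tolerates host bits, e.g. 10.0.0.5/24 -> 10.0.0.0/24
            net = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # not an IP address or CIDR range
        if net.version == 4:
            networks.append(net)
    return networks


def aggregate(feeds):
    """Merge several feeds, removing duplicates and overlapping ranges."""
    merged = [net for feed in feeds for net in normalize_feed(feed)]
    # collapse_addresses drops duplicates, absorbs subnets into their
    # supernets, and joins adjacent ranges where possible.
    return list(ipaddress.collapse_addresses(merged))


# Sample data uses documentation ranges (RFC 5737), not real threat intel.
feed_a = ["# sample feed", "192.0.2.1", "192.0.2.0/25", "not-an-ip"]
feed_b = ["192.0.2.128/25 ; comment", "198.51.100.7", "192.0.2.1"]
result = aggregate([feed_a, feed_b])
print([str(n) for n in result])
```

In this sample the two /25 halves merge into a single /24 and the duplicate host entry disappears inside it, which is the kind of shrinkage that keeps a combined list from 30+ feeds manageable.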
Licensing is a key aspect here: while the Python code is permissively licensed under MIT, the aggregated blocklists fall under CC BY-NC-SA 4.0 due to restrictions from some upstream sources, especially Spamhaus, which prohibits commercial use. This dual licensing is clearly documented and separates code freedom from data usage restrictions.
Technical strengths and tradeoffs
One of the most interesting parts of ipblocklist is how it manages the complex landscape of threat intel sources, each with its own update frequency, data format, and licensing. The Python code is well-organized for fetching and normalizing data, using standard libraries and parsing techniques to handle diverse input formats. The codebase avoids external dependencies, which simplifies deployment and maintenance.
The decision to run updates every two hours strikes a balance between freshness and resource consumption. However, the processing time is long enough that the author recommends self-hosted GitHub Actions runners. This introduces an operational overhead but ensures reliability and control.
Another strength is the clear separation of inbound versus outbound threat intelligence. This distinction aligns well with real firewall configurations and network security models, making it straightforward to apply these lists in production environments.
The exclusion of large public DNS resolver IPs from the lists minimizes the risk of blocking legitimate DNS traffic, which is a common source of false positives. This shows practical experience with firewall filtering in real networks.
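One way such an exclusion could work is to carve protected addresses out of any blocked range that covers them, rather than dropping the whole range. The sketch below illustrates that idea with the standard `ipaddress` module; the `RESOLVERS` list and the `carve_out` helper are assumptions for illustration, since the article does not show which resolvers the project protects or how it removes them.

```python
import ipaddress

# A few well-known public resolver addresses (Google, Cloudflare, Quad9).
# The project's real exclusion list is not shown here; this set is assumed.
RESOLVERS = [
    ipaddress.ip_network(ip)
    for ip in ("8.8.8.8", "8.8.4.4", "1.1.1.1", "1.0.0.1", "9.9.9.9")
]


def carve_out(network, protected):
    """Split `network` into pieces that no longer cover any protected entry."""
    pieces = [network]
    for keep in protected:
        next_pieces = []
        for piece in pieces:
            if keep.version == piece.version and keep.subnet_of(piece):
                # address_exclude yields the remainder of `piece` without
                # `keep`; it yields nothing when the two are identical.
                next_pieces.extend(piece.address_exclude(keep))
            else:
                next_pieces.append(piece)
        pieces = next_pieces
    return sorted(pieces)


# A hypothetical feed entry that happens to cover 8.8.8.8.
blocked = ipaddress.ip_network("8.8.8.8/30")
safe = carve_out(blocked, RESOLVERS)
print([str(n) for n in safe])
```

The tradeoff in this approach is list size: each carved-out address can split one range into several smaller ones, but the rest of the range stays blocked.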
The dual licensing model is both a strength and a limitation. It allows open modification and improvement of the aggregation scripts but restricts commercial use of the output data. This means enterprises need to carefully review licenses if they want to integrate these blocklists into commercial products or services.
Overall, the code quality is solid for a security tooling project, focusing on clarity and maintainability rather than complexity. The tradeoffs made are practical and well explained.
How to use the generated blocklists and manage updates
The repo provides two key output files:
inbound.txt: Use this as a blocklist for incoming connections on your WAN or INPUT firewall chain. It includes IPs with bad reputations for initiating attacks.
outbound.txt: Use this as a blocklist for outgoing connections on your LAN or OUTPUT firewall chain. It targets known malicious destination IPs.
These are standard text files compatible with most modern firewalls and security tools. Applying them typically involves configuring your firewall to drop or reject traffic matching these IPs.
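As one example of what applying a list can look like on a Linux firewall, the sketch below converts blocklist lines into input for `ipset restore`. It assumes the file holds one IPv4 address or CIDR range per line with optional `#` comments, which the article does not confirm; the function name `to_ipset_restore` and the set name `blocklist_inbound` are made up for illustration.

```python
import ipaddress


def to_ipset_restore(lines, set_name):
    """Render blocklist lines as a script for `ipset restore`.

    Hypothetical helper; assumes one IPv4 address or CIDR per line.
    """
    entries = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        try:
            net = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # skip anything that is not an address or range
        if net.version == 4:
            entries.append(str(net))
    # Size the set so it can hold every entry (ipset's default is 65536).
    maxelem = max(len(entries), 65536)
    out = [f"create {set_name} hash:net family inet maxelem {maxelem}"]
    out += [f"add {set_name} {entry}" for entry in entries]
    return "\n".join(out) + "\n"


# Sample lines stand in for the contents of a downloaded inbound.txt.
sample = ["# inbound sample", "192.0.2.0/24", "198.51.100.7", "", "bogus"]
script = to_ipset_restore(sample, "blocklist_inbound")
print(script, end="")
```

The resulting text could be piped into `ipset restore` and the set referenced from a rule such as `iptables -I INPUT -m set --match-set blocklist_inbound src -j DROP`. For outbound.txt the same approach would match on `dst` in the OUTPUT or FORWARD chain instead.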
The update process is automated through a GitHub Actions workflow defined in update.yml. Since this workflow is resource-intensive, the author runs it on a self-hosted runner. If you fork the repo and want to maintain your own updated lists, you’ll need to set up a self-hosted runner following GitHub’s documentation.
Here’s the usage information verbatim from the README:
## How to Use These Lists
These are standard text files and can be used with most modern firewalls, ad-blockers, and security tools.
### 🛡️ `inbound.txt` (Inbound Blocklist)
* **What it is:** A list of IPs/networks with a bad reputation for *initiating* malicious connections. This includes sources of spam, scanning, brute-force attacks (SSH, RDP), and web exploits.
* **Use Case:** Protect your public-facing servers and services (web servers, mail servers, game servers, etc.).
* **How to use:** Apply this list to your firewall's **WAN IN** or **INPUT** chain to **DROP** or **REJECT** all incoming traffic *from* these sources.
### ☢️ `outbound.txt` (Outbound Blocklist)
* **What it is:** A list of known malicious destination IPs. This includes C2 (Command & Control) servers, botnet controllers, malware drop sites, and phishing hosts.
* **Use Case:** Prevent compromised devices on your *internal* network (like a laptop or IoT device) from *contacting* malicious servers.
* **How to use:** Apply this list to your firewall's **LAN OUT** or **OUTPUT** chain to **BLOCK** or **LOG** all outgoing traffic *to* these destinations.
The README also points out the need for self-hosted runners:
## Self-hosted runners
The `update.yml` workflow takes a long time to complete. I run it on a self-hosted runner. If you fork this repo and want to use the same workflow, set up a self-hosted runner:
Setup Self-hosted runners
Verdict
ipblocklist is a practical project for security teams needing curated IP blocklists that merge multiple threat intel sources while respecting licensing boundaries. Its dual licensing model is a refreshing dose of transparency in a field where source data licenses often go unexamined. The pipeline’s reliance on self-hosted runners for updates is a tradeoff that may limit casual users but ensures robustness in production.
If you operate firewalls and want clean inbound and outbound malicious IP lists without building your own aggregation pipeline from scratch, this repo is worth exploring. However, be mindful of the CC BY-NC-SA 4.0 license on the data, especially if you plan commercial use.
The code itself is straightforward, maintainable Python, suitable for customization or extension. Overall, ipblocklist solves a real problem with a clear design and honest tradeoffs.
→ GitHub Repo: bitwire-it/ipblocklist ⭐ 321 · Python