Noureddine RAMDI / Maigret: A resilient OSINT username scraper across thousands of sites

Created Sat, 02 May 2026 20:07:04 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

soxoj/maigret

Maigret tackles a common OSINT challenge: gathering public information about a person when you only have a username. Instead of relying on site APIs, it scrapes profile pages directly across thousands of websites, navigating the complexities of varied layouts, rate limits, and anti-bot defenses. This makes it a practical tool for anyone needing comprehensive username enumeration without the overhead of API keys or multiple integrations.

What maigret does and how it works

Maigret is an open-source OSINT tool, written in Python, that collects public data about a person from a username alone. Its standout feature is scale: it supports over 3,000 sites, with a default scan targeting the 500 highest-traffic ones. It doesn’t rely on APIs; instead, it scrapes profile pages and extracts whatever public information is available.

The architecture centers on a large, curated site database that defines how to find and parse user profiles on each site. A separate commercial version maintains a larger database of 5,000+ sites with daily updates to keep pace with site changes, but the open-source version remains capable with 3,000+ sites.

Maigret also supports recursive search: linked accounts or related profiles discovered on one site become starting points for further searches. Users can filter site scans by tags (such as categories or countries) to narrow a scan to specific research needs.
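To make the recursive idea concrete, here is a minimal sketch of a breadth-first search over usernames. It is not Maigret's implementation: the function `recursive_search`, the `lookup` callback, and the `max_depth` parameter are hypothetical names chosen for illustration, and the lookup is stubbed so nothing touches the network.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical callback type: given a username, return a mapping of
# site name -> usernames linked from the profile found on that site.
Lookup = Callable[[str], Dict[str, List[str]]]


def recursive_search(seed: str, lookup: Lookup, max_depth: int = 2) -> Dict[str, List[str]]:
    """Breadth-first search that follows usernames linked from found profiles.

    Returns a mapping of every searched username to the sorted list of
    sites where a profile was found. Illustrative only.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    results: Dict[str, List[str]] = {}

    while queue:
        username, depth = queue.popleft()
        found = lookup(username)
        results[username] = sorted(found)

        # Stop expanding once the depth limit is reached.
        if depth >= max_depth:
            continue

        for linked_usernames in found.values():
            for linked in linked_usernames:
                if linked not in seen:  # avoid loops between linked accounts
                    seen.add(linked)
                    queue.append((linked, depth + 1))

    return results
```

The `seen` set matters here: profiles often link back to each other, and without it the search would never terminate. A depth limit keeps the scan bounded when one account fans out into many.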

The project supports integration as a Python library for embedding in other tools, offers a command-line interface, and includes a web UI that visualizes results and generates reports in multiple formats.
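One low-risk way to embed the tool in another program is to wrap the command-line interface rather than the library API, which is not covered here. The sketch below only builds an argument list using the `-a` and `--tags` options mentioned later in this post. The helper name `build_maigret_command` is hypothetical, and the comma-separated tag format is an assumption to verify against `maigret --help`.

```python
import shlex
from typing import List, Sequence


def build_maigret_command(
    username: str, all_sites: bool = False, tags: Sequence[str] = ()
) -> List[str]:
    """Build an argv list for the maigret CLI (illustrative helper)."""
    # Reject values that the CLI could mistake for an option.
    if not username or username.startswith("-"):
        raise ValueError("username must be non-empty and must not start with '-'")

    argv = ["maigret", username]
    if all_sites:
        argv.append("-a")  # scan every site in the database
    if tags:
        # Assumption: tags are passed as one comma-separated value.
        argv += ["--tags", ",".join(tags)]
    return argv


if __name__ == "__main__":
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(build_maigret_command("alice", tags=["photo", "us"])))
```

The resulting list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with unusual usernames.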

Key technical points include:

  • Python as the core language, leveraging its ecosystem for HTTP requests, parsing, and concurrency.
  • Site database structured to define URL patterns, parsing rules, and metadata.
  • Mechanisms for coping with common scraping blocks, including CAPTCHA detection and optional Tor/I2P routing for anonymity.
  • Support for recursive, multi-level username enumeration.
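As a rough illustration of what a site database entry might encode, the sketch below models one entry with a URL pattern, a check type, and markers that signal a missing profile. The class `SiteEntry` and all field names are hypothetical and simplified; Maigret's real schema may differ.

```python
from dataclasses import dataclass
from typing import Tuple
from urllib.parse import quote


@dataclass(frozen=True)
class SiteEntry:
    """Hypothetical, simplified site definition (not Maigret's schema)."""
    name: str
    url_pattern: str                       # contains a "{username}" placeholder
    check_type: str                        # "status_code" or "message"
    absence_markers: Tuple[str, ...] = ()  # text shown when no profile exists
    tags: Tuple[str, ...] = ()


def profile_url(site: SiteEntry, username: str) -> str:
    """Fill the URL pattern, percent-encoding the username."""
    return site.url_pattern.format(username=quote(username, safe=""))


def profile_exists(site: SiteEntry, status_code: int, body: str) -> bool:
    """Decide whether a fetched page looks like an existing profile."""
    if site.check_type == "status_code":
        return 200 <= status_code < 300
    if site.check_type == "message":
        # Some sites return 200 for every URL, so look for "not found" text.
        return not any(marker in body for marker in site.absence_markers)
    raise ValueError(f"unknown check type: {site.check_type}")
```

Keeping the rules as data is what lets the core logic stay unchanged when a site is added or its layout shifts: only the entry needs updating.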

The technical strengths and tradeoffs of maigret

Maigret’s technical strength lies in its scale and resilience. Scraping thousands of diverse websites is non-trivial. Each site has different HTML structures, rate limits, and anti-bot defenses. Maigret manages this complexity via a well-organized site database where each entry encodes how to locate and extract user profile data.

The codebase emphasizes extensibility: adding new sites involves defining patterns and extraction rules, allowing the tool to evolve with web changes. The scraping logic is layered with fallback mechanisms and heuristics to handle partial failures gracefully.

Another strength is the focus on anti-blocking strategies. Maigret supports proxy usage, including Tor and I2P networks, to rotate IPs and evade bans. It also detects common obstacles like CAPTCHAs and can bypass some with user input or automation. This robustness makes it viable for real-world OSINT workflows where scraping can be easily blocked.
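The value of detecting blocks is that a blocked request should be reported as inconclusive, not as "no profile". The sketch below shows one simple way to classify a response. The marker strings, status-code choices, and the function name `classify_response` are assumptions for illustration, not Maigret's actual heuristics.

```python
# Hypothetical phrases that often appear on block or challenge pages.
BLOCK_MARKERS = ("captcha", "unusual traffic", "access denied")


def classify_response(status_code: int, body: str) -> str:
    """Return "found", "not_found", "rate_limited", "blocked" or "unknown"."""
    lowered = body.lower()

    if status_code == 429:
        return "rate_limited"
    # Treat challenge pages as blocked even when the status code is 200.
    if status_code in (403, 503) or any(m in lowered for m in BLOCK_MARKERS):
        return "blocked"
    if status_code == 404:
        return "not_found"
    if 200 <= status_code < 300:
        return "found"
    return "unknown"
```

A caller could retry "rate_limited" and "blocked" results later or through a different proxy, while treating only "not_found" as a real negative. A substring check like this can misfire on profiles that happen to mention a marker word, so a real tool would need per-site rules.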

However, these strengths come with tradeoffs. Maintaining scraping rules for thousands of sites is a constant effort since websites frequently change layouts. The open-source database is comprehensive but lags behind the commercial version in coverage and update frequency.

Relying on scraping rather than APIs limits data completeness and reliability. Some sites intentionally obfuscate profiles or use dynamic content loading, complicating scraping. Maigret’s approach prioritizes breadth over depth, which suits reconnaissance but may miss fine-grained details.

The code is clean for a project of this scale. Python’s readability aids maintainability, and the modular design separates site definitions from core logic. Even so, the nature of scraping means occasional breakage is inevitable.

How to try maigret

Installation is straightforward, with several options:

# install from pypi
pip3 install maigret

# or clone and install manually
git clone https://github.com/soxoj/maigret && cd maigret

# build and install
pip3 install .

# or build a Docker image from the cloned repository
docker build -t maigret .

For users who prefer not to install, there is a Telegram bot interface.

The tool’s default run scans the top 500 sites by traffic, which balances speed and coverage. You can expand to all sites with -a or filter by categories with --tags.
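The selection behaviour described above can be modelled in a few lines. This is a sketch under stated assumptions: sites are plain dictionaries with hypothetical `rank` (lower means more traffic) and `tags` keys, and tag filtering is applied before the top-N cut. How Maigret itself combines these options may differ.

```python
from typing import Dict, List, Optional, Sequence


def select_sites(
    sites: List[Dict],
    top: int = 500,
    tags: Optional[Sequence[str]] = None,
    all_sites: bool = False,
) -> List[Dict]:
    """Pick which sites to scan, ordered from most to least traffic."""
    candidates = sites
    if tags:
        wanted = set(tags)
        # Keep sites that carry at least one of the requested tags.
        candidates = [s for s in candidates if wanted & set(s["tags"])]

    ranked = sorted(candidates, key=lambda s: s["rank"])
    return ranked if all_sites else ranked[:top]
```

Filtering before truncating means a narrow tag still yields results even if none of its sites rank in the overall top 500.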

Maigret also offers a web interface for easier result visualization and report generation, which can be run locally or deployed.

Verdict

Maigret is a solid choice for OSINT practitioners who need to enumerate usernames across a vast range of sites without managing multiple API keys. Its Python codebase is accessible and modular, making it a good base for customization or integration into larger toolchains.

While scraping-based tools always face challenges with site changes and anti-bot measures, Maigret’s layered anti-blocking strategies and support for anonymity networks make it more resilient than many peers.

The tradeoff is that it requires ongoing maintenance and occasional manual intervention when sites change drastically. Also, the open-source version’s site database, while extensive, does not match the commercial offering’s scale or update frequency.

In practice, this means Maigret is best suited to reconnaissance and initial data gathering rather than deep, authoritative profile building. It’s relevant for security researchers, digital investigators, and anyone needing broad username footprinting.

Overall, Maigret shows how a well-architected scraping framework can scale to thousands of sites with practical usability, balancing tradeoffs inherent to web scraping at scale.


→ GitHub Repo: soxoj/maigret ⭐ 19,617 · Python