Noureddine RAMDI / secrets-patterns-db: expanding regex coverage for secret scanning in codebases

Created Tue, 05 May 2026 16:46:42 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

mazen160/secrets-patterns-db

Every time you rely on default secret scanning rules, you risk missing critical keys and tokens lurking in your codebase. secrets-patterns-db tackles this by providing a much larger and more diverse set of regex patterns aimed at detecting secrets like API keys, passwords, and tokens. It’s a practical resource for anyone looking to improve their secret detection coverage without building a new scanner from scratch.

what secrets-patterns-db is and how it works

secrets-patterns-db is an open-source Python project that maintains a database of over 1600 regular expressions designed to identify secrets embedded in source code and configuration files. This collection is significantly larger than what common tools provide out of the box: TruffleHog ships with around 700 rules, and Gitleaks offers roughly 60. That means secrets-patterns-db carries more than twice as many patterns as TruffleHog (1600 / 700 ≈ 2.3) and roughly 26 times as many as Gitleaks (1600 / 60 ≈ 26.7).

At its core, the repository stores these regexes in a YAML file (rules-stable.yml), categorizing them by confidence levels to help users prioritize findings. The database is licensed under Creative Commons BY 4.0, while portions specific to TruffleHog are under AGPL.
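
To make the idea of a confidence-tagged pattern database concrete, here is a minimal sketch of what the parsed rules might look like once loaded into Python. The key names (patterns, pattern, name, regex, confidence), the confidence labels, and the three sample entries are assumptions for illustration, not a copy of the real file, so check them against rules-stable.yml before relying on them.

```python
from collections import Counter

# Stand-in for what a YAML parser would return for the rules file.
# Key names and sample entries are assumed for illustration only.
rules = {
    "patterns": [
        {"pattern": {"name": "AWS Access Key ID",
                     "regex": r"AKIA[0-9A-Z]{16}",
                     "confidence": "high"}},
        {"pattern": {"name": "Slack Token",
                     "regex": r"xox[baprs]-[0-9A-Za-z-]{10,48}",
                     "confidence": "high"}},
        {"pattern": {"name": "Generic Password Assignment",
                     "regex": r"(?i)password\s*=\s*\S+",
                     "confidence": "low"}},
    ]
}


def count_by_confidence(rules):
    """Tally how many patterns fall into each confidence level."""
    return Counter(entry["pattern"]["confidence"] for entry in rules["patterns"])


counts = count_by_confidence(rules)
print(dict(counts))
```

A tally like this is a quick way to see how much of a rule set you would keep if you only enabled the higher-confidence entries.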

The project includes Python scripts to convert this pattern database into the formats required by popular secret scanning tools like TruffleHog (versions 2 and 3) and Gitleaks. This format conversion approach makes the regex set usable across multiple scanners without requiring users to rewrite detection logic.
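
The conversion idea can be sketched in a few lines. This is not the project's own script: the function names, the input shape, and the slug-style rule ids are hypothetical. The output follows the [[rules]] table layout (id, description, regex) that Gitleaks configuration files use, with the regex written as a TOML literal string so backslashes need no escaping.

```python
import re


def slugify(name):
    """Turn a human-readable rule name into a lowercase, dash-separated id."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def to_gitleaks_toml(patterns):
    """Render a list of {"name", "regex"} dicts as Gitleaks-style [[rules]] tables."""
    blocks = []
    for p in patterns:
        name = p["name"]
        regex = p["regex"]
        # A TOML literal string cannot contain its own ''' delimiter.
        if "'''" in regex:
            raise ValueError(f"regex for {name!r} cannot be written as a literal string")
        rule_id = slugify(name)
        blocks.append(
            "[[rules]]\n"
            f'id = "{rule_id}"\n'
            f'description = "{name}"\n'
            f"regex = '''{regex}'''\n"
        )
    return "\n".join(blocks)


# Hypothetical sample input, not taken from the real database.
sample = [{"name": "AWS Access Key ID", "regex": r"AKIA[0-9A-Z]{16}"}]
toml_text = to_gitleaks_toml(sample)
print(toml_text)
```

A real converter also has to deal with names that contain double quotes and with regex syntax that the target scanner's engine does not accept, which this sketch leaves out.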

A noteworthy architectural consideration is the attention to security risks inherent in regex-heavy scanning. Each pattern undergoes testing to prevent Regular Expression Denial of Service (ReDoS) vulnerabilities. ReDoS attacks happen when crafted input causes regexes to hang or consume excessive CPU, a serious problem for CI/CD pipelines scanning large codebases. By proactively vetting patterns, secrets-patterns-db aims to provide both broad coverage and operational safety.
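
To see why ReDoS matters, it helps to time a backtracking-prone pattern against a well-behaved one on the same input. The sketch below uses the textbook nested-quantifier pattern ^(a+)+$, which is not taken from the database, and Python's backtracking re engine. It illustrates the failure mode rather than the project's actual test procedure, and the helper name is made up for this example.

```python
import re
import time


def search_seconds(pattern, text):
    """Return how long a single regex search takes, in seconds."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    compiled.search(text)
    return time.perf_counter() - start


# Twenty 'a' characters followed by a character that forces the match to fail.
probe = "a" * 20 + "!"

# Nested quantifiers: the engine tries every way of splitting the run of a's.
nested_time = search_seconds(r"^(a+)+$", probe)

# Same language without nesting: the engine fails after a single pass.
flat_time = search_seconds(r"^a+$", probe)

print(f"nested: {nested_time:.4f}s, flat: {flat_time:.6f}s")
```

Each extra character in the probe roughly doubles the work for the nested pattern, which is how one unlucky line in a large repository can stall a scan.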

why secrets-patterns-db stands out and what to consider

The defining feature of secrets-patterns-db is its extensive collection of regex patterns. This breadth matters because secret scanning tools rely heavily on their pattern libraries to detect diverse and evolving secret formats. The more comprehensive the regex database, the higher the chance of catching edge-case or custom tokens that default scanners miss.

Beyond quantity, the repo’s approach to categorizing patterns by confidence helps teams tune false positive rates, a common tradeoff in secret detection. High-confidence patterns reduce noise but may miss some secrets, while lower-confidence patterns catch more but generate more false alerts.
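
The tuning knob described above can be sketched as a simple threshold filter. The confidence labels ("low", "high"), their ordering, and the sample entries are assumptions for illustration; adjust them to whatever labels the database actually uses.

```python
# Assumed ordering of confidence labels, lowest to highest.
CONFIDENCE_RANK = {"low": 0, "high": 1}


def select_patterns(patterns, minimum="high"):
    """Keep only patterns whose confidence is at or above the chosen floor."""
    floor = CONFIDENCE_RANK[minimum]
    return [p for p in patterns if CONFIDENCE_RANK[p["confidence"]] >= floor]


# Hypothetical sample entries, not taken from the real database.
patterns = [
    {"name": "AWS Access Key ID", "regex": r"AKIA[0-9A-Z]{16}", "confidence": "high"},
    {"name": "Generic Password Assignment", "regex": r"(?i)password\s*=\s*\S+", "confidence": "low"},
]

strict = select_patterns(patterns, minimum="high")   # quieter, may miss secrets
broad = select_patterns(patterns, minimum="low")     # noisier, catches more
print(len(strict), len(broad))
```

A team might run the strict set as a blocking check and the broad set as a non-blocking report that someone triages periodically.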

The project’s emphasis on preventing ReDoS is an important technical strength. Regex-based scanners can become bottlenecks or denial-of-service targets if patterns are not carefully tested. This repository addresses that by manually cleaning invalid regexes and running CI jobs for validation.
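
A validation pass of this kind can be approximated by trying to compile every pattern and collecting the failures. This is a sketch of the general idea, not the repository's CI job; the function name and sample entries are hypothetical, and Python's re dialect differs from the engines some scanners use, so a pattern that compiles here may still be rejected elsewhere.

```python
import re


def find_invalid(patterns):
    """Return (name, error message) pairs for patterns that fail to compile."""
    invalid = []
    for p in patterns:
        try:
            re.compile(p["regex"])
        except re.error as exc:
            invalid.append((p["name"], str(exc)))
    return invalid


# Hypothetical entries: the second one has an unclosed group on purpose.
candidates = [
    {"name": "AWS Access Key ID", "regex": r"AKIA[0-9A-Z]{16}"},
    {"name": "Broken Example", "regex": r"api_key=([A-Za-z0-9]{32}"},
]

invalid = find_invalid(candidates)
for name, message in invalid:
    print(f"{name}: {message}")
```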

However, the project is currently in beta, so users should be aware of potential instability or incomplete coverage for newer secret formats. Also, while the repository provides the regex patterns and conversion scripts, it does not itself implement a scanning engine. Users must integrate these rules into compatible scanners.

One limitation to note is that regex-based detection, even with a large pattern set, can only identify secrets that appear as matching text. It won’t catch secrets stored in encrypted or binary formats, nor will it detect secrets leaked through runtime behavior. Therefore, secrets-patterns-db complements but does not replace comprehensive application security practices.
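
The text-only limit is easy to demonstrate, and it applies to simple encodings as well as encryption. The sketch below uses a commonly cited AWS access key ID pattern and AWS's documented placeholder key; the variable names and the two sample lines are made up for this example. The pattern finds the key in plain text but not once the same key is base64-encoded.

```python
import base64
import re

# Commonly cited shape of an AWS access key ID; treat it as illustrative.
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")

# Placeholder key ID used in AWS documentation, not a real credential.
example_key = "AKIAIOSFODNN7EXAMPLE"

plain_line = f"aws_access_key_id = {example_key}"
encoded_line = "blob = " + base64.b64encode(example_key.encode()).decode()

found_plain = AWS_KEY_ID.search(plain_line) is not None
found_encoded = AWS_KEY_ID.search(encoded_line) is not None

print(found_plain, found_encoded)
```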

explore the project

Since no officially documented quickstart commands are provided, the best way to get started is to familiarize yourself with the repository structure and documentation:

  • The core of the project is the db/rules-stable.yml file, containing the full set of regex patterns.

  • Conversion scripts live under the scripts/ directory. These Python scripts transform the YAML regex database into formats compatible with secret scanners like TruffleHog and Gitleaks.

  • The README and repository docs explain the pattern categories, confidence levels, and testing procedures.

  • The project uses CI pipelines to validate regex correctness and prevent ReDoS risks, which is valuable context for maintainers and contributors.

Reviewing these resources will give you a solid understanding of how to leverage the extensive regex database and convert it for your preferred scanning tool.

verdict

secrets-patterns-db is a valuable resource for security engineers and developers aiming to improve secret detection in their codebases. Its extensive pattern set fills a gap left by popular scanners and offers a way to broaden coverage without building a new scanner.

That said, it’s not a plug-and-play scanner itself. You’ll need to integrate its regex database into compatible tools and be mindful of its beta status. The focus on ReDoS safety and confidence categorization shows a mature approach to real-world secret scanning challenges.

If you manage CI/CD pipelines or DevSecOps workflows that include secret scanning, secrets-patterns-db is worth exploring to upgrade your pattern coverage. Just remember that regex-based detection has inherent limitations and should be part of a layered security strategy rather than the sole defense.


→ GitHub Repo: mazen160/secrets-patterns-db ⭐ 1,440 · Python