Noureddine RAMDI / DeepTeam: A Python framework for adversarial red teaming of large language models

Created Sat, 23 May 2026 20:41:14 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

confident-ai/deepteam

Large language models (LLMs) are increasingly deployed in production, but their security and safety remain critical concerns. DeepTeam addresses this by providing a Python framework that lets you red team your LLM setup without needing to specify the exact model architecture or prepare datasets upfront. It dynamically generates adversarial prompts targeting specific vulnerabilities and evaluates the model’s responses, making it a useful tool for AI safety and robustness testing.

What DeepTeam does and how it works

At its core, DeepTeam is a Python package designed to automate adversarial testing of any LLM accessible via a simple callback interface. You provide a model_callback function that takes an input string and returns the model’s output string asynchronously. The framework then simulates attack scenarios, such as prompt injections, targeting defined vulnerabilities like bias or safety rule violations.

DeepTeam’s architecture relies on modular components:

  • Model callback abstraction: This lets DeepTeam interface with any LLM or application layer wrapping an LLM, without needing to know the underlying system.
  • Vulnerability definitions: These are pluggable classes representing specific risk types, e.g., Bias with parameters like the type of bias to check.
  • Attack strategies: Such as the PromptInjection attack, which dynamically generates adversarial inputs to probe the model’s weaknesses.
  • Metrics and evaluation: The framework uses vulnerability-specific metrics (like BiasMetric) to score the model’s outputs and determine pass/fail rates.

The system is asynchronous and Python-based, making it easy to integrate into existing Python environments. It also supports using standard AI safety frameworks like OWASP or NIST, so you can pick from established vulnerability taxonomies.

Technical strengths and tradeoffs

DeepTeam’s standout feature is its abstraction from the LLM system itself. You don’t need to hardcode which model you’re testing, which reflects real-world conditions where malicious actors don’t know your backend either. This makes DeepTeam flexible and adaptable across different LLM providers.

By dynamically generating adversarial prompts rather than relying on static test datasets, it simulates realistic attack vectors that evolve with the vulnerabilities you specify. This is a practical approach given how fast LLM attack surfaces change.

The codebase is clean and focuses on composability:

  • Vulnerabilities and attacks are defined as independent modules.
  • The red_team function orchestrates the testing flow asynchronously.
  • Metrics provide binary or probabilistic scoring, enabling quantitative risk assessment.

The tradeoff is that DeepTeam relies heavily on the model_callback you provide. It does not control or introspect the underlying model, so its effectiveness depends on the fidelity of that callback and the coverage of vulnerability modules. Also, the framework currently focuses on single-turn attacks like prompt injections, which may not cover multi-turn complex adversarial scenarios.

Quick start with DeepTeam

DeepTeam is straightforward to install and start with. The official quickstart from the README is:

pip install -U deepteam

Here’s a minimal example to red team an LLM for bias vulnerabilities against prompt injection attacks:

from deepteam import red_team
from deepteam.vulnerabilities import Bias
from deepteam.attacks.single_turn import PromptInjection

async def model_callback(input: str) -> str:
    # Replace this with your LLM application
    return f"I'm sorry but I can't answer this: {input}"

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[Bias(types=["race"])],
    attacks=[PromptInjection()]
)

Before running, set your environment variable OPENAI_API_KEY if you use OpenAI models or configure the callback accordingly. Then run your Python file:

python red_team_llm.py

This will:

  • Use your callback to generate responses to adversarial prompts.
  • Run prompt injection attacks targeting the defined bias vulnerability.
  • Evaluate outputs with the bias metric to compute a pass/fail risk score.

You can also use predefined safety frameworks like OWASP Top 10 to select vulnerabilities automatically.

DeepTeam’s place in the AI safety toolbox

DeepTeam is relevant for researchers, AI engineers, and security practitioners looking to automate adversarial testing of LLMs without building custom attack datasets. Its modular design and model-agnostic approach let you plug in any model or wrapper.

However, it’s not a silver bullet. Its effectiveness depends on the quality of your model callback and coverage of attack and vulnerability modules. It currently emphasizes prompt injection and bias vulnerabilities, so complex multi-turn or other attack vectors might require extending the framework.

The project strikes a good balance between flexibility and usability. It provides a structured way to simulate adversarial attacks and quantitatively assess risk, which is often missing in ad-hoc manual testing. For teams deploying LLMs in safety-critical contexts, DeepTeam offers a practical starting point for continuous adversarial evaluation.

If you’re building or maintaining LLM-based applications where safety and security matter, DeepTeam is worth exploring to add automated red teaming without heavy setup or proprietary tooling.


→ GitHub Repo: confident-ai/deepteam ⭐ 1,809 · Python