AgentShield: auditing AI agent security configurations with runtime confidence scoring

AgentShield tackles a real headache in AI agent development: how to audit and secure your agent configurations without drowning in false alarms from template files or example setups. In the wild, AI agent ecosystems like Claude Code often ship with default catalogs or shared config files that look risky but aren’t actually deployed at runtime. AgentShield scores and weights findings by runtime confidence, a straightforward yet effective way to focus security attention where it matters.

What AgentShield does and how it audits AI agent configurations

AgentShield is a TypeScript-based security auditor designed specifically for AI agent configurations, with a strong focus on Claude Code setups. It scans configuration files looking for a broad range of security issues across 102 rules divided into five categories: secrets, permissions, hooks, MCP servers, and agent prompt injections.

The tool inspects your local AI agent config files, typically found under ~/.claude/, and produces a graded security report. It’s distributed as a CLI that can be installed globally via npm or run without installation via npx. Beyond the CLI, AgentShield integrates as a GitHub Action and a GitHub App, making it convenient for CI/CD pipelines and repository-level audits.

Under the hood, the CLI entrypoint is in src/index.ts, built on the commander framework for command parsing. The scanner applies patterns to detect hardcoded secrets such as API keys (e.g., Anthropic keys), permission misconfigurations that allow overly broad command execution, hook injection vectors that could manipulate agent behavior, MCP server risks that expose runtime vulnerabilities, and prompt injection patterns that might compromise agent outputs.
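To make the pattern-based approach concrete, here is a minimal sketch of a line-by-line rule scanner. It is not AgentShield's actual code: the `Rule` and `Finding` types, the rule IDs, and both regular expressions are assumptions chosen to mirror two of the finding types described above (a hardcoded Anthropic-style key and a wildcard Bash permission).

```typescript
// Hypothetical types and rules for illustration; not AgentShield's real implementation.
type Severity = "critical" | "high" | "medium" | "low" | "info";

interface Finding {
  ruleId: string;
  severity: Severity;
  file: string;
  line: number; // 1-based line number of the match
  evidence: string; // redacted excerpt of the matched text
}

interface Rule {
  id: string;
  severity: Severity;
  pattern: RegExp; // tested against one line at a time (no global flag)
}

// Assumed patterns: an "sk-ant-" prefixed token and a wildcard Bash permission.
const rules: Rule[] = [
  { id: "secrets/anthropic-key", severity: "critical", pattern: /sk-ant-[A-Za-z0-9_-]{8,}/ },
  { id: "permissions/bash-wildcard", severity: "critical", pattern: /Bash\(\*\)/ },
];

// Keep the first eight and last four characters of long matches; short matches pass through.
function redact(text: string): string {
  return text.length <= 12 ? text : `${text.slice(0, 8)}...${text.slice(-4)}`;
}

// Apply every rule to every line and collect one finding per rule per matching line.
function scanText(file: string, content: string): Finding[] {
  const findings: Finding[] = [];
  content.split("\n").forEach((lineText, index) => {
    for (const rule of rules) {
      const match = rule.pattern.exec(lineText);
      if (match) {
        findings.push({
          ruleId: rule.id,
          severity: rule.severity,
          file,
          line: index + 1,
          evidence: redact(match[0]),
        });
      }
    }
  });
  return findings;
}
```

Keeping each rule as plain data (an ID, a severity, and a pattern) is one way to get the kind of modularity the article describes, since adding a rule means adding one entry to the list.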

The overall result is a letter grade from A to F with a numeric score from 0 to 100, broken down into category-specific scores. Each finding carries a severity level ranging from critical to informational; the sample report in the quick start section, for example, counts 73 findings: 19 critical, 29 high, 15 medium, 4 low, and 6 info.

The architecture is pragmatic and modular, which makes it easy to add rules, and some issues can be fixed automatically. Reports are generated in JSON and HTML formats, which eases integration with other tools or dashboards.

Runtime confidence system: reducing false positives from example configs

What sets AgentShield apart is its graded runtime confidence system. This mechanism distinguishes between active runtime configurations and non-runtime template or example files commonly shipped with repositories.

This distinction is crucial because many AI agent repos include sample MCP catalogs or example agent definitions that look risky but are never actually activated in production. Without this system, scanners flood developers with false positives.

AgentShield assigns a confidence score to each finding based on whether the config is detected as active at runtime. Findings from template or example files are weighted at 0.25x, and a single template file can deduct at most 10 points from any one score category. This scoring strategy keeps alerts on genuinely risky runtime configs while reducing noise from harmless examples.
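The weighting can be sketched as follows. Only the 0.25x template weight and the 10-point cap per category per template file come from the description above; the per-severity base penalties, the type names, and the way deductions are summed are assumptions made for illustration, not AgentShield's actual scoring code.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";
type Category = "secrets" | "permissions" | "hooks" | "mcp" | "agents";

interface ScoredFinding {
  category: Category;
  severity: Severity;
  file: string;
  runtime: boolean; // true if the config was detected as active at runtime
}

// Assumed base deductions per severity; the article does not state these values.
const BASE_PENALTY: Record<Severity, number> = {
  critical: 25,
  high: 15,
  medium: 5,
  low: 2,
  info: 0,
};

const TEMPLATE_WEIGHT = 0.25; // from the article
const TEMPLATE_FILE_CAP = 10; // from the article: max deduction per category from one template file

// Score one category from 100 down to 0.
function categoryScore(findings: ScoredFinding[], category: Category): number {
  let runtimeDeduction = 0;
  const templateByFile: Record<string, number> = {};

  for (const f of findings) {
    if (f.category !== category) continue;
    const base = BASE_PENALTY[f.severity];
    if (f.runtime) {
      // Runtime findings count at full weight.
      runtimeDeduction += base;
    } else {
      // Template findings count at reduced weight, capped per file.
      const soFar = templateByFile[f.file] ?? 0;
      templateByFile[f.file] = Math.min(TEMPLATE_FILE_CAP, soFar + base * TEMPLATE_WEIGHT);
    }
  }

  const templateDeduction = Object.keys(templateByFile).reduce(
    (sum, file) => sum + (templateByFile[file] ?? 0),
    0,
  );
  return Math.max(0, 100 - runtimeDeduction - templateDeduction);
}
```

With these assumed penalties, two critical findings in a single example file deduct 10 points (12.5 before the cap) instead of the 50 they would cost in an active runtime config.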

The same distinction is reflected in the exit codes: exit code 0 means the scan found no critical runtime issues, while exit code 2 flags critical findings that matter at runtime.
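In a CI script, those exit codes could be interpreted along these lines. This is a small sketch: only codes 0 and 2 are described above, so every other value is treated as unknown, and the function and type names are hypothetical.

```typescript
type ScanOutcome = "pass" | "critical-findings" | "unknown";

// Map a process exit status to an outcome. Only 0 and 2 are documented in the
// article; any other value (including null for a killed process) is "unknown".
function interpretExitCode(code: number | null): ScanOutcome {
  if (code === 0) return "pass";
  if (code === 2) return "critical-findings";
  return "unknown";
}

// Conservative policy (an assumption): fail the build on anything but a clean pass.
function shouldFailBuild(code: number | null): boolean {
  return interpretExitCode(code) !== "pass";
}
```

The `number | null` parameter matches the `status` field returned by Node's `child_process.spawnSync`, so the result of running the scan command from a script can be passed in directly.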

The tradeoff is clear: some risky examples might be under-weighted if they become active unexpectedly, which means users must still vet their config deployment practices carefully. However, the practical benefit is a far more actionable report.

Another interesting feature is the three-agent “Opus” adversarial analysis mode, which runs multiple agent instances to simulate complex injection and hooking attack scenarios. This adds depth to the scanning logic by testing dynamic agent interactions.

Auto-fix functionality is available for certain patterns, reducing manual remediation effort. For example, replacing hardcoded API keys with environment variable references can be done with --fix.
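As an illustration of what such a fix involves, the sketch below swaps anything that looks like an Anthropic-style key for an environment variable reference. The key pattern, the `${ANTHROPIC_API_KEY}` placeholder syntax, and the function name are assumptions; what --fix actually writes may differ.

```typescript
// Assumed key shape: "sk-ant-" followed by at least 8 URL-safe characters.
const ANTHROPIC_KEY_PATTERN = /sk-ant-[A-Za-z0-9_-]{8,}/g;

// Hypothetical placeholder; the reference syntax the real tool emits may differ.
const ENV_REFERENCE = "${ANTHROPIC_API_KEY}";

interface FixResult {
  content: string; // file content after replacement
  replacements: number; // how many keys were replaced
}

// Replace every hardcoded key with the environment variable reference.
function replaceHardcodedKeys(content: string): FixResult {
  let replacements = 0;
  // A replacer function avoids any special handling of "$" in replacement strings.
  const fixed = content.replace(ANTHROPIC_KEY_PATTERN, () => {
    replacements += 1;
    return ENV_REFERENCE;
  });
  return { content: fixed, replacements };
}
```

A fix like this only removes the literal from the file. A key that was ever committed should still be rotated, since it remains in version control history.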

Quick start: scanning your Claude Code config

AgentShield is easy to run without installation, thanks to npx:

# Scan your Claude Code config (no install required)
npx ecc-agentshield scan

For a global install:

# Or install globally
npm install -g ecc-agentshield
agentshield scan

The tool auto-discovers your ~/.claude/ directory and scans the config files it finds, skipping common generated directories such as node_modules and build outputs to avoid duplicate findings.
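The skip logic might look roughly like this. The directory list is an assumption: the description above names node_modules and "build outputs", so `dist` and `build` stand in for the latter, and the function names are hypothetical.

```typescript
// Assumed skip list; only node_modules is named explicitly in the article.
const SKIPPED_DIRECTORIES: readonly string[] = ["node_modules", "dist", "build"];

// A path is scannable if none of its segments is a skipped directory.
// Both "/" and "\" are treated as separators.
function isScannable(path: string): boolean {
  const segments = path.split(/[\\/]/);
  return !segments.some((segment) => SKIPPED_DIRECTORIES.indexOf(segment) !== -1);
}

// Keep only the candidate paths that should be scanned.
function filterScannable(paths: string[]): string[] {
  return paths.filter(isScannable);
}
```

Matching on whole path segments rather than substrings means a file such as `build-notes.md` is still scanned, while anything under a `build/` directory is not.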

Sample output looks like this:

  AgentShield Security Report

  Grade: F (0/100)

  Score Breakdown
  Secrets        ░░░░░░░░░░░░░░░░░░░░ 0
  Permissions    ░░░░░░░░░░░░░░░░░░░░ 0
  Hooks          ░░░░░░░░░░░░░░░░░░░░ 0
  MCP Servers    ░░░░░░░░░░░░░░░░░░░░ 0
  Agents         ░░░░░░░░░░░░░░░░░░░░ 0

  ● CRITICAL  Hardcoded Anthropic API key
    CLAUDE.md:13
    Evidence: sk-ant-a...cdef
    Fix: Replace with environment variable reference [auto-fixable]

  ● CRITICAL  Overly permissive allow rule: Bash(*)
    settings.json
    Evidence: Bash(*)
    Fix: Restrict to specific commands: Bash(git *), Bash(npm *), Bash(node *)

  Summary
  Files scanned: 6
  Findings: 73 total — 19 critical, 29 high, 15 medium, 4 low, 6 info
  Auto-fixable: 8 (use --fix)

Additional runtime management commands include:

# Install the PreToolUse runtime monitor
agentshield runtime install

# Back up invalid runtime files and restore a healthy install
agentshield runtime repair

Verdict: a focused tool for AI agent security auditing with practical tradeoffs

AgentShield is a practical tool built for a very specific but increasingly important niche: securing AI agent configurations, especially in Claude Code environments. Its strength lies in its runtime confidence scoring system that meaningfully reduces false positives from example and template files, a problem that plagues many static analyzers in this space.

The tool is well-structured, with a clean TypeScript codebase and a modular rule system that covers a wide range of security vectors from secrets to prompt injections. The inclusion of auto-fix and adversarial simulation modes adds value beyond simple scanning.

That said, AgentShield is not a silver bullet. Its runtime confidence model relies on heuristics and weighting that require users to understand their deployment context well. False negatives remain a risk if runtime detection is incomplete. Also, the tool’s focus on Claude Code means it may not be as effective for other AI agent frameworks.

For developers and security engineers working with Claude Code agents or similar AI agent platforms, AgentShield is a tool worth adding to your security arsenal. It helps catch a broad spectrum of misconfigurations and secrets exposure with actionable reports and remediation options.

Its CLI-first approach, together with the GitHub Action and GitHub App integrations, also supports real-world workflows where continuous security auditing is essential.

In short, AgentShield solves a real problem with a practical design. It’s worth exploring if you want to tighten security around your AI agents without drowning in false alarms from harmless examples.

→ GitHub Repo: affaan-m/agentshield ⭐ 584 · TypeScript

Noureddine RAMDI / AgentShield: auditing AI agent security configurations with runtime confidence scoring
