Claw-Eval offers a Python-based evaluation harness for LLM autonomous agents, featuring 300 tasks and a strict Pass^3 metric to ensure reliable, multi-dimensional benchmarking.
LLM-MM-Agent uses LLMs as autonomous agents for end-to-end mathematical modeling, featuring a hierarchical method library with actor-critic selection. It supports GPT-4o and DeepSeek-R1.
Minds Platform offers a Python-based AI foundation with autonomous agents and semantic search, designed for flexible enterprise deployment across cloud and on-prem environments.
Goal-Driven offers a prompt-based master-subagent architecture that sustains long-running AI problem-solving sessions through a verification-driven orchestration loop, requiring no code or frameworks.
The awesome-agent-evolution repo organizes 50+ open-source projects into a clear taxonomy of AI agent self-evolution and infrastructure layers, offering a practical ecosystem map for developers.
BoxPwnr benchmarks LLM-based autonomous agents on cybersecurity challenges using iterative command execution in a Kali Docker container, supporting 20+ LLMs and 13+ platforms.
Explore how aws-samples/remote-swe-agents runs autonomous software engineering agents on dedicated EC2 instances, orchestrated by AWS CDK, with a Next.js interface and Amazon Bedrock LLM integration.
Symphony by OpenAI orchestrates autonomous coding agents via work boards and proof-of-work validation, shifting AI coding from direct supervision to task-level management.