Overview
Posts: 6
GitHub Stars: 1328
Noureddine RAMDI
Dinour
Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation
France
noureddine@ramdi.fr
https://ramdi.fr
1 result for Agent-Evaluation
Claw-Eval: a rigorous Python harness for trustworthy evaluation of LLM-powered autonomous agents
Claw-Eval offers a Python-based evaluation harness for LLM autonomous agents, featuring 300 tasks and a strict Pass^3 metric to ensure reliable, multi-dimensional benchmarking.
Tags: github-stars, python, llm, agent-evaluation, sandboxing
Created: Sat, 23 May 2026 20:41:14 +0000
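The summary mentions a strict Pass^3 metric but does not define it. Here is a minimal sketch, assuming Pass^3 follows the common pass^k convention: a task counts as solved only if all k = 3 independent runs succeed. The per-task estimate is C(c, k) / C(n, k), computed from n recorded runs with c successes. The function names `pass_hat_k` and `benchmark_pass_hat_k` and the sample task results are hypothetical illustrations, not Claw-Eval's actual API or data.

```python
from math import comb


# Assumption: Pass^3 is read as the pass^k convention with k = 3.
# These helpers are hypothetical and not part of Claw-Eval.
def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that k runs drawn without replacement from
    n recorded runs (c of them successful) all succeed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when c < k, so any task with too few successes scores 0.
    return comb(c, k) / comb(n, k)


def benchmark_pass_hat_k(results: dict[str, list[bool]], k: int = 3) -> float:
    """Average pass^k over tasks; results maps a task id to per-run outcomes."""
    scores = [pass_hat_k(len(runs), sum(runs), k) for runs in results.values()]
    if not scores:
        raise ValueError("results must contain at least one task")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical outcomes for three tasks, each run three times.
    results = {
        "task-a": [True, True, True],   # all three runs pass: scores 1.0
        "task-b": [True, True, False],  # one failure: scores 0.0
        "task-c": [False, True, True],  # one failure: scores 0.0
    }
    print(benchmark_pass_hat_k(results, k=3))  # prints 0.3333333333333333
```

With exactly three runs per task, the estimate is 1 when all three runs pass and 0 otherwise. Under this reading, a single flaky failure sinks the whole task, which makes the metric stricter than averaging individual runs.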