Noureddine RAMDI

Noureddine RAMDI Dinour

Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation

1 result for Agent-Evaluation

Claw-Eval: a rigorous Python harness for trustworthy evaluation of LLM-powered autonomous agents
Claw-Eval is a Python evaluation harness for LLM-powered autonomous agents. It ships 300 tasks and scores them with a strict Pass^3 metric, aiming for reliable, multi-dimensional benchmarking.
Tags: github-stars, python, llm, agent-evaluation, sandboxing
Created: Sat, 23 May 2026 20:41:14 +0000
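
The summary names a Pass^3 metric without defining it. The sketch below assumes it follows the common pass^k convention, where a task counts as solved only if all k independent trials succeed (here k = 3). That reading is an assumption, not something this page confirms, and the function name `pass_hat_k` and the input layout are hypothetical rather than part of Claw-Eval's API.

```python
from typing import Dict, List


def pass_hat_k(results: Dict[str, List[bool]], k: int = 3) -> float:
    """Return the fraction of tasks whose first k trials all succeeded.

    Assumption: Pass^k means "every one of k independent trials passed".
    `results` maps a hypothetical task id to its per-trial outcomes.
    """
    if not results:
        return 0.0
    solved = 0
    for task_id, trials in results.items():
        if len(trials) < k:
            raise ValueError(f"task {task_id!r} has fewer than {k} trials")
        # A single failed trial disqualifies the task under this strict reading.
        if all(trials[:k]):
            solved += 1
    return solved / len(results)


# Illustrative data only, not real benchmark results.
example = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],
}
score = pass_hat_k(example)  # task_a counts, task_b does not
```

Under this reading, an agent that succeeds on two of three trials gets no credit for that task, which is what makes the metric stricter than an average pass rate.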