Noureddine RAMDI Dinour

Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation

6 results for Benchmarking

rPPG-Toolbox: unified benchmarking for camera-based physiological sensing
rPPG-Toolbox is a Python platform for benchmarking camera-based physiological sensing algorithms, unifying classical unsupervised and modern neural methods across multiple datasets with config-driven experimentation.
github-stars python machine-learning benchmarking physiological-sensing Created Mon, 06 Jul 2026 15:15:52 +0000
Claw-Eval: a rigorous Python harness for trustworthy evaluation of LLM-powered autonomous agents
Claw-Eval offers a Python-based evaluation harness for LLM-powered autonomous agents, featuring 300 tasks and a strict Pass^3 metric for reliable, multi-dimensional benchmarking.
github-stars python llm agent-evaluation sandboxing Created Sat, 23 May 2026 20:41:14 +0000
Harvey LAB: Benchmarking legal LLM agents with realistic tasks and automated scoring
Harvey LAB offers an open-source benchmark for evaluating LLM agents on realistic legal tasks using an all-pass rubric and LLM-as-judge scoring. It includes datasets, adapters, and dashboards.
github-stars python llm benchmarking legal-ai Created Sat, 23 May 2026 20:41:14 +0000
BoxPwnr: benchmarking autonomous LLM agents on cybersecurity challenges with iterative command execution
BoxPwnr benchmarks LLM-based autonomous agents on cybersecurity challenges through iterative command execution inside a Kali Linux Docker container, supporting more than 20 LLMs and more than 13 challenge platforms.
github-stars python llm cybersecurity benchmarking Created Mon, 04 May 2026 10:23:01 +0000
Cua: A unified stack for background desktop automation agents across macOS, Linux, Windows, and Android
Cua provides a multi-component, open-source stack for building and benchmarking computer-use agents that control full desktops on macOS, Linux, Windows, and Android without disrupting the user's focus.
github-stars automation macos virtualization multi-agent Created Sun, 26 Apr 2026 23:47:28 +0000
AutoGPT: A modular platform for continuous AI agents and workflow automation
AutoGPT is a Python-based platform for building and managing continuous AI agents that automate workflows, featuring a modular architecture, low-code agent creation, and benchmarking tools.
github-stars python ai-agents workflow-automation docker Created Sun, 26 Apr 2026 17:51:11 +0000