Noureddine RAMDI Dinour

Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation

4 results for Evaluation

Harvey LAB: Benchmarking legal LLM agents with realistic tasks and automated scoring
Harvey LAB offers an open-source benchmark for evaluating LLM agents on realistic legal tasks using an all-pass rubric and LLM-as-judge scoring. It includes datasets, adapters, and dashboards.
github-stars python llm benchmarking legal-ai Created Sat, 23 May 2026 20:41:14 +0000
Skill Conductor: Architecture-first lifecycle management for AI agent skills
Skill Conductor enforces design patterns and manages AI agent skills through a 5-mode lifecycle, avoiding common pitfalls such as the ‘description trap’ to make skill development more reliable.
github-stars python ai agent skill-management Created Sat, 23 May 2026 20:41:14 +0000
google/agents-cli: a Python CLI for AI agent lifecycle management on Google Cloud
google/agents-cli enhances coding assistants with skills for building, evaluating, and deploying AI agents on Google Cloud’s ADK. It offers a modular CLI workflow covering every stage from agent scaffolding to observability.
github-stars python cli ai-agent google-cloud Created Tue, 05 May 2026 16:46:42 +0000
Agenta: a comprehensive LLMOps platform for prompt management and evaluation
Agenta is an open-source TypeScript LLMOps platform offering prompt management, evaluation across 50+ models, and production observability with OpenTelemetry. It can be self-hosted via Docker Compose.
github-stars typescript llmops prompt-engineering evaluation Created Mon, 04 May 2026 10:23:02 +0000