<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Evaluation on Noureddine RAMDI</title><link>https://ramdi.fr/tags/evaluation/</link><description>Recent content in Evaluation on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/evaluation/index.xml" rel="self" type="application/rss+xml"/><item><title>Harvey LAB: Benchmarking legal LLM agents with realistic tasks and automated scoring</title><link>https://ramdi.fr/github-stars/harvey-lab-benchmarking-legal-llm-agents-with-realistic-tasks-and-automated-scoring/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/harvey-lab-benchmarking-legal-llm-agents-with-realistic-tasks-and-automated-scoring/</guid><description>Harvey LAB offers an open-source benchmark for evaluating LLM agents on realistic legal tasks using an all-pass rubric and LLM-as-judge scoring. 
It includes datasets, adapters, and dashboards.</description></item><item><title>Skill Conductor: Architecture-first lifecycle management for AI agent skills</title><link>https://ramdi.fr/github-stars/skill-conductor-architecture-first-lifecycle-management-for-ai-agent-skills/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/skill-conductor-architecture-first-lifecycle-management-for-ai-agent-skills/</guid><description>Skill Conductor enforces design patterns and uses a 5-mode lifecycle to manage AI agent skills, avoiding common pitfalls like the &amp;lsquo;description trap&amp;rsquo; for more reliable skill development.</description></item><item><title>google/agents-cli: a Python CLI for AI agent lifecycle management on Google Cloud</title><link>https://ramdi.fr/github-stars/google-agents-cli-a-python-cli-for-ai-agent-lifecycle-management-on-google-cloud/</link><pubDate>Tue, 05 May 2026 16:46:42 +0000</pubDate><guid>https://ramdi.fr/github-stars/google-agents-cli-a-python-cli-for-ai-agent-lifecycle-management-on-google-cloud/</guid><description>google/agents-cli enhances coding assistants with skills for building, evaluating, and deploying AI agents on Google Cloud&amp;rsquo;s ADK. It offers a modular CLI workflow covering everything from agent scaffolding to observability.</description></item><item><title>Agenta: a comprehensive LLMOps platform for prompt management and evaluation</title><link>https://ramdi.fr/github-stars/agenta-a-comprehensive-llmops-platform-for-prompt-management-and-evaluation/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/agenta-a-comprehensive-llmops-platform-for-prompt-management-and-evaluation/</guid><description>Agenta is an open-source TypeScript LLMOps platform offering prompt management, evaluation across 50+ models, and production observability with OpenTelemetry. Self-host via Docker Compose.</description></item></channel></rss>