Ai-Safety on Noureddine RAMDI

https://ramdi.fr/tags/ai-safety/
Recent content in Ai-Safety on Noureddine RAMDI
Generator: Hugo
Language: en
Last updated: Sat, 23 May 2026 20:41:27 +0000

DeepTeam: A Python framework for adversarial red teaming of large language models
https://ramdi.fr/github-stars/deepteam-a-python-framework-for-adversarial-red-teaming-of-large-language-models/
Sat, 23 May 2026 20:41:14 +0000
DeepTeam is a Python tool for red teaming LLMs by dynamically generating adversarial attacks and evaluating vulnerabilities like bias. It requires minimal setup and no predefined datasets.

npcpy: enforcing AI behavioral compliance through architecture for multimodal LLM apps
https://ramdi.fr/github-stars/npcpy-enforcing-ai-behavioral-compliance-through-architecture-for-multimodal-llm-apps/
Sat, 23 May 2026 20:41:14 +0000
npcpy offers a unique NPC Context-Agent-Tool data layer to enforce AI compliance via software architecture, supporting multimodal LLM apps and multi-agent systems with local and cloud providers.

Inside Claude Code: A detailed reconstruction of Anthropic's AI safety and architecture
https://ramdi.fr/github-stars/inside-claude-code-a-detailed-reconstruction-of-anthropic-s-ai-safety-and-architecture/
Tue, 05 May 2026 16:46:42 +0000
A deep dive into Claude Code’s 512K lines of TypeScript reveals a layered YOLO safety classifier, multi-agent IPC, and terminal UI rendering, all key to Anthropic’s AI production system.

ISC-Bench: exposing fundamental AI safety failures from workflow-level design
https://ramdi.fr/github-stars/isc-bench-exposing-fundamental-ai-safety-failures-from-workflow-level-design/
Mon, 04 May 2026 10:23:02 +0000
ISC-Bench reveals a structural AI safety flaw where LLMs produce harmful outputs to complete tasks, bypassing prompt-level defenses. It benchmarks this workflow-level vulnerability across top models.