ai-interview-codex offers a practical AI interview prep guide featuring iterative system design for Agentic AI and RAG, with benchmarks and production insights for ML, LLM, and system design roles.
Arkon is a self-hosted enterprise knowledge hub using a novel MRP pipeline for structured, traceable wiki compilation with external AI inference and workspace-scoped RBAC.
AutoSkill is a Python framework enabling LLM agents to extract, version, and evolve skills from dialogues, providing a persistent long-term memory system for AI agents.
ChatTutor integrates AI tutors with visual tools like GeoGebra in a full-stack Vue + Bun app. It supports multiple LLM providers and offers a digital whiteboard for interactive STEM learning.
Claw-Eval offers a Python-based evaluation harness for LLM autonomous agents, featuring 300 tasks and a strict Pass^3 metric to ensure reliable, multi-dimensional benchmarking.
Comic Translate translates comics across languages with a multi-step pipeline that combines speech bubble detection, OCR, and LLMs with full-page context.
DeepTeam is a Python tool for red teaming LLMs that dynamically generates adversarial attacks and evaluates vulnerabilities such as bias. It requires minimal setup and no predefined datasets.
DeepZero automates vulnerability research on Windows kernel drivers by chaining Ghidra decompilation with LLM-based analysis using YAML pipelines and Jinja2 templates.
Dot bundles local LLM inference, retrieval-augmented generation (RAG), and text-to-speech into a single offline Electron app, enabling document QA without cloud dependencies.
FuzzyAI combines fuzz testing with AI models using Python and Ollama, offering a CLI for fuzzing with local LLMs.
Harvey LAB offers an open-source benchmark for evaluating LLM agents on realistic legal tasks using an all-pass rubric and LLM-as-judge scoring. It includes datasets, adapters, and dashboards.
Mini-SGLang is a modular Python reimplementation of the SGLang LLM inference engine with production features like Radix Cache, chunked prefill, overlap scheduling, and tensor parallelism.
PicoAgents is a Python multi-agent framework built from scratch, offering transparent agent orchestration, LLM provider abstraction, streaming UI, and production-ready benchmarks.
Xalgorix is a Go-based autonomous pentesting platform driven by LLMs, featuring a 22-phase methodology from recon to exploit verification, with live telemetry and reporting.
IntellAgent is a Python framework that stress-tests conversational AI agents by generating structured adversarial dialogues via policy graph decomposition, helping uncover blind spots before production.
Kimi-Audio combines continuous acoustic and discrete semantic tokens within a 7B LLM for unified audio-text understanding and generation. It achieves state-of-the-art ASR with low-latency audio synthesis.
LiveCaptions Translator uses Windows 11’s on-device LiveCaptions for real-time speech translation via multiple LLM and traditional translation APIs, packaged as a C# desktop app.
LiveTradeBench benchmarks LLM trading agents such as GPT and Claude in live US equity and prediction markets with real-time news and sentiment integration.
LLM-MM-Agent uses LLMs as autonomous agents for end-to-end mathematical modeling, featuring a hierarchical method library with actor-critic selection. It supports GPT-4o and DeepSeek-R1.
LLM4Pentest aggregates 40+ research papers and tools tracking the evolving role of LLMs in automated penetration testing, highlighting progress and limitations.