Llm on Noureddine RAMDI
https://ramdi.fr/tags/llm/
Recent content in Llm on Noureddine RAMDI
Generator: Hugo, language: en
Sat, 23 May 2026 20:41:27 +0000

ai-interview-codex: iterative AI system design and interview prep with real-world benchmarks
https://ramdi.fr/github-stars/ai-interview-codex-iterative-ai-system-design-and-interview-prep-with-real-world-benchmarks/
Sat, 23 May 2026 20:41:14 +0000
ai-interview-codex offers a practical AI interview prep guide featuring iterative system design for Agentic AI and RAG, with benchmarks and production insights for ML, LLM, and system design roles.

Arkon: Structured enterprise knowledge synthesis with a unique LLM compilation pipeline
https://ramdi.fr/github-stars/arkon-structured-enterprise-knowledge-synthesis-with-a-unique-llm-compilation-pipeline/
Sat, 23 May 2026 20:41:14 +0000
Arkon is a self-hosted enterprise knowledge hub using a novel MRP pipeline for structured, traceable wiki compilation with external AI inference and workspace-scoped RBAC.

AutoSkill: Experience-driven lifelong learning for LLM agents with skill versioning and evolution
https://ramdi.fr/github-stars/autoskill-experience-driven-lifelong-learning-for-llm-agents-with-skill-versioning-and-evolution/
Sat, 23 May 2026 20:41:14 +0000
AutoSkill is a Python framework enabling LLM agents to extract, version, and evolve skills from dialogues, providing a persistent long-term memory system for AI agents.

ChatTutor: enabling AI tutors to teach STEM visually with a Vue + Bun full-stack
https://ramdi.fr/github-stars/chattutor-enabling-ai-tutors-to-teach-stem-visually-with-a-vue-bun-full-stack/
Sat, 23 May 2026 20:41:14 +0000
ChatTutor integrates AI tutors with visual tools like Geogebra in a Vue + Bun full-stack. It supports multiple LLM providers and offers a digital whiteboard for interactive STEM learning.

Claw-Eval: a rigorous Python harness for trustworthy evaluation of LLM-powered autonomous agents
https://ramdi.fr/github-stars/claw-eval-a-rigorous-python-harness-for-trustworthy-evaluation-of-llm-powered-autonomous-agents/
Sat, 23 May 2026 20:41:14 +0000
Claw-Eval offers a Python-based evaluation harness for LLM autonomous agents, featuring 300 tasks and a strict Pass^3 metric to ensure reliable, multi-dimensional benchmarking.

Comic Translate: AI-driven multi-language comic translation with full-page context
https://ramdi.fr/github-stars/comic-translate-ai-driven-multi-language-comic-translation-with-full-page-context/
Sat, 23 May 2026 20:41:14 +0000
Comic Translate uses advanced AI models and a multi-step pipeline for accurate comic translation across languages, combining speech bubble detection, OCR, and LLMs with full-page context.

DeepTeam: A Python framework for adversarial red teaming of large language models
https://ramdi.fr/github-stars/deepteam-a-python-framework-for-adversarial-red-teaming-of-large-language-models/
Sat, 23 May 2026 20:41:14 +0000
DeepTeam is a Python tool for red teaming LLMs by dynamically generating adversarial attacks and evaluating vulnerabilities like bias. It requires minimal setup and no predefined datasets.

DeepZero: Automating Windows Kernel Driver Vulnerability Research with YAML-Driven LLM Pipelines
https://ramdi.fr/github-stars/deepzero-automating-windows-kernel-driver-vulnerability-research-with-yaml-driven-llm-pipelines/
Sat, 23 May 2026 20:41:14 +0000
DeepZero automates vulnerability research on Windows kernel drivers by chaining Ghidra decompilation with LLM-based analysis using YAML pipelines and Jinja2 templates.

Dot: an offline Electron desktop app for local LLM inference and document QA
https://ramdi.fr/github-stars/dot-an-offline-electron-desktop-app-for-local-llm-inference-and-document-qa/
Sat, 23 May 2026 20:41:14 +0000
Dot bundles local LLM inference, Retrieval Augmented Generation, and Text-To-Speech into a single offline Electron app, enabling document QA without cloud dependencies.

FuzzyAI: AI-Driven Fuzz Testing with Local LLM Integration
https://ramdi.fr/github-stars/fuzzyai-ai-driven-fuzz-testing-with-local-llm-integration/
Sat, 23 May 2026 20:41:14 +0000
FuzzyAI combines fuzz testing with AI models using Python and Ollama. It offers a CLI for fuzzing with local LLMs, balancing AI power and practical setup tradeoffs.

Harvey LAB: Benchmarking legal LLM agents with realistic tasks and automated scoring
https://ramdi.fr/github-stars/harvey-lab-benchmarking-legal-llm-agents-with-realistic-tasks-and-automated-scoring/
Sat, 23 May 2026 20:41:14 +0000
Harvey LAB offers an open-source benchmark for evaluating LLM agents on realistic legal tasks using an all-pass rubric and LLM-as-judge scoring. It includes datasets, adapters, and dashboards.

Inside Mini-SGLang: A clear and modular Python LLM inference engine
https://ramdi.fr/github-stars/inside-mini-sglang-a-clear-and-modular-python-llm-inference-engine/
Sat, 23 May 2026 20:41:14 +0000
Mini-SGLang is a modular Python reimplementation of the SGLang LLM inference engine with production features like Radix Cache, chunked prefill, overlap scheduling, and tensor parallelism.

Inside picoagents: a transparent multi-agent system framework built from scratch in Python
https://ramdi.fr/github-stars/inside-picoagents-a-transparent-multi-agent-system-framework-built-from-scratch-in-python/
Sat, 23 May 2026 20:41:14 +0000
PicoAgents is a Python multi-agent framework built from scratch, offering transparent agent orchestration, LLM provider abstraction, streaming UI, and production-ready benchmarks.

Inside Xalgorix: an LLM-driven autonomous pentesting platform with a 22-phase testing pipeline
https://ramdi.fr/github-stars/inside-xalgorix-an-llm-driven-autonomous-pentesting-platform-with-a-22-phase-testing-pipeline/
Sat, 23 May 2026 20:41:14 +0000
Xalgorix is a Go-based autonomous pentesting platform driven by LLMs, featuring a 22-phase methodology from recon to exploit verification, with live telemetry and reporting.

IntellAgent: systematic adversarial testing for conversational AI with policy graph decomposition
https://ramdi.fr/github-stars/intellagent-systematic-adversarial-testing-for-conversational-ai-with-policy-graph-decomposition/
Sat, 23 May 2026 20:41:14 +0000
IntellAgent is a Python framework that stress-tests conversational AI agents by generating structured adversarial dialogues via policy graph decomposition, helping uncover blind spots before production.

Kimi-Audio: a unified hybrid-token audio foundation model with LLM core
https://ramdi.fr/github-stars/kimi-audio-a-unified-hybrid-token-audio-foundation-model-with-llm-core/
Sat, 23 May 2026 20:41:14 +0000
Kimi-Audio combines continuous acoustic and discrete semantic tokens within a 7B LLM for unified audio-text understanding and generation. It achieves state-of-the-art ASR with low-latency audio synthesis.

LiveCaptions Translator: Real-time speech translation using Windows 11's built-in captions and LLM APIs
https://ramdi.fr/github-stars/livecaptions-translator-real-time-speech-translation-using-windows-11-s-built-in-captions-and-llm-apis/
Sat, 23 May 2026 20:41:14 +0000
LiveCaptions Translator taps Windows 11’s on-device LiveCaptions for real-time speech translation via multiple LLM and traditional APIs, all in a sleek C# desktop app.

LiveTradeBench: Evaluating LLM-driven trading agents in live markets
https://ramdi.fr/github-stars/livetradebench-evaluating-llm-driven-trading-agents-in-live-markets/
Sat, 23 May 2026 20:41:14 +0000
LiveTradeBench benchmarks LLM trading agents like GPT and Claude in live US equity and prediction markets with real-time news and sentiment integration.

LLM-MM-Agent: autonomous mathematical modeling with hierarchical method selection
https://ramdi.fr/github-stars/llm-mm-agent-autonomous-mathematical-modeling-with-hierarchical-method-selection/
Sat, 23 May 2026 20:41:14 +0000
LLM-MM-Agent uses LLMs as autonomous agents for end-to-end mathematical modeling, featuring a unique hierarchical method library with actor-critic selection. Supports GPT-4o and DeepSeek-R1.

LLM4Pentest: A curated knowledge hub on large language models for automated penetration testing
https://ramdi.fr/github-stars/llm4pentest-a-curated-knowledge-hub-on-large-language-models-for-automated-penetration-testing/
Sat, 23 May 2026 20:41:14 +0000
LLM4Pentest aggregates 40+ research papers and tools tracking the evolving role of LLMs in automated penetration testing, highlighting progress and limitations.

llmstxt_architect: automated generation and maintenance of llms.txt files for LLM-aware websites
https://ramdi.fr/github-stars/llmstxt-architect-automated-generation-and-maintenance-of-llms-txt-files-for-llm-aware-websites/
Sat, 23 May 2026 20:41:14 +0000
llmstxt_architect automates generating and updating llms.txt files that communicate website content to LLMs. Supports multi-provider LLMs and preserves file structure during updates.

macai: a unified native macOS AI chat client for multiple LLM providers
https://ramdi.fr/github-stars/macai-a-unified-native-macos-ai-chat-client-for-multiple-llm-providers/
Sat, 23 May 2026 20:41:14 +0000
macai is a native macOS AI chat client unifying access to major LLM providers with iCloud Sync and local inference support, offering a minimalist cross-device AI chat experience.

MarkPDFDown: converting PDFs to Markdown using vision-capable large language models
https://ramdi.fr/github-stars/markpdfdown-converting-pdfs-to-markdown-using-vision-capable-large-language-models/
Sat, 23 May 2026 20:41:14 +0000
MarkPDFDown is a Python CLI tool that converts PDFs and images into Markdown by using vision-capable large language models for visual recognition-based parsing, handling complex layouts and formulas.

Minimalist Python AI demos: exploring qxresearch-event-1's concise LLM patterns
https://ramdi.fr/github-stars/minimalist-python-ai-demos-exploring-qxresearch-event-1-s-concise-llm-patterns/
Sat, 23 May 2026 20:41:14 +0000
qxresearch-event-1 is a collection of 50+ minimalist Python apps showcasing core AI patterns like fine-tuning, vector DB, and Whisper in about 10 lines each. A practical learning resource.

Navigating the evolving landscape of LLM-based multi-agent systems: A survey paper repository
https://ramdi.fr/github-stars/navigating-the-evolving-landscape-of-llm-based-multi-agent-systems-a-survey-paper-repository/
Sat, 23 May 2026 20:41:14 +0000
A curated and frequently updated bibliography accompanying the IJCAI 2024 survey paper on LLM-based multi-agent systems, organizing research into five key categories and revealing emerging trends.

Navigating the LLM engineer handbook: a curated map for production-grade language models
https://ramdi.fr/github-stars/navigating-the-llm-engineer-handbook-a-curated-map-for-production-grade-language-models/
Sat, 23 May 2026 20:41:14 +0000
The LLM Engineer Handbook catalogs the full lifecycle of large language model engineering, from pretraining to prompt management, guiding engineers beyond demos to production-ready LLM apps.

NomAI: a multi-step AI nutrition analysis app combining Flutter and FastAPI
https://ramdi.fr/github-stars/nomai-a-multi-step-ai-nutrition-analysis-app-combining-flutter-and-fastapi/
Sat, 23 May 2026 20:41:14 +0000
NomAI combines a Flutter app with a FastAPI backend using a multi-step LLM pipeline and web-grounded reasoning for nutrition analysis and meal tracking.

npcpy: enforcing AI behavioral compliance through architecture for multimodal LLM apps
https://ramdi.fr/github-stars/npcpy-enforcing-ai-behavioral-compliance-through-architecture-for-multimodal-llm-apps/
Sat, 23 May 2026 20:41:14 +0000
npcpy offers a unique NPC Context-Agent-Tool data layer to enforce AI compliance via software architecture, supporting multimodal LLM apps and multi-agent systems with local and cloud providers.

obsidian-llm-wiki-local: local-first AI-powered wiki generation with human-in-the-loop feedback
https://ramdi.fr/github-stars/obsidian-llm-wiki-local-local-first-ai-powered-wiki-generation-with-human-in-the-loop-feedback/
Sat, 23 May 2026 20:41:14 +0000
obsidian-llm-wiki-local generates interlinked Obsidian markdown wikis using local LLMs. Its standout feature is a rejection feedback loop that refines article quality via user input.

Open Computer Use: orchestrating multi-model LLM pipelines for remote Linux desktop control
https://ramdi.fr/github-stars/open-computer-use-orchestrating-multi-model-llm-pipelines-for-remote-linux-desktop-control/
Sat, 23 May 2026 20:41:14 +0000
Open Computer Use uses a modular three-stage LLM pipeline to control a cloud Linux desktop, combining grounding, vision, and action models for flexible AI-driven automation.

OpenAgents: orchestrating multi-agent LLM workflows with Flask and Next.js
https://ramdi.fr/github-stars/openagents-orchestrating-multi-agent-llm-workflows-with-flask-and-next-js/
Sat, 23 May 2026 20:41:14 +0000
OpenAgents hosts three specialized LLM agents (Data, Plugins, Web) via a Flask API and Next.js UI, integrating sandboxed code execution, plugin selection, and browser automation.

OpenAnt: An LLM-powered two-stage vulnerability discovery tool with exploit validation
https://ramdi.fr/github-stars/openant-an-llm-powered-two-stage-vulnerability-discovery-tool-with-exploit-validation/
Sat, 23 May 2026 20:41:14 +0000
OpenAnt uses a two-stage LLM pipeline to detect and validate code vulnerabilities across multiple languages, reducing false positives by verifying exploits automatically.

OpenChronicle: an AX-first local memory layer for LLM agents
https://ramdi.fr/github-stars/openchronicle-an-ax-first-local-memory-layer-for-llm-agents/
Sat, 23 May 2026 20:41:14 +0000
OpenChronicle captures macOS accessibility events to build structured local memory for LLM agents. Its async pipeline produces persistent Markdown memory and an SQLite index.

OptiLLM: transparent inference-time scaling for improved LLM reasoning
https://ramdi.fr/github-stars/optillm-transparent-inference-time-scaling-for-improved-llm-reasoning/
Sat, 23 May 2026 20:41:14 +0000
OptiLLM is an OpenAI-compatible inference proxy that boosts LLM reasoning with 20+ techniques like Mixture of Agents and MCTS, requiring no model retraining. Use a simple prefix to improve accuracy 2-10x.

Paper2Any: multi-modal AI pipeline converting academic papers into editable scientific artifacts
https://ramdi.fr/github-stars/paper2any-multi-modal-ai-pipeline-converting-academic-papers-into-editable-scientific-artifacts/
Sat, 23 May 2026 20:41:14 +0000
Paper2Any uses chained LLM calls with structured output to convert academic papers into editable scientific figures, slides, and diagrams via a FastAPI backend and React frontend.

ReasoningBank: Experience-Driven Memory as a New Scaling Dimension for AI Agents
https://ramdi.fr/github-stars/reasoningbank-experience-driven-memory-as-a-new-scaling-dimension-for-ai-agents/
Sat, 23 May 2026 20:41:14 +0000
ReasoningBank introduces memory-aware test-time scaling for AI agents by storing reasoning traces from both successes and failures, enabling self-evolution through experience.

SuperClaude: Meta-programming Claude Code into a structured AI development platform
https://ramdi.fr/github-stars/superclaude-meta-programming-claude-code-into-a-structured-ai-development-platform/
Sat, 23 May 2026 20:41:14 +0000
SuperClaude transforms Claude Code into a structured AI development platform using behavioral instruction injection, 30 slash commands, 20 specialized agents, and 8 MCP server integrations for faster, token-efficient workflows.

SupoClip: self-hostable AI-powered video clipping with multi-LLM backend abstraction
https://ramdi.fr/github-stars/supoclip-self-hostable-ai-powered-video-clipping-with-multi-llm-backend-abstraction/
Sat, 23 May 2026 20:41:14 +0000
SupoClip is an open-source self-hosted AI video clipper using AssemblyAI transcription and multiple LLM backends including local Ollama. It runs on Docker Compose with FastAPI and Next.js.

Swark: generating architecture diagrams from code using GitHub Copilot in VS Code
https://ramdi.fr/github-stars/swark-generating-architecture-diagrams-from-code-using-github-copilot-in-vs-code/
Sat, 23 May 2026 20:41:14 +0000
Swark is a VS Code extension that creates Mermaid.js architecture diagrams from any code using GitHub Copilot’s free tier via the VS Code Language Model API, with no API keys needed.

vLLM Compressor: Practical quantization and compression for large language model inference
https://ramdi.fr/github-stars/vllm-compressor-practical-quantization-and-compression-for-large-language-model-inference/
Sat, 23 May 2026 20:41:14 +0000
vLLM Compressor applies advanced quantization and compression techniques to large language models, enabling optimized inference without requiring full model definitions.

Weave: a Go microkernel platform for hot-pluggable AI application development
https://ramdi.fr/github-stars/weave-a-go-microkernel-platform-for-hot-pluggable-ai-application-development/
Sat, 23 May 2026 20:41:14 +0000
Weave is a Go-based AI platform with a microkernel architecture that supports hot-pluggable AI plugins and dynamic multi-model switching. Deploy with Docker Compose for rapid development.

yoagent: a minimal Rust AI agent runtime with multi-provider LLM support
https://ramdi.fr/github-stars/yoagent-a-minimal-rust-ai-agent-runtime-with-multi-provider-llm-support/
Sat, 23 May 2026 20:41:14 +0000
yoagent is a minimal Rust library implementing an AI agent loop with multi-provider LLM support, built-in tools, and event streaming for clean, extensible agent workflows.

Aider: precise AI pair programming with whole-codebase awareness
https://ramdi.fr/github-stars/aider-precise-ai-pair-programming-with-whole-codebase-awareness/
Sat, 09 May 2026 11:42:26 +0000
Aider is a terminal-based AI pair programming tool that builds a repository map for full codebase context, enabling precise, developer-controlled edits with multi-LLM support and git integration.

Inside Claude Code From Scratch: A practical reconstruction of Anthropic's coding agent
https://ramdi.fr/github-stars/inside-claude-code-from-scratch-a-practical-reconstruction-of-anthropic-s-coding-agent/
Sat, 09 May 2026 11:42:26 +0000
Claude Code From Scratch distills Anthropic’s 500K+ line coding agent into ~8,000 lines of Python and TypeScript, revealing core architecture like the Agent Loop, semantic memory, multi-agent skills, and context compression.

OpenAlpha_Evolve: autonomous code evolution with LLM-driven diff mutations
https://ramdi.fr/github-stars/openalpha-evolve-autonomous-code-evolution-with-llm-driven-diff-mutations/
Sat, 09 May 2026 11:42:26 +0000
OpenAlpha_Evolve uses large language models to generate precise code diffs as mutations in an evolutionary algorithm, enabling autonomous iterative code improvement with sandboxed evaluation.

RAGFlow: a modular, agentic retrieval-augmented generation engine with deep document understanding
https://ramdi.fr/github-stars/ragflow-a-modular-agentic-retrieval-augmented-generation-engine-with-deep-document-understanding/
Wed, 06 May 2026 18:58:37 +0000
RAGFlow is an open-source Python RAG engine combining deep document parsing, configurable pipelines, agentic workflows, and sandboxed code execution for LLM context management.

rtk: A Rust CLI proxy that cuts LLM token usage by up to 90% with transparent command rewriting
https://ramdi.fr/github-stars/rtk-a-rust-cli-proxy-that-cuts-llm-token-usage-by-up-to-90-with-transparent-command-rewriting/
Wed, 06 May 2026 18:58:37 +0000
rtk is a Rust CLI proxy that intercepts shell commands to reduce LLM token consumption by 60-90% using a transparent Bash hook and output filtering, supporting 100+ commands.

10x CLI coding agent: tiered AI model routing for faster coding workflows
https://ramdi.fr/github-stars/10x-cli-coding-agent-tiered-ai-model-routing-for-faster-coding-workflows/
Tue, 05 May 2026 22:24:55 +0000
10x is a TypeScript CLI coding agent that speeds coding up to 20x by routing tasks across a tiered AI model system with customizable multi-step workflows called Superpowers.

Langchain-Chatchat: A model-agnostic orchestration layer for Chinese-language RAG and Agents
https://ramdi.fr/github-stars/langchain-chatchat-a-model-agnostic-orchestration-layer-for-chinese-language-rag-and-agents/
Tue, 05 May 2026 22:24:55 +0000
Langchain-Chatchat offers a flexible, offline-capable orchestration layer for multiple Chinese LLMs and RAG approaches, enabling seamless model swaps across frameworks without code changes.

Langfuse: Simplifying LLM observability with decorator-based tracing
https://ramdi.fr/github-stars/langfuse-simplifying-llm-observability-with-decorator-based-tracing/
Tue, 05 May 2026 22:24:55 +0000
Langfuse provides end-to-end observability for LLM applications with automatic tracing via an @observe() decorator, enabling teams to debug and manage AI workflows efficiently.

Open Cowork: Desktop AI Agent with VM-level Sandbox Isolation for Safer AI Workflows
https://ramdi.fr/github-stars/open-cowork-desktop-ai-agent-with-vm-level-sandbox-isolation-for-safer-ai-workflows/
Tue, 05 May 2026 22:24:55 +0000
Open Cowork wraps multiple LLMs in a cross-platform desktop app with unique VM-level sandboxing using WSL2 and Lima for safe AI agent command execution.

Quivr: A Python framework for flexible retrieval-augmented generation pipelines
https://ramdi.fr/github-stars/quivr-a-python-framework-for-flexible-retrieval-augmented-generation-pipelines/
Tue, 05 May 2026 22:24:55 +0000
Quivr is a Python framework offering an opinionated, pluggable retrieval-augmented generation pipeline with multi-LLM support and YAML-defined workflows for flexible knowledge retrieval.

Inside Genie Sim 3.0: LLM-driven embodied AI simulation with high-fidelity 3D scenes
https://ramdi.fr/github-stars/inside-genie-sim-3-0-llm-driven-embodied-ai-simulation-with-high-fidelity-3d-scenes/
Tue, 05 May 2026 16:46:42 +0000
Genie Sim 3.0 is an open-source platform combining 3D Gaussian Splatting and LLM-driven scene generation for embodied AI simulation, offering large-scale synthetic data and low sim-to-real discrepancy.

Open Deep Research: A Next.js 16 agentic AI assistant for iterative web research
https://ramdi.fr/github-stars/open-deep-research-a-next-js-16-agentic-ai-assistant-for-iterative-web-research/
Tue, 05 May 2026 16:46:42 +0000
Open Deep Research is a TypeScript Next.js 16 app that uses an LLM to plan, execute, and iterate web research via Exa and Upstash QStash, producing sourced reports with images.

Action100M: Hierarchical Tree-of-Captions for Multi-Scale Video Understanding
https://ramdi.fr/github-stars/action100m-hierarchical-tree-of-captions-for-multi-scale-video-understanding/
Tue, 05 May 2026 13:37:39 +0000
Action100M provides a hierarchical Tree-of-Captions annotation for 100M video segments, enabling multi-scale video understanding with LLM-generated captions. Explore its structure, tech strengths, and how to access the data.

Alibaba's Qwen3.6: Efficient large-scale LLMs with gated delta networks and sparse MoE
https://ramdi.fr/github-stars/alibaba-s-qwen3-6-efficient-large-scale-llms-with-gated-delta-networks-and-sparse-moe/
Tue, 05 May 2026 13:37:39 +0000
Qwen3.6 from Alibaba uses gated delta networks and sparse Mixture-of-Experts to achieve near-397B parameter model performance with only 3B active parameters, supporting 201 languages and 262k context length.

autoMate: a local-first AI hub exposing 40+ tools via MCP-over-HTTP
https://ramdi.fr/github-stars/automate-a-local-first-ai-hub-exposing-40-tools-via-mcp-over-http/
Tue, 05 May 2026 13:37:39 +0000
autoMate exposes 40+ AI tools and 31 SaaS APIs via MCP-over-HTTP on localhost, with encrypted storage and multi-provider LLM support. A local AI infrastructure hub with privacy-first design.

Building production-ready RAG workflows with n8n using free JSON templates
https://ramdi.fr/github-stars/building-production-ready-rag-workflows-with-n8n-using-free-json-templates/
Tue, 05 May 2026 13:37:39 +0000
Explore over 200 pre-built n8n workflow templates integrating vector databases, embedding models, and LLMs for rapid RAG workflow prototyping and deployment without coding.

ClawSync: A Convex-based multi-agent AI platform with shared soul documents and per-agent model routing
https://ramdi.fr/github-stars/clawsync-a-convex-based-multi-agent-ai-platform-with-shared-soul-documents-and-per-agent-model-routing/
Tue, 05 May 2026 13:37:39 +0000
ClawSync offers a multi-agent AI platform using Convex backend, with shared soul documents for reusable personalities and per-agent model routing across popular LLMs. Explore its architecture and setup.

DeepDrone: natural language AI control for drones with real-time telemetry and MAVLink integration
https://ramdi.fr/github-stars/deepdrone-natural-language-ai-control-for-drones-with-real-time-telemetry-and-mavlink-integration/
Tue, 05 May 2026 13:37:39 +0000
DeepDrone uses LLMs to translate natural language commands into structured drone operations via MAVLink, with real-time telemetry and safety constraints. Python backend, FastAPI, LiteLLM, and JS frontend.

Inside Company Research Agent: automating business intelligence with multi-API AI agents
https://ramdi.fr/github-stars/inside-company-research-agent-automating-business-intelligence-with-multi-api-ai-agents/
Tue, 05 May 2026 13:37:39 +0000
Company Research Agent automates detailed business research by orchestrating OpenAI, Google Gemini, Tavily APIs and geolocation data via a Python backend and Node.js frontend. Setup script streamlines install.

llm-wiki: orchestrating multi-agent LLM research into persistent knowledge bases
https://ramdi.fr/github-stars/llm-wiki-orchestrating-multi-agent-llm-research-into-persistent-knowledge-bases/
Tue, 05 May 2026 13:37:39 +0000
llm-wiki is a shell-based orchestration layer that turns LLM agents into a persistent, multi-agent research wiki. Supports up to 10 agents, deep investigations, and durable provenance tracking.

Nanobrowser: multi-agent AI browser automation with dynamic self-correcting planning
https://ramdi.fr/github-stars/nanobrowser-multi-agent-ai-browser-automation-with-dynamic-self-correcting-planning/
Tue, 05 May 2026 13:37:39 +0000
Nanobrowser is a TypeScript Chrome extension implementing a multi-agent AI system for browser automation with a unique self-correcting Planner-Navigator architecture, supporting multiple LLMs and local privacy.

PokieTicker: layered AI-driven stock market analysis with sentiment and XGBoost
https://ramdi.fr/github-stars/pokieticker-layered-ai-driven-stock-market-analysis-with-sentiment-and-xgboost/
Tue, 05 May 2026 13:37:39 +0000
PokieTicker combines rule-based filtering, LLM sentiment analysis, and XGBoost prediction in a full-stack stock analysis app. Runs locally with no API keys.

Zeron Chat: A unified AI chat interface with resumable streaming for multi-LLM experimentation
https://ramdi.fr/github-stars/zeron-chat-a-unified-ai-chat-interface-with-resumable-streaming-for-multi-llm-experimentation/
Tue, 05 May 2026 13:37:39 +0000
Zeron Chat is a TypeScript React app that unifies multiple LLM providers in one interface with resumable streaming that survives page refreshes, built on TanStack Start and Zero state management.

Zinc: A Zig-based LLM inference engine optimized for AMD RDNA and Apple Silicon GPUs
https://ramdi.fr/github-stars/zinc-a-zig-based-llm-inference-engine-optimized-for-amd-rdna-and-apple-silicon-gpus/
Tue, 05 May 2026 13:37:39 +0000
Zinc is a Zig-written LLM inference engine using Vulkan and Metal for AMD RDNA and Apple Silicon GPUs. It supports GGUF quantized models and exposes an OpenAI-compatible API with streaming.

ai-trader: AI-powered config-driven backtesting with natural language interaction
https://ramdi.fr/github-stars/ai-trader-ai-powered-config-driven-backtesting-with-natural-language-interaction/
Mon, 04 May 2026 10:23:02 +0000
ai-trader adds natural language AI interaction to algorithmic trading backtesting via an MCP server and YAML configs.
Supports US/TW stocks, crypto, forex with caching.Allium: a behavioral specification framework for intent persistence in AI agent engineeringhttps://ramdi.fr/github-stars/allium-a-behavioral-specification-framework-for-intent-persistence-in-ai-agent-engineering/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/allium-a-behavioral-specification-framework-for-intent-persistence-in-ai-agent-engineering/Allium addresses intent drift in AI agent sessions by capturing behaviors as formal specs that persist across interactions, exposing contradictions automatically.Building private AI workflows with the n8n self-hosted AI starter kithttps://ramdi.fr/github-stars/building-private-ai-workflows-with-the-n8n-self-hosted-ai-starter-kit/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/building-private-ai-workflows-with-the-n8n-self-hosted-ai-starter-kit/Spin up a private AI agent stack in under 5 minutes with n8n’s self-hosted AI starter kit. Combines local LLMs, automation, and vector search for secure AI workflows.Council of High Intelligence: orchestrating structured multi-agent AI deliberations across multiple LLMshttps://ramdi.fr/github-stars/council-of-high-intelligence-orchestrating-structured-multi-agent-ai-deliberations-across-multiple-llms/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/council-of-high-intelligence-orchestrating-structured-multi-agent-ai-deliberations-across-multiple-llms/Council of High Intelligence is a Shell tool coordinating 18 AI personas across Claude, OpenAI, Gemini, and Ollama, enforcing true disagreement via structured multi-round deliberations.DocsGPT: a flexible AI platform for private agents and enterprise document searchhttps://ramdi.fr/github-stars/docsgpt-a-flexible-ai-platform-for-private-agents-and-enterprise-document-search/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/docsgpt-a-flexible-ai-platform-for-private-agents-and-enterprise-document-search/DocsGPT is a Python-based AI 
platform for building private agents and enterprise search, with multi-LLM support and versatile deployment modes via Docker Compose.Exploring Claude API integration patterns with anthropics/claude-cookbookshttps://ramdi.fr/github-stars/exploring-claude-api-integration-patterns-with-anthropics-claude-cookbooks/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/exploring-claude-api-integration-patterns-with-anthropics-claude-cookbooks/anthropics/claude-cookbooks offers Jupyter Notebook recipes demonstrating practical Claude API usage, including sub-agent orchestration, multimodal vision, and RAG patterns.Gitingest: turning GitHub repos into AI-friendly text digests with a clever URL hackhttps://ramdi.fr/github-stars/gitingest-turning-github-repos-into-ai-friendly-text-digests-with-a-clever-url-hack/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/gitingest-turning-github-repos-into-ai-friendly-text-digests-with-a-clever-url-hack/Gitingest is a Python CLI and API that converts Git repos into LLM-optimized text digests, featuring a unique URL hack for instant GitHub repo ingestion and self-hosted FastAPI server.Hands-On Large Language Models: A practical, visual journey through LLM engineeringhttps://ramdi.fr/github-stars/hands-on-large-language-models-a-practical-visual-journey-through-llm-engineering/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/hands-on-large-language-models-a-practical-visual-journey-through-llm-engineering/Explore the Hands-On Large Language Models repo, a Jupyter notebook-based practical guide from fundamentals to fine-tuning, designed for hands-on LLM learning on free Colab GPUs.hf-agents: a shell CLI extension for hardware-aware local coding agents with llama.cpphttps://ramdi.fr/github-stars/hf-agents-a-shell-cli-extension-for-hardware-aware-local-coding-agents-with-llama-cpp/Mon, 04 May 2026 10:23:02 
+0000https://ramdi.fr/github-stars/hf-agents-a-shell-cli-extension-for-hardware-aware-local-coding-agents-with-llama-cpp/hf-agents automates hardware profiling, model selection, and local coding agent deployment using llama.cpp and Pi, all in a shell CLI extension. It is efficient and has minimal dependencies.How the claude-plugins repo orchestrates multi-agent AI consultation with multiple LLMshttps://ramdi.fr/github-stars/how-the-claude-plugins-repo-orchestrates-multi-agent-ai-consultation-with-multiple-llms/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/how-the-claude-plugins-repo-orchestrates-multi-agent-ai-consultation-with-multiple-llms/claude-plugins is a TypeScript-based plugin marketplace for Claude Code, featuring a multi-agent consultant plugin that runs parallel LLMs like GPT-5, Gemini, Grok, Perplexity, and Claude for AI consultation.How video-use turns AI agents into transcript-driven video editorshttps://ramdi.fr/github-stars/how-video-use-turns-ai-agents-into-transcript-driven-video-editors/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/how-video-use-turns-ai-agents-into-transcript-driven-video-editors/video-use replaces frame-heavy editing with transcript-driven AI agents, using ElevenLabs Scribe and self-evaluation to produce polished edits.Inside NousResearch's finetuning-subnet: continuous incentivized fine-tuning for LLMs on Bittensorhttps://ramdi.fr/github-stars/inside-nousresearch-s-finetuning-subnet-continuous-incentivized-fine-tuning-for-llms-on-bittensor/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/inside-nousresearch-s-finetuning-subnet-continuous-incentivized-fine-tuning-for-llms-on-bittensor/NousResearch’s finetuning-subnet enables continuous, incentivized fine-tuning of LLMs using synthetic data from a separate subnet, pioneering cross-subnet communication in Bittensor.ISC-Bench: exposing fundamental AI safety failures from workflow-level 
designhttps://ramdi.fr/github-stars/isc-bench-exposing-fundamental-ai-safety-failures-from-workflow-level-design/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/isc-bench-exposing-fundamental-ai-safety-failures-from-workflow-level-design/ISC-Bench reveals a structural AI safety flaw where LLMs produce harmful outputs to complete tasks, bypassing prompt-level defenses. It benchmarks this workflow-level vulnerability across top models.KohakuTerrarium: Modular AI agent composition with algebraic pipelineshttps://ramdi.fr/github-stars/kohakuterrarium-modular-ai-agent-composition-with-algebraic-pipelines/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/kohakuterrarium-modular-ai-agent-composition-with-algebraic-pipelines/KohakuTerrarium offers a Python framework to build modular AI agents using a unique algebra for composing multi-agent pipelines, with session persistence and multi-runtime support.kvcached: a plugin cache for SGLang and vLLM Python environmentshttps://ramdi.fr/github-stars/kvcached-a-plugin-cache-for-sglang-and-vllm-python-environments/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/kvcached-a-plugin-cache-for-sglang-and-vllm-python-environments/kvcached provides a plugin cache layer for SGLang and vLLM Python LLM environments, easing deployment with PyPI and Docker support. Useful for optimizing LLM workflows.LLM-God: orchestrating multiple LLM web UIs in one Electron app with DOM injectionhttps://ramdi.fr/github-stars/llm-god-orchestrating-multiple-llm-web-uis-in-one-electron-app-with-dom-injection/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/llm-god-orchestrating-multiple-llm-web-uis-in-one-electron-app-with-dom-injection/LLM-God bundles multiple LLM web interfaces into a single Electron app, using DOM injection to send prompts to all models simultaneously. 
It offers a clever free-tier workaround with tradeoffs.Lucebox Hub: hand-optimized CUDA kernels for efficient LLM inference on RTX 3090 and beyondhttps://ramdi.fr/github-stars/lucebox-hub-hand-optimized-cuda-kernels-for-efficient-llm-inference-on-rtx-3090-and-beyond/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/lucebox-hub-hand-optimized-cuda-kernels-for-efficient-llm-inference-on-rtx-3090-and-beyond/Lucebox Hub optimizes LLM inference on consumer GPUs using a megakernel CUDA approach and speculative decoding, achieving high throughput on RTX 3090 and newer Nvidia GPUs.LycheeMemory: a lightweight semantic long-term memory framework for LLM agentshttps://ramdi.fr/github-stars/lycheememory-a-lightweight-semantic-long-term-memory-framework-for-llm-agents/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/lycheememory-a-lightweight-semantic-long-term-memory-framework-for-llm-agents/LycheeMemory offers a lightweight semantic memory system for LLM agents, cutting token use by 71% and costs by 55% compared to native memory, with SQLite + LanceDB backend and REST/MCP APIs.MAGI: A structured multi-LLM debate system with iterative critique and votinghttps://ramdi.fr/github-stars/magi-a-structured-multi-llm-debate-system-with-iterative-critique-and-voting/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/magi-a-structured-multi-llm-debate-system-with-iterative-critique-and-voting/MAGI implements a multi-round debate protocol among three LLMs to match stronger models’ accuracy via iterative critique and voting. 
It offers fault tolerance, adaptive escalation, and persona presets.Mapping the LLM agent landscape with the awesome-llm-agents curated cataloghttps://ramdi.fr/github-stars/mapping-the-llm-agent-landscape-with-the-awesome-llm-agents-curated-catalog/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/mapping-the-llm-agent-landscape-with-the-awesome-llm-agents-curated-catalog/A curated catalog of 20+ LLM agent frameworks and tools organized by agent type and capabilities. Understand architectural differences and trade-offs in LLM agent design.Memary: Recursive Knowledge Graph Memory for Autonomous AI Agentshttps://ramdi.fr/github-stars/memary-recursive-knowledge-graph-memory-for-autonomous-ai-agents/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/memary-recursive-knowledge-graph-memory-for-autonomous-ai-agents/Memary is an open-source memory layer for AI agents using knowledge graphs and recursive retrieval to efficiently store and query agent memories. It supports multi-agent setups and integrates with LlamaIndex and OpenAI.Meta-Harness: evolving the scaffolding around large language models for optimized task performancehttps://ramdi.fr/github-stars/meta-harness-evolving-the-scaffolding-around-large-language-models-for-optimized-task-performance/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/meta-harness-evolving-the-scaffolding-around-large-language-models-for-optimized-task-performance/Meta-Harness from Stanford IRIS Lab automates the search for optimal harness configurations around LLMs, evolving memory, retrieval, and context systems for better task-specific performance.OASIS: a Python CLI for AI-driven code vulnerability scanning with deterministic validationhttps://ramdi.fr/github-stars/oasis-a-python-cli-for-ai-driven-code-vulnerability-scanning-with-deterministic-validation/Mon, 04 May 2026 10:23:02 
+0000https://ramdi.fr/github-stars/oasis-a-python-cli-for-ai-driven-code-vulnerability-scanning-with-deterministic-validation/OASIS is a Python CLI security auditor using LangGraph-orchestrated LLMs for two-phase scanning and deterministic validation of code vulnerabilities. It balances AI insights with guardrails to reduce false positives.OpenGame: generating playable web games from natural language with a dual-skill LLM frameworkhttps://ramdi.fr/github-stars/opengame-generating-playable-web-games-from-natural-language-with-a-dual-skill-llm-framework/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/opengame-generating-playable-web-games-from-natural-language-with-a-dual-skill-llm-framework/OpenGame from CUHK MMLab generates full web games from natural language prompts using a dual-skill LLM architecture that maintains cross-file consistency and integration fixes.OpenKB: A persistent, vectorless wiki knowledge base powered by LLMs and PageIndexhttps://ramdi.fr/github-stars/openkb-a-persistent-vectorless-wiki-knowledge-base-powered-by-llms-and-pageindex/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/openkb-a-persistent-vectorless-wiki-knowledge-base-powered-by-llms-and-pageindex/OpenKB compiles documents into a persistent, interlinked wiki using LLMs and PageIndex’s vectorless retrieval, supporting multi-LLM backends and interactive chat with persisted sessions.Orion: Direct access to Apple Neural Engine for on-device LLM traininghttps://ramdi.fr/github-stars/orion-direct-access-to-apple-neural-engine-for-on-device-llm-training/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/orion-direct-access-to-apple-neural-engine-for-on-device-llm-training/Orion bypasses CoreML to access Apple’s Neural Engine directly via private frameworks, enabling on-device inference and fine-tuning of small LLMs with 8.5x reduced training overhead.PageLM: orchestrating multi-provider LLM workflows for interactive 
learninghttps://ramdi.fr/github-stars/pagelm-orchestrating-multi-provider-llm-workflows-for-interactive-learning/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/pagelm-orchestrating-multi-provider-llm-workflows-for-interactive-learning/PageLM is an open-source TypeScript platform orchestrating multi-LLM workflows to generate interactive educational content from documents with real-time streaming and multi-backend support.PasteGuard: a local privacy proxy for masking sensitive data in LLM requestshttps://ramdi.fr/github-stars/pasteguard-a-local-privacy-proxy-for-masking-sensitive-data-in-llm-requests/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/pasteguard-a-local-privacy-proxy-for-masking-sensitive-data-in-llm-requests/PasteGuard intercepts API calls to OpenAI and Anthropic, masking over 30 types of sensitive data across 24 languages before they reach AI providers. Integration only requires changing the base URL.pdftochat: a cloud-integrated PDF-to-chat system with hybrid vector searchhttps://ramdi.fr/github-stars/pdftochat-a-cloud-integrated-pdf-to-chat-system-with-hybrid-vector-search/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/pdftochat-a-cloud-integrated-pdf-to-chat-system-with-hybrid-vector-search/pdftochat is a TypeScript-based PDF-to-chat app leveraging Chroma Cloud for hybrid vector search and Together.ai for LLMs, integrating multiple cloud services for scalable document Q&A.Resume Matcher: A provider-agnostic AI platform for tailored resumes using LiteLLM abstractionhttps://ramdi.fr/github-stars/resume-matcher-a-provider-agnostic-ai-platform-for-tailored-resumes-using-litellm-abstraction/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/resume-matcher-a-provider-agnostic-ai-platform-for-tailored-resumes-using-litellm-abstraction/Resume Matcher uses LiteLLM to unify six LLM providers for AI-powered resume tailoring, with a FastAPI backend and Next.js frontend. 
It supports local and cloud deployments with PDF export.SmallClaw: a local-first AI agent framework with single-pass chat handlinghttps://ramdi.fr/github-stars/smallclaw-a-local-first-ai-agent-framework-with-single-pass-chat-handling/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/smallclaw-a-local-first-ai-agent-framework-with-single-pass-chat-handling/SmallClaw is a TypeScript AI agent framework that uses a single LLM call for chat and tool invocation, designed for local models with a clean web UI and structured tools.TextGen: a portable zero-config local LLM runner with multi-backend and multimodal supporthttps://ramdi.fr/github-stars/textgen-a-portable-zero-config-local-llm-runner-with-multi-backend-and-multimodal-support/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/textgen-a-portable-zero-config-local-llm-runner-with-multi-backend-and-multimodal-support/TextGen offers a portable desktop app for local LLMs with zero telemetry and multi-backend support. Drop GGUF models in a folder and run with no complex setup. 
It features multimodal vision, file attachments, and an OpenAI-compatible API.Understanding LLM internals: a hands-on guide to transformers and attention mathhttps://ramdi.fr/github-stars/understanding-llm-internals-a-hands-on-guide-to-transformers-and-attention-math/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/understanding-llm-internals-a-hands-on-guide-to-transformers-and-attention-math/A curated repo breaking down large language model internals with numeric attention math, tokenization, and transformer architecture, targeting engineers who want to understand LLMs under the hood.vllm-mlx: Efficient LLM serving on Apple Silicon with SSD-tiered KV cache and continuous batchinghttps://ramdi.fr/github-stars/vllm-mlx-efficient-llm-serving-on-apple-silicon-with-ssd-tiered-kv-cache-and-continuous-batching/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/vllm-mlx-efficient-llm-serving-on-apple-silicon-with-ssd-tiered-kv-cache-and-continuous-batching/vllm-mlx is a Python inference server for Apple Silicon that supports OpenAI and Anthropic APIs, featuring an SSD-tiered KV cache for long-context agents and continuous batching for performance.WriteHERE: dynamic recursive planning for AI-assisted long-form writing with real-time visualizationhttps://ramdi.fr/github-stars/writehere-dynamic-recursive-planning-for-ai-assisted-long-form-writing-with-real-time-visualization/Mon, 04 May 2026 10:23:02 +0000https://ramdi.fr/github-stars/writehere-dynamic-recursive-planning-for-ai-assisted-long-form-writing-with-real-time-visualization/WriteHERE uses recursive task decomposition to dynamically break down and execute long-form AI writing tasks, with real-time visualization of the agent’s thought process. 
Supports GPT and Claude.A practical taxonomy for large language model ensembles: Exploring the Awesome-LLM-Ensemble repositoryhttps://ramdi.fr/github-stars/a-practical-taxonomy-for-large-language-model-ensembles-exploring-the-awesome-llm-ensemble-repository/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/a-practical-taxonomy-for-large-language-model-ensembles-exploring-the-awesome-llm-ensemble-repository/The Awesome-LLM-Ensemble repo catalogs research on combining multiple LLMs with a clear three-phase taxonomy: before, during, and after inference ensemble methods.AI Knowledge Graph Generator: Building structured graphs from unstructured text with LLMshttps://ramdi.fr/github-stars/ai-knowledge-graph-generator-building-structured-graphs-from-unstructured-text-with-llms/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/ai-knowledge-graph-generator-building-structured-graphs-from-unstructured-text-with-llms/A Python tool that converts unstructured text into interactive knowledge graphs using a three-phase LLM pipeline with SPO triplet extraction, entity standardization, and relationship inference.Automating bank statement processing with YOLOv8, OCR, and LLMs for personal finance analysishttps://ramdi.fr/github-stars/automating-bank-statement-processing-with-yolov8-ocr-and-llms-for-personal-finance-analysis/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/automating-bank-statement-processing-with-yolov8-ocr-and-llms-for-personal-finance-analysis/Explore how a hybrid pipeline using YOLOv8 layout detection, OCR, and LLMs automates messy bank statement PDFs for personal finance analysis with RAG and AI agents.BoxPwnr: benchmarking autonomous LLM agents on cybersecurity challenges with iterative command executionhttps://ramdi.fr/github-stars/boxpwnr-benchmarking-autonomous-llm-agents-on-cybersecurity-challenges-with-iterative-command-execution/Mon, 04 May 2026 10:23:01 
+0000https://ramdi.fr/github-stars/boxpwnr-benchmarking-autonomous-llm-agents-on-cybersecurity-challenges-with-iterative-command-execution/BoxPwnr benchmarks LLM-based autonomous agents on cybersecurity challenges using iterative command execution in a Kali Docker container, supporting 20+ LLM models and 13+ platforms.Curating quality: a curated list of essential books for large language model engineershttps://ramdi.fr/github-stars/curating-quality-a-curated-list-of-essential-books-for-large-language-model-engineers/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/curating-quality-a-curated-list-of-essential-books-for-large-language-model-engineers/A curated list of 24 rigorously selected books on LLM engineering, covering foundational theory to production deployment. Highlights a unique 6-step quality filtering process.DATAGEN: a LangGraph multi-agent framework for automated data analysis workflowshttps://ramdi.fr/github-stars/datagen-a-langgraph-multi-agent-framework-for-automated-data-analysis-workflows/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/datagen-a-langgraph-multi-agent-framework-for-automated-data-analysis-workflows/DATAGEN orchestrates eight specialized AI agents using LangGraph to automate data analysis workflows with progressive disclosure and multi-LLM provider support.DocStrange: A versatile Python library for LLM-optimized document parsing with dual-mode processinghttps://ramdi.fr/github-stars/docstrange-a-versatile-python-library-for-llm-optimized-document-parsing-with-dual-mode-processing/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/docstrange-a-versatile-python-library-for-llm-optimized-document-parsing-with-dual-mode-processing/DocStrange converts PDFs, DOCX, PPTX, XLSX, images, and URLs into LLM-ready Markdown, JSON, HTML, and CSV. 
It offers free cloud and private local GPU modes for flexible, privacy-compliant document parsing.Eclaire: a local-first AI assistant unifying your personal data with local LLMshttps://ramdi.fr/github-stars/eclaire-a-local-first-ai-assistant-unifying-your-personal-data-with-local-llms/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/eclaire-a-local-first-ai-assistant-unifying-your-personal-data-with-local-llms/Eclaire is a self-hosted AI assistant that unifies personal data with local LLM backends via an OpenAI-compatible API, emphasizing privacy and modular design.Inside llm-madness: a lightweight GPT transformer training pipeline with built-in visualizationhttps://ramdi.fr/github-stars/inside-llm-madness-a-lightweight-gpt-transformer-training-pipeline-with-built-in-visualization/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/inside-llm-madness-a-lightweight-gpt-transformer-training-pipeline-with-built-in-visualization/llm-madness offers a Python-built GPT-style transformer training pipeline with tokenizer training, memory-mapped datasets, and a unique web UI for per-layer attention inspection and loss visualization.Mind-Map-Wizard: AI-powered mind maps with a custom SVG rendering enginehttps://ramdi.fr/github-stars/mind-map-wizard-ai-powered-mind-maps-with-a-custom-svg-rendering-engine/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/mind-map-wizard-ai-powered-mind-maps-with-a-custom-svg-rendering-engine/Mind-Map-Wizard generates interactive mind maps from AI-generated markdown outlines using a custom SVG engine and keeps all data local for privacy.OpenClaude: a multi-model terminal-first coding agent CLI with practical agent routinghttps://ramdi.fr/github-stars/openclaude-a-multi-model-terminal-first-coding-agent-cli-with-practical-agent-routing/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/openclaude-a-multi-model-terminal-first-coding-agent-cli-with-practical-agent-routing/OpenClaude is a TypeScript CLI 
coding agent that routes tasks across different LLMs by type, optimizing cost and performance with multi-provider support and a unified terminal interface.OpenMythos: Exploring recurrent-depth transformers with input injection for sustained reasoninghttps://ramdi.fr/github-stars/openmythos-exploring-recurrent-depth-transformers-with-input-injection-for-sustained-reasoning/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/openmythos-exploring-recurrent-depth-transformers-with-input-injection-for-sustained-reasoning/OpenMythos implements a recurrent-depth transformer that recycles layers via looped blocks, using input injection to prevent signal drift. It scales from 1B to 1T parameters with up to 1M token context.PAT3D: orchestrating text-to-3D simulation-ready scenes through a multi-stage AI and physics pipelinehttps://ramdi.fr/github-stars/pat3d-orchestrating-text-to-3d-simulation-ready-scenes-through-a-multi-stage-ai-and-physics-pipeline/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/pat3d-orchestrating-text-to-3d-simulation-ready-scenes-through-a-multi-stage-ai-and-physics-pipeline/PAT3D composes a 9-stage pipeline combining LLMs, vision models, 3D asset generators, and physics simulation to produce physically plausible, simulation-ready 3D scenes from text prompts.QuantaAlpha: LLM-driven trajectory-based self-evolution for quantitative alpha factor discoveryhttps://ramdi.fr/github-stars/quantaalpha-llm-driven-trajectory-based-self-evolution-for-quantitative-alpha-factor-discovery/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/quantaalpha-llm-driven-trajectory-based-self-evolution-for-quantitative-alpha-factor-discovery/QuantaAlpha uses large language models with evolutionary strategies to automate quantitative alpha factor discovery, achieving strong backtest metrics on major indices.ray-finance: a local-first, privacy-focused CLI financial advisor with encrypted context and LLM-powered 
advicehttps://ramdi.fr/github-stars/ray-finance-a-local-first-privacy-focused-cli-financial-advisor-with-encrypted-context-and-llm-powered-advice/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/ray-finance-a-local-first-privacy-focused-cli-financial-advisor-with-encrypted-context-and-llm-powered-advice/ray-finance is a TypeScript CLI tool that syncs bank data locally with AES-256 encryption, redacts PII before AI calls, and maintains persistent financial context for personalized LLM advice.RESTai: a multi-project AIaaS platform with agentic browser automation and visual AI pipelineshttps://ramdi.fr/github-stars/restai-a-multi-project-aiaas-platform-with-agentic-browser-automation-and-visual-ai-pipelines/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/restai-a-multi-project-aiaas-platform-with-agentic-browser-automation-and-visual-ai-pipelines/RESTai exposes multi-project AI capabilities via a unified REST API, featuring an agentic browser with Dockerized Playwright, knowledge graph RAG, and a visual Blockly pipeline builder.RsClaw: a Rust-native AI agent engine with persistent three-layer memory and multi-agent delegationhttps://ramdi.fr/github-stars/rsclaw-a-rust-native-ai-agent-engine-with-persistent-three-layer-memory-and-multi-agent-delegation/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/rsclaw-a-rust-native-ai-agent-engine-with-persistent-three-layer-memory-and-multi-agent-delegation/RsClaw is a Rust-based AI agent engine featuring persistent three-layer memory across sessions, multi-agent delegation, and low resource usage in a single 15MB binary.Stash: a shared agent memory with no server-side LLM callshttps://ramdi.fr/github-stars/stash-a-shared-agent-memory-with-no-server-side-llm-calls/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/stash-a-shared-agent-memory-with-no-server-side-llm-calls/Stash captures coding agent session transcripts for teams and builds a shared knowledge base without server-side 
LLM calls, preserving privacy and cutting costs.SwarmVault: a local-first knowledge compiler with contradiction detection and hybrid searchhttps://ramdi.fr/github-stars/swarmvault-a-local-first-knowledge-compiler-with-contradiction-detection-and-hybrid-search/Mon, 04 May 2026 10:23:01 +0000https://ramdi.fr/github-stars/swarmvault-a-local-first-knowledge-compiler-with-contradiction-detection-and-hybrid-search/SwarmVault compiles raw sources into a persistent Markdown wiki with typed knowledge graph, hybrid search, and contradiction detection. It supports 16+ agents and offline use.unslop: empirically detecting and avoiding repetitive LLM output patternshttps://ramdi.fr/github-stars/unslop-empirically-detecting-and-avoiding-repetitive-llm-output-patterns/Mon, 04 May 2026 10:11:02 +0000https://ramdi.fr/github-stars/unslop-empirically-detecting-and-avoiding-repetitive-llm-output-patterns/unslop is a Python CLI tool that detects repetitive defaults in LLM outputs by empirical analysis, generating reusable anti-pattern profiles to improve prompt engineering.AI penetration testing knowledge base: structured resources for LLM security researchhttps://ramdi.fr/github-stars/ai-penetration-testing-knowledge-base-structured-resources-for-llm-security-research/Mon, 04 May 2026 10:09:00 +0000https://ramdi.fr/github-stars/ai-penetration-testing-knowledge-base-structured-resources-for-llm-security-research/A curated repository for AI/LLM penetration testing covering prompt injection, adversarial ML, and LLM red teaming with the OWASP LLM Top 10 framework.Inside llm_wiki: a desktop app for building persistent LLM-powered personal wikishttps://ramdi.fr/github-stars/inside-llm-wiki-a-desktop-app-for-building-persistent-llm-powered-personal-wikis/Mon, 04 May 2026 10:05:49 +0000https://ramdi.fr/github-stars/inside-llm-wiki-a-desktop-app-for-building-persistent-llm-powered-personal-wikis/llm_wiki uses a two-step chain-of-thought pipeline to build a self-maintaining knowledge base. 
It combines Tauri, knowledge graphs, and Louvain clustering for a unique personal wiki experience.

Building a production-ready second brain with agentic RAG and LLMOps
https://ramdi.fr/github-stars/building-a-production-ready-second-brain-with-agentic-rag-and-llmops/
Sun, 03 May 2026 08:12:11 +0000
Explore an open-source course that teaches building a production-grade AI assistant using advanced retrieval-augmented generation, agent orchestration, fine-tuning, and LLMOps practices.

Navigating free-tier LLM APIs with the awesome-free-llm-apis catalog
https://ramdi.fr/github-stars/navigating-free-tier-llm-apis-with-the-awesome-free-llm-apis-catalog/
Sun, 03 May 2026 08:12:11 +0000
A curated catalog of free-tier LLM APIs compatible with OpenAI SDK, detailing rate limits, model specs, and providers to build zero-cost AI applications.

A-MEM: dynamic semantic memory management for LLM agents inspired by Zettelkasten
https://ramdi.fr/github-stars/a-mem-dynamic-semantic-memory-management-for-llm-agents-inspired-by-zettelkasten/
Sun, 03 May 2026 00:54:10 +0000
A-MEM is a Python agentic memory system that dynamically organizes LLM agent memories using semantic embeddings and automatic linking, inspired by Zettelkasten.

A hands-on course for mastering large language models: fine-tuning, quantization, and tooling
https://ramdi.fr/github-stars/a-hands-on-course-for-mastering-large-language-models-fine-tuning-quantization-and-tooling/
Sat, 02 May 2026 20:07:04 +0000
Explore a comprehensive LLM course with practical notebooks on fine-tuning (QLoRA, DPO), quantization
(GPTQ), and tools like AutoEval and LazyMergekit. Ideal for aspiring LLM engineers.

Hermes Agent: A self-improving AI agent with closed learning loops and multi-platform integration
https://ramdi.fr/github-stars/hermes-agent-a-self-improving-ai-agent-with-closed-learning-loops-and-multi-platform-integration/
Sat, 02 May 2026 20:07:04 +0000
Hermes Agent is a Python AI agent featuring closed learning loops, autonomous skill creation, multi-model support, and seamless Telegram/Discord integration for persistent, adaptable AI workflows.

LlamaFactory: modular, extensible fine-tuning framework for large language models
https://ramdi.fr/github-stars/llamafactory-modular-extensible-fine-tuning-framework-for-large-language-models/
Sat, 02 May 2026 20:07:04 +0000
LlamaFactory offers a modular Python framework for fine-tuning 100+ LLMs with diverse algorithms and optimizations, including LoRA, QLoRA, and reinforcement learning.

LocalAI: running diverse AI models locally with multi-backend support and agent capabilities
https://ramdi.fr/github-stars/localai-running-diverse-ai-models-locally-with-multi-backend-support-and-agent-capabilities/
Sat, 02 May 2026 20:07:04 +0000
LocalAI enables running 36+ AI models locally without GPU, supporting multi-user API access and built-in AI agents with OpenAI-compatible APIs.
Here’s how it works and why it matters.

mem0: optimizing AI agent memory with a new single-pass additive algorithm
https://ramdi.fr/github-stars/mem0-optimizing-ai-agent-memory-with-a-new-single-pass-additive-algorithm/
Sat, 02 May 2026 20:07:04 +0000
mem0 enhances AI agent memory with a new single-pass ADD-only extraction algorithm and multi-signal retrieval, boosting benchmarks significantly while simplifying memory management.

MetaGPT: orchestrating multi-agent AI teams to automate software development
https://ramdi.fr/github-stars/metagpt-orchestrating-multi-agent-ai-teams-to-automate-software-development/
Sat, 02 May 2026 20:07:04 +0000
MetaGPT uses a multi-agent system with defined GPT roles following SOPs to automate software development from one-line prompts. It simulates a software company with role-based AI collaboration.

Ollama: a unified CLI and API platform for local large language models
https://ramdi.fr/github-stars/ollama-a-unified-cli-and-api-platform-for-local-large-language-models/
Sat, 02 May 2026 20:07:04 +0000
Ollama simplifies running and managing open-source large language models locally with a unified CLI and REST API, supporting broad integrations and multi-OS support.

vLLM: Efficient large language model serving with paged attention and continuous batching
https://ramdi.fr/github-stars/vllm-efficient-large-language-model-serving-with-paged-attention-and-continuous-batching/
Sat, 02 May 2026 20:07:04 +0000
vLLM is a Python library for high-throughput LLM inference using paged attention and continuous batching.
It supports quantization, distributed inference, and an OpenAI-compatible API.

TradingAgents: a multi-agent LLM framework simulating real-world trading firm dynamics
https://ramdi.fr/github-stars/tradingagents-a-multi-agent-llm-framework-simulating-real-world-trading-firm-dynamics/
Sat, 02 May 2026 07:48:10 +0000
TradingAgents uses specialized LLM agents in a structured bull/bear debate to mimic real trading firms. Supports 10+ LLMs, persistent memory, and CLI/Docker usage.

Qwen Code: A multi-provider terminal AI coding agent with unified config abstraction
https://ramdi.fr/github-stars/qwen-code-a-multi-provider-terminal-ai-coding-agent-with-unified-config-abstraction/
Tue, 28 Apr 2026 18:38:54 +0000
Qwen Code is a TypeScript terminal AI coding agent that abstracts multiple LLM providers behind a unified config, enabling flexible AI workflows with Skills and SubAgents.

Hunting Tokens/sec: 4 LLM Backends, 1 Hard Ceiling (Part 2/4)
https://ramdi.fr/post/ai-llm/local-llm-tokens-per-second-benchmark/
Tue, 28 Apr 2026 00:00:00 +0000
Part 2 of 4: a benchmark journal across nixpkgs llama.cpp, upstream master, and ik_llama.cpp on Qwen3.6-27B.
Six hours, four backends, all converging at 66 tok/s, and the physical reason why.

Speculative Decoding Meets Hybrid SSM: Why It Breaks (Part 3/4)
https://ramdi.fr/post/ai-llm/local-llm-speculative-decoding-hybrid-ssm/
Tue, 28 Apr 2026 00:00:00 +0000
Part 3 of 4: a deep-dive into why speculative decoding silently breaks (or runs anti-economically) on hybrid attention+SSM architectures like Qwen3.6, Mamba-2, and RWKV, and what would need to change upstream to fix it.

The NixOS Setup for llama.cpp: Declarative and Reproducible (Part 4/4)
https://ramdi.fr/post/ai-llm/local-llm-nixos-llama-server-module/
Tue, 28 Apr 2026 00:00:00 +0000
Part 4 of 4: the actual NixOS module, llama-pull helper, claude-code-router wiring, and one-line workflow for switching models. Five Nix files for a complete, isolated, rollback-able local LLM service.

Why I Serve Qwen3.6 Locally on My RTX 5090 (Part 1/4)
https://ramdi.fr/post/ai-llm/local-llm-rtx5090-why-nixos/
Tue, 28 Apr 2026 00:00:00 +0000
Part 1 of 4: motivation, hardware, and stack choices for serving Qwen3.6-27B locally on a 32 GB consumer GPU with NixOS, before any benchmarks or trade-offs kick in.

Forge: a Rust-based multi-agent AI coding assistant integrated into your terminal workflow
https://ramdi.fr/github-stars/forge-a-rust-based-multi-agent-ai-coding-assistant-integrated-into-your-terminal-workflow/
Sun, 26 Apr 2026 23:47:28 +0000
Forge is a Rust-based AI coding agent with multi-agent architecture and a unique ZSH plugin that intercepts shell commands for seamless terminal integration.
It supports 300+ LLM providers.

AutoGen: exploring multi-agent AI orchestration with Python in maintenance mode
https://ramdi.fr/github-stars/autogen-exploring-multi-agent-ai-orchestration-with-python-in-maintenance-mode/
Sun, 26 Apr 2026 17:51:11 +0000
AutoGen is a Python framework for building multi-agent AI applications with LLM integration, now in maintenance mode with Microsoft Agent Framework as its successor. Learn its architecture, strengths, and how to get started.

Context7: injecting real-time, version-specific docs into LLM workflows
https://ramdi.fr/github-stars/context7-injecting-real-time-version-specific-docs-into-llm-workflows/
Sun, 26 Apr 2026 17:51:11 +0000
Context7 tackles LLM hallucinations by injecting up-to-date, version-specific library docs directly into AI coding agents’ context via CLI or MCP server integration.

DeerFlow 2.0: orchestrating multi-agent AI workflows with flexible LLM integration
https://ramdi.fr/github-stars/deerflow-2-0-orchestrating-multi-agent-ai-workflows-with-flexible-llm-integration/
Sun, 26 Apr 2026 17:51:11 +0000
DeerFlow 2.0 is a Python framework for orchestrating AI sub-agents and memory with support for multiple LLMs and execution sandboxes.
It uses a modular config and setup wizard for flexible deployment.

Inside AI Engineering Hub: a hands-on collection of production-ready AI projects
https://ramdi.fr/github-stars/inside-ai-engineering-hub-a-hands-on-collection-of-production-ready-ai-projects/
Sun, 26 Apr 2026 17:51:11 +0000
AI Engineering Hub offers 90+ production-ready AI projects spanning LLMs, RAG, AI agents, and MCP, organized by difficulty and real-world use cases.

Inside CowAgent: An extensible autonomous AI assistant with multi-modal and multi-model architecture
https://ramdi.fr/github-stars/inside-cowagent-an-extensible-autonomous-ai-assistant-with-multi-modal-and-multi-model-architecture/
Sun, 26 Apr 2026 17:51:11 +0000
CowAgent is an extensible AI assistant framework with autonomous task planning, long-term memory, and multi-modal support.
It integrates multiple LLMs and platforms for flexible AI workflows.

Kong Gateway: A universal API gateway with advanced AI traffic routing and governance
https://ramdi.fr/github-stars/kong-gateway-a-universal-api-gateway-with-advanced-ai-traffic-routing-and-governance/
Sun, 26 Apr 2026 17:51:11 +0000
Kong Gateway extends traditional API management with universal LLM API routing, semantic security, and AI-specific features, enabling multi-vendor AI traffic governance in cloud-native environments.

LLM-driven browser automation with Browser-Use: a hands-on look
https://ramdi.fr/github-stars/llm-driven-browser-automation-with-browser-use-a-hands-on-look/
Sun, 26 Apr 2026 17:51:11 +0000
Browser-Use is a Python library enabling LLM-powered AI agents to automate browsers efficiently. It features a custom ChatBrowserUse model and supports cloud and local agents.

OpenHands: Modular architecture for flexible AI agent development
https://ramdi.fr/github-stars/openhands-modular-architecture-for-flexible-ai-agent-development/
Sun, 26 Apr 2026 17:51:11 +0000
OpenHands offers a modular Python platform to build and deploy AI agents with SDK, CLI, GUI, and cloud options.
It supports multiple LLMs and self-hosting for enterprises.

Pathway LLM App: unified pipelines for scalable retrieval-augmented generation and AI search
https://ramdi.fr/github-stars/pathway-llm-app-unified-pipelines-for-scalable-retrieval-augmented-generation-and-ai-search/
Sun, 26 Apr 2026 09:31:26 +0000
Pathway LLM App provides integrated pipelines for scalable RAG and AI search, combining vector and full-text indexing with real-time sync for Gen AI apps at scale.

Awesome LLM Apps: a practical collection of runnable AI agent and RAG templates
https://ramdi.fr/github-stars/awesome-llm-apps-a-practical-collection-of-runnable-ai-agent-and-rag-templates/
Fri, 24 Apr 2026 18:26:13 +0000
Awesome LLM Apps offers 100+ runnable AI agent and RAG templates for quick LLM app development. It supports multiple providers and advanced multi-agent patterns with minimal setup.

Inside daily_stock_analysis: a multi-LLM automated stock analysis system
https://ramdi.fr/github-stars/inside-daily-stock-analysis-a-multi-llm-automated-stock-analysis-system/
Fri, 24 Apr 2026 18:26:13 +0000
daily_stock_analysis combines multi-LLM integration with multi-source financial data to automate stock market decisions across global markets.
It features flexible AI provider fallback and multi-channel alerts.

Browser Harness: a self-healing LLM agent for browser automation via Chrome DevTools
https://ramdi.fr/github-stars/browser-harness-a-self-healing-llm-agent-for-browser-automation-via-chrome-devtools/
Fri, 24 Apr 2026 07:26:29 +0000
Browser Harness enables LLMs to automate browsers by dynamically generating helper functions using the Chrome DevTools Protocol, with minimal Python code and free remote browsers.

Building AI Agents with Claude Code
https://ramdi.fr/post/ai-llm/building-ai-agents-with-claude/
Sat, 11 Apr 2026 00:00:00 +0000
How to leverage Claude Code to build autonomous AI agents that publish content, review code, and manage workflows.