unslop is a Python CLI tool that detects repetitive defaults in LLM outputs by empirical analysis, generating reusable anti-pattern profiles to improve prompt engineering.
A curated repository for AI/LLM penetration testing covering prompt injection, adversarial ML, and LLM red teaming with the OWASP LLM Top 10 framework.
llm_wiki uses a two-step chain-of-thought pipeline to build a self-maintaining knowledge base. It combines Tauri, knowledge graphs, and Louvain clustering into a personal wiki.
Explore an open-source course that teaches building a production-grade AI assistant using advanced retrieval-augmented generation, agent orchestration, fine-tuning, and LLMOps practices.
A curated catalog of free-tier LLM APIs compatible with the OpenAI SDK, detailing rate limits, model specs, and providers for building zero-cost AI applications.
A-MEM is a Python agentic memory system that dynamically organizes LLM agent memories using semantic embeddings and automatic linking, inspired by Zettelkasten.
Explore a comprehensive LLM course with practical notebooks on fine-tuning (QLoRA, DPO), quantization (GPTQ), and tools like AutoEval and LazyMergekit. Ideal for aspiring LLM engineers.
Hermes Agent is a Python AI agent featuring closed learning loops, autonomous skill creation, multi-model support, and seamless Telegram/Discord integration for persistent, adaptable AI workflows.
LlamaFactory offers a modular Python framework for fine-tuning 100+ LLMs with diverse algorithms and optimizations, including LoRA, QLoRA, and reinforcement learning.
LocalAI runs 36+ AI models locally without a GPU, supporting multi-user API access and built-in AI agents through OpenAI-compatible APIs. Here’s how it works and why it matters.
mem0 enhances AI agent memory with a new single-pass, ADD-only extraction algorithm and multi-signal retrieval, improving benchmark results while simplifying memory management.
MetaGPT uses a multi-agent system with defined GPT roles following SOPs to automate software development from one-line prompts. It simulates a software company with role-based AI collaboration.
Ollama simplifies running and managing open-source large language models locally with a unified CLI and REST API, broad integrations, and multi-OS support.
vLLM is a Python library for high-throughput LLM inference using PagedAttention and continuous batching. It supports quantization, distributed inference, and an OpenAI-compatible API.
TradingAgents uses specialized LLM agents in a structured bull/bear debate to mimic real trading firms. Supports 10+ LLMs, persistent memory, and CLI/Docker usage.
Qwen Code is a TypeScript terminal AI coding agent that abstracts multiple LLM providers behind a unified config, enabling flexible AI workflows with Skills and SubAgents.
Part 2 of 4: a benchmark journal across nixpkgs llama.cpp, upstream master, and ik_llama.cpp on Qwen3.6-27B. Six hours, four backends, all converging at 66 tok/s — and the physical reason why.
Part 3 of 4: a deep-dive into why speculative decoding silently breaks (or runs anti-economically) on hybrid attention+SSM architectures like Qwen3.6, Mamba-2, and RWKV — and what would need to change upstream to fix it.
Part 4 of 4: the actual NixOS module, llama-pull helper, claude-code-router wiring, and one-line workflow for switching models. Five Nix files for a complete, isolated, rollback-able local LLM service.
Part 1 of 4: motivation, hardware, and stack choices for serving Qwen3.6-27B locally on a 32 GB consumer GPU with NixOS, before any benchmarks or trade-offs kick in.