kvcached provides a plugin cache layer for SGLang and vLLM in Python LLM environments, with PyPI and Docker support to ease deployment. It is useful for optimizing LLM workflows.
LLM-God bundles multiple LLM web interfaces into a single Electron app, using DOM injection to send prompts to all models simultaneously. It offers a clever free-tier workaround with tradeoffs.
Lucebox Hub optimizes LLM inference on consumer GPUs using a megakernel CUDA approach and speculative decoding, achieving high throughput on the RTX 3090 and newer NVIDIA GPUs.
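To make the speculative decoding idea mentioned above concrete, here is a minimal greedy sketch in Python. It is not Lucebox Hub's implementation (which the blurb describes as CUDA-based). The `NextToken` interface, the toy `target` and `draft` models, and the parameter `k` are all assumptions made for illustration. A cheap draft model proposes a few tokens, the expensive target model checks them, and the output matches what the target would have produced on its own.

```python
from typing import Callable, List

# Hypothetical interface: a model is a function returning the greedy next token.
NextToken = Callable[[List[int]], int]


def greedy(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding, used as the reference output."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes, target verifies."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The draft model proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed token. A real system would do
        #    this in one batched forward pass; this sketch loops for clarity.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                # Keep the agreed prefix, then take the target's correction.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every proposed token was accepted; add one bonus target token.
            out.extend(proposal)
            if len(out) < goal:
                out.append(target(out))
    return out


# Toy models (assumptions): the target counts upward modulo 10,
# and the draft agrees except right after the token 3.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
```

Calling `speculative_decode(target, draft, [0], 8)` returns the same sequence as `greedy(target, [0], 8)`. The speedup in practice comes from verifying several draft tokens per target pass, which this loop-based sketch does not model.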
LycheeMemory offers a lightweight semantic memory system for LLM agents, cutting token use by 71% and costs by 55% compared to native memory, with a SQLite + LanceDB backend and REST/MCP APIs.
MAGI implements a multi-round debate protocol among three LLMs to match stronger models’ accuracy via iterative critique and voting. It offers fault tolerance, adaptive escalation, and persona presets.
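A rough sketch of how a debate-and-vote loop of this kind might look, written in Python. This is not MAGI's actual protocol. The `Model` interface, the helpers `always` and `follower`, and the strict-majority stopping rule are assumptions chosen to keep the example small. Real models would be LLM calls that read the peers' answers as critique context.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical interface: a model receives the question and its peers'
# latest answers, and returns its own (possibly revised) answer.
Model = Callable[[str, List[str]], str]


def debate(question: str, models: List[Model],
           max_rounds: int = 3) -> Optional[str]:
    """Run critique rounds until a strict majority agrees on one answer."""
    # Round 0: every model answers independently.
    answers = [m(question, []) for m in models]
    for _ in range(max_rounds):
        top, votes = Counter(answers).most_common(1)[0]
        if votes * 2 > len(models):
            return top
        # Each model sees the others' answers and may revise its own.
        answers = [
            m(question, answers[:i] + answers[i + 1:])
            for i, m in enumerate(models)
        ]
    top, votes = Counter(answers).most_common(1)[0]
    return top if votes * 2 > len(models) else None


# Toy stand-ins for LLMs (assumptions, for demonstration only).
def always(answer: str) -> Model:
    """A model that never changes its mind."""
    return lambda question, peers: answer


def follower(initial: str) -> Model:
    """A model that adopts the most common peer answer once it sees one."""
    def model(question: str, peers: List[str]) -> str:
        return Counter(peers).most_common(1)[0][0] if peers else initial
    return model
```

With `debate("2+2?", [always("4"), always("5"), follower("6")])`, the three models disagree at first, the follower switches after one round, and the vote settles on `"4"`. Returning `None` when no majority forms leaves room for a caller to escalate, for instance to a stronger model.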
A curated catalog of 20+ LLM agent frameworks and tools organized by agent type and capabilities. Understand architectural differences and trade-offs in LLM agent design.
Memary is an open-source memory layer for AI agents using knowledge graphs and recursive retrieval to efficiently store and query agent memories. It supports multi-agent setups and integrates with LlamaIndex and OpenAI.
Meta-Harness from Stanford IRIS Lab automates the search for optimal harness configurations around LLMs, evolving memory, retrieval, and context systems for better task-specific performance.
OASIS is a Python CLI security auditor using LangGraph-orchestrated LLMs for two-phase scanning and deterministic validation of code vulnerabilities. It balances AI insights with guardrails to reduce false positives.
OpenGame from CUHK MMLab generates full web games from natural language prompts using a dual-skill LLM architecture that maintains cross-file consistency and applies integration fixes.
OpenKB compiles documents into a persistent, interlinked wiki using LLMs and PageIndex’s vectorless retrieval, supporting multi-LLM backends and interactive chat with persisted sessions.
Orion bypasses CoreML to access Apple’s Neural Engine directly via private frameworks, enabling on-device inference and fine-tuning of small LLMs with 8.5x lower training overhead.
PageLM is an open-source TypeScript platform orchestrating multi-LLM workflows to generate interactive educational content from documents with real-time streaming and multi-backend support.
PasteGuard intercepts API calls to OpenAI and Anthropic, masking over 30 types of sensitive data across 24 languages before the requests reach AI providers. Integration only requires changing the base URL.
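The masking step can be pictured with a small Python sketch. This is an illustration of the general idea, not PasteGuard's code. The two regular expressions, the `[[LABEL_n]]` placeholder format, and the function names `mask` and `unmask` are all assumptions, and a real tool would need far broader detection than two patterns.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only (assumptions); real detectors cover many more types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def mask(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace sensitive values with placeholders; return text and mapping."""
    mapping: Dict[str, str] = {}

    def make_repl(label: str):
        def repl(match: re.Match) -> str:
            value = match.group(0)
            # Reuse the placeholder if this value was already seen.
            for placeholder, seen in mapping.items():
                if seen == value:
                    return placeholder
            placeholder = f"[[{label}_{len(mapping) + 1}]]"
            mapping[placeholder] = value
            return placeholder
        return repl

    for label, pattern in PATTERNS.items():
        text = pattern.sub(make_repl(label), text)
    return text, mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Restore original values in a response that echoes placeholders."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

For example, `mask("Mail bob@example.com from 10.0.0.1")` yields text with `[[EMAIL_1]]` and `[[IPV4_2]]` in place of the raw values, and `unmask` reverses the substitution on the way back. Keeping the mapping on the proxy side means the provider only ever sees the placeholders.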
pdftochat is a TypeScript-based PDF-to-chat app leveraging Chroma Cloud for hybrid vector search and Together.ai for LLMs, integrating multiple cloud services for scalable document Q&A.
Resume Matcher uses LiteLLM to unify six LLM providers for AI-powered resume tailoring, with a FastAPI backend and Next.js frontend. It supports local and cloud deployments with PDF export.
SmallClaw is a TypeScript AI agent framework that uses a single LLM call for chat and tool invocation, designed for local models with a clean web UI and structured tools.
TextGen offers a portable desktop app for local LLMs with zero telemetry and multi-backend support. Drop GGUF models in a folder and run with no complex setup. It features multimodal vision, file attachments, and an OpenAI-compatible API.
A curated repo breaking down large language model internals with numeric attention math, tokenization, and transformer architecture, targeting engineers who want to understand LLMs under the hood.
vllm-mlx is a Python inference server for Apple Silicon that supports OpenAI and Anthropic APIs, featuring an SSD-tiered KV cache for long-context agents and continuous batching for performance.