vLLM Compressor applies quantization and compression techniques to large language models, enabling optimized inference without requiring full model definitions.
kvcached provides a plug-in KV cache layer for the Python-based SGLang and vLLM serving engines. It can be installed from PyPI or deployed with Docker, which simplifies setup for optimizing LLM workflows.
OpenResearcher is a fully open 30B agentic LLM designed for deep research tasks. It comes with a 96K-turn dataset and a self-built retriever over 11B tokens, and runs on vLLM with 8×A100 GPUs.