vLLM on Noureddine RAMDI

Recent content in vLLM on Noureddine RAMDI
https://ramdi.fr/tags/vllm/
Sat, 23 May 2026 20:41:27 +0000

vLLM Compressor: Practical quantization and compression for large language model inference
https://ramdi.fr/github-stars/vllm-compressor-practical-quantization-and-compression-for-large-language-model-inference/
Sat, 23 May 2026 20:41:14 +0000
vLLM Compressor applies advanced quantization and compression techniques to large language models, enabling optimized inference without requiring full model definitions.

kvcached: a plugin cache for SGLang and vLLM Python environments
https://ramdi.fr/github-stars/kvcached-a-plugin-cache-for-sglang-and-vllm-python-environments/
Mon, 04 May 2026 10:23:02 +0000
kvcached provides a plugin cache layer for SGLang and vLLM Python LLM environments, easing deployment with PyPI and Docker support. Useful for optimizing LLM workflows.

OpenResearcher: An open-source 30B LLM for long-horizon deep research
https://ramdi.fr/github-stars/openresearcher-an-open-source-30b-llm-for-long-horizon-deep-research/
Mon, 04 May 2026 10:23:01 +0000
OpenResearcher is a fully open 30B agentic LLM designed for deep research tasks, featuring a 96K-turn dataset and a self-built retriever over 11B tokens, running on vLLM with 8×A100 GPUs.