<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>vLLM on Noureddine RAMDI</title><link>https://ramdi.fr/tags/vllm/</link><description>Recent content in vLLM on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/vllm/index.xml" rel="self" type="application/rss+xml"/><item><title>vLLM Compressor: Practical quantization and compression for large language model inference</title><link>https://ramdi.fr/github-stars/vllm-compressor-practical-quantization-and-compression-for-large-language-model-inference/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/vllm-compressor-practical-quantization-and-compression-for-large-language-model-inference/</guid><description>vLLM Compressor applies advanced quantization and compression techniques to large language models, enabling optimized inference without requiring full model definitions.</description></item><item><title>kvcached: a plugin cache for SGLang and vLLM Python environments</title><link>https://ramdi.fr/github-stars/kvcached-a-plugin-cache-for-sglang-and-vllm-python-environments/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/kvcached-a-plugin-cache-for-sglang-and-vllm-python-environments/</guid><description>kvcached provides a plugin cache layer for SGLang and vLLM Python LLM environments, easing deployment with PyPI and Docker support.
It is useful for optimizing LLM workflows.</description></item><item><title>OpenResearcher: An open-source 30B LLM for long-horizon deep research</title><link>https://ramdi.fr/github-stars/openresearcher-an-open-source-30b-llm-for-long-horizon-deep-research/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/openresearcher-an-open-source-30b-llm-for-long-horizon-deep-research/</guid><description>OpenResearcher is a fully open 30B agentic LLM designed for deep research tasks, featuring a 96K-turn dataset and a self-built retriever over 11B tokens, running on vLLM with 8×A100 GPUs.</description></item></channel></rss>