<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Inference-Server on Noureddine RAMDI</title><link>https://ramdi.fr/tags/inference-server/</link><description>Recent content in Inference-Server on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/inference-server/index.xml" rel="self" type="application/rss+xml"/><item><title>vllm-mlx: Efficient LLM serving on Apple Silicon with SSD-tiered KV cache and continuous batching</title><link>https://ramdi.fr/github-stars/vllm-mlx-efficient-llm-serving-on-apple-silicon-with-ssd-tiered-kv-cache-and-continuous-batching/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/vllm-mlx-efficient-llm-serving-on-apple-silicon-with-ssd-tiered-kv-cache-and-continuous-batching/</guid><description>vllm-mlx is a Python inference server for Apple Silicon that supports the OpenAI and Anthropic APIs. It features an SSD-tiered KV cache for long-context agents and continuous batching for higher throughput.</description></item></channel></rss>