Inference-Server on Noureddine RAMDI

https://ramdi.fr/tags/inference-server/
Recent content in Inference-Server on Noureddine RAMDI
Last updated: Sat, 23 May 2026 20:41:27 +0000

vllm-mlx: Efficient LLM serving on Apple Silicon with SSD-tiered KV cache and continuous batching
https://ramdi.fr/github-stars/vllm-mlx-efficient-llm-serving-on-apple-silicon-with-ssd-tiered-kv-cache-and-continuous-batching/
Published: Mon, 04 May 2026 10:23:02 +0000
vllm-mlx is a Python inference server for Apple Silicon that supports OpenAI and Anthropic APIs, featuring SSD-tiered KV cache for long-context agents and continuous batching for performance.