Noureddine RAMDI

Noureddine RAMDI Dinour

Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation

1 result for Inference-Server

vllm-mlx: Efficient LLM serving on Apple Silicon with SSD-tiered KV cache and continuous batching
vllm-mlx is a Python inference server for Apple Silicon that exposes OpenAI- and Anthropic-compatible APIs. It uses an SSD-tiered KV cache to serve long-context agents and continuous batching to improve throughput.
Tags: github-stars, python, apple-silicon, machine-learning, inference-server
Created: Mon, 04 May 2026 10:23:02 +0000