Overview
Posts: 6
GitHub Stars: 1328
Noureddine RAMDI
Dinour
Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation
France
noureddine@ramdi.fr
https://ramdi.fr
1 result for Inference-Server
vllm-mlx: Efficient LLM serving on Apple Silicon with SSD-tiered KV cache and continuous batching
vllm-mlx is a Python inference server for Apple Silicon that supports the OpenAI and Anthropic APIs. It uses an SSD-tiered KV cache to serve long-context agents and continuous batching to improve throughput.
Tags: github-stars, python, apple-silicon, machine-learning, inference-server
Created: Mon, 04 May 2026 10:23:02 +0000