Metal on Noureddine RAMDI

https://ramdi.fr/tags/metal/
Recent content in Metal on Noureddine RAMDI
Sat, 23 May 2026 20:41:27 +0000

Zinc: A Zig-based LLM inference engine optimized for AMD RDNA and Apple Silicon GPUs
https://ramdi.fr/github-stars/zinc-a-zig-based-llm-inference-engine-optimized-for-amd-rdna-and-apple-silicon-gpus/
Tue, 05 May 2026 13:37:39 +0000
Zinc is a Zig-written LLM inference engine using Vulkan and Metal for AMD RDNA and Apple Silicon GPUs. It supports GGUF quantized models and exposes an OpenAI-compatible API with streaming.

dflash-mlx: Speculative decoding on Apple Silicon with Metal and MLX
https://ramdi.fr/github-stars/dflash-mlx-speculative-decoding-on-apple-silicon-with-metal-and-mlx/
Mon, 04 May 2026 10:23:02 +0000
dflash-mlx implements exact speculative decoding for language models on Apple Silicon using Metal and MLX, reducing forward passes with a block-diffusion draft model and per-layer KV cache rollback.

vllm-mlx: Efficient LLM serving on Apple Silicon with SSD-tiered KV cache and continuous batching
https://ramdi.fr/github-stars/vllm-mlx-efficient-llm-serving-on-apple-silicon-with-ssd-tiered-kv-cache-and-continuous-batching/
Mon, 04 May 2026 10:23:02 +0000
vllm-mlx is a Python inference server for Apple Silicon that supports OpenAI and Anthropic APIs, featuring SSD-tiered KV cache for long-context agents and continuous batching for performance.