<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Metal on Noureddine RAMDI</title><link>https://ramdi.fr/tags/metal/</link><description>Recent content in Metal on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/metal/index.xml" rel="self" type="application/rss+xml"/><item><title>Zinc: A Zig-based LLM inference engine optimized for AMD RDNA and Apple Silicon GPUs</title><link>https://ramdi.fr/github-stars/zinc-a-zig-based-llm-inference-engine-optimized-for-amd-rdna-and-apple-silicon-gpus/</link><pubDate>Tue, 05 May 2026 13:37:39 +0000</pubDate><guid>https://ramdi.fr/github-stars/zinc-a-zig-based-llm-inference-engine-optimized-for-amd-rdna-and-apple-silicon-gpus/</guid><description>Zinc is a Zig-written LLM inference engine using Vulkan and Metal for AMD RDNA and Apple Silicon GPUs. 
It supports GGUF quantized models and exposes an OpenAI-compatible API with streaming.</description></item><item><title>dflash-mlx: Speculative decoding on Apple Silicon with Metal and MLX</title><link>https://ramdi.fr/github-stars/dflash-mlx-speculative-decoding-on-apple-silicon-with-metal-and-mlx/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/dflash-mlx-speculative-decoding-on-apple-silicon-with-metal-and-mlx/</guid><description>dflash-mlx implements exact speculative decoding for language models on Apple Silicon using Metal and MLX, reducing forward passes with a block-diffusion draft model and per-layer KV cache rollback.</description></item><item><title>vllm-mlx: Efficient LLM serving on Apple Silicon with SSD-tiered KV cache and continuous batching</title><link>https://ramdi.fr/github-stars/vllm-mlx-efficient-llm-serving-on-apple-silicon-with-ssd-tiered-kv-cache-and-continuous-batching/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/vllm-mlx-efficient-llm-serving-on-apple-silicon-with-ssd-tiered-kv-cache-and-continuous-batching/</guid><description>vllm-mlx is a Python inference server for Apple Silicon that exposes OpenAI- and Anthropic-compatible APIs, with an SSD-tiered KV cache for long-context agents and continuous batching for throughput.</description></item></channel></rss>