Zinc is a Zig-written LLM inference engine using Vulkan and Metal for AMD RDNA and Apple Silicon GPUs. It supports GGUF quantized models and exposes an OpenAI-compatible API with streaming.
dflash-mlx implements exact speculative decoding for language models on Apple Silicon using Metal and MLX, reducing forward passes with a block-diffusion draft model and per-layer KV cache rollback.
vllm-mlx is a Python inference server for Apple Silicon that supports OpenAI and Anthropic APIs, featuring SSD-tiered KV cache for long-context agents and continuous batching for performance.