Noureddine RAMDI / Running Stable Diffusion locally on Apple Silicon with Mochi Diffusion

Created Mon, 04 May 2026 10:23:02 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

MochiDiffusion/MochiDiffusion

Running Stable Diffusion locally on consumer hardware usually means a hefty GPU with plenty of VRAM. Mochi Diffusion takes a different route: it uses Apple’s Core ML framework and the Neural Engine on Apple Silicon Macs to run these models with a memory footprint as low as 150MB, in stark contrast to the 4-8GB of VRAM typically required for Stable Diffusion on CUDA-enabled GPUs.

How Mochi Diffusion runs Stable Diffusion models on Apple Silicon

Mochi Diffusion is a native macOS application written in SwiftUI that enables running Stable Diffusion and FLUX.2 Klein models entirely offline on Apple Silicon Macs (M1 and later). It uses Apple’s Core ML framework, which provides an abstraction layer to run machine learning models efficiently on Apple hardware.

The app takes advantage of the Apple Neural Engine (ANE), a dedicated hardware accelerator for machine learning workloads. This lets Mochi Diffusion run large deep learning models in a runtime footprint of roughly 150MB while still maintaining reasonable inference speeds.

The architecture relies on converting models to Core ML format, optimized to run on the Neural Engine. To do this, Mochi Diffusion uses Apple’s coremltools to convert Stable Diffusion models, including split_einsum variants designed to maximize execution efficiency on the ANE.
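As a rough illustration of what such a conversion can look like, the sketch below assembles a command line for the torch2coreml converter that ships with Apple's ml-stable-diffusion package. That Mochi Diffusion's models are produced exactly this way is an assumption, not something the project description confirms. The helper name build_convert_command, the MODEL_ID placeholder, and the output directory are all hypothetical.

```python
import shlex


def build_convert_command(model_version: str, output_dir: str,
                          attention: str = "SPLIT_EINSUM") -> list[str]:
    """Assemble (but do not run) a torch2coreml conversion command.

    Assumption: conversion goes through Apple's ml-stable-diffusion package.
    model_version and output_dir are illustrative placeholders.
    """
    if attention not in {"ORIGINAL", "SPLIT_EINSUM", "SPLIT_EINSUM_V2"}:
        raise ValueError(f"unknown attention implementation: {attention}")
    return [
        "python", "-m", "python_coreml_stable_diffusion.torch2coreml",
        "--model-version", model_version,
        "--convert-unet",
        "--convert-text-encoder",
        "--convert-vae-decoder",
        "--attention-implementation", attention,
        "-o", output_dir,
    ]


if __name__ == "__main__":
    # MODEL_ID stands in for a real model identifier.
    print(shlex.join(build_convert_command("MODEL_ID", "./coreml-out")))
```

The function only builds the argument list, so it can be inspected or logged before anything is executed; running the command itself requires the ml-stable-diffusion package and its dependencies.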

It supports image-to-image generation workflows, ControlNet integration for more guided generation, and the ability to add custom Core ML models. The app also includes a built-in gallery that preserves EXIF metadata, allowing users to manage their generated images along with relevant model and generation parameters.
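To make the idea of storing generation parameters alongside an image concrete, here is a minimal sketch that flattens a set of parameters into one text field and parses it back. The "Key: value; Key: value" layout and the field names are hypothetical and chosen only for illustration; they are not taken from the app's actual metadata format.

```python
def encode_metadata(params: dict[str, str]) -> str:
    """Flatten generation parameters into one 'Key: value; ...' string."""
    for key, value in params.items():
        # Reject characters that would make the string ambiguous to parse.
        if ";" in key or ":" in key or ";" in value:
            raise ValueError(f"unsupported character in field {key!r}")
    return "; ".join(f"{key}: {value}" for key, value in params.items())


def decode_metadata(text: str) -> dict[str, str]:
    """Parse a string produced by encode_metadata back into a dict."""
    if not text:
        return {}
    result = {}
    for field in text.split("; "):
        # Split on the first ': ' only, so values may contain colons.
        key, _, value = field.partition(": ")
        result[key] = value
    return result


if __name__ == "__main__":
    # Hypothetical field names and values.
    tags = {"Model": "example-model", "Steps": "20", "Seed": "42"}
    print(encode_metadata(tags))
```

A string like this could be written into a text-valued image metadata field; the point is only that the parameters survive a round trip through a single string.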

All processing happens locally, so no cloud dependency or internet connection is required once the app and models are set up. The only system requirements are macOS 15.6 or later and an Apple Silicon chip (M1 or newer).
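The two requirements can be checked programmatically before downloading anything. The following is a small sketch using Python's standard platform module; the function names are illustrative, and the 15.6 minimum is the figure quoted above.

```python
import platform


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '15.6.1' into (15, 6, 1)."""
    return tuple(int(part) for part in text.split(".") if part)


def meets_requirements(macos_version: str, machine: str,
                       minimum: str = "15.6") -> bool:
    """True for Apple Silicon ('arm64') running macOS >= minimum."""
    if machine != "arm64" or not macos_version:
        return False
    return parse_version(macos_version) >= parse_version(minimum)


if __name__ == "__main__":
    # platform.mac_ver()[0] is an empty string on non-macOS systems,
    # so this prints False there.
    print(meets_requirements(platform.mac_ver()[0], platform.machine()))
```

One caveat: a Python interpreter running under Rosetta reports an x86_64 machine type even on Apple Silicon, so a False result from this check is not conclusive on its own.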

Technical strengths and tradeoffs of Mochi Diffusion

What sets Mochi Diffusion apart technically is its tight integration with Apple’s Core ML and Neural Engine. Core ML is designed to abstract away hardware details while enabling efficient execution on CPUs, GPUs, and the Neural Engine. Mochi Diffusion specifically targets the ANE for running Stable Diffusion models, which is notable because these models typically demand large amounts of GPU VRAM.
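For readers who want to see how hardware targeting is expressed in Core ML, the sketch below uses the Python coremltools package, where a model is loaded with a compute_units preference. This is not Mochi Diffusion's own code (the app is written in Swift); the helper names and the idea of a boolean preference are assumptions made for the example.

```python
# Names of the members of coremltools.ComputeUnit.
COMPUTE_UNITS = ("CPU_ONLY", "CPU_AND_GPU", "CPU_AND_NE", "ALL")


def choose_compute_unit(prefer_neural_engine: bool) -> str:
    """Return the name of the ComputeUnit member to request."""
    return "CPU_AND_NE" if prefer_neural_engine else "CPU_AND_GPU"


def load_model(path: str, prefer_neural_engine: bool = True):
    """Load a Core ML model restricted to the chosen compute units.

    Requires coremltools on macOS; it is imported lazily so that
    choose_compute_unit can be used without it installed.
    """
    import coremltools as ct

    unit = ct.ComputeUnit[choose_compute_unit(prefer_neural_engine)]
    return ct.models.MLModel(path, compute_units=unit)
```

Requesting CPU_AND_NE expresses a preference rather than a guarantee: Core ML still decides which operations actually run on the Neural Engine and which fall back to the CPU.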

The repo uses split_einsum model variants optimized for the ANE, which break matrix multiplications into smaller pieces that fit the hardware’s constraints better. This optimization is central to achieving the low memory footprint.
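The general principle behind splitting can be shown with a toy example: a matrix product computed in column chunks gives the same result as the full product, while each step only touches a slice of the data. This pure-Python sketch is not the actual split_einsum implementation, only an illustration of why splitting preserves correctness; the function names are made up for the example.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def split_matmul(a, b, chunk):
    """Compute a @ b in column chunks of b, then stitch the pieces together."""
    cols = len(b[0])
    out = [[] for _ in a]
    for start in range(0, cols, chunk):
        # Take a slice of b's columns and multiply only that slice.
        piece = [row[start:start + chunk] for row in b]
        for out_row, part_row in zip(out, matmul(a, piece)):
            out_row.extend(part_row)
    return out


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6, 7], [8, 9, 10]]
    # Both approaches produce the same matrix.
    print(split_matmul(a, b, chunk=2) == matmul(a, b))
```

Each chunk needs only a slice of the right-hand matrix and produces only a slice of the output, which is the property that lets smaller pieces fit tighter hardware limits.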

Another notable point is the app’s memory usage: around 150MB when running on the Neural Engine. Against the 4-8GB of VRAM typical for Stable Diffusion on CUDA GPUs, that is roughly 25 to 55 times smaller, well over an order of magnitude. This low footprint means users can run advanced diffusion models on relatively modest hardware.

However, there is a tradeoff in startup latency. On first use, the Neural Engine needs up to 2 minutes to compile and cache the model, which can feel slow compared to immediate GPU inference. This is a one-time cost per model; later runs reuse the cached version.
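The compile-once, reuse-afterwards pattern can be sketched in a few lines. This is an in-memory stand-in for illustration only: the real cache is managed by the system and persists between launches, and the class name ModelCache and the compile_fn callback are hypothetical.

```python
class ModelCache:
    """Compile each model at most once and reuse the result afterwards."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn   # the expensive step, supplied by the caller
        self._compiled = {}             # model name -> compiled artifact
        self.compile_count = 0          # how many times the expensive step ran

    def get(self, model_name):
        if model_name not in self._compiled:
            # First use of this model: pay the one-time compilation cost.
            self._compiled[model_name] = self._compile_fn(model_name)
            self.compile_count += 1
        return self._compiled[model_name]


if __name__ == "__main__":
    cache = ModelCache(lambda name: f"compiled:{name}")
    cache.get("model-a")
    cache.get("model-a")   # served from the cache, no recompilation
    cache.get("model-b")   # a different model compiles separately
    print(cache.compile_count)
```

The cost is paid per model rather than per run, which matches the behavior described above: switching to a model that has not been used before triggers a fresh compilation.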

The project is written in Swift and SwiftUI, the idiomatic choice for a native macOS app. The codebase is clean and well-organized, bridging machine learning model conversion, hardware-specific optimizations, and a native user interface.

The offline-only design prioritizes privacy and control, but the app is only available to users with Apple Silicon Macs running macOS 15.6 or later. Intel Macs and older macOS versions are excluded, a hard limit that follows from the reliance on the Neural Engine and Core ML.

Explore the project

The Mochi Diffusion repo is organized around a SwiftUI macOS application. The main Swift code handles user interactions, model loading, and inference orchestration with Core ML.

Key areas to explore include:

  • Model conversion scripts: These use Apple’s coremltools to convert and optimize Stable Diffusion models into Core ML format, focusing on split_einsum variants for ANE compatibility.

  • Inference engine: The code that interfaces with Core ML and manages model execution on the Neural Engine.

  • UI components: SwiftUI views providing image generation controls, gallery management, and metadata display.

The README offers a solid overview of the app’s features, system requirements, and technical approach. For developers interested in how Core ML can be used to run large diffusion models efficiently, the repo is a useful reference.

Since the app runs entirely offline, all data stays local, which is a big plus for users concerned about privacy or network connectivity.

Verdict

Mochi Diffusion is an interesting project if you want to run Stable Diffusion models natively on Apple Silicon Macs without relying on cloud services or heavy GPU setups. Its use of Core ML and the Neural Engine to achieve a ~150MB memory footprint is impressive.

The tradeoffs are clear: it requires modern hardware (M1 or later) and macOS 15.6+, and initial model compilation on the Neural Engine can take up to 2 minutes. But after that, inference is smooth and memory-efficient.

For developers or power users invested in the Apple ecosystem, Mochi Diffusion offers a neat example of how to optimize large AI models for specialized hardware. It’s less relevant if you need cross-platform support or immediate startup times.

Overall, Mochi Diffusion solves a real problem by making diffusion models accessible on consumer Macs with minimal resource demands, showing the practical potential of Core ML and Apple Silicon for on-device AI.


→ GitHub Repo: MochiDiffusion/MochiDiffusion ⭐ 7,892 · Swift