Orion bypasses Core ML and accesses Apple’s Neural Engine directly through private frameworks, enabling on-device inference and fine-tuning of small LLMs with a reported 8.5x reduction in training overhead.
Mochi Diffusion runs Stable Diffusion and FLUX.2 Klein models locally and fully offline on Apple Silicon Macs using Core ML, with fast inference and roughly 150 MB of memory usage.