QwenVoice: offline Apple Silicon text-to-speech with XPC isolation and model quantization tradeoffs

Running large text-to-speech models locally on Apple Silicon is challenging, especially if you want a smooth native app experience without cloud dependencies. QwenVoice (soon to be rebranded as Vocello) tackles this by running the Qwen3-TTS 1.7B model offline on macOS and iOS using Apple’s MLX framework, isolating the heavy inference work in an XPC process to keep the UI responsive. It offers multiple voice synthesis modes, including custom built-in voices, user-driven voice design, and voice cloning from audio samples, all without a Python backend or command-line interface.

What QwenVoice does and how it’s built

QwenVoice is a native macOS and iOS application written in Swift, targeting only Apple Silicon devices with macOS/iOS version 26.0 or later. Its main mission is to provide offline text-to-speech (TTS) using the Qwen3-TTS 1.7B model, which it runs locally via Apple’s MLX framework optimized for M-series chips.

The app supports three main modes:

Custom Voice: Four built-in English speaker voices shipped with the app.
Voice Design: A mode where users can shape voices using natural language prompts.
Voice Cloning: Users can create custom voices by providing audio clips and optional transcripts.

Under the hood, the app uses macOS XPC services to isolate the ML inference engine in a separate process. This architectural choice prevents the UI from blocking during expensive model inference and improves stability since the heavy lifting happens outside the main SwiftUI app.
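To make the process boundary concrete, here is a minimal macOS-only sketch of how an app can call an XPC service through Foundation's NSXPCConnection. It is not taken from the QwenVoice source: the protocol name SpeechEngineXPC, its synthesize method, and the service identifier com.example.SpeechEngine are all assumptions chosen for illustration.

```swift
import Foundation

// Hypothetical contract shared by the app and the XPC service.
// NSXPCConnection requires an @objc protocol with secure-codable argument types.
@objc protocol SpeechEngineXPC {
    func synthesize(_ text: String, withReply reply: @escaping (Data?, NSError?) -> Void)
}

// Hypothetical bundle identifier of the XPC service target.
let engineServiceName = "com.example.SpeechEngine"

// Creates and resumes a connection to the (assumed) inference service.
func makeEngineConnection() -> NSXPCConnection {
    let connection = NSXPCConnection(serviceName: engineServiceName)
    connection.remoteObjectInterface = NSXPCInterface(with: SpeechEngineXPC.self)
    connection.resume()
    return connection
}

// Sends text to the service without blocking the caller.
// The completion handler runs when the service replies or the connection fails.
func synthesize(_ text: String,
                over connection: NSXPCConnection,
                completion: @escaping @Sendable (Result<Data, Error>) -> Void) {
    let remote = connection.remoteObjectProxyWithErrorHandler { error in
        // Called instead of the reply block if the service crashes or is unreachable.
        completion(.failure(error))
    }
    guard let engine = remote as? SpeechEngineXPC else {
        completion(.failure(NSError(domain: "SpeechEngine", code: -1)))
        return
    }
    engine.synthesize(text) { audio, error in
        if let audio {
            completion(.success(audio))
        } else {
            completion(.failure(error ?? NSError(domain: "SpeechEngine", code: -2)))
        }
    }
}
```

Because the reply arrives asynchronously, the UI process only waits on a callback. If the service process dies mid-generation, the error handler fires and the app itself keeps running.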

Model weights are downloaded locally from Hugging Face, and the app supports two quantization levels:

8-bit model variant prioritizing quality.
4-bit model variant optimized for inference speed.

This contract-driven model variant system allows the app to adapt inference strategies depending on available hardware resources and user preferences.
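One way to picture a variant contract is as a small table of requirements that a selection function checks against the machine and the user's preference. The sketch below is an assumption about how such a system could look, not the app's actual code: the type names, the selectVariant function, and the 16 GiB threshold for the 8-bit variant are hypothetical. Only the 8 GB minimum comes from the stated requirements.

```swift
import Foundation

// The two quantization levels described above.
enum ModelVariant: String {
    case quality8Bit = "8bit"
    case speed4Bit = "4bit"
}

// Hypothetical contract: what a variant needs before it may be selected.
struct VariantContract {
    let variant: ModelVariant
    let minimumMemoryBytes: UInt64
}

let gibibyte: UInt64 = 1_073_741_824

// Ordered from most to least demanding. The 16 GiB figure is an assumed threshold.
let contracts = [
    VariantContract(variant: .quality8Bit, minimumMemoryBytes: 16 * gibibyte),
    VariantContract(variant: .speed4Bit, minimumMemoryBytes: 8 * gibibyte),
]

// Returns the user's preferred variant when the hardware satisfies its contract,
// otherwise the most demanding variant the hardware can still run, or nil if none fit.
func selectVariant(preferred: ModelVariant?, physicalMemory: UInt64) -> ModelVariant? {
    let eligible = contracts.filter { physicalMemory >= $0.minimumMemoryBytes }
    if let preferred, eligible.contains(where: { $0.variant == preferred }) {
        return preferred
    }
    return eligible.first?.variant
}

// Usage: query the installed RAM through Foundation.
let choice = selectVariant(preferred: nil,
                           physicalMemory: ProcessInfo.processInfo.physicalMemory)
print(choice?.rawValue ?? "no supported variant")
```

Keeping the requirements in data rather than in scattered conditionals means a new variant only needs a new contract entry.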

For data persistence, QwenVoice uses GRDB, a Swift wrapper around SQLite, to keep a history of generated speech. The whole codebase is MIT-licensed and deliberately avoids a Python backend or CLI, focusing strictly on a native Swift experience.

What sets QwenVoice apart: XPC isolation and quantization tradeoffs

The standout technical design is the use of macOS XPC for process isolation of the ML inference engine. Running a 1.7B parameter TTS model locally is resource-intensive, and embedding that directly in the UI process risks UI freezes or crashes. By using XPC, the app delegates generation to a separate process, communicating via well-defined IPC channels. This design preserves SwiftUI responsiveness and aligns with macOS best practices for heavy computation tasks.

Another notable aspect is the contract-driven approach to model variants. Supporting both 8-bit and 4-bit quantized versions of the Qwen3-TTS model introduces a clear tradeoff:

The 8-bit variant delivers higher voice quality at the cost of slower inference and higher memory consumption.
The 4-bit variant offers faster inference and lower memory use but with some quality loss.
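The memory side of this tradeoff follows from simple arithmetic: the weight footprint is roughly the parameter count times the bits per weight, divided by eight. The sketch below applies that to a nominal 1.7 billion parameters. It is a back-of-the-envelope estimate under stated assumptions: it ignores quantization scales, any layers kept at higher precision, and runtime buffers, so real usage will be higher. The function name is illustrative.

```swift
// Bits per weight for the two variants discussed above.
enum Quantization: Int {
    case fourBit = 4
    case eightBit = 8
}

// Approximate bytes needed for the weights alone: parameters * bits / 8.
// Assumption: every parameter is stored at the same bit width.
func approximateWeightBytes(parameters: Double, quantization: Quantization) -> Double {
    parameters * Double(quantization.rawValue) / 8.0
}

let nominalParameters = 1.7e9  // "1.7B" taken at face value

for quantization in [Quantization.eightBit, .fourBit] {
    let bytes = approximateWeightBytes(parameters: nominalParameters,
                                       quantization: quantization)
    print("\(quantization.rawValue)-bit weights: about \(bytes / 1e9) GB")
}
```

Under these assumptions the 8-bit weights take about 1.7 GB and the 4-bit weights about 0.85 GB, which is why halving the bit width matters on machines near the 8 GB minimum.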

The app does not expose temperature or max-token controls, which are common in TTS tools but can confuse users. QwenVoice opts for simplicity and reliability by keeping these generation parameters out of the interface.

The code is surprisingly clean for a relatively niche AI macOS app, making use of modern Swift concurrency and strong typing. GRDB is a solid choice for local SQLite handling, offering robustness without adding unnecessary dependencies.

Limitations are mainly platform and hardware constraints: it requires macOS/iOS 26.0 or newer and Apple Silicon chips. Users on Intel Macs or older OS versions are out of luck. Also, the lack of a streaming or batch UI and of finer control over TTS parameters may frustrate advanced users looking for customization.

Quick start with QwenVoice

The README provides clear installation and usage steps for the shipped v1.2.3 build and the upcoming Vocello release.

Requirements

macOS 26.0+ or iOS 26.0+ (iPhone 15 Pro or later)
Apple Silicon chip
Minimum 8 GB RAM on macOS
Xcode 26.0 and XcodeGen for building from source

Installing the released app

Download the latest release from the GitHub Releases page.
Open the Vocello-macos26.dmg file.
Drag Vocello.app to /Applications.
Launch the app, navigate to the Models tab, download the desired model, and start generating speech.

This straightforward flow makes it accessible even if you aren’t deep into AI development or macOS app building.

Verdict

QwenVoice fills a practical niche: a native, offline TTS app running large transformer models on Apple Silicon with a clean SwiftUI interface. Its architectural choice of XPC isolation for ML inference is a solid pattern worth understanding for anyone building heavy ML apps on macOS.

The dual quantization approach offers a meaningful tradeoff between speed and quality, catering to different user needs and hardware constraints. However, it’s limited to the latest Apple hardware and OS versions, which narrows its audience.

The app’s deliberate simplicity (no temperature tuning, no streaming UI) means it is best suited for users who want reliable, local TTS without fuss, rather than power users needing fine-grained control.

If you are developing AI-powered macOS/iOS apps or are interested in efficient local ML model deployment on Apple Silicon, QwenVoice offers an instructive example of balancing UX responsiveness with heavy ML workloads. For everyday users, the app provides a capable offline voice synthesis experience with multiple voice modes, including voice cloning.

Overall, it’s a technically sound, well-organized project that trades some flexibility for stability and native integration. It is worth exploring if you are focused on Apple Silicon and want a local voice synthesis solution.


→ GitHub Repo: PowerBeef/QwenVoice ⭐ 262 · Swift

Noureddine RAMDI / QwenVoice: offline Apple Silicon text-to-speech with XPC isolation and model quantization tradeoffs