Parrot offers a rare combination in the AI text-to-speech (TTS) space: a fully offline, privacy-first desktop application powered by a Rust backend and a Bun frontend. It uses a lightweight yet capable neural model for speech synthesis, so users can listen to highlighted text immediately without relying on cloud services or GPUs. This design addresses two pain points common to many TTS tools: privacy concerns and dependence on a network connection.
What Parrot does and how it is built
Parrot is a desktop TTS application for macOS, Windows, and Linux. Its core is a Rust backend, chosen for performance and efficiency, which runs the Kokoro-82M neural TTS model (82 million parameters). The model download is approximately 115 MB, small enough to fetch once and then run entirely on-device, with no GPU and no internet connection required afterward.
The app supports 54 distinct voices spanning 9 languages, which is impressive for an offline solution. Users highlight text anywhere on their system and trigger the TTS playback with a keyboard shortcut (Option+Space on macOS, Ctrl+Space on Windows/Linux). The audio starts streaming almost immediately, as the app begins playback before the entire text is synthesized.
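Starting playback before the whole selection is synthesized implies splitting the text into pieces that can be synthesized one at a time. The article does not say how Parrot chunks text, so the following is a minimal sketch under that assumption: a hypothetical `split_into_chunks` helper that breaks on sentence punctuation and caps chunk length, so the first chunk (and therefore the first audio) arrives quickly.

```rust
/// Split text into sentence-sized chunks so the first chunk can be
/// synthesized (and played) quickly. Hypothetical helper, not Parrot's code.
fn split_into_chunks(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut len = 0; // characters in `current`, counted as chars, not bytes

    for ch in text.chars() {
        current.push(ch);
        len += 1;
        // End the chunk at sentence punctuation or a newline, or force a
        // break once it reaches `max_chars` characters.
        if matches!(ch, '.' | '!' | '?' | '\n') || len >= max_chars {
            push_trimmed(&mut chunks, &current);
            current.clear();
            len = 0;
        }
    }
    // Flush any trailing text after the last break.
    push_trimmed(&mut chunks, &current);
    chunks
}

/// Append `piece` without surrounding whitespace, skipping empty pieces.
fn push_trimmed(chunks: &mut Vec<String>, piece: &str) {
    let trimmed = piece.trim();
    if !trimmed.is_empty() {
        chunks.push(trimmed.to_string());
    }
}
```

For example, `split_into_chunks("Hello world. How are you?", 200)` yields `["Hello world.", "How are you?"]`. The punctuation rule is deliberately naive (it would also split `3.5`); a real implementation would likely use proper sentence segmentation.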
The frontend is built with Bun, a modern JavaScript runtime, providing the UI and integrating with the Rust backend. This combination balances a high-performance core with a flexible, responsive interface. The app also includes features like a floating overlay with pause and resume controls, customizable shortcuts, and CLI flags to integrate with window managers or enable remote control toggles.
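The platform-specific default shortcut plus user customization can be modeled as a default with an optional override. This is a hypothetical sketch: `Shortcut`, `parse_shortcut`, `effective_shortcut`, and the accepted modifier names are illustrative and not taken from Parrot's source, which also has to register the shortcut with the operating system.

```rust
use std::env::consts::OS;

/// A parsed shortcut such as "Ctrl+Space". Hypothetical type.
#[derive(Debug, PartialEq)]
struct Shortcut {
    modifiers: Vec<String>,
    key: String,
}

/// Platform default from the article: Option+Space on macOS,
/// Ctrl+Space on Windows and Linux.
fn default_shortcut(os: &str) -> &'static str {
    if os == "macos" {
        "Option+Space"
    } else {
        "Ctrl+Space"
    }
}

/// Parse "Mod+Mod+Key": the last `+`-separated token is the key,
/// everything before it must be a recognized modifier.
fn parse_shortcut(spec: &str) -> Result<Shortcut, String> {
    // Illustrative modifier names; a real app would follow its hotkey library.
    const MODIFIERS: [&str; 6] = ["Ctrl", "Shift", "Alt", "Option", "Cmd", "Super"];

    let parts: Vec<&str> = spec.split('+').map(str::trim).collect();
    if parts.iter().any(|p| p.is_empty()) {
        return Err(format!("malformed shortcut: {spec:?}"));
    }
    // `split` always yields at least one item, so this cannot fail here.
    let (key, mods) = parts.split_last().ok_or("empty shortcut")?;
    if let Some(bad) = mods.iter().find(|m| !MODIFIERS.contains(*m)) {
        return Err(format!("unknown modifier: {bad}"));
    }
    Ok(Shortcut {
        modifiers: mods.iter().map(|m| m.to_string()).collect(),
        key: key.to_string(),
    })
}

/// Use the user's configured shortcut if present, else the OS default.
fn effective_shortcut(user_setting: Option<&str>) -> Result<Shortcut, String> {
    parse_shortcut(user_setting.unwrap_or(default_shortcut(OS)))
}
```

Validating the string up front means a typo in a custom shortcut surfaces as an error in settings rather than as a hotkey that silently never fires.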
Under the hood, Kokoro-82M is a neural TTS model built for on-device use: it balances model size and compute requirements against voice quality and latency. Its relatively small footprint (~115 MB) makes offline desktop use feasible without specialized hardware.
Technical strengths and design tradeoffs
The standout technical aspect of Parrot is its fully offline operation with a neural TTS model that does not require GPU acceleration. Running Kokoro-82M in Rust is a deliberate choice to maximize CPU efficiency and safety, as Rust provides low-level control and memory safety guarantees. This means the app can run on a wide range of consumer hardware without specialized AI acceleration.
Streaming audio playback before full synthesis completes is a significant UX improvement: it reduces perceived latency, so the TTS feels nearly instantaneous. It also requires careful orchestration between the synthesis pipeline and the audio output buffer, because each chunk must be ready before the previous one finishes playing, or playback stalls. In Parrot, that coordination lives in the Rust backend.
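Streaming of this kind is a classic producer/consumer pipeline. The sketch below assumes a single synthesis thread feeding a player through a bounded `std::sync::mpsc::sync_channel`; `fake_synthesize` is a hypothetical stand-in for Kokoro inference that returns silence, and the "player" just records what it receives instead of driving an audio device.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical stand-in for model inference: 100 samples of silence per
/// byte of input. Real code would run Kokoro-82M here.
fn fake_synthesize(chunk: &str) -> Vec<f32> {
    vec![0.0; chunk.len() * 100]
}

/// Synthesize chunks on a worker thread while the caller "plays" them.
/// Returns (chunk index, sample count) in the order playback received them.
fn stream(chunks: Vec<String>) -> Vec<(usize, usize)> {
    // Bounded buffer: at most two finished chunks can wait for playback.
    let (tx, rx) = sync_channel::<(usize, Vec<f32>)>(2);

    let producer = thread::spawn(move || {
        for (i, chunk) in chunks.iter().enumerate() {
            let audio = fake_synthesize(chunk);
            // `send` blocks while the buffer is full; an error means the
            // player hung up, so stop synthesizing.
            if tx.send((i, audio)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the playback loop below.
    });

    let mut played = Vec::new();
    // Playback starts as soon as chunk 0 arrives, not after all synthesis.
    for (i, audio) in rx {
        // A real player would copy `audio` into the output device buffer.
        played.push((i, audio.len()));
    }
    producer.join().expect("synthesis thread panicked");
    played
}
```

The bound of 2 is an arbitrary choice: a larger buffer absorbs more jitter in synthesis time at the cost of memory, while `send` blocking on a full buffer keeps the synthesizer from running arbitrarily far ahead of playback.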
Supporting 54 voices across 9 languages offline is ambitious. Kokoro-82M handles this as a single multi-speaker model: each voice is a compact style vector (a "voice pack") applied to the same network, rather than a separate model per voice. This breadth increases the app’s utility but also the complexity of managing voice files and voice selection.
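Kokoro distributes its voices under short IDs such as `af_heart`. Assuming Kokoro's naming convention, in which the first letter encodes the language (`a` for American English) and the second the gender (`f` or `m`), an app can group voices by language for a picker without loading any audio data. The language table below reflects that assumed convention, and `decode_voice_id` and `voices_for` are hypothetical helpers.

```rust
/// Language and gender decoded from a voice ID. Hypothetical type.
#[derive(Debug, PartialEq)]
struct VoiceInfo {
    language: &'static str,
    gender: &'static str,
}

/// Decode IDs shaped like "af_heart", assuming Kokoro's convention:
/// 1st char = language, 2nd char = gender, then an underscore and a name.
fn decode_voice_id(id: &str) -> Option<VoiceInfo> {
    let mut chars = id.chars();
    let lang = chars.next()?;
    let sex = chars.next()?;
    if chars.next() != Some('_') {
        return None;
    }
    let language = match lang {
        'a' => "American English",
        'b' => "British English",
        'e' => "Spanish",
        'f' => "French",
        'h' => "Hindi",
        'i' => "Italian",
        'j' => "Japanese",
        'p' => "Brazilian Portuguese",
        'z' => "Mandarin Chinese",
        _ => return None,
    };
    let gender = match sex {
        'f' => "female",
        'm' => "male",
        _ => return None,
    };
    Some(VoiceInfo { language, gender })
}

/// Keep only the installed voices for one language, e.g. for a voice picker.
fn voices_for<'a>(ids: &[&'a str], language: &str) -> Vec<&'a str> {
    ids.iter()
        .copied()
        .filter(|id| decode_voice_id(id).map_or(false, |v| v.language == language))
        .collect()
}
```

Because the metadata lives in the ID, adding voices only means shipping more style vectors; the picker logic does not change.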
The choice of Bun for the frontend is interesting and somewhat unconventional compared to more established Electron or React Native approaches. Bun promises faster startup and lower resource usage, aligning well with Parrot’s lightweight, efficient philosophy.
The tradeoff here is the initial 115 MB model download, which might be a barrier for users with limited bandwidth or storage. Also, while the app avoids cloud dependencies, it doesn’t leverage GPU acceleration, which could limit synthesis speed on lower-end CPUs. The app’s design prioritizes privacy and offline use over maximum voice quality or synthesis speed that GPU-backed cloud solutions might achieve.
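Whether CPU-only synthesis is fast enough comes down to the real-time factor: the time it takes to synthesize a chunk divided by that chunk's audio duration. Streaming hides latency only while each chunk finishes synthesizing before the previous one finishes playing. The hypothetical `simulate` function below makes this concrete under simplifying assumptions (one synthesis thread, unbounded buffering, timings known in advance) and reports the time to first audio plus the total time playback sits stalled.

```rust
/// Measured or estimated timing for one chunk, in seconds.
struct ChunkTiming {
    synth_secs: f64,
    audio_secs: f64,
}

/// Returns (time to first audio, total stall time during playback).
/// Assumes one synthesis thread that never waits for playback
/// (unbounded buffer) and a player that starts each chunk as soon as
/// the chunk is ready and the previous chunk has finished.
fn simulate(chunks: &[ChunkTiming]) -> (f64, f64) {
    let mut ready: f64 = 0.0; // when the current chunk finishes synthesizing
    let mut play_end: f64 = 0.0; // when the previous chunk finishes playing
    let mut first_audio: f64 = 0.0;
    let mut stall: f64 = 0.0;

    for (i, c) in chunks.iter().enumerate() {
        ready += c.synth_secs;
        let start = if i == 0 {
            first_audio = ready;
            ready
        } else {
            // Any gap between the previous chunk ending and this one being
            // ready is silence the listener hears as a stall.
            stall += (ready - play_end).max(0.0);
            ready.max(play_end)
        };
        play_end = start + c.audio_secs;
    }
    (first_audio, stall)
}
```

With timings of (1 s synthesis, 1 s audio) followed by (3 s synthesis, 1 s audio), first audio arrives after 1 s, but playback then stalls for 2 s waiting on the second chunk. No stall occurs as long as each chunk synthesizes in less time than the previous chunk takes to play, which is exactly what a slower CPU puts at risk.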
The codebase appears to follow a clear separation of concerns: Rust handles model inference and audio streaming, while the Bun-based frontend manages the UI and user input. This separation improves maintainability and lets each side be optimized independently.
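The article does not describe the actual boundary between the Bun frontend and the Rust backend, so the sketch below abstracts it as a channel. The UI sends hypothetical `Command` values, and a backend thread that alone owns the playback state replies with `Event`s an overlay could use to update its pause and resume controls. All type and variant names are illustrative.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Messages the UI sends to the backend (hypothetical protocol).
enum Command {
    Speak(String),
    TogglePause,
    Stop,
    Shutdown,
}

/// Notifications the backend sends back so the overlay can update.
#[derive(Debug, PartialEq)]
enum Event {
    Started,
    Paused,
    Resumed,
    Stopped,
}

#[derive(Clone, Copy)]
enum State {
    Idle,
    Playing,
    Paused,
}

/// Backend loop: it alone owns playback state; the UI only sends commands.
fn run_backend(commands: Receiver<Command>, events: Sender<Event>) {
    let mut state = State::Idle;
    for cmd in commands {
        let (next, event) = match (state, cmd) {
            (_, Command::Shutdown) => break,
            // Real code would start chunked synthesis of `_text` here.
            (_, Command::Speak(_text)) => (State::Playing, Some(Event::Started)),
            (State::Playing, Command::TogglePause) => (State::Paused, Some(Event::Paused)),
            (State::Paused, Command::TogglePause) => (State::Playing, Some(Event::Resumed)),
            // Pausing or stopping while idle is a no-op.
            (State::Idle, Command::TogglePause | Command::Stop) => (State::Idle, None),
            (_, Command::Stop) => (State::Idle, Some(Event::Stopped)),
        };
        state = next;
        if let Some(event) = event {
            if events.send(event).is_err() {
                break; // the UI has gone away
            }
        }
    }
}

/// Start the backend on its own thread and hand the UI its two channel ends.
fn spawn_backend() -> (Sender<Command>, Receiver<Event>, JoinHandle<()>) {
    let (cmd_tx, cmd_rx) = channel();
    let (event_tx, event_rx) = channel();
    let handle = thread::spawn(move || run_backend(cmd_rx, event_tx));
    (cmd_tx, event_rx, handle)
}
```

Keeping the state machine on the backend side means a keyboard shortcut, the overlay button, and a CLI toggle can all send the same `TogglePause` without disagreeing about whether audio is currently playing.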
Installation and quick start
Download the latest stable version for macOS, Windows, and Linux from the Parrot website.
On first launch, Parrot prompts you to download the TTS model (~115 MB). Once downloaded, the app works completely offline.
```bash
# Install frontend dependencies
bun install
```
This installation approach emphasizes ease of use: users get a ready-to-run binary for their platform and download the model only once. The `bun install` step is needed only when building from source or modifying the frontend.
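The first-launch flow comes down to one check: is a usable model already on disk? Here is a minimal sketch of that decision. The `needs_model_download` name and the 100 MiB size floor (chosen just below the ~115 MB download) are assumptions, and a production app would more likely verify a checksum.

```rust
use std::fs;
use std::path::Path;

/// Size floor for a "complete" model file: 100 MiB, a guess just under the
/// ~115 MB download. Hypothetical; a checksum would be more robust.
const MIN_MODEL_BYTES: u64 = 100 * 1024 * 1024;

/// Should the app show the first-launch download prompt for `model_path`?
fn needs_model_download(model_path: &Path) -> bool {
    match fs::metadata(model_path) {
        // A directory or a truncated file (e.g. an interrupted download)
        // counts as missing.
        Ok(meta) => !meta.is_file() || meta.len() < MIN_MODEL_BYTES,
        // Not found, or unreadable: prompt for a (re)download.
        Err(_) => true,
    }
}
```

An app would call this at startup with its own cache path (for example, a hypothetical `models/kokoro-82m.bin`) and skip the network entirely once it returns `false`.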
Verdict
Parrot is well-suited for users who want a privacy-conscious, offline TTS solution that doesn’t rely on cloud APIs or GPUs. It’s a practical tool for anyone needing quick text-to-speech on their desktop, supporting multiple languages and voices without internet connectivity.
The tradeoff is the upfront model download and reliance on CPU-based synthesis, which may not match the audio quality or speed of cloud-based GPU models. However, for many practical purposes, especially in privacy-sensitive contexts, Parrot hits a useful balance.
Developers interested in offline AI applications will appreciate its Rust backend and Bun frontend architecture, which together deliver a lightweight but flexible desktop app. Parrot isn’t a drop-in replacement for high-quality cloud TTS services but fills a niche for offline, private, and responsive text-to-speech on desktop environments.
→ GitHub Repo: rishiskhare/parrot ⭐ 104 · Rust