Fun-ASR is Alibaba Tongyi Lab’s end-to-end speech recognition model with 800M parameters, supporting 31 languages and real-time transcription in noisy environments.
Kimi-Audio combines continuous acoustic and discrete semantic tokens within a 7B LLM for unified audio-text understanding and generation. It reports state-of-the-art ASR accuracy and also supports low-latency audio synthesis.
LiveCaptions Translator taps Windows 11’s on-device Live Captions for real-time speech translation through a choice of LLM-based and traditional translation APIs, packaged as a C# desktop app.