Kimi-Audio combines continuous acoustic and discrete semantic tokens within a 7B LLM for unified audio-text understanding and generation. It achieves state-of-the-art ASR with low-latency audio synthesis.
YuE is an open-source Python foundation model for generating complete songs from lyrics using a two-stage architecture and audio in-context learning. It supports style cloning and LoRA finetuning under Apache 2.0.