diff --git a/README.md b/README.md
index c210290..1626bbb 100644
--- a/README.md
+++ b/README.md
@@ -1,20 +1,21 @@
 # Mini-Omni
-

+

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming -

+ +

🤗 Hugging Face | 📖 Github -| 📑 Technical report (coming soon) -

+| 📑 Technical report +

Mini-Omni is an open-source multimodel large language model that can **hear, talk while thinking**. Featuring real-time end-to-end speech input and **streaming audio output** conversational capabilities.

-

+

 ## Features
@@ -29,7 +30,10 @@ Mini-Omni is an open-source multimodel large language model that can **hear, tal
 ## Demo
-![](./data/demo_streamlit.mov)
+NOTE: need to unmute first.
+
+https://github.com/user-attachments/assets/03bdde05-9514-4748-b527-003bea57f118
+
 ## Install
@@ -71,7 +75,10 @@ API_URL=http://0.0.0.0:60808/chat python3 webui/omni_gradio.py
 example:
-![](./data/demo_gradio.mov)
+NOTE: need to unmute first. Gradio seems can not play audio stream instantly, so the latency feels a bit longer.
+
+https://github.com/user-attachments/assets/29187680-4c42-47ff-b352-f0ea333496d9
+
 **Local test**
@@ -89,4 +96,4 @@ python inference.py
 - [whisper](https://github.com/openai/whisper/) for audio encoding.
 - [snac](https://github.com/hubertsiuzdak/snac/) for audio decoding.
 - [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) for generating synthetic speech.
-- [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) and [MOSS](https://github.com/OpenMOSS/MOSS/tree/main) for alignment.
\ No newline at end of file
+- [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) and [MOSS](https://github.com/OpenMOSS/MOSS/tree/main) for alignment.