mirror of https://github.com/gpt-omni/mini-omni synced 2025-06-26 18:16:26 +00:00

mini-omni 40ba0c7c3a add license		2024-08-29 19:11:28 +08:00
data	init mini-omni	2024-08-29 19:10:03 +08:00
litgpt	init mini-omni	2024-08-29 19:10:03 +08:00
utils	init mini-omni	2024-08-29 19:10:03 +08:00
webui	init mini-omni	2024-08-29 19:10:03 +08:00
.gitignore	add license	2024-08-29 19:11:28 +08:00
inference.py	init mini-omni	2024-08-29 19:10:03 +08:00
LICENSE	add license	2024-08-29 19:11:28 +08:00
README.md	init mini-omni	2024-08-29 19:10:03 +08:00
requirements.txt	init mini-omni	2024-08-29 19:10:03 +08:00
server.py	init mini-omni	2024-08-29 19:10:03 +08:00

README.md

Mini-Omni

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

🤗 Hugging Face | 📖 GitHub | 📑 Technical report (coming soon)

Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking. It features real-time, end-to-end speech input and streaming audio output for conversation.

Features

✅ Real-time speech-to-speech conversational capabilities. No extra ASR or TTS models required.

✅ Talking while thinking, with the ability to generate text and audio at the same time.

✅ Streaming audio output capabilities.

✅ "Audio-to-Text" and "Audio-to-Audio" batch inference to further boost performance.

Install

Create a new conda environment and install the required packages:

conda create -n omni python=3.10
conda activate omni

git clone https://github.com/gpt-omni/mini-omni.git
cd mini-omni
pip install -r requirements.txt

Quick start

Interactive demo

Start the server

conda activate omni
cd mini-omni
python3 server.py --ip '0.0.0.0' --port 60808
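Loading the model can take a while, so the demos may fail to connect if they are launched immediately after the server command. The following is a minimal sketch of a readiness check that polls the TCP port until it accepts connections. The helper name `wait_for_port` and its defaults are hypothetical and not part of this repository. It only confirms that something is listening on the port, not that the `/chat` endpoint is ready to answer.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper for illustration; not part of mini-omni.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused) until
            # a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For the command above, calling `wait_for_port("127.0.0.1", 60808, timeout=60.0)` from a second terminal would return `True` once the server is listening, assuming the server and the check run on the same machine.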

Run the Streamlit demo

NOTE: Streamlit must be run locally, with PyAudio installed.

pip install PyAudio==0.2.14
API_URL=http://0.0.0.0:60808/chat streamlit run webui/omni_streamlit.py

Run the Gradio demo

API_URL=http://0.0.0.0:60808/chat python3 webui/omni_gradio.py
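Both demos read the server address from the `API_URL` environment variable. `0.0.0.0` is a bind-all address for servers, and some platforms do not accept it as a destination for client connections. The sketch below shows one way to build the client URL, substituting the loopback address when the server was bound to `0.0.0.0`. The helper names `build_api_url` and `demo_env` are hypothetical, and the sketch assumes the server and the demo run on the same machine.

```python
import os
from urllib.parse import urlunsplit


def build_api_url(host: str = "0.0.0.0", port: int = 60808, path: str = "/chat") -> str:
    """Build the URL a demo client should use to reach the server.

    Hypothetical helper for illustration; not part of mini-omni.
    """
    # A server bound to 0.0.0.0 listens on all interfaces; a client on the
    # same machine can reach it through the loopback address.
    client_host = "127.0.0.1" if host == "0.0.0.0" else host
    return urlunsplit(("http", f"{client_host}:{port}", path, "", ""))


def demo_env(host: str = "0.0.0.0", port: int = 60808) -> dict:
    """Return a copy of the current environment with API_URL set."""
    env = dict(os.environ)
    env["API_URL"] = build_api_url(host, port)
    return env
```

The result of `demo_env()` can be passed as the `env` argument of `subprocess.run(["python3", "webui/omni_gradio.py"], env=...)` to launch the Gradio demo with the derived URL.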

Local test

conda activate omni
cd mini-omni
# run the preset audio samples and questions as a test
python inference.py

Acknowledgements

Qwen2 as the LLM backbone.
litGPT for training and inference.
whisper for audio encoding.
snac for audio decoding.
CosyVoice for generating synthetic speech.
OpenOrca and MOSS for alignment.