mirror of https://github.com/gpt-omni/mini-omni synced 2025-06-26 18:16:26 +00:00

mini-omni 5f87f73785 Merge pull request #66 from Lollipop/patch-1 fix typo		2024-09-13 12:33:50 +08:00
data	init mini-omni	2024-08-29 19:10:03 +08:00
litgpt	fix typo	2024-09-12 14:16:10 +08:00
utils	fix device	2024-09-04 23:00:47 +03:00
webui	Fix undefined variable 'tik' for non-API mode	2024-09-06 01:08:36 +08:00
.gitignore	fix device	2024-09-04 23:00:47 +03:00
inference.py	feat: add device parameter	2024-09-06 10:56:48 +08:00
LICENSE	add license	2024-08-29 19:11:28 +08:00
README.md	update readme	2024-09-09 12:14:40 +08:00
requirements.txt	init mini-omni	2024-08-29 19:10:03 +08:00
server.py	feat: add device parameter	2024-09-06 10:56:48 +08:00

README.md

Mini-Omni

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

🤗 Hugging Face | 📖 GitHub | 📑 Technical report

Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking. It features real-time, end-to-end speech input and streaming audio output for conversation.

Features

✅ Real-time speech-to-speech conversational capabilities. No extra ASR or TTS models required.

✅ Talking while thinking, with the ability to generate text and audio at the same time.

✅ Streaming audio output capabilities.

✅ Supports "Audio-to-Text" and "Audio-to-Audio" batch inference to further boost performance.

Demo

NOTE: you need to unmute the video first.

https://github.com/user-attachments/assets/03bdde05-9514-4748-b527-003bea57f118

Install

Create a new conda environment and install the required packages:

conda create -n omni python=3.10
conda activate omni

git clone https://github.com/gpt-omni/mini-omni.git
cd mini-omni
pip install -r requirements.txt

Quick start

Interactive demo

start server

NOTE: start the server before running the streamlit or gradio demo, and launch the demo with API_URL set to the server address.

sudo apt-get install ffmpeg
conda activate omni
cd mini-omni
python3 server.py --ip '0.0.0.0' --port 60808
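Because the demos depend on the server already running, it can help to confirm that the port is accepting connections before launching them. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper and not part of this repository. A successful connection only shows that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper for illustration; it is not part of mini-omni.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Connecting proves only that a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait a little and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("127.0.0.1", 60808, timeout=60.0)` would poll the port used in the server command above for up to a minute.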

run streamlit demo

NOTE: you need to run streamlit locally, with PyAudio installed. If you get the error ModuleNotFoundError: No module named 'utils.vad', run export PYTHONPATH=./ first.

pip install PyAudio==0.2.14
API_URL=http://0.0.0.0:60808/chat streamlit run webui/omni_streamlit.py

run gradio demo

API_URL=http://0.0.0.0:60808/chat python3 webui/omni_gradio.py

example:

NOTE: you need to unmute the video first. Gradio does not seem to play the audio stream instantly, so the latency feels a bit longer.

https://github.com/user-attachments/assets/29187680-4c42-47ff-b352-f0ea333496d9
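Both demos reach the server through the API_URL environment variable. As a rough illustration of what such a client does, the sketch below reads API_URL, wraps a WAV file in a POST request, and reads the reply in chunks. The payload shape (a JSON object with the audio base64-encoded under an `"audio"` key) is an assumption made for this example, and `build_chat_request` and `stream_reply` are hypothetical names; check server.py and the scripts in webui for the actual request format.

```python
import base64
import json
import os
import urllib.request

# Same address as in the demo commands above; override it by exporting API_URL.
API_URL = os.environ.get("API_URL", "http://0.0.0.0:60808/chat")


def build_chat_request(api_url: str, wav_bytes: bytes) -> urllib.request.Request:
    """Wrap raw WAV bytes in a JSON POST request.

    ASSUMPTION: the {"audio": <base64 string>} payload is illustrative only;
    verify the expected fields against server.py before relying on it.
    """
    payload = {"audio": base64.b64encode(wav_bytes).decode("utf-8")}
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_reply(request: urllib.request.Request, chunk_size: int = 4096):
    """Yield the response body chunk by chunk (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Reading the reply in chunks rather than all at once is what lets a client start playback before the full answer has been generated. If a client cannot connect to 0.0.0.0 on your platform, pointing API_URL at 127.0.0.1 instead may help when the server runs on the same machine.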

Local test

conda activate omni
cd mini-omni
# run a test on the preset audio samples and questions
python inference.py

Common issues

Error: ModuleNotFoundError: No module named 'utils.xxxx'

Answer: run export PYTHONPATH=./ first.
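The fix works because Python adds the entries of PYTHONPATH to its module search path (`sys.path`). When a script inside a subdirectory such as webui is launched, Python puts that script's directory, not the repository root, at the front of the search path, so the top-level `utils` package is likely not found. The same effect can be sketched from inside Python; `add_repo_root_to_path` is a hypothetical helper, not something the repository provides.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(repo_root: str) -> str:
    """Prepend repo_root to sys.path (once) so top-level packages can be imported.

    Hypothetical helper; roughly equivalent to `export PYTHONPATH=./` when
    repo_root is the mini-omni checkout directory.
    """
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

Calling `add_repo_root_to_path(".")` from the repository root before any `utils` import would have an effect similar to the export, though setting the environment variable avoids editing any script.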

Acknowledgements

Qwen2 as the LLM backbone.
litGPT for training and inference.
whisper for audio encoding.
snac for audio decoding.
CosyVoice for generating synthetic speech.
OpenOrca and MOSS for alignment.