OpenedAI API for audio/speech
This is an API clone of the OpenAI API for text-to-speech audio generation.
- Compatible with the OpenAI audio/speech API
- Does not connect to the OpenAI API and does not require a (real) OpenAI API Key
- Not affiliated with OpenAI in any way
Full Compatibility:
- tts-1: alloy, echo, fable, onyx, nova, and shimmer (configurable)
- tts-1-hd: alloy, echo, fable, onyx, nova, and shimmer (configurable, uses OpenAI samples by default)
- response_format: mp3, opus, aac, or flac
- speed 0.25-4.0 (and more)
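For example, the format and speed options combine in a single request. This mirrors the sample request further down, but asks for opus audio at 1.5x speed (the voice and output filename are just examples):

```shell
# Any voice/format listed above works the same way; here: opus at 1.5x speed
curl http://localhost:8000/v1/audio/speech -H "Content-Type: application/json" -d '{
  "model": "tts-1",
  "input": "The quick brown fox jumped over the lazy dog.",
  "voice": "nova",
  "response_format": "opus",
  "speed": 1.5
}' > speech.opus
```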
Details:
- model 'tts-1' via piper tts (fast, can use cpu)
- model 'tts-1-hd' via coqui-ai/TTS xtts_v2 voice cloning (fast, uses almost 4GB GPU VRAM)
- Can be run without TTS/xtts_v2, entirely on cpu
- Custom cloned voices can be used for tts-1-hd, just save a WAV file in the /voices/ directory (see the example below)
- You can map your own piper voices and xtts_v2 speaker clones via voice_to_speaker.yaml
- Sometimes certain words or symbols will sound bad, you can fix them with regex via pre_process_map.yaml
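As an example of the custom voice workflow above, it boils down to two steps: drop a WAV sample into /voices/ and map it to a voice name in voice_to_speaker.yaml (the file name and voice name "maria" below are hypothetical; use whatever entry format your voice_to_speaker.yaml already contains). The voice is then requested like any other:

```shell
# Hypothetical cloned voice "maria" for tts-1-hd (xtts_v2),
# assuming voices/maria.wav exists and is mapped in voice_to_speaker.yaml
curl http://localhost:8000/v1/audio/speech -H "Content-Type: application/json" -d '{
  "model": "tts-1-hd",
  "voice": "maria",
  "input": "Custom cloned voices only need a WAV sample and a mapping entry."
}' > maria.mp3
```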
If you find a better voice match for tts-1 or tts-1-hd, please let me know so I can update the defaults.
Version: 0.2.0
Last update: 2023-11-27
API Documentation
Installation instructions
# Install the Python requirements
pip install -r requirements.txt
# install ffmpeg & curl
sudo apt install ffmpeg curl
# Download the voice models:
# for tts-1
bash download_voices_tts-1.sh
# and for tts-1-hd
bash download_voices_tts-1-hd.sh
Usage
usage: main.py [-h] [--piper_cuda] [--xtts_device XTTS_DEVICE] [--preload_xtts] [-P PORT] [-H HOST]
OpenedAI Speech API Server
options:
-h, --help show this help message and exit
--piper_cuda Enable cuda for piper. Note: --cuda/onnxruntime-gpu is not working for me, but cpu is fast enough (default: False)
--xtts_device XTTS_DEVICE
Set the device for the xtts model. The special value of 'none' will use piper for all models. (default: cuda)
--preload_xtts Preload the xtts model. By default it's loaded on first use. (default: False)
-P PORT, --port PORT Server tcp port (default: 8000)
-H HOST, --host HOST Host to listen on, Ex. 0.0.0.0 (default: localhost)
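For example, to run entirely on CPU (piper handles every model when xtts is disabled) and accept connections from other machines:

```shell
# CPU-only: the special xtts_device value 'none' routes all models through piper
python main.py --xtts_device none --host 0.0.0.0 --port 8000
```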
Sample API Usage
You can use it like this:
curl http://localhost:8000/v1/audio/speech -H "Content-Type: application/json" -d '{
"model": "tts-1",
"input": "The quick brown fox jumped over the lazy dog.",
"voice": "alloy",
"response_format": "mp3",
"speed": 1.0
}' > speech.mp3
Or just like this:
curl http://localhost:8000/v1/audio/speech -H "Content-Type: application/json" -d '{
"input": "The quick brown fox jumped over the lazy dog."}' > speech.mp3
Or like this example from the OpenAI Text to speech guide:
import openai
client = openai.OpenAI(
  # This part is not needed if you set these environment variables before importing openai
# export OPENAI_API_KEY=sk-11111111111
# export OPENAI_BASE_URL=http://localhost:8000/v1
api_key = "sk-111111111",
base_url = "http://localhost:8000/v1",
)
response = client.audio.speech.create(
model="tts-1",
voice="alloy",
input="Today is a wonderful day to build something people love!"
)
response.stream_to_file("speech.mp3")
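If you prefer the environment variable route mentioned in the comments above, the equivalent shell setup is below; the script name is only a placeholder for whatever client code constructs openai.OpenAI() without arguments:

```shell
# Point any OpenAI client library at the local server instead of api.openai.com
export OPENAI_API_KEY=sk-111111111
export OPENAI_BASE_URL=http://localhost:8000/v1
python your_tts_script.py  # placeholder for your own client script
```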
Docker support
You can run the server via docker like so:
docker compose build
docker compose up
If you want a minimal docker image with piper support only (900MB vs. 13GB), see Dockerfile.min; you can edit docker-compose.yml to use it instead.
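As a rough sketch, you can also build and run the piper-only image directly from Dockerfile.min. The image tag and port mapping here are examples (the server listens on port 8000 by default), and depending on your setup you may also want to mount your voices and yaml config into the container:

```shell
# Build the minimal (piper-only) image and publish the default API port
docker build -f Dockerfile.min -t openedai-speech:min .
docker run --rm -p 8000:8000 openedai-speech:min
```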