Mirror of https://github.com/matatonic/openedai-speech
Synced 2025-06-26 18:16:32 +00:00

Commit c4d9d4e7a7 (parent 6864cf03b1): 0.10.0 + docs update
README.md (77 changed lines)
@@ -1,5 +1,4 @@
-OpenedAI Speech
----------------
+# OpenedAI Speech
 
 An OpenAI API compatible text to speech server.
@@ -24,11 +23,13 @@ Details:
 
 If you find a better voice match for `tts-1` or `tts-1-hd`, please let me know so I can update the defaults.
 
 ## Recent Changes
 
 Version: 0.10.0, 2024-04-26
 
-* Better upgrades: Reorganize config files under config, voice models under voices
-  * **If you customized your `voice_to_speaker.yaml` or `pre_process_map.yaml` you need to move them to the `config/` folder.**
+* Prebuilt & tested docker images, smaller docker images (8GB or 860MB)
+* Better upgrades: reorganize config files under `config/`, voice models under `voices/`
+  * **Compatibility!** If you customized your `voice_to_speaker.yaml` or `pre_process_map.yaml` you need to move them to the `config/` folder.
+* default listen host to 0.0.0.0
 
 Version: 0.9.0, 2024-04-23
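The 0.10.0 compatibility note boils down to moving the two YAML files into the new folder. A minimal sketch (the guarded loop is my addition, not from the repo; only the filenames come from the note above):

```shell
# Move customized config files into the new config/ folder
mkdir -p config
for f in voice_to_speaker.yaml pre_process_map.yaml; do
  if [ -f "$f" ]; then mv "$f" "config/$f"; fi
done
```

Running it when the files are absent is harmless: it only creates `config/` and moves nothing.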
@@ -36,29 +37,17 @@ Version: 0.9.0, 2024-04-23
 
 * Fix bug with yaml and loading UTF-8
 * New sample text-to-speech application `say.py`
 * Smaller docker base image
-* Add beta [parler-tts](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) support (you can describe very basic features of the speaker voice), See: (https://www.text-description-to-speech.com/) for some examples of how to describe voices. Voices can be defined in the `voice_to_speaker.yaml`.
-* 2 example [parler-tts](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) voices are included in the `voice_to_speaker.yaml` file.
-* parler-tts is experimental software and is kind of slow. The exact voice will be slightly different each generation but should be similar to the basic description.
+* Add beta [parler-tts](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) support (you can describe very basic features of the speaker voice); see https://www.text-description-to-speech.com/ for some examples of how to describe voices. Voices can be defined in the `voice_to_speaker.default.yaml`. Two example [parler-tts](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) voices are included in the `voice_to_speaker.default.yaml` file. `parler-tts` is experimental software and is kind of slow. The exact voice will be slightly different each generation but should be similar to the basic description.
 
 Version: 0.8.0, 2024-03-23
 
 * Cleanup, docs update.
 ...
 
 Version: 0.7.3, 2024-03-20
 
 * Allow different xtts versions per voice in `voice_to_speaker.yaml`, ex. xtts_v2.0.2
 * Quality: Fix xtts sample rate (24000 vs. 22050 for piper) and pops
 * use CUDA 12.2-base in Dockerfile
 
-API Documentation
------------------
-
-* [OpenAI Text to speech guide](https://platform.openai.com/docs/guides/text-to-speech)
-* [OpenAI API Reference](https://platform.openai.com/docs/api-reference/audio/createSpeech)
-
-Installation instructions
--------------------------
+## Installation instructions
 
 1) Download the models & voices
 ```shell
@@ -68,24 +57,42 @@ bash download_voices_tts-1.sh
 bash download_voices_tts-1-hd.sh
 ```
 
-2a) Docker (**recommended**): You can run the server via docker like so:
+If you have different models that you want to use, both of the download scripts accept arguments for which models to download.
+
+Example:
+```shell
+# Download en_US-ryan-high too
+bash download_voices_tts-1.sh en_US-libritts_r-medium en_GB-northern_english_male-medium en_US-ryan-high
+# Download xtts (latest) and xtts_v2.0.2
+bash download_voices_tts-1-hd.sh xtts xtts_v2.0.2
+```
+
+2a) Option 1: Docker (**recommended**, prebuilt images are available)
+
+You can run the server via docker like so:
 ```shell
 cp sample.env speech.env # edit to suit your environment as needed, you can preload a model on startup
 docker compose up
 ```
-If you want a minimal docker image with piper support only (~1GB vs. ~10GB, see: Dockerfile.min). You can edit the `docker-compose.yml` to easily change this.
+If you want a minimal docker image with piper support only (<1GB vs. 8GB), see `Dockerfile.min`; you can edit the `docker-compose.yml` to easily change this.
+To install the docker image as a service, edit the `docker-compose.yml` and uncomment `restart: unless-stopped`, then start the service with: `docker compose up -d`.
 
-2b) Manual instructions:
+2b) Option 2: Manual instructions:
 ```shell
-# Install the Python requirements
-pip install -r requirements.txt
+# install ffmpeg and curl
+sudo apt install ffmpeg curl
+# Create & activate a new virtual environment
+python -m venv .venv
+source .venv/bin/activate
+# Install the Python requirements
+pip install -r requirements.txt
 # run the server
 python speech.py
 ```
 
-Usage
------
+## Usage
 
 ```
 usage: speech.py [-h] [--piper_cuda] [--xtts_device XTTS_DEVICE] [--preload PRELOAD] [-P PORT] [-H HOST]
@@ -103,8 +110,13 @@ options:
 
 ```
 
-Sample API Usage
-----------------
+## API Documentation
+
+* [OpenAI Text to speech guide](https://platform.openai.com/docs/guides/text-to-speech)
+* [OpenAI API Reference](https://platform.openai.com/docs/api-reference/audio/createSpeech)
+
+### Sample API Usage
 
 You can use it like this:
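Because the server is OpenAI API compatible, the sample usage (elided in this hunk) can also be exercised from the command line. A sketch using plain curl against the `/v1/audio/speech` endpoint from the OpenAI API reference linked above; the host and port `localhost:8000` are an assumption (see the `-P PORT` / `-H HOST` options):

```shell
# POST to the OpenAI-compatible speech endpoint and save the audio
curl -s http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "alloy", "input": "The quick brown fox jumped over the lazy dog."}' \
  -o speech.mp3
```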
@@ -148,9 +160,9 @@ with client.audio.speech.with_streaming_response.create(
 
 Also see the `say.py` sample application for an example of how to use the openai-python API.
 
-```
-$ python say.py -t "The quick brown fox jumped over the lazy dog." -p # play the audio, requires 'pip install playsound'
-$ python say.py -t "The quick brown fox jumped over the lazy dog." -m tts-1-hd -v onyx -f flac -o fox.flac # save to a file.
+```shell
+python say.py -t "The quick brown fox jumped over the lazy dog." -p # play the audio, requires 'pip install playsound'
+python say.py -t "The quick brown fox jumped over the lazy dog." -m tts-1-hd -v onyx -f flac -o fox.flac # save to a file.
 ```
 
 ```
@@ -176,8 +188,7 @@ options:
 -p, --playsound Play the audio (default: False)
 ```
 
-Custom Voices Howto
--------------------
+## Custom Voices Howto
 
 Custom voices should be mono 22050 Hz sample rate WAV files with low noise (no background music, etc.) and not contain any partial words. Sample voices for xtts should be at least 6 seconds long, but they can be longer. However, longer samples do not always produce better results.
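The voice requirements above are easy to verify programmatically. A minimal sketch using Python's standard `wave` module; `check_voice_wav` is a hypothetical helper, not part of this repo:

```python
import wave

def check_voice_wav(path, min_seconds=6.0):
    """Check that a WAV file is mono, 22050 Hz, and long enough for xtts.

    Hypothetical helper: the thresholds mirror the guidance above.
    """
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        seconds = w.getnframes() / rate
    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if rate != 22050:
        problems.append(f"expected 22050 Hz, got {rate} Hz")
    if seconds < min_seconds:
        problems.append(f"only {seconds:.1f}s of audio, want >= {min_seconds}s")
    return problems  # empty list means the file looks usable
```

An empty result means the sample meets the stated constraints; noise and partial words still have to be judged by ear.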
docker-compose.yml

@@ -1,7 +1,7 @@
 services:
   server:
     build:
-      dockerfile: Dockerfile # for tts-1-hd support via xtts_v2, ~4GB VRAM required, ~10GB
+      dockerfile: Dockerfile # for tts-1-hd support via xtts_v2, ~4GB VRAM required, ~8GB
       #dockerfile: Dockerfile.min # piper for all models, no gpu/nvidia required, ~1GB
     image: ghcr.io/matatonic/openedai-speech
     #image: ghcr.io/matatonic/openedai-speech-min
@@ -11,7 +11,8 @@ services:
     volumes:
       - ./voices:/app/voices
       - ./config:/app/config
-    #restart: unless-stopped # install as a service
+    # install as a service, run with docker compose up -d
+    #restart: unless-stopped
     # Below can be removed if not using GPU
     runtime: nvidia
     deploy: