From 6c28dee4c2b54a32e433d74e3f0f67a295b12011 Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Sun, 9 Jun 2024 17:23:39 +0000
Subject: [PATCH 1/7] Create openedai-speech-integration.md

---
 docs/tutorial/openedai-speech-integration.md | 101 +++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100644 docs/tutorial/openedai-speech-integration.md

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
new file mode 100644
index 0000000..c631114
--- /dev/null
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -0,0 +1,101 @@
+---
+sidebar_position: 11
+title: "Integrating OpenedAI-Speech with Open WebUI using Docker Desktop"
+---
+
+Integrating `openedai-speech` into Open WebUI using Docker Desktop
+================================================================
+
+**Prerequisites**
+---------------
+
+* Docker Desktop installed on your system
+* Open WebUI running in a Docker container
+* A basic understanding of Docker and Docker Compose
+
+**Step 1: Create a new folder for the `openedai-speech` service**
+---------------------------------------------------------
+
+Create a new folder, for example, `openedai-speech-service`, to store the `docker-compose.yml` and `.env` files.
+
+**Step 2: Create a `docker-compose.yml` file**
+------------------------------------------
+
+In the `openedai-speech-service` folder, create a new file named `docker-compose.yml` with the following contents:
+```yaml
+services:
+  server:
+    image: ghcr.io/matatonic/openedai-speech
+    container_name: openedai-speech
+    env_file: .env
+    ports:
+      - "8000:8000"
+    volumes:
+      - tts-voices:/app/voices
+      - tts-config:/app/config
+    # labels:
+    #   - "com.centurylinklabs.watchtower.enable=true"
+    restart: unless-stopped
+
+volumes:
+  tts-voices:
+  tts-config:
+```
+**Step 3: Create a `.env` file (optional)**
+-----------------------------------------
+
+In the same `openedai-speech-service` folder, create a new file named `.env` with the following contents:
+```
+TTS_HOME=voices
+HF_HOME=voices
+#PRELOAD_MODEL=xtts
+#PRELOAD_MODEL=xtts_v2.0.2
+#PRELOAD_MODEL=parler-tts/parler_tts_mini_v0.1
+```
+**Step 4: Run `docker compose` to start the `openedai-speech` service**
+---------------------------------------------------------
+
+Run the following command in the `openedai-speech-service` folder to start the `openedai-speech` service in detached mode:
+```
+docker compose up -d
+```
+This will start the `openedai-speech` service in the background.
+
+**Step 5: Configure Open WebUI to use `openedai-speech`**
+---------------------------------------------------------
+
+Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel > Settings > Audio. Add the following configuration:
+
+* **API Base URL**: `http://host.docker.internal:8000/v1`
+* **API Key**: `sk-111111111` (note: this is a dummy API key, as `openedai-speech` doesn't require an API key; you can enter any value in this field)
+
+**Step 6: Choose a voice**
+-------------------------
+
+Under Set Voice, you can choose from the following voices:
+
+* alloy
+* echo
+* echo-alt
+* fable
+* onyx
+* nova
+* shimmer
+
+**Step 7: Enjoy natural-sounding voices**
+-----------------------------------------
+
+You should now be able to use the `openedai-speech` integration with Open WebUI to generate natural-sounding voices.
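+
+To quickly verify that the service works end to end, you can call its OpenAI-compatible speech endpoint directly from your host. The following is a minimal sketch, assuming the default `8000:8000` port mapping from the `docker-compose.yml` above (the output file name is arbitrary):
+
+```bash
+# Request a short audio sample from the TTS server and save it to a file
+curl -s http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{"model": "tts-1", "input": "Hello from openedai-speech!", "voice": "alloy"}' \
+  -o speech.mp3
+```
+
+If the container is healthy, `speech.mp3` should contain playable audio.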
+
+**Troubleshooting**
+-------------------
+
+If you encounter any issues, make sure that:
+
+* The `openedai-speech` service is running and exposed on port 8000.
+* The `host.docker.internal` hostname is resolvable from within the Open WebUI container.
+* `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access this from within its container.
+* The API key is set to a dummy value, as `openedai-speech` doesn't require an API key.
+
+Note: You can change the port number in the `docker-compose.yml` file to any open and usable port, but make sure to update the **API Base URL** in Open WebUI Admin Audio settings accordingly.
+:::

From 2c91ef94ceae2f7239b23118c6f3f7ccce6df4c2 Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Sun, 9 Jun 2024 17:46:23 +0000
Subject: [PATCH 2/7] Update openedai-speech-integration.md

Updates
---
 docs/tutorial/openedai-speech-integration.md | 27 +++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index c631114..16b87c5 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -6,6 +6,11 @@ title: "Integrating OpenedAI-Speech with Open WebUI using Docker Desktop"
 Integrating `openedai-speech` into Open WebUI using Docker Desktop
 ================================================================
 
+**What is `openedai-speech`?**
+-----------------------------
+
+`openedai-speech` is an OpenAI API-compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or `Piper TTS` as the backend. It's a free, private text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI audio/speech API.
+
 **Prerequisites**
 ---------------
 
@@ -13,6 +18,9 @@ Integrating `openedai-speech` into Open WebUI using Docker Desktop
 * Open WebUI running in a Docker container
 * A basic understanding of Docker and Docker Compose
 
+**Option 1: Using Docker Compose**
+---------------------------------
+
 **Step 1: Create a new folder for the `openedai-speech` service**
 ---------------------------------------------------------
 
@@ -61,6 +69,19 @@ docker compose up -d
 ```
 This will start the `openedai-speech` service in the background.
 
+**Option 2: Using Docker Run Commands**
+-------------------------------------
+
+You can also use the following Docker run commands to start the `openedai-speech` service in detached mode:
+
+**With GPU (Nvidia) support:**
+```bash
+docker run -d --gpus=all -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech:latest
+```
+**Alternative without GPU support:**
+```bash
+docker run -d -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech-min:latest
+```
 **Step 5: Configure Open WebUI to use `openedai-speech`**
 ---------------------------------------------------------
 
@@ -97,5 +118,9 @@ If you encounter any issues, make sure that:
 * `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access this from within its container.
 * The API key is set to a dummy value, as `openedai-speech` doesn't require an API key.
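+
+If you need to confirm that the Open WebUI container can actually reach the service, you can test the connection from inside that container. This is a sketch rather than a guaranteed recipe: it assumes your Open WebUI container is named `open-webui`, that the server exposes the standard `/v1/models` route, and that `curl` is available inside the image (if it isn't, any HTTP client present in the container will do):
+
+```bash
+# Ask the Open WebUI container to fetch the TTS server's model list
+docker exec open-webui curl -s http://host.docker.internal:8000/v1/models
+```
+
+A JSON response means hostname resolution and routing are working; a connection error points to the `host.docker.internal` issues described above.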
+
+**Additional Resources**
+-------------------------
+
+For more information on `openedai-speech`, please visit the [GitHub repository](https://github.com/matatonic/openedai-speech).
+
 Note: You can change the port number in the `docker-compose.yml` file to any open and usable port, but make sure to update the **API Base URL** in Open WebUI Admin Audio settings accordingly.
-:::

From 72e1e046c7514f7de047d623ca409056011789de Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Sun, 9 Jun 2024 17:53:54 +0000
Subject: [PATCH 3/7] Update openedai-speech-integration.md

---
 docs/tutorial/openedai-speech-integration.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index 16b87c5..cc47836 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -9,7 +9,7 @@ Integrating `openedai-speech` into Open WebUI using Docker Desktop
 **What is `openedai-speech`?**
 -----------------------------
 
-`openedai-speech` is an OpenAI API-compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or `Piper TTS` as the backend. It's a free, private text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI audio/speech API.
+[openedai-speech](https://github.com/matatonic/openedai-speech) is an OpenAI API-compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or `Piper TTS` as the backend. It's a free, private text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI audio/speech API.
 
 **Prerequisites**
 ---------------
 
@@ -113,9 +113,8 @@ If you encounter any issues, make sure that:
 
-* The `openedai-speech` service is running and exposed on port 8000.
-* The `host.docker.internal` hostname is resolvable from within the Open WebUI container.
-* `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access this from within its container.
+* The `openedai-speech` service is running and the port you set in the `docker-compose.yml` file is exposed.
+* The `host.docker.internal` hostname is resolvable from within the Open WebUI container. `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access this from within its container.
 * The API key is set to a dummy value, as `openedai-speech` doesn't require an API key.
 
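+To make the port relationship concrete, here is an illustrative sketch of what changes in `docker-compose.yml` if you want the service on a different host port (the port number is arbitrary):
+
+```yaml
+    ports:
+      - "8123:8000"  # host port 8123 -> container port 8000
+```
+
+With that mapping, the **API Base URL** in Open WebUI becomes `http://host.docker.internal:8123/v1`.
+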
 **Additional Resources**
 -------------------------

From b6a67649745824838afc799761cb422ac8150d65 Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Sun, 9 Jun 2024 17:56:06 +0000
Subject: [PATCH 4/7] Update openedai-speech-integration.md

---
 docs/tutorial/openedai-speech-integration.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index cc47836..f6981d5 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -82,6 +82,11 @@ docker run -d --gpus=all -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/a
 ```bash
 docker run -d -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech-min:latest
 ```
+**Configuring Open WebUI**
+-------------------------
+
+For more information on configuring Open WebUI to use `openedai-speech`, including setting environment variables, see the [Open WebUI documentation](https://docs.openwebui.com/getting-started/env-configuration/#text-to-speech).
+
 **Step 5: Configure Open WebUI to use `openedai-speech`**
 ---------------------------------------------------------

From d5fe69d048d3d524b550777c33aa114d0d1a0adf Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Sun, 9 Jun 2024 22:29:53 +0000
Subject: [PATCH 5/7] Update openedai-speech-integration.md

Updoot
---
 docs/tutorial/openedai-speech-integration.md | 39 ++++++++++++++------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index f6981d5..2e53a7e 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -74,7 +74,7 @@ This will start the `openedai-speech` service in the background.
 
 You can also use the following Docker run commands to start the `openedai-speech` service in detached mode:
 
-**With GPU (Nvidia) support:**
+**With GPU (Nvidia CUDA) support:**
 ```bash
 docker run -d --gpus=all -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech:latest
 ```
@@ -90,7 +92,9 @@ For more information on configuring Open WebUI to use `openedai-speech`, includ
 **Step 5: Configure Open WebUI to use `openedai-speech`**
 ---------------------------------------------------------
 
-Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel > Settings > Audio. Add the following configuration:
+Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel > Settings > Audio. Add the configuration shown in the following image:
+
+![openedai-tts](https://github.com/silentoplayz/docs/assets/50341825/ea08494f-2ebf-41a2-bb0f-9b48dd3ace79)
 
 * **API Base URL**: `http://host.docker.internal:8000/v1`
 * **API Key**: `sk-111111111` (note: this is a dummy API key, as `openedai-speech` doesn't require an API key; you can enter any value in this field)
@@ -98,20 +100,26 @@ Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel
 **Step 6: Choose a voice**
 -------------------------
 
-Under Set Voice, you can choose from the following voices:
+Under `TTS Voice` within the same audio settings menu in the admin panel, you can set the `TTS Model` to one of the following choices that `openedai-speech` supports. The voices of these models are optimized for the English language.
 
-* alloy
-* echo
-* echo-alt
-* fable
-* onyx
-* nova
-* shimmer
+* `tts-1`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer`
+* `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (configurable, uses OpenAI samples by default)
 
-**Step 7: Enjoy natural-sounding voices**
+**Model Details:**
+
+* `tts-1` via [Piper TTS](https://github.com/rhasspy/piper) (very fast, runs on CPU): You can map your own [Piper voices](https://rhasspy.github.io/piper-samples/) via the `voice_to_speaker.yaml` configuration file, as sketched below.
+* `tts-1-hd` via [Coqui AI/TTS](https://github.com/coqui-ai/TTS) XTTS v2 voice cloning (fast, but requires around 4 GB of VRAM and an Nvidia GPU with CUDA): Custom cloned voices can be used for `tts-1-hd`. See: [Custom Voices Howto](https://github.com/matatonic/openedai-speech/blob/main/docs/custom_voices.md)
+  + [Multilingual Support](https://github.com/matatonic/openedai-speech#multilingual) with XTTS voices
+
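+To give a feel for the mapping mentioned in the `tts-1` bullet above, here is an illustrative sketch of a `voice_to_speaker.yaml` entry. The voice name and model path are hypothetical, and the exact schema may differ between releases, so treat the repository's `voice_to_speaker.default.yaml` as the authoritative reference:
+
+```yaml
+tts-1:
+  mycustomvoice:                        # hypothetical voice name exposed through the API
+    model: voices/en_US-ryan-high.onnx  # a Piper model downloaded into the voices volume
+    speaker:                            # left empty for single-speaker models
+```
+
+After editing the file, you may need to restart the `openedai-speech` container for the new voice to be picked up.
+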
+**Step 7: Press `Save` to apply the changes**
 -----------------------------------------
 
-You should now be able to use the `openedai-speech` integration with Open WebUI to generate natural-sounding voices.
+Press the `Save` button to apply the changes to your Open WebUI settings.
+
+**Step 8: Enjoy natural-sounding voices**
+-----------------------------------------
+
+You should now be able to use the `openedai-speech` integration with Open WebUI to generate natural-sounding speech throughout Open WebUI.
 
 **Troubleshooting**
 -------------------
@@ -122,6 +130,13 @@ If you encounter any issues, make sure that:
 * The `host.docker.internal` hostname is resolvable from within the Open WebUI container. `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access this from within its container.
 * The API key is set to a dummy value, as `openedai-speech` doesn't require an API key.
 
+**FAQ**
+----
+
+**How can I control the emotional range of the generated audio?**
+
+There is no direct mechanism for controlling the emotional tone of the generated audio. Certain factors, such as capitalization or grammar, may influence the output, but internal tests have yielded mixed results.
+
 **Additional Resources**
 -------------------------

From 61fe80f8bb7e185115e42e342a3acbe2c5c4d99e Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Mon, 10 Jun 2024 14:06:00 +0000
Subject: [PATCH 6/7] Update openedai-speech-integration.md

updoot
---
 docs/tutorial/openedai-speech-integration.md | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index 2e53a7e..e3dc4b3 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -9,7 +9,7 @@ Integrating `openedai-speech` into Open WebUI using Docker Desktop
 **What is `openedai-speech`?**
 -----------------------------
 
-[openedai-speech](https://github.com/matatonic/openedai-speech) is an OpenAI API-compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or `Piper TTS` as the backend. It's a free, private text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI audio/speech API.
+:::info
+[openedai-speech](https://github.com/matatonic/openedai-speech) is an OpenAI API-compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or `Piper TTS` as the backend. It's a free, private text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI audio/speech API.
+:::
 
 **Prerequisites**
 ---------------
 
@@ -53,7 +53,7 @@ volumes:
 -----------------------------------------
 
 In the same `openedai-speech-service` folder, create a new file named `.env` with the following contents:
-```
+```bash
 TTS_HOME=voices
 HF_HOME=voices
 #PRELOAD_MODEL=xtts
@@ -64,7 +64,7 @@ HF_HOME=voices
 ---------------------------------------------------------
 
 Run the following command in the `openedai-speech-service` folder to start the `openedai-speech` service in detached mode:
-```
+```bash
 docker compose up -d
 ```
 This will start the `openedai-speech` service in the background.
@@ -75,17 +75,17 @@ This will start the `openedai-speech` service in the background.
 You can also use the following Docker run commands to start the `openedai-speech` service in detached mode:
 
 **With GPU (Nvidia CUDA) support:**
 ```bash
 docker run -d --gpus=all -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech:latest
 ```
 **Alternative without GPU support:**
 ```bash
 docker run -d -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech-min:latest
 ```
 **Configuring Open WebUI**
 -------------------------
 
-For more information on configuring Open WebUI to use `openedai-speech`, including setting environment variables, see the [Open WebUI documentation](https://docs.openwebui.com/getting-started/env-configuration/#text-to-speech).
+:::tip
+For more information on configuring Open WebUI to use `openedai-speech`, including setting environment variables, see the [Open WebUI documentation](https://docs.openwebui.com/getting-started/env-configuration/#text-to-speech).
+:::
 
 **Step 5: Configure Open WebUI to use `openedai-speech`**
 ---------------------------------------------------------
@@ -102,8 +102,7 @@ Under `TTS Voice` within the same audio settings menu in the admin panel, you ca
 
-* `tts-1`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer`
-* `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (configurable, uses OpenAI samples by default)
+* `tts-1` or `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (`tts-1-hd` is configurable; uses OpenAI samples by default)
 
 **Model Details:**
 
@@ -141,4 +141,4 @@ There is no direct mechanism for controlling the emotional tone of the generated
 
 For more information on `openedai-speech`, please visit the [GitHub repository](https://github.com/matatonic/openedai-speech).
 
-Note: You can change the port number in the `docker-compose.yml` file to any open and usable port, but make sure to update the **API Base URL** in Open WebUI Admin Audio settings accordingly.
+:::note
+You can change the port number in the `docker-compose.yml` file to any open and usable port, but make sure to update the **API Base URL** in Open WebUI Admin Audio settings accordingly.
+:::

From bea96c094e4014503ae4525aa4d32ff8c78fe153 Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Mon, 10 Jun 2024 20:08:35 +0000
Subject: [PATCH 7/7] Update openedai-speech-integration.md

---
 docs/tutorial/openedai-speech-integration.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index e3dc4b3..5fa88ee 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -102,7 +102,7 @@ Under `TTS Voice` within the same audio settings menu in the admin panel, you ca
 
-* `tts-1` or `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (`tts-1-hd` is configurable; uses OpenAI samples by default) 
+* `tts-1` or `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (`tts-1-hd` is configurable; uses OpenAI samples by default)
 
 **Model Details:**
 
@@ -110,6 +110,8 @@ Under `TTS Voice` within the same audio settings menu in the admin panel, you ca
 * `tts-1-hd` via [Coqui AI/TTS](https://github.com/coqui-ai/TTS) XTTS v2 voice cloning (fast, but requires around 4 GB of VRAM and an Nvidia GPU with CUDA): Custom cloned voices can be used for `tts-1-hd`. See: [Custom Voices Howto](https://github.com/matatonic/openedai-speech/blob/main/docs/custom_voices.md)
   + [Multilingual Support](https://github.com/matatonic/openedai-speech#multilingual) with XTTS voices
 
+* Beta [parler-tts](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) support (you can describe very basic features of the speaker voice). See [text-description-to-speech.com](https://www.text-description-to-speech.com/) for some examples of how to describe voices. Voices are defined in the `voice_to_speaker.default.yaml` file, which includes two example `parler-tts` voices. `parler-tts` is experimental software and is on the slower side. The exact voice will be slightly different each generation but should be similar to the basic description.
+
 **Step 7: Press `Save` to apply the changes**
 -----------------------------------------