From e87a43ed4a592bd17f37601b25e84742536b6c2c Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Sun, 9 Jun 2024 17:23:39 +0000
Subject: [PATCH 1/7] Create openedai-speech-integration.md

---
 docs/tutorial/openedai-speech-integration.md | 101 +++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100644 docs/tutorial/openedai-speech-integration.md

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
new file mode 100644
index 0000000..c631114
--- /dev/null
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -0,0 +1,101 @@
+---
+sidebar_position: 11
+title: "Integrating OpenedAI-Speech with Open WebUI using Docker Desktop"
+---
+
+Integrating `openedai-speech` into Open WebUI using Docker Desktop
+================================================================
+
+**Prerequisites**
+---------------
+
+* Docker Desktop installed on your system
+* Open WebUI running in a Docker container
+* A basic understanding of Docker and Docker Compose
+
+**Step 1: Create a new folder for the `openedai-speech` service**
+---------------------------------------------------------
+
+Create a new folder, for example, `openedai-speech-service`, to store the `docker-compose.yml` and `.env` files.
+
+**Step 2: Create a `docker-compose.yml` file**
+------------------------------------------
+
+In the `openedai-speech-service` folder, create a new file named `docker-compose.yml` with the following contents:
+```yaml
+services:
+  server:
+    image: ghcr.io/matatonic/openedai-speech
+    container_name: openedai-speech
+    env_file: .env
+    ports:
+      - "8000:8000"
+    volumes:
+      - tts-voices:/app/voices
+      - tts-config:/app/config
+    # labels:
+    #   - "com.centurylinklabs.watchtower.enable=true"
+    restart: unless-stopped
+
+volumes:
+  tts-voices:
+  tts-config:
+```
+**Step 3: Create an `.env` file (optional)**
+-----------------------------------------
+
+In the same `openedai-speech-service` folder, create a new file named `.env` with the following contents:
+```
+TTS_HOME=voices
+HF_HOME=voices
+#PRELOAD_MODEL=xtts
+#PRELOAD_MODEL=xtts_v2.0.2
+#PRELOAD_MODEL=parler-tts/parler_tts_mini_v0.1
+```
+**Step 4: Run `docker compose` to start the `openedai-speech` service**
+---------------------------------------------------------
+
+Run the following command in the `openedai-speech-service` folder to start the `openedai-speech` service in detached mode:
+```
+docker compose up -d
+```
+This will start the `openedai-speech` service in the background.
+
+**Step 5: Configure Open WebUI to use `openedai-speech`**
+---------------------------------------------------------
+
+Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel > Settings > Audio. Add the following configuration:
+
+* **API Base URL**: `http://host.docker.internal:8000/v1`
+* **API Key**: `sk-111111111` (note: this is a dummy API key, as `openedai-speech` doesn't require an API key; you can enter any value in this field)
+
+**Step 6: Choose a voice**
+-------------------------
+
+Under Set Voice, you can choose from the following voices:
+
+* alloy
+* echo
+* echo-alt
+* fable
+* onyx
+* nova
+* shimmer
+
+**Step 7: Enjoy natural-sounding voices**
+-----------------------------------------
+
+You should now be able to use the `openedai-speech` integration with Open WebUI to generate natural-sounding voices.
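+
+To confirm the server works independently of Open WebUI, you can also request a clip from it directly (a minimal check; the input text, voice, and output filename are arbitrary examples):
+```bash
+# Request a short speech sample from the openedai-speech server
+curl -s http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{"model": "tts-1", "input": "Hello from openedai-speech!", "voice": "alloy"}' \
+  -o test-speech.mp3
+```
+If this produces a playable audio file, the server side is healthy and any remaining issues are in the Open WebUI configuration.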
+ +**Troubleshooting** +------------------- + +If you encounter any issues, make sure that: + +* The `openedai-speech` service is running and exposed on port 8000. +* The `host.docker.internal` hostname is resolvable from within the Open WebUI container. +* `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access this from within its container. +* The API key is set to a dummy value, as `openedai-speech` doesn't require an API key. + +Note: You can change the port number in the `docker-compose.yml` file to any open and usable port, but make sure to update the **API Base URL** in Open WebUI Admin Audio settings accordingly. +::: From dd0a110ca2ee8e6cdd6a3ba8de20295b7c74e01b Mon Sep 17 00:00:00 2001 From: silentoplayz <50341825+silentoplayz@users.noreply.github.com> Date: Sun, 9 Jun 2024 17:46:23 +0000 Subject: [PATCH 2/7] Update openedai-speech-integration.md Updates --- docs/tutorial/openedai-speech-integration.md | 27 +++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md index c631114..16b87c5 100644 --- a/docs/tutorial/openedai-speech-integration.md +++ b/docs/tutorial/openedai-speech-integration.md @@ -6,6 +6,11 @@ title: "Integrating OpenedAI-Speech with Open WebUI using Docker Desktop" Integrating `openedai-speech` into Open WebUI using Docker Desktop ================================================================ +**What is `openedai-speech`?** +----------------------------- + +`openedai-speech` is an OpenAI API compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or `Piper TTS` as the backend. It's a free, private, text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI audio/speech API. + **Prerequisites** --------------- @@ -13,6 +18,9 @@ Integrating `openedai-speech` into Open WebUI using Docker Desktop * Open WebUI running in a Docker container * A basic understanding of Docker and Docker Compose +**Option 1: Using Docker Compose** +--------------------------------- + **Step 1: Create a new folder for the `openedai-speech` service** --------------------------------------------------------- @@ -61,6 +69,19 @@ docker compose up -d ``` This will start the `openedai-speech` service in the background. +**Option 2: Using Docker Run Commands** +------------------------------------- + +You can also use the following Docker run commands to start the `openedai-speech` service in detached mode: + +**With GPU (Nvidia) support:** +```bash +docker run -d --gpus=all -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech:latest +``` +**Alternative without GPU support:** +```bash +docker run -d -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech-min:latest +``` **Step 5: Configure Open WebUI to use `openedai-speech`** --------------------------------------------------------- @@ -97,5 +118,9 @@ If you encounter any issues, make sure that: * `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access this from within its container. * The API key is set to a dummy value, as `openedai-speech` doesn't require an API key. 
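+
+If the checklist above doesn't surface the problem, you can probe each side of the container boundary directly (a rough sketch; it assumes the Open WebUI container is named `open-webui` and has `curl` available in its image):
+```bash
+# On the host: confirm the openedai-speech container is running and port 8000 is mapped
+docker ps --filter name=openedai-speech
+
+# From inside the Open WebUI container: confirm host.docker.internal resolves and responds
+docker exec open-webui curl -s -o /dev/null -w "%{http_code}\n" http://host.docker.internal:8000/v1/audio/speech
+```
+Any HTTP status code from the second command (even an error such as `405`) means the hostname resolved and the port is reachable; `000` means the connection itself failed.
+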
+**Additional Resources**
+-------------------------
+
+For more information on `openedai-speech`, please visit the [GitHub repository](https://github.com/matatonic/openedai-speech).
+
 Note: You can change the port number in the `docker-compose.yml` file to any open and usable port, but make sure to update the **API Base URL** in Open WebUI Admin Audio settings accordingly.
-:::

From 3e33679c910bb4873bdb0156a655f2e8a2f344a4 Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Sun, 9 Jun 2024 17:53:54 +0000
Subject: [PATCH 3/7] Update openedai-speech-integration.md

---
 docs/tutorial/openedai-speech-integration.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index 16b87c5..cc47836 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -9,7 +9,7 @@ Integrating `openedai-speech` into Open WebUI using Docker Desktop
 **What is `openedai-speech`?**
 -----------------------------
 
-`openedai-speech` is an OpenAI API compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or `Piper TTS` as the backend. It's a free, private, text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI audio/speech API.
+[openedai-speech](https://github.com/matatonic/openedai-speech) is an OpenAI API compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or `Piper TTS` as the backend. It's a free, private, text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI audio/speech API.
 
 **Prerequisites**
 ---------------
@@ -113,9 +113,8 @@ You should now be able to use the `openedai-speech` integration with Open WebUI
 
 If you encounter any issues, make sure that:
 
-* The `openedai-speech` service is running and exposed on port 8000.
-* The `host.docker.internal` hostname is resolvable from within the Open WebUI container.
-* `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access this from within its container.
+* The `openedai-speech` service is running and the port you set in the `docker-compose.yml` file is exposed.
+* The `host.docker.internal` hostname is resolvable from within the Open WebUI container. `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access this from within its container.
 * The API key is set to a dummy value, as `openedai-speech` doesn't require an API key.
 **Additional Resources**
 -------------------------

From b15ba70becf981598321ec9f9c3ca1150bf85ff3 Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Sun, 9 Jun 2024 17:56:06 +0000
Subject: [PATCH 4/7] Update openedai-speech-integration.md

---
 docs/tutorial/openedai-speech-integration.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index cc47836..f6981d5 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -82,6 +82,11 @@ docker run -d --gpus=all -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/a
 ```bash
 docker run -d -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech-min:latest
 ```
+**Configuring Open WebUI**
+-------------------------
+
+For more information on configuring Open WebUI to use `openedai-speech`, including setting environment variables, see the [Open WebUI documentation](https://docs.openwebui.com/getting-started/env-configuration/#text-to-speech).
+
 **Step 5: Configure Open WebUI to use `openedai-speech`**
 ---------------------------------------------------------

From 6b227923fdc01cd1c10f0a7223b17fb25fa71ffe Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Sun, 9 Jun 2024 22:29:53 +0000
Subject: [PATCH 5/7] Update openedai-speech-integration.md

Updoot

---
 docs/tutorial/openedai-speech-integration.md | 39 ++++++++++++++------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index f6981d5..2e53a7e 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -74,7 +74,7 @@ This will start the `openedai-speech` service in the background.
 
 You can also use the following Docker run commands to start the `openedai-speech` service in detached mode:
 
-**With GPU (Nvidia) support:**
+**With GPU (Nvidia CUDA) support:**
 ```bash
 docker run -d --gpus=all -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech:latest
 ```
@@ -90,7 +90,9 @@ For more information on configuring Open WebUI to use `openedai-speech`, includi
 **Step 5: Configure Open WebUI to use `openedai-speech`**
 ---------------------------------------------------------
 
-Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel > Settings > Audio. Add the following configuration:
+Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel > Settings > Audio. Add the following configuration, as shown in the image below:
+
+![openedai-tts](https://github.com/silentoplayz/docs/assets/50341825/ea08494f-2ebf-41a2-bb0f-9b48dd3ace79)
 
 * **API Base URL**: `http://host.docker.internal:8000/v1`
 * **API Key**: `sk-111111111` (note: this is a dummy API key, as `openedai-speech` doesn't require an API key; you can enter any value in this field)
@@ -98,20 +100,26 @@ Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel
 **Step 6: Choose a voice**
 -------------------------
 
-Under Set Voice, you can choose from the following voices:
+Under `TTS Voice` within the same audio settings menu in the admin panel, you can set the `TTS Model` to use from the choices below that `openedai-speech` supports.
+The voices of these models are optimized for the English language.
 
-* alloy
-* echo
-* echo-alt
-* fable
-* onyx
-* nova
-* shimmer
+* `tts-1`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer`
+* `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (configurable, uses OpenAI samples by default)
 
-**Step 7: Enjoy natural-sounding voices**
+**Model Details:**
+
+* `tts-1` via [Piper TTS](https://github.com/rhasspy/piper) (very fast, runs on CPU): You can map your own [Piper voices](https://rhasspy.github.io/piper-samples/) via the `voice_to_speaker.yaml` configuration file.
+* `tts-1-hd` via [Coqui AI/TTS](https://github.com/coqui-ai/TTS) XTTS v2 voice cloning (fast, but requires around 4GB GPU VRAM & Nvidia GPU with CUDA): Custom cloned voices can be used for `tts-1-hd`. See: [Custom Voices Howto](https://github.com/matatonic/openedai-speech/blob/main/docs/custom_voices.md)
+ [Multilingual Support](https://github.com/matatonic/openedai-speech#multilingual) with XTTS voices
+
+**Step 7: Press `Save` to apply the changes**
 -----------------------------------------
 
-You should now be able to use the `openedai-speech` integration with Open WebUI to generate natural-sounding voices.
+Press the `Save` button to apply the changes to your Open WebUI settings.
+
+**Step 8: Enjoy natural-sounding voices**
+-----------------------------------------
+
+You should now be able to use the `openedai-speech` integration with Open WebUI to generate natural-sounding voices with text-to-speech throughout Open WebUI.
 
 **Troubleshooting**
 -------------------
@@ -122,6 +130,13 @@ If you encounter any issues, make sure that:
 * The `host.docker.internal` hostname is resolvable from within the Open WebUI container. `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access this from within its container.
 * The API key is set to a dummy value, as `openedai-speech` doesn't require an API key.
 
+**FAQ**
+----
+
+**How can I control the emotional range of the generated audio?**
+
+There is no direct mechanism to control the emotional output of the generated audio. Certain factors, such as capitalization or grammar, may influence the output audio, but internal tests have yielded mixed results.
+
 **Additional Resources**
 -------------------------

From ef06a1676cb6c14766388666ec0bacf3d5e63312 Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Mon, 10 Jun 2024 14:06:00 +0000
Subject: [PATCH 6/7] Update openedai-speech-integration.md

updoot

---
 docs/tutorial/openedai-speech-integration.md | 17 ++++++++--------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index 2e53a7e..e3dc4b3 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -9,7 +9,7 @@ Integrating `openedai-speech` into Open WebUI using Docker Desktop
 **What is `openedai-speech`?**
 -----------------------------
 
-[openedai-speech](https://github.com/matatonic/openedai-speech) is an OpenAI API compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or `Piper TTS` as the backend. It's a free, private, text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI audio/speech API.
+:::info
+[openedai-speech](https://github.com/matatonic/openedai-speech) is an OpenAI API compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or `Piper TTS` as the backend. It's a free, private, text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI audio/speech API.
+:::
 
 **Prerequisites**
 ---------------
@@ -53,7 +53,7 @@ volumes:
 -----------------------------------------
 
 In the same `openedai-speech-service` folder, create a new file named `.env` with the following contents:
-```
+```bash
 TTS_HOME=voices
 HF_HOME=voices
 #PRELOAD_MODEL=xtts
@@ -64,7 +64,7 @@
 ---------------------------------------------------------
 
 Run the following command in the `openedai-speech-service` folder to start the `openedai-speech` service in detached mode:
-```
+```bash
 docker compose up -d
 ```
 This will start the `openedai-speech` service in the background.
@@ -75,17 +75,17 @@ This will start the `openedai-speech` service in the background.
 
 **With GPU (Nvidia CUDA) support:**
 ```bash
 docker run -d --gpus=all -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech:latest
 ```
 **Alternative without GPU support:**
 ```bash
 docker run -d -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech-min:latest
 ```
 **Configuring Open WebUI**
 -------------------------
 
-For more information on configuring Open WebUI to use `openedai-speech`, including setting environment variables, see the [Open WebUI documentation](https://docs.openwebui.com/getting-started/env-configuration/#text-to-speech).
+:::tip
+For more information on configuring Open WebUI to use `openedai-speech`, including setting environment variables, see the [Open WebUI documentation](https://docs.openwebui.com/getting-started/env-configuration/#text-to-speech).
+:::
 
 **Step 5: Configure Open WebUI to use `openedai-speech`**
 ---------------------------------------------------------
@@ -102,7 +102,7 @@ Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel
 Under `TTS Voice` within the same audio settings menu in the admin panel, you can set the `TTS Model` to use from the choices below that `openedai-speech` supports.
 The voices of these models are optimized for the English language.
 
-* `tts-1`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer`
-* `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (configurable, uses OpenAI samples by default)
+* `tts-1` or `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (`tts-1-hd` is configurable; uses OpenAI samples by default)
 
 **Model Details:**
 
@@ -141,4 +141,4 @@ There is no direct mechanism to control the emotional output of the generated au
 
 For more information on `openedai-speech`, please visit the [GitHub repository](https://github.com/matatonic/openedai-speech).
 
-Note: You can change the port number in the `docker-compose.yml` file to any open and usable port, but make sure to update the **API Base URL** in Open WebUI Admin Audio settings accordingly.
+:::note
+You can change the port number in the `docker-compose.yml` file to any open and usable port, but make sure to update the **API Base URL** in Open WebUI Admin Audio settings accordingly.
+:::

From 1bfeb0831960d54b51c4bcf3b700f36540716cb6 Mon Sep 17 00:00:00 2001
From: silentoplayz <50341825+silentoplayz@users.noreply.github.com>
Date: Mon, 10 Jun 2024 20:08:35 +0000
Subject: [PATCH 7/7] Update openedai-speech-integration.md

---
 docs/tutorial/openedai-speech-integration.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index e3dc4b3..5fa88ee 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -102,7 +102,7 @@ Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel
 
 Under `TTS Voice` within the same audio settings menu in the admin panel, you can set the `TTS Model` to use from the choices below that `openedai-speech` supports.
 The voices of these models are optimized for the English language.
 
-* `tts-1` or `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (`tts-1-hd` is configurable; uses OpenAI samples by default)
+* `tts-1` or `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (`tts-1-hd` is configurable; uses OpenAI samples by default)
 
 **Model Details:**
 
@@ -110,6 +110,8 @@ Under `TTS Voice` within the same audio settings menu in the admin panel, you ca
 * `tts-1` via [Piper TTS](https://github.com/rhasspy/piper) (very fast, runs on CPU): You can map your own [Piper voices](https://rhasspy.github.io/piper-samples/) via the `voice_to_speaker.yaml` configuration file.
 * `tts-1-hd` via [Coqui AI/TTS](https://github.com/coqui-ai/TTS) XTTS v2 voice cloning (fast, but requires around 4GB GPU VRAM & Nvidia GPU with CUDA): Custom cloned voices can be used for `tts-1-hd`. See: [Custom Voices Howto](https://github.com/matatonic/openedai-speech/blob/main/docs/custom_voices.md)
 + [Multilingual Support](https://github.com/matatonic/openedai-speech#multilingual) with XTTS voices
+* Beta [parler-tts](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) support (you can describe very basic features of the speaker voice); see [text-description-to-speech.com](https://www.text-description-to-speech.com/) for some examples of how to describe voices. Voices can be defined in the `voice_to_speaker.default.yaml` file, which already includes two example [parler-tts](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) voices. `parler-tts` is experimental software and is on the slower side. The exact voice will be slightly different with each generation, but it should be similar to the basic description.
+
 **Step 7: Press `Save` to apply the changes**
 -----------------------------------------