Merge pull request #366 from travisvn/patch-6

Update openai-edge-tts-integration.md
Timothy Jaeryang Baek 2025-01-29 14:48:33 -08:00 committed by GitHub
commit 5a05b5d7af


@@ -9,7 +9,16 @@ This tutorial is a community contribution and is not supported by the OpenWebUI
# Integrating `openai-edge-tts` 🗣️ with Open WebUI
## What is `openai-edge-tts`?
[OpenAI Edge TTS](https://github.com/travisvn/openai-edge-tts) is a text-to-speech API that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where you can define the endpoint URL, like with Open WebUI.
It uses the [edge-tts](https://github.com/rany2/edge-tts) package, which leverages the Edge browser's free "Read Aloud" feature to emulate a request to Microsoft / Azure in order to receive very high quality text-to-speech for free.
[Sample the voices here](https://tts.travisvn.com)
<details>
<summary>How is it different from 'openedai-speech'?</summary>
Similar to [openedai-speech](https://github.com/matatonic/openedai-speech), [openai-edge-tts](https://github.com/travisvn/openai-edge-tts) is a text-to-speech API endpoint that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where the OpenAI Speech endpoint is callable and the server endpoint URL can be configured.
@@ -17,13 +26,12 @@ Similar to [openedai-speech](https://github.com/matatonic/openedai-speech), [ope
`openai-edge-tts` is a simpler option that uses a Python package called `edge-tts` to generate the audio.
</details>
## Requirements
- Docker installed on your system
- Open WebUI running
- ffmpeg (Optional - Only required if opting to not use `mp3` format)
## ⚡️ Quick start
@@ -37,7 +45,7 @@ This will run the service at port 5050 with all the default configs
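The quick-start command itself sits just above this hunk and is not shown in the diff. For reference, it boils down to a single `docker run`; a minimal sketch, assuming the image is published on Docker Hub as `travisvn/openai-edge-tts`:

```bash
# Run openai-edge-tts on port 5050 with all default settings
# (image name assumed from the project's GitHub namespace)
docker run -d -p 5050:5050 travisvn/openai-edge-tts:latest
```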
## Setting up Open WebUI to use `openai-edge-tts`
- Open the Admin Panel and go to `Settings` -> `Audio`
- Set your TTS Settings to match the screenshot below
- _Note: you can specify the TTS Voice here_
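The screenshot referenced above is not reproduced in this diff. As a rough guide, the `Settings` -> `Audio` fields map to values along these lines (exact labels can vary between Open WebUI versions, and the base URL assumes the service runs on the same host as Open WebUI):

```
TTS Engine:    OpenAI
API Base URL:  http://localhost:5050/v1
API Key:       your_api_key_here
TTS Model:     tts-1
TTS Voice:     en-US-AvaNeural   (or any voice from https://tts.travisvn.com)
```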
@@ -49,15 +57,11 @@ The default API key is the string `your_api_key_here`. You do not have to change
**And that's it! You can end here**
See the [Usage](#usage) section for request examples.
# Please ⭐️ star the repo on GitHub if you find [OpenAI Edge TTS](https://github.com/travisvn/openai-edge-tts) useful
:::tip
You can define the environment variables directly in the `docker run` command. See [Quick Config for Docker](#-quick-config-for-docker) below.
:::
<details>
<summary>Running with Python</summary>
### 🐍 Running with Python
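The setup steps that precede the `.env` configuration below (cloning the repository and installing dependencies) fall outside this hunk. A rough sketch, assuming a standard `requirements.txt`-based install; check the repository's README for the exact run command:

```bash
git clone https://github.com/travisvn/openai-edge-tts.git
cd openai-edge-tts

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies (assumes the project ships a requirements.txt)
pip install -r requirements.txt
# Then start the server as described in the repository's README
```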
@@ -100,9 +104,9 @@ Create a `.env` file in the root directory and set the following variables:
API_KEY=your_api_key_here
PORT=5050
DEFAULT_VOICE=en-US-AvaNeural
DEFAULT_RESPONSE_FORMAT=mp3
DEFAULT_SPEED=1.0
DEFAULT_LANGUAGE=en-US
@@ -125,7 +129,10 @@ The server will start running at `http://localhost:5050`.
You can now interact with the API at `http://localhost:5050/v1/audio/speech` and other available endpoints. See the [Usage](#usage) section for request examples.
</details>
<details>
<summary>Usage details</summary>
##### Endpoint: `/v1/audio/speech` (aliased with `/audio/speech`)
@@ -138,9 +145,9 @@ Generates audio from the input text. Available parameters:
**Optional Parameters:**
- **model** (string): Set to "tts-1" or "tts-1-hd" (default: `"tts-1"`).
- **voice** (string): One of the OpenAI-compatible voices (alloy, echo, fable, onyx, nova, shimmer) or any valid `edge-tts` voice (default: `"en-US-AvaNeural"`).
- **response_format** (string): Audio format. Options: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` (default: `mp3`).
- **speed** (number): Playback speed (0.25 to 4.0). Default is `1.0`.
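Putting these parameters together, a request might look like the sketch below (the `input` field and bearer-token auth are assumed from the OpenAI-compatible API shape; adjust host and key to your setup):

```bash
curl http://localhost:5050/v1/audio/speech \
  -H "Authorization: Bearer your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello from openai-edge-tts!",
    "voice": "en-US-AvaNeural",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
```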
:::tip
You can browse available voices and listen to sample previews at [tts.travisvn.com](https://tts.travisvn.com)
@@ -203,6 +210,8 @@ Additionally, there are endpoints for **Azure AI Speech** and **ElevenLabs** for
These can be disabled by setting the environment variable `EXPAND_API=False`.
:::
</details>
## 🐳 Quick Config for Docker
You can configure the environment variables in the command used to run the project
@@ -211,9 +220,9 @@ You can configure the environment variables in the command used to run the proje
docker run -d -p 5050:5050 \
  -e API_KEY=your_api_key_here \
  -e PORT=5050 \
  -e DEFAULT_VOICE=en-US-AvaNeural \
  -e DEFAULT_RESPONSE_FORMAT=mp3 \
  -e DEFAULT_SPEED=1.0 \
  -e DEFAULT_LANGUAGE=en-US \
  -e REQUIRE_API_KEY=True \
  -e REMOVE_FILTER=False \