---
sidebar_position: 18
title: "Edge TTS"
---
# Integrating `openai-edge-tts` with Open WebUI
## What is `openai-edge-tts`, and how is it different from `openedai-speech`?
Similar to `openedai-speech`, `openai-edge-tts` is a text-to-speech API endpoint that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where the OpenAI Speech endpoint is callable and the server endpoint URL can be configured.

`openedai-speech` is a more comprehensive option that allows for entirely offline generation of speech with many modalities to choose from.

`openai-edge-tts` is a simpler option that uses a Python package called `edge-tts` to generate the audio.

`edge-tts` leverages the Edge browser's free "Read Aloud" feature to emulate a request to Microsoft / Azure in order to receive very high quality text-to-speech for free.
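To give a sense of what the server wraps, here is a minimal sketch of generating speech directly with the `edge-tts` package (this assumes its documented `Communicate` interface; the voice name is just one of the available Neural voices):

```python
# Standalone sketch: use the edge-tts package directly, outside the API server.
import asyncio

import edge_tts


async def main() -> None:
    # Build a "Read Aloud"-style request and save the synthesized audio to disk.
    communicate = edge_tts.Communicate(
        "Hello, I am your AI assistant!",
        voice="en-US-AndrewNeural",
    )
    await communicate.save("hello.mp3")


if __name__ == "__main__":
    asyncio.run(main())
```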
## Requirements

- Docker installed on your system
- Open WebUI running
- `ffmpeg` installed (required for audio format conversion and playback speed adjustments)
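If you are unsure whether `ffmpeg` is present on the host, a quick check is:

```bash
ffmpeg -version
```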
## Quick start

The simplest way to get started without having to configure anything is to run the command below:

```bash
docker run -d -p 5050:5050 travisvn/openai-edge-tts:latest
```

This will run the service at port 5050 with all the default configs.
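If you want different defaults without creating a `.env` file, the configuration values shown in the Python setup below can be passed to the container as environment variables (a sketch; it assumes the image reads the same variables as the `.env` example in step 4):

```bash
docker run -d -p 5050:5050 \
  -e API_KEY=your_api_key_here \
  -e DEFAULT_VOICE=en-US-AndrewNeural \
  -e DEFAULT_RESPONSE_FORMAT=mp3 \
  -e DEFAULT_SPEED=1.0 \
  travisvn/openai-edge-tts:latest
```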
## Setting up Open WebUI to use `openai-edge-tts`

- Open the Admin Panel and go to Settings -> Audio
- Set your TTS Settings to match the screenshot below (the API base URL is `http://localhost:5050/v1` and the API key is the one configured for the server)
- Note: you can specify the TTS Voice here
:::info
The default API key is the string `your_api_key_here`. You do not have to change that value if you do not need the added security.
:::
And that's it! You can end here. See the Usage section for request examples.
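If you run Open WebUI itself in Docker and would rather pre-configure these Audio settings than click through the Admin Panel, a hypothetical sketch using Open WebUI environment variables is below. The `AUDIO_TTS_*` names are assumptions drawn from Open WebUI's environment configuration and should be verified against your version:

```bash
# Hypothetical: start Open WebUI with TTS pre-pointed at openai-edge-tts.
# host.docker.internal assumes Docker Desktop; on Linux use your host's IP instead.
docker run -d -p 3000:8080 \
  -e AUDIO_TTS_ENGINE=openai \
  -e AUDIO_TTS_OPENAI_API_BASE_URL=http://host.docker.internal:5050/v1 \
  -e AUDIO_TTS_OPENAI_API_KEY=your_api_key_here \
  -e AUDIO_TTS_MODEL=tts-1 \
  ghcr.io/open-webui/open-webui:main
```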
## Alternative Options

### Running with Python

If you prefer to run this project directly with Python, follow these steps to set up a virtual environment, install dependencies, and start the server.
#### 1. Clone the Repository

```bash
git clone https://github.com/travisvn/openai-edge-tts.git
cd openai-edge-tts
```

#### 2. Set Up a Virtual Environment

Create and activate a virtual environment to isolate dependencies:

```bash
# For macOS/Linux
python3 -m venv venv
source venv/bin/activate

# For Windows
python -m venv venv
venv\Scripts\activate
```
#### 3. Install Dependencies

Use `pip` to install the required packages listed in `requirements.txt`:

```bash
pip install -r requirements.txt
```
#### 4. Configure Environment Variables

Create a `.env` file in the root directory and set the following variables:

```bash
API_KEY=your_api_key_here
PORT=5050
DEFAULT_VOICE=en-US-AndrewNeural
DEFAULT_RESPONSE_FORMAT=mp3
DEFAULT_SPEED=1.0
DEFAULT_LANGUAGE=en-US
REQUIRE_API_KEY=True
```
#### 5. Run the Server

Once configured, start the server with:

```bash
python app/server.py
```

The server will start running at `http://localhost:5050`.
#### 6. Test the API

You can now interact with the API at `http://localhost:5050/v1/audio/speech` and other available endpoints. See the Usage section for request examples.
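As a quick sanity check that the server is up and authenticating correctly, you can list the available TTS models (this endpoint is described under Additional Endpoints below; the bearer token is the same API key used for speech requests):

```bash
curl http://localhost:5050/v1/models \
  -H "Authorization: Bearer your_api_key_here"
```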
## Usage

### Endpoint: `/v1/audio/speech`

Generates audio from the input text. Available parameters:
**Required Parameter:**

- `input` (string): The text to be converted to audio (up to 4096 characters).

**Optional Parameters:**

- `model` (string): Set to `"tts-1"` or `"tts-1-hd"` (default: `"tts-1"`).
- `voice` (string): One of the OpenAI-compatible voices (alloy, echo, fable, onyx, nova, shimmer) or any valid `edge-tts` voice (default: `"en-US-AndrewNeural"`).
- `response_format` (string): Audio format. Options: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` (default: `mp3`).
- `speed` (number): Playback speed (0.25 to 4.0). Default is `1.0`.
Example request with `curl` and saving the output to an mp3 file:

```bash
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
    "voice": "echo",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
```
Or, to be in line with the OpenAI API endpoint parameters:

```bash
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "tts-1",
    "input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
    "voice": "alloy"
  }' \
  --output speech.mp3
```
And an example of a language other than English:

```bash
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "tts-1",
    "input": "じゃあ、行く。電車の時間、調べておくよ。",
    "voice": "ja-JP-KeitaNeural"
  }' \
  --output speech.mp3
```
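Because the server mimics the OpenAI Speech endpoint, the official OpenAI Python SDK can also be pointed at it by overriding the base URL. A minimal sketch (assuming the `openai` v1.x client; the key and URL match the defaults used throughout this page):

```python
# Sketch: call openai-edge-tts through the OpenAI Python SDK (v1.x assumed).
from openai import OpenAI

# Point the client at the local openai-edge-tts server instead of api.openai.com.
client = OpenAI(
    api_key="your_api_key_here",
    base_url="http://localhost:5050/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello, I am your AI assistant!",
)

# Write the returned audio bytes to disk.
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```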
### Additional Endpoints

- **GET** `/v1/models`: Lists available TTS models.
- **GET** `/v1/voices`: Lists `edge-tts` voices for a given language / locale.
- **GET** `/v1/voices/all`: Lists all `edge-tts` voices, with language support information.
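For example, to dump the full voice list and pick a name to pass as the `voice` parameter:

```bash
curl http://localhost:5050/v1/voices/all \
  -H "Authorization: Bearer your_api_key_here"
```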
## Additional Resources

For more information on `openai-edge-tts`, you can visit the [GitHub repo](https://github.com/travisvn/openai-edge-tts).