From a5e4a7fbf2f0bc9f0fa1253e9dd6960ec4704d47 Mon Sep 17 00:00:00 2001
From: Travis Van Nimwegen
Date: Fri, 18 Oct 2024 20:01:30 -0400
Subject: [PATCH] Create openai-edge-tts-integration.md

---
 .../openai-edge-tts-integration.md | 187 ++++++++++++++++++
 1 file changed, 187 insertions(+)
 create mode 100644 docs/tutorials/integrations/openai-edge-tts-integration.md

diff --git a/docs/tutorials/integrations/openai-edge-tts-integration.md b/docs/tutorials/integrations/openai-edge-tts-integration.md
new file mode 100644
index 0000000..48c1bf2
--- /dev/null
+++ b/docs/tutorials/integrations/openai-edge-tts-integration.md
@@ -0,0 +1,187 @@
---
sidebar_position: 18
title: "Edge TTS"
---

# Integrating `openai-edge-tts` with Open WebUI

## What is `openai-edge-tts`, and how is it different from `openedai-speech`?

Like [openedai-speech](https://github.com/matatonic/openedai-speech), [openai-edge-tts](https://github.com/travisvn/openai-edge-tts) is a text-to-speech API that mimics the OpenAI speech endpoint, so it can be dropped in anywhere the OpenAI Speech endpoint is callable and the server endpoint URL is configurable.

`openedai-speech` is the more comprehensive option, offering fully offline speech generation with multiple models to choose from.

`openai-edge-tts` is a simpler option that uses the Python package `edge-tts` to generate the audio.

`edge-tts` leverages the Edge browser's free "Read Aloud" feature, emulating a request to Microsoft / Azure in order to receive very high-quality text-to-speech at no cost.

## Requirements

- Docker installed on your system
- Open WebUI running
- ffmpeg installed (required for audio format conversion and playback speed adjustments)

## Quick Start

The simplest way to get started, with no configuration at all, is to run the command below:

```bash
docker run -d -p 5050:5050 travisvn/openai-edge-tts:latest
```

This runs the service on port 5050 with the default configuration.

## Setting up Open WebUI to use `openai-edge-tts`

- Open the Admin Panel and go to Settings -> Audio
- Set your TTS Settings to match the screenshot below
- _Note: you can specify the TTS voice here_

![Screenshot of Open WebUI Admin Settings for Audio adding the correct endpoints for this project](https://utfs.io/f/MMMHiQ1TQaBoQ2AnPhUlTDGtR4B2v7E9JZN1PU5nAseoaXIc)

:::info
The default API key is the string `your_api_key_here`. You do not have to change that value if you do not need the added security.
:::

**And that's it! You can stop here.**

See the [Usage](#usage) section for request examples.

## Alternative Options

### Running with Python

If you prefer to run this project directly with Python, follow these steps to set up a virtual environment, install dependencies, and start the server.

#### 1. Clone the Repository

```bash
git clone https://github.com/travisvn/openai-edge-tts.git
cd openai-edge-tts
```

#### 2. Set Up a Virtual Environment

Create and activate a virtual environment to isolate dependencies:

```bash
# For macOS/Linux
python3 -m venv venv
source venv/bin/activate

# For Windows
python -m venv venv
venv\Scripts\activate
```

#### 3. Install Dependencies

Use `pip` to install the required packages listed in `requirements.txt`:

```bash
pip install -r requirements.txt
```
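As an optional sanity check before configuring anything, you can confirm that the underlying `edge-tts` package (installed via `requirements.txt`) can reach the Microsoft service on its own. This is a minimal sketch, assuming the package exposes its usual `edge-tts` command-line entry point on your PATH:

```bash
# List the available Microsoft voices (names like en-US-AndrewNeural)
# Assumes the edge-tts CLI was installed by `pip install -r requirements.txt`
edge-tts --list-voices

# Synthesize a short clip directly, without the API server in the loop
edge-tts --text "Testing edge-tts before starting the server" \
  --voice en-US-AndrewNeural \
  --write-media test.mp3
```

If this produces a playable `test.mp3`, the dependency side is working and any remaining issues are in the server configuration.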
#### 4. Configure Environment Variables

Create a `.env` file in the root directory and set the following variables:

```plaintext
API_KEY=your_api_key_here
PORT=5050

DEFAULT_VOICE=en-US-AndrewNeural
DEFAULT_RESPONSE_FORMAT=mp3
DEFAULT_SPEED=1.0

DEFAULT_LANGUAGE=en-US

REQUIRE_API_KEY=True
```

#### 5. Run the Server

Once configured, start the server with:

```bash
python app/server.py
```

The server will start running at `http://localhost:5050`.

#### 6. Test the API

You can now interact with the API at `http://localhost:5050/v1/audio/speech` and the other available endpoints. See the [Usage](#usage) section for request examples.

## Usage

### Endpoint: `/v1/audio/speech`

Generates audio from the input text. Available parameters:

**Required Parameter:**

- **input** (string): The text to be converted to audio (up to 4096 characters).

**Optional Parameters:**

- **model** (string): Set to `tts-1` or `tts-1-hd` (default: `tts-1`).
- **voice** (string): One of the OpenAI-compatible voices (`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`) or any valid `edge-tts` voice (default: `en-US-AndrewNeural`).
- **response_format** (string): Audio format. Options: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` (default: `mp3`).
- **speed** (number): Playback speed from `0.25` to `4.0` (default: `1.0`).

Example request with `curl`, saving the output to an mp3 file:

```bash
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
    "voice": "echo",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
```

Or, to stay in line with the OpenAI API endpoint parameters:

```bash
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "tts-1",
    "input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
    "voice": "alloy"
  }' \
  --output speech.mp3
```

And an example in a language other than English:

```bash
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "tts-1",
    "input": "じゃあ、行く。電車の時間、調べておくよ。",
    "voice": "ja-JP-KeitaNeural"
  }' \
  --output speech.mp3
```

### Additional Endpoints

- **GET /v1/models**: Lists the available TTS models.
- **GET /v1/voices**: Lists `edge-tts` voices for a given language / locale.
- **GET /v1/voices/all**: Lists all `edge-tts` voices, with language support information.

## Additional Resources

For more information on `openai-edge-tts`, visit the [GitHub repo](https://github.com/travisvn/openai-edge-tts).
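Finally, to complement the `/v1/audio/speech` examples in the [Usage](#usage) section, here is a minimal `curl` sketch for the additional `GET` endpoints listed there. It assumes the server is running locally on the default port and that these endpoints accept the same `Authorization` header as the speech requests:

```bash
# List the available TTS models
curl http://localhost:5050/v1/models \
  -H "Authorization: Bearer your_api_key_here"

# List every edge-tts voice along with its language support information
curl http://localhost:5050/v1/voices/all \
  -H "Authorization: Bearer your_api_key_here"
```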