docs/docs/tutorial/openedai-speech-integration.md
2024-06-09 17:56:06 +00:00


sidebar_position: 11
title: Integrating OpenedAI-Speech with Open WebUI using Docker Desktop

Integrating openedai-speech into Open WebUI using Docker Desktop

What is openedai-speech?

openedai-speech is a text-to-speech server that is compatible with the OpenAI audio/speech API and uses Coqui AI's xtts_v2 and/or Piper TTS as the backend. It is free, runs privately on your own machine, and supports custom voice cloning.

Prerequisites

  • Docker Desktop installed on your system
  • Open WebUI running in a Docker container
  • A basic understanding of Docker and Docker Compose

Option 1: Using Docker Compose

Step 1: Create a new folder for the openedai-speech service

Create a new folder, for example, openedai-speech-service, to store the docker-compose.yml and .env files.

Step 2: Create a docker-compose.yml file

In the openedai-speech-service folder, create a new file named docker-compose.yml with the following contents:

services:
  server:
    image: ghcr.io/matatonic/openedai-speech
    container_name: openedai-speech
    env_file: .env
    ports:
      - "8000:8000"
    volumes:
      - tts-voices:/app/voices
      - tts-config:/app/config
    # labels:
    #   - "com.centurylinklabs.watchtower.enable=true"
    restart: unless-stopped

volumes:
  tts-voices:
  tts-config:

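Step 3: Create a .env file

In the same openedai-speech-service folder, create a new file named .env with the following contents. The docker-compose.yml above references this file through env_file, so Compose expects it to exist; the commented-out PRELOAD_MODEL lines are optional and can stay commented: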

TTS_HOME=voices
HF_HOME=voices
#PRELOAD_MODEL=xtts
#PRELOAD_MODEL=xtts_v2.0.2
#PRELOAD_MODEL=parler-tts/parler_tts_mini_v0.1

Step 4: Run docker compose to start the openedai-speech service

Run the following command in the openedai-speech-service folder to start the openedai-speech service in detached mode, so that it keeps running in the background:

docker compose up -d
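To confirm that the container actually came up, the two helpers sketched here wrap standard `docker compose` subcommands. The function names are hypothetical, `server` is the service name from the docker-compose.yml above, and the 20-line log tail is an arbitrary choice.

```shell
# Show the state of the "server" service defined in docker-compose.yml.
tts_compose_ps() {
  docker compose ps server
}

# Print the most recent log lines of the same service
# (20 lines is an arbitrary choice for this sketch).
tts_compose_logs() {
  docker compose logs --tail=20 server
}

# Usage, from inside the openedai-speech-service folder:
#   tts_compose_ps
#   tts_compose_logs
```

Both helpers need to run from the folder that contains docker-compose.yml, since that is where Compose looks for the project file.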

Option 2: Using Docker Run Commands

Alternatively, start the openedai-speech service in detached mode with a single docker run command. Pick the variant that matches your hardware:

With GPU (Nvidia) support:

docker run -d --gpus=all -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech:latest

Alternative without GPU support:

docker run -d -p 8000:8000 -v tts-voices:/app/voices -v tts-config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech-min:latest
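Whichever option you used, the service can be sanity-checked from the host before touching Open WebUI. This is a minimal sketch that assumes the service follows the OpenAI audio/speech request shape (`model`, `voice`, `input`) at `/v1/audio/speech` on the published port 8000. The model name `tts-1` is borrowed from the OpenAI API, and the function names and the output file `speech.mp3` are hypothetical.

```shell
# Build the JSON body for an OpenAI-style /v1/audio/speech request.
# $1 = voice name, $2 = text to speak.
# Keep both free of double quotes and backslashes; no escaping is done here.
speech_payload() {
  printf '{"model": "tts-1", "voice": "%s", "input": "%s"}' "$1" "$2"
}

# Request a short sample from the service published on localhost:8000
# and save the returned audio to speech.mp3 (hypothetical file name).
tts_smoke_test() {
  curl -sS --fail http://localhost:8000/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d "$(speech_payload alloy 'Hello from openedai-speech')" \
    -o speech.mp3
}

# Usage:
#   tts_smoke_test && echo "wrote speech.mp3"
```

If the request fails, `--fail` makes curl exit with a non-zero status instead of saving an error page as audio.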

Configuring Open WebUI

For more information on configuring Open WebUI to use openedai-speech, including setting environment variables, see the Open WebUI documentation.

Step 5: Configure Open WebUI to use openedai-speech

In Open WebUI, navigate to Admin Panel > Settings > Audio and locate the TTS Settings. Enter the following values:

  • API Base URL: http://host.docker.internal:8000/v1
  • API Key: sk-111111111 (note: this is a dummy API key, as openedai-speech doesn't require an API key; any value works in this field)

Step 6: Choose a voice

Under Set Voice, you can choose from the following voices:

  • alloy
  • echo
  • echo-alt
  • fable
  • onyx
  • nova
  • shimmer
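Voice names are typed as plain text, so a typo is easy to miss. As a small sketch, the hypothetical helper here checks a name against the list above before you paste it into Open WebUI or a test request; it only knows the voices named in this tutorial.

```shell
# Succeed (exit status 0) only if $1 is one of the voices listed in this tutorial.
is_known_voice() {
  case "$1" in
    alloy|echo|echo-alt|fable|onyx|nova|shimmer) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_known_voice nova && echo "nova is available"
```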

Step 7: Enjoy natural-sounding voices

You should now be able to use the openedai-speech integration with Open WebUI to generate natural-sounding speech.

Troubleshooting

If you encounter any issues, make sure that:

  • The openedai-speech service is running and the host port you set in the docker-compose.yml file (8000 in this tutorial) is published.
  • The host.docker.internal hostname resolves from within the Open WebUI container. This hostname is required because openedai-speech is published on localhost of your PC, which open-webui cannot normally reach from within its own container.
  • The API key field is set to a dummy value, as openedai-speech doesn't require an API key.
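The first two checks can be run from a terminal. This sketch uses the standard `docker port` and `docker exec` commands; the container name `open-webui` is an assumption (pass your own as the first argument), the function names are hypothetical, and `getent` is assumed to be available inside the Open WebUI image.

```shell
# List the port mappings of the openedai-speech container
# (container name taken from the setup steps in this tutorial).
tts_published_ports() {
  docker port openedai-speech
}

# Check that host.docker.internal resolves inside the Open WebUI container.
# $1 = container name; "open-webui" is an assumed default.
check_tts_host() {
  docker exec "${1:-open-webui}" getent hosts host.docker.internal
}

# Usage:
#   tts_published_ports
#   check_tts_host            # uses the assumed name "open-webui"
#   check_tts_host my-webui   # hypothetical custom container name
```

If `check_tts_host` prints nothing and exits with a non-zero status, the hostname is not resolvable from that container.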

Additional Resources

For more information on openedai-speech, please visit the GitHub repository.

Note: You can change the port number in the docker-compose.yml file to any open and usable port, but make sure to update the API Base URL under Admin Panel > Settings > Audio in Open WebUI accordingly.
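As an illustration of the note above, suppose the ports entry in docker-compose.yml is changed to "8001:8000" (8001 is an arbitrary example). Only the host side of the mapping changes, so the API Base URL must point at that host port. The hypothetical helper here prints the matching URL.

```shell
# Print the API Base URL to enter in Open WebUI for a given host port.
# $1 = host port (left side of the "host:container" mapping); defaults to 8000.
tts_base_url() {
  printf 'http://host.docker.internal:%s/v1\n' "${1:-8000}"
}

# Usage:
#   tts_base_url        # default mapping "8000:8000"
#   tts_base_url 8001   # after changing the mapping to "8001:8000"
```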