From 8c192d4a66fa66f75913ed011af8205ce334ce29 Mon Sep 17 00:00:00 2001
From: Dave York
Date: Thu, 9 Jan 2025 13:14:10 -0500
Subject: [PATCH 1/2] add instructions for kokoro

---
 .../Kokoro-FastAPI-integration.md | 65 +++++++++++++++++++
 1 file changed, 65 insertions(+)
 create mode 100644 docs/tutorials/text-to-speech/Kokoro-FastAPI-integration.md

diff --git a/docs/tutorials/text-to-speech/Kokoro-FastAPI-integration.md b/docs/tutorials/text-to-speech/Kokoro-FastAPI-integration.md
new file mode 100644
index 0000000..56b0ca9
--- /dev/null
+++ b/docs/tutorials/text-to-speech/Kokoro-FastAPI-integration.md
@@ -0,0 +1,65 @@
---
sidebar_position: 2
title: "🗨️ Kokoro-FastAPI Using Docker"
---

:::warning
This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for your specific use case. Want to contribute? Check out the contributing tutorial.
:::

# Integrating `Kokoro-FastAPI` 🗣️ with Open WebUI

## What is `Kokoro-FastAPI`?

[Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) is a dockerized FastAPI wrapper for the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model that implements the OpenAI API endpoint specification. It offers high-performance text-to-speech with impressive generation speeds:

- 100x+ real-time speed on an HF A100
- 35-50x+ real-time speed on a 4060 Ti
- 5x+ real-time speed on an M3 Pro CPU

Key Features:
- OpenAI-compatible speech endpoint with inline voice combination
- NVIDIA GPU-accelerated or CPU ONNX inference
- Streaming support with variable chunking
- Multiple audio formats (mp3, wav, opus, flac, aac, pcm)
- Web UI for easy testing
- Phoneme endpoints for conversion and generation

## Requirements

- Docker installed on your system
- Open WebUI running
- For GPU support: NVIDIA GPU with CUDA 12.1
- For CPU-only: no special requirements

## ⚡️ Quick start

You can choose between the GPU or CPU version:

```bash
# GPU version (requires an NVIDIA GPU with CUDA 12.1)
docker run -d -p 8880:8880 -p 7860:7860 remsky/kokoro-fastapi:latest

# CPU version (ONNX-optimized inference)
docker run -d -p 8880:8880 -p 7860:7860 remsky/kokoro-fastapi:cpu-latest
```

## Setting up Open WebUI to use `Kokoro-FastAPI`

- Open the Admin Panel and go to Settings -> Audio
- Set your TTS settings to match the following:
  - Text-to-Speech Engine: OpenAI
  - API Base URL: `http://localhost:8880/v1`
  - API Key: `not-needed`
  - TTS Model: `kokoro`
  - TTS Voice: `af_bella`

:::info
The default API key is the string `not-needed`. You do not have to change that value if you do not need the added security.
:::

**And that's it!**

Please see the [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) repository for instructions on building the Docker container yourself (for changing ports, etc.).
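If Open WebUI itself runs in Docker, `localhost` inside its container will not reach the Kokoro container, so the two containers need to share a network and the API Base URL should point at the TTS container by name. The commands below are a minimal sketch of that setup using the image tags from the quick start above; the network and container names (`tts-net`, `kokoro-fastapi`, `open-webui`) are illustrative choices, not required values.

```bash
# Sketch: run Kokoro-FastAPI and Open WebUI on a shared Docker network so the
# WebUI container can reach the TTS container by service name instead of localhost.
docker network create tts-net

# CPU image from the quick start; swap in remsky/kokoro-fastapi:latest for GPU inference.
docker run -d --name kokoro-fastapi --network tts-net \
  -p 8880:8880 remsky/kokoro-fastapi:cpu-latest

# Standard Open WebUI container, attached to the same network.
docker run -d --name open-webui --network tts-net \
  -p 3000:8080 -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

# With this layout, set the API Base URL in Admin Panel -> Settings -> Audio to
# http://kokoro-fastapi:8880/v1 rather than http://localhost:8880/v1.
```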
From 34a958b840f521f24f27afeae4812985ea3e82de Mon Sep 17 00:00:00 2001
From: Dave York
Date: Fri, 10 Jan 2025 00:32:08 -0500
Subject: [PATCH 2/2] adding voices and languages

---
 .../Kokoro-FastAPI-integration.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/docs/tutorials/text-to-speech/Kokoro-FastAPI-integration.md b/docs/tutorials/text-to-speech/Kokoro-FastAPI-integration.md
index 56b0ca9..3429097 100644
--- a/docs/tutorials/text-to-speech/Kokoro-FastAPI-integration.md
+++ b/docs/tutorials/text-to-speech/Kokoro-FastAPI-integration.md
@@ -25,6 +25,23 @@ Key Features:
- Web UI for easy testing
- Phoneme endpoints for conversion and generation

Voices:
  - af
  - af_bella
  - af_nicole
  - af_sarah
  - af_sky
  - am_adam
  - am_michael
  - bf_emma
  - bf_isabella
  - bm_george
  - bm_lewis

Languages:
  - en_us
  - en_uk

## Requirements

- Docker installed on your system
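To hear one of the voices from the list above without going through Open WebUI, you can call the speech endpoint directly. The request below is a sketch that follows the OpenAI speech API the wrapper implements; it assumes the container from the quick start is listening on `localhost:8880` and that the model name `kokoro` and the British English voice `bm_george` are accepted as configured earlier.

```bash
# Request an MP3 of a short sentence using the British English voice bm_george.
# Assumes the Kokoro-FastAPI container from the quick start is running locally.
curl -s -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kokoro",
        "input": "Good evening from Open WebUI.",
        "voice": "bm_george",
        "response_format": "mp3"
      }' \
  -o sample.mp3
```

Play `sample.mp3` to confirm the output; any other name from the voice list can be substituted in the `voice` field.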