---
sidebar_position: 2
title: "🗨️ Kokoro-FastAPI Using Docker"
---

:::warning
This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for your specific use case. Want to contribute? Check out the contributing tutorial.
:::

# Integrating `Kokoro-FastAPI` 🗣️ with Open WebUI

## What is `Kokoro-FastAPI`?

[Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) is a dockerized FastAPI wrapper for the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model that implements the OpenAI API endpoint specification. It offers high-performance text-to-speech with impressive generation speeds:

- 100x+ real-time speed on an HF A100
- 35-50x+ real-time speed on a 4060 Ti
- 5x+ real-time speed on an M3 Pro CPU

Key Features:

- OpenAI-compatible Speech endpoint with inline voice combination (see the example below)
- NVIDIA GPU-accelerated or CPU ONNX inference
- Streaming support with variable chunking
- Multiple audio format support (mp3, wav, opus, flac, aac, pcm)
- Web UI for easy testing
- Phoneme endpoints for conversion and generation

Voices:

- af
- af_bella
- af_nicole
- af_sarah
- af_sky
- am_adam
- am_michael
- bf_emma
- bf_isabella
- bm_george
- bm_lewis

Languages:

- en_us
- en_uk

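As a small illustration of the inline voice combination feature listed above, the upstream project describes blending voices by joining their names with a `+` in the `voice` field of a speech request. The exact syntax can vary between image versions, so treat the snippet below as a sketch and confirm it against the Kokoro-FastAPI README; it also assumes the service from the Quick start section further down is already running on `localhost:8880`.

```bash
# Sketch: request speech with two voices blended inline
# ("+" syntax per the upstream README; verify against your version).
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "Blended voices example.", "voice": "af_bella+af_sky", "response_format": "mp3"}' \
  --output blended.mp3
```
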
## Requirements

- Docker installed on your system
- Open WebUI running
- For GPU support: NVIDIA GPU with CUDA 12.1
- For CPU-only: No special requirements

## ⚡️ Quick start

You can choose between GPU or CPU versions:

```bash
# GPU Version (Requires NVIDIA GPU with CUDA 12.1)
docker run -d -p 8880:8880 -p 7860:7860 remsky/kokoro-fastapi:latest

# CPU Version (ONNX optimized inference)
docker run -d -p 8880:8880 -p 7860:7860 remsky/kokoro-fastapi:cpu-latest
```
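
Once the container is running, you can sanity-check the API before pointing Open WebUI at it. The request below is a sketch that assumes the default port mapping above and the OpenAI-style `/v1/audio/speech` route; adjust the voice or output format as needed.

```bash
# Smoke test: ask the OpenAI-compatible endpoint for a short MP3 clip.
# Assumes the container from above is listening on localhost:8880.
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "Hello from Kokoro!", "voice": "af_bella", "response_format": "mp3"}' \
  --output hello.mp3
```

If `hello.mp3` plays back correctly, the service is ready for Open WebUI.
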
## Setting up Open WebUI to use `Kokoro-FastAPI`

- Open the Admin Panel and go to Settings -> Audio
- Set your TTS Settings to match the following:
  - Text-to-Speech Engine: OpenAI
  - API Base URL: `http://localhost:8880/v1`
  - API Key: `not-needed`
  - TTS Model: `kokoro`
  - TTS Voice: `af_bella`

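If you would rather bake these settings into your Open WebUI deployment than click through the Admin Panel, Open WebUI can also pick them up from environment variables. The variable names below follow Open WebUI's audio configuration options but are listed here as an assumption; verify them against the environment variable reference for your Open WebUI version. Note that when Open WebUI itself runs in Docker, `localhost` refers to the Open WebUI container, so the base URL typically needs `host.docker.internal` (or the Kokoro container's name on a shared Docker network) instead.

```bash
# Sketch: start Open WebUI with the TTS settings preconfigured via environment
# variables (names assumed from Open WebUI's audio config; double-check them).
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e AUDIO_TTS_ENGINE=openai \
  -e AUDIO_TTS_OPENAI_API_BASE_URL=http://host.docker.internal:8880/v1 \
  -e AUDIO_TTS_OPENAI_API_KEY=not-needed \
  -e AUDIO_TTS_MODEL=kokoro \
  -e AUDIO_TTS_VOICE=af_bella \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```
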
:::info
The default API key is the string `not-needed`. You do not have to change that value if you do not need the added security.
:::

**And that's it!**

Please see the [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) repository for instructions on how to build the Docker container yourself (for changing ports, etc.).
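
If you only need the service on a different host port, rebuilding is not required; you can simply change the host side of the `-p` mapping and update the API Base URL in Open WebUI accordingly:

```bash
# Expose Kokoro-FastAPI on host port 9000 instead of 8880.
# Only the left-hand side of -p changes; the container still listens on 8880.
docker run -d -p 9000:8880 -p 7860:7860 remsky/kokoro-fastapi:latest

# Open WebUI's API Base URL would then be http://localhost:9000/v1
```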