STT documentation

Daniel Rosehill 2025-03-02 22:44:14 +02:00
parent 5fac1e1af3
commit fd46e595aa
9 changed files with 85 additions and 1 deletion


@ -0,0 +1,25 @@
---
sidebar_position: 2
title: "Environment Variables"
---
# Environment Variables List
:::info
For a complete list of all Open WebUI environment variables, see the [Environment Variable Configuration](/docs/getting-started/env-configuration) page.
:::
The following is a summary of the environment variables for Speech-to-Text (STT).
## Environment Variables for Speech-to-Text (STT)
| Variable | Description |
|----------|-------------|
| `WHISPER_MODEL` | Sets the Whisper model to use for local Speech-to-Text |
| `WHISPER_MODEL_DIR` | Specifies the directory to store Whisper model files |
| `AUDIO_STT_ENGINE` | Specifies the Speech-to-Text engine to use (empty for local Whisper, or `openai`) |
| `AUDIO_STT_MODEL` | Specifies the Speech-to-Text model for OpenAI-compatible endpoints |
| `AUDIO_STT_OPENAI_API_BASE_URL` | Sets the OpenAI-compatible base URL for Speech-to-Text |
| `AUDIO_STT_OPENAI_API_KEY` | Sets the OpenAI API key for Speech-to-Text |
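
As an illustration of how the variables above fit together, here is a minimal Python sketch that reads them from the environment and checks that the selected engine has what it needs. `SttConfig`, `REQUIRED_FOR_OPENAI` and `load_stt_config` are hypothetical names, not part of Open WebUI. The rule that the `openai` engine needs a base URL, an API key and a model is an assumption made for this example; Open WebUI itself may fall back to defaults for some of these values.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SttConfig:
    """Hypothetical holder for the STT environment variables (not Open WebUI's own class)."""

    engine: str  # "" selects local Whisper, "openai" an OpenAI-compatible API
    whisper_model: Optional[str]
    whisper_model_dir: Optional[str]
    openai_model: Optional[str]
    openai_base_url: Optional[str]
    openai_api_key: Optional[str]


# Assumption for this sketch: the openai engine cannot work without these three values.
REQUIRED_FOR_OPENAI = (
    "AUDIO_STT_OPENAI_API_BASE_URL",
    "AUDIO_STT_OPENAI_API_KEY",
    "AUDIO_STT_MODEL",
)


def load_stt_config(env: Mapping[str, str] = os.environ) -> SttConfig:
    """Read the STT variables and fail fast if the openai engine is under-configured."""
    engine = env.get("AUDIO_STT_ENGINE", "").strip().lower()
    if engine == "openai":
        # Collect every required variable that is unset or empty.
        missing = [name for name in REQUIRED_FOR_OPENAI if not env.get(name)]
        if missing:
            raise ValueError(
                "AUDIO_STT_ENGINE=openai but these are unset: " + ", ".join(missing)
            )
    return SttConfig(
        engine=engine,
        whisper_model=env.get("WHISPER_MODEL"),
        whisper_model_dir=env.get("WHISPER_MODEL_DIR"),
        openai_model=env.get("AUDIO_STT_MODEL"),
        openai_base_url=env.get("AUDIO_STT_OPENAI_API_BASE_URL"),
        openai_api_key=env.get("AUDIO_STT_OPENAI_API_KEY"),
    )
```

Called with no arguments, `load_stt_config()` reads `os.environ`; passing a plain dict instead makes the check easy to test. Failing at startup surfaces a misconfigured `openai` engine before anyone presses the microphone button.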


@ -1,4 +1,63 @@
---
sidebar_position: 1
title: "🗨️ Configuration"
---
---
Open WebUI supports local, browser-based, and remote Speech-to-Text (STT).
![alt text](../../../static/images/tutorials/stt/image.png)
![alt text](../../../static/images/tutorials/stt/stt-providers.png)
## Cloud / Remote Speech-to-Text Providers
The following cloud STT providers are currently supported. API keys can be set as environment variables (OpenAI only) or on the admin settings page (both providers).
| Service | API Key Required |
| ------------- | ------------- |
| OpenAI | ✅ |
| Deepgram | ✅ |

The Web API option provides STT through the browser's built-in speech recognition.
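
For the OpenAI provider in the table above, or any OpenAI-compatible server configured through `AUDIO_STT_OPENAI_API_BASE_URL`, a transcription is a multipart `POST` to `<base URL>/audio/transcriptions` carrying `file` and `model` form fields, as in OpenAI's public API. The minimal sketch below builds such a request with only the Python standard library, which can help when checking that a self-hosted endpoint behaves like OpenAI's. `build_transcription_request` is a hypothetical helper, not Open WebUI's implementation, and it assumes the base URL already includes the version prefix (for example `https://api.openai.com/v1`).

```python
import urllib.request
import uuid


def build_transcription_request(
    base_url: str,
    api_key: str,
    model: str,
    audio: bytes,
    filename: str = "recording.wav",
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a multipart POST to {base_url}/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    parts = [
        # Text field: the model name, e.g. "whisper-1" on OpenAI's hosted API.
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="model"\r\n\r\n'
            f"{model}\r\n"
        ).encode(),
        # File field: the raw audio bytes, with a filename so the server can infer the format.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio
        + b"\r\n",
        # Closing boundary marks the end of the multipart body.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen(req)` sends it; OpenAI's API replies with JSON whose `text` field holds the transcript, while other compatible servers may differ in the details.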
## Configuring Your STT Provider
To configure a Speech-to-Text provider:
- Navigate to the admin settings
- Choose Audio
- Provide an API key and choose a model from the dropdown
![alt text](../../../static/images/tutorials/stt/stt-config.png)
## User-Level Settings
In addition to the instance-wide settings configured in the admin panel, a couple of user-level settings provide additional functionality:
* **STT Settings:** Contains settings related to Speech-to-Text functionality.
* **Speech-to-Text Engine:** Determines the engine used for speech recognition (Default or Web API).
![alt text](../../../static/images/tutorials/stt/user-settings.png)
## Using STT
Speech-to-Text offers an efficient way to "write" prompts with your voice, and it works well on both desktop and mobile devices.
To use STT, click the microphone icon:
![alt text](../../../static/images/tutorials/stt/stt-operation.png)
A live audio waveform indicates that your voice is being captured:
![alt text](../../../static/images/tutorials/stt/stt-in-progress.png)
## STT Mode Operation
Once your recording has begun, you can:
- Click the tick icon to finish the recording. If auto-send after completion is enabled, the transcribed text is sent for completion automatically; otherwise, you can send it manually.
- Click the 'x' icon to abort the recording and exit the recording interface (for example, if you want to start a fresh recording).

![alt text](../../../static/images/tutorials/stt/endstt.png)
