Commit Graph

46 Commits

Author SHA1 Message Date
priten
f7920df870 Fix non-ascii error issue on ENABLE_FORWARD_USER_INFO_HEADERS 2025-06-16 12:33:11 -05:00
Timothy Jaeryang Baek
72df23ed79 refac 2025-06-16 17:24:55 +04:00
Timothy Jaeryang Baek
7a1afa9c66 feat: custom stt content type
Co-Authored-By: Bryan Berns <berns@uwalumni.com>
2025-06-16 16:13:40 +04:00
Timothy Jaeryang Baek
8258dfb5af enh: enable deepgram smart_format 2025-06-16 12:34:01 +04:00
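As a rough illustration of what enabling `smart_format` means at the HTTP level: Deepgram's pre-recorded transcription endpoint `https://api.deepgram.com/v1/listen` takes its options as query parameters, and `smart_format=true` asks it to add punctuation and readable formatting to the transcript. The helper below is a hypothetical sketch, not the code in audio.py; `build_deepgram_listen_url` and the example model name are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode

# Deepgram's pre-recorded audio endpoint; options are passed as query parameters.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_deepgram_listen_url(model: Optional[str] = None, smart_format: bool = True) -> str:
    """Hypothetical helper: build the request URL with smart_format toggled."""
    params = {"smart_format": "true" if smart_format else "false"}
    if model:
        # Only send a model when the caller configured one.
        params["model"] = model
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


# The audio bytes would then be POSTed to this URL with an
# "Authorization: Token <API key>" header.
print(build_deepgram_listen_url(model="nova-2"))
# prints https://api.deepgram.com/v1/listen?smart_format=true&model=nova-2
```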
Timothy Jaeryang Baek
036ce12dd9 doc: changelog 2025-05-30 01:14:38 +04:00
Romain Dauby
b12a493fe5 fix: only trust codec_name for audio conversion
Some files have a .wav extension but contain a codec that OpenAI does not accept
2025-05-29 16:57:23 -04:00
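The idea behind b12a493fe5 can be sketched as deciding on conversion from the audio stream's `codec_name`, as reported by `ffprobe -show_streams -of json`, rather than from the file extension. The function names and the allow-list below are assumptions for illustration, not the logic actually used in audio.py.

```python
from typing import Any, Dict, Optional, Set

# Illustrative allow-list only; the codecs audio.py actually passes through
# unchanged may differ.
PASSTHROUGH_CODECS: Set[str] = {"mp3", "pcm_s16le", "flac", "opus", "vorbis", "aac"}


def audio_codec(probe: Dict[str, Any]) -> Optional[str]:
    """Return codec_name of the first audio stream in ffprobe JSON output."""
    for stream in probe.get("streams", []):
        if stream.get("codec_type") == "audio":
            return stream.get("codec_name")
    return None


def needs_conversion(probe: Dict[str, Any], allowed: Set[str] = PASSTHROUGH_CODECS) -> bool:
    """Decide on re-encoding from the codec alone, ignoring the file extension."""
    codec = audio_codec(probe)
    # A missing audio stream or an unlisted codec both trigger conversion.
    return codec is None or codec not in allowed


# A ".wav" file whose stream is ADPCM rather than PCM still needs converting.
wav_adpcm = {"streams": [{"codec_type": "audio", "codec_name": "adpcm_ms"}]}
print(needs_conversion(wav_adpcm))  # True
```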
Timothy Jaeryang Baek
baaa285534 feat: user stt language 2025-05-24 00:36:30 +04:00
Timothy Jaeryang Baek
73e64fe7fb refac: audio upload handling 2025-05-19 02:52:48 +04:00
Timothy Jaeryang Baek
b280f828b0 enh: very long audio transcription 2025-05-17 02:51:28 +04:00
Timothy Jaeryang Baek
b143c71da2 refac: AIOHTTP_CLIENT_SESSION_SSL 2025-05-14 23:33:52 +04:00
Timothy Jaeryang Baek
549989e9ec refac 2025-05-10 19:04:40 +04:00
Timothy Jaeryang Baek
827326e1a2 refac: audio transcription issue
2025-05-08 22:57:48 +04:00
Timothy Jaeryang Baek
bfa5550cc3 refac: openai already supports webm audio 2025-05-08 22:44:32 +04:00
Tim Jaeryang Baek
2a4dfc02a2 Merge pull request #13540 from NoMoreFood/dev
feat: Azure TTS Allow Base URL
2025-05-07 00:49:57 +04:00
Bryan Berns
5aabe21cbe Add Custom Azure TTS URL 2025-05-05 22:08:48 -04:00
Timothy Jaeryang Baek
7b36466c1c refac: audio transcribe supported filetype 2025-05-05 23:42:56 +04:00
Timothy Jaeryang Baek
4cfb99248d chore: format 2025-05-03 23:48:24 +04:00
Tim Jaeryang Baek
7b014e44ee Merge pull request #13376 from Thaniel94/add-whisper-language-constraint
feat: Added WHISPER_LANGUAGE env variable
2025-05-02 03:08:00 -07:00
nathaniel
ef7acfbf3d WHISPER_LANGUAGE is no longer a "PersistentConfig" variable (this was not appropriate given how WHISPER_LANGUAGE is currently configured). 2025-05-01 21:33:57 +01:00
Bryan Berns
6c8a9d000e Azure STT Allow Base URL & Max Speaker Setting 2025-04-30 08:51:01 -04:00
nathaniel
1efa708f83 Added WHISPER_LANGUAGE env variable. If set to a two-letter language code (e.g. "en"), constrains Whisper's STT to that language; if unset, the language is detected as normal. 2025-04-27 05:58:06 +01:00
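The WHISPER_LANGUAGE behaviour can be sketched as a small helper that turns the variable into the `language` argument of faster-whisper's `WhisperModel.transcribe`, where `language=None` means automatic detection. The helper name and the lowercase/whitespace normalisation are assumptions, not taken from the commit.

```python
import os
from typing import Mapping, Optional


def whisper_language_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: read WHISPER_LANGUAGE, or None to auto-detect."""
    source = os.environ if env is None else env
    value = source.get("WHISPER_LANGUAGE", "").strip().lower()
    # An empty or unset variable means "detect the language as usual".
    return value or None


# With faster-whisper the result would be passed straight through, e.g.
#   segments, info = model.transcribe(path, language=whisper_language_from_env())
print(whisper_language_from_env({"WHISPER_LANGUAGE": "EN"}))  # en
print(whisper_language_from_env({}))  # None
```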
Timothy Jaeryang Baek
e7332fd6fe refac 2025-04-13 23:39:38 -07:00
Tom
24367d459b Enable vad_filter to improve quality of transcription in faster-whisper model. 2025-04-13 13:03:57 +01:00
Timothy Jaeryang Baek
bde89fd29e refac: audio 2025-04-12 18:40:09 -07:00
Timothy Jaeryang Baek
91a455a284 chore: format 2025-04-12 16:35:11 -07:00
Tim Jaeryang Baek
36ac81b229 Merge pull request #12727 from decent-engineer-decent-datascientist/main
feat: add Azure AI Speech STT provider
2025-04-10 16:50:40 -07:00
priten
9a55257c5b feat: add Azure AI Speech STT provider
- Add Azure STT configuration variables for API key, region and locales
- Implement Azure STT transcription endpoint with 200MB file size limit
- Update Audio settings UI to include Azure STT configuration fields
- Handle Azure API responses and error cases consistently
2025-04-10 15:38:59 -05:00
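The 200MB limit in the second bullet amounts to a guard run before the request is sent. A minimal sketch, assuming the file size is already known in bytes (for example via `os.path.getsize`) and reading "MB" as MiB; `FileTooLargeError` and `ensure_azure_stt_size` are hypothetical names.

```python
# Limit taken from the commit message ("200MB"); reading MB as MiB
# (1024 * 1024 bytes) is an assumption made for this sketch.
AZURE_STT_MAX_BYTES = 200 * 1024 * 1024


class FileTooLargeError(ValueError):
    """Raised before upload when a file exceeds the Azure STT limit."""


def ensure_azure_stt_size(size_bytes: int, limit: int = AZURE_STT_MAX_BYTES) -> None:
    """Reject oversized files locally instead of waiting for an API error."""
    if size_bytes > limit:
        raise FileTooLargeError(
            f"Audio file is {size_bytes} bytes; Azure STT limit is {limit} bytes"
        )


ensure_azure_stt_size(5 * 1024 * 1024)  # a 5 MiB file passes silently
```

Checking locally keeps the error consistent with the other providers instead of surfacing whatever the remote API returns for an oversized upload.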
Timothy Jaeryang Baek
05aa9c6d9c refac 2025-04-10 12:27:11 -07:00
Thomas Rehn
4731e0d0e3 fix: convert webm to wav for OpenAI transcription endpoint 2025-04-10 09:00:51 +02:00
Thomas Rehn
d99a883867 fix: convert ogg to wav for OpenAI transcription endpoint 2025-04-08 15:04:04 +02:00
Hermógenes Oliveira
e936d7b53d fix: audio api endpoint filetype check
RFC2046 allows the Content-Type field to have additional parameters
after the main type/subtype information (Section 1).

Following RFC4281, many applications put codec information inside
parameters in the Content-Type. This is especially common for formats
that support many codecs, such as Ogg (RFC5334, Section 4).

The `/api/audio/transcriptions` endpoint is currently rejecting files
that contain parameters in the Content-Type field with a bad request
error.

This commit changes the current check in order to accept any
Content-Type field that begins with a supported type/subtype as listed
in the `supported_filetypes` tuple.

Since Content-Type here is provided by the user, I believe this check
is meant to prevent honest mistakes, like posting a PDF to an audio
processing endpoint, not as a security measure against possibly
malicious use. Therefore, I think it's OK not to validate the rest of
the field.
2025-03-08 18:03:30 -03:00
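The check described in e936d7b53d can be sketched as a prefix match against a tuple, since `str.startswith` accepts a tuple and succeeds if any element matches. The tuple contents below are illustrative and may not match `supported_filetypes` in audio.py; the strict variant is an alternative for comparison, not part of the commit.

```python
from typing import Optional, Tuple

# Illustrative tuple; the real supported_filetypes may list different types.
SUPPORTED_FILETYPES: Tuple[str, ...] = ("audio/mpeg", "audio/wav", "audio/ogg", "audio/x-m4a")


def is_supported_prefix(content_type: Optional[str]) -> bool:
    """Accept any Content-Type that begins with a supported type/subtype."""
    if not content_type:
        return False
    return content_type.startswith(SUPPORTED_FILETYPES)


def is_supported_strict(content_type: Optional[str]) -> bool:
    """Stricter alternative (not from the commit): compare only type/subtype."""
    if not content_type:
        return False
    # Drop parameters such as "; codecs=opus"; media types are case-insensitive.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in SUPPORTED_FILETYPES


print(is_supported_prefix("audio/ogg; codecs=opus"))  # True
print(is_supported_prefix("application/pdf"))  # False
```

The prefix form also accepts values such as `audio/wavx`, which the commit message treats as acceptable because the check guards against honest mistakes rather than malicious input; the strict variant closes that gap at the cost of a little parsing.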
tidely
b15814c42f chore: remove unnecessary Path conversions
Remove unnecessary `pathlib.Path` conversions of CACHE_DIR and DATA_DIR.

Use the `/` Path-joining shorthand so that platform-specific path separators are used (Windows: \, Unix: /)
2025-03-04 19:53:52 +02:00
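The two changes in b15814c42f look like this in miniature. `DATA_DIR`, `CACHE_DIR` and the `audio/speech` subdirectory are illustrative stand-ins, not the project's actual configuration values.

```python
from pathlib import Path

# Hypothetical stand-ins: when CACHE_DIR is already a Path,
# wrapping it again as Path(CACHE_DIR) changes nothing.
DATA_DIR = Path("data")
CACHE_DIR = DATA_DIR / "cache"

# "/" joins segments with the separator of the running OS (\ on Windows, / elsewhere).
SPEECH_CACHE_DIR = CACHE_DIR / "audio" / "speech"

assert Path(CACHE_DIR) == CACHE_DIR  # the removed conversion was a no-op
print(SPEECH_CACHE_DIR.parts)  # ('data', 'cache', 'audio', 'speech')
```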
Timothy Jaeryang Baek
3be5e3129b Merge pull request #10752 from NovoNordisk-OpenSource/yvedeng/standardize-logging
refactor: replace print statements with logging
2025-02-25 10:53:02 -08:00
Yifang Deng
0e5d5ecb81 refactor: replace print statements with logging for better error tracking 2025-02-25 15:53:55 +01:00
Timothy Jaeryang Baek
613a087387 refac 2025-02-21 10:55:03 -08:00
Synergyst
f789ad59a9 Update audio.py
Removed original code that was commented out
2025-02-21 04:47:46 -06:00
Coleton M
cdf620e6ee Update audio.py to fetch custom URL voices and models 2025-02-21 04:41:45 -06:00
Timothy Jaeryang Baek
eeb00a5ca2 chore: format 2025-02-20 01:01:29 -08:00
Liu Yue
90d9cdacfa fix: respect proxy and timeout settings in audio-related aiohttp requests 2025-02-20 14:55:45 +08:00
Timothy Jaeryang Baek
60095598ec chore: format 2025-02-09 22:20:47 -08:00
Tristan Morris
5df474abb9 Add support for Deepgram STT 2025-02-02 08:12:13 -06:00
Timothy Jaeryang Baek
8b6d03e430 fix: elevenlabs audio 2024-12-26 12:54:31 -08:00
Timothy Jaeryang Baek
70de5cf7b8 fix: audio 2024-12-19 16:18:54 -08:00
Timothy Jaeryang Baek
87d695caad Update audio.py 2024-12-11 04:47:35 -08:00
Timothy Jaeryang Baek
df0cdd9f3c wip 2024-12-11 04:37:47 -08:00
Timothy Jaeryang Baek
d3d161f723 wip 2024-12-10 00:54:13 -08:00