From 1f70dd5b4b18021af7c6a42a71db54dee7def9b2 Mon Sep 17 00:00:00 2001
From: Samuel Maier
Date: Sun, 28 Jul 2024 18:28:59 +0200
Subject: [PATCH 1/5] add slim_down.md

---
 docs/tutorial/slim_down.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 docs/tutorial/slim_down.md

diff --git a/docs/tutorial/slim_down.md b/docs/tutorial/slim_down.md
new file mode 100644
index 0000000..695458c
--- /dev/null
+++ b/docs/tutorial/slim_down.md
@@ -0,0 +1,26 @@
+---
+sidebar_position: 10
+title: "Slimming down RAM usage"
+---
+
+# Slimming down RAM usage
+
+If you deploy this image in a RAM constrained environment, there are a few things you can do do slim down the image.
+
+On a Raspberry Pi 4 (arm64) with version v0.3.10 this was able to reduce idle memory consumption from >1GB to ~200MB.
+
+## TLDR
+
+Set the following environment variables: `RAG_EMBEDDING_ENGINE: ollama`, `AUDIO_STT_ENGINE: openai`.
+
+## Longer explanation
+
+A lot of the memory consumption is because of loaded ML models. Even if you use an external language model (OpenAI or un-bundled ollama) a lot of models may be loaded for additional purposes.
+
+As of v0.3.10 this includes:
+* Speach-to-text (defaults to whisper)
+* RAG Embedding engine (defaults to local SentenceTransformers model)
+* Image generation engine (disabled by default)
+
+The first 2 are enabled and set to local models by default. You can change the models in the admin planel (RAG: Documents category, set it to ollama or OpenAI, Speach-to-text: Audio section, OpenAI or WebAPI work).
+If you deploy via docker you can also set these with the following environment variables: `RAG_EMBEDDING_ENGINE: ollama`, `AUDIO_STT_ENGINE: openai`.
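The two variables recommended in the TLDR of the file added above translate directly into a Docker Compose `environment` block. The following is a minimal sketch of such a deployment; the service name, image tag (`ghcr.io/open-webui/open-webui:main`), and port mapping are illustrative assumptions, not part of the patch — only the two environment variables come from the tutorial text:

```yaml
# Sketch of a docker-compose.yml for a RAM-slimmed deployment.
# Service name, image tag, and port mapping are assumed values.
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      # Offload RAG embeddings to an external Ollama instance instead of
      # loading the local SentenceTransformers model.
      RAG_EMBEDDING_ENGINE: ollama
      # Offload speech-to-text to the OpenAI API instead of loading the
      # local whisper model.
      AUDIO_STT_ENGINE: openai
```

As the tutorial explains, both engines otherwise load local models even when the language model itself is external.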
From 2d190b35408a37199daaac747d278707e06259b5 Mon Sep 17 00:00:00 2001
From: Samuel Maier
Date: Sun, 28 Jul 2024 18:29:19 +0200
Subject: [PATCH 2/5] bump other (IMO less important) wiki pages down

---
 docs/tutorial/continue-dev.md                | 2 +-
 docs/tutorial/ipex_llm.md                    | 2 +-
 docs/tutorial/openedai-speech-integration.md | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/tutorial/continue-dev.md b/docs/tutorial/continue-dev.md
index 84302c3..b5d360f 100644
--- a/docs/tutorial/continue-dev.md
+++ b/docs/tutorial/continue-dev.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 12
+sidebar_position: 13
 title: "Continue.dev VSCode Extension with Open WebUI"
 ---
 
diff --git a/docs/tutorial/ipex_llm.md b/docs/tutorial/ipex_llm.md
index ea1196b..741b712 100644
--- a/docs/tutorial/ipex_llm.md
+++ b/docs/tutorial/ipex_llm.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 10
+sidebar_position: 11
 title: "Local LLM Setup with IPEX-LLM on Intel GPU"
 ---
 
diff --git a/docs/tutorial/openedai-speech-integration.md b/docs/tutorial/openedai-speech-integration.md
index 6b3bcc3..907e81f 100644
--- a/docs/tutorial/openedai-speech-integration.md
+++ b/docs/tutorial/openedai-speech-integration.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 11
+sidebar_position: 12
 title: "TTS - OpenedAI-Speech using Docker"
 ---
 

From 9e106a43acce8d8afd4728c0e69828c33fd751e5 Mon Sep 17 00:00:00 2001
From: Samuel Maier
Date: Sun, 28 Jul 2024 18:32:46 +0200
Subject: [PATCH 3/5] explain how to observe ram usage

---
 docs/tutorial/slim_down.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorial/slim_down.md b/docs/tutorial/slim_down.md
index 695458c..2b519fa 100644
--- a/docs/tutorial/slim_down.md
+++ b/docs/tutorial/slim_down.md
@@ -7,7 +7,7 @@ title: "Slimming down RAM usage"
 
 If you deploy this image in a RAM constrained environment, there are a few things you can do do slim down the image.
 
-On a Raspberry Pi 4 (arm64) with version v0.3.10 this was able to reduce idle memory consumption from >1GB to ~200MB.
+On a Raspberry Pi 4 (arm64) with version v0.3.10 this was able to reduce idle memory consumption from >1GB to ~200MB (as observed with `docker container stats`).
 
 ## TLDR
 

From 6eeecf9256795c2b308f21819b04e4c8a2796e3b Mon Sep 17 00:00:00 2001
From: Samuel Maier
Date: Mon, 29 Jul 2024 16:56:07 +0200
Subject: [PATCH 4/5] apply deepl write to catch all forms of typos

---
 docs/tutorial/slim_down.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/tutorial/slim_down.md b/docs/tutorial/slim_down.md
index 2b519fa..05d9550 100644
--- a/docs/tutorial/slim_down.md
+++ b/docs/tutorial/slim_down.md
@@ -1,13 +1,13 @@
 ---
 sidebar_position: 10
-title: "Slimming down RAM usage"
+title: "Reduce RAM usage"
 ---
 
-# Slimming down RAM usage
+# Reduce RAM usage
 
-If you deploy this image in a RAM constrained environment, there are a few things you can do do slim down the image.
+If you are deploying this image in a RAM-constrained environment, there are a few things you can do to slim down the image.
 
-On a Raspberry Pi 4 (arm64) with version v0.3.10 this was able to reduce idle memory consumption from >1GB to ~200MB (as observed with `docker container stats`).
+On a Raspberry Pi 4 (arm64) with version v0.3.10, this was able to reduce idle memory consumption from >1GB to ~200MB (as observed with `docker container stats`).
 
 ## TLDR
 
@@ -15,12 +15,12 @@ Set the following environment variables: `RAG_EMBEDDING_ENGINE: ollama`, `AUDIO_
 
 ## Longer explanation
 
-A lot of the memory consumption is because of loaded ML models. Even if you use an external language model (OpenAI or un-bundled ollama) a lot of models may be loaded for additional purposes.
+Much of the memory consumption is due to loaded ML models. Even if you are using an external language model (OpenAI or unbundled ollama), many models may be loaded for additional purposes.
 
 As of v0.3.10 this includes:
-* Speach-to-text (defaults to whisper)
-* RAG Embedding engine (defaults to local SentenceTransformers model)
+* Speech-to-text (whisper by default)
+* RAG embedding engine (defaults to local SentenceTransformers model)
 * Image generation engine (disabled by default)
 
-The first 2 are enabled and set to local models by default. You can change the models in the admin planel (RAG: Documents category, set it to ollama or OpenAI, Speach-to-text: Audio section, OpenAI or WebAPI work).
-If you deploy via docker you can also set these with the following environment variables: `RAG_EMBEDDING_ENGINE: ollama`, `AUDIO_STT_ENGINE: openai`.
+The first 2 are enabled and set to local models by default. You can change the models in the admin panel (RAG: Documents category, set it to Ollama or OpenAI; Speech-to-text: Audio section, OpenAI or WebAPI both work).
+If you deploy via Docker, you can also set them with the following environment variables: `RAG_EMBEDDING_ENGINE: ollama`, `AUDIO_STT_ENGINE: openai`.
\ No newline at end of file

From 5c60399de530e5d97799f5af27c92ef616c895c5 Mon Sep 17 00:00:00 2001
From: Samuel Maier
Date: Mon, 29 Jul 2024 17:13:08 +0200
Subject: [PATCH 5/5] Address feedback

---
 docs/tutorial/slim_down.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/tutorial/slim_down.md b/docs/tutorial/slim_down.md
index 05d9550..1c39c2f 100644
--- a/docs/tutorial/slim_down.md
+++ b/docs/tutorial/slim_down.md
@@ -11,7 +11,7 @@ On a Raspberry Pi 4 (arm64) with version v0.3.10, this was able to reduce idle m
 
 ## TLDR
 
-Set the following environment variables: `RAG_EMBEDDING_ENGINE: ollama`, `AUDIO_STT_ENGINE: openai`.
+Set the following environment variables (or the respective UI settings for an existing deployment): `RAG_EMBEDDING_ENGINE: ollama`, `AUDIO_STT_ENGINE: openai`.
 
 ## Longer explanation
 
@@ -23,4 +23,4 @@ As of v0.3.10 this includes:
 * Image generation engine (disabled by default)
 
 The first 2 are enabled and set to local models by default. You can change the models in the admin panel (RAG: Documents category, set it to Ollama or OpenAI; Speech-to-text: Audio section, OpenAI or WebAPI both work).
-If you deploy via Docker, you can also set them with the following environment variables: `RAG_EMBEDDING_ENGINE: ollama`, `AUDIO_STT_ENGINE: openai`.
\ No newline at end of file
+If you are deploying a fresh Docker image, you can also set them with the following environment variables: `RAG_EMBEDDING_ENGINE: ollama`, `AUDIO_STT_ENGINE: openai`. Note that these environment variables have no effect if a `config.json` already exists.
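The memory figures quoted in the tutorial above (>1GB idle before, ~200MB after) were observed with `docker container stats`, as patch 3 notes. A one-shot reading can be taken as below; the container name `open-webui` is a hypothetical placeholder for whatever name your deployment uses:

```shell
# Print one snapshot of the container's memory usage and exit
# (--no-stream disables the live-updating view).
# "open-webui" is an assumed container name; adjust as needed.
docker container stats --no-stream --format '{{.Name}}: {{.MemUsage}}' open-webui
```

Comparing this reading before and after setting `RAG_EMBEDDING_ENGINE` and `AUDIO_STT_ENGINE` shows how much of the footprint the local embedding and whisper models account for.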