---
sidebar_position: 3
title: "🦙Starting with Llama.cpp"
---
## Overview

Open WebUI makes it simple and flexible to connect and manage a local Llama.cpp server to run efficient, quantized language models. Whether you've compiled Llama.cpp yourself or you're using precompiled binaries, this guide will walk you through how to:

- Set up your Llama.cpp server
- Load large models locally
- Integrate with Open WebUI for a seamless interface

Let's get you started!

---
## Step 1: Install Llama.cpp

To run models with Llama.cpp, you first need the Llama.cpp server installed locally.

You can either:

- 📦 [Download prebuilt binaries](https://github.com/ggerganov/llama.cpp/releases)
- 🛠️ Or build it from source by following the [official build instructions](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)

After installing, make sure `llama-server` is available on your system path, or take note of its location.
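
If you build from source, a minimal sketch looks like the following (flag names follow the official build guide; the CUDA option is only an example, so check that guide for your platform and backend):

```bash
# Clone the repository and build the server binary (sketch; see the official
# build instructions for platform-specific options)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Configure and build; -DGGML_CUDA=ON only applies if you have an NVIDIA GPU and CUDA toolkit
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

# The server binary should end up in build/bin/
./build/bin/llama-server --help
```
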
---
## Step 2: Download a Supported Model

You can load and run various GGUF-format quantized LLMs using Llama.cpp. One impressive example is the DeepSeek-R1 1.58-bit model optimized by UnslothAI. To download this version:

1. Visit the [Unsloth DeepSeek-R1 repository on Hugging Face](https://huggingface.co/unsloth/DeepSeek-R1-GGUF)
2. Download the 1.58-bit quantized version (around 131 GB)

Alternatively, use Python to download it programmatically:

```python
# pip install huggingface_hub hf_transfer
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],  # Download only the 1.58-bit variant
)
```
This will download the model files into a directory like:

```
DeepSeek-R1-GGUF/
└── DeepSeek-R1-UD-IQ1_S/
    ├── DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf
    ├── DeepSeek-R1-UD-IQ1_S-00002-of-00003.gguf
    └── DeepSeek-R1-UD-IQ1_S-00003-of-00003.gguf
```
📍 Keep track of the full path to the first GGUF file — you'll need it in Step 3.
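
A quick way to print those paths (directory names match the layout shown above):

```bash
# List the downloaded shards with absolute paths; copy the -00001-of-00003 one
ls "$PWD"/DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/*.gguf
```
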
---
## Step 3: Serve the Model with Llama.cpp

Start the model server using the `llama-server` binary. Navigate to your llama.cpp folder (e.g., `build/bin`) and run:

```bash
./llama-server \
  --model /your/full/path/to/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
  --port 10000 \
  --ctx-size 1024 \
  --n-gpu-layers 40
```
🛠️ Tweak the parameters to suit your machine:

- `--model`: Path to your .gguf model file
- `--port`: 10000 (or choose another open port)
- `--ctx-size`: Token context length (increase it if RAM allows)
- `--n-gpu-layers`: Number of layers offloaded to the GPU for faster performance

Once the server is running, it exposes a local OpenAI-compatible API at:

```
http://127.0.0.1:10000
```
---
## Step 4: Connect Llama.cpp to Open WebUI

To control and query your locally running model directly from Open WebUI:

1. Open Open WebUI in your browser
2. Go to ⚙️ Admin Settings → Connections → OpenAI Connections
3. Click Add Connection and enter:
   - URL: `http://127.0.0.1:10000/v1`
     (or use `http://host.docker.internal:10000/v1` if Open WebUI is running inside Docker)
   - API Key: `none` (leave blank)

💡 Once saved, Open WebUI will begin using your local Llama.cpp server as a backend!

![Llama.cpp Connection in Open WebUI](/images/tutorials/deepseek/connection.png)
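
If Open WebUI runs in Docker and the connection fails, it is worth confirming that the container can actually reach the host port. A rough check, assuming your container is named `open-webui` and has `curl` available (both are assumptions; adjust to your setup):

```bash
# Replace `open-webui` with your actual container name from `docker ps`
docker exec open-webui curl -s http://host.docker.internal:10000/v1/models
```
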
---
## Quick Tip: Try Out the Model via Chat Interface

Once connected, select the model from the Open WebUI chat menu and start interacting!

![Model Chat Preview](/images/tutorials/deepseek/response.png)

---
## You're Ready to Go!

Once configured, Open WebUI makes it easy to:

- Manage and switch between local models served by Llama.cpp
- Use the OpenAI-compatible API with no key needed
- Experiment with massive models like DeepSeek-R1 — right from your machine!

---
🚀 Have fun experimenting and building!