Mirror of https://github.com/deepseek-ai/DeepSeek-V2, synced 2025-01-22 18:45:31 +00:00
doc: recommend SGLang for DeepSeek V2 inference
blog: https://lmsys.org/blog/2024-09-04-sglang-v0-3/
slides: https://docs.google.com/presentation/d/1wB_Ul0LZwIDL47qFl64b8hVhH1_ya-1YPAPSSv0cKMs
This commit is contained in:
parent c15bff20d9
commit 07245dbf85

README.md (17 additions)
@@ -293,6 +293,23 @@ Assistant: {assistant_message_1}<|end▁of▁sentence|>User: {user_message_2

Assistant:
```

### Inference with SGLang (recommended)

[SGLang](https://github.com/sgl-project/sglang) currently supports MLA, FP8 (W8A8), FP8 KV Cache, CUDA Graph, and Torch Compile, offering the best performance among open-source frameworks. Here are some example launch commands:

```bash
# fp16 tp8
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-Coder-V2-Instruct --tp 8 --trust-remote-code

# fp16 tp8 w/ torch compile
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-Coder-V2-Instruct --tp 8 --trust-remote-code --enable-torch-compile

# fp16 tp8 w/ torch compile, max torch compile batch size 1
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-Coder-V2-Instruct --tp 8 --trust-remote-code --enable-torch-compile --max-torch-compile-bs 1

# fp8 tp8 w/ torch compile, fp8 e5m2 kv cache
python3 -m sglang.launch_server --model neuralmagic/DeepSeek-Coder-V2-Instruct-FP8 --tp 8 --trust-remote-code --enable-torch-compile --kv-cache-dtype fp8_e5m2
```

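Once the server is up, it exposes an OpenAI-compatible API. As a quick smoke test, a sketch not from the original README: it assumes SGLang's default port 30000 (change with `--port`) and that the served model name matches the `--model` path.

```bash
# Send a chat completion request to the server's OpenAI-compatible endpoint
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-Coder-V2-Instruct",
    "messages": [{"role": "user", "content": "Write quicksort in Python."}],
    "max_tokens": 256
  }'
```
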
### Inference with vLLM (recommended)

To utilize [vLLM](https://github.com/vllm-project/vllm) for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.
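Once that PR is merged and vLLM is rebuilt, the model can be served through vLLM's standard OpenAI-compatible entrypoint. A minimal sketch, not from the original README; the model name and tensor-parallel degree below are assumptions mirroring the SGLang commands above:

```bash
# Launch vLLM's OpenAI-compatible server, tensor parallelism across 8 GPUs
# (model name and TP degree are assumptions, adjust for your setup)
python3 -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-V2-Chat \
  --tensor-parallel-size 8 \
  --trust-remote-code
```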