From e2c15caf041dc0809f616349fa145e56ac2b72aa Mon Sep 17 00:00:00 2001
From: simon-mo
Date: Thu, 26 Dec 2024 17:11:31 -0800
Subject: [PATCH] add version

Signed-off-by: simon-mo
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e44a55c..a8315fe 100644
--- a/README.md
+++ b/README.md
@@ -307,7 +307,7 @@ For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy
 
 ### 6.5 Inference with vLLM (recommended)
 
-[vLLM](https://github.com/vllm-project/vllm) supports DeepSeek-V3 inference for FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers _pipeline parallelism_ allowing you to run this model on multiple machines connected by networks. For detailed guidance, please refer to the [vLLM instructions](https://docs.vllm.ai/en/latest/serving/distributed_serving.html). Please feel free to follow [the enhancement plan](https://github.com/vllm-project/vllm/issues/11539) as well.
+[vLLM](https://github.com/vllm-project/vllm) v0.6.6 supports DeepSeek-V3 inference for FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers _pipeline parallelism_ allowing you to run this model on multiple machines connected by networks. For detailed guidance, please refer to the [vLLM instructions](https://docs.vllm.ai/en/latest/serving/distributed_serving.html). Please feel free to follow [the enhancement plan](https://github.com/vllm-project/vllm/issues/11539) as well.
 
 ### 6.6 Recommended Inference Functionality with AMD GPUs
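
For readers of the patched README section, a minimal offline-inference sketch using vLLM's Python API is shown below. The parallelism sizes, sampling settings, and prompt are illustrative assumptions, not values from the patch; multi-node pipeline parallelism additionally requires a distributed setup (e.g. a Ray cluster) as described in the linked distributed-serving docs.

```python
# Sketch: offline DeepSeek-V3 inference with vLLM (illustrative settings).
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # Hugging Face model id
    tensor_parallel_size=8,           # GPUs per node (illustrative)
    pipeline_parallel_size=2,         # pipeline stages across nodes (illustrative)
    trust_remote_code=True,
)

outputs = llm.generate(
    ["Explain mixture-of-experts models in one paragraph."],
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```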