From 636fdc0a3e0918b24d9379bb80cbf10e734b94ad Mon Sep 17 00:00:00 2001
From: A-transformer
Date: Thu, 27 Feb 2025 09:32:59 +0400
Subject: [PATCH] Fix grammatical error in strategy description.

Fix grammatical error in strategy description.
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3ea43d3..aa3a6c8 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 When using expert parallelism (EP), different experts are assigned to different GPUs. Because the load of different experts may vary depending on the current workload, it is important to keep the load of different GPUs balanced.
-As described in the DeepSeek-V3 paper, we adopt **redundant experts** strategy that duplicates heavy-loaded experts.
+As described in the DeepSeek-V3 paper, we adopt a **redundant experts** strategy that duplicates heavy-loaded experts.
 Then, we heuristically pack the duplicated experts to GPUs to ensure load balancing across different GPUs. Moreover, thanks to the **group-limited expert routing** used in DeepSeek-V3, we also attempt to place the experts of the same group to the same node to reduce inter-node data traffic, whenever possible.