From 247ffba7ebad149ee69fd96ab6e2c6b1b4f62271 Mon Sep 17 00:00:00 2001
From: dotrail
Date: Thu, 27 Feb 2025 12:01:05 +0000
Subject: [PATCH] fix typo

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 15e2d51..8b48200 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ Here, we publicly share profiling data from our training and inference framework
 ![train](assets/train.jpg)
 
 The training profile data demonstrates our overlapping strategy for a pair of individual forward and backward chunks in [DualPipe](https://github.com/deepseek-ai/dualpipe). Each chunk contains 4 MoE (Mixture of Experts) layers.
-The parallel configuration aligns with DeepSeek-V3 pretraining settings: EP64, TP1 with 4K sequence length. And the PP communication is not included during profilng for simplicity.
+The parallel configuration aligns with DeepSeek-V3 pretraining settings: EP64, TP1 with 4K sequence length. And the PP communication is not included during profiling for simplicity.
 
 ## Inference
 