Merge pull request #4 from vatlor/main

fix typo
Chengqi Deng 2025-03-03 18:09:56 +08:00 committed by GitHub
commit e66dec569b
GPG Key ID: B5690EEEBB952194

@@ -9,7 +9,7 @@ Here, we publicly share profiling data from our training and inference framework
![train](assets/train.jpg)
The training profile data demonstrates our overlapping strategy for a pair of individual forward and backward chunks in [DualPipe](https://github.com/deepseek-ai/dualpipe). Each chunk contains 4 MoE (Mixture of Experts) layers.
-The parallel configuration aligns with DeepSeek-V3 pretraining settings: EP64, TP1 with 4K sequence length. And the PP communication is not included during profilng for simplicity.
+The parallel configuration aligns with DeepSeek-V3 pretraining settings: EP64, TP1 with 4K sequence length. And the PP communication is not included during profiling for simplicity.
## Inference