Update README.md

Fuli Luo, 2024-01-12 14:46:55 +08:00, committed by GitHub
parent e3e4f59b82
commit 4b411c7d2b


@@ -62,7 +62,7 @@
DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters.
It employs an innovative MoE architecture, which involves two principal strategies: fine-grained expert segmentation and shared experts isolation.
-It is trained from scratch on 2T tokens, and exhibits comparable performance with DeepSeek 7B and LLaMA2 7B, with only about 40% of computations.
+It is trained from scratch on 2T English and Chinese tokens, and exhibits comparable performance with DeepSeek 7B and LLaMA2 7B, with only about 40% of computations.
For research purposes, we release the model checkpoints of DeepSeekMoE 16B Base and DeepSeekMoE 16B Chat to the public, which can be deployed on a single GPU with 40GB of memory without the need for quantization.
The model code file can be found [here](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/modeling_deepseek.py).
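For context, a minimal sketch of loading the released Base checkpoint on a single ~40GB GPU with Hugging Face Transformers (the model id is taken from the link above; the bfloat16 dtype and `device_map` setting are assumptions, not part of this commit):

```python
# Minimal sketch: load DeepSeekMoE 16B Base without quantization.
# Assumes a single GPU with ~40GB of memory; bfloat16 is assumed here to keep
# the weights within that budget. trust_remote_code=True lets Transformers use
# the repo's modeling_deepseek.py linked above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/deepseek-moe-16b-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Simple generation check with an assumed prompt.
prompt = "DeepSeekMoE 16B is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```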