initial commit

zwd973-deepseek 2024-01-11 10:50:34 +08:00
parent 1c8e7915f5
commit 6ed131d4a2


@@ -60,6 +60,7 @@ DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B paramete
It employs an innovative MoE architecture, which involves two principal strategies: fine-grained expert segmentation and shared experts isolation.
It is trained from scratch on 2T tokens, and achieves performance comparable to DeepSeek 7B and LLaMA2 7B with only about 40% of the computation.
For research purposes, we release the model checkpoints of DeepSeekMoE 16B Base and DeepSeekMoE 16B Chat to the public, which can be deployed on a single GPU with 40GB of memory without the need for quantization.
The model code file can be found [here](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/modeling_deepseek.py).
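As a minimal sketch of such a deployment (assuming the checkpoint is loaded through the standard `transformers` interface with `trust_remote_code=True` so that the custom `modeling_deepseek.py` above is used, and a single 40GB GPU in `bfloat16`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face model id taken from the repository linked above.
model_name = "deepseek-ai/deepseek-moe-16b-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# trust_remote_code=True pulls in the custom modeling_deepseek.py;
# bfloat16 keeps the 16.4B-parameter model within a single 40GB GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("DeepSeekMoE is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```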
## 2. Evaluation Results