diff --git a/DeepSeekMoE.pdf b/DeepSeekMoE.pdf
index e6d45aa..1124d7c 100644
Binary files a/DeepSeekMoE.pdf and b/DeepSeekMoE.pdf differ
diff --git a/README.md b/README.md
index 265ea70..bd39b2f 100644
--- a/README.md
+++ b/README.md
@@ -1,280 +1,285 @@
Model Download | Evaluation Results | Quick Start | License | Citation

Paper Link 👁️
## 1. Introduction

DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters.
It employs an innovative MoE architecture built on two principal strategies: fine-grained expert segmentation and shared expert isolation (a schematic sketch of this routing scheme is included after the base-model evaluation below).
It is trained from scratch on 2T tokens and exhibits performance comparable to DeepSeek 7B and LLaMA2 7B while using only about 40% of their computation.
For research purposes, we release the model checkpoints of DeepSeekMoE 16B Base and DeepSeekMoE 16B Chat to the public; they can be deployed on a single GPU with 40GB of memory without quantization.
The model code file can be found [here](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/modeling_deepseek.py).

## 2. Evaluation Results

### DeepSeekMoE 16B Base

We evaluate DeepSeekMoE 16B on various benchmarks and compare it with a series of models, as shown below.

- Comparison with open-source models on the Open LLM Leaderboard. DeepSeekMoE 16B consistently outperforms models with a similar number of activated parameters by a large margin, and achieves performance comparable to LLaMA2 7B, which has approximately 2.5 times the activated parameters.
- Comparison with DeepSeek 7B on our internal benchmarks. DeepSeek 7B is a dense model trained on the same corpus as DeepSeekMoE 16B. With only 40.5% of the computation, DeepSeekMoE 16B achieves performance comparable to DeepSeek 7B.
- Comparison with LLaMA2 7B on our internal benchmarks. With only 39.6% of the computation, DeepSeekMoE 16B outperforms LLaMA2 7B on the majority of benchmarks.
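The efficiency gap in these comparisons comes from the routing scheme described in the introduction. Below is a minimal, self-contained PyTorch sketch of that idea: many small ("fine-grained") routed experts of which only a few fire per token, plus a small set of shared experts that every token always passes through. All module names, sizes, and expert counts here are illustrative assumptions for exposition; this is not the released `modeling_deepseek.py` implementation, which should be consulted for the actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoESketch(nn.Module):
    """Toy illustration: fine-grained routed experts + isolated shared experts."""

    def __init__(self, d_model=1024, d_hidden=256, n_routed=64, n_shared=2, top_k=6):
        super().__init__()
        make_expert = lambda: nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )
        # Fine-grained segmentation: many small experts, only `top_k` are activated per token.
        self.routed_experts = nn.ModuleList([make_expert() for _ in range(n_routed)])
        # Shared expert isolation: these experts are excluded from routing and
        # process every token, capturing common knowledge.
        self.shared_experts = nn.ModuleList([make_expert() for _ in range(n_shared)])
        self.gate = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq, d_model)
        scores = F.softmax(self.gate(x), dim=-1)            # (batch, seq, n_routed)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        # Zero out everything except each token's top-k gate weights.
        weights = torch.zeros_like(scores).scatter(-1, topk_idx, topk_scores)
        out = torch.zeros_like(x)
        # Naive dense dispatch; real implementations only run the selected tokens per expert.
        for i, expert in enumerate(self.routed_experts):
            out = out + weights[..., i:i + 1] * expert(x)
        for expert in self.shared_experts:                   # always-on shared path
            out = out + expert(x)
        return out

layer = MoESketch()
print(layer(torch.randn(2, 4, 1024)).shape)  # torch.Size([2, 4, 1024])
```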
### DeepSeekMoE 16B Chat

We also evaluate DeepSeekMoE 16B Chat on various benchmarks and compare it with DeepSeek 7B Chat and LLaMA2 7B SFT. All of the compared models follow the same fine-tuning setting and data for a fair comparison.
The evaluation results are shown below. With only about 40% of the computation, DeepSeekMoE 16B Chat achieves comparable or better performance than DeepSeek 7B Chat and LLaMA2 7B SFT.
## 3. Model Downloads

We release DeepSeekMoE 16B, including both base and chat models, to the public in order to support a broader and more diverse range of research within both academic and commercial communities. Please **note** that the use of this model is subject to the terms outlined in the [License section](#5-license). Commercial usage is permitted under these terms.

### Huggingface

| Model                | Sequence Length | Download                                                                   |
|:--------------------:|:---------------:|:--------------------------------------------------------------------------:|
| DeepSeekMoE 16B Base | 4096            | 🤗 [HuggingFace](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base) |
| DeepSeekMoE 16B Chat | 4096            | 🤗 [HuggingFace](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat) |

## 4. Quick Start

### Installation

With a `Python >= 3.8` environment, install the necessary dependencies by running the following command:

```shell
pip install -r requirements.txt
```

### Inference with Huggingface's Transformers

You can directly employ [Huggingface's Transformers](https://github.com/huggingface/transformers) for model inference.

**Text Completion**

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/deepseek-moe-16b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```

**Chat Completion**

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/deepseek-moe-16b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Who are you?"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
```

Instead of using the provided `apply_chat_template` function, you can also interact with our model using the sample template below. Note that `messages` should be replaced by your input.

```
User: {messages[0]['content']}

Assistant: {messages[1]['content']}<|end▁of▁sentence|>User: {messages[2]['content']}

Assistant:
```

**Note:** By default (`add_special_tokens=True`), our tokenizer automatically adds a `bos_token` (`<|begin▁of▁sentence|>`) before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including a system prompt in your input.
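As a worked example of the template and the tokenizer note above, here is a minimal sketch of a single-turn prompt built by hand and passed to the chat model. It assumes `tokenizer` and `model` are already loaded as in the Chat Completion snippet; the exact prompt string is inferred from the sample template rather than from an official API.

```python
# Manual single-turn prompt following the sample template above.
# Assumes `tokenizer` and `model` are loaded as in the Chat Completion example.
messages = [{"role": "user", "content": "Who are you?"}]
prompt = f"User: {messages[0]['content']}\n\nAssistant:"

# add_special_tokens=True (the default) prepends the bos_token automatically,
# so the prompt itself does not include <|begin▁of▁sentence|>.
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
```

For multi-turn conversations, previous turns are appended in the same `User:` / `Assistant:` pattern, with `<|end▁of▁sentence|>` closing each assistant reply, as in the template above.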
### How to Fine-tune DeepSeekMoE

We provide the script `finetune/finetune.py` for users to fine-tune our models on downstream tasks.

The script supports training with [DeepSpeed](https://github.com/microsoft/DeepSpeed). You need to install the required packages by running:

```bash
pip install -r requirements.txt
```

Please follow the [Sample Dataset Format](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) to prepare your training data.
Each item has two required fields, `instruction` and `output` (i.e., `{"instruction": ..., "output": ...}`).

After data preparation, you can use the sample shell script to fine-tune the DeepSeekMoE model.
Remember to specify `DATA_PATH` and `OUTPUT_PATH`, and choose appropriate hyper-parameters (e.g., `learning_rate`, `per_device_train_batch_size`) for your scenario.

```bash
DATA_PATH="