init project

2025-06-26 18:25:53 +00:00 · 2023-11-02 22:07:09 +08:00
commit a4ba628dfd
111 changed files with 26064 additions and 0 deletions
--- a/Evaluation/MBPP/README.md
+++ b/Evaluation/MBPP/README.md
@@ -0,0 +1,62 @@
+## 1. Introduction
+
+We provide a test script to evaluate the **deepseek-coder** model on the **[MBPP](https://huggingface.co/datasets/mbpp)** code generation benchmark in a 3-shot setting.
+
+
+
+## 2. Setup
+
+```bash
+pip install accelerate
+pip install attrdict
+pip install transformers
+pip install torch
+```
+
+
+
+## 3. Evaluation
+
+The sample script **eval.sh** shows how to evaluate the **deepseek-coder-1.3b-base** model on the MBPP dataset using **8** GPUs.
+
+```bash
+MODEL_NAME_OR_PATH="deepseek-ai/deepseek-coder-1.3b-base"
+DATASET_ROOT="data/"
+LANGUAGE="python"
+python -m accelerate.commands.launch --config_file test_config.yaml eval_pal.py --logdir "${MODEL_NAME_OR_PATH}" --dataroot "${DATASET_ROOT}"
+```
+
+## 4. Experimental Results
+
+We report experimental results here for several models. We set the maximum input length to **4096** and the maximum output length to **500**, and employ the **greedy search strategy**.
+
+
+
+#### (1) Multilingual Base Models
+
+| Model               | Size | Pass@1    |
+|---------------------|------|-----------|
+| CodeShell           | 7B   | 38.6%     |
+| CodeGeeX2           | 6B   | 36.2%     |
+| StarCoder           | 16B  | 42.8%     |
+| CodeLlama-Base      | 7B   | 38.6%     |
+| CodeLlama-Base      | 13B  | 47.0%     |
+| CodeLlama-Base      | 34B  | 55.0%     |
+|                     |      |           |
+| DeepSeek-Coder-Base | 1.3B | 46.8%     |
+| DeepSeek-Coder-Base | 5.7B | 57.2%     |
+| DeepSeek-Coder-Base | 6.7B | 60.6%     |
+| DeepSeek-Coder-Base | 33B  | **66.0%** |
+
+#### (2) Instruction-Tuned Models
+
+| Model                   | Size | Pass@1    |
+|-------------------------|------|-----------|
+| GPT-3.5-Turbo           | -    | 70.8%     |
+| GPT-4                   | -    | **80.0%** |
+|                         |      |           |
+| DeepSeek-Coder-Instruct | 1.3B | 49.4%     |
+| DeepSeek-Coder-Instruct | 5.7B | 62.4%     |
+| DeepSeek-Coder-Instruct | 6.7B | 65.4%     |
+| DeepSeek-Coder-Instruct | 33B  | **70.0%** |
+