mirror of
https://github.com/deepseek-ai/DeepSeek-Coder
synced 2025-06-26 18:25:53 +00:00
init project
Evaluation/MBPP/README.md
## 1. Introduction

We provide a test script to evaluate the performance of the **deepseek-coder** model on the code generation benchmark **[MBPP](https://huggingface.co/datasets/mbpp)** in a 3-shot setting.
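To make the 3-shot setting concrete, the sketch below assembles a prompt from three solved examples followed by the unsolved target problem. The field names `text`, `code`, and `test_list` follow the MBPP dataset on Hugging Face; the prompt wording, the `[BEGIN]`/`[DONE]` markers, and the helper names `format_example` and `build_few_shot_prompt` are assumptions chosen for illustration and are not confirmed to match what `eval_pal.py` does.

```python
from typing import Dict, List


def format_example(example: Dict, include_solution: bool) -> str:
    """Render one MBPP-style problem as text (hypothetical template)."""
    tests = "\n".join(example["test_list"])
    prompt = (
        "You are an expert Python programmer, and here is your task: "
        f"{example['text']} Your code should pass these tests:\n\n"
        f"{tests}\n[BEGIN]\n"
    )
    if include_solution:
        # Solved demonstrations end with a marker the model can imitate.
        prompt += f"{example['code']}\n[DONE]\n\n"
    return prompt


def build_few_shot_prompt(shots: List[Dict], target: Dict) -> str:
    """Concatenate the solved shots, then the target problem left open."""
    demos = "".join(format_example(s, include_solution=True) for s in shots)
    return demos + format_example(target, include_solution=False)


# Made-up toy problems, not real MBPP entries.
toy_shots = [
    {"text": f"Write a function add{i}.",
     "code": f"def add{i}(x):\n    return x + {i}",
     "test_list": [f"assert add{i}(1) == {i + 1}"]}
    for i in range(3)
]
toy_target = {"text": "Write a function double.",
              "test_list": ["assert double(2) == 4"]}

toy_prompt = build_few_shot_prompt(toy_shots, toy_target)
```

In this sketch the model would continue the text after the final `[BEGIN]`, and generation would be cut at the first `[DONE]` it emits.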
## 2. Setup

```bash
pip install accelerate
pip install attrdict
pip install transformers
pip install torch
```
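As a quick sanity check after installation, a short script can report which of these packages are still missing from the current environment. This is only a sketch: the helper name `missing_packages` is hypothetical, and it assumes PyTorch is imported under the module name `torch`.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Import names of the packages listed in the setup step.
required = ["accelerate", "attrdict", "transformers", "torch"]
not_found = missing_packages(required)
if not_found:
    print("Missing packages:", ", ".join(not_found))
else:
    print("All required packages are importable.")
```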
## 3. Evaluation

We've created a sample script, **eval.sh**, that demonstrates how to test the **deepseek-coder-1.3b-base** model on the MBPP dataset using **8** GPUs.
```bash
MODEL_NAME_OR_PATH="deepseek-ai/deepseek-coder-1.3b-base"
DATASET_ROOT="data/"
LANGUAGE="python"
python -m accelerate.commands.launch --config_file test_config.yaml eval_pal.py --logdir ${MODEL_NAME_OR_PATH} --dataroot ${DATASET_ROOT}
```
## 4. Experimental Results

We report experimental results for several models below. The maximum input length is set to **4096**, the maximum output length to **500**, and decoding uses the **greedy search strategy**.
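Because greedy search produces a single deterministic completion per problem, Pass@1 reduces to the fraction of problems whose one completion passes all of its tests. The sketch below shows that calculation next to the general unbiased pass@k estimator commonly used for code benchmarks, which gives the same value when one sample is drawn per problem. The function names and the sample outcomes are illustrative assumptions, not taken from the evaluation script.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def greedy_pass_at_1(results: Sequence[bool]) -> float:
    """Fraction of problems solved when each problem has exactly one sample."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(results) / len(results)


# Hypothetical per-problem outcomes (True = all tests passed).
outcomes = [True, False, True, True]
score = greedy_pass_at_1(outcomes)

# With one sample per problem, averaging pass_at_k gives the same value.
averaged = sum(pass_at_k(1, int(ok), 1) for ok in outcomes) / len(outcomes)
print(f"Pass@1 = {score:.1%}")
```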
#### (1) Multilingual Base Models

| Model               | Size | Pass@1    |
|---------------------|------|-----------|
| CodeShell           | 7B   | 38.6%     |
| CodeGeeX2           | 6B   | 36.2%     |
| StarCoder           | 16B  | 42.8%     |
| CodeLlama-Base      | 7B   | 38.6%     |
| CodeLlama-Base      | 13B  | 47.0%     |
| CodeLlama-Base      | 34B  | 55.0%     |
|                     |      |           |
| DeepSeek-Coder-Base | 1.3B | 46.8%     |
| DeepSeek-Coder-Base | 5.7B | 57.2%     |
| DeepSeek-Coder-Base | 6.7B | 60.6%     |
| DeepSeek-Coder-Base | 33B  | **66.0%** |

#### (2) Instruction-Tuned Models

| Model                   | Size | Pass@1    |
|-------------------------|------|-----------|
| GPT-3.5-Turbo           | -    | 70.8%     |
| GPT-4                   | -    | **80.0%** |
|                         |      |           |
| DeepSeek-Coder-Instruct | 1.3B | 49.4%     |
| DeepSeek-Coder-Instruct | 5.7B | 62.4%     |
| DeepSeek-Coder-Instruct | 6.7B | 65.4%     |
| DeepSeek-Coder-Instruct | 33B  | **70.0%** |