DeepSeek-Coder


1. Introduction

We provide a test script that evaluates the deepseek-coder models on the MBPP code generation benchmark in a 3-shot setting.
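In a 3-shot setting, each test problem is preceded in the prompt by three solved example problems, and the model is asked to complete the fourth. The sketch below shows one way such a prompt can be assembled. It is illustrative only: the Problem class, the helper names and the prompt template (task text, test asserts, [BEGIN]/[DONE] markers) are assumptions, and the exact format produced by mbpp.py may differ.

```python
# Minimal sketch of assembling a 3-shot MBPP-style prompt.
# ASSUMPTION: Problem, format_problem, build_few_shot_prompt and the template
# text are hypothetical; the format actually used by mbpp.py may differ.
from dataclasses import dataclass
from typing import List


@dataclass
class Problem:
    text: str          # natural-language task description
    tests: List[str]   # assert statements a solution must pass
    code: str = ""     # reference solution (only needed for the shots)


def format_problem(p: Problem, with_solution: bool) -> str:
    tests = "\n".join(p.tests)
    body = (
        f"You are an expert Python programmer, and here is your task: {p.text} "
        f"Your code should pass these tests:\n\n{tests}\n"
    )
    if with_solution:
        # Solved examples include the reference code between the markers.
        return body + f"[BEGIN]\n{p.code}\n[DONE]\n"
    # The target problem stops at [BEGIN] so the model writes the solution.
    return body + "[BEGIN]\n"


def build_few_shot_prompt(shots: List[Problem], target: Problem) -> str:
    # Solved examples first, then the unsolved target problem.
    parts = [format_problem(s, with_solution=True) for s in shots]
    parts.append(format_problem(target, with_solution=False))
    return "\n".join(parts)


# Usage: three toy solved examples followed by one target problem.
shots = [
    Problem(f"Write a function f{i} that returns {i}.",
            [f"assert f{i}() == {i}"],
            f"def f{i}():\n    return {i}")
    for i in range(3)
]
target = Problem("Write a function add(a, b) that returns a + b.",
                 ["assert add(1, 2) == 3"])
prompt = build_few_shot_prompt(shots, target)
print(prompt)
```

Ending the prompt right after the final [BEGIN] marker leaves the model to generate only the solution, and a closing marker such as [DONE] gives a natural point at which to cut off the completion.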

2. Setup

pip install accelerate
pip install attrdict
pip install transformers
pip install torch
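After installation, a short import check can confirm that the dependencies are visible to the Python interpreter used for evaluation. This is a minimal sketch: the helper name missing_modules is hypothetical, and the module list simply mirrors the pip commands above (PyTorch is imported under the module name torch).

```python
# Minimal sketch: report which of the required modules cannot be found.
# ASSUMPTION: missing_modules is a hypothetical helper, not part of this repo.
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that are not installed."""
    # find_spec returns None (rather than raising) for a missing top-level module.
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    required = ["accelerate", "attrdict", "transformers", "torch"]
    missing = missing_modules(required)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All evaluation dependencies are importable.")
```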

3. Evaluation

We provide a sample script, eval.sh, that shows how to evaluate the deepseek-coder-1.3b-base model on the MBPP dataset using 8 GPUs. Note that the model name or path is passed to eval_pal.py through the --logdir argument.

MODEL_NAME_OR_PATH="deepseek-ai/deepseek-coder-1.3b-base"
DATASET_ROOT="data/"
LANGUAGE="python"
python -m accelerate.commands.launch --config_file test_config.yaml eval_pal.py --logdir ${MODEL_NAME_OR_PATH} --dataroot ${DATASET_ROOT}
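With accelerate, one process is started per GPU, and each process knows its own index and the total number of processes (in a script these are available as Accelerator().process_index and Accelerator().num_processes). The sketch below shows one common way to split the benchmark problems across such processes and to restore the original order when merging results. It is an illustration under that assumption, not a description of how eval_pal.py actually shards the data; shard and merge are hypothetical helpers.

```python
# Minimal sketch of data-parallel sharding of evaluation problems.
# ASSUMPTION: shard/merge are hypothetical; eval_pal.py may split work differently.
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Strided split: process `rank` gets items rank, rank + world_size, ..."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return list(items[rank::world_size])


def merge(shards: List[List[T]]) -> List[T]:
    """Undo the strided split so results line up with the original order."""
    merged: List[T] = []
    longest = max((len(s) for s in shards), default=0)
    for i in range(longest):
        for s in shards:
            if i < len(s):
                merged.append(s[i])
    return merged


# Usage: 10 stand-in problem IDs spread over 8 processes (one per GPU).
problems = list(range(10))
world_size = 8
shards = [shard(problems, r, world_size) for r in range(world_size)]
print(shards[0], shards[1], shards[2])
print(merge(shards) == problems)
```

A strided split keeps the per-process workload within one problem of each other, and because every shard is a regular slice, interleaving the shards back together recovers the original problem order.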

4. Experimental Results

We report results for several models below. The maximum input length is set to 4096 tokens and the maximum output length to 500 tokens, and all models use greedy decoding.
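Greedy decoding appends the single highest-scoring token at every step until an end-of-sequence token appears or the output-length cap is reached. The toy sketch below shows that loop. The scoring function is a hypothetical stand-in for a real model's next-token logits; in Hugging Face transformers the equivalent behaviour is generate(..., do_sample=False, num_beams=1, max_new_tokens=...), though the exact arguments passed by eval_pal.py are not stated here.

```python
# Toy illustration of greedy decoding with a cap on the number of new tokens.
# ASSUMPTION: next_token_scores stands in for a language model's logits.
from typing import Callable, Dict, List

EOS = "<eos>"


def greedy_decode(
    prompt: List[str],
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    max_new_tokens: int,
) -> List[str]:
    """Append the argmax token until EOS or max_new_tokens is reached."""
    tokens = list(prompt)
    generated: List[str] = []
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)  # greedy: always take the top token
        if best == EOS:
            break
        tokens.append(best)
        generated.append(best)
    return generated


def toy_scores(tokens: List[str]) -> Dict[str, float]:
    # Prefers "x" until the sequence has 5 tokens, then prefers EOS.
    return {"x": 1.0, "y": 0.5, EOS: 2.0 if len(tokens) >= 5 else 0.0}


print(greedy_decode(["def", "f"], toy_scores, max_new_tokens=500))
print(greedy_decode(["def", "f"], toy_scores, max_new_tokens=2))
```

Because greedy decoding is deterministic, each problem yields exactly one completion, which is why a single pass/fail result per problem is enough to compute Pass@1.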

(1) Multilingual Base Models

Model	Size	Pass@1
CodeShell	7B	38.6%
CodeGeeX2	6B	36.2%
StarCoder	16B	42.8%
CodeLlama-Base	7B	38.6%
CodeLlama-Base	13B	47.0%
CodeLlama-Base	34B	55.0%

DeepSeek-Coder-Base	1.3B	46.8%
DeepSeek-Coder-Base	5.7B	57.2%
DeepSeek-Coder-Base	6.7B	60.6%
DeepSeek-Coder-Base	33B	66.0%

(2) Instruction-Tuned Models

Model	Size	Pass@1
GPT-3.5-Turbo	-	70.8%
GPT-4	-	80.0%

DeepSeek-Coder-Instruct	1.3B	49.4%
DeepSeek-Coder-Instruct	6.7B	65.4%
DeepSeek-Coder-Instruct	33B	70.0%
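With one greedy completion per problem, Pass@1 in the tables above is the fraction of problems whose completion passes all of that problem's test asserts. The sketch below computes it for a couple of toy completions. The helper names are hypothetical, and running generated code with exec() in the current process is for illustration only: a real harness should sandbox execution and enforce timeouts (this sketch would hang on an infinite loop).

```python
# Minimal sketch of Pass@1 scoring under greedy decoding.
# ASSUMPTION: passes/pass_at_1 are hypothetical helpers; exec() is unsandboxed.
from typing import List, Tuple


def passes(code: str, tests: List[str]) -> bool:
    """True if the code runs and every test assert succeeds."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        for t in tests:
            exec(t, namespace)
    except Exception:  # includes AssertionError and SyntaxError
        return False
    return True


def pass_at_1(results: List[Tuple[str, List[str]]]) -> float:
    """Fraction of (completion, tests) pairs whose completion passes."""
    if not results:
        return 0.0
    solved = sum(passes(code, tests) for code, tests in results)
    return solved / len(results)


samples = [
    ("def add(a, b):\n    return a + b", ["assert add(1, 2) == 3"]),
    ("def sub(a, b):\n    return a + b", ["assert sub(3, 1) == 2"]),  # buggy
]
print(f"Pass@1 = {pass_at_1(samples):.1%}")  # Pass@1 = 50.0%
```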