
mirror of https://github.com/deepseek-ai/DeepSeek-Coder synced 2024-12-05 02:24:46 +00:00

ZHU QIHAO 13d5e4864d

Update README.md

2023-11-03 09:27:58 +08:00


1. Introduction

We provide a test script to evaluate the performance of the deepseek-coder model on the MBPP code generation benchmark in a 3-shot setting.
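As a rough illustration of what a 3-shot setting means, the sketch below prepends three solved problems to the unsolved target problem. The field names `text`, `test_list`, and `code` follow the public MBPP dataset; the prompt template and the helper names `format_example` and `build_few_shot_prompt` are assumptions for illustration and are not taken from the evaluation script.

```python
def format_example(problem, include_solution):
    """Render one MBPP-style problem; the template itself is a hypothetical choice."""
    tests = "\n".join(problem["test_list"])
    prompt = f"Task: {problem['text']}\nTests:\n{tests}\nSolution:\n"
    if include_solution:
        prompt += problem["code"] + "\n"
    return prompt


def build_few_shot_prompt(shots, target):
    """Concatenate the solved examples, then the target problem without its solution."""
    parts = [format_example(shot, include_solution=True) for shot in shots]
    parts.append(format_example(target, include_solution=False))
    return "\n".join(parts)


# Toy problems standing in for real MBPP entries.
shots = [
    {"text": "Add two numbers.", "test_list": ["assert add(1, 2) == 3"],
     "code": "def add(a, b):\n    return a + b"},
    {"text": "Square a number.", "test_list": ["assert square(3) == 9"],
     "code": "def square(x):\n    return x * x"},
    {"text": "Negate a number.", "test_list": ["assert neg(2) == -2"],
     "code": "def neg(x):\n    return -x"},
]
target = {"text": "Double a number.", "test_list": ["assert double(4) == 8"],
          "code": "def double(x):\n    return 2 * x"}

prompt = build_few_shot_prompt(shots, target)
```

The model is expected to continue the prompt after the final `Solution:` line, so the target's reference solution is never included.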

2. Setup

pip install accelerate
pip install attrdict
pip install transformers
pip install torch
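A quick way to confirm that the environment is ready is to check that each package can be located before launching the evaluation. This is a minimal sketch using only the standard library; the helper name `find_missing` is hypothetical. Note that PyTorch is imported under the module name `torch`.

```python
import importlib.util


def find_missing(module_names):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names of the packages listed in the setup step.
REQUIRED = ["accelerate", "attrdict", "transformers", "torch"]

if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    print("missing packages:", missing if missing else "none")
```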

3. Evaluation

We've created a sample script, eval.sh, that demonstrates how to test the deepseek-coder-1.3b-base model on the MBPP dataset using 8 GPUs.

MODEL_NAME_OR_PATH="deepseek-ai/deepseek-coder-1.3b-base"
DATASET_ROOT="data/"
LANGUAGE="python"
python -m accelerate.commands.launch --config_file test_config.yaml eval_pal.py --logdir ${MODEL_NAME_OR_PATH} --dataroot ${DATASET_ROOT}
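When an evaluation runs on several GPUs, each process typically handles a disjoint slice of the benchmark problems. The sketch below shows one common scheme, a round-robin split by process rank. It is an assumption for illustration only: how eval_pal.py and the accelerate configuration actually divide the work is not shown here, and `shard_for_rank` is a hypothetical helper.

```python
def shard_for_rank(problems, rank, world_size):
    """Return the problems assigned to one process under a round-robin split."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return problems[rank::world_size]


problems = list(range(20))  # stand-in for 20 task ids
shards = [shard_for_rank(problems, rank, 8) for rank in range(8)]
```

With 20 problems and 8 processes, the first four processes receive three problems each and the remaining four receive two, and every problem is evaluated exactly once.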

4. Experimental Results

We report experimental results here for several models. We set the maximum input length to 4096 tokens and the maximum output length to 500 tokens, and use greedy decoding.
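These decoding settings can be sketched as follows. The keyword arguments `max_new_tokens`, `do_sample`, and `num_beams` are standard parameters of the Hugging Face `generate` method, but whether the evaluation script passes exactly these is an assumption. Truncating from the left (keeping the end of the prompt, where the target problem sits) is likewise an assumed policy, and `truncate_left` and `greedy_pick` are hypothetical helpers.

```python
MAX_INPUT_TOKENS = 4096
MAX_NEW_TOKENS = 500

# Greedy decoding: no sampling and a single beam.
GENERATION_KWARGS = {
    "max_new_tokens": MAX_NEW_TOKENS,
    "do_sample": False,
    "num_beams": 1,
}


def truncate_left(token_ids, limit=MAX_INPUT_TOKENS):
    """Keep only the last `limit` tokens so the prompt fits the input budget."""
    return token_ids[-limit:]


def greedy_pick(logits):
    """Greedy decoding selects the index of the highest-scoring token at each step."""
    return max(range(len(logits)), key=logits.__getitem__)
```

Because greedy decoding is deterministic, each problem yields a single completion, which is why only Pass@1 is reported.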

(1) Multilingual Base Models

Model	Size	Pass@1
CodeShell	7B	38.6%
CodeGeeX2	6B	36.2%
StarCoder	16B	42.8%
CodeLlama-Base	7B	38.6%
CodeLlama-Base	13B	47.0%
CodeLlama-Base	34B	55.0%

DeepSeek-Coder-Base	1.3B	46.8%
DeepSeek-Coder-Base	5.7B	57.2%
DeepSeek-Coder-Base	6.7B	60.6%
DeepSeek-Coder-Base	33B	66.0%

(2) Instruction-Tuned Models

Model	Size	Pass@1
GPT-3.5-Turbo	-	70.8%
GPT-4	-	80.0%

DeepSeek-Coder-Instruct	1.3B	49.4%
DeepSeek-Coder-Instruct	6.7B	65.4%
DeepSeek-Coder-Instruct	33B	70.0%
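For reference, Pass@1 with one completion per problem is the percentage of problems whose generated code passes all of its test assertions. The sketch below is a simplified illustration, not the scoring code used for the numbers above: `passes_tests` and `pass_at_1` are hypothetical helpers, and a real harness would run untrusted code in a sandbox with a timeout.

```python
def passes_tests(code, tests):
    """Run generated code, then its assert statements; any exception counts as a failure.

    No sandboxing or timeout is applied, so use this only on trusted toy inputs.
    """
    namespace = {}
    try:
        exec(code, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:
        return False
    return True


def pass_at_1(results):
    """Percentage of problems solved, given one boolean outcome per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(results) / len(results)


outcomes = [
    passes_tests("def add(a, b):\n    return a + b", ["assert add(1, 2) == 3"]),
    passes_tests("def add(a, b):\n    return a - b", ["assert add(1, 2) == 3"]),
    passes_tests("def square(x):\n    return x * x", ["assert square(3) == 9"]),
    passes_tests("def neg(x):\n    return -x", ["assert neg(2) == -2"]),
]
score = pass_at_1(outcomes)
```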
