mirror of
https://github.com/deepseek-ai/DeepSeek-Coder
synced 2025-02-02 19:15:14 +00:00
30 lines
2.4 KiB
Markdown
30 lines
2.4 KiB
Markdown
|
## 1. Introduction
|
||
|
|
||
|
We provide a test script to evaluate the performance of the **deepseek-coder** model on code completion benchmarks. We select the widely-used benchmarks: [**DS-1000**](https://github.com/xlang-ai/DS-1000).
|
||
|
|
||
|
## 2. Evaluation
|
||
|
|
||
|
We directly use the scripts provided by the DS-1000 repository to evaluate the performance of the models. You can refer to [**DS-1000**](https://github.com/xlang-ai/DS-1000) to find more details about the evaluation.
|
||
|
|
||
|
|
||
|
## 3. Experimental Results
|
||
|
|
||
|
We report experimental results here for the completion mode of DS-1000. We set the maximum length to **2048**, and employ the **greedy search strategy**. To ensure a fair comparison, we apply identical hyper-parameters across all open-source models under evaluation.
|
||
|
|
||
|
| Model | Size | Matplotlib | Numpy | Pandas | Pytorch | Scipy | Scikit-Learn | Tensorflow | Avg |
|
||
|
|------------------------|------|------------|-------|--------|---------|-------|-------------|------------|-------|
|
||
|
| Codex-001 | - | 41.8% | 26.6% | 9.4% | 9.7% | 15.0% | 18.5% | 17.2% | 20.2% |
|
||
|
| Codex-002 | - | **57.0%** | 43.1% | **26.5%** | **41.8%** | 31.8% | **44.8%** | 39.3% | 39.2% |
|
||
|
| CodeShell | 7B | 34.1% | 21.8% | 10.7% | 11.8% | 17.0% | 20.0% | 15.6% | 18.8% |
|
||
|
| CodeGeeX2 | 6B | 38.7% | 26.8% | 14.4% | 11.8% | 19.8% | 27.0% | 17.8% | 22.9% |
|
||
|
| StarCoder | 16B | 47.7% | 31.4% | 12.7% | 25% | 22.6% | 35.7% | 22.2% | 27.2% |
|
||
|
| CodeLLama-Base | 7B | 41.9% | 24.6% | 14.8% | 16.2% | 18.9% | 17.4% | 17.8% | 22.1% |
|
||
|
| CodeLLama-Base | 13B | 46.5% | 28.6% | 18.2% | 19.1% | 18.9% | 27.8% | 33.3% | 26.8% |
|
||
|
| CodeLLama-Base | 34B | 50.3% | 42.7% | 23.0% | 25.0% | 28.3% | 33.9% | 40.0% | 34.3% |
|
||
|
| | | | | | | | | | | |
|
||
|
| DeepSeek-Coder-Base | 1.3B | 32.3% | 21.4% | 9.3% | 8.8% | 8.5% | 16.5% | 8.9% | 16.2% |
|
||
|
| DeepSeek-Coder-Base | 5.7B | 51.1% | 31.8% | 19.9% | 14.7% | 17.0% | 29.6% | 15.6% | 27.7% |
|
||
|
| DeepSeek-Coder-Base | 6.7B | 48.4% | 35.5% | 20.6% | 19.1% | 22.6% | 38.3% | 24.4% | 30.5% |
|
||
|
| DeepSeek-Coder-Base | 33B | 56.1% | **49.6%** | 25.8% | 36.8% | **36.8%** | 40.0% | **46.7%** | **40.2%** |
|
||
|
|