1. Introduction
We provide a test script for both zero-shot and few-shot evaluation on mathematical reasoning benchmarks used in our paper.
2. Setup
First configure the prefix field in environment.yml (the path where the conda environment will be created), and then run the following command:
conda env create -f environment.yml
3. Evaluation
For chain-of-thought evaluation of DeepSeekMath-Instruct and DeepSeekMath-RL, our script (see def markup_question() in run_subset_parallel.py) processes each question as follows (a minimal sketch of this markup follows the templates below):
- English questions:
{question}\nPlease reason step by step, and put your final answer within \\boxed{}.
- Chinese questions:
{question}\n请通过逐步推理来解答问题,并把最终答案放置于\\boxed{}中。
For tool-integrated reasoning, we process each question as follows:
- English questions:
{question}\nPlease integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}.
- Chinese questions:
{question}\n请结合自然语言和Python程序语言来解答问题,并把最终答案放置于\\boxed{}中。
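The markup above can be summarized in a few lines of Python. The sketch below is illustrative only: the helper name is_chinese and its CJK-range heuristic are assumptions, and the actual def markup_question() in run_subset_parallel.py may handle language detection and few-shot formatting differently.

COT = {
    "en": "Please reason step by step, and put your final answer within \\boxed{}.",
    "zh": "请通过逐步推理来解答问题,并把最终答案放置于\\boxed{}中。",
}
TOOL = {
    "en": "Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}.",
    "zh": "请结合自然语言和Python程序语言来解答问题,并把最终答案放置于\\boxed{}中。",
}

def is_chinese(text):
    # Assumed heuristic: treat the question as Chinese if it contains CJK characters.
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)

def markup_question(question, use_tools=False):
    # Append the chain-of-thought or tool-integrated instruction to the raw
    # question, matching the templates listed above.
    lang = "zh" if is_chinese(question) else "en"
    instruction = TOOL[lang] if use_tools else COT[lang]
    return question + "\n" + instruction

print(markup_question("What is 2 + 2?"))  # English chain-of-thought prompt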
We provide an example of testing DeepSeekMath-Base 7B using 8 GPUs. If you wish to use a different model or dataset, you can modify the configs in submit_eval_jobs.py and configs/*test_configs.json:
python submit_eval_jobs.py --n-gpus 8
Wait for all processes to finish, and then run the following command to aggregate the results:
python summarize_results.py [--eval-atp]
where the --eval-atp option invokes unsafe_score_minif2f_isabelle.py to evaluate the informal-to-formal proving results. Please make sure you have set up the PISA server before using this option. A summary of all evaluation results will be saved to evaluation_results.json.
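All of the prompts above ask the model to put its final answer within \boxed{}, so scoring a completion starts by extracting the contents of the last \boxed{...} span. The function below is a minimal sketch of that step, using simple brace matching; the repository's actual answer extraction and equivalence checking handle more edge cases.

def extract_boxed_answer(completion):
    # Return the contents of the last \boxed{...} in a completion, or None if absent.
    # Illustrative sketch only; the real evaluation scripts are more thorough.
    start = completion.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    chars = []
    while i < len(completion):
        ch = completion[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        chars.append(ch)
        i += 1
    return "".join(chars)

print(extract_boxed_answer("The final answer is \\boxed{\\frac{1}{2}}."))  # -> \frac{1}{2}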
4. Model Outputs
We provide all model outputs in outputs.zip.