* fix math hungarian exam eval error

* fix yi math score
DeepSeekPH 2023-12-04 18:11:50 +08:00 committed by GitHub
parent 4f229779f2
commit 99dd5694d2
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 2 additions and 2 deletions

@@ -126,7 +126,7 @@ In line with Grok-1, we have evaluated the model's mathematical capabilities usi
<img src="images/mathexam.png" alt="result" width="70%">
</div>
-**Remark:** Some results are obtained by DeepSeek LLM authors, while others are done by Grok-1 authors. We found some models count the score of the last question (Llemma 34b and Mammoth) while some (MetaMath-7B) are not in the original evaluation. In our evaluation, we count the last question score. Evaluation details are [here](https://github.com/deepseek-ai/DeepSeek-LLM/tree/HEAD/evaluation/hungarian_national_hs_solutions).
+**Remark:** We have rectified an error from our initial evaluation. In this revised version, we have omitted the lowest scores for questions 16, 17, 18, as well as for the aforementioned image. Evaluation details are [here](https://github.com/deepseek-ai/DeepSeek-LLM/tree/HEAD/evaluation/hungarian_national_hs_solutions).
---
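
The scoring rule mentioned in the revised remark can be illustrated with a small sketch. It is not the repository's evaluation code. It assumes, hypothetically, that per-question scores are available as a mapping from question number to points, and it reads "omitted the lowest scores for questions 16, 17, 18" as dropping the single lowest-scoring question among those three. The function name, the data layout, and the example numbers are all made up for illustration.

```python
from typing import Mapping

# Assumption: of these three questions, only the two best-scoring ones count.
OPTIONAL_QUESTIONS = (16, 17, 18)


def total_score(scores: Mapping[int, float]) -> float:
    """Sum all question scores, omitting the lowest of questions 16-18.

    Hypothetical helper: a missing optional question is treated as 0 points.
    """
    optional = [scores.get(q, 0.0) for q in OPTIONAL_QUESTIONS]
    required = sum(v for q, v in scores.items() if q not in OPTIONAL_QUESTIONS)
    # Add the optional questions, then remove the lowest one.
    return required + sum(optional) - min(optional)


# Made-up per-question scores, for illustration only.
example = {1: 2, 2: 3, 16: 10, 17: 4, 18: 12}
print(total_score(example))
```

Under this reading, a model's total changes only through the dropped question, so recounting would lower a score that had previously included all three.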

@@ -22,7 +22,7 @@
| Model | DeepSeek LLM 67B Chat | Qwen-14B-Chat | ChatGLM3-6B | Baichuan2-Chat-13B | Yi-Chat-34B | GPT-3.5-Turbo | Grok-1 | Claude 2 | GPT-4 |
|:-----------------------------------:|:---------------------:|:-------------:|:-----------:|:------------------:|:-----------:|:-------------:|:------:|:--------:|:-----:|
-| Hungarian National High-School Exam | **65** | 38.5 | 37 | 20.5 | 44 | 41 | 59 | 55 | 68 |
+| Hungarian National High-School Exam | **58** | 36.5 | 32 | 19.5 | 39 | 41 | 59 | 55 | 68 |
| Model | Qwen-14B-Chat | ChatGLM3-6B | Baichuan2-Chat-13B | Yi-Chat-34B | PaLM2 Small | DeepSeek LLM 67B Chat | GPT-4 |

Binary file not shown. Size: 252 KiB before, 254 KiB after.