Update README.md

Zhean Xu 2025-02-26 19:20:39 +08:00 committed by GitHub
parent bc989405fe
commit 78cacf70d4

@@ -8,7 +8,7 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 ## Performance
-We test all shapes potentially used in DeepSeek-V3/R1 inference (including both prefilling and decoding, but without tensor parallelism) on H800 with NVCC 12.8. All speedup metrics are calculated in comparison to our internally and carefully optimized implementation based on CUTLASS 3.6.
+We test all shapes potentially used in DeepSeek-V3/R1 inference (including both prefilling and decoding, but without tensor parallelism) on H800 SXM5 with NVCC 12.8. All speedup metrics are calculated in comparison to our internally and carefully optimized implementation based on CUTLASS 3.6.
 DeepGEMM does not behave very well on some shapes, optimization PRs are welcomed if you are interested.