Grouped GEMM skip useless computation for unaligned Ms (#103)

* Grouped GEMM skip useless computation for unaligned Ms

* Update readme.md

* small typo

* Rename variables

* Restore previous indent

* Format

* Refactor tests

* Add `SkipComputation` types

* Bug fixed

* Format

* Fix tests

* Add assertions

* Minor fix

---------

Co-authored-by: yukuai <yukuai@deepseek.com>
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
yukuai26
2025-05-27 13:43:38 +08:00
committed by GitHub
parent 391755ada0
commit 8dfa329827
5 changed files with 106 additions and 93 deletions

@@ -19,7 +19,7 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 - [ ] Larger block size on N (up to 256)
 - [x] MoE scheduler with TMA multicast compatibility
 - [x] Fix TMA multicast compatibility for indivisible shapes
-- [ ] Skip useless computation on M
+- [x] Skip useless computation on M
 - [x] NVRTC as a faster compiler
 - [ ] Stolen JIT cache
 - [ ] Sanitizer for testing