Grouped GEMM skip useless computation for unaligned Ms (#103)

* Grouped GEMM skip useless computation for unaligned Ms
* Update readme.md
* small typo
* Rename variables
* Restore previous indent
* Format
* Refactor tests
* Add `SkipComputation` types
* Bug fixed
* Format
* Fix tests
* Add assertions
* Minor fix

---------

Co-authored-by: yukuai <yukuai@deepseek.com>
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
2025-06-26 23:15:49 +00:00 · 2025-05-27 13:43:38 +08:00
parent 391755ada0
commit 8dfa329827
5 changed files with 106 additions and 93 deletions
--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 - [ ] Larger block size on N (up to 256)
 - [x] MoE scheduler with TMA multicast compatibility
 - [x] Fix TMA multicast compatibility for indivisible shapes
- [ ] Skip useless computation on M
+- [x] Skip useless computation on M
 - [x] NVRTC as a faster compiler
 - [ ] Stolen JIT cache
 - [ ] Sanitizer for testing