Fewer stages for small shape K

Chenggang Zhao
2025-04-28 10:36:08 +08:00
parent 86afd0c212
commit d374456787
2 changed files with 1 addition and 3 deletions

@@ -18,8 +18,6 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
- [x] MoE scheduler with TMA multicast compatibility
- [x] Fix TMA multicast compatibility for indivisible shapes
- [ ] Skip useless computation on M
- [ ] Share pipeline stages between scheduled blocks
- [ ] TMA store pipeline
- [ ] NVRTC as a faster compiler
- [ ] Sanitizer for testing
- [ ] Weight gradient kernels for dense models