Add two more optimization TODOs

Chenggang Zhao 2025-04-27 17:51:11 +08:00
parent 33e0c3ce40
commit 86afd0c212

@@ -18,6 +18,8 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
- [x] MoE scheduler with TMA multicast compatibility
- [x] Fix TMA multicast compatibility for indivisible shapes
- [ ] Skip useless computation on M
- [ ] Share pipeline stages between scheduled blocks
- [ ] TMA store pipeline
- [ ] NVRTC as a faster compiler
- [ ] Sanitizer for testing
- [ ] Weight gradient kernels for dense models