Merge remote-tracking branch 'upstream/main' into nvrtc

Zihua Wu 2025-04-25 18:56:49 -07:00
commit d473f594be

@@ -17,6 +17,9 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
- [ ] Larger block size on N (up to 256)
- [x] MoE scheduler with TMA multicast compatibility
- [x] Fix TMA multicast compatibility for indivisible shapes
- [ ] Skip useless computation on M
- [ ] NVRTC as a faster compiler
- [ ] Sanitizer for testing
- [ ] Weight gradient kernels for dense models
- [ ] Weight gradient kernels for MoE models
- [ ] Utility kernels for MoE models (as a pre-built CUDA library)