Several code lints x2

Chenggang Zhao
2025-04-22 17:24:02 +08:00
parent 902208a17e
commit f4014953ad
3 changed files with 11 additions and 8 deletions

@@ -16,7 +16,7 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 - [x] Shared memory swizzling for output
 - [ ] Larger block size on N (up to 256)
 - [x] MoE scheduler with TMA multicast compatibility
-- [ ] Fix TMA multicast compatibility for indivisible shapes
+- [x] Fix TMA multicast compatibility for indivisible shapes
 - [ ] Weight gradient kernels for dense models
 - [ ] Weight gradient kernels for MoE models
 - [ ] Utility kernels for MoE models (as a pre-built CUDA library)