mirror of https://github.com/deepseek-ai/DeepGEMM
synced 2025-05-06 19:34:23 +00:00
Add two more optimization TODOs
parent 33e0c3ce40
commit 86afd0c212
@@ -18,6 +18,8 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 - [x] MoE scheduler with TMA multicast compatibility
 - [x] Fix TMA multicast compatibility for indivisible shapes
 - [ ] Skip useless computation on M
+- [ ] Share pipeline stages between scheduled blocks
+- [ ] TMA store pipeline
 - [ ] NVRTC as a faster compiler
 - [ ] Sanitizer for testing
 - [ ] Weight gradient kernels for dense models