Mirror of https://github.com/deepseek-ai/DeepGEMM (synced 2025-05-05 21:44:21 +00:00)
Update README.md
Parent: 584b67eebb
Commit: 857d57d157
@@ -22,7 +22,7 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 - [ ] CUDA PDL support
 - [ ] More scaling granularity support via templates
 - [ ] Larger TMA multicast size for some shapes
-- [ ] MMA template refactor with CUTLASS
+- [x] MMA template refactor with CUTLASS
 - [ ] Optimizations for unaligned shapes
 - [ ] Optimizations for power efficiency
 - [ ] Remove shape limitations on N and K