From 86afd0c212ffabae3b15eb4fb4508cd13e7c7a7a Mon Sep 17 00:00:00 2001
From: Chenggang Zhao
Date: Sun, 27 Apr 2025 17:51:11 +0800
Subject: [PATCH] Add two more optimization TODOs

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index dab1f05..6abda62 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,8 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 - [x] MoE scheduler with TMA multicast compatibility
 - [x] Fix TMA multicast compatibility for indivisible shapes
 - [ ] Skip useless computation on M
+- [ ] Share pipeline stages between scheduled blocks
+- [ ] TMA store pipeline
 - [ ] NVRTC as a faster compiler
 - [ ] Sanitizer for testing
 - [ ] Weight gradient kernels for dense models