From 33e0c3ce406fd1b82891501f5354ceacc64ff5c8 Mon Sep 17 00:00:00 2001
From: Chenggang Zhao
Date: Thu, 24 Apr 2025 14:37:53 +0800
Subject: [PATCH] Update plans

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 5d925da..dab1f05 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,9 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 - [ ] Larger block size on N (up to 256)
 - [x] MoE scheduler with TMA multicast compatibility
 - [x] Fix TMA multicast compatibility for indivisible shapes
+- [ ] Skip useless computation on M
+- [ ] NVRTC as a faster compiler
+- [ ] Sanitizer for testing
 - [ ] Weight gradient kernels for dense models
 - [ ] Weight gradient kernels for MoE models
 - [ ] Utility kernels for MoE models (as a pre-built CUDA library)