From ebf3d2f916f4da88834bbe3fe44d9ef7cd0d6f93 Mon Sep 17 00:00:00 2001
From: Chenggang Zhao
Date: Wed, 14 May 2025 15:05:24 +0800
Subject: [PATCH] Update plans

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 2aa53ce..170c271 100644
--- a/README.md
+++ b/README.md
@@ -25,14 +25,14 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 - [ ] Sanitizer for testing
 - [x] Weight gradient kernels for dense models
 - [x] Weight gradient kernels for MoE models
+- [ ] Better `get_best_configs` modeling
 - [ ] Utility kernels for MoE models (maybe with [tile-lang](https://github.com/tile-ai/tilelang))
 - [ ] CUDA PDL support
 - [ ] More scaling granularity support via templates
 - [ ] Larger TMA multicast size for some shapes
 - [x] MMA template refactor with CUTLASS
-- [ ] Optimizations for unaligned shapes
 - [ ] Optimizations for power efficiency
-- [ ] Remove shape limitations on N and K
+- [x] Remove shape limitations on N and K
 - [ ] BF16 kernels
 - [ ] Split/stream-k optimizations