15 Commits

Author SHA1 Message Date
yukuai26
891f35adf5 Support TMA multicast on B with m_grouped_gemm_contiguous. (#88) 2025-04-21 09:43:17 +08:00
Chenggang Zhao
fea9309c1e Update README 2025-04-18 11:38:52 +08:00
Zhean Xu
4499c4ccbb Refactor MMA template with CUTLASS (#87)
* Refactor MMA with cutlass

* Update README.md

---------

Co-authored-by: Zhean Xu <xza@deepseek.com>
2025-04-14 17:06:49 +08:00
Chenggang Zhao
37aa127451 Use swizzling instead of padding (#86)
* Add swizzling params

* Add TMA D descriptor

* Always use STSMx2

* Swizzling draft

* Compatible with padding

* Fix bugs

* Optimize swizzle performance

* Optimize expression

* Optimize TMA issues

* Fix README

* Stricter assertions
2025-04-14 15:20:58 +08:00
Chenggang Zhao
327ec92f69 Update roadmap 2025-04-09 11:44:30 +08:00
Chenggang Zhao
677143be64 Update roadmap 2025-04-09 11:41:36 +08:00
Chenggang Zhao
989c9e3694 Update README 2025-04-09 11:17:47 +08:00
Chenggang Zhao
a9967bc27c Update README 2025-04-09 11:14:45 +08:00
Chenggang Zhao
d14962f072 Add DG_NVCC_OVERRIDE_CPP_STANDARD 2025-04-03 15:53:29 +08:00
Chenggang Zhao
8002b769c0 Update README 2025-03-25 18:13:24 +08:00
Chenggang Zhao
55ab91f72f Update performance 2025-03-25 18:06:47 +08:00
Zhean Xu
78cacf70d4 Update README.md 2025-02-26 19:20:39 +08:00
Zepp
7a70b439cd doc: Use permanent link 2025-02-26 16:15:37 +08:00
Antonio Cheong
5da24e229a spelling: README.md
behavior -> behaves
2025-02-26 02:36:04 +00:00
Chenggang Zhao
a6d97a1c1b Initial commit 2025-02-25 22:52:41 +08:00