| Author | Commit | Message | Date |
|---|---|---|---|
| yukuai26 | 891f35adf5 | Support TMA multicast on B with m_grouped_gemm_contiguous. (#88) | 2025-04-21 09:43:17 +08:00 |
| Chenggang Zhao | fea9309c1e | Update README | 2025-04-18 11:38:52 +08:00 |
| Zhean Xu | 4499c4ccbb | Refactor MMA template with CUTLASS (#87): Refactor MMA with cutlass; Update README.md. Co-authored-by: Zhean Xu <xza@deepseek.com> | 2025-04-14 17:06:49 +08:00 |
| Chenggang Zhao | 37aa127451 | Use swizzling instead of padding (#86): Add swizzling params; Add TMA D descriptor; Always use STSMx2; Swizzling draft; Compatible with padding; Fix bugs; Optimize swizzle performance; Optimize expression; Optimize TMA issues; Fix README; Stricter assertions | 2025-04-14 15:20:58 +08:00 |
| Chenggang Zhao | 327ec92f69 | Update roadmap | 2025-04-09 11:44:30 +08:00 |
| Chenggang Zhao | 677143be64 | Update roadmap | 2025-04-09 11:41:36 +08:00 |
| Chenggang Zhao | 989c9e3694 | Update README | 2025-04-09 11:17:47 +08:00 |
| Chenggang Zhao | a9967bc27c | Update README | 2025-04-09 11:14:45 +08:00 |
| Chenggang Zhao | d14962f072 | Add DG_NVCC_OVERRIDE_CPP_STANDARD | 2025-04-03 15:53:29 +08:00 |
| Chenggang Zhao | 8002b769c0 | Update README | 2025-03-25 18:13:24 +08:00 |
| Chenggang Zhao | 55ab91f72f | Update performance | 2025-03-25 18:06:47 +08:00 |
| Zhean Xu | 78cacf70d4 | Update README.md | 2025-02-26 19:20:39 +08:00 |
| Zepp | 7a70b439cd | doc: Use permanent link | 2025-02-26 16:15:37 +08:00 |
| Antonio Cheong | 5da24e229a | spelling: README.md (behavior -> behaves) | 2025-02-26 02:36:04 +00:00 |
| Chenggang Zhao | a6d97a1c1b | Initial commit | 2025-02-25 22:52:41 +08:00 |