| Author | Commit | Message | Date |
|---|---|---|---|
| Chenggang Zhao | 37aa127451 | Use swizzling instead of padding (#86): Add swizzling params; Add TMA D descriptor; Always use STSMx2; Swizzling draft; Compatible with padding; Fix bugs; Optimize swizzle performance; Optimize expression; Optimize TMA issues; Fix README; Stricter assertions | 2025-04-14 15:20:58 +08:00 |
| Chenggang Zhao | d14962f072 | Add DG_NVCC_OVERRIDE_CPP_STANDARD | 2025-04-03 15:53:29 +08:00 |
| Chenggang Zhao | 3a5539b7db | Use c++20 | 2025-04-03 15:47:59 +08:00 |
| Chenggang Zhao | 6db7e1863b | Solve STSM bank conflict via padding and 3D TMA | 2025-04-03 15:39:35 +08:00 |
| YLGH | b7db15ce94 | Update nvcc flag c++20. Needed for fconcepts | 2025-03-25 14:15:39 -07:00 |
| Chenggang Zhao | 7768319ffe | Remove unaligned predicates | 2025-03-25 16:32:40 +08:00 |
| Chenggang Zhao | 6e55da296f | Fix python -O mode issues | 2025-02-27 10:42:46 +08:00 |
| Chenggang Zhao | a6d97a1c1b | Initial commit | 2025-02-25 22:52:41 +08:00 |