Liang | d7ce715118 | Revert "Update nvcc flag c++20" | 2025-03-28 10:43:29 +08:00
YLGH | b7db15ce94 | Update nvcc flag c++20 | 2025-03-25 14:15:39 -07:00
    Needed for fconcepts
Chenggang Zhao | 7768319ffe | Remove unaligned predicates | 2025-03-25 16:32:40 +08:00
Chenggang Zhao | 7ffb118e54 | Support multicasting on B | 2025-03-25 14:56:42 +08:00
Chenggang Zhao | bd2a775528 | Code format | 2025-03-11 13:26:10 +08:00
Chenggang Zhao | 5233bad1e9 | Merge pull request #55 from sleepcoo/fix-cudagraph | 2025-03-11 13:25:35 +08:00
    fix cuda_graph rng check error
sleepcoo | 723a00338e | fix cuda_graph rng check error | 2025-03-11 12:40:42 +08:00
sazc | fcd1dcd99d | Performance: reducing the percentage of FFMA interleaving yields a slight performance gain, roughly 0.5% | 2025-03-05 17:50:22 +08:00
Chenggang Zhao | ca13ce0fab | Fix TMA store bugs and code format | 2025-02-27 17:57:21 +08:00
Chenggang Zhao | 6e55da296f | Fix python -O mode issues | 2025-02-27 10:42:46 +08:00
Chenggang Zhao | a6d97a1c1b | Initial commit | 2025-02-25 22:52:41 +08:00