Commit Graph

11 Commits

Author SHA1 Message Date
Liang
d7ce715118 Revert "Update nvcc flag c++20" 2025-03-28 10:43:29 +08:00
YLGH
b7db15ce94 Update nvcc flag c++20 (Needed for fconcepts) 2025-03-25 14:15:39 -07:00
Chenggang Zhao
7768319ffe Remove unaligned predicates 2025-03-25 16:32:40 +08:00
Chenggang Zhao
7ffb118e54 Support multicasting on B 2025-03-25 14:56:42 +08:00
Chenggang Zhao
bd2a775528 Code format 2025-03-11 13:26:10 +08:00
Chenggang Zhao
5233bad1e9 Merge pull request #55 from sleepcoo/fix-cudagraph (fix cuda_graph rng check error) 2025-03-11 13:25:35 +08:00
sleepcoo
723a00338e fix cuda_graph rng check error 2025-03-11 12:40:42 +08:00
sazc
fcd1dcd99d Performance: reducing the percentage of FFMA interleaving yields a slight performance gain, roughly 0.5% 2025-03-05 17:50:22 +08:00
Chenggang Zhao
ca13ce0fab Fix TMA store bugs and code format 2025-02-27 17:57:21 +08:00
Chenggang Zhao
6e55da296f Fix python -O mode issues 2025-02-27 10:42:46 +08:00
Chenggang Zhao
a6d97a1c1b Initial commit 2025-02-25 22:52:41 +08:00