Commit Graph

13 Commits

Author SHA1 Message Date
Chenggang Zhao
d14962f072 Add DG_NVCC_OVERRIDE_CPP_STANDARD 2025-04-03 15:53:29 +08:00
Chenggang Zhao
3a5539b7db Use c++20 2025-04-03 15:47:59 +08:00
Chenggang Zhao
6db7e1863b Solve STSM bank conflict via padding and 3D TMA 2025-04-03 15:39:35 +08:00
YLGH
b7db15ce94
Update nvcc flag c++20
Needed for fconcepts
2025-03-25 14:15:39 -07:00
Chenggang Zhao
7768319ffe Remove unaligned predicates 2025-03-25 16:32:40 +08:00
Chenggang Zhao
7ffb118e54 Support multicasting on B 2025-03-25 14:56:42 +08:00
Chenggang Zhao
bd2a775528 Code format 2025-03-11 13:26:10 +08:00
Chenggang Zhao
5233bad1e9
Merge pull request #55 from sleepcoo/fix-cudagraph
fix cuda_graph rng check error
2025-03-11 13:25:35 +08:00
sleepcoo
723a00338e fix cuda_graph rng check error 2025-03-11 12:40:42 +08:00
sazc
fcd1dcd99d Performance: reducing the percentage of FFMA interleaving yields a slight performance gain, roughly 0.5% 2025-03-05 17:50:22 +08:00
Chenggang Zhao
ca13ce0fab Fix TMA store bugs and code format 2025-02-27 17:57:21 +08:00
Chenggang Zhao
6e55da296f Fix python -O mode issues 2025-02-27 10:42:46 +08:00
Chenggang Zhao
a6d97a1c1b Initial commit 2025-02-25 22:52:41 +08:00