Commit Graph

8 Commits

Author            SHA1        Date                        Message
A-transformer     92521df34d  2025-02-27 22:01:50 +04:00  comment more clear about Memory Consistency and Barrier Visibility
    Memory Consistency and Barrier Visibility: Both __syncthreads() and cute::cluster_sync() serve as synchronization points, ensuring that all threads reach the barrier before any proceed. This guarantees that all prior memory operations, including barrier initialization, are visible to all threads within the synchronization scope.
Liang             fbec9e5eee  2025-02-27 23:18:52 +08:00  Update get_best_configs
    a better strategy to choose config
dotrail           488b5fc467  2025-02-27 11:53:33 +00:00  fix typo
Chenggang Zhao    6da94d2d36  2025-02-27 18:20:57 +08:00  Add extra TMA checks
Chenggang Zhao    ca13ce0fab  2025-02-27 17:57:21 +08:00  Fix TMA store bugs and code format
Chenggang Zhao    6e55da296f  2025-02-27 10:42:46 +08:00  Fix python -O mode issues
AcraeaTerpsicore  96b31fd6bb  2025-02-26 18:37:22 +08:00  fix typo
Chenggang Zhao    a6d97a1c1b  2025-02-25 22:52:41 +08:00  Initial commit