Commit Graph

11 Commits

Author          SHA1        Message                                                  Date
Shangyan Zhou   e255d57bef  Use put_nbi_warp.                                        2025-04-22 12:29:46 +08:00
Shangyan Zhou   20b2aaaf9e  Refactor some code.                                      2025-04-22 10:22:30 +08:00
Chenggang Zhao  42494864ba  Remove useless control metadata for low-latency combine  2025-04-07 09:55:39 +08:00
Chenggang Zhao  ffc39ba084  Stronger acquire scope for low-latency kernels           2025-03-27 09:30:36 +08:00
Chenggang Zhao  dcaf73e5ff  Support zero-copy for low-latency combine                2025-03-18 15:41:50 +08:00
Shangyan Zhou   2d0cf41dd1  Low latency kernels use rdma atomic to support AR.       2025-03-14 11:04:57 +08:00
Chenggang Zhao  ed7487c15e  Support BF16 for low-latency kernels                     2025-03-10 17:24:41 +08:00
Chenggang Zhao  1fc40d50f3  Improve AR performance                                   2025-03-06 21:41:19 +08:00
Chenggang Zhao  6cc3497df8  Remove all raw tensors for better P2P overlapping        2025-03-03 14:25:22 +08:00
Chenggang Zhao  77bb07aa20  Update some comments and docs                            2025-02-27 10:27:22 +08:00
Chenggang Zhao  ebfe47e46f  Initial commit                                           2025-02-25 09:07:53 +08:00