Commit Graph

20 Commits

Author SHA1 Message Date
fzyzcjy
fbcf430006
Update internode_ll.cu (#246) 2025-06-23 15:18:10 +08:00
Chenggang Zhao
8aaddf76ae
Remove the low-latency usage flag (#214) 2025-06-16 13:30:14 +08:00
Chenggang Zhao
1b92be8a71
Add automatic warp count control for low-latency kernels (#213)
* Add automatic warp count control for low-latency dispatch

* Add automatic warp count control for low-latency combine

* More assertions
2025-06-16 11:56:43 +08:00
Shifang Xu
21efbe9b48
Support UE8M0 data format. (#206) 2025-06-12 09:38:19 +08:00
Chenggang Zhao
8da2d7b38d
Fully remove barrier FIFO designs (#200)
* Fully remove FIFO slots

* Fully remove FIFO buffers

* Minor fix styles

* Fix some typos

* Bugs fixed

* Cleanup `ibgda_poll_cq`
2025-06-10 16:23:20 +08:00
Chenggang Zhao
5a2e37fa28
Support statistics tensor for low-latency kernels (#196) 2025-06-09 15:50:56 +08:00
Chenggang Zhao
0d1a855d81
Add low-latency kernel PCIe usage flag (#195)
* Add low-latency kernel usage flag

* Update comments
2025-06-09 14:37:13 +08:00
Chenggang Zhao
92405ddf30 Code cleanup and bug fixed 2025-05-23 11:14:16 +08:00
cywork121
68ae8b3d07
Feature: LL nvlink p2p (#173) 2025-05-23 10:37:45 +08:00
Shangyan Zhou
e255d57bef Use put_nbi_warp. 2025-04-22 12:29:46 +08:00
Shangyan Zhou
20b2aaaf9e Refactor some code. 2025-04-22 10:22:30 +08:00
Chenggang Zhao
42494864ba Remove useless control metadata for low-latency combine 2025-04-07 09:55:39 +08:00
Chenggang Zhao
ffc39ba084 Stronger acquire scope for low-latency kernels 2025-03-27 09:30:36 +08:00
Chenggang Zhao
dcaf73e5ff Support zero-copy for low-latency combine 2025-03-18 15:41:50 +08:00
Shangyan Zhou
2d0cf41dd1 Low latency kernels use rdma atomic to support AR. 2025-03-14 11:04:57 +08:00
Chenggang Zhao
ed7487c15e Support BF16 for low-latency kernels 2025-03-10 17:24:41 +08:00
Chenggang Zhao
1fc40d50f3 Improve AR performance 2025-03-06 21:41:19 +08:00
Chenggang Zhao
6cc3497df8 Remove all raw tensors for better P2P overlapping 2025-03-03 14:25:22 +08:00
Chenggang Zhao
77bb07aa20 Update some comments and docs 2025-02-27 10:27:22 +08:00
Chenggang Zhao
ebfe47e46f Initial commit 2025-02-25 09:07:53 +08:00