fzyzcjy
|
fbcf430006
|
Update internode_ll.cu (#246)
|
2025-06-23 15:18:10 +08:00 |
|
Chenggang Zhao
|
8aaddf76ae
|
Remove the low-latency usage flag (#214)
|
2025-06-16 13:30:14 +08:00 |
|
Chenggang Zhao
|
1b92be8a71
|
Add automatic warp count control for low-latency kernels (#213)
* Add automatic warp count control for low-latency dispatch
* Add automatic warp count control for low-latency combine
* More assertions
|
2025-06-16 11:56:43 +08:00 |
|
Shifang Xu
|
21efbe9b48
|
Support UE8M0 data format. (#206)
|
2025-06-12 09:38:19 +08:00 |
|
Chenggang Zhao
|
8da2d7b38d
|
Fully remove barrier FIFO designs (#200)
* Fully remove FIFO slots
* Fully remove FIFO buffers
* Minor fix styles
* Fix some typos
* Bugs fixed
* Cleanup `ibgda_poll_cq`
|
2025-06-10 16:23:20 +08:00 |
|
Chenggang Zhao
|
5a2e37fa28
|
Support statistics tensor for low-latency kernels (#196)
|
2025-06-09 15:50:56 +08:00 |
|
Chenggang Zhao
|
0d1a855d81
|
Add low-latency kernel PCIe usage flag (#195)
* Add low-latency kernel usage flag
* Update comments
|
2025-06-09 14:37:13 +08:00 |
|
Chenggang Zhao
|
92405ddf30
|
Code cleanup and bug fixed
|
2025-05-23 11:14:16 +08:00 |
|
cywork121
|
68ae8b3d07
|
Feature: LL nvlink p2p (#173)
|
2025-05-23 10:37:45 +08:00 |
|
Shangyan Zhou
|
e255d57bef
|
Use put_nbi_warp.
|
2025-04-22 12:29:46 +08:00 |
|
Shangyan Zhou
|
20b2aaaf9e
|
Refactor some code.
|
2025-04-22 10:22:30 +08:00 |
|
Chenggang Zhao
|
42494864ba
|
Remove useless control metadata for low-latency combine
|
2025-04-07 09:55:39 +08:00 |
|
Chenggang Zhao
|
ffc39ba084
|
Stronger acquire scope for low-latency kernels
|
2025-03-27 09:30:36 +08:00 |
|
Chenggang Zhao
|
dcaf73e5ff
|
Support zero-copy for low-latency combine
|
2025-03-18 15:41:50 +08:00 |
|
Shangyan Zhou
|
2d0cf41dd1
|
Low latency kernels use rdma atomic to support AR.
|
2025-03-14 11:04:57 +08:00 |
|
Chenggang Zhao
|
ed7487c15e
|
Support BF16 for low-latency kernels
|
2025-03-10 17:24:41 +08:00 |
|
Chenggang Zhao
|
1fc40d50f3
|
Improve AR performance
|
2025-03-06 21:41:19 +08:00 |
|
Chenggang Zhao
|
6cc3497df8
|
Remove all raw tensors for better P2P overlapping
|
2025-03-03 14:25:22 +08:00 |
|
Chenggang Zhao
|
77bb07aa20
|
Update some comments and docs
|
2025-02-27 10:27:22 +08:00 |
|
Chenggang Zhao
|
ebfe47e46f
|
Initial commit
|
2025-02-25 09:07:53 +08:00 |
|