Commit Graph

12 Commits

Author SHA1 Message Date
Chenggang Zhao
8da2d7b38d
Fully remove barrier FIFO designs (#200)
* Fully remove FIFO slots

* Fully remove FIFO buffers

* Minor fix styles

* Fix some typos

* Bugs fixed

* Cleanup `ibgda_poll_cq`
2025-06-10 16:23:20 +08:00
Chenggang Zhao
1157693c0c Remove useless comments 2025-06-09 17:14:25 +08:00
Chenggang Zhao
5a2e37fa28
Support statistics tensor for low-latency kernels (#196) 2025-06-09 15:50:56 +08:00
Chenggang Zhao
0d1a855d81
Add low-latency kernel PCIe usage flag (#195)
* Add low-latency kernel usage flag

* Update comments
2025-06-09 14:37:13 +08:00
Chenggang Zhao
92405ddf30 Code cleanup and bug fixed 2025-05-23 11:14:16 +08:00
fzyzcjy
adc6e24cb0
Update deep_ep.cpp 2025-05-08 16:01:47 +08:00
fzyzcjy
23ded3bd8d
Update deep_ep.cpp 2025-04-29 09:58:31 +08:00
Chenggang Zhao
dcaf73e5ff Support zero-copy for low-latency combine 2025-03-18 15:41:50 +08:00
Dmytro Dzhulgakov
b3b61ef5ef Allow passing output tensor in low_latency_combine 2025-03-10 22:19:21 +00:00
Chenggang Zhao
ed7487c15e Support BF16 for low-latency kernels 2025-03-10 17:24:41 +08:00
Chenggang Zhao
6cc3497df8 Remove all raw tensors for better P2P overlapping 2025-03-03 14:25:22 +08:00
Chenggang Zhao
ebfe47e46f Initial commit 2025-02-25 09:07:53 +08:00