| Author | Commit | Message | Date |
|---|---|---|---|
| Chenggang Zhao | 8da2d7b38d | Fully remove barrier FIFO designs (#200): Fully remove FIFO slots; Fully remove FIFO buffers; Minor fix styles; Fix some typos; Bugs fixed; Cleanup `ibgda_poll_cq` | 2025-06-10 16:23:20 +08:00 |
| Chenggang Zhao | 1157693c0c | Remove useless comments | 2025-06-09 17:14:25 +08:00 |
| Chenggang Zhao | 5a2e37fa28 | Support statistics tensor for low-latency kernels (#196) | 2025-06-09 15:50:56 +08:00 |
| Chenggang Zhao | 0d1a855d81 | Add low-latency kernel PCIe usage flag (#195): Add low-latency kernel usage flag; Update comments | 2025-06-09 14:37:13 +08:00 |
| Chenggang Zhao | 92405ddf30 | Code cleanup and bug fixed | 2025-05-23 11:14:16 +08:00 |
| fzyzcjy | adc6e24cb0 | Update deep_ep.cpp | 2025-05-08 16:01:47 +08:00 |
| fzyzcjy | 23ded3bd8d | Update deep_ep.cpp | 2025-04-29 09:58:31 +08:00 |
| Chenggang Zhao | dcaf73e5ff | Support zero-copy for low-latency combine | 2025-03-18 15:41:50 +08:00 |
| Dmytro Dzhulgakov | b3b61ef5ef | Allow passing output tensor in low_latency_combine | 2025-03-10 22:19:21 +00:00 |
| Chenggang Zhao | ed7487c15e | Support BF16 for low-latency kernels | 2025-03-10 17:24:41 +08:00 |
| Chenggang Zhao | 6cc3497df8 | Remove all raw tensors for better P2P overlapping | 2025-03-03 14:25:22 +08:00 |
| Chenggang Zhao | ebfe47e46f | Initial commit | 2025-02-25 09:07:53 +08:00 |