DeepEP/csrc/kernels
2025-04-21 17:44:32 +08:00
api.cuh Support zero-copy for low-latency combine 2025-03-18 15:41:50 +08:00
buffer.cuh Initial commit 2025-02-25 09:07:53 +08:00
CMakeLists.txt Initial commit 2025-02-25 09:07:53 +08:00
configs.cuh Initial commit 2025-02-25 09:07:53 +08:00
exception.cuh Initial commit 2025-02-25 09:07:53 +08:00
ibgda_device.cuh Revert ibgda_device.cuh and remove some comments. 2025-04-21 17:44:32 +08:00
internode_ll.cu Remove useless control metadata for low-latency combine 2025-04-07 09:55:39 +08:00
internode.cu Revert ibgda_device.cuh and remove some comments. 2025-04-21 17:44:32 +08:00
intranode.cu Fix bugs for intranode EP kernels 2025-03-14 16:09:23 +08:00
launch.cuh Initial commit 2025-02-25 09:07:53 +08:00
runtime.cu In the Internode Normal Kernel, when using nvshmem ibrc for RDMA data transmission, a single QP is used for data transfer between two GPUs, which limits kernel performance in network card dual-port and RoCE network scenarios. 2025-04-21 15:50:39 +08:00
utils.cuh Update some comments and docs 2025-02-27 10:27:22 +08:00