DeepEP/csrc/kernels
2025-03-28 06:43:29 +00:00
api.cuh Support zero-copy for low-latency combine 2025-03-18 15:41:50 +08:00
buffer.cuh Initial commit 2025-02-25 09:07:53 +08:00
CMakeLists.txt Initial commit 2025-02-25 09:07:53 +08:00
configs.cuh Initial commit 2025-02-25 09:07:53 +08:00
exception.cuh Initial commit 2025-02-25 09:07:53 +08:00
ibgda_device.cuh Fix style. 2025-03-14 11:22:00 +08:00
internode_ll.cu Stronger acquire scope for low-latency kernels 2025-03-27 09:30:36 +08:00
internode.cu In notify_dispatch, on the SMs that calculate metadata, each warp calculates the metadata of one channel. The default configuration is 8 warps for 10 channels, so the loop needs two rounds. Setting the number of warps to the number of channels might make a single round sufficient. 2025-03-28 06:43:29 +00:00
intranode.cu Fix bugs for intranode EP kernels 2025-03-14 16:09:23 +08:00
launch.cuh Initial commit 2025-02-25 09:07:53 +08:00
runtime.cu Low-latency kernels use RDMA atomics to support AR. 2025-03-14 11:04:57 +08:00
utils.cuh Update some comments and docs 2025-02-27 10:27:22 +08:00
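
The commit note on internode.cu is about how many loop rounds it takes for a fixed number of warps to cover all channels. The arithmetic can be sketched on the host side as below. This is a minimal illustration, not code from the repository: the names `rounds_needed` and `channel_rounds` are hypothetical, and the sketch assumes the warps walk the channels in a strided loop (warp `w` handles channels `w`, `w + num_warps`, and so on), which the listing itself does not confirm.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of DeepEP): number of loop rounds needed
// when each warp handles one channel per round. This is a ceiling division.
constexpr int rounds_needed(int num_channels, int num_warps) {
    return (num_channels + num_warps - 1) / num_warps;
}

// Hypothetical host-side simulation of an assumed warp-strided loop.
// Returns, for every channel, the index of the round in which it is handled.
std::vector<int> channel_rounds(int num_channels, int num_warps) {
    assert(num_channels > 0 && num_warps > 0);
    std::vector<int> round_of(num_channels, -1);
    for (int warp_id = 0; warp_id < num_warps; ++warp_id) {
        int round = 0;
        // Warp `warp_id` visits channels warp_id, warp_id + num_warps, ...
        for (int channel = warp_id; channel < num_channels; channel += num_warps) {
            round_of[channel] = round++;
        }
    }
    return round_of;
}
```

With the values from the commit note, `rounds_needed(10, 8)` evaluates to 2 and `rounds_needed(10, 10)` to 1. In the simulation with 8 warps, channels 8 and 9 land in the second round while the other six warps sit idle, which is the inefficiency the note points at.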