DeepEP/csrc/kernels
sleepcoo a107266a4e support hidden size 4096
Co-authored-by: zhyncs <me@zhyncs.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
2025-05-12 16:41:21 +08:00
api.cuh Support zero-copy for low-latency combine 2025-03-18 15:41:50 +08:00
buffer.cuh Initial commit 2025-02-25 09:07:53 +08:00
CMakeLists.txt Initial commit 2025-02-25 09:07:53 +08:00
configs.cuh Initial commit 2025-02-25 09:07:53 +08:00
exception.cuh Initial commit 2025-02-25 09:07:53 +08:00
ibgda_device.cuh Use put_nbi_warp. 2025-04-22 12:29:46 +08:00
internode_ll.cu Use put_nbi_warp. 2025-04-22 12:29:46 +08:00
internode.cu To mitigate incast congestion, shuffle the starting index of target rank for different ranks and channels 2025-05-10 09:55:35 +08:00
intranode.cu Fix bugs for intranode EP kernels 2025-03-14 16:09:23 +08:00
launch.cuh support hidden size 4096 2025-05-12 16:41:21 +08:00
runtime.cu Normal kernels always use IBGDA mode. 2025-04-22 10:36:24 +08:00
utils.cuh Update some comments and docs 2025-02-27 10:27:22 +08:00