DeepEP/csrc/kernels
Chenggang Zhao 1b92be8a71
Add automatic warp count control for low-latency kernels (#213)
* Add automatic warp count control for low-latency dispatch

* Add automatic warp count control for low-latency combine

* More assertions
2025-06-16 11:56:43 +08:00
api.cuh           Add automatic warp count control for low-latency kernels (#213)  2025-06-16 11:56:43 +08:00
buffer.cuh
CMakeLists.txt    Support UE8M0 data format. (#206)                                2025-06-12 09:38:19 +08:00
configs.cuh       Support Ampere architecture (#204)                               2025-06-11 15:48:18 +08:00
exception.cuh
ibgda_device.cuh  Fully remove barrier FIFO designs (#200)                         2025-06-10 16:23:20 +08:00
internode_ll.cu   Add automatic warp count control for low-latency kernels (#213)  2025-06-16 11:56:43 +08:00
internode.cu      Update assertion of num_rc_per_pe.                               2025-06-13 15:16:23 +08:00
intranode.cu      Update intranode.cu (#210)                                       2025-06-16 11:03:58 +08:00
launch.cuh        Support Ampere architecture (#204)                               2025-06-11 15:48:18 +08:00
layout.cu         Support Ampere architecture (#204)                               2025-06-11 15:48:18 +08:00
runtime.cu        Support Ampere architecture (#204)                               2025-06-11 15:48:18 +08:00
utils.cuh         Add automatic warp count control for low-latency kernels (#213)  2025-06-16 11:56:43 +08:00