DeepEP/csrc
Shangyan Zhou 9eb2f84b3e
Optimize intranode combine. (#247)
* Increase the test round.

* Add warp synchronization.

* Shuffle the send warps.

* Add time elapsed into bench result.
2025-06-24 09:10:23 +08:00
Name            Last commit                                                      Date
kernels         Optimize intranode combine. (#247)                               2025-06-24 09:10:23 +08:00
CMakeLists.txt  Use TMA instead of LD/ST for intra-node normal kernels (#191)    2025-06-06 15:40:17 +08:00
config.hpp      Add automatic warp count control for low-latency kernels (#213)  2025-06-16 11:56:43 +08:00
deep_ep.cpp     Update deep_ep.cpp (#242)                                        2025-06-23 11:44:06 +08:00
deep_ep.hpp     Remove the low-latency usage flag (#214)                         2025-06-16 13:30:14 +08:00
event.hpp       Initial commit                                                   2025-02-25 09:07:53 +08:00