DeepEP/csrc
Chenggang Zhao c8dceba110
Use TMA instead of LD/ST for intra-node normal kernels (#191)
* Update CMake files

* Use TMA instead of LD/ST for intranode dispatch

* Use TMA instead of LD/ST for intranode combine

* Adjust configs

* Test default configs as well

* More warps for combine

* Add inter-thread fence

* Enable more warps

* Do not use TMA for senders

* Update configs

* Remove useless wait
2025-06-06 15:40:17 +08:00
Name | Last commit | Date
kernels | Use TMA instead of LD/ST for intra-node normal kernels (#191) | 2025-06-06 15:40:17 +08:00
CMakeLists.txt | Use TMA instead of LD/ST for intra-node normal kernels (#191) | 2025-06-06 15:40:17 +08:00
config.hpp | |
deep_ep.cpp | Code cleanup and bug fixed | 2025-05-23 11:14:16 +08:00
deep_ep.hpp | Code cleanup and bug fixed | 2025-05-23 11:14:16 +08:00
event.hpp | |