Mirror of https://github.com/deepseek-ai/DeepEP, synced 2025-06-26 18:28:11 +00:00
Latest commit:

* Update CMake files
* Use TMA instead of LD/ST for intranode dispatch
* Use TMA instead of LD/ST for intranode combine
* Adjust configs
* Test default configs as well
* More warps for combine
* Add inter-thread fence
* Enable more warps
* Do not use TMA for senders
* Update configs
* Remove useless wait

| Name | | |
|---|---|---|
| api.cuh | ||
| buffer.cuh | ||
| CMakeLists.txt | ||
| configs.cuh | ||
| exception.cuh | ||
| ibgda_device.cuh | ||
| internode_ll.cu | ||
| internode.cu | ||
| intranode.cu | ||
| launch.cuh | ||
| runtime.cu | ||
| utils.cuh | ||