Add automatic warp count control for low-latency kernels (#213)

* Add automatic warp count control for low-latency dispatch

* Add automatic warp count control for low-latency combine

* More assertions
Chenggang Zhao
2025-06-16 11:56:43 +08:00
committed by GitHub
parent 4e923188f7
commit 1b92be8a71
6 changed files with 83 additions and 65 deletions

@@ -41,6 +41,7 @@ private:
    // Device info and communication
    int device_id;
    int num_device_sms;
    int rank, rdma_rank, nvl_rank;
    int num_ranks, num_rdma_ranks, num_nvl_ranks;
    cudaIpcMemHandle_t ipc_handles[NUM_MAX_NVL_PEERS];