mirror of
https://github.com/deepseek-ai/DeepEP
synced 2025-05-03 11:41:13 +00:00
Update buffer.py
parent c4b8ffc37c
commit 36b5c27993
@@ -479,7 +479,7 @@ class Buffer:
 Moreover, not all tokens are valid, only some of the `num_max_dispatch_tokens_per_rank * num_ranks` are,
 as we do not synchronize CPU received count with GPU (also not incompatible with CUDA graph if synced).
 recv_count: a tensor shaped `[num_local_experts]` with type `torch.int`, indicating how many tokens each
-expert receive. As mentioned before, all not tokens are valid in `recv_x`.
+expert receive. As mentioned before, not all tokens are valid in `recv_x`.
 handle: the communication handle to be used in the `low_latency_combine` function.
 event: the event after executing the kernel (valid only if `async_finish` is set).
 hook: the receiving hook function (valid only if `return_recv_hook` is set).
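The docstring in this hunk says that only some slots of `recv_x` hold real tokens and that `recv_count` reports how many tokens each local expert received. A minimal sketch of how a caller might use that count to skip padding follows. It uses plain Python lists rather than `torch` tensors so it runs without a GPU, and it rests on two assumptions that this hunk does not state: that `recv_x` is laid out as `[num_local_experts][capacity][hidden]`, and that the valid tokens occupy the leading slots of each expert's buffer. The helper name `valid_tokens_per_expert` is hypothetical and not part of DeepEP.

```python
def valid_tokens_per_expert(recv_x, recv_count):
    """Return, for each local expert, only the slots that hold real tokens.

    recv_x:     nested list shaped [num_local_experts][capacity][hidden]
                (assumed layout, not confirmed by the hunk above).
    recv_count: list of length num_local_experts with per-expert token counts.
    """
    if len(recv_x) != len(recv_count):
        raise ValueError("recv_x and recv_count disagree on num_local_experts")

    valid = []
    for expert_slots, count in zip(recv_x, recv_count):
        if not 0 <= count <= len(expert_slots):
            raise ValueError("recv_count exceeds the per-expert capacity")
        # Assumption: valid tokens are packed at the front of the buffer,
        # so everything past `count` is padding and is dropped here.
        valid.append(expert_slots[:count])
    return valid


# Toy data: 2 local experts, capacity of 4 slots each, hidden size 2.
recv_x = [
    [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0], [0.0, 0.0]],
    [[3.0, 3.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
]
recv_count = [2, 1]

valid = valid_tokens_per_expert(recv_x, recv_count)
```

With real tensors the same idea would likely be a per-expert slice such as `recv_x[e, :recv_count[e]]`. Reading `recv_count` on the host requires exactly the CPU/GPU synchronization that the docstring says the kernel avoids, so this kind of filtering is better suited to debugging than to the hot path.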