Mirror of https://github.com/deepseek-ai/DeepEP, synced 2025-05-05 20:44:48 +00:00.
Fix the performance data.

commit 3b1045db43 (parent edbb1bc3ff)
@@ -17,11 +17,11 @@ We test normal kernels on H800 (~160 GB/s NVLink maximum bandwidth), with each c
 | Type | Dispatch #EP | Bottleneck bandwidth | Combine #EP | Bottleneck bandwidth |
 |:---------:|:------------:|:--------------------:|:-----------:|:--------------------:|
 | Intranode | 8 | 153 GB/s (NVLink) | 8 | 158 GB/s (NVLink) |
-| Internode | 16 | 47 GB/s (RDMA) | 16 | 62 GB/s (RDMA) |
-| Internode | 32 | 59 GB/s (RDMA) | 32 | 60 GB/s (RDMA) |
-| Internode | 64 | 49 GB/s (RDMA) | 64 | 51 GB/s (RDMA) |
+| Internode | 16 | 43 GB/s (RDMA) | 16 | 43 GB/s (RDMA) |
+| Internode | 32 | 58 GB/s (RDMA) | 32 | 57 GB/s (RDMA) |
+| Internode | 64 | 51 GB/s (RDMA) | 64 | 50 GB/s (RDMA) |
 
-**News (2025.04.22)**: the performance is optimized by 5-35% by Tencent Network Platform Department, see [#130](https://github.com/deepseek-ai/DeepEP/pull/130) for more details. Thanks for the contribution!
+**News (2025.04.22)**: with optimizations from Tencent Network Platform Department, performance was enhanced by up to 30%, see [#130](https://github.com/deepseek-ai/DeepEP/pull/130) for more details. Thanks for the contribution!
 
 ### Low-latency kernels with pure RDMA
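For context on the corrected figures above, here is a minimal sketch, not taken from DeepEP's test harness, of how a bottleneck-bandwidth number of this kind is conventionally computed: bytes moved over the limiting link (NVLink or RDMA) divided by the measured kernel time. The function name and the example values are hypothetical, chosen only to reproduce the order of magnitude of the 16-EP dispatch entry.

```python
# Hedged sketch (not DeepEP's measurement code): a "bottleneck bandwidth"
# figure is typically bytes actually moved over the slowest link divided
# by the measured kernel time.
def bottleneck_bandwidth_gbps(bytes_moved: int, elapsed_s: float) -> float:
    """Return achieved bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / elapsed_s / 1e9

# Hypothetical example: ~4.3 GB dispatched over RDMA in 100 ms -> ~43 GB/s,
# the same magnitude as the corrected 16-EP dispatch entry in the table.
print(f"{bottleneck_bandwidth_gbps(4_300_000_000, 0.100):.1f} GB/s")
```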