| fujianhao.fjh | 0f80da8458 | fix: not output result in some linux system | 2025-04-10 18:18:30 +08:00 |
| Chenggang Zhao | 42494864ba | Remove useless control metadata for low-latency combine | 2025-04-07 09:55:39 +08:00 |
| Chenggang Zhao | 26fa72d80f | Fix zero-copy mode tests | 2025-03-28 16:49:33 +08:00 |
| Chenggang Zhao | ae0eafd2be | Remove confusing comments | 2025-03-25 09:27:34 +08:00 |
| Chenggang Zhao | dcaf73e5ff | Support zero-copy for low-latency combine | 2025-03-18 15:41:50 +08:00 |
| Dmytro Dzhulgakov | b3b61ef5ef | Allow passing output tensor in low_latency_combine | 2025-03-10 22:19:21 +00:00 |
| Chenggang Zhao | ed7487c15e | Support BF16 for low-latency kernels | 2025-03-10 17:24:41 +08:00 |
| Chenggang Zhao | 458cdcb22a | Fix AR bugs for normal kernels | 2025-03-05 17:13:35 +08:00 |
| Chenggang Zhao | 1553fc42bf | Improve EP2/4 performance | 2025-03-04 15:34:33 +08:00 |
| Chenggang Zhao | c5b4040502 | Enable intranode kernel tests with EP2 and EP4 | 2025-03-03 15:01:02 +08:00 |
| Chenggang Zhao | ebfe47e46f | Initial commit | 2025-02-25 09:07:53 +08:00 |