Commit Graph

10 Commits

Author SHA1 Message Date
fzyzcjy
36b5c27993 Update buffer.py 2025-03-25 09:12:36 +08:00
Chenggang Zhao
dcaf73e5ff Support zero-copy for low-latency combine 2025-03-18 15:41:50 +08:00
Dmytro Dzhulgakov
50ac280ae7 comments 2025-03-13 00:42:08 +00:00
Dmytro Dzhulgakov
b3b61ef5ef Allow passing output tensor in low_latency_combine 2025-03-10 22:19:21 +00:00
Chenggang Zhao
ed7487c15e Support BF16 for low-latency kernels 2025-03-10 17:24:41 +08:00
Chenggang Zhao
458cdcb22a Fix AR bugs for normal kernels 2025-03-05 17:13:35 +08:00
Chenggang Zhao
1553fc42bf Improve EP2/4 performance 2025-03-04 15:34:33 +08:00
Chenggang Zhao
2a3cac903a Add some docs 2025-03-04 10:19:42 +08:00
Chenggang Zhao
3885404ffb Add NVSHMEM_IB_ENABLE_RELAXED_ORDERING 2025-02-26 17:54:12 +08:00
Chenggang Zhao
ebfe47e46f Initial commit 2025-02-25 09:07:53 +08:00