Commit Graph

7 Commits

Author          SHA1        Message                                                  Date
Chenggang Zhao  42494864ba  Remove useless control metadata for low-latency combine  2025-04-07 09:55:39 +08:00
Chenggang Zhao  66465476ae  Support zero-copy for low-latency combine                2025-03-18 15:44:26 +08:00
Chenggang Zhao  dcaf73e5ff  Support zero-copy for low-latency combine                2025-03-18 15:41:50 +08:00
Chenggang Zhao  ed7487c15e  Support BF16 for low-latency kernels                     2025-03-10 17:24:41 +08:00
Chenggang Zhao  1fc40d50f3  Improve AR performance                                   2025-03-06 21:41:19 +08:00
Chenggang Zhao  6cc3497df8  Remove all raw tensors for better P2P overlapping        2025-03-03 14:25:22 +08:00
Chenggang Zhao  ebfe47e46f  Initial commit                                           2025-02-25 09:07:53 +08:00