| Author | Commit | Message | Date |
|---|---|---|---|
| Chenggang Zhao | cbd92fd0fc | Update README | 2025-03-27 15:57:59 +08:00 |
| Chenggang Zhao | ffc39ba084 | Stronger acquire scope for low-latency kernels | 2025-03-27 09:30:36 +08:00 |
| Chenggang Zhao | 7d52ad7248 | Merge pull request #89 from fzyzcjy/patch-1: Super tiny fix typo | 2025-03-25 09:28:44 +08:00 |
| Chenggang Zhao | ae0eafd2be | Remove confusing comments | 2025-03-25 09:27:34 +08:00 |
| fzyzcjy | 36b5c27993 | Update buffer.py | 2025-03-25 09:12:36 +08:00 |
| Chenggang Zhao | c4b8ffc37c | Merge pull request #79 from deepseek-ai/zero-copy-combine: Support zero-copy for low-latency combine | 2025-03-18 15:46:45 +08:00 |
| Chenggang Zhao | 66465476ae | Support zero-copy for low-latency combine | 2025-03-18 15:44:26 +08:00 |
| Chenggang Zhao | dcaf73e5ff | Support zero-copy for low-latency combine | 2025-03-18 15:41:50 +08:00 |
| Chenggang Zhao | 82dcf48fd3 | Fix bugs for intranode EP kernels | 2025-03-14 16:09:23 +08:00 |
| Chenggang Zhao | 043fa5fa99 | Merge pull request #73 from deepseek-ai/p2p-signal: Low latency kernels use rdma atomic to support AR | 2025-03-14 11:55:17 +08:00 |
| Shangyan Zhou | 38cdaf390c | Fix style. | 2025-03-14 11:22:00 +08:00 |
| Shangyan Zhou | 2d0cf41dd1 | Low latency kernels use rdma atomic to support AR. | 2025-03-14 11:04:57 +08:00 |
| Chenggang Zhao | 7128ba3e39 | Merge pull request #66 from dzhulgakov/combine-out-arg: Allow passing output tensor in low_latency_combine | 2025-03-13 09:18:06 +08:00 |
| Dmytro Dzhulgakov | 50ac280ae7 | comments | 2025-03-13 00:42:08 +00:00 |
| Chenggang Zhao | 0008c6755e | Merge pull request #67 from deepseek-ai/roce-support: Update NVSHMEM to v3.2.5. | 2025-03-11 09:30:45 +08:00 |
| Dmytro Dzhulgakov | b3b61ef5ef | Allow passing output tensor in low_latency_combine | 2025-03-10 22:19:21 +00:00 |
| Chenggang Zhao | ed7487c15e | Support BF16 for low-latency kernels | 2025-03-10 17:24:41 +08:00 |
| Chenggang Zhao | 1fc40d50f3 | Improve AR performance | 2025-03-06 21:41:19 +08:00 |
| Chenggang Zhao | 41385ba5b3 | Merge pull request #45 from deepseek-ai/ar-support: Fix AR bugs for normal kernels | 2025-03-06 09:48:17 +08:00 |
| Chenggang Zhao | 458cdcb22a | Fix AR bugs for normal kernels | 2025-03-05 17:13:35 +08:00 |
| Shangyan Zhou | e995aa22db | Update NVSHMEM to v3.2.5. | 2025-03-05 16:16:52 +08:00 |
| Chenggang Zhao | 680e424bdc | Bugs fixed | 2025-03-05 14:27:45 +08:00 |
| Chenggang Zhao | 592296cd45 | Add some plans | 2025-03-04 15:54:46 +08:00 |
| Chenggang Zhao | 1553fc42bf | Improve EP2/4 performance | 2025-03-04 15:34:33 +08:00 |
| Chenggang Zhao | 55cdd9a64f | Fix typo | 2025-03-04 14:17:58 +08:00 |
| Chenggang Zhao | 2a3cac903a | Add some docs | 2025-03-04 10:19:42 +08:00 |
| Chenggang Zhao | c5b4040502 | Enable intranode kernel tests with EP2 and EP4 | 2025-03-03 15:01:02 +08:00 |
| Chenggang Zhao | 6cc3497df8 | Remove all raw tensors for better P2P overlapping | 2025-03-03 14:25:22 +08:00 |
| Chenggang Zhao | f60306409a | Merge pull request #32 from youkaichao/youkaichao-patch-1: Update path | 2025-03-03 09:19:28 +08:00 |
| youkaichao | 88b1622e7d | update path | 2025-02-28 17:26:14 +08:00 |
| Shangyan Zhou | 231e17ebb7 | Merge pull request #29 from youkaichao/youkaichao-patch-1: fix installation | 2025-02-28 17:21:47 +08:00 |
| youkaichao | 30e2778d18 | Update README.md | 2025-02-28 16:56:09 +08:00 |
| Chenggang Zhao | 77bb07aa20 | Update some comments and docs | 2025-02-27 10:27:22 +08:00 |
| Chenggang Zhao | 3885404ffb | Add NVSHMEM_IB_ENABLE_RELAXED_ORDERING | 2025-02-26 17:54:12 +08:00 |
| Chenggang Zhao | 45f481b87b | Update figures | 2025-02-26 16:24:59 +08:00 |
| haswelliris | 1a0a8bda09 | Update prerequisites installation instructions | 2025-02-25 17:19:07 +08:00 |
| Chenggang Zhao | 84d3d6fdee | Update README.md | 2025-02-25 10:59:09 +08:00 |
| Chenggang Zhao | ebfe47e46f | Initial commit | 2025-02-25 09:07:53 +08:00 |