Commit Graph

19 Commits

Author SHA1 Message Date
Chenggang Zhao
b8d90fb753 Support Ampere architecture (#204)
* Update README

* Update `setup.py`

* Fix headers

* Add `DISABLE_NVSHMEM` for APIs

* Fix launch

* Fix TMA settings

* Fix TMA usages

* Fix dlink

* Separate layout kernels

* Update version

* Add `is_sm90_compiled`

* Fix tests

* Add NVLink connection checks

* Update README

* Fix tests

* Add some comments

* Minor fix

* Minor fix

* Fix bugs
2025-06-11 15:48:18 +08:00
Chenggang Zhao
a8299ca7c2 Support CUDA graph for intranode normal kernels (#203) 2025-06-11 11:08:54 +08:00
Chenggang Zhao
d8dd185c68 Update README 2025-06-05 14:41:51 +08:00
Shangyan Zhou
de8cfca3cf Update readme. 2025-06-05 09:59:58 +08:00
Shangyan Zhou
1a0c8f6425 Add Infrawaves' fork to README. 2025-04-27 10:37:30 +08:00
Shangyan Zhou
3b1045db43 Fix the performance data. 2025-04-22 11:23:42 +08:00
Chenggang Zhao
edbb1bc3ff Several code lints 2025-04-22 10:52:10 +08:00
moningchen
e0eaaf94fb Add the performance data after internode optimization in the Readme file 2025-04-21 21:30:08 +08:00
Chenggang Zhao
e130cc6e7d Remove NVLink low-latency plan 2025-03-27 17:15:01 +08:00
Chenggang Zhao
cbd92fd0fc Update README 2025-03-27 15:57:59 +08:00
Chenggang Zhao
ed7487c15e Support BF16 for low-latency kernels 2025-03-10 17:24:41 +08:00
Chenggang Zhao
458cdcb22a Fix AR bugs for normal kernels 2025-03-05 17:13:35 +08:00
Chenggang Zhao
680e424bdc Bugs fixed 2025-03-05 14:27:45 +08:00
Chenggang Zhao
592296cd45 Add some plans 2025-03-04 15:54:46 +08:00
Chenggang Zhao
55cdd9a64f Fix typo 2025-03-04 14:17:58 +08:00
Chenggang Zhao
2a3cac903a Add some docs 2025-03-04 10:19:42 +08:00
Chenggang Zhao
77bb07aa20 Update some comments and docs 2025-02-27 10:27:22 +08:00
Chenggang Zhao
84d3d6fdee Update README.md 2025-02-25 10:59:09 +08:00
Chenggang Zhao
ebfe47e46f Initial commit 2025-02-25 09:07:53 +08:00