17 Commits

Author SHA1 Message Date
Zihua Wu
f6198492cb feat: drop support for CUDA<12.3
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-25 18:56:40 -07:00
Zihua Wu
46762b6903 feat: make API more general
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-23 02:34:23 -07:00
Gabriel Wu
2d8c4f22d5 fix: windows compat
Signed-off-by: Gabriel Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-23 14:47:15 +08:00
Zihua Wu
40c09fb883 feat: fix win compat
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-22 23:01:23 -07:00
Zihua Wu
a3210ac850 feat: save kernel name to file
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-22 22:23:47 -07:00
Zihua Wu
767793bf95 feat: compat for old drivers
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-22 20:57:44 -07:00
Zihua Wu
78c7fa347e fix: compiler version
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-23 00:06:18 +00:00
Zihua Wu
c14cad0c06 refactor: compile to .cubin and add NVRTC option
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-22 10:17:52 +00:00
Zihua Wu
27cd276e19 [wip] refactor: compile to .cubin
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-22 08:08:40 +00:00
Chenggang Zhao
37aa127451 Use swizzling instead of padding (#86)
* Add swizzling params
* Add TMA D descriptor
* Always use STSMx2
* Swizzling draft
* Compatible with padding
* Fix bugs
* Optimize swizzle performance
* Optimize expression
* Optimize TMA issues
* Fix README
* Stricter assertions
2025-04-14 15:20:58 +08:00
Chenggang Zhao
d14962f072 Add DG_NVCC_OVERRIDE_CPP_STANDARD
2025-04-03 15:53:29 +08:00
Chenggang Zhao
3a5539b7db Use c++20
2025-04-03 15:47:59 +08:00
Chenggang Zhao
6db7e1863b Solve STSM bank conflict via padding and 3D TMA
2025-04-03 15:39:35 +08:00
YLGH
b7db15ce94 Update nvcc flag c++20
Needed for fconcepts
2025-03-25 14:15:39 -07:00
Chenggang Zhao
7768319ffe Remove unaligned predicates
2025-03-25 16:32:40 +08:00
Chenggang Zhao
6e55da296f Fix python -O mode issues
2025-02-27 10:42:46 +08:00
Chenggang Zhao
a6d97a1c1b Initial commit
2025-02-25 22:52:41 +08:00