Zihua Wu | f6198492cb | feat: drop support for CUDA<12.3
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com> | 2025-04-25 18:56:40 -07:00 |

Zihua Wu | 46762b6903 | feat: make API more general
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com> | 2025-04-23 02:34:23 -07:00 |

Gabriel Wu | 2d8c4f22d5 | fix: windows compat
Signed-off-by: Gabriel Wu <13583761+lucifer1004@users.noreply.github.com> | 2025-04-23 14:47:15 +08:00 |

Zihua Wu | 40c09fb883 | feat: fix win compat
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com> | 2025-04-22 23:01:23 -07:00 |

Zihua Wu | a3210ac850 | feat: save kernel name to file
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com> | 2025-04-22 22:23:47 -07:00 |

Zihua Wu | 767793bf95 | feat: compat for old drivers
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com> | 2025-04-22 20:57:44 -07:00 |

Zihua Wu | 78c7fa347e | fix: compiler version
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com> | 2025-04-23 00:06:18 +00:00 |

Zihua Wu | c14cad0c06 | refactor: compile to .cubin and add NVRTC option
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com> | 2025-04-22 10:17:52 +00:00 |

Zihua Wu | 27cd276e19 | [wip] refactor: compile to .cubin
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com> | 2025-04-22 08:08:40 +00:00 |

Chenggang Zhao | 37aa127451 | Use swizzling instead of padding (#86)
* Add swizzling params
* Add TMA D descriptor
* Always use STSMx2
* Swizzling draft
* Compatible with padding
* Fix bugs
* Optimize swizzle performance
* Optimize expression
* Optimize TMA issues
* Fix README
* Stricter assertions | 2025-04-14 15:20:58 +08:00 |

Chenggang Zhao | d14962f072 | Add DG_NVCC_OVERRIDE_CPP_STANDARD | 2025-04-03 15:53:29 +08:00 |

Chenggang Zhao | 3a5539b7db | Use c++20 | 2025-04-03 15:47:59 +08:00 |

Chenggang Zhao | 6db7e1863b | Solve STSM bank conflict via padding and 3D TMA | 2025-04-03 15:39:35 +08:00 |

YLGH | b7db15ce94 | Update nvcc flag c++20
Needed for fconcepts | 2025-03-25 14:15:39 -07:00 |

Chenggang Zhao | 7768319ffe | Remove unaligned predicates | 2025-03-25 16:32:40 +08:00 |

Chenggang Zhao | 6e55da296f | Fix python -O mode issues | 2025-02-27 10:42:46 +08:00 |

Chenggang Zhao | a6d97a1c1b | Initial commit | 2025-02-25 22:52:41 +08:00 |