ljss
|
9c5dfab6d1
|
update to cutlass 3.9
|
2025-04-29 12:02:57 +08:00 |
|
ljss
|
01a27728e6
|
Fix synchronization issues
|
2025-04-28 18:53:04 +08:00 |
|
Shengyu Liu
|
c2067be3ea
|
Performance Update (2025.04.22) (#71)
* Fix benchmark script
* Performance optimization for compute-bound cases
* Add new testcase (s_k = 16384)
* Update README.md
* Update comment
* Update README.md
* Add the deep-dive blog
* Add background color for MLA Kernel Sched.drawio.svg
* Use relative path for the schedule image
* Move flash_mla.h to kernels/params.h
|
2025-04-22 17:50:57 +08:00 |
|
ljss
|
b31bfe72a8
|
add missing copyright
|
2025-03-01 18:24:24 +08:00 |
|
Sijia Chen
|
a3b74b8574
|
add flag to disable FP16 compile
|
2025-02-24 10:01:59 -08:00 |
|
Sijia Chen
|
65fb7732fc
|
support fp16
|
2025-02-24 01:58:53 -08:00 |
|
Sijia Chen
|
15a82b81b8
|
replace c10 optional with std optional
|
2025-02-24 00:25:40 -08:00 |
|
Jiashi Li
|
414a2f3eed
|
Initial commit
|
2025-02-24 09:20:23 +08:00 |
|