Shengyu Liu
|
a9444cd67d
|
Update README.md (#72)
|
2025-04-22 18:03:14 +08:00 |
|
Shengyu Liu
|
c2067be3ea
|
Performance Update (2025.04.22) (#71)
* Fix benchmark script
* Performance optimization for compute-bound cases
* Add new testcase (s_k = 16384)
* Update README.md
* Update comment
* Update README.md
* Add the deep-dive blog
* Add background color for MLA Kernel Sched.drawio.svg
* Use relative path for the schedule image
* Move flash_mla.h to kernels/params.h
|
2025-04-22 17:50:57 +08:00 |
|
Jiashi Li
|
3e123bc93c
|
add community support for [AMD]
|
2025-03-01 17:55:58 +08:00 |
|
hpp
|
1aef31d163
|
reformat Community Support section
|
2025-02-27 09:42:09 +08:00 |
|
hpp
|
77d9d8d21b
|
add Community Support of [Hygon DCU] [Intellifusion] [Iluvatar Corex]
|
2025-02-27 09:40:47 +08:00 |
|
hpp
|
4430e398d9
|
add Community Support of [Hygon DCU] [Intellifusion] [Iluvatar Corex]
|
2025-02-27 09:39:18 +08:00 |
|
Jiashi Li
|
480405ada9
|
fix readme
|
2025-02-26 20:32:39 +08:00 |
|
Jiashi Li
|
966eedc2f7
|
Fix readme
|
2025-02-26 20:30:45 +08:00 |
|
hpp
|
6492cabb28
|
add Community Support of [MetaX] and [Moore Threads]
|
2025-02-26 11:26:42 +08:00 |
|
ljss
|
4edea86f9e
|
cuda12.8 recommendation
|
2025-02-26 00:05:57 +08:00 |
|
Sijia Chen
|
65fb7732fc
|
support fp16
|
2025-02-24 01:58:53 -08:00 |
|
Jiashi Li
|
414a2f3eed
|
Initial commit
|
2025-02-24 09:20:23 +08:00 |
|