Commit Graph

31 Commits

Author SHA1 Message Date
Hollow Man
667cf59636
Minor fix to the docs to correct FlashAttention-3's paper link and typos
Thank you for open-sourcing FlashMLA! I just read the write-up and it is
amazing work! I found a few very minor typos, and the link to the
FlashAttention-3 paper is wrong because it points to the original
FlashAttention paper, so I am sending this PR. Thanks again!

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-04-22 16:46:15 +03:00
Shengyu Liu
a9444cd67d
Update README.md (#72) 2025-04-22 18:03:14 +08:00
Shengyu Liu
c2067be3ea
Performance Update (2025.04.22) (#71)
* Fix benchmark script

* Performance optimization for compute-bound cases

* Add new testcase (s_k = 16384)

* Update README.md

* Update comment

* Update README.md

* Add the deep-dive blog

* Add background color for MLA Kernel Sched.drawio.svg

* Use relative path for the schedule image

* Move flash_mla.h to kernels/params.h
2025-04-22 17:50:57 +08:00
ljss
b31bfe72a8 add missing copyright 2025-03-01 18:24:24 +08:00
Jiashi Li
3e123bc93c
add community support for [AMD] 2025-03-01 17:55:58 +08:00
hpp
1aef31d163 reformat Community Support section 2025-02-27 09:42:09 +08:00
hpp
77d9d8d21b add Community Support of [Hygon DCU] [Intellifusion] [Iluvatar Corex] 2025-02-27 09:40:47 +08:00
hpp
4430e398d9 add Community Support of [Hygon DCU] [Intellifusion] [Iluvatar Corex] 2025-02-27 09:39:18 +08:00
Jiashi Li
480405ada9
fix readme 2025-02-26 20:32:39 +08:00
Jiashi Li
966eedc2f7
Fix readme 2025-02-26 20:30:45 +08:00
Jiashi Li
01d6d40062
Merge pull request #45 from yangsijia-serena/main
fix(benchmark): store 'compare' and 'one' perf results in csv files and visualize them
2025-02-26 20:14:40 +08:00
hpp
6492cabb28 add Community Support of [MetaX] and [Moore Threads] 2025-02-26 11:26:42 +08:00
yangsijia.614
b67980309b fix(benchmark): store 'compare' and 'one' perf results in csv files and visualize them 2025-02-26 00:14:51 +08:00
ljss
4edea86f9e cuda12.8 recommendation 2025-02-26 00:05:57 +08:00
Jiashi Li
b549289fb4
Merge pull request #32 from sijiac/fp16-support
Support FP16 dtype in FlashMLA kernel
2025-02-25 09:19:42 +08:00
ljss
e1e9fa98f8 Style fix 2025-02-25 09:18:11 +08:00
Sijia Chen
a3b74b8574 add flag to disable FP16 compile 2025-02-24 10:01:59 -08:00
Jiashi Li
18e32770cc
Merge pull request #35 from KnowingNothing/main
feat: add benchmark for flash_infer vs flash_mla
2025-02-25 00:41:23 +08:00
Jiashi Li
7d69520ad4
Merge pull request #37 from chunyang-wen/Update-doc-string
Update docstring
2025-02-25 00:38:31 +08:00
zhengsize
922f63bdaa add gitignore for png and csv files in benchmark 2025-02-25 00:38:02 +08:00
chunyang.wen
c4c5912b05 Update docstring 2025-02-25 00:11:57 +08:00
zhengsize
4da4dbd303 feat: add benchmark for flash_infer vs flash_mla 2025-02-24 22:34:22 +08:00
Sijia Chen
65fb7732fc support fp16 2025-02-24 01:58:53 -08:00
Sijia Chen
15a82b81b8 replace c10 optional with std optional 2025-02-24 00:25:40 -08:00
Jiashi Li
bcb90f2afd
Merge pull request #9 from homorunner/main
support Windows build
2025-02-24 13:21:58 +08:00
Jiashi Li
dd1161e396
Merge pull request #14 from lancerts/minor-fix
minor fix test
2025-02-24 13:13:58 +08:00
lancerts
4fbaa9527c minor fix test 2025-02-23 20:12:49 -08:00
Jiashi Li
accc1695ee
Merge pull request #12 from sazczmh/main
tests: Triton 3.2.0 removed the fast_flush parameter from do_bench
2025-02-24 11:57:41 +08:00
程元
e62bdb4d3f support Windows build 2025-02-24 11:29:36 +08:00
sazc
051e40e82b tests: Triton removed the fast_flush parameter from do_bench (#4485) 2025-02-24 10:59:22 +08:00
Jiashi Li
414a2f3eed Initial commit
2025-02-24 09:20:23 +08:00