Hollow Man
667cf59636
Minor fix to the docs to correct FlashAttention-3's paper link and typos
...
Thank you for open-sourcing FlashMLA! I just read the write-up, and it is
amazing work! I found some very minor typos, and the link to the
FlashAttention-3 paper is wrong because it points to the original
FlashAttention paper, so I am sending this PR. Thanks again!
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-04-22 16:46:15 +03:00
Shengyu Liu
a9444cd67d
Update README.md (#72)
2025-04-22 18:03:14 +08:00
Shengyu Liu
c2067be3ea
Performance Update (2025.04.22) (#71)
...
* Fix benchmark script
* Performance optimization for compute-bound cases
* Add new testcase (s_k = 16384)
* Update README.md
* Update comment
* Update README.md
* Add the deep-dive blog
* Add background color for MLA Kernel Sched.drawio.svg
* Use relative path for the schedule image
* Move flash_mla.h to kernels/params.h
2025-04-22 17:50:57 +08:00
ljss
b31bfe72a8
add missing copyright
2025-03-01 18:24:24 +08:00
Jiashi Li
3e123bc93c
add community support for [AMD]
2025-03-01 17:55:58 +08:00
hpp
1aef31d163
reformat Community Support section
2025-02-27 09:42:09 +08:00
hpp
77d9d8d21b
add Community Support of [Hygon DCU] [Intellifusion] [Iluvatar Corex]
2025-02-27 09:40:47 +08:00
hpp
4430e398d9
add Community Support of [Hygon DCU] [Intellifusion] [Iluvatar Corex]
2025-02-27 09:39:18 +08:00
Jiashi Li
480405ada9
fix readme
2025-02-26 20:32:39 +08:00
Jiashi Li
966eedc2f7
Fix readme
2025-02-26 20:30:45 +08:00
Jiashi Li
01d6d40062
Merge pull request #45 from yangsijia-serena/main
...
fix(benchmark): store 'compare' and 'one' perf results in csv files and visualize them
2025-02-26 20:14:40 +08:00
hpp
6492cabb28
add Community Support of [MetaX] and [Moore Threads]
2025-02-26 11:26:42 +08:00
yangsijia.614
b67980309b
fix(benchmark): store 'compare' and 'one' perf results in csv files and visualize them
2025-02-26 00:14:51 +08:00
ljss
4edea86f9e
CUDA 12.8 recommendation
2025-02-26 00:05:57 +08:00
Jiashi Li
b549289fb4
Merge pull request #32 from sijiac/fp16-support
...
Support FP16 dtype in FlashMLA kernel
2025-02-25 09:19:42 +08:00
ljss
e1e9fa98f8
Style fix
2025-02-25 09:18:11 +08:00
Sijia Chen
a3b74b8574
add flag to disable FP16 compile
2025-02-24 10:01:59 -08:00
Jiashi Li
18e32770cc
Merge pull request #35 from KnowingNothing/main
...
feat: add benchmark for flash_infer vs flash_mla
2025-02-25 00:41:23 +08:00
Jiashi Li
7d69520ad4
Merge pull request #37 from chunyang-wen/Update-doc-string
...
Update docstring
2025-02-25 00:38:31 +08:00
zhengsize
922f63bdaa
add gitignore for png and csv files in benchmark
2025-02-25 00:38:02 +08:00
chunyang.wen
c4c5912b05
Update docstring
2025-02-25 00:11:57 +08:00
zhengsize
4da4dbd303
feat: add benchmark for flash_infer vs flash_mla
2025-02-24 22:34:22 +08:00
Sijia Chen
65fb7732fc
support fp16
2025-02-24 01:58:53 -08:00
Sijia Chen
15a82b81b8
replace c10 optional with std optional
2025-02-24 00:25:40 -08:00
Jiashi Li
bcb90f2afd
Merge pull request #9 from homorunner/main
...
support Windows build
2025-02-24 13:21:58 +08:00
Jiashi Li
dd1161e396
Merge pull request #14 from lancerts/minor-fix
...
minor fix test
2025-02-24 13:13:58 +08:00
lancerts
4fbaa9527c
minor fix test
2025-02-23 20:12:49 -08:00
Jiashi Li
accc1695ee
Merge pull request #12 from sazczmh/main
...
tests: Triton 3.2.0 removed the fast_flush parameter from do_bench
2025-02-24 11:57:41 +08:00
程元
e62bdb4d3f
support Windows build
2025-02-24 11:29:36 +08:00
sazc
051e40e82b
tests: Triton removed the fast_flush parameter from do_bench (#4485)
2025-02-24 10:59:22 +08:00
Jiashi Li
414a2f3eed
Initial commit
...
i
2025-02-24 09:20:23 +08:00