Commit Graph

53 Commits

Author SHA1 Message Date
chenhongmin.will
c7143a7bda Merge branch 'main' into will_fp8_mr 2025-02-28 22:15:46 +08:00
chenhongmin.will
8b939854d8 enable scale 2025-02-28 20:07:32 +08:00
chenhongmin.will
4e055a6142 reorg ut 2025-02-28 19:18:15 +08:00
chenhongmin.will
bfe38ab106 fix combine 2025-02-28 18:45:09 +08:00
chenhongmin.will
fd1e662deb fix mma0 2025-02-28 16:52:30 +08:00
chenhongmin.will
061af5fc56 use fa'3 transv 2025-02-28 14:54:44 +08:00
chenhongmin.will
0337732dc1 reorg 2025-02-28 08:09:02 +08:00
chenhongmin.will
1df91aff33 fix compile 2025-02-27 23:53:23 +08:00
chenhongmin.will
855c985b00 use 64x64 transpose_v 2025-02-27 22:45:00 +08:00
chenhongmin.will
d1689ab64f use mm1's Aregs instead of mma0's Cregs 2025-02-27 11:59:17 +08:00
hpp
1aef31d163 reformat Community Support section 2025-02-27 09:42:09 +08:00
hpp
77d9d8d21b add Community Support of [Hygon DCU] [Intellifusion] [Iluvatar Corex] 2025-02-27 09:40:47 +08:00
hpp
4430e398d9 add Community Support of [Hygon DCU] [Intellifusion] [Iluvatar Corex] 2025-02-27 09:39:18 +08:00
chenhongmin.will
1757a6db07 try fix 2025-02-27 09:11:17 +08:00
chenhongmin.will
dbd8c307eb fix sV 2025-02-27 01:42:58 +08:00
Jiashi Li
480405ada9 fix readme 2025-02-26 20:32:39 +08:00
Jiashi Li
966eedc2f7 Fix readme 2025-02-26 20:30:45 +08:00
Jiashi Li
01d6d40062 Merge pull request #45 from yangsijia-serena/main: fix(benchmark): store 'compare' and 'one' perf results in csv files and visualize them 2025-02-26 20:14:40 +08:00
chenhongmin.will
6dcea4952c add TransV 2025-02-26 18:48:24 +08:00
chenhongmin.will
6a4eb631e2 add transv barrier 2025-02-26 17:57:00 +08:00
chenhongmin.will
59f691763e fix Vt illegal 2025-02-26 17:39:29 +08:00
chenhongmin.will
29de9e0c79 debug mode 2025-02-26 16:03:17 +08:00
hpp
6492cabb28 add Community Support of [MetaX] and [Moore Threads] 2025-02-26 11:26:42 +08:00
chenhongmin.will
f6fab1b915 change to use per_tensor 2025-02-26 10:21:09 +08:00
chenhongmin.will
4b314cd655 update fp8 api 2025-02-26 08:33:25 +08:00
chenhongmin.will
ef644a56e0 update ut 2025-02-26 08:20:18 +08:00
chenhongmin.will
870418802a add fp8 ut 2025-02-26 07:57:51 +08:00
yangsijia.614
b67980309b fix(benchmark): store 'compare' and 'one' perf results in csv files and visualize them 2025-02-26 00:14:51 +08:00
ljss
4edea86f9e cuda12.8 recommendation 2025-02-26 00:05:57 +08:00
chenhongmin.will
dfe8ffc75a enable fp8 api 2025-02-25 23:02:57 +08:00
chenhongmin.will
c50d29d170 fix compile 2025-02-25 21:52:11 +08:00
chenhongmin.will
7409203f44 enable fp8 compile 2025-02-25 21:12:40 +08:00
chenhongmin.will
fed0499301 fp8 shared mem 2025-02-25 11:26:50 +08:00
chenhongmin.will
b67a18f850 update gmem 2025-02-25 09:45:19 +08:00
Jiashi Li
b549289fb4 Merge pull request #32 from sijiac/fp16-support: Support FP16 dtype in FlashMLA kenrel 2025-02-25 09:19:42 +08:00
ljss
e1e9fa98f8 Style fix 2025-02-25 09:18:11 +08:00
chenhongmin.will
d833dbd711 enable fp8 2025-02-25 09:03:02 +08:00
Sijia Chen
a3b74b8574 add flag to disable FP16 compile 2025-02-24 10:01:59 -08:00
Jiashi Li
18e32770cc Merge pull request #35 from KnowingNothing/main: feat: add benchmark for flash_infer vs flash_mla 2025-02-25 00:41:23 +08:00
Jiashi Li
7d69520ad4 Merge pull request #37 from chunyang-wen/Update-doc-string: Update docstring 2025-02-25 00:38:31 +08:00
zhengsize
922f63bdaa add gitignore for png and csv files in benchmark 2025-02-25 00:38:02 +08:00
chunyang.wen
c4c5912b05 Update docstring 2025-02-25 00:11:57 +08:00
zhengsize
4da4dbd303 feat: add benchmark for flash_infer vs flash_mla 2025-02-24 22:34:22 +08:00
chenhongmin.will
dae0690055 init fp8 2025-02-24 21:12:36 +08:00
Sijia Chen
65fb7732fc support fp16 2025-02-24 01:58:53 -08:00
Sijia Chen
15a82b81b8 replace c10 optional with std optional 2025-02-24 00:25:40 -08:00
Jiashi Li
bcb90f2afd Merge pull request #9 from homorunner/main: support Windows build 2025-02-24 13:21:58 +08:00
Jiashi Li
dd1161e396 Merge pull request #14 from lancerts/minor-fix: minor fix test 2025-02-24 13:13:58 +08:00
lancerts
4fbaa9527c minor fix test 2025-02-23 20:12:49 -08:00
Jiashi Li
accc1695ee Merge pull request #12 from sazczmh/main: tests: Triton 3.2.0 had remove the fast_flush parameter from do_bench 2025-02-24 11:57:41 +08:00