chenhongmin.will | 9887a5501e | update readme | 2025-02-28 22:29:31 +08:00
chenhongmin.will | c7143a7bda | Merge branch 'main' into will_fp8_mr | 2025-02-28 22:15:46 +08:00
chenhongmin.will | 8b939854d8 | enable scale | 2025-02-28 20:07:32 +08:00
chenhongmin.will | 4e055a6142 | reorg ut | 2025-02-28 19:18:15 +08:00
chenhongmin.will | bfe38ab106 | fix combine | 2025-02-28 18:45:09 +08:00
chenhongmin.will | fd1e662deb | fix mma0 | 2025-02-28 16:52:30 +08:00
chenhongmin.will | 061af5fc56 | use fa'3 transv | 2025-02-28 14:54:44 +08:00
chenhongmin.will | 0337732dc1 | reorg | 2025-02-28 08:09:02 +08:00
chenhongmin.will | 1df91aff33 | fix compile | 2025-02-27 23:53:23 +08:00
chenhongmin.will | 855c985b00 | use 64x64 transpose_v | 2025-02-27 22:45:00 +08:00
chenhongmin.will | d1689ab64f | use mm1's Aregs instead of mma0's Cregs | 2025-02-27 11:59:17 +08:00
hpp | 1aef31d163 | reformat Community Support section | 2025-02-27 09:42:09 +08:00
hpp | 77d9d8d21b | add Community Support of [Hygon DCU] [Intellifusion] [Iluvatar Corex] | 2025-02-27 09:40:47 +08:00
hpp | 4430e398d9 | add Community Support of [Hygon DCU] [Intellifusion] [Iluvatar Corex] | 2025-02-27 09:39:18 +08:00
chenhongmin.will | 1757a6db07 | try fix | 2025-02-27 09:11:17 +08:00
chenhongmin.will | dbd8c307eb | fix sV | 2025-02-27 01:42:58 +08:00
Jiashi Li | 480405ada9 | fix readme | 2025-02-26 20:32:39 +08:00
Jiashi Li | 966eedc2f7 | Fix readme | 2025-02-26 20:30:45 +08:00
Jiashi Li | 01d6d40062 | Merge pull request #45 from yangsijia-serena/main: fix(benchmark): store 'compare' and 'one' perf results in csv files and visualize them | 2025-02-26 20:14:40 +08:00
chenhongmin.will | 6dcea4952c | add TransV | 2025-02-26 18:48:24 +08:00
chenhongmin.will | 6a4eb631e2 | add transv barrier | 2025-02-26 17:57:00 +08:00
chenhongmin.will | 59f691763e | fix Vt illegal | 2025-02-26 17:39:29 +08:00
chenhongmin.will | 29de9e0c79 | debug mode | 2025-02-26 16:03:17 +08:00
hpp | 6492cabb28 | add Community Support of [MetaX] and [Moore Threads] | 2025-02-26 11:26:42 +08:00
chenhongmin.will | f6fab1b915 | change to use per_tensor | 2025-02-26 10:21:09 +08:00
chenhongmin.will | 4b314cd655 | update fp8 api | 2025-02-26 08:33:25 +08:00
chenhongmin.will | ef644a56e0 | update ut | 2025-02-26 08:20:18 +08:00
chenhongmin.will | 870418802a | add fp8 ut | 2025-02-26 07:57:51 +08:00
yangsijia.614 | b67980309b | fix(benchmark): store 'compare' and 'one' perf results in csv files and visualize them | 2025-02-26 00:14:51 +08:00
ljss | 4edea86f9e | cuda12.8 recommendation | 2025-02-26 00:05:57 +08:00
chenhongmin.will | dfe8ffc75a | enable fp8 api | 2025-02-25 23:02:57 +08:00
chenhongmin.will | c50d29d170 | fix compile | 2025-02-25 21:52:11 +08:00
chenhongmin.will | 7409203f44 | enable fp8 compile | 2025-02-25 21:12:40 +08:00
chenhongmin.will | fed0499301 | fp8 shared mem | 2025-02-25 11:26:50 +08:00
chenhongmin.will | b67a18f850 | update gmem | 2025-02-25 09:45:19 +08:00
Jiashi Li | b549289fb4 | Merge pull request #32 from sijiac/fp16-support: Support FP16 dtype in FlashMLA kernel | 2025-02-25 09:19:42 +08:00
ljss | e1e9fa98f8 | Style fix | 2025-02-25 09:18:11 +08:00
chenhongmin.will | d833dbd711 | enable fp8 | 2025-02-25 09:03:02 +08:00
Sijia Chen | a3b74b8574 | add flag to disable FP16 compile | 2025-02-24 10:01:59 -08:00
Jiashi Li | 18e32770cc | Merge pull request #35 from KnowingNothing/main: feat: add benchmark for flash_infer vs flash_mla | 2025-02-25 00:41:23 +08:00
Jiashi Li | 7d69520ad4 | Merge pull request #37 from chunyang-wen/Update-doc-string: Update docstring | 2025-02-25 00:38:31 +08:00
zhengsize | 922f63bdaa | add gitignore for png and csv files in benchmark | 2025-02-25 00:38:02 +08:00
chunyang.wen | c4c5912b05 | Update docstring | 2025-02-25 00:11:57 +08:00
zhengsize | 4da4dbd303 | feat: add benchmark for flash_infer vs flash_mla | 2025-02-24 22:34:22 +08:00
chenhongmin.will | dae0690055 | init fp8 | 2025-02-24 21:12:36 +08:00
Sijia Chen | 65fb7732fc | support fp16 | 2025-02-24 01:58:53 -08:00
Sijia Chen | 15a82b81b8 | replace c10 optional with std optional | 2025-02-24 00:25:40 -08:00
Jiashi Li | bcb90f2afd | Merge pull request #9 from homorunner/main: support Windows build | 2025-02-24 13:21:58 +08:00
Jiashi Li | dd1161e396 | Merge pull request #14 from lancerts/minor-fix: minor fix test | 2025-02-24 13:13:58 +08:00
lancerts | 4fbaa9527c | minor fix test | 2025-02-23 20:12:49 -08:00