cutlass@afa1772203 | Initial commit | 2025-02-24 09:20:23 +08:00
flash_api.cpp | change to use per_tensor | 2025-02-26 10:21:09 +08:00
flash_fwd_mla_bf16_sm90.cu | update gmem | 2025-02-25 09:45:19 +08:00
flash_fwd_mla_fp8_sm90.cu | update gmem | 2025-02-25 09:45:19 +08:00
flash_fwd_mla_kernel.h | fix compile | 2025-02-27 23:53:23 +08:00
flash_mla_utils.cu | fix compile | 2025-02-25 21:52:11 +08:00
flash_mla.h | update fp8 api | 2025-02-26 08:33:25 +08:00
fp8_transpose_v.h | fix compile | 2025-02-27 23:53:23 +08:00
named_barrier.h | add transv barrier | 2025-02-26 17:57:00 +08:00
softmax.h | Initial commit | 2025-02-24 09:20:23 +08:00
static_switch.h | Initial commit | 2025-02-24 09:20:23 +08:00
utils.h | use mm1's Aregs instead of mma0's Cregs | 2025-02-27 11:59:17 +08:00