FlashMLA/csrc
chenhongmin.will 1df91aff33 fix compile
2025-02-27 23:53:23 +08:00
cutlass@afa1772203          Initial commit                           2025-02-24 09:20:23 +08:00
flash_api.cpp               change to use per_tensor                 2025-02-26 10:21:09 +08:00
flash_fwd_mla_bf16_sm90.cu  update gmem                              2025-02-25 09:45:19 +08:00
flash_fwd_mla_fp8_sm90.cu   update gmem                              2025-02-25 09:45:19 +08:00
flash_fwd_mla_kernel.h      fix compile                              2025-02-27 23:53:23 +08:00
flash_mla_utils.cu          fix compile                              2025-02-25 21:52:11 +08:00
flash_mla.h                 update fp8 api                           2025-02-26 08:33:25 +08:00
fp8_transpose_v.h           fix compile                              2025-02-27 23:53:23 +08:00
named_barrier.h             add transv barrier                       2025-02-26 17:57:00 +08:00
softmax.h                   Initial commit                           2025-02-24 09:20:23 +08:00
static_switch.h             Initial commit                           2025-02-24 09:20:23 +08:00
utils.h                     use mm1's Aregs instead of mma0's Cregs  2025-02-27 11:59:17 +08:00