FlashMLA/csrc
IshanaSabrish 927eebc10f fix: Update named barrier thread count to match actual participating threads
- Changed kNThreads (256) to 128 in NamedBarrier::arrive calls to match the actual number of threads in warp group
- Fixed potential deadlock issue where barrier was waiting for more threads than would arrive
- Updated both SReady and SoftmaxReady barrier synchronizations
2025-03-01 21:18:05 +05:30
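The commit message above rests on a counting rule: a counted barrier releases only once the number of threads that reach it equals the count it was given, so a count larger than the number of participating threads never releases. The following is a minimal host-side sketch of that rule only. `ToyNamedBarrier` and `barrier_releases` are hypothetical names made up for illustration; they are not part of CUTLASS or FlashMLA, and the model does not launch threads or block.

```cpp
#include <cassert>

// Toy host-side model of a counted barrier (hypothetical type, not a CUTLASS
// or FlashMLA class). It only tracks arrival counts; it never blocks.
struct ToyNamedBarrier {
    int expected;     // thread count the barrier was configured with
    int arrived = 0;  // threads that have reached the barrier in this phase

    explicit ToyNamedBarrier(int expected_threads) : expected(expected_threads) {
        // Assumption taken from PTX: the count is a positive multiple of the
        // warp size (32 threads).
        assert(expected_threads > 0 && expected_threads % 32 == 0);
    }

    // Record n threads reaching the barrier, whether they arrive or wait.
    void reach(int n) { arrived += n; }

    // The phase is released once the arrival count reaches the expected count.
    bool completes() const { return arrived >= expected; }
};

// Returns true if a barrier configured for `expected` threads would release
// when exactly `participants` threads ever reach it.
inline bool barrier_releases(int expected, int participants) {
    ToyNamedBarrier b(expected);
    b.reach(participants);
    return b.completes();
}
```

Under this model, `barrier_releases(256, 128)` is false (the waiting threads would hang), while `barrier_releases(128, 128)` and `barrier_releases(256, 256)` are true. In PTX, the participant total for a barrier id covers both the threads that execute `bar.arrive` and the threads that execute `bar.sync`, so the right count depends on how many threads do either on that barrier.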
cutlass@afa1772203 Initial commit 2025-02-24 09:20:23 +08:00
flash_api.cpp add flag to disable FP16 compile 2025-02-24 10:01:59 -08:00
flash_fwd_mla_bf16_sm90.cu Initial commit 2025-02-24 09:20:23 +08:00
flash_fwd_mla_fp16_sm90.cu support fp16 2025-02-24 01:58:53 -08:00
flash_fwd_mla_kernel.h fix: Update named barrier thread count to match actual participating threads 2025-03-01 21:18:05 +05:30
flash_fwd_mla_metadata.cu support fp16 2025-02-24 01:58:53 -08:00
flash_mla.h Initial commit 2025-02-24 09:20:23 +08:00
named_barrier.h Initial commit 2025-02-24 09:20:23 +08:00
softmax.h Initial commit 2025-02-24 09:20:23 +08:00
static_switch.h Initial commit 2025-02-24 09:20:23 +08:00
utils.h Initial commit 2025-02-24 09:20:23 +08:00