Mirror of https://github.com/deepseek-ai/FlashMLA (synced 2025-06-26 18:15:54 +00:00)
- Changed kNThreads (256) to 128 in NamedBarrier::arrive calls to match the actual number of threads in warp group
- Fixed potential deadlock issue where barrier was waiting for more threads than would arrive
- Updated both SReady and SoftmaxReady barrier synchronizations
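
The deadlock described in the commit message comes down to a count mismatch: a barrier told to expect 256 arrivals never releases if only the 128 threads of one warp group reach it. A minimal host-side sketch of that arithmetic follows. `CountingBarrierModel` and `barrier_releases` are hypothetical names invented for illustration; this is a toy model, not the CUTLASS `NamedBarrier` API and not the kernel's actual code.

```cpp
#include <cassert>  // lets callers check the model with assert()

// Toy host-side model of a counting barrier (hypothetical, NOT the CUTLASS
// NamedBarrier API). Waiters may proceed only once `expected` arrivals have
// been recorded.
class CountingBarrierModel {
public:
    explicit CountingBarrierModel(int expected) : expected_(expected) {}

    // Record the arrival of one thread.
    void arrive() { ++arrived_; }

    // True once enough threads have arrived for waiters to proceed.
    bool released() const { return arrived_ >= expected_; }

private:
    int expected_;     // thread count the barrier was told to wait for
    int arrived_ = 0;  // arrivals recorded so far
};

// Returns whether a barrier expecting `expected` threads releases when only
// `arriving` threads ever reach it.
bool barrier_releases(int expected, int arriving) {
    CountingBarrierModel barrier(expected);
    for (int t = 0; t < arriving; ++t) {
        barrier.arrive();
    }
    return barrier.released();
}
```

In this model, `barrier_releases(256, 128)` is false (the waiters would hang), while `barrier_releases(128, 128)` is true, which is the situation the commit aims for by passing 128 instead of kNThreads.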
| Name |
|---|
| cutlass@afa1772203 |
| flash_api.cpp |
| flash_fwd_mla_bf16_sm90.cu |
| flash_fwd_mla_fp16_sm90.cu |
| flash_fwd_mla_kernel.h |
| flash_fwd_mla_metadata.cu |
| flash_mla.h |
| named_barrier.h |
| softmax.h |
| static_switch.h |
| utils.h |