Mirror of https://github.com/deepseek-ai/FlashMLA (synced 2025-06-26 18:15:54 +00:00)
- Changed the thread count passed to NamedBarrier::arrive from kNThreads (256) to 128, matching the actual number of threads in a warp group
- Fixed a potential deadlock in which the barrier waited for more threads than would ever arrive
- Updated both the SReady and SoftmaxReady barrier synchronizations
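
The failure mode described above can be illustrated with a simplified host-side model. This is only a sketch, not the CUTLASS NamedBarrier API: CountingBarrier and releases are hypothetical names, and the model only counts arrivals against an expected total, ignoring any threads that wait on the barrier.

```cpp
#include <cassert>

// Hypothetical host-side model of a counting barrier (assumption: not the
// real NamedBarrier). It releases once `expected` arrivals have been recorded.
struct CountingBarrier {
    int expected;     // arrivals required before the barrier releases
    int arrived = 0;  // arrivals recorded so far

    explicit CountingBarrier(int expected_threads) : expected(expected_threads) {
        assert(expected_threads > 0);
    }

    void arrive() { ++arrived; }
    bool released() const { return arrived >= expected; }
};

// Returns true if `arriving` threads are enough to release a barrier
// configured for `expected` threads.
bool releases(int expected, int arriving) {
    CountingBarrier barrier(expected);
    for (int i = 0; i < arriving; ++i) {
        barrier.arrive();
    }
    return barrier.released();
}
```

Under this model, releases(256, 128) is false, so a barrier configured for 256 threads stays blocked when only one 128-thread warp group arrives, while releases(128, 128) is true.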