Performance Update (2025.04.22) (#71)

* Fix benchmark script
* Performance optimization for compute-bound cases
* Add new testcase (s_k = 16384)
* Update README.md
* Update comment
* Update README.md
* Add the deep-dive blog
* Add background color for MLA Kernel Sched.drawio.svg
* Use relative path for the schedule image
* Move flash_mla.h to kernels/params.h
2025-06-26 18:15:54 +00:00 · 2025-04-22 17:50:57 +08:00
parent b31bfe72a8
commit c2067be3ea
25 changed files with 2757 additions and 1228 deletions
--- a/flash_mla/flash_mla_interface.py
+++ b/flash_mla/flash_mla_interface.py
@@ -55,7 +55,6 @@ def flash_mla_with_kvcache(
    out, softmax_lse = flash_mla_cuda.fwd_kvcache_mla(
        q,
        k_cache,
-        None,
        head_dim_v,
        cache_seqlens,
        block_table,