From c7996e951d37e311057904dfdc0396fc827a704c Mon Sep 17 00:00:00 2001
From: Shengyu Liu
Date: Tue, 22 Apr 2025 17:01:49 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 160f5e6..6de1640 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@ We're excited to announce the new release of Flash MLA, which delivers 5% ~ 15%
 
 Besides, we'd love to share the technical details behind the new kernel! Check out our deep-dive write-up here:
 
-The new kernel primarily targets compute-intensive settings (where the number of q heads $\times$ the number of q sequences per request (if MTP is disabled then it's 1) $\ge 64$). For memory-bound cases, we recommend using version [b31bfe7](https://github.com/deepseek-ai/FlashMLA/tree/b31bfe72a83ea205467b3271a5845440a03ed7cb) for optimal performance.
+The new kernel primarily targets compute-intensive settings (where the number of q heads $\times$ the number of q tokens per request (if MTP is disabled then it's 1) $\ge 64$). For memory-bound cases, we recommend using version [b31bfe7](https://github.com/deepseek-ai/FlashMLA/tree/b31bfe72a83ea205467b3271a5845440a03ed7cb) for optimal performance.
 
 ## Introduction
 