Update README.md

Shengyu Liu 2025-04-22 18:01:09 +08:00 committed by GitHub
parent c2067be3ea
commit 60438c1ee4

@@ -4,7 +4,7 @@
 We're excited to announce the new release of Flash MLA, which delivers 5% ~ 15% performance improvement on compute-bound workloads, achieving up to 660 TFlops on NVIDIA H800 SXM5 GPUs. The interface of the new version is fully compatible with the old one. Just switch to the new version and enjoy the instant speedup! 🚀🚀🚀
-Besides, we'd love to share the technical details behind the new kernel! Check out our deep-dive write-up here: <LINK>
+Besides, we'd love to share the technical details behind the new kernel! Check out our deep-dive write-up [here](docs/20250422-new-kernel-deep-dive.md).
 The new kernel primarily targets compute-intensive settings (where the number of q heads $\times$ the number of q tokens per request (if MTP is disabled then it's 1) $\ge 64$). For memory-bound cases, we recommend using version [b31bfe7](https://github.com/deepseek-ai/FlashMLA/tree/b31bfe72a83ea205467b3271a5845440a03ed7cb) for optimal performance.
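As an illustration of that rule of thumb, here is a minimal sketch of the kernel-selection heuristic. The function and parameter names are hypothetical helpers for this example, not part of the FlashMLA API:

```python
# Illustrative heuristic only -- names below are hypothetical, not FlashMLA API.
def is_compute_intensive(num_q_heads: int, num_q_tokens_per_request: int = 1) -> bool:
    """Return True when the workload falls in the compute-intensive regime
    targeted by the new kernel: q_heads * q_tokens_per_request >= 64.
    num_q_tokens_per_request is 1 when MTP (multi-token prediction) is disabled.
    """
    return num_q_heads * num_q_tokens_per_request >= 64


# 128 q heads, MTP disabled -> compute-bound: use the new kernel.
assert is_compute_intensive(128)
# 32 q heads, MTP disabled -> memory-bound: commit b31bfe7 may perform better.
assert not is_compute_intensive(32)
```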