FlashMLA/csrc/kernels
Shengyu Liu c2067be3ea
Performance Update (2025.04.22) (#71)
* Fix benchmark script

* Performance optimization for compute-bound cases

* Add new testcase (s_k = 16384)

* Update README.md

* Update comment

* Update README.md

* Add the deep-dive blog

* Add background color for MLA Kernel Sched.drawio.svg

* Use relative path for the schedule image

* Move flash_mla.h to kernels/params.h
2025-04-22 17:50:57 +08:00
File                 Last commit                            Date
config.h             Performance Update (2025.04.22) (#71)  2025-04-22 17:50:57 +08:00
get_mla_metadata.cu  Performance Update (2025.04.22) (#71)  2025-04-22 17:50:57 +08:00
get_mla_metadata.h   Performance Update (2025.04.22) (#71)  2025-04-22 17:50:57 +08:00
mla_combine.cu       Performance Update (2025.04.22) (#71)  2025-04-22 17:50:57 +08:00
mla_combine.h        Performance Update (2025.04.22) (#71)  2025-04-22 17:50:57 +08:00
params.h             Performance Update (2025.04.22) (#71)  2025-04-22 17:50:57 +08:00
splitkv_mla.cu       Performance Update (2025.04.22) (#71)  2025-04-22 17:50:57 +08:00
splitkv_mla.h        Performance Update (2025.04.22) (#71)  2025-04-22 17:50:57 +08:00
traits.h             Performance Update (2025.04.22) (#71)  2025-04-22 17:50:57 +08:00
utils.h              Performance Update (2025.04.22) (#71)  2025-04-22 17:50:57 +08:00