Shengyu Liu
|
c2067be3ea
|
Performance Update (2025.04.22) (#71)
* Fix benchmark script
* Performance optimization for compute-bound cases
* Add new testcase (s_k = 16384)
* Update README.md
* Update comment
* Update README.md
* Add the deep-dive blog
* Add background color for MLA Kernel Sched.drawio.svg
* Use relative path for the schedule image
* Move flash_mla.h to kernels/params.h
|
2025-04-22 17:50:57 +08:00