FlashMLA/mla_combine.h at main

DeepSeek/FlashMLA

mirror of https://github.com/deepseek-ai/FlashMLA synced 2025-05-14 16:45:59 +00:00

Shengyu Liu c2067be3ea

Performance Update (2025.04.22) (#71)

* Fix benchmark script

* Performance optimization for compute-bound cases

* Add new testcase (s_k = 16384)

* Update README.md

* Update comment

* Update README.md

* Add the deep-dive blog

* Add background color for MLA Kernel Sched.drawio.svg

* Use relative path for the schedule image

* Move flash_mla.h to kernels/params.h

2025-04-22 17:50:57 +08:00

7 lines

149 B

C++


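#pragma once
#include "params.h"

// Launcher for the FlashMLA combine kernel, templated on the element type.
// Declaration only; the definition is provided elsewhere.
template<typename ElementT>
void run_flash_mla_combine_kernel(Flash_fwd_mla_params &params, cudaStream_t stream);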
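
The header only declares the template, so a definition and explicit instantiations have to exist in some other translation unit. The sketch below shows one way that pattern might look. It is not FlashMLA's actual implementation: the `Flash_fwd_mla_params` struct and `cudaStream_t` alias here are hypothetical stand-ins so the example compiles without CUDA or `params.h`, the `float` instantiation is an assumed element type, and `demo_combine_calls` is an illustrative helper.

```cpp
#include <cassert>

// Hypothetical stand-ins (assumptions) so this compiles without CUDA or params.h.
struct Flash_fwd_mla_params { int launches = 0; };  // the real struct lives in params.h
using cudaStream_t = void*;                          // the real type comes from the CUDA runtime

// Same shape as the declaration in mla_combine.h.
template<typename ElementT>
void run_flash_mla_combine_kernel(Flash_fwd_mla_params &params, cudaStream_t stream);

// Definition, which would normally sit in a separate source file.
// The body is a placeholder: it only counts calls instead of launching a kernel.
template<typename ElementT>
void run_flash_mla_combine_kernel(Flash_fwd_mla_params &params, cudaStream_t stream) {
    (void)stream;          // unused in this sketch
    params.launches += 1;  // stands in for the actual kernel launch
}

// Explicit instantiation for one element type (float is an assumed example).
template void run_flash_mla_combine_kernel<float>(Flash_fwd_mla_params &, cudaStream_t);

// Illustrative caller: invokes the launcher once and reports the call count.
int demo_combine_calls() {
    Flash_fwd_mla_params params;
    run_flash_mla_combine_kernel<float>(params, nullptr);
    return params.launches;
}
```

Keeping only the declaration in the header means callers do not need to see the kernel body, at the cost of requiring one explicit instantiation per supported element type.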