mirror of
https://github.com/deepseek-ai/FlashMLA
synced 2025-05-14 16:45:59 +00:00
* Fix benchmark script
* Performance optimization for compute-bound cases
* Add new testcase (s_k = 16384)
* Update README.md
* Update comment
* Update README.md
* Add the deep-dive blog
* Add background color for MLA Kernel Sched.drawio.svg
* Use relative path for the schedule image
* Move flash_mla.h to kernels/params.h
#pragma once
#include "params.h"
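// Launches the MLA combine kernel on `stream`, templated on the element type.
// Declaration only; the definition is provided in a separate translation unit.
template<typename ElementT>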
void run_flash_mla_combine_kernel(Flash_fwd_mla_params &params, cudaStream_t stream);