support fp16

Sijia Chen
2025-02-24 01:58:53 -08:00
parent 15a82b81b8
commit 65fb7732fc
7 changed files with 139 additions and 91 deletions

@@ -3,7 +3,7 @@
FlashMLA is an efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences serving.
Currently released:
-- BF16
+- BF16, FP16
- Paged kvcache with block size of 64
## Quick start