Commit Graph

6 Commits

Author SHA1 Message Date
zihan zhou
d626421fff Fix the fma bug
Hi, I found an issue in scale_apply_exp2. The code comments also mention it:
https://github.com/pytorch/pytorch/issues/121558

The issue is that the ffma instruction produces results in flash attention
that differ from a separated fmul followed by fadd.

For separate fmul and fadd, the value at the row maximum (x_i = max(x)) is computed as:
round_fp32(x_i * scale) - round_fp32(x_i * scale)
which is exactly 0.

But for ffma, the product is not rounded to fp32 before the add, so the same value is computed as:
x_i * scale - round_fp32(x_i * scale)
which is the fp32 rounding error of x_i * scale and is generally nonzero.
Although ffma is actually more accurate, its results differ from the separated
computation, so the values contain errors relative to the expected ones.
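
The effect can be emulated in PyTorch (a minimal sketch; the 133120.0 value and
the 1/sqrt(128) scale are illustrative assumptions, and float64 stands in for
the unrounded ffma product):

import torch

x = torch.tensor(133120.0, dtype=torch.float32)   # element equal to the row maximum
scale = 1.0 / torch.tensor(128.0).sqrt()          # hypothetical softmax scale

# Separated fmul + fadd: the product is rounded to fp32 before the subtraction,
# so both terms are identical and the difference is exactly 0.
separated = (x * scale) - (x * scale)

# ffma keeps x * scale unrounded before the add; emulate that with float64.
# The result is the fp32 rounding error of x * scale, generally nonzero.
fused = x.double() * scale.double() - (x * scale).double()

print(separated.item(), fused.item())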

We can reproduce the issue by changing the initialization values of q and k,
after which the final outputs are all 0:

q = torch.full((b, s_q, h_q, d), 133120.0)
blocked_k = torch.full((block_table.numel(), block_size, h_kv, d), 133120.0)

If we define UNFUSE_FMA, the problem is alleviated, but the kernel still
cannot pass the cal-diff test. I am not sure whether the remaining failure is
an accuracy issue, but I think it is necessary to fix the fma bug first.
2025-03-19 11:11:05 +08:00
ljss
4edea86f9e CUDA 12.8 recommendation 2025-02-26 00:05:57 +08:00
Sijia Chen
a3b74b8574 add flag to disable FP16 compile 2025-02-24 10:01:59 -08:00
Sijia Chen
65fb7732fc support fp16 2025-02-24 01:58:53 -08:00
程元
e62bdb4d3f support Windows build 2025-02-24 11:29:36 +08:00
Jiashi Li
414a2f3eed Initial commit
2025-02-24 09:20:23 +08:00