DeepGEMM

mirror of https://github.com/deepseek-ai/DeepGEMM synced 2025-04-04 19:52:35 +00:00

Author	SHA1	Message	Date
ademeure	6cbff5778f	Correctly flush L2, as reconstructing the tensors on every iteration effectively put them in the L2, and gave the GPU enough idle time to avoid thermal throttling in a potentially unrealistic way. The previous behaviour is potentially representative of some use cases (e.g. previous kernel filling L2 with the data in a very specific way) but not standard benchmarking practice.	2025-03-15 20:46:24 +00:00
Chenggang Zhao	39c10e6c31	Revert "Merge pull request #49 from A-transformer/maximum_fp8_e4m3_value" This reverts commit `4d4f2342fe`, reversing changes made to `9d3222a93e`.	2025-03-10 09:47:02 +08:00
A-transformer	629857685e	Maximum representable value in FP8 E4M3 format. Replace hardcoded 448.0 with global constant FP8_E4M3_MAX for FP8 E4M3 format.	2025-03-07 19:58:02 +04:00
AcraeaTerpsicore	96b31fd6bb	fix typo	2025-02-26 18:37:22 +08:00
xuzhean	bc989405fe	fix: prevent expected_m from exceeding m in test_core	2025-02-26 16:55:47 +08:00
Chenggang Zhao	a6d97a1c1b	Initial commit	2025-02-25 22:52:41 +08:00
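
For context on the 448.0 value named in commit 629857685e, here is a minimal sketch of where that number comes from. It assumes the "FN" variant of E4M3 (4 exponent bits, 3 mantissa bits, bias 7, no infinities, one NaN mantissa pattern per sign). The helpers `decode_e4m3fn` and `per_tensor_scale` are hypothetical names used only for illustration. They are not taken from the repository, and since that commit was later reverted, this says nothing about how the code base itself defines or uses the constant.

```python
def decode_e4m3fn(byte: int) -> float:
    """Decode one FP8 E4M3 byte (assumed FN variant) into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF   # 4 exponent bits
    mantissa = byte & 0x7          # 3 mantissa bits
    if exponent == 0xF and mantissa == 0x7:
        # The all-ones pattern is NaN; this variant has no infinities.
        return float("nan")
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * (mantissa / 8.0) * 2.0 ** (1 - 7)
    # Normal: implicit leading 1, exponent bias of 7.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# Largest finite value over all 256 bit patterns (v == v filters out NaN).
FP8_E4M3_MAX = max(v for v in map(decode_e4m3fn, range(256)) if v == v)


def per_tensor_scale(values):
    """Hypothetical helper: scale that maps the largest magnitude onto FP8_E4M3_MAX."""
    amax = max(abs(v) for v in values)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0
```

The maximum is reached at bit pattern 0x7E (exponent 15, mantissa 6), giving 1.75 * 2^8. A named constant derived this way documents the origin of the number, which a bare literal does not.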