Commit Graph

6 Commits

Author SHA1 Message Date
ademeure
6cbff5778f Correctly flush L2, as reconstructing the tensors on every iteration effectively put them in the L2, and gave the GPU enough idle time to avoid thermal throttling in a potentially unrealistic way.
The previous behaviour is potentially representative of some use cases (e.g. previous kernel filling L2 with the data in a very specific way) but not standard benchmarking practice.
2025-03-15 20:46:24 +00:00
Chenggang Zhao
39c10e6c31 Revert "Merge pull request #49 from A-transformer/maximum_fp8_e4m3_value"
This reverts commit 4d4f2342fe, reversing
changes made to 9d3222a93e.
2025-03-10 09:47:02 +08:00
A-transformer
629857685e Maximum representable value in FP8 E4M3 format
Replace Hardcoded 448.0 with Global Constant FP8_E4M3_MAX for FP8 E4M3 Format
2025-03-07 19:58:02 +04:00
AcraeaTerpsicore
96b31fd6bb fix typo 2025-02-26 18:37:22 +08:00
xuzhean
bc989405fe fix: prevent expected_m from exceeding m in test_core 2025-02-26 16:55:47 +08:00
Chenggang Zhao
a6d97a1c1b Initial commit 2025-02-25 22:52:41 +08:00