DeepGEMM/deep_gemm
Latest commit: 6cbff5778f by ademeure, 2025-03-15 20:46:24 +00:00
Correctly flush L2, as reconstructing the tensors on every iteration effectively put them in the L2, and gave the GPU enough idle time to avoid thermal throttling in a potentially unrealistic way.
The previous behaviour is potentially representative of some use cases (e.g. previous kernel filling L2 with the data in a very specific way) but not standard benchmarking practice.
Name | Last commit | Date
include/deep_gemm | Add some notes for promotion | 2025-03-04 11:42:20 +08:00
jit | Code format | 2025-03-11 13:26:10 +08:00
jit_kernels | Merge pull request #65 from Z-NAVY/main | 2025-03-14 13:50:08 +08:00
__init__.py | fix typo | 2025-02-26 18:37:22 +08:00
utils.py | Correctly flush L2, as reconstructing the tensors on every iteration effectively put them in the L2, and gave the GPU enough idle time to avoid thermal throttling in a potentially unrealistic way. | 2025-03-15 20:46:24 +00:00
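
The latest commit message describes a benchmarking pattern: build the inputs once, then flush the cache before every timed run, instead of rebuilding the tensors each iteration. The following is a minimal, framework-agnostic sketch of that pattern, not the code in utils.py. The names `bench_with_flush`, `flush_cache`, and `workload` are hypothetical, and the 64 MiB scratch buffer is an assumed size that may not exceed the cache on every machine. On a GPU, the analogous flush would overwrite a device buffer larger than L2, and timing would normally use device-side events rather than a host clock.

```python
import time
from typing import Callable, List


def bench_with_flush(fn: Callable[[], None],
                     flush: Callable[[], None],
                     num_iters: int = 10) -> List[float]:
    """Time `fn` num_iters times, calling `flush` before each timed run.

    Hypothetical helper for illustration only. Inputs to `fn` should be
    built once by the caller, outside this loop, so that constructing them
    neither warms the cache nor adds idle time between runs.
    """
    timings = []
    for _ in range(num_iters):
        flush()                       # evict cached data; excluded from the timing
        start = time.perf_counter()
        fn()                          # only the workload itself is timed
        timings.append(time.perf_counter() - start)
    return timings


# CPU stand-in for a cache flush: overwrite a buffer assumed to be larger
# than the cache (64 MiB is an assumption, not a measured value).
_scratch = bytearray(64 * 1024 * 1024)


def flush_cache() -> None:
    _scratch[:] = bytes(len(_scratch))


# Input data is created once and reused by every iteration.
data = list(range(100_000))


def workload() -> None:
    sum(data)


if __name__ == "__main__":
    times = bench_with_flush(workload, flush_cache, num_iters=5)
    print(f"mean: {sum(times) / len(times) * 1e3:.3f} ms")
```

Keeping the flush outside the timed region means each measurement starts from a cold cache without the flush cost being counted as part of the workload.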