DeepGEMM

mirror of https://github.com/deepseek-ai/DeepGEMM synced 2025-04-03 13:10:43 +00:00

Latest commit: ademeure, 6cbff5778f, 2025-03-15 20:46:24 +00:00
Correctly flush L2, as reconstructing the tensors on every iteration effectively put them in the L2, and gave the GPU enough idle time to avoid thermal throttling in a potentially unrealistic way. The previous behaviour is potentially representative of some use cases (e.g. previous kernel filling L2 with the data in a very specific way) but not standard benchmarking practice.
test_core.py	Correctly flush L2, as reconstructing the tensors on every iteration effectively put them in the L2, and gave the GPU enough idle time to avoid thermal throttling in a potentially unrealistic way.	2025-03-15 20:46:24 +00:00
test_jit.py	Initial commit	2025-02-25 22:52:41 +08:00
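
The benchmarking pattern described in the commit message for test_core.py (build the inputs once, then evict them from cache before every timed run) can be sketched roughly as follows. This is a minimal, CPU-only illustration and not the repository's code: `bench` and `hypothetical_flush` are names invented here, and a real GPU benchmark would time with CUDA events and synchronization rather than `time.perf_counter`.

```python
import time
from typing import Callable, List


def bench(fn: Callable[[], None], flush: Callable[[], None],
          num_iters: int = 10) -> List[float]:
    """Time `fn` num_iters times, calling `flush` before each timed run."""
    timings = []
    for _ in range(num_iters):
        flush()  # evict cached inputs; deliberately outside the timed region
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Inputs are constructed once, outside the loop, so that rebuilding them
# neither warms the cache nor gives the device idle time between runs.
data = list(range(1000))
flush_calls = []


def hypothetical_flush() -> None:
    # Stand-in for a real L2 flush. On a GPU this might overwrite a buffer
    # larger than the L2 cache, for example (an assumption, not repo code):
    #   torch.empty(int(256e6), dtype=torch.int8, device="cuda").zero_()
    flush_calls.append(1)


timings = bench(lambda: sum(data), hypothetical_flush, num_iters=5)
```

Passing the flush step in as a callable keeps the harness independent of any particular device. The key design point is only that the flush runs before every iteration and is excluded from the measured interval.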