DeepGEMM/deep_gemm
Zihua Wu 78c7fa347e fix: compiler version
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-23 00:06:18 +00:00
include/deep_gemm   refactor: compile to .cubin and add NVRTC option   2025-04-22 10:17:52 +00:00
jit                 fix: compiler version   2025-04-23 00:06:18 +00:00
jit_kernels         [wip] refactor: compile to .cubin   2025-04-22 08:08:40 +00:00
__init__.py         fix typo   2025-02-26 18:37:22 +08:00
utils.py            Correctly flush L2, as reconstructing the tensors on every iteration effectively put them in the L2, and gave the GPU enough idle time to avoid thermal throttling in a potentially unrealistic way.   2025-03-15 20:46:24 +00:00
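
The commit message on utils.py describes a benchmarking pitfall: rebuilding inputs on every iteration leaves them cache-resident and adds idle time between runs. A minimal sketch of the alternative pattern follows: build inputs once, then evict the cache immediately before each timed call. This is not the code in utils.py. The names `make_cache_flusher` and `bench` are hypothetical, and the flush here overwrites a host-side scratch buffer as a CPU stand-in. On a GPU the analogous step would presumably be zeroing a device buffer larger than the L2 cache.

```python
import time
from typing import Callable, List


def make_cache_flusher(num_bytes: int = 8 * 1024 * 1024) -> Callable[[], None]:
    """Return a callable that overwrites a large scratch buffer.

    The scratch buffer is allocated once. Each call rewrites every byte,
    so data cached by an earlier iteration is likely to be evicted.
    (Hypothetical helper; a CPU analogue of flushing a GPU L2 cache.)
    """
    scratch = bytearray(num_bytes)

    def flush() -> None:
        # Overwrite the whole buffer in place with zeros.
        scratch[:] = bytes(len(scratch))

    return flush


def bench(fn: Callable[[], object],
          flush: Callable[[], None],
          num_iters: int = 10) -> List[float]:
    """Time `fn` num_iters times, flushing the cache before each run.

    The flush happens outside the timed region, and the inputs used by
    `fn` are expected to be created once by the caller, not per iteration.
    """
    timings = []
    for _ in range(num_iters):
        flush()                          # evict cached inputs first
        start = time.perf_counter()      # timed region starts here
        fn()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    data = list(range(100_000))          # built once, reused every iteration
    times = bench(lambda: sum(data), make_cache_flusher())
    median = sorted(times)[len(times) // 2]
    print(f"median: {median * 1e6:.1f} us")
```

Passing the flush step in as a callable keeps the timing loop independent of the device. A GPU variant would swap in a device-side flush and device-side timers without changing the loop structure.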