DeepGEMM/deep_gemm
Zihua Wu 27cd276e19 [wip] refactor: compile to .cubin
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-22 08:08:40 +00:00
include/deep_gemm [wip] refactor: compile to .cubin 2025-04-22 08:08:40 +00:00
jit [wip] refactor: compile to .cubin 2025-04-22 08:08:40 +00:00
jit_kernels [wip] refactor: compile to .cubin 2025-04-22 08:08:40 +00:00
__init__.py fix typo 2025-02-26 18:37:22 +08:00
utils.py Correctly flush L2, as reconstructing the tensors on every iteration effectively put them in the L2, and gave the GPU enough idle time to avoid thermal throttling in a potentially unrealistic way. 2025-03-15 20:46:24 +00:00
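The utils.py commit message describes a benchmarking pitfall. Rebuilding the input tensors on every iteration leaves them resident in L2. It also gives the GPU idle time between runs. Both effects can make timings look better than they would be in practice. Below is a minimal, device-agnostic sketch of the corrected pattern, assuming a hypothetical `bench_with_flush` helper that is not DeepGEMM's actual API. Inputs are built once outside the loop, the cache is flushed before each run, and only the workload itself is timed. On a real GPU, `fn` runs asynchronously, so the timer would need to synchronize or use CUDA events. `flush_l2` might, for example, overwrite a scratch buffer larger than the L2 cache. Both of these are assumptions left to the caller here.

```python
import time
from typing import Callable, List


def bench_with_flush(fn: Callable[[], None],
                     flush_l2: Callable[[], None],
                     num_iters: int = 10) -> List[float]:
    """Hypothetical helper: time fn() num_iters times, flushing the cache first.

    Inputs are assumed to be constructed once by the caller, outside this
    function, so rebuilding them cannot warm the cache or add idle gaps.
    """
    timings: List[float] = []
    for _ in range(num_iters):
        flush_l2()                          # evict stale data; not timed
        start = time.perf_counter()
        fn()                                # only the workload is timed
        timings.append(time.perf_counter() - start)
    return timings
```

The flush stays outside the timed region, so it affects cache state but not the measurement. Because it is real work rather than idle time, it also avoids handing the device extra cool-down time between iterations.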