DeepGEMM

mirror of https://github.com/deepseek-ai/DeepGEMM synced 2025-06-26 23:15:49 +00:00


Zhean Xu 4499c4ccbb Refactor MMA template with CUTLASS (#87): refactor MMA with CUTLASS; update README.md. Co-authored-by: Zhean Xu <xza@deepseek.com>	2025-04-14 17:06:49 +08:00
include/deep_gemm	Refactor MMA template with CUTLASS (#87)	2025-04-14 17:06:49 +08:00
jit	Use swizzling instead of padding (#86)	2025-04-14 15:20:58 +08:00
jit_kernels	Use swizzling instead of padding (#86)	2025-04-14 15:20:58 +08:00
__init__.py	fix typo	2025-02-26 18:37:22 +08:00
utils.py	Correctly flush L2: reconstructing the tensors on every iteration effectively put them in L2 and gave the GPU enough idle time to avoid thermal throttling in a potentially unrealistic way.	2025-03-15 20:46:24 +00:00