DeepGEMM

mirror of https://github.com/deepseek-ai/DeepGEMM synced 2025-05-05 22:24:22 +00:00

Chenggang Zhao 37aa127451 Use swizzling instead of padding (#86)	2025-04-14 15:20:58 +08:00
* Add swizzling params
* Add TMA D descriptor
* Always use STSMx2
* Swizzling draft
* Compatible with padding
* Fix bugs
* Optimize swizzle performance
* Optimize expression
* Optimize TMA issues
* Fix README
* Stricter assertions
include/deep_gemm	Use swizzling instead of padding (#86)	2025-04-14 15:20:58 +08:00
jit	Use swizzling instead of padding (#86)	2025-04-14 15:20:58 +08:00
jit_kernels	Use swizzling instead of padding (#86)	2025-04-14 15:20:58 +08:00
__init__.py	fix typo	2025-02-26 18:37:22 +08:00
utils.py	Correctly flush L2, as reconstructing the tensors on every iteration effectively put them in the L2, and gave the GPU enough idle time to avoid thermal throttling in a potentially unrealistic way.	2025-03-15 20:46:24 +00:00