mirror of https://github.com/deepseek-ai/DeepGEMM (synced 2025-04-22 04:44:13 +00:00)
Update README
parent a5645d7afa
commit 8002b769c0
@@ -152,7 +152,7 @@ The [Tensor Memory Accelerator](https://docs.nvidia.com/cuda/hopper-tuning-guide
 
 - TMA load for LHS, LHS scaling factors, and RHS matrices
 - TMA store for the output matrix
-- TMA multicast (exclusive to the LHS matrix)
+- TMA multicast (automatically decide LHS or RHS to broadcast)
 - TMA descriptor prefetching
 
 #### Common detail optimizations
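
The changed bullet says the multicast operand is now chosen automatically instead of always being the LHS. The sketch below illustrates one plausible way such a choice could be made; it is not taken from this commit or from DeepGEMM's code. The function names (`choose_multicast`, `multicast_is_legal`), the divisibility rule, and the "prefer the larger tile" preference are all assumptions made for illustration.

```python
from typing import Optional, Tuple


def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b


def multicast_is_legal(shape_dim: int, block_dim: int, num_multicast: int) -> bool:
    # Assumption: the blocks that share one tile must cover the
    # dimension in whole groups of `num_multicast`.
    return ceil_div(shape_dim, block_dim) % num_multicast == 0


def choose_multicast(
    m: int, n: int, block_m: int, block_n: int, num_multicast: int = 2
) -> Tuple[int, Optional[str]]:
    """Return (multicast width, operand), operand being "LHS", "RHS", or None."""
    # An LHS tile is reused by blocks adjacent along N,
    # an RHS tile by blocks adjacent along M.
    legal = {
        "LHS": multicast_is_legal(n, block_n, num_multicast),
        "RHS": multicast_is_legal(m, block_m, num_multicast),
    }
    # Hypothetical preference: broadcast the operand with the larger tile,
    # since sharing it saves more memory traffic.
    order = ("LHS", "RHS") if block_m > block_n else ("RHS", "LHS")
    for operand in order:
        if legal[operand]:
            return num_multicast, operand
    return 1, None  # fall back to no multicast


print(choose_multicast(4096, 4096, 128, 64))  # (2, 'LHS')
print(choose_multicast(4096, 64, 128, 64))    # (2, 'RHS')
```

In the second call only one block spans N, so the LHS cannot be shared and the sketch falls back to broadcasting the RHS, which is the kind of case an LHS-only policy would miss.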