Update README

Chenggang Zhao 2025-03-25 18:13:24 +08:00
parent a5645d7afa
commit 8002b769c0

@@ -152,7 +152,7 @@ The [Tensor Memory Accelerator](https://docs.nvidia.com/cuda/hopper-tuning-guide
 - TMA load for LHS, LHS scaling factors, and RHS matrices
 - TMA store for the output matrix
-- TMA multicast (exclusive to the LHS matrix)
+- TMA multicast (automatically decide LHS or RHS to broadcast)
 - TMA descriptor prefetching
 #### Common detail optimizations