Mirror of https://github.com/deepseek-ai/DeepSeek-V3, synced 2025-06-26 18:17:55 +00:00
* handle missing scale_inv_name: Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were not in the same SafeTensor, causing an assertion error due to `scale_inv_name` not being in the `state_dict`.
* sort filename to reduce memory costs
* Add CUDA cache clearing in memory management: Added `torch.cuda.empty_cache()` to free up unused memory on the GPU,

| | | |
|---|---|---|
| .. | ||
| configs | ||
| convert.py | ||
| fp8_cast_bf16.py | ||
| generate.py | ||
| kernel.py | ||
| model.py | ||
| requirements.txt | ||
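
The first two commit notes above describe a lookup problem: a weight and its `weight_scale_inv` may sit in different shard files, so the scale cannot be assumed to be in the shard currently loaded. The following is a minimal sketch of one way to handle that, not the repository's actual code. The names `get_tensor`, `collect_pairs`, and `load_shard` are hypothetical, plain dicts stand in for shard files, and the sketch assumes an index that maps each tensor name to its shard file name. For brevity it looks up a scale for every non-scale tensor and leaves out GPU memory handling such as `torch.cuda.empty_cache()`.

```python
def get_tensor(name, weight_map, load_shard, cache):
    """Return tensor `name`, loading the shard that holds it on demand.

    Raises KeyError when the index does not list `name`.
    """
    shard = weight_map[name]  # KeyError if the name is not indexed
    if shard not in cache:
        cache[shard] = load_shard(shard)
    return cache[shard][name]


def collect_pairs(weight_map, load_shard):
    """Pair every weight with its scale, even across shard boundaries.

    Returns (pairs, missing): pairs holds (name, weight, scale_or_None),
    missing holds the scale names that no shard provides.
    """
    cache, pairs, missing = {}, [], []
    # Sorted file names give a deterministic, sequential traversal.
    for shard in sorted(set(weight_map.values())):
        if shard not in cache:
            cache[shard] = load_shard(shard)
        for name, weight in cache[shard].items():
            if name.endswith("_scale_inv"):
                continue  # scales are consumed together with their weight
            scale_inv_name = f"{name}_scale_inv"
            try:
                scale = get_tensor(scale_inv_name, weight_map, load_shard, cache)
            except KeyError:
                # Record the gap and keep the weight instead of asserting.
                missing.append(scale_inv_name)
                scale = None
            pairs.append((name, weight, scale))
    return pairs, missing


# Toy data (hypothetical): numbers stand in for tensors.
shards = {
    "model-00001.safetensors": {
        "a.weight": 1, "b.weight": 2, "b.weight_scale_inv": 0.5,
    },
    "model-00002.safetensors": {"a.weight_scale_inv": 0.25, "c.weight": 3},
}
weight_map = {n: f for f, tensors in shards.items() for n in tensors}
pairs, missing = collect_pairs(weight_map, shards.__getitem__)
```

Here `a.weight` finds its scale in the second shard, while `c.weight` has no scale anywhere and is kept unchanged with its missing scale name recorded. A real converter would also evict old shards from `cache` to bound memory use.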