Mirror of https://github.com/deepseek-ai/DeepSeek-V3, synced 2025-01-22 12:25:30 +00:00
8f1c9488b5
* handle missing scale_inv_name: Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were not in the same SafeTensor, causing an assertion error due to `scale_inv_name` not being in the `state_dict`.
* sort filename to reduce memory costs
* Add CUDA cache clearing in memory management: Added `torch.cuda.empty_cache()` to free up unused memory on the GPU,

| Name |
|---|
| configs |
| convert.py |
| fp8_cast_bf16.py |
| generate.py |
| kernel.py |
| model.py |
| requirements.txt |
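
The commit notes above describe three ideas: looking up a `weight_scale_inv` tensor in a different shard when it is not stored next to its `weight`, visiting shard files in sorted order, and dropping shards that are no longer needed. The following is a minimal sketch of that pattern in plain Python, not the repository's code. `ShardCache`, `load_shard`, `weight_map`, `max_cached`, and `dequantize_all` are hypothetical names, shards are modeled as in-memory dicts of floats rather than SafeTensors files, and a single multiplication stands in for the real dequantization. The point where a GPU script might call `torch.cuda.empty_cache()` is marked only by a comment.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional


class ShardCache:
    """Holds recently used shards and resolves tensors by name across shards."""

    def __init__(self, weight_map: Dict[str, str],
                 load_shard: Callable[[str], Dict[str, float]],
                 max_cached: int = 2) -> None:
        self.weight_map = weight_map    # tensor name -> shard file name
        self.load_shard = load_shard    # hypothetical loader for one shard
        self.max_cached = max_cached
        self.cache: "OrderedDict[str, Dict[str, float]]" = OrderedDict()

    def get_shard(self, file_name: str) -> Dict[str, float]:
        if file_name in self.cache:
            self.cache.move_to_end(file_name)   # mark as most recently used
        else:
            self.cache[file_name] = self.load_shard(file_name)
        return self.cache[file_name]

    def get_tensor(self, name: str) -> Optional[float]:
        # Consult the index instead of assuming the tensor is in the current shard.
        file_name = self.weight_map.get(name)
        if file_name is None:
            return None
        return self.get_shard(file_name).get(name)

    def evict_old(self) -> None:
        # Drop least recently used shards until at most max_cached remain.
        while len(self.cache) > self.max_cached:
            self.cache.popitem(last=False)
            # A GPU-based script might call torch.cuda.empty_cache() here.


def dequantize_all(cache: ShardCache) -> Dict[str, float]:
    out: Dict[str, float] = {}
    # Sorted order keeps neighbouring shards together, so fewer reloads are needed.
    for file_name in sorted(set(cache.weight_map.values())):
        for name, weight in cache.get_shard(file_name).items():
            if name.endswith("_scale_inv"):
                continue  # scales are consumed together with their weights
            scale_inv = cache.get_tensor(name + "_scale_inv")
            # A weight without a scale is passed through unchanged, not asserted on.
            out[name] = weight if scale_inv is None else weight * scale_inv
        cache.evict_old()
    return out


# Toy data: the scale for w1.weight lives in a different shard than the weight.
shards = {
    "b.safetensors": {"w1.weight_scale_inv": 0.5, "norm.weight": 1.0},
    "a.safetensors": {"w1.weight": 4.0},
}
weight_map = {name: f for f, tensors in shards.items() for name in tensors}
loads = []


def load_shard(file_name: str) -> Dict[str, float]:
    loads.append(file_name)          # record each load for inspection
    return dict(shards[file_name])


cache = ShardCache(weight_map, load_shard, max_cached=2)
result = dequantize_all(cache)
print(result)
```

In this toy run each shard is loaded once: the cross-shard lookup for `w1.weight_scale_inv` pulls in `b.safetensors` early, and the sorted traversal then finds it already cached.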