DeepSeek-V3

mirror of https://github.com/deepseek-ai/DeepSeek-V3 synced 2025-06-26 18:17:55 +00:00

Files

Yang Wang 8f1c9488b5 handle missing scale_inv_name (#2)

* handle missing scale_inv_name

Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were stored in different safetensors files, which caused an assertion error because `scale_inv_name` was missing from the state_dict.

* sort filename to reduce memory costs

* Add CUDA cache clearing in memory management

Added torch.cuda.empty_cache() to free up unused memory on the GPU.

2024-12-27 09:34:38 +08:00
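
The commit message above describes looking up a `weight_scale_inv` tensor that lives in a different shard than its `weight`. The following is a minimal sketch of that idea in plain Python, not the code in `fp8_cast_bf16.py`. The helper names (`make_tensor_lookup`, `find_scale_inv`), the shard file names, the `max_cached` limit, and the toy list values standing in for tensors are all assumptions made for illustration. It assumes an index mapping each tensor name to the shard file that stores it.

```python
from collections import OrderedDict


def make_tensor_lookup(weight_map, load_shard, max_cached=2):
    """Return a get_tensor(name) function backed by a small shard cache.

    weight_map: dict mapping tensor name -> shard file name (assumed index).
    load_shard: callable mapping shard file name -> dict of tensors.
    """
    cache = OrderedDict()

    def get_tensor(name):
        shard = weight_map[name]  # raises KeyError if the index lacks the name
        if shard not in cache:
            cache[shard] = load_shard(shard)
            # Keep only the most recently loaded shards to bound memory use.
            # A GPU script could call torch.cuda.empty_cache() after evicting.
            while len(cache) > max_cached:
                cache.popitem(last=False)
        return cache[shard][name]

    return get_tensor


def find_scale_inv(weight_name, current_shard, get_tensor):
    """Find the scale tensor for a weight, even if it sits in another shard."""
    scale_inv_name = f"{weight_name}_scale_inv"
    if scale_inv_name in current_shard:
        return current_shard[scale_inv_name]
    try:
        return get_tensor(scale_inv_name)
    except KeyError:
        return None  # caller decides whether to skip or warn


# Toy data: the weight and its scale are split across two hypothetical shards.
WEIGHT = "model.layers.39.mlp.experts.92.gate_proj.weight"
shards = {
    "shard-a.safetensors": {WEIGHT: [1.0, 2.0]},
    "shard-b.safetensors": {WEIGHT + "_scale_inv": [0.5]},
}
weight_map = {name: fname for fname, tensors in shards.items() for name in tensors}

loads = []  # records which shards were loaded through the lookup


def load_shard(fname):
    loads.append(fname)
    return shards[fname]


get_tensor = make_tensor_lookup(weight_map, load_shard)

found = {}
# Visiting shards in sorted order keeps related tensors close together,
# so fewer shards need to be resident at once.
for fname in sorted(shards):
    current = shards[fname]
    for name in current:
        if name.endswith("_scale_inv"):
            continue
        found[name] = find_scale_inv(name, current, get_tensor)

print(found)
```

Returning `None` instead of asserting lets the caller skip or log a weight whose scale is absent from the index, which is one way to avoid the assertion error the commit mentions.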

configs

Release DeepSeek-V3

2024-12-26 19:01:57 +08:00

convert.py

Release DeepSeek-V3

2024-12-26 19:01:57 +08:00

fp8_cast_bf16.py

handle missing scale_inv_name (#2)

2024-12-27 09:34:38 +08:00

generate.py

Release DeepSeek-V3

2024-12-26 19:01:57 +08:00

kernel.py

Release DeepSeek-V3

2024-12-26 19:01:57 +08:00

model.py

Release DeepSeek-V3

2024-12-26 19:01:57 +08:00

requirements.txt

Release DeepSeek-V3

2024-12-26 19:01:57 +08:00