Mirror of https://github.com/deepseek-ai/DeepSeek-LLM (synced 2025-01-22 10:36:03 +00:00)

Commit 20d316b940 (parent cb08b21e0b): Update README.md (#5)

Changed file: README.md
@@ -324,6 +324,17 @@ python convert-hf-to-gguf.py <MODEL_PATH> --outfile <GGUF_PATH> --model-name deepseek
`UPDATE:` [exllamav2](https://github.com/turboderp/exllamav2) now supports the HuggingFace Tokenizer. Please pull the latest version and try it out.
### GPU Memory Usage
We profile the peak memory usage of inference for the 7B and 67B models across different batch sizes and sequence lengths.
For DeepSeek LLM 7B, we utilize **1 NVIDIA A100-PCIE-40GB GPU** for inference.
<table><thead><tr><th rowspan="2">Batch Size</th><th colspan="5">Sequence Length</th></tr><tr><th>256</th><th>512</th><th>1024</th><th>2048</th><th>4096</th></tr></thead><tbody><tr><td>1</td><td>13.29 GB</td><td>13.63 GB</td><td>14.47 GB</td><td>16.37 GB</td><td>21.25 GB</td></tr><tr><td>2</td><td>13.63 GB</td><td>14.39 GB</td><td>15.98 GB</td><td>19.82 GB</td><td>29.59 GB</td></tr><tr><td>4</td><td>14.47 GB</td><td>15.82 GB</td><td>19.04 GB</td><td>26.65 GB</td><td>OOM</td></tr><tr><td>8</td><td>15.99 GB</td><td>18.71 GB</td><td>25.14 GB</td><td>35.19 GB</td><td>OOM</td></tr><tr><td>16</td><td>19.06 GB</td><td>24.52 GB</td><td>37.28 GB</td><td>OOM</td><td>OOM</td></tr></tbody></table>
For DeepSeek LLM 67B, we utilize **8 NVIDIA A100-PCIE-40GB GPUs** for inference.
<table><thead><tr><th rowspan="2">Batch Size</th><th colspan="5">Sequence Length</th></tr><tr><th>256</th><th>512</th><th>1024</th><th>2048</th><th>4096</th></tr></thead><tbody><tr><td>1</td><td>16.92 GB</td><td>17.11 GB</td><td>17.66 GB</td><td>20.01 GB</td><td>33.23 GB</td></tr><tr><td>2</td><td>17.04 GB</td><td>17.28 GB</td><td>18.55 GB</td><td>25.27 GB</td><td>OOM</td></tr><tr><td>4</td><td>17.20 GB</td><td>17.80 GB</td><td>21.28 GB</td><td>33.71 GB</td><td>OOM</td></tr><tr><td>8</td><td>17.59 GB</td><td>19.25 GB</td><td>25.69 GB</td><td>OOM</td><td>OOM</td></tr><tr><td>16</td><td>18.17 GB</td><td>21.69 GB</td><td>34.54 GB</td><td>OOM</td><td>OOM</td></tr></tbody></table>
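The README does not include the profiling script itself. The sketch below is one rough way to reproduce such peak-memory measurements using PyTorch's CUDA memory statistics; the model id, dtype, and batch-size/sequence-length grid are illustrative assumptions, and a single prefill forward pass will not exactly match the generation-time numbers in the tables above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint (assumption); swap in the model you want to profile.
model_id = "deepseek-ai/deepseek-llm-7b-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda()
model.eval()

def peak_memory_gb(batch_size: int, seq_len: int) -> float:
    """Run one forward pass on dummy input and return peak allocated GPU memory in GB."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    # Random token ids of the requested shape; real prompts would be tokenized text.
    input_ids = torch.randint(0, tokenizer.vocab_size, (batch_size, seq_len), device="cuda")
    with torch.no_grad():
        model(input_ids)
    return torch.cuda.max_memory_allocated() / 1024 ** 3

for bs in (1, 2, 4):
    for sl in (256, 512, 1024):
        print(f"batch={bs:>2}  seq={sl:>4}  peak={peak_memory_gb(bs, sl):.2f} GB")
```

For the 67B model, the same measurement can be taken after sharding the model across the 8 GPUs (for example, loading with `device_map="auto"`) and summing `torch.cuda.max_memory_allocated(device)` over all devices.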
## 7. Limitation