Update README.md

2024-02-04 18:39:12 +08:00 · 2024-02-04 18:39:12 +08:00 · c1bb6d15a5
parent b22ca95e2b
commit c1bb6d15a5
1 changed files with 55 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -328,6 +328,57 @@ The reproducible code for the following evaluation results can be found in the [
 #### 4) Program-Aid Math Reasoning Benchmark
 ![Math](pictures/Math.png)
 ### Inference with vLLM
 You can also employ [vLLM](https://github.com/vllm-project/vllm) for high-throughput inference.
 **Text Completion**
 ```python
 from vllm import LLM, SamplingParams
 tp_size = 4  # tensor parallelism degree: number of GPUs the model is sharded across
 sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
 model_name = "deepseek-ai/deepseek-coder-6.7b-base"
 llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)
 prompts = [
    "If everyone in a country loves one another,",
    "The research should also focus on the technologies",
    "To determine if the label is correct, we need to"
 ]
 outputs = llm.generate(prompts, sampling_params)
 generated_text = [output.outputs[0].text for output in outputs]
 print(generated_text)
 ```
 **Chat Completion**
 ```python
 from transformers import AutoTokenizer
 from vllm import LLM, SamplingParams
 tp_size = 4  # tensor parallelism degree: number of GPUs the model is sharded across
 sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
 model_name = "deepseek-ai/deepseek-coder-6.7b-instruct"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)
 messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "What can you do?"}],
    [{"role": "user", "content": "Explain Transformer briefly."}],
 ]
 prompts = [tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) for messages in messages_list]
 sampling_params.stop = [tokenizer.eos_token]
 outputs = llm.generate(prompts, sampling_params)
 generated_text = [output.outputs[0].text for output in outputs]
 print(generated_text)
 ```
 ### 7. Q&A
 #### Could You Provide the tokenizer.model File for Model Quantization?
@@ -359,6 +410,10 @@ python convert-hf-to-gguf.py <MODEL_PATH> --outfile <GGUF_PATH> --model-name dee
 Remember to set RoPE scaling to 4 for correct output. More discussion can be found in this [PR](https://github.com/turboderp/exllamav2/pull/189).
 #### How to Use deepseek-coder-instruct for Code Completion?
 Although the deepseek-coder-instruct models are not specifically trained for code completion during supervised fine-tuning (SFT), they can still complete code effectively. To enable this, set the `eos_token_id` parameter to 32014 instead of its default value of 32021 in the deepseek-coder-instruct configuration. With this change, generation stops at token 32014 rather than 32021, which is what code completion needs.
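The `eos_token_id` adjustment described above can be sketched with Hugging Face `transformers` as follows. This is a minimal sketch, not the project's reference implementation: the helper names `completion_generate_kwargs` and `complete_code`, the greedy decoding setting, and the `max_new_tokens` default are assumptions chosen for illustration; only the two token ids come from the text above.

```python
# Token ids quoted in the paragraph above.
COMPLETION_EOS_ID = 32014        # value to use for code completion
INSTRUCT_DEFAULT_EOS_ID = 32021  # default in the deepseek-coder-instruct config


def completion_generate_kwargs(max_new_tokens=128):
    """Build keyword arguments for `model.generate` (hypothetical helper).

    Overrides the instruct model's default end-of-sequence id so that
    generation stops at token 32014 instead of 32021.
    """
    return {
        "max_new_tokens": max_new_tokens,  # assumed default, adjust as needed
        "do_sample": False,                # greedy decoding, an assumption
        "eos_token_id": COMPLETION_EOS_ID,
    }


def complete_code(prompt, model_name="deepseek-ai/deepseek-coder-6.7b-instruct"):
    """Complete `prompt` with an instruct model (hypothetical helper)."""
    # Imported lazily so the kwargs helper works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True, torch_dtype=torch.bfloat16
    ).to(device)

    # Feed the raw code prefix; no chat template is applied for completion.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, **completion_generate_kwargs())

    # Decode only the newly generated tokens, skipping the prompt.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```

Calling `complete_code("def quick_sort(arr):\n")` downloads the model weights and returns the generated continuation as a string. Passing `eos_token_id` at generation time leaves the model's stored configuration untouched, so the same loaded model can still be used for chat with its default value.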
 ### 8. Resources
 [awesome-deepseek-coder](https://github.com/deepseek-ai/awesome-deepseek-coder) is a curated list of open-source projects related to DeepSeek Coder.