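1. First, download the quantized Llama 3 model from the ModelScope community (Git LFS should be installed so that the weight files are actually fetched):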
git clone https://www.modelscope.cn/huangjintao/Meta-Llama-3-8B-Instruct-AWQ.git
2. Run the model
1) Start the vLLM OpenAI-compatible API server:
python -m vllm.entrypoints.openai.api_server --model /home/cxh/Meta-Llama-3-8B-Instruct-AWQ --dtype auto --api-key token-abc123
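2) Call the server from Python with the OpenAI client. The model name must match the name the server serves; when --served-model-name is not given, vLLM uses the value passed to --model:
from openai import OpenAI

# Point the client at the local vLLM server; the key must match --api-key.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)
completion = client.chat.completions.create(
    model="/home/cxh/Meta-Llama-3-8B-Instruct-AWQ",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(completion.choices[0].message)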
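If the chat request fails with a "model not found" error, it can help to ask the server which model names it exposes. The following is a minimal sketch, assuming the server from step 2 is running on localhost:8000 with the same API key. It uses only the standard library and the OpenAI-style /v1/models endpoint. The helper names served_model_ids and fetch_models are hypothetical, introduced here for illustration.

```python
import json
import urllib.request

# Assumed values, copied from the server command and client code above.
BASE_URL = "http://localhost:8000/v1"
API_KEY = "token-abc123"


def served_model_ids(payload: dict) -> list:
    """Extract the model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_models(base_url: str = BASE_URL, api_key: str = API_KEY, timeout: float = 5.0) -> dict:
    """GET <base_url>/models and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (requires the running server):
#   print(served_model_ids(fetch_models()))
```

Any id returned this way can be passed as the model argument of client.chat.completions.create.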
