Update README.md
README.md
CHANGED
@@ -8,6 +8,15 @@ library_name: transformers
 tags:
 - text-generation-inference
 pipeline_tag: text-generation
+widget:
+- text: "##提问:\n感冒了要怎么办?\n##回答:\n"
+  example_title: "感冒了要怎么办?"
+- text: "##提问:\n介绍一下Apple公司\n##回答:\n"
+  example_title: "介绍一下Apple公司"
+- text: "##提问:\n现在外面天气怎么样\n##回答:\n"
+  example_title: "介绍一下Apple公司?"
+- text: "##提问:\n推荐一份可口的午餐\n##回答:\n"
+  example_title: "推荐一份可口的午餐"
 ---
 # Phi2-Chinese-0.2B: Train Your Own Small Chinese Phi2 Model from Scratch
 
@@ -62,7 +71,8 @@ text = f"##提问:\n{example['instruction']}\n##回答:\n{example['output'][EOS]
 Remember to append the `EOS` end-of-sentence special token; otherwise the model will not know when to stop during `decode`. The `BOS` beginning-of-sentence token is optional.
 
 
-# 5. 📝
+# 5. 📝RLHF Optimization
+This project uses the DPO optimization method.
 Code: [dpo.ipynb](https://github.com/charent/Phi2-mini-Chinese/blob/main/4.dpo.ipynb)
 
 Fine-tune the SFT model according to personal preference. The dataset needs three columns: `prompt`, `chosen`, and `rejected`. Part of the `rejected` column was generated by an early-stage model from the SFT phase (for example, when SFT runs for 4 `epoch`s, the checkpoint taken at 0.5 `epoch`). If a generated `rejected` answer has a similarity of 0.9 or higher with its `chosen` answer, that sample is discarded.
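
The filtering rule for the DPO data described above can be sketched as follows. This is a minimal illustration, not the project's code: the README does not say which similarity metric is used, so the character-level ratio from Python's `difflib.SequenceMatcher` stands in for it here, and the names `similarity`, `filter_dpo_rows`, and the sample rows are hypothetical.

```python
from difflib import SequenceMatcher

# Threshold taken from the README text: drop pairs with similarity of 0.9 or higher.
SIMILARITY_THRESHOLD = 0.9


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1].

    Assumption: the README does not name its metric; SequenceMatcher is a stand-in.
    """
    return SequenceMatcher(None, a, b).ratio()


def filter_dpo_rows(rows, threshold=SIMILARITY_THRESHOLD):
    """Keep only rows whose `rejected` answer differs enough from `chosen`."""
    kept = []
    for row in rows:
        if similarity(row["chosen"], row["rejected"]) >= threshold:
            continue  # too similar: the pair carries little preference signal
        kept.append(row)
    return kept


# Hypothetical rows with the three columns the README asks for.
rows = [
    {"prompt": "p1", "chosen": "Drink water and rest.", "rejected": "Drink water and rest."},
    {"prompt": "p2", "chosen": "Drink water and rest.", "rejected": "Apple is a company."},
]
filtered = filter_dpo_rows(rows)
```

With these sample rows, the first pair is identical and is dropped, while the second pair is kept. An embedding-based or n-gram-based similarity could be swapped in for `similarity` without changing the rest of the sketch.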
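
The prompt template that appears in the hunk header and in the widget entries, together with the `EOS` advice in the context lines, can be illustrated like this. It is a sketch under stated assumptions: the `EOS` string below is a placeholder (in practice it would come from the tokenizer, for example `tokenizer.eos_token`), and the helper names and the sample answer are hypothetical.

```python
# Placeholder end-of-sentence token (assumption); use the tokenizer's real EOS in practice.
EOS = "<|endoftext|>"


def build_sft_text(example: dict, eos: str = EOS) -> str:
    """Format one SFT training sample and append EOS so generation learns to stop."""
    return f"##提问:\n{example['instruction']}\n##回答:\n{example['output']}{eos}"


def build_inference_prompt(question: str) -> str:
    """Format a prompt the same way as the widget entries: the answer part is left empty."""
    return f"##提问:\n{question}\n##回答:\n"


# Illustrative sample; the answer text is made up for this example.
sample = {"instruction": "感冒了要怎么办?", "output": "多喝水,多休息。"}
sft_text = build_sft_text(sample)
prompt = build_inference_prompt(sample["instruction"])
```

The inference prompt is a prefix of the training text, which is why the widget entries end right after `##回答:\n` and leave the rest for the model to generate.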