charent committed on Jan 4, 2024

Commit 7700418

1 Parent(s): a5f918f

Update README.md

Files changed (1)

README.md +11 -1

@@ -8,6 +8,15 @@ library_name: transformers
 tags:
 - text-generation-inference
 pipeline_tag: text-generation
+widget:
+- text: "##提问:\n感冒了要怎么办？\n##回答:\n"
+  example_title: "感冒了要怎么办？"
+- text: "##提问:\n介绍一下Apple公司\n##回答:\n"
+  example_title: "介绍一下Apple公司"
+- text: "##提问:\n现在外面天气怎么样\n##回答:\n"
+  example_title: "介绍一下Apple公司？"
+- text: "##提问:\n推荐一份可口的午餐\n##回答:\n"
+  example_title: "推荐一份可口的午餐"
 ---
 # Phi2-Chinese-0.2B: Train your own small Chinese Phi2 model from scratch
@@ -62,7 +71,8 @@ text = f"##提问:\n{example['instruction']}\n##回答:\n{example['output'][EOS]
 Remember to append the `EOS` end-of-sentence special token; otherwise the model will not know when to stop during `decode`. The `BOS` beginning-of-sentence token is optional.
-# 5. 📝DPO preference optimization
+# 5. 📝RLHF optimization
+This project uses the DPO optimization method.
 Code: [dpo.ipynb](https://github.com/charent/Phi2-mini-Chinese/blob/main/4.dpo.ipynb)
 Fine-tune the SFT model according to personal preference. The dataset needs three columns: `prompt`, `chosen`, and `rejected`. Part of the `rejected` column was generated by an early-stage model from the SFT phase (for example, if SFT runs for 4 `epoch`s, the checkpoint taken at 0.5 `epoch`). If a generated `rejected` answer has a similarity of 0.9 or higher with its `chosen` answer, that sample is discarded.
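
The DPO data rule in the README text above (three columns `prompt`, `chosen`, `rejected`; drop a sample when `rejected` and `chosen` have a similarity of 0.9 or higher) can be sketched roughly as follows. The README does not say which similarity metric is used, so this sketch assumes a character-level ratio from Python's standard `difflib`; the helper names `build_prompt`, `similarity`, and `build_dpo_rows`, and the `instruction` input key, are hypothetical and are not taken from the linked notebook.

```python
from difflib import SequenceMatcher

# Threshold stated in the README: discard pairs at or above this similarity.
SIMILARITY_THRESHOLD = 0.9


def build_prompt(instruction: str) -> str:
    """Wrap a question in the prompt template shown in the diff."""
    return f"##提问:\n{instruction}\n##回答:\n"


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1].

    Assumption: the README does not name its metric; difflib's ratio is
    used here only as a stand-in.
    """
    return SequenceMatcher(None, a, b).ratio()


def build_dpo_rows(samples, threshold=SIMILARITY_THRESHOLD):
    """Return rows with `prompt`, `chosen`, `rejected`, skipping near-duplicates."""
    rows = []
    for sample in samples:
        # A rejected answer that is almost the same as the chosen one
        # carries little preference signal, so the sample is dropped.
        if similarity(sample["chosen"], sample["rejected"]) >= threshold:
            continue
        rows.append(
            {
                "prompt": build_prompt(sample["instruction"]),
                "chosen": sample["chosen"],
                "rejected": sample["rejected"],
            }
        )
    return rows
```

In this sketch, `rejected` would hold text generated by an early SFT checkpoint, as the README describes; how that text is generated and how the rows are fed to a DPO trainer is left to the notebook.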