How to use frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards", dtype="auto")

How to use frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or run the model with Docker Model Runner:
docker model run hf.co/frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards
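The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM or this model card, and the default URL assumes the server was started locally on port 8000 as shown above.

```python
import json
import urllib.request

MODEL = "frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible completion responses list results under "choices"
    return body["choices"][0]["text"]


# Example (requires a running server):
# print(complete("Once upon a time,"))
```

Passing `base_url="http://localhost:30000"` should target a server listening on port 30000 instead, since the request path and payload are the same.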
How to use frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'

How to use frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards with Docker Model Runner:
docker model run hf.co/frnka/Qwen2.5-14B-Instruct-dmp-generate-backwards

PEFT weights for Qwen/Qwen2.5-14B-Instruct, fine-tuned to generate the sentence that precedes a given Data Management Plan (DMP) snippet.
Model loading:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL_NAME = 'Qwen/Qwen2.5-14B-Instruct'
PEFT_MODEL_NAME = 'frnka/qwen14b-backwards-peft'

# The tokenizer comes from the base model
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_NAME,
    device_map="auto",
    torch_dtype=torch.float16,
    output_attentions=True,
    return_dict_in_generate=True,
)
# Attach the PEFT adapter weights to the base model
model = PeftModel.from_pretrained(base_model, PEFT_MODEL_NAME).cuda()
And inference:
def message_generic():
    # System prompt without a topic hint
    return ("You are Data management plan expert. "
            "Please generate a sentence preceding the following Data Management Plan snippet. ")


def message_specific(topic):
    # System prompt that also suggests a topic for the generated sentence
    return message_generic() + f"You may talk about '{topic}'"


topic_to_talk_about = "How will the data be stored?"
context = "Some part of a DMP that we want to generate the previous sentence for."
messages = [
    {"role": "system",
     "content": message_specific(topic_to_talk_about)},  # or message_generic()
    {"role": "user", "content": context},
]
with torch.no_grad():
    tokenized = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt"
    )
    input_ids = tokenized['input_ids'].cuda()
    output = model.generate(
        input_ids,
        attention_mask=tokenized['attention_mask'].cuda(),
        max_new_tokens=200,
        num_return_sequences=1,
        do_sample=True,
        temperature=1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        use_cache=True,
    )
# generate() returns a ModelOutput if return_dict_in_generate is in effect, otherwise a plain tensor
sequences = output.sequences if hasattr(output, "sequences") else output
# Keep only the newly generated tokens (everything after the prompt)
answer_ids = sequences[0][input_ids.shape[1]:]
generated_text = tokenizer.decode(answer_ids, skip_special_tokens=True)
print(generated_text + context)
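Because each call produces the sentence that precedes its input, the call can be repeated to grow a passage backwards one sentence at a time. The following is a minimal sketch of that loop, not something taken from the model card: `extend_backwards` and its `generate_previous` argument are hypothetical names, and `generate_previous` stands for any function that wraps the inference code above and returns the generated sentence as a string. Unlike the `print` call above, this sketch joins sentences with a single space, which is an assumption about the desired formatting.

```python
from typing import Callable


def extend_backwards(context: str,
                     generate_previous: Callable[[str], str],
                     steps: int = 3) -> str:
    """Prepend up to `steps` generated sentences to `context` (hypothetical helper)."""
    text = context
    for _ in range(steps):
        # Ask for the sentence that should come right before the current text
        previous = generate_previous(text).strip()
        if not previous:
            # Stop early if nothing was generated
            break
        text = previous + " " + text
    return text
```

Each iteration feeds the whole accumulated text back in as the new context, so prompts grow with every step; a real wrapper may want to truncate the context it passes to the model.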