How to use lumicero/Qwen2.5-bilingual-xlora with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "lumicero/Qwen2.5-bilingual-xlora")
How to use lumicero/Qwen2.5-bilingual-xlora with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="lumicero/Qwen2.5-bilingual-xlora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("lumicero/Qwen2.5-bilingual-xlora", dtype="auto")
How to use lumicero/Qwen2.5-bilingual-xlora with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lumicero/Qwen2.5-bilingual-xlora"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "lumicero/Qwen2.5-bilingual-xlora",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
docker model run hf.co/lumicero/Qwen2.5-bilingual-xlora
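The same chat completion request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the steps above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumed defaults, copied from the curl example: a local server on port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "lumicero/Qwen2.5-bilingual-xlora"


def build_chat_request(prompt, model=MODEL_ID, base_url=BASE_URL):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=60.0, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers place the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` issues the same request as the curl command. For the SGLang server, pass `base_url="http://localhost:30000/v1"`.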
How to use lumicero/Qwen2.5-bilingual-xlora with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "lumicero/Qwen2.5-bilingual-xlora" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "lumicero/Qwen2.5-bilingual-xlora",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "lumicero/Qwen2.5-bilingual-xlora" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "lumicero/Qwen2.5-bilingual-xlora",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use lumicero/Qwen2.5-bilingual-xlora with Docker Model Runner:
docker model run hf.co/lumicero/Qwen2.5-bilingual-xlora
One-paragraph summary of what this repo contains (LLM, LoRA adapter, diffusion model, etc.), what it does, and what makes it different.
Replace placeholders, then copy/paste the relevant section for your artifact type.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
ADAPTER_ID = "<YOUR_ORG/YOUR_ADAPTER_REPO>" # this repo
peft_cfg = PeftConfig.from_pretrained(ADAPTER_ID)
BASE_ID = peft_cfg.base_model_name_or_path
tokenizer = AutoTokenizer.from_pretrained(BASE_ID, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()
prompt = "Write a short Indonesian summary about LoRA adapters:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))
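The snippet above feeds a raw string to the model. Instruct-tuned bases (for example `Qwen/Qwen2.5-0.5B-Instruct`) usually respond better to chat-formatted input. The sketch below renders such a prompt with the tokenizer's `apply_chat_template` method. It assumes the base model's tokenizer ships a chat template; `build_chat_prompt` is a hypothetical helper name.

```python
def build_chat_prompt(tokenizer, user_text, system_text=None):
    """Render a chat-formatted prompt string with the tokenizer's own template."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    # tokenize=False returns the rendered string rather than token ids;
    # add_generation_prompt=True appends the marker that opens the assistant turn.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
```

The returned string can replace `prompt` in the tokenization step, e.g. `prompt = build_chat_prompt(tokenizer, "Write a short Indonesian summary about LoRA adapters:")`.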
# WARNING: Merging changes the weights; verify license compatibility of the base model.
merged = model.merge_and_unload()
merged.save_pretrained("./merged_model", safe_serialization=True)
tokenizer.save_pretrained("./merged_model")
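After merging, the output directory should look like a full model rather than an adapter. As a quick sanity check, the sketch below guesses the artifact type from file names alone. It assumes the usual layout in which an adapter carries `adapter_config.json` and a full model carries `config.json`; the function name and return labels are illustrative.

```python
from pathlib import Path


def detect_artifact_type(directory):
    """Guess what a local model directory holds from its file names alone."""
    names = {p.name for p in Path(directory).iterdir() if p.is_file()}
    # An adapter directory is identified first, since it is the more specific case.
    if "adapter_config.json" in names:
        return "lora_adapter"
    if "config.json" in names:
        return "full_model"
    return "unknown"
```

For the merge step above, `detect_artifact_type("./merged_model")` would be expected to report `"full_model"`.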
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
MODEL_ID = "<YOUR_ORG/YOUR_MODEL_REPO>" # this repo
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
).eval()
prompt = "Explain Mixture of Experts in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))
import torch
from diffusers import DiffusionPipeline
BASE_ID = "<ORG/BASE_DIFFUSION_MODEL>" # e.g., stabilityai/stable-diffusion-xl-base-1.0
REPO_ID = "<YOUR_ORG/YOUR_REPO>" # this repo (full model or LoRA)
dtype = torch.float16
pipe = DiffusionPipeline.from_pretrained(BASE_ID, torch_dtype=dtype).to("cuda")
# If this repo is a LoRA:
# - Upload your weights (often *.safetensors) to this repo
# - Then load them like this:
pipe.load_lora_weights(REPO_ID) # optionally: weight_name="my_lora.safetensors"
# Some pipelines support:
# pipe.fuse_lora()
image = pipe("a cinematic photo of a rainy Jakarta street at night", num_inference_steps=30).images[0]
image.save("sample.png")
Describe what you uploaded and where:
- README.md (this file)
- adapter_config.json, adapter_model.safetensors (or .bin)
- config.json, model weights (e.g., model.safetensors), tokenizer files
Primary use cases
Users & contexts
Provide at least one of the following:
| Task | Dataset | Metric | Score |
|---|---|---|---|
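Rows for the table above can be generated from your own evaluation results instead of being typed by hand. This is a minimal sketch; `format_results_table` is a hypothetical helper, and it assumes each result is a `(task, dataset, metric, score)` tuple with a numeric score.

```python
def format_results_table(rows):
    """Render (task, dataset, metric, score) tuples as a Markdown table."""
    lines = ["| Task | Dataset | Metric | Score |", "|---|---|---|---|"]
    for task, dataset, metric, score in rows:
        # Scores are printed with two decimals; adjust to the metric's precision.
        lines.append(f"| {task} | {dataset} | {metric} | {score:.2f} |")
    return "\n".join(lines)
```

Pass only measured values; the helper does no validation beyond unpacking each tuple.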
Add a few short examples:
If you used or built on prior work, add citations.
@misc{your_model_2025,
title = {<Model Name>},
author = {<Author/Org>},
year = {2025},
howpublished = {\url{<REPO_URL>}},
}