Instructions for using lumicero/Qwen2.5-bilingual-xlora with libraries and local apps.

Libraries

PEFT

How to use lumicero/Qwen2.5-bilingual-xlora with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "lumicero/Qwen2.5-bilingual-xlora")

Transformers

How to use lumicero/Qwen2.5-bilingual-xlora with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="lumicero/Qwen2.5-bilingual-xlora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
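# AutoModelForCausalLM keeps the language-modeling head needed for generation.
# Loading an adapter repo this way relies on `peft` being installed.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("lumicero/Qwen2.5-bilingual-xlora", dtype="auto")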
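
The pipeline call above discards its return value. When a text-generation pipeline receives a list of chat messages, it returns the conversation with the model's reply appended as the last message. The sketch below shows one way to pull that reply out; `last_reply` is a hypothetical helper name, and the sample data is hand-written to mirror the output shape rather than real model output.

```python
def last_reply(outputs):
    """Return the generated text from a text-generation pipeline result."""
    conversation = outputs[0]["generated_text"]
    # With chat input, generated_text is a list of {"role", "content"} messages;
    # with plain-string input it is a single string.
    if isinstance(conversation, str):
        return conversation
    return conversation[-1]["content"]


# Hand-written sample with the same shape as `pipe(messages)` output (not real model output).
sample = [{
    "generated_text": [
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "<model reply>"},
    ]
}]
print(last_reply(sample))
```

In practice this would be called as `last_reply(pipe(messages))`.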

Local Apps

vLLM

How to use lumicero/Qwen2.5-bilingual-xlora with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lumicero/Qwen2.5-bilingual-xlora"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lumicero/Qwen2.5-bilingual-xlora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
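
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on localhost:8000; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: port used in the curl example
MODEL_ID = "lumicero/Qwen2.5-bilingual-xlora"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the reply as a string. Passing `base_url="http://localhost:30000/v1"` should target an SGLang server started on port 30000 instead.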

Use Docker

docker model run hf.co/lumicero/Qwen2.5-bilingual-xlora

SGLang

How to use lumicero/Qwen2.5-bilingual-xlora with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "lumicero/Qwen2.5-bilingual-xlora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lumicero/Qwen2.5-bilingual-xlora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "lumicero/Qwen2.5-bilingual-xlora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lumicero/Qwen2.5-bilingual-xlora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use lumicero/Qwen2.5-bilingual-xlora with Docker Model Runner:
```
docker model run hf.co/lumicero/Qwen2.5-bilingual-xlora
```

<Model / Adapter Name>

One-paragraph summary of what this repo contains (LLM, LoRA adapter, diffusion model, etc.), what it does, and what makes it different.

Highlights

<Key capability 1>
<Key capability 2>
<Key constraint or requirement (e.g., base model needed, GPU recommended)>

Quickstart

Replace placeholders, then copy/paste the relevant section for your artifact type.

Option A — Use as a LoRA / PEFT Adapter (Transformers + PEFT)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

ADAPTER_ID = "<YOUR_ORG/YOUR_ADAPTER_REPO>"      # this repo
peft_cfg = PeftConfig.from_pretrained(ADAPTER_ID)
BASE_ID = peft_cfg.base_model_name_or_path

tokenizer = AutoTokenizer.from_pretrained(BASE_ID, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)

model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

prompt = "Write a short Indonesian summary about LoRA adapters:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

print(tokenizer.decode(out[0], skip_special_tokens=True))

Optional: Merge adapter into the base model (produce a standalone merged model)

# WARNING: Merging changes the weights; verify license compatibility of the base model.
merged = model.merge_and_unload()
merged.save_pretrained("./merged_model", safe_serialization=True)
tokenizer.save_pretrained("./merged_model")

Option B — Use as a Full LLM (Transformers)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "<YOUR_ORG/YOUR_MODEL_REPO>"  # this repo
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
).eval()

prompt = "Explain Mixture of Experts in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

print(tokenizer.decode(out[0], skip_special_tokens=True))

Option C — Use as a Diffusion Model / LoRA (Diffusers)

import torch
from diffusers import DiffusionPipeline

BASE_ID = "<ORG/BASE_DIFFUSION_MODEL>"           # e.g., stabilityai/stable-diffusion-xl-base-1.0
REPO_ID = "<YOUR_ORG/YOUR_REPO>"                 # this repo (full model or LoRA)
dtype = torch.float16

pipe = DiffusionPipeline.from_pretrained(BASE_ID, torch_dtype=dtype).to("cuda")

# If this repo is a LoRA:
# - Upload your weights (often *.safetensors) to this repo
# - Then load them like this:
pipe.load_lora_weights(REPO_ID)                  # optionally: weight_name="my_lora.safetensors"
# Some pipelines support:
# pipe.fuse_lora()

image = pipe("a cinematic photo of a rainy Jakarta street at night", num_inference_steps=30).images[0]
image.save("sample.png")

What’s in this repository?

Describe what you uploaded and where:

README.md (this file)
For PEFT adapters: adapter_config.json, adapter_model.safetensors (or .bin)
For full LLMs: config.json, model weights (e.g., model.safetensors), tokenizer files
For diffusion: model weights and scheduler/config files, plus example images (optional)
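
The file layout above is usually enough to tell which Quickstart option applies. As a rough sketch, the hypothetical helper below inspects a list of file names. The `model_index.json` check for Diffusers pipelines is an assumption that this card does not state, and the heuristic will not recognize a diffusion LoRA that ships only weight files.

```python
def guess_artifact_type(filenames):
    """Guess the artifact type from repo file names (heuristic, not exhaustive)."""
    names = set(filenames)
    if "adapter_config.json" in names:
        return "peft_adapter"  # Option A
    if "model_index.json" in names:
        return "diffusion"     # Option C (assumed Diffusers pipeline marker)
    if "config.json" in names:
        return "full_llm"      # Option B
    return "unknown"


print(guess_artifact_type(["README.md", "adapter_config.json", "adapter_model.safetensors"]))
```

The file list could come from `huggingface_hub.list_repo_files(repo_id)` or from a local directory listing.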

Model details

Model type

Artifact: <LoRA adapter | merged model | full model | diffusion model | diffusion LoRA>
Architecture: <e.g., Transformer decoder-only | U-Net | DiT | VAE>
Base model: <ORG/MODEL> (if applicable)
Languages: <en, id, ...>
License: <...>

Intended use

Primary use cases

<use case 1>
<use case 2>

Users & contexts

<recommended guardrails (if any)>

Out-of-scope use

<Clearly list disallowed / not recommended uses>

Training

Training data

Source(s): <datasets / corpora / synthetic data details>
Data filtering / processing: <dedup, profanity filtering, caption cleaning, etc.>
Known dataset limitations: <coverage gaps, language imbalance, etc.>

Training procedure

Objective: <SFT | DPO | fine-tune | LoRA fine-tune | DreamBooth | etc.>
Key hyperparameters:
- epochs: <...>
- batch size: <...>
- learning rate: <...>
- max sequence length / image size: <...>
- LoRA config (if applicable): r=<...>, alpha=<...>, dropout=<...>, target_modules=<...>
Compute:
- GPUs: <type/count>
- mixed precision: <fp16/bf16>
- training time: <...>
Reproducibility:
- seed(s): <...>
- training code:
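
When filling in the LoRA config fields above, it can help to report the resulting adapter size. In standard LoRA, each adapted linear layer with weight shape (d_out, d_in) gains r × (d_in + d_out) trainable parameters, and the low-rank update is scaled by alpha / r. The sketch below applies that arithmetic; the function names and the layer shapes are illustrative placeholders, not this model's actual dimensions.

```python
def lora_param_count(layer_shapes, r):
    """Trainable parameters added by LoRA: r * (d_in + d_out) per adapted layer."""
    return sum(r * (d_in + d_out) for d_out, d_in in layer_shapes)


def lora_scaling(alpha, r):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


# Illustrative only: two adapted 1024x1024 projections with r=8, alpha=16.
shapes = [(1024, 1024), (1024, 1024)]
print(lora_param_count(shapes, r=8))  # 2 layers * 8 * 2048 = 32768
print(lora_scaling(alpha=16, r=8))    # 2.0
```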

Evaluation

Provide at least one of the following:

Automatic metrics (BLEU/ROUGE/BERTScore, perplexity, FID/CLIP score, etc.)
Human evaluation (protocol + summary)
Task-based qualitative examples
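
Of the automatic metrics listed, perplexity is the simplest to compute by hand: it is the exponential of the mean negative log-likelihood per token. A minimal sketch follows, assuming per-token natural-log probabilities have already been obtained from the model; the function name and the sample values are illustrative.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p(token))) over the evaluated tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Sanity check: a model that assigns probability 0.25 to every token
# has a perplexity of about 4 (up to floating-point rounding).
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```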

Results

Task	Dataset	Metric	Score

Example outputs

Add a few short examples:

Prompt: “...”
Output: “...”

How to cite

If you used or built on prior work, add citations.

@misc{your_model_2025,
  title        = {<Model Name>},
  author       = {<Author/Org>},
  year         = {2025},
  howpublished = {\url{<REPO_URL>}},
}

Downloads last month: 4

Model tree for lumicero/Qwen2.5-bilingual-xlora

Base model: Qwen/Qwen2.5-0.5B
Finetuned: Qwen/Qwen2.5-0.5B-Instruct
Adapter: lumicero/Qwen2.5-bilingual-xlora (this model)