Qwen-Image-Edit-2511 LoRA - Scene -> Character Reference (reverse)
This LoRA is the reverse of CharSheet2Art: it takes a finished illustrated scene and extracts its character(s) as a clean character reference. Each character is rendered in a neutral standing pose on a plain light-gray studio background at eye level, preserving identity, art style, outfit and orientation, with the original scene and background removed.
Purpose: a moderation-free reference extractor for building synthetic reference->scene datasets without hitting hosted-model safety filters.
Prompt format (same as training):
Using Image 1, create a clean character reference of the character(s) in it: ... neutral pose on a plain light-gray studio background ...
Usage
import torch
from PIL import Image
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig

# Load the 2511 edit transformer, plus the text encoder and VAE from Qwen-Image.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16, device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image-Edit-2511", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=None,
    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
)
# Apply the LoRA weights to the DiT.
pipe.load_lora(pipe.dit, "checkpoints/epoch-4.safetensors")

scene = Image.open("scene.jpeg")
img = pipe(
    "Using Image 1, create a clean character reference of the character in it: "
    "the character standing in a neutral, relaxed pose on a plain light-gray "
    "studio background at eye level, preserving their exact identity, art style, "
    "outfit and the same head and body orientation and framing. Remove the original scene and background.",
    edit_image=[scene], seed=0, num_inference_steps=40,
    height=1536, width=1024, zero_cond_t=True,  # zero_cond_t REQUIRED for 2511
)
img.save("reference.png")  # output filename is only an example
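For dataset building, the single call above would typically be repeated over a folder of scenes. The following is a minimal sketch of such a loop, not part of the released code: `extract_references`, the `_ref.png` naming scheme, and the `load_image` hook are hypothetical names introduced here, and `pipe` is assumed to be the pipeline object built in the usage snippet.

```python
from pathlib import Path

# Prompt copied from the single-character usage example.
PROMPT = (
    "Using Image 1, create a clean character reference of the character in it: "
    "the character standing in a neutral, relaxed pose on a plain light-gray "
    "studio background at eye level, preserving their exact identity, art style, "
    "outfit and the same head and body orientation and framing. "
    "Remove the original scene and background."
)

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def extract_references(pipe, scene_dir, out_dir, load_image=None, **pipe_kwargs):
    """Run `pipe` on every image in `scene_dir` and save results to `out_dir`.

    `pipe_kwargs` (e.g. seed, num_inference_steps, height, width) are passed
    through unchanged; zero_cond_t=True is always set, as the 2511 base requires.
    """
    if load_image is None:
        from PIL import Image  # imported lazily so the helper stays easy to test
        load_image = Image.open

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for path in sorted(Path(scene_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        scene = load_image(path)
        img = pipe(PROMPT, edit_image=[scene], zero_cond_t=True, **pipe_kwargs)
        target = out_dir / f"{path.stem}_ref.png"  # hypothetical naming scheme
        img.save(target)
        written.append(target)
    return written
```

A call such as `extract_references(pipe, "scenes", "refs", seed=0, num_inference_steps=40, height=1536, width=1024)` would mirror the settings of the single-image example; the folder names are placeholders.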
Training config
| setting | value |
|---|---|
| base | Qwen/Qwen-Image-Edit-2511 (DiT only) |
| rank / lr | 64 / 5e-5 |
| epochs x steps | 5 x 513 (171 pairs, repeat 3) |
| resolution | dynamic, max_pixels 1048576 (native AR) |
| precision | bf16 + gradient checkpointing |
| special | --zero_cond_t (2511-specific, also required at inference) |
Full args: training_config.json
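To make the "dynamic, max_pixels 1048576 (native AR)" setting concrete, here is a rough sketch of how a size could be chosen that keeps the native aspect ratio while staying under the pixel budget. The function name `fit_to_max_pixels` and the rounding down to multiples of 16 are assumptions for illustration; the trainer's exact resizing rule may differ.

```python
import math


def fit_to_max_pixels(width, height, max_pixels=1048576, multiple=16):
    """Shrink (width, height) to at most `max_pixels`, keeping the aspect ratio.

    Images already under the budget are not upscaled. Both sides are rounded
    down to a multiple of `multiple` (assumed here, not taken from the config).
    """
    scale = min(1.0, math.sqrt(max_pixels / (width * height)))
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h
```

The resulting pair can be passed as `width` and `height` at inference when matching the training resolution is desired; the usage example instead uses a fixed 1024 x 1536 output, which is above this budget.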
Loss
| epoch | step | EMA loss min |
|---|---|---|
| 0 | 321 | 0.0359 |
| 1 | 824 | 0.0351 |
| 2 | 1443 | 0.0349 |
| 3 | 2043 | 0.0353 |
| 4 | 2122 | 0.0337 |
| 5 | 2654 | 0.0332 |
| 6 | 3564 | 0.0324 |
| 7 | 4033 | 0.0347 |
| 8 | 4281 | 0.0309 <- global min |
| 9 | 4920 | 0.0329 |
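The epoch column can be related to the step column with a little arithmetic: 171 pairs repeated 3 times give 513 steps per pass. The sketch below assumes one optimizer step per sample (which the table's epoch column is consistent with), copies the rows from the table above, and recovers the epoch index and the global minimum; the variable names are illustrative only.

```python
PAIRS, REPEAT = 171, 3
STEPS_PER_EPOCH = PAIRS * REPEAT  # 171 * 3 = 513

# (step, EMA loss min) rows copied from the loss table.
LOSS_ROWS = [
    (321, 0.0359), (824, 0.0351), (1443, 0.0349), (2043, 0.0353),
    (2122, 0.0337), (2654, 0.0332), (3564, 0.0324), (4033, 0.0347),
    (4281, 0.0309), (4920, 0.0329),
]


def epoch_of(step):
    """Zero-based epoch index for a global step, assuming 513 steps per epoch."""
    return step // STEPS_PER_EPOCH


# Row with the lowest EMA loss across all listed epochs.
best_step, best_loss = min(LOSS_ROWS, key=lambda row: row[1])

for step, loss in LOSS_ROWS:
    marker = " <- global min" if step == best_step else ""
    print(f"epoch {epoch_of(step)}  step {step}  loss {loss:.4f}{marker}")
```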
Validation samples
val_samples/ holds outputs for 8 held-out scenes (not seen in training) at each of the 5 checkpoints; each output is the reference extracted from its scene. Examples (epoch-4):
Dataset sample
One reverse pair in dataset_example/: input scene, target reference, and the prompt (pair.json).