Qwen-Image-Edit-2511 LoRA - Scene -> Character Reference (reverse)

The reverse of CharSheet2Art: it takes a finished illustrated scene and extracts its character(s) as a clean character reference. Each character is rendered in a neutral standing pose on a plain light-gray studio background at eye level, preserving identity, art style, outfit and orientation, with the scene and background removed.

Purpose: a moderation-free reference extractor for building synthetic reference->scene datasets without hitting hosted-model safety filters.

Prompt format (same as training): Using Image 1, create a clean character reference of the character(s) in it: ... neutral pose on a plain light-gray studio background ...

Usage

import torch
from PIL import Image
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig

# Load the 2511 edit transformer (DiT) plus the text encoder and VAE from Qwen-Image.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image-Edit-2511", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=None,
    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
)

# Apply the LoRA to the DiT only, matching the training setup.
pipe.load_lora(pipe.dit, "checkpoints/epoch-4.safetensors")

scene = Image.open("scene.jpeg")
prompt = (
    "Using Image 1, create a clean character reference of the character in it: "
    "the character standing in a neutral, relaxed pose on a plain light-gray "
    "studio background at eye level, preserving their exact identity, art style, "
    "outfit and the same head and body orientation and framing. Remove the original scene and background."
)
img = pipe(
    prompt,
    edit_image=[scene],
    seed=0,
    num_inference_steps=40,
    height=1536,
    width=1024,
    zero_cond_t=True,  # zero_cond_t REQUIRED for 2511
)
img.save("reference.png")  # output filename is arbitrary
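
Since the stated purpose is dataset building, the single-image call above can be wrapped in a loop. The following is a minimal sketch, not part of the released code: the helper name `extract_references`, the `open_image` parameter and the `_ref.png` naming scheme are all hypothetical. It assumes `pipe` behaves as in the usage snippet, i.e. it is called with a prompt plus keyword arguments and returns an object with a `save` method.

```python
from pathlib import Path


def extract_references(pipe, scene_paths, out_dir, prompt, open_image=None, **pipe_kwargs):
    """Run the extractor over several scenes and save one reference per scene.

    Hypothetical helper: `pipe` is assumed to be the pipeline from the usage
    snippet, with the LoRA already loaded.
    """
    if open_image is None:
        # Imported lazily so the helper itself has no hard Pillow dependency.
        from PIL import Image
        open_image = Image.open

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Defaults copied from the usage snippet; zero_cond_t is required for 2511.
    kwargs = dict(seed=0, num_inference_steps=40, zero_cond_t=True)
    kwargs.update(pipe_kwargs)  # e.g. height=1536, width=1024

    saved = []
    for path in map(Path, scene_paths):
        scene = open_image(path)
        reference = pipe(prompt, edit_image=[scene], **kwargs)
        target = out_dir / f"{path.stem}_ref.png"  # naming scheme is an assumption
        reference.save(target)
        saved.append(target)
    return saved
```

A call such as `extract_references(pipe, ["scene.jpeg"], "refs", prompt, height=1536, width=1024)` would then reproduce the single-image example and write `refs/scene_ref.png`.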

Training config


base	Qwen/Qwen-Image-Edit-2511 (DiT only)
rank / lr	64 / 5e-5
epochs x steps	5 x 513 (171 pairs, repeat 3)
resolution	dynamic, max_pixels 1048576 (native AR)
precision	bf16 + gradient checkpointing
special	`--zero_cond_t` (2511-specific, also required at inference)
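
To illustrate what "dynamic, max_pixels 1048576 (native AR)" means in practice, here is a rough sketch of a resize rule that keeps the native aspect ratio and caps the pixel count at 1048576 (1024 x 1024). The function name `fit_to_max_pixels` and the rounding to multiples of 16 are assumptions for illustration; the exact rule used by the training script may differ.

```python
def fit_to_max_pixels(width, height, max_pixels=1048576, multiple=16):
    """Shrink (width, height) to at most max_pixels, keeping the aspect ratio."""
    if width * height > max_pixels:
        # Scale both sides by the same factor so the area drops to max_pixels.
        scale = (width * height / max_pixels) ** 0.5
        width, height = int(width / scale), int(height / scale)
    # Round down so both sides are divisible by `multiple` (assumed value).
    width = width // multiple * multiple
    height = height // multiple * multiple
    return width, height


print(fit_to_max_pixels(2048, 2048))  # (1024, 1024)
print(fit_to_max_pixels(512, 768))    # (512, 768): already under the cap
```

Images under the cap keep their size apart from the rounding; larger ones are scaled down uniformly rather than cropped to a fixed shape.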

Full args: training_config.json

Loss

epoch	step	EMA loss min
0	321	0.0359
1	824	0.0351
2	1443	0.0349
3	2043	0.0353
4	2122	0.0337
5	2654	0.0332
6	3564	0.0324
7	4033	0.0347
8	4281	0.0309 <- global min
9	4920	0.0329
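
The step column can be cross-checked against the epoch column. The sketch below assumes one training pair per optimizer step, so that an epoch is 171 pairs x 3 repeats = 513 steps, and that steps are counted from 0; both are assumptions, not something the training config states explicitly. The rows are copied from the table above.

```python
STEPS_PER_EPOCH = 171 * 3  # 171 pairs, repeat 3; assumes one pair per step

# (epoch, step, EMA loss min) rows copied from the Loss table.
LOSS_ROWS = [
    (0, 321, 0.0359),
    (1, 824, 0.0351),
    (2, 1443, 0.0349),
    (3, 2043, 0.0353),
    (4, 2122, 0.0337),
    (5, 2654, 0.0332),
    (6, 3564, 0.0324),
    (7, 4033, 0.0347),
    (8, 4281, 0.0309),
    (9, 4920, 0.0329),
]


def epoch_of(step, steps_per_epoch=STEPS_PER_EPOCH):
    """Map a global step to its 0-based epoch index."""
    return step // steps_per_epoch


# Every listed step falls inside the epoch it is reported under.
consistent = all(epoch_of(step) == epoch for epoch, step, _ in LOSS_ROWS)

# Row with the lowest EMA loss, i.e. the global minimum.
best = min(LOSS_ROWS, key=lambda row: row[2])

print(consistent)  # True
print(best)        # (8, 4281, 0.0309)
```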

Validation samples

val_samples/ contains 8 held-out scenes x 5 checkpoints. The scenes were not seen in training, and each sample is the reference extracted from one scene. Examples (epoch-4): see val_samples/.

Dataset sample

One reverse pair in dataset_example/: input scene, target reference, and the prompt (pair.json).
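
For readers assembling their own pairs, the following sketch writes a record in the spirit of pair.json. The field names `scene`, `reference` and `prompt` and the helper `write_pair` are hypothetical; the actual schema is whatever dataset_example/pair.json uses, so check that file before relying on this layout.

```python
import json
from pathlib import Path


def write_pair(pair_dir, scene_file, reference_file, prompt):
    """Write a pair.json describing one scene -> reference training pair.

    Field names are illustrative assumptions, not the released schema.
    """
    pair_dir = Path(pair_dir)
    pair_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "scene": scene_file,          # input: finished illustrated scene
        "reference": reference_file,  # target: clean character reference
        "prompt": prompt,             # instruction used for this pair
    }
    path = pair_dir / "pair.json"
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```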
