Qwen2.5-VL floor plan GRPO adapter (stage 2)
Hub: mudasir13cs/qwen25-vl-3b-floorplan-grpo
Improved using Qwen — LoRA adapter continuing from mudasir13cs/qwen25-vl-3b-floorplan-sft; trained with GRPO and geometric rewards. Base checkpoint: Qwen2.5-VL-3B-Instruct (LICENSE).
Intended for non-commercial / research use, consistent with CubiCasa5K (CC BY‑NC 4.0) and the Qwen research license.
Paper & upstream training material
- Method: FloorplanVLM (arXiv:2602.06507)
- Original Hub collection: manitocross/floorplan-vlm-training — readme, CubiCasa5K wiring, GRPO overview, JSON schema.
- GRPO recipe (canonical source files on Hub): train_floorplan_grpo.py, with reward R = 0.1·R_val + 0.5·R_ext + α·0.4·R_int; SFT_MODEL_ID, HUB_MODEL_ID, and OUTPUT_DIR are edited at the top of that script.
- SFT stage (prompts, dataset → JSON targets): train_floorplan_vlm.py.
If you are working inside a clone of your training repo, the same files may exist locally beside this folder (relative paths).
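To make the weighting in the reward formula above concrete, here is a minimal sketch of the weighted sum only. It is not taken from train_floorplan_grpo.py: the function name `combined_reward` is hypothetical, the three component scores are assumed to be plain floats computed elsewhere, and since this card does not define α, the default `alpha=1.0` is an assumption.

```python
def combined_reward(r_val: float, r_ext: float, r_int: float, alpha: float = 1.0) -> float:
    """Weighted sum R = 0.1*R_val + 0.5*R_ext + alpha*0.4*R_int.

    alpha is left as a free parameter; its schedule or meaning is an
    assumption here, not something stated in the model card.
    """
    return 0.1 * r_val + 0.5 * r_ext + alpha * 0.4 * r_int


# With every component at 1.0 and alpha = 1.0, the weights add up to 1.0.
full = combined_reward(1.0, 1.0, 1.0)

# With alpha = 0.0 the R_int term drops out, leaving 0.1 + 0.5.
no_interior = combined_reward(1.0, 1.0, 1.0, alpha=0.0)
```

Under these assumptions R_ext carries half of the total weight, so a completion that scores well only on R_val can reach at most 0.1.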
Quick install
pip install torch torchvision transformers trl peft accelerate pillow
(Training also uses Shapely, datasets, numpy; inference does not strictly need Shapely.)
Loading the adapter
Use Hub IDs (recommended):
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
BASE = "Qwen/Qwen2.5-VL-3B-Instruct"
ADAPTER = "mudasir13cs/qwen25-vl-3b-floorplan-grpo"
processor = AutoProcessor.from_pretrained(BASE)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    BASE, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()
If you cloned weights into this folder locally: set ADAPTER = "./floorplan-vlm-grpo" (or an absolute path) instead of the Hub repo id.
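One way to switch between the local folder and the Hub repo id without editing the script is sketched below. The helper name `resolve_adapter` is hypothetical; the check relies on PEFT saving an `adapter_config.json` file next to the adapter weights, and it assumes the local folder is named as in the sentence above.

```python
from pathlib import Path

HUB_ADAPTER = "mudasir13cs/qwen25-vl-3b-floorplan-grpo"


def resolve_adapter(local_dir: str = "./floorplan-vlm-grpo", hub_id: str = HUB_ADAPTER) -> str:
    """Return local_dir if it looks like a saved PEFT adapter, otherwise hub_id."""
    # A saved PEFT adapter directory contains adapter_config.json.
    if (Path(local_dir) / "adapter_config.json").is_file():
        return local_dir
    return hub_id


# Falls back to the Hub repo id when no local copy is present.
ADAPTER = resolve_adapter()
```

The returned string can be passed to `PeftModel.from_pretrained` in place of the hard-coded `ADAPTER` value.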
Using the model (inference)
For best alignment with training, reuse the system prompt that includes the full JSON schema from stage 1 (see SYSTEM_PROMPT in train_floorplan_vlm.py). GRPO training uses a shorter system string in train_floorplan_grpo.py; either works, but the schema-explicit SFT prompt below usually yields more stable JSON.
User text (same as both scripts): “Vectorize this floor plan into structured JSON with all walls, doors, windows, and rooms.”
Minimal pattern (aligned with the inference test in train_floorplan_vlm.py):
import json, re, torch
from PIL import Image
# Same strings as train_floorplan_vlm.py (schema-in-the-prompt; recommended for decoding).
SYSTEM_PROMPT = (
    "You are a floor plan vectorization expert. Extract wall, door, window geometry "
    "from floor plan images into structured JSON.\n\n"
    "Output ONLY valid JSON with this schema:\n"
    '{"walls":[{"id":"wall_N","start":[x,y],"end":[x,y],"thickness":T,"curvature":0,'
    '"openings":[{"type":"door"|"window","center":D,"width":W}]}],'
    '"rooms":[{"label":"room_type","walls":["wall_N",...]}]}\n\n'
    "Coordinates normalized so longer image edge = 1024."
)
USER_PROMPT = "Vectorize this floor plan into structured JSON with all walls, doors, windows, and rooms."
image = Image.open("plan.png").convert("RGB")
messages = [
    {"role": "system", "content": [{"type": "text", "text": SYSTEM_PROMPT}]},
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": USER_PROMPT}]},
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True)
inputs = {k: v.to(model.device) if hasattr(v, "to") else v for k, v in inputs.items()}
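with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
# `inputs` is a plain dict after the device move above, so index it by key
# (attribute access such as inputs.input_ids would raise AttributeError).
raw = processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]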
m = re.search(r"\{[\s\S]*\}", raw)
plan = json.loads(m.group()) if m else None
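The system prompt states that coordinates are normalized so that the longer image edge equals 1024. To draw predictions over the original image, they need to be scaled back. The sketch below assumes this is a uniform scale with the origin unchanged, which is an interpretation of the prompt text rather than something confirmed by the training scripts; the helper names `to_pixels` and `walls_to_pixels` are hypothetical.

```python
def to_pixels(point, image_size, norm_edge=1024.0):
    """Map a normalized [x, y] point back to pixel coordinates.

    image_size is (width, height), as returned by PIL's Image.size.
    Assumes uniform scaling by longer_edge / norm_edge.
    """
    width, height = image_size
    scale = max(width, height) / norm_edge
    return (point[0] * scale, point[1] * scale)


def walls_to_pixels(plan, image_size):
    """Return copies of the walls with start/end converted to pixels."""
    return [
        {
            **wall,
            "start": to_pixels(wall["start"], image_size),
            "end": to_pixels(wall["end"], image_size),
        }
        for wall in (plan or {}).get("walls", [])
    ]
```

With the variables from the snippet above, this would be called as `walls_to_pixels(plan, image.size)`. Thickness and opening fields are left untouched because this card does not spell out their units.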
Output shape: top-level walls (with optional openings) and rooms. Example JSON shape is spelled out under Output JSON Schema in the manitocross training README.
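Because generation can still produce incomplete or inconsistent JSON, a light structural check before downstream use may help. The following is a minimal sketch based only on the schema shown in the system prompt; `check_plan` is a hypothetical helper, not part of the training scripts, and the sample values (including the "kitchen" label) are made up for illustration.

```python
def check_plan(plan):
    """Return a list of problems found in a parsed plan (empty list if none)."""
    if not isinstance(plan, dict):
        return ["plan is not a JSON object"]
    problems = []
    walls = plan.get("walls")
    rooms = plan.get("rooms")
    if not isinstance(walls, list):
        problems.append("missing or non-list 'walls'")
        walls = []
    if not isinstance(rooms, list):
        problems.append("missing or non-list 'rooms'")
        rooms = []

    wall_ids = set()
    for i, wall in enumerate(walls):
        if not isinstance(wall, dict):
            problems.append(f"walls[{i}] is not an object")
            continue
        if isinstance(wall.get("id"), str):
            wall_ids.add(wall["id"])
        else:
            problems.append(f"walls[{i}] has no string id")
        for key in ("start", "end"):
            pt = wall.get(key)
            if not (isinstance(pt, list) and len(pt) == 2):
                problems.append(f"walls[{i}].{key} is not an [x, y] pair")

    # Every wall id listed by a room should exist in 'walls'.
    for j, room in enumerate(rooms):
        refs = room.get("walls") if isinstance(room, dict) else None
        if not isinstance(refs, list):
            problems.append(f"rooms[{j}] has no 'walls' list")
            continue
        for ref in refs:
            if ref not in wall_ids:
                problems.append(f"rooms[{j}] references unknown wall {ref!r}")
    return problems


# Illustrative values only; not real model output.
sample = {
    "walls": [{"id": "wall_1", "start": [0, 0], "end": [1024, 0],
               "thickness": 8, "curvature": 0, "openings": []}],
    "rooms": [{"label": "kitchen", "walls": ["wall_1"]}],
}
problems = check_plan(sample)
```

An empty list means the checks passed; it does not say anything about geometric accuracy.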
Reproducing stage 2
- Finish or download stage 1: mudasir13cs/qwen25-vl-3b-floorplan-sft.
- Run train_floorplan_grpo.py from a checkout with the environment described there (CubiCasa5K under ./cubicasa_data, huggingface-cli login if PUSH_TO_HUB = True; config block at top of file).
Citation
@article{floorplanvlm2026,
title={FloorplanVLM: A Vision-Language Model for Floorplan Vectorization},
journal={arXiv preprint arXiv:2602.06507},
year={2026}
}
Acknowledgments
- FloorplanVLM (arXiv:2602.06507)
- CubiCasa5K (arXiv:1904.01920)
- Qwen2.5-VL-3B-Instruct
- Upstream training reference: manitocross/floorplan-vlm-training
- Stage 1 adapter: mudasir13cs/qwen25-vl-3b-floorplan-sft
Author / contact
Mudasir — multimodal AI, VLM fine-tuning, retrieval/RAG research, and engineering; MS AI Convergence, 숭실대학교 — Soongsil University, Seoul. More credentials, publications, and projects: mudasir13cs.github.io
- Hugging Face: @mudasir13cs
- GitHub: @mudasir13cs
- Email: mudasir13cs@gmail.com