arxiv:2605.01296

SIFT-VTON: Geometric Correspondence Supervision on Cross-Attention for Virtual Try-On

Published on May 2

Authors:

Abstract

SIFT-VTON enhances diffusion-based virtual try-on by using SIFT keypoint matching to provide explicit geometric guidance, improving detail preservation and spatial alignment.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Diffusion-based virtual try-on methods achieve photorealistic synthesis through cross-attention mechanisms that transfer garment features to target body regions. However, these approaches rely on implicit learning of spatial correspondences, struggling to preserve fine details such as text and illustrations. We propose a novel approach, which we call SIFT-VTON, that utilizes SIFT keypoint matching to provide explicit geometric guidance for diffusion-based virtual try-on. Our method applies domain-specific filtering to SIFT keypoint matches between garment and person images, then converts these correspondences into spatial probability distributions that supervise cross-attention layers during training. This explicit supervision guides the model to learn precise spatial alignment, concentrating attention on geometrically consistent garment regions. Experiments on the VITON-HD dataset demonstrate significant improvements on unpaired metrics while maintaining competitive paired reconstruction metrics. Qualitative comparisons show superior preservation of text clarity and pattern alignment. Attention visualizations confirm that our method produces sharply focused attention on relevant garment details. This work demonstrates that classical geometric correspondence methods can effectively enhance modern diffusion models for conditional synthesis tasks. The source code will be available at https://github.com/takesukeDS/SIFT-VTON.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2605.01296

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2605.01296 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.01296 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.