VIBE-Image-Edit

@@ -1,302 +1,9 @@
 ---
-language:
-- en
-pipeline_tag: image-to-image
-tags:
-- image-editing
-- text-guided-editing
-- diffusion
-- sana
-- qwen-vl
-- multimodal
 base_model:
-- Efficient-Large-Model/SANA1.5_1.6B_1024px
-- Qwen/Qwen3-VL-2B-Instruct
 library_name: diffusers
 ---
-# VIBE: Visual Instruction Based Editor
-<div align="center">
-  <img src="VIBE.png" width="800" alt="VIBE"/>
-</div>
-<p style="text-align: center;">
-  <div align="center">
-  </div>
-  <p align="center">
-  <a href="https://riko0.github.io/VIBE"> 🌐 Project Page </a> |
-  <a href="https://arxiv.org/abs/2601.02242"> 📜 Paper on arXiv </a> |
-  <a href="https://github.com/ai-forever/vibe"> Github </a> |
-  <a href="https://huggingface.co/spaces/iitolstykh/VIBE-Image-Edit-DEMO">🤗 Space</a> |
-  <a href="https://huggingface.co/iitolstykh/VIBE-Image-Edit-DistilledCFG">🤗 VIBE-Image-Edit-DistilledCFG</a>
-</p>
-**VIBE** is a powerful open-source framework for text-guided image editing. It leverages the efficiency of the [Sana1.5-1.6B](https://github.com/NVlabs/Sana) diffusion model and the visual understanding capabilities of [Qwen3-VL-2B-Instruct](https://github.com/QwenLM/Qwen3-VL) to provide **exceptionally fast** and high-quality, instruction-based image manipulation.
-We also provide a faster, **CFG-distilled** version of this model available at [VIBE-Image-Edit-DistilledCFG](https://huggingface.co/iitolstykh/VIBE-Image-Edit-DistilledCFG).
-## Model Details
-- **Name:** VIBE
-- **Task:** Text-Guided Image Editing
-- **Architecture:**
-  - **Diffusion Backbone:** Sana1.5 (1.6B parameters) with Linear Attention.
-  - **Condition Encoder:** Qwen3-VL (2B parameters) for multimodal understanding.
-- **Framework:** Built on `diffusers` and `transformers`.
-- **Model precision**: torch.bfloat16 (BF16)
-- **Model resolution**: This model is designed to edit images up to 2048px, with multi-scale height and width.
-## Features
-- **Text-Guided Editing:** Edit images using natural language instructions (e.g., "Add a cat on the sofa").
-- **Compact & Efficient:** Combines a 1.6B parameter diffusion model with a 2B parameter encoder for a lightweight footprint.
-- **High-Speed Inference:** Utilizes Sana1.5's linear attention mechanism for rapid generation.
-- **Multimodal Understanding:** Qwen3-VL ensures strong alignment between visual content and text instructions.
-- **Text-to-Image** support.
-## Inference Requirements
-- `vibe` library
-```bash
-pip install git+https://github.com/ai-forever/VIBE
-```
-- requirements for `vibe` library:
-```bash
-pip install transformers==4.57.1 torchvision==0.21.0 torch==2.6.0 diffusers==0.33.1 loguru==0.7.3
-```
-## Quick start
-```python
-from PIL import Image
-import requests
-from io import BytesIO
-from huggingface_hub import snapshot_download
-from vibe.editor import ImageEditor
-# Download model
-model_path = snapshot_download(
-    repo_id="iitolstykh/VIBE-Image-Edit",
-    repo_type="model",
-)
-# Load model
-editor = ImageEditor(
-    checkpoint_path=model_path,
-    image_guidance_scale=1.2,
-    guidance_scale=4.5,
-    num_inference_steps=20,
-    device="cuda:0",
-)
-# Download test image
-resp = requests.get('https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/3f58a82a-b4b4-40c3-a318-43f9350fcd02/original=true,quality=90/115610275.jpeg')
-image = Image.open(BytesIO(resp.content))
-# Generate edited image
-edited_image = editor.generate_edited_image(
-    instruction="let this case swim in the river",
-    conditioning_image=image,
-    num_images_per_prompt=1,
-)[0]
-edited_image.save("edited_image.jpg", quality=100)
-```
-## T2I Examples
-<details open>
-<summary>(<b>Seed:</b> 234) <b>Prompt:</b> View through the clouds at Earth from a plane</summary>
-![Image 1](images/other/1.png)
-</details>
-<details open>
-<summary>(<b>Seed:</b> 2) <b>Prompt:</b> Medieval castle at sunset surrounded by dense forest and mist</summary>
-![Image 7](images/other/4.png)
-</details>
-<details open>
-<summary>(<b>Seed:</b> 666) <b>Prompt:</b> Portrait of an old wise man with a long white beard surrounded by books and candles</summary>
-![Image 4](images/other/8.png)
-</details>
-<details>
-<summary>(<b>Seed:</b> 9513) <b>Prompt:</b> Night urban street with wet asphalt reflections and neon signs</summary>
-![Image 5](images/other/9.png)
-</details>
-<details>
-<summary>(<b>Seed:</b> 142) <b>Prompt:</b> Futuristic sports car racing in the desert</summary>
-![Image 2](images/other/10.png)
-</details>
-<details>
-<summary>(<b>Seed:</b> 1325) <b>Prompt:</b> Pirate boat in ocean</summary>
-![Image 3](images/other/2.png)
-</details>
-<details>
-<summary>(<b>Seed:</b> 4241) <b>Prompt:</b> Davy Jones portrait</summary>
-![Image 6](images/other/3.png)
-</details>
-<details>
-<summary>(<b>Seed:</b> 142) <b>Prompt:</b> Epic cosmic scene with a huge space station and distant stars</summary>
-![Image 8](images/other/5.png)
-</details>
-<details>
-<summary>(<b>Seed:</b> 42) <b>Prompt:</b> Cherry blossom park in spring with petals falling to the ground</summary>
-![Image 9](images/other/6.png)
-</details>
-## Comparison with SANA1.5_1.6B_1024px
-**Prompt:** Generate an interior of a rustic cabin workshop during winter evening. The viewpoint is from the doorway, showing a workbench with tools, wood shavings on the floor, and a cast-iron stove glowing softly. Place shelves with jars of nails, coils of rope, and folded blankets. Through a small window, show snow falling and pine trees in the twilight. Add warm lamplight creating soft gradients and a gentle vignette. Include a person in a thick sweater sanding a wooden object at the bench, but keep the person small in frame
-<div style="display: flex; gap: 24px; justify-content: center; align-items: flex-start;">
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/vibe/image_3.png" alt="VIBE" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>VIBE (Seed: 4411)</div>
-  </div>
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/sana/image_3.png" alt="SANA1.5_1.6B_1024px" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>SANA1.5_1.6B_1024px (Seed: 1521)</div>
-  </div>
-</div>
----
-**Prompt:** Generate an ancient jungle temple ruin partially covered in moss and vines, with a waterfall cascading nearby into a shallow pool. Show broken stone steps, carved patterns that are abstract, and damp surfaces with realistic moss detail. Add mist, shafts of sunlight through leaves, and small floating insects. Include a human explorer in the mid-ground, small in frame, wearing a backpack. Lush, cinematic realism.
-<div style="display: flex; gap: 24px; justify-content: center; align-items: flex-start;">
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/vibe/image_4.png" alt="VIBE" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>VIBE (Seed: 1995)</div>
-  </div>
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/sana/image_4.png" alt="SANA1.5_1.6B_1024px" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>SANA1.5_1.6B_1024px (Seed: 9842)</div>
-  </div>
-</div>
----
-**Prompt:** Create a science-fiction interior of a space greenhouse module with hydroponic racks, glowing grow lights, and condensation on transparent walls. Plants include leafy greens and flowering specimens. Tools and tablets have UI elements. Add soft floating dust or microgravity droplets. Clean, detailed, plausible sci-fi aesthetic.
-<div style="display: flex; gap: 24px; justify-content: center; align-items: flex-start;">
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/vibe/image_5.png" alt="VIBE" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>VIBE (Seed: 2203)</div>
-  </div>
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/sana/image_5.png" alt="SANA1.5_1.6B_1024px" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>SANA1.5_1.6B_1024px (Seed: 143)</div>
-  </div>
-</div>
----
-**Prompt:** Beautiful tropical beach with guinea pig swimming in the water and human drinking wine
-<div style="display: flex; gap: 24px; justify-content: center; align-items: flex-start;">
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/vibe/image_6.png" alt="VIBE" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>VIBE (Seed: 132142)</div>
-  </div>
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/sana/image_6.png" alt="SANA1.5_1.6B_1024px" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>SANA1.5_1.6B_1024px (Seed: 132142)</div>
-  </div>
-</div>
----
-**Prompt:** Create a cinematic, rainy night scene in a narrow backstreet of an old downtown area. The camera is at street level, slightly tilted upward, emphasizing wet cobblestones reflecting neon-like colored lights without readable text. Show a small ramen stall with steam rising from pots, hanging paper lanterns that are blank or patterned (no letters), and a couple of stools under a simple awning. Add puddles, scattered trash like crumpled paper, and subtle mist. Include a passerby in the mid-ground seen from behind wearing a hooded jacket and carrying an umbrella, face not visible. Use a moody color palette of deep blues and warm oranges, with soft bokeh highlights and realistic rain streaks
-<div style="display: flex; gap: 24px; justify-content: center; align-items: flex-start;">
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/vibe/image_2.png" alt="VIBE" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>VIBE (Seed: 1003)</div>
-  </div>
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/sana/image_2.png" alt="SANA1.5_1.6B_1024px" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>SANA1.5_1.6B_1024px (Seed: 3114)</div>
-  </div>
-</div>
----
-**Prompt:** Depict a volcanic lava field at twilight with cooled black rock, glowing cracks of magma in the distance, and heat shimmer. The sky is darkening with faint stars emerging. Add thin smoke plumes and red-orange reflections on nearby rocks. Cinematic realism, dramatic contrast
-<div style="display: flex; gap: 24px; justify-content: center; align-items: flex-start;">
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/vibe/image_7.png" alt="VIBE" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>VIBE (Seed: 1520)</div>
-  </div>
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/sana/image_7.png" alt="SANA1.5_1.6B_1024px" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>SANA1.5_1.6B_1024px (Seed: 1267)</div>
-  </div>
-</div>
----
-**Prompt:** Portrait from back of a young woman dressed in Victorian attire standing in an ancient library filled with mirrors and stained glass windows, softly illuminated by sunlight streaming through
-<div style="display: flex; gap: 24px; justify-content: center; align-items: flex-start;">
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/vibe/image_1.png" alt="VIBE" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>VIBE (Seed: 4152)</div>
-  </div>
-  <div style="text-align: center; flex: 1; min-width: 0;">
-    <img src="images/sana/image_1.png" alt="SANA1.5_1.6B_1024px" style="width: 100%; max-width: 500px; height: auto; display: block; margin: 0 auto 8px auto;">
-    <div>SANA1.5_1.6B_1024px (Seed: 6742)</div>
-  </div>
-</div>
-## License
-This project is built upon SANA. Please refer to the original SANA license for usage terms:
-[SANA License](https://huggingface.co/Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers/blob/main/LICENSE.txt)
-## Citation
-If you use this model in your research or applications, please acknowledge the original projects:
-- [SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer](https://github.com/NVlabs/Sana)
-- [Qwen3-VL](https://github.com/QwenLM/Qwen3-VL)
-```bibtex
-@misc{vibe2026,
-  Author = {Grigorii Alekseenko and Aleksandr Gordeev and Irina Tolstykh and Bulat Suleimanov and Vladimir Dokholyan and Georgii Fedorov and Sergey Yakubson and Aleksandra Tsybina and Mikhail Chernyshov and Maksim Kuprashevich},
-  Title = {VIBE: Visual Instruction Based Editor},
-  Year = {2026},
-  Eprint = {arXiv:2601.02242},
-}
-```

 ---
+license: other
 base_model:
+- iitolstykh/VIBE-Image-Edit
+pipeline_tag: image-to-image
 library_name: diffusers
 ---
+Modified copy of [iitolstykh/VIBE-Image-Edit](https://huggingface.co/iitolstykh/VIBE-Image-Edit) that removes unnecessary references to custom code so the model can be used cleanly in SD.Next.
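
The metadata block that remains after this change (the `+` lines above) is small enough to check by hand, but it can also be verified programmatically. The following is a minimal sketch, assuming only the flat `key: value` pairs and `- item` lists that appear in this card. The helper name `parse_front_matter` is hypothetical, and this is not a general YAML parser; a real YAML library would be the safer choice for anything more complex.

```python
def parse_front_matter(text):
    """Parse a tiny subset of YAML front matter.

    Handles only `key: value` lines and `key:` followed by `- item` lines,
    which is all the model card in this diff uses (an assumption of this sketch).
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must start with '---'")

    meta = {}
    current_key = None  # key whose list is currently being filled, if any
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter: stop reading metadata
        if stripped.startswith("- ") and current_key is not None:
            # list item belonging to the most recent `key:` line
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value
                current_key = None
            else:
                meta[key] = []
                current_key = key
    return meta


# Front matter as it reads after the change, copied from the `+` and context lines.
CARD = """---
license: other
base_model:
- iitolstykh/VIBE-Image-Edit
pipeline_tag: image-to-image
library_name: diffusers
---
"""

meta = parse_front_matter(CARD)
print(meta)
```

A check like this makes it easy to confirm that `base_model` now points at the original VIBE repository rather than at the SANA and Qwen3-VL checkpoints listed in the removed card.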