nielsr (HF Staff) committed
Commit 0416d72 · verified · 1 parent: b4c7783

Improve model card: Update pipeline tag, library name, add usage example and specific code link

This PR improves the model card for `BAGEL-NHR-Edit` by:
* Updating the `pipeline_tag` from `any-to-any` to `image-to-image` for more accurate categorization and discoverability on the Hub.
* Setting the `library_name` to `transformers` as the model is compatible with the Hugging Face Transformers library (based on Qwen2 architecture).
* Adding `image-editing` and `text-guided-image-editing` to the `tags` metadata for better model discoverability.
* Including a direct link to the specific NHR-Edit GitHub repository (`https://github.com/Riko0/No-Humans-Required-Dataset`) in the header links.
* Adding a comprehensive Python code snippet demonstrating how to use the model for inference with the `transformers` library.
* Adding the Hugging Face Papers link for the paper alongside the existing arXiv link in the header for easy access.
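
The metadata changes listed above can be sanity-checked with a small script. The following is a minimal sketch, assuming the front matter uses only flat `key: value` pairs and `- item` lists. `parse_front_matter`, `check_metadata`, and the `EXPECTED` constants are hypothetical helpers written for illustration; they are not part of this PR or of any Hugging Face library, and a real YAML parser would be the more robust choice.

```python
# Hypothetical check of the README front matter proposed in this PR.
# Standard library only; handles just the flat YAML subset used in this card.

EXPECTED = {
    "library_name": "transformers",
    "license": "apache-2.0",
    "pipeline_tag": "image-to-image",
}
EXPECTED_TAGS = {"image-editing", "text-guided-image-editing"}

NEW_FRONT_MATTER = """\
---
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
library_name: transformers
license: apache-2.0
pipeline_tag: image-to-image
arxiv: 2507.14119
tags:
- image-editing
- text-guided-image-editing
---
"""


def parse_front_matter(text):
    """Parse 'key: value' pairs and '- item' lists between two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must start with '---'")
    data = {}
    list_key = None  # key that is currently collecting '- item' entries
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return data
        if not stripped:
            continue
        if stripped.startswith("- "):
            if list_key is None:
                raise ValueError(f"list item without a key: {line!r}")
            data[list_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        key, value = key.strip(), value.strip()
        if value:
            data[key] = value
            list_key = None
        else:
            data[key] = []
            list_key = key
    raise ValueError("front matter is not closed by '---'")


def check_metadata(meta):
    """Return a list of mismatches against the values named in the PR text."""
    problems = []
    for key, want in EXPECTED.items():
        got = meta.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    missing = sorted(EXPECTED_TAGS - set(meta.get("tags", [])))
    if missing:
        problems.append(f"tags: missing {missing}")
    return problems


if __name__ == "__main__":
    meta = parse_front_matter(NEW_FRONT_MATTER)
    for line in check_metadata(meta) or ["front matter matches the PR description"]:
        print(line)
```

Running the same check against the previous front matter (with `pipeline_tag: any-to-any` and `library_name: bagel-mot`) would report those two keys and the missing tags as mismatches.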

Please review and merge this pull request if the changes align with the repository's guidelines.

Files changed (1): README.md (+70 -10)
@@ -1,29 +1,89 @@
 ---
-license: apache-2.0
 base_model:
 - ByteDance-Seed/BAGEL-7B-MoT
-pipeline_tag: any-to-any
-library_name: bagel-mot
+library_name: transformers
+license: apache-2.0
+pipeline_tag: image-to-image
 arxiv: 2507.14119
+tags:
+- image-editing
+- text-guided-image-editing
 ---
 
-
 # 🥯 BAGEL-NHR-Edit
 
 <p align="left">
-<a href="https://riko0.github.io/No-Humans-Required/"> 🌐 NHR Website </a> |
-<a href="https://arxiv.org/abs/2507.14119"> 📜 NHR Paper on arXiv </a> |
-<a href="https://huggingface.co/datasets/iitolstykh/NHR-Edit"> 🤗 NHR-Edit Dataset </a> |
+<a href="https://riko0.github.io/No-Humans-Required/"> 🌐 NHR Website </a> |
+<a href="https://huggingface.co/papers/2507.14119"> 📜 NHR Paper </a> |
+<a href="https://arxiv.org/abs/2507.14119"> 📜 NHR Paper on arXiv </a> |
+<a href="https://huggingface.co/datasets/iitolstykh/NHR-Edit"> 🤗 NHR-Edit Dataset </a> |
+<a href="https://github.com/Riko0/No-Humans-Required-Dataset"> 💻 NHR-Edit Code </a>
 </p>
 
-This repository hosts the model weights for **BAGEL**, fine-tuned on the **[NHR-Edit](https://huggingface.co/datasets/iitolstykh/NHR-Edit)** dataset. For installation, usage instructions, and further documentation, please visit the [official BAGEL GitHub repository](https://github.com/bytedance-seed/BAGEL).
+This repository hosts the model weights for **BAGEL-NHR-Edit**, a **BAGEL** model fine-tuned on the **[NHR-Edit](https://huggingface.co/datasets/iitolstykh/NHR-Edit)** dataset, as presented in the paper [NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining](https://huggingface.co/papers/2507.14119). For installation, usage instructions, and further documentation specific to NHR-Edit, please visit the [NHR-Edit GitHub repository](https://github.com/Riko0/No-Humans-Required-Dataset).
+
+### 🚀 Sample Usage
+
+You can use the model with the Hugging Face `transformers` library.
+
+```python
+from transformers import AutoProcessor, AutoModelForCausalLM
+import torch
+from PIL import Image
+import requests # Required to load image from URL
+
+# Load model and processor
+# It's recommended to use bfloat16 for better performance and memory efficiency if supported.
+model = AutoModelForCausalLM.from_pretrained(
+    "iitolstykh/BAGEL-NHR-Edit",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True # Needed for custom model architecture/components
+)
+processor = AutoProcessor.from_pretrained("iitolstykh/BAGEL-NHR-Edit")
+
+# Example image input (replace with your actual image path or PIL Image object)
+image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
+image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
+
+# Example text instruction for editing
+instruction = "Change the car's color to red."
+
+# Prepare the conversation in the model's chat template format
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image", "image": image},
+            {"type": "text", "text": instruction},
+        ],
+    }
+]
+
+# Apply chat template and prepare inputs for the model
+text = processor.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+inputs = processor(text=[text], images=[image], return_tensors="pt")
+inputs = {k: v.to(model.device) for k, v in inputs.items()}
 
+# Generate the textual output describing the image edit
+# This output typically includes bounding box or quad information for the edited regions.
+generated_ids = model.generate(**inputs, max_new_tokens=1024)
+generated_text = processor.batch_decode(generated_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
+
+print("Generated Textual Description of Edit:")
+print(generated_text)
+
+# To visualize the actual image edit, further post-processing (e.g., using a diffusion model conditioned on this output)
+# would be required, which is beyond the scope of this basic inference example.
+```
 
 ### 🛠️ Training Setup
 
 We performed parameter-efficient adaptation on the generation expert’s attention and FFN projection layers using LoRA.
 
-LoRA parameters:
+LoRA parameters:
 ```
 r = 16
 lora_alpha = 16
@@ -83,4 +143,4 @@ Results comparison between original Bagel-7B-MoT and BAGEL-NHR-EDIT on samples f
 url = {https://arxiv.org/abs/2507.14119},
 journal={arXiv preprint arXiv:2507.14119}
 }
-```
+```