---
datasets:
- Ar4ikov/civitai-sd-337k
language:
- en
pipeline_tag: image-to-text
base_model: Salesforce/blip-image-captioning-base
---
# Licence
The license is inherited from [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base).
# Overview
`ifmain/blip-image2promt-stable-diffusion-base` is a model based on [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base), trained on the [Ar4ikov/civitai-sd-337k](https://huggingface.co/datasets/Ar4ikov/civitai-sd-337k) dataset (2K images). This model is designed to generate text descriptions of images in the style of prompts for use with Stable Diffusion models.
The model was trained with my BLIP training code: [BLIP-Easy-Trainer](https://github.com/ifmain/BLIP-Easy-Trainer)
# Example Usage
```python
import re

import requests
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration


def prepare(text):
    """Clean up the decoded caption so it reads like a Stable Diffusion prompt."""
    # Remove spaces around punctuation, brackets, and underscores.
    text = text.replace('. ', '.').replace(' .', '.')
    text = text.replace('( ', '(').replace(' (', '(')
    text = text.replace(') ', ')').replace(' )', ')')
    text = text.replace(': ', ':').replace(' :', ':')
    text = text.replace('_ ', '_').replace(' _', '_')
    # Drop empty emphasis groups.
    text = text.replace(',(())', '').replace('(()),', '')
    # Shorten runs of three or more parentheses toward two (10 passes).
    for _ in range(10):
        text = text.replace(')))', '))').replace('(((', '((')
    # Strip anything enclosed in angle brackets.
    text = re.sub(r'<[^>]*>', '', text)
    return text


path_to_model = "ifmain/blip-image2promt-stable-diffusion-base"

# Load the processor and the model in half precision on the GPU.
processor = BlipProcessor.from_pretrained(path_to_model)
model = BlipForConditionalGeneration.from_pretrained(path_to_model, torch_dtype=torch.float16).to("cuda")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# unconditional image captioning
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=100)
out_txt = processor.decode(out[0], skip_special_tokens=True)

print(prepare(out_txt))
# woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),((
```
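The sample output above still ends with an empty group `(())`, an unclosed `((`, and a repeated `((happy))` tag. The following is a minimal sketch of an optional extra cleanup pass. The `tidy_prompt` helper is hypothetical and not part of this model or of BLIP-Easy-Trainer, and it assumes the prompt is a comma-separated list of tags.

```python
def tidy_prompt(prompt: str) -> str:
    """Drop empty, unbalanced, and repeated comma-separated tags (hypothetical helper)."""
    seen = set()
    kept = []
    for tag in prompt.split(','):
        tag = tag.strip()
        # Skip tags with nothing left once parentheses are removed, e.g. "(())" or "((".
        if not tag.strip('()').strip():
            continue
        # Skip tags whose opening and closing parenthesis counts differ, e.g. "((dog".
        if tag.count('(') != tag.count(')'):
            continue
        # Skip exact repeats while keeping the original order.
        if tag in seen:
            continue
        seen.add(tag)
        kept.append(tag)
    return ', '.join(kept)


sample = ("woman sitting on the beach at sunset, rear view,"
          "((happy)),((happy)),((dog)),((mixed)),(()),((")
print(tidy_prompt(sample))
# woman sitting on the beach at sunset, rear view, ((happy)), ((dog)), ((mixed))
```

Joining with `', '` is a stylistic choice of this sketch; use `','` instead if you prefer the spacing that `prepare` produces.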
## Addition
This model supports both SFW and NSFW content.