---
datasets:
- Ar4ikov/civitai-sd-337k
language:
- en
pipeline_tag: image-to-text
base_model: Salesforce/blip-image-captioning-base
---
# Licence

The license is inherited from [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base).

# Overview

`ifmain/blip-image2promt-stable-diffusion-base` is a model based on [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base), trained on 2K images from the [Ar4ikov/civitai-sd-337k](https://huggingface.co/datasets/Ar4ikov/civitai-sd-337k) dataset. It generates text descriptions of images in the style of prompts for Stable Diffusion models.

The model was trained with my BLIP training code: [BLIP-Easy-Trainer](https://github.com/ifmain/BLIP-Easy-Trainer)

# Example Usage

```python
import re

import requests
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration


def prepare(text):
    """Clean up the decoded caption so it reads like a Stable Diffusion prompt."""
    # Remove spaces around punctuation, brackets, and underscores.
    text = text.replace('. ', '.').replace(' .', '.')
    text = text.replace('( ', '(').replace(' (', '(')
    text = text.replace(') ', ')').replace(' )', ')')
    text = text.replace(': ', ':').replace(' :', ':')
    text = text.replace('_ ', '_').replace(' _', '_')
    # Drop empty emphasis groups.
    text = text.replace(',(())', '').replace('(()),', '')
    # Collapse runs of three or more brackets down to two.
    for _ in range(10):
        text = text.replace(')))', '))').replace('(((', '((')
    # Strip any leftover "<...>" markup.
    text = re.sub(r'<[^>]*>', '', text)
    return text


path_to_model = "ifmain/blip-image2promt-stable-diffusion-base"

# Use half precision on a GPU; fall back to full precision on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = BlipProcessor.from_pretrained(path_to_model)
model = BlipForConditionalGeneration.from_pretrained(path_to_model, torch_dtype=dtype).to(device)

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Unconditional image captioning
inputs = processor(raw_image, return_tensors="pt").to(device, dtype)
out = model.generate(**inputs, max_new_tokens=100)
out_txt = processor.decode(out[0], skip_special_tokens=True)

print(prepare(out_txt))
# Example output:
# woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),((
```
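
The example output above still contains a repeated tag (`((happy))`), an empty group (`(())`), and a truncated group (`((`). A minimal sketch of an optional extra cleanup pass follows. The `tidy_prompt` helper is hypothetical and not part of this model or of BLIP-Easy-Trainer; it assumes the prompt is a flat comma-separated list of tags.

```python
import re


def tidy_prompt(prompt: str) -> str:
    """Hypothetical helper: drop empty, unbalanced, and repeated tags."""
    seen = set()
    kept = []
    for tag in prompt.split(","):
        tag = tag.strip()
        # Skip tags with no word characters, e.g. "(())" or "((".
        if not re.search(r"\w", tag):
            continue
        # Skip tags whose bracket counts differ (likely truncated output).
        if tag.count("(") != tag.count(")"):
            continue
        # Skip tags that already appeared earlier in the prompt.
        if tag in seen:
            continue
        seen.add(tag)
        kept.append(tag)
    return ", ".join(kept)


sample = (
    "woman sitting on the beach at sunset, rear view,"
    "((happy)),((happy)),((dog)),((mixed)),(()),(("
)
print(tidy_prompt(sample))
# woman sitting on the beach at sunset, rear view, ((happy)), ((dog)), ((mixed))
```

Because the sketch splits on commas, a bracketed group that itself contains commas would be split apart and dropped as unbalanced, so it may need adjusting for prompts that use such groups.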

## Additional Notes

This model supports both SFW and NSFW content.