ifmain / blip-image2promt-stable-diffusion-base

image-text-to-text




Commit ca29119: Update README.md (verified, almost 2 years ago)




	---
	datasets:
	- Ar4ikov/civitai-sd-337k
	language:
	- en
	pipeline_tag: image-to-text
	base_model: Salesforce/blip-image-captioning-base
	---

	# License

	The license is inherited from [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base).

	# Overview
	`ifmain/blip-image2promt-stable-diffusion-base` is a fine-tuned version of [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base), trained on 2K images from the [Ar4ikov/civitai-sd-337k](https://huggingface.co/datasets/Ar4ikov/civitai-sd-337k) dataset. Given an image, it generates a text description written in the style of a Stable Diffusion prompt, ready to be used with Stable Diffusion models.

	The model was trained with my BLIP training code, [BLIP-Easy-Trainer](https://github.com/ifmain/BLIP-Easy-Trainer).
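The double parentheses in the generated prompts (for example `((happy))` in the sample output) follow the emphasis syntax popularized by the AUTOMATIC1111 Stable Diffusion web UI, where each pair of parentheses multiplies a tag's attention by 1.1, so `((happy))` gets a weight of about 1.21. Other front ends may interpret brackets differently, so treat this as an assumption about your tooling. `emphasis_weights` below is a hypothetical helper, not part of this model or of `transformers`, that sketches how to read those weights back out of a generated prompt:

```python
from __future__ import annotations

# Assumption: AUTOMATIC1111-style emphasis, where each pair of
# parentheses around a tag multiplies its attention by 1.1.
EMPHASIS_STEP = 1.1


def emphasis_weights(prompt: str) -> list[tuple[str, float]]:
    """Hypothetical helper: split a comma-separated prompt into (tag, weight) pairs.

    Only plain tags wrapped in balanced parentheses, such as "((happy))",
    are understood; explicit weights like "(tag:1.3)" are not parsed.
    """
    pairs = []
    for raw in prompt.split(","):
        tag = raw.strip()
        depth = 0
        # Peel off one matching pair of outer parentheses at a time
        while len(tag) >= 2 and tag.startswith("(") and tag.endswith(")"):
            tag = tag[1:-1].strip()
            depth += 1
        # Skip empty fragments and leftovers made only of brackets, e.g. "(("
        if not tag.strip("()"):
            continue
        pairs.append((tag, round(EMPHASIS_STEP ** depth, 4)))
    return pairs


print(emphasis_weights("woman sitting on the beach at sunset, rear view,((happy)),((dog))"))
# [('woman sitting on the beach at sunset', 1.0), ('rear view', 1.0), ('happy', 1.21), ('dog', 1.21)]
```

Peeling parentheses one pair at a time keeps the sketch simple; nested or adjacent groups such as `(a)(b)` are out of scope.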

	# Example Usage
	```python
	import torch
	import requests
	from PIL import Image
	from transformers import BlipProcessor, BlipForConditionalGeneration
	import re

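	def prepare(text):
	    # Remove the spaces around punctuation, brackets, colons and underscores in the decoded text
	    text = text.replace('. ', '.').replace(' .', '.')
	    text = text.replace('( ', '(').replace(' (', '(')
	    text = text.replace(') ', ')').replace(' )', ')')
	    text = text.replace(': ', ':').replace(' :', ':')
	    text = text.replace('_ ', '_').replace(' _', '_')
	    # Drop empty emphasis groups such as ",(())"
	    text = text.replace(',(())', '').replace('(()),', '')
	    # Shorten long bracket runs, e.g. "(((" -> "(("
	    for _ in range(10):
	        text = text.replace(')))', '))').replace('(((', '((')
	    # Strip any leftover <...> tags
	    text = re.sub(r'<[^>]*>', '', text)
	    return text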

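	path_to_model = "ifmain/blip-image2promt-stable-diffusion-base"

	# Use the GPU in half precision when available; otherwise fall back to the CPU in float32
	device = "cuda" if torch.cuda.is_available() else "cpu"
	dtype = torch.float16 if device == "cuda" else torch.float32

	processor = BlipProcessor.from_pretrained(path_to_model)
	model = BlipForConditionalGeneration.from_pretrained(path_to_model, torch_dtype=dtype).to(device)

	# Download a demo image and convert it to RGB
	img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
	raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

	# Unconditional image captioning (no text prompt is passed to the processor)
	inputs = processor(raw_image, return_tensors="pt").to(device, dtype)

	# Generate up to 100 new tokens and decode them into a prompt string
	out = model.generate(**inputs, max_new_tokens=100)

	out_txt = processor.decode(out[0], skip_special_tokens=True)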

	print(prepare(out_txt)) # woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),((
	```
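The sample output above still contains a repeated tag (`((happy))` appears twice) and leftover bracket fragments such as `((`. If you want a cleaner prompt, an extra pass after `prepare` can help. `tidy_prompt` is a hypothetical helper, not part of the original card: it drops fragments made only of brackets or with unbalanced parentheses, and keeps the first occurrence of each tag.

```python
def tidy_prompt(prompt: str) -> str:
    """Hypothetical post-processing step, applied after prepare()."""
    seen = set()
    kept = []
    for raw in prompt.split(","):
        tag = raw.strip()
        # Drop fragments made only of brackets, such as "((" or "(())"
        if not tag.strip("()"):
            continue
        # Drop fragments whose parentheses do not balance, such as "((dog)"
        if tag.count("(") != tag.count(")"):
            continue
        # Keep only the first occurrence of each tag
        if tag in seen:
            continue
        seen.add(tag)
        kept.append(tag)
    return ", ".join(kept)


sample = "woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),(("
print(tidy_prompt(sample))
# woman sitting on the beach at sunset, rear view, ((happy)), ((dog)), ((mixed))
```

In the usage script this would be called as `tidy_prompt(prepare(out_txt))`. Dropping unbalanced fragments outright is a deliberately blunt choice; repairing them instead is also reasonable.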

	## Additional Notes

	This model supports both SFW and NSFW content.