How to use from vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "samgreen/Qwen2.5-VL-7B-Instruct-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "samgreen/Qwen2.5-VL-7B-Instruct-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
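The request above points at a remote image URL. For a local file such as the ./image.jpg used later in this card, one option is to embed the image as a base64 data URL. The following is a minimal sketch, not a tested recipe for this checkpoint: `build_payload` is a hypothetical helper name, and the sketch assumes the server accepts `data:image/jpeg;base64,...` URLs in the `image_url` field, that the file is a JPEG, and that the prompt contains no characters that need JSON escaping.

```shell
# Hypothetical helper: print a chat-completions request body for a local JPEG.
# Assumes the server accepts base64 data URLs in "image_url" and that the
# prompt needs no JSON escaping (no quotes, backslashes, or newlines).
build_payload() {
  model="$1"; prompt="$2"; image="$3"
  # Encode the file and strip the line wraps that some base64 builds insert.
  b64="$(base64 < "$image" | tr -d '\n')"
  printf '{"model":"%s","messages":[{"role":"user","content":[{"type":"text","text":"%s"},{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,%s"}}]}]}\n' \
    "$model" "$prompt" "$b64"
}

# Example call (requires the server started above):
# build_payload "samgreen/Qwen2.5-VL-7B-Instruct-GGUF" \
#   "Describe this image in one sentence." ./image.jpg |
#   curl -X POST "http://localhost:8000/v1/chat/completions" \
#     -H "Content-Type: application/json" --data @-
```

Piping the body into `curl --data @-` keeps a large base64 string off the command line.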
Use Docker
docker model run hf.co/samgreen/Qwen2.5-VL-7B-Instruct-GGUF:

Qwen2.5-VL-7B-Instruct

Converted and quantized with HimariO's llama.cpp fork, following this procedure. No importance matrix (imatrix) was used.

The fork is currently required to run inference, and there is no guarantee that these checkpoints will work with future builds. Temporary builds are available here. The latest tested build at the time of writing is qwen25-vl-b4899-bc4163b.

Edit: As of 1 April 2025, inference support has been added to koboldcpp.

Original model

Usage

./llama-qwen2vl-cli -m Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf --mmproj qwen2.5-vl-7b-instruct-vision-f16.gguf -p "Please describe this image." --image ./image.jpg
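To describe several images in one go, the command above can be wrapped in a small loop. This is a sketch under stated assumptions: `describe_images` and the `LLAMA_CLI`, `MODEL`, and `MMPROJ` variables are hypothetical names introduced here for illustration, their defaults are the file names from the usage line, and the flags are exactly those shown above.

```shell
# Hypothetical wrapper around the usage command: describe every .jpg in a directory.
# LLAMA_CLI, MODEL, and MMPROJ default to the names used above; override as needed.
describe_images() {
  dir="$1"
  cli="${LLAMA_CLI:-./llama-qwen2vl-cli}"
  model="${MODEL:-Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf}"
  mmproj="${MMPROJ:-qwen2.5-vl-7b-instruct-vision-f16.gguf}"
  for img in "$dir"/*.jpg; do
    [ -e "$img" ] || continue   # skip the unexpanded glob when no files match
    echo "== $img"
    "$cli" -m "$model" --mmproj "$mmproj" -p "Please describe this image." --image "$img"
  done
}

# Example call: describe_images ./photos
```

Each invocation reloads the model, so this is simple rather than fast; it may still be convenient for a handful of files.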
Format: GGUF
Model size: 8B params
Architecture: qwen2vl

Available quantizations: 4-bit, 5-bit, 8-bit, 16-bit, 32-bit
