Libraries

How to use joyfox/JoyFox-PawScope-VL-AWQ with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# trust_remote_code=True is listed as required under Model Details.
pipe = pipeline("image-text-to-text", model="joyfox/JoyFox-PawScope-VL-AWQ", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

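# trust_remote_code=True is listed as required under Model Details.
processor = AutoProcessor.from_pretrained("joyfox/JoyFox-PawScope-VL-AWQ", trust_remote_code=True)
model = AutoModelForMultimodalLM.from_pretrained("joyfox/JoyFox-PawScope-VL-AWQ", trust_remote_code=True)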
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use joyfox/JoyFox-PawScope-VL-AWQ with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "joyfox/JoyFox-PawScope-VL-AWQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "joyfox/JoyFox-PawScope-VL-AWQ",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
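
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names (`image_to_data_url`, `build_chat_payload`, `post_chat`) are illustrative and not part of vLLM or of this repository. It assumes the server accepts base64 `data:` URLs in the `image_url` field, which OpenAI-compatible vision endpoints commonly do, so a local pet photo can be sent without hosting it.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body as the curl example above."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload and return the text of the first choice."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server; "your_pet_image.jpg" is a placeholder):
# payload = build_chat_payload(
#     "joyfox/JoyFox-PawScope-VL-AWQ",
#     "Describe this image in one sentence.",
#     image_to_data_url("your_pet_image.jpg"),
# )
# print(post_chat("http://localhost:8000", payload))
```

For the SGLang server shown later on this page, the only change would be the base URL (`http://localhost:30000`).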

Use Docker

docker model run hf.co/joyfox/JoyFox-PawScope-VL-AWQ

SGLang

How to use joyfox/JoyFox-PawScope-VL-AWQ with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "joyfox/JoyFox-PawScope-VL-AWQ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "joyfox/JoyFox-PawScope-VL-AWQ",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "joyfox/JoyFox-PawScope-VL-AWQ" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "joyfox/JoyFox-PawScope-VL-AWQ",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

JoyFox-PawScope-VL-AWQ

JoyFox-PawScope-VL-AWQ is the 4-bit AWQ release of joyfox/JoyFox-PawScope-VL, a pet-focused visual language model for cat and dog breed understanding. It keeps the original model's response style: first describe visible pet traits, then provide a natural-language breed judgement and a concise visual rationale.

The model is designed for image-based pet breed demos, pet-care assistants, data annotation workflows, and product prototypes where lower memory usage is preferred. It is not a veterinary diagnostic system and should not be used as the sole source of truth for breed certification.

What The Model Does

Given a pet image and an instruction, the model produces a Chinese response covering:

- visible appearance traits such as coat color, coat length, face shape, ears, eyes, muzzle, body proportion, and posture,
- age-stage cues such as adult cat/dog, kitten, or puppy when visually inferable,
- the most likely cat or dog breed, with a concise reason grounded in the image.

A typical response follows this structure:

从图片看，... (From the image, ...)

判断结果：这只猫/狗更可能是...。 (Judgement: this cat/dog is more likely a ...)

理由：... (Reason: ...)
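
Because the response follows a fixed three-part layout, downstream code can split it into fields. The sketch below is one possible approach and not part of the released scripts; `PetAnswer` and `parse_answer` are hypothetical names, and the parser assumes the model emitted the 判断结果 and 理由 markers as shown above.

```python
import re
from typing import NamedTuple, Optional


class PetAnswer(NamedTuple):
    description: str          # free-text trait description before the markers
    judgement: Optional[str]  # text after "判断结果：" (judgement), if present
    reason: Optional[str]     # text after "理由：" (reason), if present


# Accept both the full-width colon used in the template and an ASCII colon.
_JUDGEMENT = re.compile(r"判断结果\s*[：:]\s*(.+)")
_REASON = re.compile(r"理由\s*[：:]\s*(.+)", re.DOTALL)


def parse_answer(text: str) -> PetAnswer:
    """Split a model response into description, judgement, and reason."""
    judgement_match = _JUDGEMENT.search(text)
    reason_match = _REASON.search(text)
    # Everything before the first marker is treated as the description.
    cut = min(
        (m.start() for m in (judgement_match, reason_match) if m),
        default=len(text),
    )
    return PetAnswer(
        description=text[:cut].strip(),
        judgement=judgement_match.group(1).strip() if judgement_match else None,
        reason=reason_match.group(1).strip() if reason_match else None,
    )
```

If a marker is missing, the corresponding field is `None`, which makes it easy to route off-format responses to manual review.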

Model Details

Item	Description
Model name	JoyFox-PawScope-VL-AWQ
Source model	`joyfox/JoyFox-PawScope-VL`
Foundation model	`openbmb/MiniCPM-V-4_6`
Model family	MiniCPM-V multimodal model
Released format	AWQ 4-bit checkpoint, Safetensors
Quantization	AWQ, 4-bit weights, group size 128, zero point enabled, GEMM backend
Primary modality	Image + text instruction
Main task	Cat and dog breed image understanding
Primary output language	Chinese
Recommended image detail mode	`downsample_mode="4x"`, `max_slice_nums=36`
Remote code	Required: `trust_remote_code=True`
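
As a rough illustration of what the Quantization row implies for storage, the sketch below estimates the effective bits per quantized weight. It assumes one 16-bit scale and one 4-bit zero point per group of 128 weights, which is a common AWQ GEMM layout; the exact on-disk layout of this checkpoint is not stated in this card, and unquantized modules are not counted.

```python
def effective_bits_per_weight(
    w_bit: int = 4,
    group_size: int = 128,
    scale_bits: int = 16,  # assumption: one half-precision scale per group
    zero_bits: int = 4,    # assumption: one packed 4-bit zero point per group
) -> float:
    """Average storage per quantized weight, including per-group metadata."""
    return w_bit + (scale_bits + zero_bits) / group_size


bits = effective_bits_per_weight()
print(bits)       # 4.15625
print(16 / bits)  # compression relative to 16-bit weights, roughly 3.85x
```

The figure applies only to the layers that are actually quantized, so the overall checkpoint shrinks by less than this ratio.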

Highlights

- Lower-memory deployment: AWQ 4-bit weights reduce model size while largely preserving the original pet-focused behavior.
- Pet-focused visual intelligence: specialized for cat and dog image understanding rather than generic image captioning.
- Natural judgement format: describes visible traits first, then outputs 判断结果 (judgement) and 理由 (reason) in a stable Chinese style.
- Fine-grained breed grounding: supports detailed cat and dog breed judgement from visible features.
- Age-stage awareness: can mention puppy, kitten, or adult cues when they are visually inferable.
- Practical inference script: the included `infer_pet_vision_awq.py` uses AutoAWQ loading for direct image inference.

Intended Use

JoyFox-PawScope-VL-AWQ is intended for applications such as:

- cat and dog breed-recognition demos,
- pet-care assistants that need image-aware breed explanations,
- pet image dataset annotation and review workflows,
- structured labeling of cat/dog image collections,
- educational tools for comparing common pet breed traits,
- lower-memory deployment experiments based on the JoyFox-PawScope-VL model family.

The model should be used as an assistive interpretation layer. It can summarize likely visual cues and suggest a likely breed, but it should not replace pedigree documents, professional breed assessment, veterinary care, or direct owner knowledge.
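
For annotation workflows, one way to keep the model in an assistive role is to store its answer as a suggestion that a person confirms later. The sketch below is a hypothetical helper, not part of this repository: `annotate` takes any callable that maps an image path to the model's answer (for example, a wrapper around the Quick Start code) and yields one JSON line per image with a `reviewed` flag set to false.

```python
import json
from typing import Callable, Iterable, Iterator


def annotate(
    image_paths: Iterable[str],
    infer: Callable[[str], str],
) -> Iterator[str]:
    """Yield one JSON line per image containing the model's suggestion."""
    for path in image_paths:
        try:
            record = {"image": path, "model_answer": infer(path), "error": None}
        except Exception as exc:  # keep going if a single image fails
            record = {"image": path, "model_answer": None, "error": str(exc)}
        # The model only suggests; a human confirms or corrects the label later.
        record["reviewed"] = False
        # ensure_ascii=False keeps the Chinese answer readable in the file.
        yield json.dumps(record, ensure_ascii=False)
```

Writing the lines to a `.jsonl` file gives reviewers a simple queue to work through.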

Quick Start With AutoAWQ

This AWQ release is intended to be loaded with an AWQ-compatible runtime. The included example uses `AutoAWQForCausalLM.from_quantized()` together with the MiniCPM-V processor.

import torch
from awq import AutoAWQForCausalLM
from transformers import AutoProcessor

model_path = "joyfox/JoyFox-PawScope-VL-AWQ"
image_path = "your_pet_image.jpg"

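# The prompt is written in Chinese, the model's primary output language. In English:
# "Observe the pet in the image. First describe its visible appearance naturally,
# then judge its most likely specific breed and give your reasons.
# Answer format: From the image, ... / Judgement: this cat/dog is more likely ... / Reason: ..."
prompt = """请观察图片中的宠物，先自然说明可见外观特征，再判断它最可能的具体品种，并给出理由。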

回答格式：
从图片看，...

判断结果：这只猫/狗更可能是...。

理由：..."""

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
awq_model = AutoAWQForCausalLM.from_quantized(
    model_path,
    trust_remote_code=True,
    fuse_layers=False,
)
model = awq_model.model
model.eval()

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": image_path},
        {"type": "text", "text": prompt},
    ],
}]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
    downsample_mode="4x",
    max_slice_nums=36,
    enable_thinking=False,
)
inputs = inputs.to(next(model.parameters()).device)

with torch.inference_mode():
    output_ids = model.generate(
        **inputs,
        downsample_mode="4x",
        max_new_tokens=512,
        do_sample=False,
    )

output_ids = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
answer = processor.batch_decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]
print(answer.strip())
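
The slicing step near the end of the snippet above removes the prompt tokens from each generated sequence, on the usual assumption that `generate` returns the prompt followed by the continuation. A toy illustration with plain lists (the token IDs are made up):

```python
def strip_prompt_tokens(input_ids, output_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


prompt_ids = [[101, 7, 8, 9]]                # one prompt of four tokens
generated_ids = [[101, 7, 8, 9, 42, 43, 2]]  # prompt followed by three new tokens
print(strip_prompt_tokens(prompt_ids, generated_ids))  # [[42, 43, 2]]
```

Without this step the decoded answer would also contain the chat template and the prompt text.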

Using `infer_pet_vision_awq.py`

The included inference script is designed for direct file-based testing with this AWQ checkpoint. Edit the configuration block at the top of `infer_pet_vision_awq.py`:

MODEL_PATH = str(MODEL_DIR)
IMAGE_PATH = str(MODEL_DIR / "assets" / "dog_pug_08238.png")
MAX_NEW_TOKENS = 512
DOWNSAMPLE_MODE = "4x"
MAX_SLICE_NUMS = 36
DO_SAMPLE = False
FUSE_LAYERS = False

Then run:

python infer_pet_vision_awq.py

The script resolves local image paths, converts file-based images to a standard temporary JPEG for robust decoding, loads the checkpoint with AutoAWQ, builds a MiniCPM-V image-text message, runs deterministic generation, and prints the decoded answer.

Quantization Notes

This checkpoint uses the MiniCPM-V 4.6 AWQ path with 4-bit GEMM quantization. The small GatedDeltaNet projection modules `linear_attn.in_proj_a` and `linear_attn.in_proj_b` are kept unquantized: they participate in scale compensation but are not suitable for AWQ GEMM weight packing. This exclusion is recorded in the model's `quantization_config`.
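
AWQ tooling commonly records such exclusions as a list of module-name fragments and skips any module whose full name contains one of them. The sketch below illustrates that matching rule under this assumption; the list name `SKIPPED_FRAGMENTS`, the helper `is_quantized`, and the full module names in the example are hypothetical and not read from this checkpoint.

```python
# Fragments taken from the note above; how the checkpoint's config stores
# them (key name, exact strings) is an assumption here.
SKIPPED_FRAGMENTS = ["linear_attn.in_proj_a", "linear_attn.in_proj_b"]


def is_quantized(module_name: str, skipped=SKIPPED_FRAGMENTS) -> bool:
    """Return False if the module name contains any skipped fragment."""
    return not any(fragment in module_name for fragment in skipped)


# Hypothetical full module names, for illustration only.
print(is_quantized("model.layers.0.linear_attn.in_proj_a"))  # False
print(is_quantized("model.layers.0.mlp.down_proj"))          # True
```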

Depending on the Transformers and quantization backend versions in a given environment, direct `AutoModelForImageTextToText.from_pretrained()` loading may require an AWQ backend such as `gptqmodel`. When in doubt, use the AutoAWQ loading path shown above or the included `infer_pet_vision_awq.py` script.

Qualitative Examples

The following examples illustrate the preferred response style: the model first describes visible pet traits, then gives a breed judgement and a short visual rationale. These examples are intended to demonstrate output format and qualitative behavior, not to serve as a benchmark.

Pug

Image: dog_pug_08238.png

Example output

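这只狗拥有典型的短毛外观，毛色呈现出浅黄褐色，面部有着明显的黑色区域，覆盖了眼睛周围及口鼻部。它的眼睛大而圆，眼神深邃，面部皮肤有明显的褶皱，这是该品种非常显著的特征。从其面部骨骼结构和成熟的眼神来看，这应该是一只成年犬。

判断结果：这只狗更可能是巴哥犬。

理由：该犬具备巴哥犬最核心的品种特征，包括标志性的短毛、浅黄褐色的被毛颜色、深色的面部斑纹以及深陷的褶皱。其大而圆的眼睛和紧凑的口鼻部结构与巴哥犬的品种标准高度吻合，这些特征共同支持了这一判断。

English translation

This dog has a typical short-haired appearance, with a light fawn coat and a distinct black area on the face covering the eyes and muzzle. Its eyes are large and round with a deep gaze, and the facial skin has pronounced wrinkles, a very distinctive trait of the breed. Judging from its facial bone structure and mature expression, this is most likely an adult dog.

Judgement: This dog is more likely a Pug.

Reason: The dog shows the core breed traits of a Pug, including the characteristic short coat, light fawn coat color, dark facial markings, and deep wrinkles. Its large, round eyes and compact muzzle closely match the Pug breed standard, and together these traits support the judgement.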

Border Collie

Image: dog_border_collie_07425.png

Example output

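这只狗拥有黑白相间的中长毛，面部中央有明显的白色条纹，眼神专注，耳朵半立，整体身体线条匀称且显得灵活。从头部比例、毛色分布和神态来看，它呈现出成年犬的外观特征。

判断结果：这只狗更可能是边境牧羊犬。

理由：黑白双色被毛、额头到鼻梁的白色纹路、半立耳和专注的表情都是边境牧羊犬常见的视觉特征，这些线索共同支持该判断。

English translation

This dog has a medium-long black-and-white coat, a distinct white stripe down the center of its face, a focused gaze, semi-erect ears, and a well-proportioned, agile-looking body. Judging from its head proportions, coat color distribution, and demeanor, it has the appearance of an adult dog.

Judgement: This dog is more likely a Border Collie.

Reason: The black-and-white bicolor coat, the white marking running from the forehead down the bridge of the nose, the semi-erect ears, and the focused expression are all common visual traits of the Border Collie, and together these cues support the judgement.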

Sphynx Cat

Image: cat_sphynx_01647.png

Example output

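这只猫几乎没有明显被毛，皮肤有清晰褶皱，耳朵大而直立，头部轮廓偏楔形，眼睛呈杏仁状，整体体态较成熟，正在柔软的毯子上休息。

判断结果：这只猫更可能是斯芬克斯猫。

理由：近乎无毛的皮肤、大耳、楔形头部和明显皮肤褶皱都是斯芬克斯猫非常突出的品种特征，因此该判断更符合图中可见外观。

English translation

This cat has almost no visible coat, clearly wrinkled skin, large upright ears, a somewhat wedge-shaped head, and almond-shaped eyes. Its overall build looks fairly mature, and it is resting on a soft blanket.

Judgement: This cat is more likely a Sphynx.

Reason: The nearly hairless skin, large ears, wedge-shaped head, and pronounced skin wrinkles are all highly distinctive traits of the Sphynx, so this judgement best matches the appearance visible in the image.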

Notes On Warnings

You may see warnings about the MiniCPM-V fast path not being available, AutoAWQ deprecation notices, or processor keyword messages. These warnings generally do not block image inference.

For detailed breed recognition, `downsample_mode="4x"` and `max_slice_nums=36` preserve more visual detail. For faster but less detailed inference, use `downsample_mode="16x"`.
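
The two modes can be kept in one place and passed as keyword arguments. This is a small sketch with a hypothetical helper name, `image_detail_kwargs`; since the note above gives no `max_slice_nums` value for the fast mode, the sketch leaves that argument unset there so the processor default applies.

```python
def image_detail_kwargs(detailed: bool = True) -> dict:
    """Return processor keyword arguments for the chosen speed/detail mode."""
    if detailed:
        # Settings recommended in this card for fine-grained breed recognition.
        return {"downsample_mode": "4x", "max_slice_nums": 36}
    # Faster, less detailed mode; max_slice_nums is left at its default.
    return {"downsample_mode": "16x"}


print(image_detail_kwargs())                # {'downsample_mode': '4x', 'max_slice_nums': 36}
print(image_detail_kwargs(detailed=False))  # {'downsample_mode': '16x'}
```

In the Quick Start code, the result could be unpacked into `processor.apply_chat_template(..., **image_detail_kwargs())`, with the same `downsample_mode` value passed to `model.generate`.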

Limitations

- The model is focused on cat and dog images. Other animals, mixed scenes, toys, drawings, or non-pet images may produce unreliable outputs.
- Breed identification from a single image is inherently uncertain, especially for mixed-breed pets and visually similar breeds.
- Quantization may introduce small wording or judgement differences compared with the source checkpoint.
- The model may overstate confidence when the image lacks clear breed-specific features.
- Lighting, occlusion, grooming style, age, camera angle, and partial body visibility can reduce reliability.
- The model primarily follows a Chinese response format. English output may require a separate prompt and has not been the main tuning target.
- The model should not be used for veterinary diagnosis, legal breed certification, insurance decisions, shelter intake decisions, or safety-critical judgement.

License

This model follows the license terms of the source model and the released model metadata. Please also review the license and usage terms of joyfox/JoyFox-PawScope-VL and openbmb/MiniCPM-V-4_6 before redistribution or commercial use.

Acknowledgements

JoyFox-PawScope-VL-AWQ is based on JoyFox-PawScope-VL and OpenBMB's MiniCPM-V-4.6 multimodal model.

Safetensors

Model size: 1B params
Tensor type: I32, BF16

Model tree for joyfox/JoyFox-PawScope-VL-AWQ

Base model: openbmb/MiniCPM-V-4.6
Finetuned: joyfox/JoyFox-PawScope-VL
Quantized: this model
