JoyFox-PawScope-VL-AWQ
JoyFox-PawScope-VL-AWQ is the 4-bit AWQ release of joyfox/JoyFox-PawScope-VL, a pet-focused vision-language model for cat and dog breed understanding. It keeps the original model's response style: it first describes visible pet traits, then gives a natural-language breed judgement and a concise visual rationale.
The model is designed for image-based pet breed demos, pet-care assistants, data annotation workflows, and product prototypes where lower memory usage is preferred. It is not a veterinary diagnostic system and should not be used as the sole source of truth for breed certification.
What The Model Does
Given a pet image and an instruction, the model produces a Chinese response covering:
- visible appearance traits such as coat color, coat length, face shape, ears, eyes, muzzle, body proportion, and posture,
- age-stage cues such as adult cat/dog, kitten, or puppy when visually inferable,
- the most likely cat or dog breed, with a concise reason grounded in the image.
A typical response follows this structure:
从图片看,...
判断结果:这只猫/狗更可能是...。
理由:...
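For annotation or labeling workflows, the three parts of this response can be separated with a small parser. The following is a minimal sketch, not part of the released code: the names PetAnswer and parse_pet_answer are hypothetical, and it assumes the model actually emits the 判断结果 and 理由 markers (with either a full-width or an ASCII colon). When a marker is missing, the corresponding field is left as None.

```python
import re
from typing import NamedTuple, Optional


class PetAnswer(NamedTuple):
    description: str           # trait description before the judgement marker
    judgement: Optional[str]   # text following "判断结果:"
    rationale: Optional[str]   # text following "理由:"


# Accept both the full-width colon (:) and the ASCII colon (:).
_JUDGEMENT = re.compile(r"判断结果\s*[::]\s*")
_RATIONALE = re.compile(r"理由\s*[::]\s*")


def _clean(part: Optional[str]) -> Optional[str]:
    return part.strip() if part is not None else None


def parse_pet_answer(text: str) -> PetAnswer:
    """Split a model response into description, judgement, and rationale."""
    description, judgement, rationale = text, None, None
    j = _JUDGEMENT.search(text)
    if j:
        description, rest = text[:j.start()], text[j.end():]
        r = _RATIONALE.search(rest)
        if r:
            judgement, rationale = rest[:r.start()], rest[r.end():]
        else:
            judgement = rest
    else:
        # No judgement marker: still try to recover a rationale.
        r = _RATIONALE.search(text)
        if r:
            description, rationale = text[:r.start()], text[r.end():]
    return PetAnswer(description.strip(), _clean(judgement), _clean(rationale))


if __name__ == "__main__":
    sample = (
        "这只狗拥有黑白相间的中长毛。\n"
        "判断结果:这只狗更可能是边境牧羊犬。\n"
        "理由:黑白双色被毛和半立耳。"
    )
    print(parse_pet_answer(sample))
```

Because the output format is a learned style rather than a guaranteed schema, downstream code should treat a None judgement as "format not followed" instead of assuming every response parses.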
Model Details
| Item | Description |
|---|---|
| Model name | JoyFox-PawScope-VL-AWQ |
| Source model | joyfox/JoyFox-PawScope-VL |
| Foundation model | openbmb/MiniCPM-V-4_6 |
| Model family | MiniCPM-V multimodal model |
| Released format | AWQ 4-bit checkpoint, Safetensors |
| Quantization | AWQ, 4-bit weights, group size 128, zero point enabled, GEMM backend |
| Primary modality | Image + text instruction |
| Main task | Cat and dog breed image understanding |
| Primary output language | Chinese |
| Recommended image detail mode | downsample_mode="4x", max_slice_nums=36 |
| Remote code | Required: trust_remote_code=True |
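To make the quantization row concrete, the storage cost of the quantized linear layers can be estimated with simple arithmetic. This is a rough sketch under stated assumptions: each group of 128 weights is assumed to store one 16-bit scale and one 4-bit zero point alongside the 4-bit weights, and the parameter count passed in is a hypothetical input, not a figure from this model card. Unquantized parts of the model (for example the skipped projection modules) are not covered by this estimate.

```python
def awq_bits_per_weight(w_bit: int = 4, group_size: int = 128,
                        scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Effective bits per quantized weight, including per-group metadata.

    scale_bits and zero_bits are assumptions about the storage layout.
    """
    return w_bit + (scale_bits + zero_bits) / group_size


def quantized_size_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate size in GiB of num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    bits = awq_bits_per_weight()
    print(f"effective bits per weight: {bits}")
    print(f"size relative to 16-bit weights: {bits / 16:.3f}")
    # hypothetical_params is a placeholder, not the real parameter count.
    hypothetical_params = 1_000_000_000
    print(f"approx. GiB: {quantized_size_gib(hypothetical_params, bits):.2f}")
```

Under these assumptions the per-group metadata adds 20/128 of a bit per weight, so the quantized layers take a little over a quarter of their 16-bit size.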
Highlights
- Lower-memory deployment: AWQ 4-bit weights reduce model size while largely preserving the original pet-focused behavior.
- Pet-focused visual intelligence: specialized for cat and dog image understanding rather than generic image captioning.
- Natural judgement format: describes visible traits first, then outputs 判断结果 (judgement) and 理由 (rationale) in a stable Chinese style.
- Fine-grained breed grounding: supports detailed cat and dog breed judgement from visible features.
- Age-stage awareness: can mention puppy, kitten, or adult cues when they are visually inferable.
- Practical inference script: the included infer_pet_vision_awq.py uses AutoAWQ loading for direct image inference.
Intended Use
JoyFox-PawScope-VL-AWQ is intended for applications such as:
- cat and dog breed-recognition demos,
- pet-care assistants that need image-aware breed explanations,
- pet image dataset annotation and review workflows,
- structured labeling of cat/dog image collections,
- educational tools for comparing common pet breed traits,
- lower-memory deployment experiments based on the JoyFox-PawScope-VL model family.
The model should be used as an assistive interpretation layer. It can summarize likely visual cues and suggest a likely breed, but it should not replace pedigree documents, professional breed assessment, veterinary care, or direct owner knowledge.
Quick Start With AutoAWQ
This AWQ release is intended to be loaded with an AWQ-compatible runtime. The included example uses AutoAWQForCausalLM.from_quantized() together with the MiniCPM-V processor.
import torch
from awq import AutoAWQForCausalLM
from transformers import AutoProcessor
model_path = "joyfox/JoyFox-PawScope-VL-AWQ"
image_path = "your_pet_image.jpg"
prompt = """请观察图片中的宠物,先自然说明可见外观特征,再判断它最可能的具体品种,并给出理由。
回答格式:
从图片看,...
判断结果:这只猫/狗更可能是...。
理由:..."""
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
# Load the AWQ checkpoint; layer fusion is disabled.
awq_model = AutoAWQForCausalLM.from_quantized(
    model_path,
    trust_remote_code=True,
    fuse_layers=False,
)
model = awq_model.model
model.eval()

# Build a single-turn message with one image and the text instruction.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": image_path},
        {"type": "text", "text": prompt},
    ],
}]

# Tokenize with the recommended image detail settings.
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
    downsample_mode="4x",
    max_slice_nums=36,
    enable_thinking=False,
)
inputs = inputs.to(next(model.parameters()).device)

# Deterministic (greedy) generation.
with torch.inference_mode():
    output_ids = model.generate(
        **inputs,
        downsample_mode="4x",
        max_new_tokens=512,
        do_sample=False,
    )

# Drop the prompt tokens so only the newly generated answer is decoded.
output_ids = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
answer = processor.batch_decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]
print(answer.strip())
Using infer_pet_vision_awq.py
The included inference script is designed for direct file-based testing with this AWQ checkpoint. Edit the configuration block at the top of infer_pet_vision_awq.py:
MODEL_PATH = str(MODEL_DIR)
IMAGE_PATH = str(MODEL_DIR / "assets" / "dog_pug_08238.png")
MAX_NEW_TOKENS = 512
DOWNSAMPLE_MODE = "4x"
MAX_SLICE_NUMS = 36
DO_SAMPLE = False
FUSE_LAYERS = False
Then run:
python infer_pet_vision_awq.py
The script resolves local image paths, converts file-based images to a standard temporary JPEG for robust decoding, loads the checkpoint with AutoAWQ, builds a MiniCPM-V image-text message, runs deterministic generation, and prints the decoded answer.
Quantization Notes
This checkpoint uses the MiniCPM-V 4.6 AWQ path with 4-bit GEMM quantization. The small GatedDeltaNet projection modules linear_attn.in_proj_a and linear_attn.in_proj_b are kept unquantized because they participate in scale compensation but are not suitable for AWQ GEMM weight packing. This is reflected in the model quantization_config.
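The effect of such a skip list can be illustrated with a name-based filter. This is only a sketch of the idea: the function name should_quantize is hypothetical, the substring-matching rule is an assumption about how a loader might apply the list, and the example module paths are made up for illustration rather than taken from the checkpoint.

```python
from typing import Iterable

# Module name fragments kept in full precision, as named in the text above.
SKIPPED_MODULES = ("linear_attn.in_proj_a", "linear_attn.in_proj_b")


def should_quantize(module_name: str,
                    skipped: Iterable[str] = SKIPPED_MODULES) -> bool:
    """Return False when module_name contains any skipped fragment.

    Substring matching is an assumption made for this sketch.
    """
    return not any(fragment in module_name for fragment in skipped)


if __name__ == "__main__":
    # Hypothetical module paths, for illustration only.
    for name in (
        "model.layers.0.linear_attn.in_proj_a",
        "model.layers.0.linear_attn.in_proj_b",
        "model.layers.0.mlp.down_proj",
    ):
        print(name, "->", "quantize" if should_quantize(name) else "keep as is")
```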
Depending on the Transformers and quantization backend versions in a given environment, direct AutoModelForImageTextToText.from_pretrained() loading may require an AWQ backend such as gptqmodel. When in doubt, use the AutoAWQ loading path shown above or the included infer_pet_vision_awq.py script.
Qualitative Examples
The following examples illustrate the preferred response style: the model first describes visible pet traits, then gives a breed judgement and a short visual rationale. These examples are intended to demonstrate output format and qualitative behavior, not to serve as a benchmark.
Pug
Image: dog_pug_08238.png
Example output
这只狗拥有典型的短毛外观,毛色呈现出浅黄褐色,面部有着明显的黑色区域,覆盖了眼睛周围及口鼻部。它的眼睛大而圆,眼神深邃,面部皮肤有明显的褶皱,这是该品种非常显著的特征。从其面部骨骼结构和成熟的眼神来看,这应该是一只成年犬。
判断结果:这只狗更可能是巴哥犬。
理由:该犬具备巴哥犬最核心的品种特征,包括标志性的短毛、浅黄褐色的被毛颜色、深色的面部斑纹以及深陷的褶皱。其大而圆的眼睛和紧凑的口鼻部结构与巴哥犬的品种标准高度吻合,这些特征共同支持了这一判断。
Border Collie
Image: dog_border_collie_07425.png
Example output
这只狗拥有黑白相间的中长毛,面部中央有明显的白色条纹,眼神专注,耳朵半立,整体身体线条匀称且显得灵活。从头部比例、毛色分布和神态来看,它呈现出成年犬的外观特征。
判断结果:这只狗更可能是边境牧羊犬。
理由:黑白双色被毛、额头到鼻梁的白色纹路、半立耳和专注的表情都是边境牧羊犬常见的视觉特征,这些线索共同支持该判断。
Sphynx Cat
Image: cat_sphynx_01647.png
Example output
这只猫几乎没有明显被毛,皮肤有清晰褶皱,耳朵大而直立,头部轮廓偏楔形,眼睛呈杏仁状,整体体态较成熟,正在柔软的毯子上休息。
判断结果:这只猫更可能是斯芬克斯猫。
理由:近乎无毛的皮肤、大耳、楔形头部和明显皮肤褶皱都是斯芬克斯猫非常突出的品种特征,因此该判断更符合图中可见外观。
Notes On Warnings
You may see warnings about the MiniCPM-V fast path not being available, AutoAWQ deprecation notices, or processor keyword messages. These warnings generally do not block image inference.
For detailed breed recognition, downsample_mode="4x" and max_slice_nums=36 preserve more visual detail. For faster but less detailed inference, use downsample_mode="16x".
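One way to keep these two modes consistent across calls is a small helper that returns the image keyword arguments. This is a hedged sketch: image_detail_kwargs is a hypothetical name, and the fast mode deliberately leaves max_slice_nums unset because the text above does not specify a value for it.

```python
def image_detail_kwargs(detailed: bool = True) -> dict:
    """Keyword arguments for the two image detail modes described above."""
    if detailed:
        # Recommended settings for fine-grained breed recognition.
        return {"downsample_mode": "4x", "max_slice_nums": 36}
    # Faster, less detailed mode; max_slice_nums is left at its default.
    return {"downsample_mode": "16x"}


if __name__ == "__main__":
    print(image_detail_kwargs())
    print(image_detail_kwargs(detailed=False))
```

The returned dictionary could then be unpacked into the processor call from the Quick Start, for example processor.apply_chat_template(..., **image_detail_kwargs()), assuming the processor accepts these keywords as shown there.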
Limitations
- The model is focused on cat and dog images. Other animals, mixed scenes, toys, drawings, or non-pet images may produce unreliable outputs.
- Breed identification from a single image is inherently uncertain, especially for mixed-breed pets and visually similar breeds.
- Quantization may introduce small wording or judgement differences compared with the source checkpoint.
- The model may overstate confidence when the image lacks clear breed-specific features.
- Lighting, occlusion, grooming style, age, camera angle, and partial body visibility can reduce reliability.
- The model primarily follows a Chinese response format. English output may require a separate prompt and has not been the main tuning target.
- The model should not be used for veterinary diagnosis, legal breed certification, insurance decisions, shelter intake decisions, or safety-critical judgement.
License
This model follows the license terms of the source model and the released model metadata. Please also review the license and usage terms of joyfox/JoyFox-PawScope-VL and openbmb/MiniCPM-V-4_6 before redistribution or commercial use.
Acknowledgements
JoyFox-PawScope-VL-AWQ is based on JoyFox-PawScope-VL and OpenBMB's MiniCPM-V-4.6 multimodal model.


