InternVL3.5-8B — W4A16 INT4 (BF16 Vision)
RTN W4A16 INT4 quantization of OpenGVLab/InternVL3_5-8B targeting Ampere GPUs (RTX 3000/4000 series, A100, etc.).
- Language model: INT4 pack-quantized (
weight_packedint32 + BF16 group scales, group_size=128) - Vision tower (
vision_model.*): BF16 — untouched - Vision projector (
mlp1.*): BF16 — untouched - Format:
compressed-tensorspack-quantized— loaded natively by vllm
Verified results
| Test | Result |
|---|---|
| sqrt(144) + closest planet to Sun | ✓ 12, Mercury |
| Golden Gate Bridge image (Wikimedia) | ✓ Correct ID, no hallucination |
Usage — vllm (recommended)
vllm serve useful-quants/InternVL3-5-8B-W4A16-INT4 \
--dtype bfloat16 \
--max-model-len 8192 \
--max-num-seqs 16 \
--gpu-memory-utilization 0.90
Usage — transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"useful-quants/InternVL3-5-8B-W4A16-INT4",
dtype=torch.bfloat16,
device_map="cuda:0",
trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
"useful-quants/InternVL3-5-8B-W4A16-INT4",
trust_remote_code=True,
)
# Text-only
response = model.chat(tokenizer, pixel_values=None,
question="Hello!", generation_config=dict(max_new_tokens=256))
# With image
from PIL import Image
import torchvision.transforms as T
from torchvision.transforms.functional import InterpolationMode
MEAN, STD = (0.485, 0.456, 0.406), (0.229, 0.224, 0.225)
tf = T.Compose([
T.Resize((448, 448), interpolation=InterpolationMode.BICUBIC),
T.ToTensor(), T.Normalize(MEAN, STD),
])
pixel_values = tf(Image.open("image.jpg").convert("RGB")).unsqueeze(0)
pixel_values = pixel_values.to(model.device, dtype=torch.bfloat16)
response = model.chat(tokenizer, pixel_values=pixel_values,
question="<image>\nDescribe this image.",
generation_config=dict(max_new_tokens=256))
Notes
- Quantization tool: llmcompressor with
QuantizationModifier(scheme="W4A16") trust_remote_code=Truerequired for both loading and inference- The tokenizer regex warning (
fix_mistral_regex) is cosmetic and does not affect output quality
- Downloads last month
- 3
Model tree for useful-quants/InternVL3-5-8B-W4A16-INT4
Base model
OpenGVLab/InternVL3_5-8B-Pretrained Finetuned
OpenGVLab/InternVL3_5-8B-Instruct Finetuned
OpenGVLab/InternVL3_5-8B-MPO Finetuned
OpenGVLab/InternVL3_5-8B