Libraries

How to use shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Print the result so it is visible when run as a script, not only in a notebook
result = pipe(text=messages)
print(result)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC")
model = AutoModelForMultimodalLM.from_pretrained("shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (runs in the foreground):
vllm serve "shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC"
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
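
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server is reachable at `http://localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field; the helper names `build_chat_payload` and `chat` are illustrative and not part of vLLM or SGLang.

```python
import json
import urllib.request

MODEL_ID = "shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC"


def build_chat_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build the same JSON body as the curl example: one user turn with text and an image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the first reply's text.

    Assumption: the response follows the OpenAI chat-completions shape.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload(
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# With a server running, uncomment the next line to send the request:
# print(chat(payload))
```

For the SGLang server described below, pass `base_url="http://localhost:30000"` instead.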

Use Docker

```shell
docker model run hf.co/shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC
```

SGLang

How to use shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (runs in the foreground):
python3 -m sglang.launch_server \
    --model-path "shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC" \
    --host 0.0.0.0 \
    --port 30000
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC" \
        --host 0.0.0.0 \
        --port 30000
```
To call the server, send the same curl request as in the pip-based SGLang example above; the container publishes the same port, 30000.

Docker Model Runner
How to use shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC with Docker Model Runner:
```
docker model run hf.co/shieldstar/Qwen3.5-122B-A10B-int4-AutoRound-EC
```

Qwen3.5-122B-A10B-int4-AutoRound-EC

Extended Calibration (EC) INT4 AutoRound quantization of Qwen/Qwen3.5-122B-A10B, a 122B MoE (10B active) multimodal model. Drop-in replacement for Intel/Qwen3.5-122B-A10B-int4-AutoRound with wider calibration settings for improved quality on long-context and reasoning-heavy workloads.

Calibration — Extended vs Intel default

| Setting | Intel (v0.12.0) | EC (this model) |
| --- | --- | --- |
| `iters` | 200 | 400 |
| `nsamples` | 128 | 256 |
| `seqlen` | 512 | 4096 |
| `batch_size` | 1 | 8 (default) |
| `grad_accum` | 8 | 1 (default) |
| `ignore_layers` | shared_expert | shared_expert |
| `bits / group` | 4 / 128 | 4 / 128 |

Effective calibration batch size is 8 in both cases (`batch_size` × `grad_accum`: 1 × 8 vs 8 × 1), so each optimizer step sees the same number of samples. Only the memory and latency profile during quantization differs.
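
As a worked check of the numbers in the table, the sketch below computes the effective batch size and an upper bound on the calibration token budget for both recipes. The `CalibConfig` class is purely illustrative and is not part of auto-round; the token figure assumes every calibration sample is filled to the full `seqlen`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibConfig:
    """Illustrative container for the calibration settings in the table (not an auto-round class)."""
    iters: int
    nsamples: int
    seqlen: int
    batch_size: int
    grad_accum: int

    @property
    def effective_batch(self) -> int:
        # Samples contributing to one optimizer step
        return self.batch_size * self.grad_accum

    @property
    def calib_tokens(self) -> int:
        # Upper bound: assumes every sample is exactly seqlen tokens long
        return self.nsamples * self.seqlen


intel = CalibConfig(iters=200, nsamples=128, seqlen=512, batch_size=1, grad_accum=8)
ec = CalibConfig(iters=400, nsamples=256, seqlen=4096, batch_size=8, grad_accum=1)

print(intel.effective_batch, ec.effective_batch)  # 8 8
print(intel.calib_tokens, ec.calib_tokens)        # 65536 1048576
print(ec.calib_tokens // intel.calib_tokens)      # 16
```

Under these assumptions the EC recipe calibrates on up to 16× as many tokens as the Intel defaults, with twice the iterations per block.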

Environment

| Component | Version |
| --- | --- |
| auto-round | 0.12.2 |
| transformers | 5.5.3 |
| torch | 2.11.0 |
| safetensors | 0.7.0 |
| huggingface_hub | 1.10.1 |
| Hardware | RunPod H200 SXM (1x) |
| Wall time | ~15 hrs |

Files

| Path | What |
| --- | --- |
| `model-000{01..13}-of-00013.*` | Quantized language-model shards (INT4 GPTQ, w4g128) |
| `model_visual.safetensors` | Visual encoder (BF16, base-model passthrough) |
| `model_extra_tensors.safetensors` | MTP (multi-token prediction) weights (BF16 passthrough) |
| `config.json` | Multimodal config with embedded `quantization_config` |
| `chat_template.jinja` | Qwen3 chat template (reasoning-enabled) |
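
To make the shard naming concrete, here is a small sketch that expands the `model-000{01..13}-of-00013.*` pattern into file names and separates main shards from the sidecar files. The `.safetensors` extension for the shards is an assumption (the table only gives `.*`), and both helper functions are hypothetical names used for illustration.

```python
from fnmatch import fnmatch


def expected_shards(total: int = 13, ext: str = "safetensors") -> list[str]:
    """Expand model-000{01..13}-of-00013.* into concrete names (extension is assumed)."""
    return [f"model-{i:05d}-of-{total:05d}.{ext}" for i in range(1, total + 1)]


def split_files(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Separate main language-model shards from every other file in the repo."""
    shards, others = [], []
    for name in filenames:
        if fnmatch(name, "model-*-of-*.*"):
            shards.append(name)
        else:
            others.append(name)
    return shards, others


listing = expected_shards() + [
    "model_visual.safetensors",
    "model_extra_tensors.safetensors",
    "config.json",
    "chat_template.jinja",
]
shards, others = split_files(listing)
print(len(shards), "main shards")
print("other files:", others)
```

The visual encoder and MTP weights fall into the second group because their names use an underscore rather than the `model-NNNNN-of-NNNNN` shard pattern.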

Layout note — building a custom FP8 hybrid

If you build an FP8-dense hybrid on top of this checkpoint (e.g. via albond's `build-hybrid-checkpoint.py`), the hybrid builder reports more unmatched FP8 tensors than it does on Intel's checkpoint (≈ 741 vs 408). The cause is layout: this checkpoint stores the visual encoder in a separate `model_visual.safetensors` file rather than inline in the main shards. The builder only scans the main shards, so the FP8 visual tensors go unmatched and the visual encoder stays at BF16 in the resulting hybrid. This is functionally harmless: the visual encoder takes ~0.9 GB at BF16 instead of ~0.45 GB at FP8, with no quality difference and no impact on text throughput. Pass `--force` to the hybrid builder to proceed.

If you only serve text (the vast majority of use cases), or run vLLM against this checkpoint directly without building a hybrid, this note does not apply.
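
The size figures in this note follow from bytes per parameter. The sketch below back-computes the visual encoder's parameter count from the approximate ~0.9 GB BF16 figure, so every result is approximate too; attributing the whole difference in unmatched tensors to the visual encoder is an inference from the note, not a measured count.

```python
# Storage cost per parameter for the two formats discussed in the note
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def size_gb(num_params: float, dtype: str) -> float:
    """Approximate tensor storage in GB (10^9 bytes), ignoring scales and metadata."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Back-computed from the ~0.9 GB BF16 figure quoted above (approximate)
visual_params = 0.9e9 / BYTES_PER_PARAM["bf16"]

bf16_gb = size_gb(visual_params, "bf16")
fp8_gb = size_gb(visual_params, "fp8")
extra_gb = bf16_gb - fp8_gb

# Difference in unmatched FP8 tensors reported by the builder (this checkpoint vs Intel's)
unmatched_delta = 741 - 408

print(f"visual encoder: {bf16_gb:.2f} GB at BF16, {fp8_gb:.2f} GB at FP8")
print(f"extra memory from keeping BF16: {extra_gb:.2f} GB")
print(f"additional unmatched tensors: {unmatched_delta}")
```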

License

Apache 2.0 (inherits from Qwen/Qwen3.5-122B-A10B).

Shoutouts

Qwen team for the base Qwen3.5-122B-A10B model.
Intel for the reference AutoRound INT4 recipe and post-quant checkpoint layout we built on.
auto-round for the quantization tooling.
albond for the DGX Spark SM121 vLLM patches and hybrid INT4+FP8 serving stack.
