Qwen3.5-122B-A10B-int4-AutoRound-EC
Extended Calibration (EC) INT4 AutoRound quantization of Qwen/Qwen3.5-122B-A10B, a 122B MoE (10B active) multimodal model. Drop-in replacement for Intel/Qwen3.5-122B-A10B-int4-AutoRound with wider calibration settings for improved quality on long-context and reasoning-heavy workloads.
Calibration — Extended vs Intel default
| Setting | Intel (v0.12.0) | EC (this model) |
|---|---|---|
| iters | 200 | 400 |
| nsamples | 128 | 256 |
| seqlen | 512 | 4096 |
| batch_size | 1 | 8 (default) |
| grad_accum | 8 | 1 (default) |
| ignore_layers | shared_expert | shared_expert |
| bits / group | 4 / 128 | 4 / 128 |
The effective calibration batch size is 8 in both cases (batch_size × grad_accum = 1 × 8 vs 8 × 1), so each optimizer step sees the same number of samples; only the memory and latency profile during quantization differs.
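The arithmetic behind the two recipes can be checked with a short sketch. The `CalibrationSettings` class and its field names are illustrative only (they mirror the table rows, not the auto-round API), and the token count is an upper bound that assumes every calibration sample fills the full `seqlen`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationSettings:
    """Illustrative container for the table rows; not an auto-round class."""
    iters: int
    nsamples: int
    seqlen: int
    batch_size: int
    grad_accum: int

    @property
    def effective_batch(self) -> int:
        # Samples contributing to one optimizer step.
        return self.batch_size * self.grad_accum

    @property
    def calibration_tokens(self) -> int:
        # Upper bound: assumes each sample is a full seqlen tokens long.
        return self.nsamples * self.seqlen


INTEL = CalibrationSettings(iters=200, nsamples=128, seqlen=512, batch_size=1, grad_accum=8)
EC = CalibrationSettings(iters=400, nsamples=256, seqlen=4096, batch_size=8, grad_accum=1)

for name, cfg in (("Intel", INTEL), ("EC", EC)):
    print(f"{name}: effective batch {cfg.effective_batch}, "
          f"up to {cfg.calibration_tokens:,} calibration tokens")
```

Under that assumption the EC recipe sees up to 16× as many calibration tokens (1,048,576 vs 65,536) while the effective batch size stays at 8.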
Environment
| Component | Version |
|---|---|
| auto-round | 0.12.2 |
| transformers | 5.5.3 |
| torch | 2.11.0 |
| safetensors | 0.7.0 |
| huggingface_hub | 1.10.1 |
| Hardware | RunPod H200 SXM (1x) |
| Wall time | ~15 hrs |
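To compare a local environment against the versions listed above, a minimal sketch using the standard-library `importlib.metadata` could look like this. The helper names (`installed_version`, `version_mismatches`) are made up for this example, and stripping a local build suffix such as `+cu128` from the installed version is an assumption about how you want to compare.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Versions copied from the Environment table.
EXPECTED = {
    "auto-round": "0.12.2",
    "transformers": "5.5.3",
    "torch": "2.11.0",
    "safetensors": "0.7.0",
    "huggingface_hub": "1.10.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each differing package to (expected, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Compare only the public part, ignoring a local suffix like "+cu128".
        public = found.split("+", 1)[0] if found else None
        if public != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

A mismatch does not necessarily mean the checkpoint will fail to load; the table records the versions used for quantization, not a tested compatibility range.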
Files
| Path | What |
|---|---|
| model-000{01..13}-of-00013.* | Quantized language-model shards (INT4 GPTQ, w4g128) |
| model_visual.safetensors | Visual encoder (BF16, base-model passthrough) |
| model_extra_tensors.safetensors | MTP (multi-token prediction) weights (BF16 passthrough) |
| config.json | Multimodal config with embedded quantization_config |
| chat_template.jinja | Qwen3 chat template (reasoning-enabled) |
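As a quick sanity check after downloading, the file layout above can be verified with a small sketch. The function names are hypothetical, and because the shard extension is given only as `.*`, the check matches on the shard stem rather than assuming a specific extension.

```python
from typing import List

# Non-shard files named in the Files table.
AUXILIARY_FILES = (
    "model_visual.safetensors",
    "model_extra_tensors.safetensors",
    "config.json",
    "chat_template.jinja",
)


def expected_shard_stems(total: int = 13) -> List[str]:
    """Expand model-000{01..13}-of-00013 into individual stems."""
    return [f"model-{i:05d}-of-{total:05d}" for i in range(1, total + 1)]


def missing_files(present: List[str], total: int = 13) -> List[str]:
    """List expected entries that do not appear among the given file names."""
    names = set(present)
    missing = [
        stem + ".*"
        for stem in expected_shard_stems(total)
        # Any extension counts, as long as the stem is followed by a dot.
        if not any(name.startswith(stem + ".") for name in names)
    ]
    missing += [name for name in AUXILIARY_FILES if name not in names]
    return missing
```

For a local snapshot directory, `missing_files(os.listdir(snapshot_dir))` (with `import os` and your own `snapshot_dir`) returns an empty list when every listed file is present.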
Layout note — building a custom FP8 hybrid
If you build an FP8-dense hybrid on top of this checkpoint (e.g. via
albond's build-hybrid-checkpoint.py),
the hybrid builder will report more unmatched FP8 tensors than on Intel's
checkpoint (≈ 741 vs 408) because our visual encoder lives in a separate
model_visual.safetensors file rather than inline in the main shards. The
builder only scans the main shards, so the FP8 visual tensors go unmatched and
the visual encoder stays at BF16 in the resulting hybrid. This is functionally
harmless: the visual encoder takes only ~0.9 GB at BF16 vs ~0.45 GB at FP8,
with no quality difference and no impact on text throughput. Pass --force to
the hybrid builder to proceed.
If you only serve text (the vast majority of use cases) or run vLLM against this checkpoint directly without building a hybrid, this note does not apply.
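The effect described in this note can be illustrated with a toy sketch. This is not the logic of build-hybrid-checkpoint.py; the function, file key `main-shard`, and tensor names are all hypothetical and exist only to show why tensors stored outside the scanned files are reported as unmatched.

```python
from typing import Dict, Iterable, List, Set


def unmatched_fp8_tensors(
    fp8_names: Iterable[str],
    checkpoint_files: Dict[str, Set[str]],
    scanned_files: Iterable[str],
) -> List[str]:
    """Return FP8 tensor names that have no counterpart in the scanned files."""
    visible: Set[str] = set()
    for filename in scanned_files:
        visible.update(checkpoint_files[filename])
    return sorted(name for name in fp8_names if name not in visible)


# Toy checkpoint: one main shard plus the separate visual file (names invented).
checkpoint = {
    "main-shard": {"layers.0.mlp.weight"},
    "model_visual.safetensors": {"visual.blocks.0.attn.weight"},
}
fp8 = ["layers.0.mlp.weight", "visual.blocks.0.attn.weight"]

# Scanning only the main shard leaves the visual tensor unmatched.
print(unmatched_fp8_tensors(fp8, checkpoint, ["main-shard"]))
# Scanning every file matches everything.
print(unmatched_fp8_tensors(fp8, checkpoint, checkpoint.keys()))
```

In the toy data the first call reports the visual tensor and the second reports nothing, which mirrors the higher unmatched count seen when the visual encoder lives in its own file.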
License
Apache 2.0 (inherits from Qwen/Qwen3.5-122B-A10B).
Shoutouts
- Qwen team for the base Qwen3.5-122B-A10B model.
- Intel for the reference AutoRound INT4 recipe and post-quant checkpoint layout we built on.
- auto-round for the quantization tooling.
- albond for the DGX Spark SM121 vLLM patches and hybrid INT4+FP8 serving stack.