Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-int4-AutoRound

Int4 AutoRound quantized version of TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2.

  • Base model: TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2
  • Quantization: INT4 symmetric, group_size=128 (W4A16)
  • Algorithm: AutoRound
  • Format: AutoRound (compatible with vLLM, SGLang, compressed-tensors)

Usage

Serve with vLLM

vllm serve CoreWorxLab/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-int4-AutoRound \
    --tensor-parallel-size 1 \
    --max-model-len 262144 \
    --gpu-memory-utilization 0.95 \
    --reasoning-parser qwen3

With speculative decoding (Qwen3 MTP)

vllm serve CoreWorxLab/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-int4-AutoRound \
    --tensor-parallel-size 1 \
    --max-model-len 262144 \
    --gpu-memory-utilization 0.95 \
    --reasoning-parser qwen3 \
    --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'

Load in Python

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CoreWorxLab/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-int4-AutoRound"
model = AutoModelForCausalLM.from_pretrained(model_name, dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

Quantization Details

Parameter Value
Bits 4
Group size 128
Symmetric Yes
Calibration NeelNanda/pile-10k
Seq length 2048

License

Please follow the license of the original model (TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2).

Downloads last month
1,049
Safetensors
Model size
6B params
Tensor type
I32
BF16
F16
Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for CoreWorxLab/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-int4-AutoRound

Base model

Qwen/Qwen3.6-27B
Quantized
(10)
this model