Qwen3.6-27B TQ (Turbo Quant GGUF)

Full suite of WHT-rotated turbo quant GGUFs for Qwen/Qwen3.6-27B,
quantized with imatrix calibration on Python coding data. Day-of-release quants.

About Turbo Quants

Turbo quants use Walsh-Hadamard Transform rotation before quantization, significantly reducing quantization error versus standard GGUF quants at the same bit width. TQ3_1S and TQ4_1S consistently outperform their Q4/Q5 equivalents. TQ1_0 and TQ2_0 use
ternarization and outperform standard IQ1/IQ2 formats.

Quant Details

File Format Type Size bpw
Qwen3.6-27B-TQ1_0.gguf TQ1_0 Ternarization ~5.7GB 1.69
Qwen3.6-27B-TQ2_0.gguf TQ2_0 Ternarization ~7GB 2.06
Qwen3.6-27B-TQ3_1S.gguf TQ3_1S WHT-rotated ~13.5GB 4.00
Qwen3.6-27B-TQ4_1S.gguf TQ4_1S WHT-rotated ~17GB 5.00

Quantized using a custom llama.cpp fork with
optimized ROCm/HIP TQ kernel support. Imatrix calibrated on ~1500 Python coding examples sampled from:

  • ajibawa-2023/Python-Code-23k-ShareGPT
  • iamtarun/python_code_instructions_18k_alpaca
  • flytech/python-codes-25k

Usage

llama-server \
  --model Qwen3.6-27B-TQ4_1S.gguf \
  -ngl 999 \
  --ctx-size 32768 \
  --cache-type-k q8_0 \                                                                                                                    
  --cache-type-v q8_0 \
  --flash-attn \                                                                                                                           
  --no-mmap     

Speculative Decoding                                                                                                                       
 
Qwen3.6-27B shares tokenizer (n_vocab=248320) and architecture family (qwen35) with                                                        
Qwen3.5 models, making them compatible for speculative decoding. Pair with a Qwen3.5
draft model for accelerated inference:                                                                                                     
                
llama-server \
  --model Qwen3.6-27B-TQ4_1S.gguf \
  --model-draft Qwen3.5-9B-TQ3_1S.gguf \                                                                                                   
  -ngl 999 -ngld 999 \
  --parallel 1 \                                                                                                                           
  --draft-max 12 --draft-p-min 0.75 \
  --flash-attn --no-mmap                                                                                                                   
                
Hardware Tested                                                                                                                            
 
- AMD Radeon AI PRO R9700 (32GB, gfx1201 RDNA4) via ROCm      
Downloads last month
511
GGUF
Model size
27B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

1-bit

2-bit

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mad-lab-ai/Qwen3.6-27B-tq-gguf

Base model

Qwen/Qwen3.6-27B
Quantized
(476)
this model