How to use from
Docker Model Runner
docker model run hf.co/YTan2000/Huihui-Qwen35-A35B-Ablit-TQ3_4S:F16
Quick Links

model card

Huihui-Qwen35-A35B-Ablit-TQ3_4S

GGUF TurboQuant conversion of huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated.

This is a 35B-A3B Qwen3.5 MoE abliterated model quantized for the public TurboQuant llama.cpp fork with TQ3_4S weights. The token embedding and output tensors are kept at Q6_K for compatibility and quality on this 35B MoE architecture.

Files

File Description
Huihui-Qwen35-A35B-Ablit-TQ3_4S.gguf Main GGUF model, mostly TQ3_4S, 4.09 BPW
mmproj-Qwen35-A35B-f16.gguf Compatible Qwen3.5-35B-A35B multimodal projector
chat_template.jinja Qwen3.5 chat template
chat_template-vl-think.jinja Vision-language thinking template from the upstream Huihui repo
config.json Source model config metadata
generation_config.json Source generation config
tokenizer.json, tokenizer_config.json, vocab.json, merges.txt Tokenizer files
preprocessor_config.json, video_preprocessor_config.json Vision/video preprocessor metadata
model-card.png Repository thumbnail/card image

Quantization

  • Source: huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated
  • Converter: llama.cpp HF-to-GGUF converter
  • Quantizer: llama.cpp TQ3 build ba3357e93
  • Main tensor type: TQ3_4S
  • Embedding/output tensor type: Q6_K
  • Output size: 16.52 GiB
  • Reported quant size: 16912.31 MiB, 4.09 BPW

Quantization command:

./build/bin/llama-quantize \
  --token-embedding-type q6_K --output-tensor-type q6_K \
  Huihui-Qwen3.5-35B-A35B-abliterated-BF16.gguf \
  Huihui-Qwen35-A35B-Ablit-TQ3_4S.gguf \
  TQ3_4S 16

llama.cpp Usage

Runtime Requirement

This model requires the public TurboQuant runtime fork:

It will not load correctly on stock llama.cpp or other runtimes that do not include TQ3_4S.

Text-only:

./build/bin/llama-server \
  -m /path/to/Huihui-Qwen35-A35B-Ablit-TQ3_4S.gguf \
  -ngl 30 -c 2048 -np 1 \
  --jinja \
  --chat-template-file /path/to/chat_template.jinja

With the multimodal projector:

./build/bin/llama-server \
  -m /path/to/Huihui-Qwen35-A35B-Ablit-TQ3_4S.gguf \
  --mmproj /path/to/mmproj-Qwen35-A35B-f16.gguf \
  -ngl 30 -c 2048 -np 1 \
  --jinja \
  --chat-template-file /path/to/chat_template-vl-think.jinja

For text-only chat, use the embedded template or pass --chat-template-file chat_template.jinja. For vision-language use, keep mmproj-Qwen35-A35B-f16.gguf, chat_template-vl-think.jinja, preprocessor_config.json, and video_preprocessor_config.json in the repository.

On an RTX 5060 Ti 16GB, full offload did not fit. The highest tested offload for this GGUF was -ngl 30; -ngl 40 and -ngl 99 failed to load due VRAM.

Validation

Smoke tests were run with the TurboQuant llama.cpp fork listed above.

Runtime:

  • GPU: RTX 5060 Ti 16GB
  • Context: -c 2048
  • Offload: -ngl 30
  • Chat settings: temperature=0, max_tokens=256, chat_template_kwargs.enable_thinking=false

Load probe:

Offload Result pp128
ngl=0 loads 127.93 tok/s
ngl=20 loads 210.56 tok/s
ngl=30 loads 298.77 tok/s
ngl=40 fails on 16GB VRAM n/a
ngl=99 fails on 16GB VRAM n/a

10-question chat smoke:

Model Manual pass Notes
Huihui-Qwen35-A35B-Ablit-TQ3_4S 9/10 Same score as normal Qwen3.5-35B-A35B Q4_K_M on this smoke set
Normal Qwen3.5-35B-A35B Q4_K_M 9/10 Used as local quality baseline

Both models missed the same Python list prompt in the lightweight smoke suite. Treat this as a smoke-pass, not a full release quality benchmark.

Safety Notice

This is an abliterated/uncensored model. It may produce sensitive, controversial, unsafe, or otherwise inappropriate outputs. Use in controlled environments, review outputs carefully, and follow applicable laws and policies.

Acknowledgements

Downloads last month
151
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for YTan2000/Huihui-Qwen35-A35B-Ablit-TQ3_4S

Collection including YTan2000/Huihui-Qwen35-A35B-Ablit-TQ3_4S