---
license: apache-2.0
base_model: HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
base_model_relation: quantized
library_name: transformers
pipeline_tag: image-text-to-text
model_type: qwen3_5_moe
tags:
- qwen3.6
- qwen3_5_moe
- nvfp4
- compressed-tensors
- quantized
- vllm
- blackwell
- rtx-5090
- sm120
- moe
- multimodal
- agentic
- tool-calling
- coding
- uncensored
- conversational
---

# Qwen3.6 35B A3B HauhauCS Uncensored NVFP4

Uncensored Qwen3.6 35B A3B MoE quantized to NVFP4 `compressed-tensors` for vLLM on NVIDIA Blackwell / RTX 5090.

- **35B total / 3B active MoE**
- **HauhauCS Aggressive uncensored source**
- **Conservative NVFP4 profile**: linear attention and MTP kept in bf16 for quality
- **NVFP4 W4A4 compressed-tensors**
- **~22 GB**
- **Runs on one RTX 5090**
- **100K-131K text context target**
- **vLLM native loading**

The model files are placed at the repository root so Hugging Face shows the weights in the right-side download panel and `vllm serve` can load the repo directly. The repo intentionally keeps a single root weight set to avoid full-repo snapshot downloads pulling multiple profile variants.

## Download

```bash
hf download lyf/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-NVFP4 \
  --local-dir ./qwen36-35b-a3b-hauhaucs-nvfp4
```

## vLLM quickstart

```bash
VLLM_NVFP4_GEMM_BACKEND=marlin \
vllm serve lyf/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-NVFP4 \
  --served-model-name qwen36-35b-a3b-hauhaucs-nvfp4 \
  --quantization compressed-tensors \
  --kv-cache-dtype fp8 \
  --max-model-len 131072 \
  --max-num-seqs 1 \
  --max-num-batched-tokens 4096 \
  --gpu-memory-utilization 0.90 \
  --enable-prefix-caching \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3 \
  --trust-remote-code
```

Local path quickstart:

```bash
hf download lyf/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-NVFP4 \
  --local-dir ./qwen36-35b-a3b-hauhaucs-nvfp4

VLLM_NVFP4_GEMM_BACKEND=marlin \
vllm serve ./qwen36-35b-a3b-hauhaucs-nvfp4 \
  --served-model-name qwen36-35b-a3b-hauhaucs-nvfp4 \
  --quantization compressed-tensors \
  --kv-cache-dtype fp8 \
  --max-model-len 131072 \
  --max-num-seqs 1 \
  --max-num-batched-tokens 4096 \
  --gpu-memory-utilization 0.90 \
  --enable-prefix-caching \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3 \
  --trust-remote-code
```

## Quantization recipe

```python
recipe = QuantizationModifier(
    targets="Linear", scheme="NVFP4",
    ignore=["lm_head", "re:.*visual.*", "re:.*mlp.gate$",
            "re:.*mlp.shared_expert_gate$", "re:.*linear_attn.*", "re:^mtp.*"],
)
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=1024,
    num_calibration_samples=128,
    moe_calibrate_all_experts=True,
    pipeline="basic",
)
```

- Calibration: `HuggingFaceH4/ultrachat_200k`, 128 samples x 1024 tokens
- MTP tensors copied from [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
- Converted using [li-yifei/gguf-to-nvfp4](https://github.com/li-yifei/gguf-to-nvfp4)

Pipeline:

```text
Q8_K_P GGUF -> step1_convert_qwen36_moe.py -> HF bf16 -> step2_quantize_qwen36_moe.py -> NVFP4
```

## Source models

- Uncensored source: [HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive](https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive)
- Original base: [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)

## Acknowledgments

- [HauhauCS](https://huggingface.co/HauhauCS) for the uncensored GGUF source
- [Qwen](https://huggingface.co/Qwen) for the base model and MTP weights
- [AEON-7](https://huggingface.co/AEON-7) and [RedHatAI](https://huggingface.co/RedHatAI) for conservative quantization approach reference