---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
tags:
  - qwen3.5
  - moe
  - nvfp4
  - w4a4
  - compressed-tensors
  - vllm
pipeline_tag: text-generation
---

# Qwen3.6-35B-A3B-PRISM-NVFP4

NVFP4 (W4A4) quantization of a **PRISM**-tuned Qwen3.6-35B-A3B. ~24 GB on disk,
multimodal + MTP draft head preserved. Designed for NVIDIA Blackwell (SM120/SM121).

PRISM softens over-refusal behaviour and removes bias / propaganda
patterns while maintaining and enhancing task performance, coherence,
and multimodal capability.

## Model details

- **Base:** Qwen/Qwen3.6-35B-A3B (35B total, ~3B active per token, 256 routed experts)
- **PRISM:** refusal-softening, bias + propaganda removal
- **Format:** compressed-tensors NVFP4 (FP4 E2M1 weights + activations, UE4M3 per-block-16 scales)
- **Kept BF16:** vision encoder, `lm_head`, router gates, embeddings, linear-attention SSM state
- **Runtime targets:** vLLM (`--quantization compressed-tensors`), Blackwell tensor cores

## Files

| File | Purpose |
|---|---|
| `model.safetensors` | language-model + vision encoder weights |
| `model_mtp.safetensors` | MTP draft head (optional, for speculative decode) |
| `model.safetensors.index.json` | weight map |
| `config.json`, `generation_config.json` | model + generation config |
| `tokenizer*`, `processor_config.json`, `chat_template.jinja` | tokenizer + chat template |

## Serving (vLLM)

```bash
vllm serve Ex0bit/Qwen3.6-35B-A3B-PRISM-NVFP4 \
  --quantization compressed-tensors \
  --dtype auto \
  --max-model-len 32768 \
  --trust-remote-code
```

Requires vLLM with Blackwell NVFP4 kernels. On SM121 (DGX Spark), use a
vLLM build with SM121-aware patches — stock PyPI wheels will fault on the
missing `cvt.rn.satfinite.e2m1x2.f32` PTX instruction.

Known-working community Docker images (Apache 2.0, tested on GB10):
- `ghcr.io/aeon-7/vllm-spark-omni-q36` — vLLM HEAD + GB10 patches + flashinfer
  sm_120 kernels; also supports DFlash speculative decoding.
- `avarok/dgx-vllm-nvfp4-kernel` — generic NVFP4 MoE image with software-E2M1
  conversion and Marlin-MoE default.

## License

Apache 2.0, inherited from the base model.

<div align="center">

### ☕ Support Our Work

If you enjoy our work and find it useful, please consider sponsoring or supporting us!

[![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ex0bit)

</div>