---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
tags:
- qwen3.5
- moe
- nvfp4
- w4a4
- compressed-tensors
- vllm
pipeline_tag: text-generation
---

# Qwen3.6-35B-A3B-PRISM-NVFP4

NVFP4 (W4A4) quantization of a **PRISM**-tuned Qwen3.6-35B-A3B. ~24 GB on disk, multimodal + MTP draft head preserved. Designed for NVIDIA Blackwell (SM120/SM121).

PRISM softens over-refusal behaviour and removes bias / propaganda patterns while maintaining and enhancing task performance, coherence, and multimodal capability.

## Model details

- **Base:** Qwen/Qwen3.6-35B-A3B (35B total, ~3B active per token, 256 routed experts)
- **PRISM:** refusal-softening, bias + propaganda removal
- **Format:** compressed-tensors NVFP4 (FP4 E2M1 weights + activations, UE4M3 per-block-16 scales)
- **Kept BF16:** vision encoder, `lm_head`, router gates, embeddings, linear-attention SSM state
- **Runtime targets:** vLLM (`--quantization compressed-tensors`), Blackwell tensor cores

## Files

| File | Purpose |
|---|---|
| `model.safetensors` | language-model + vision encoder weights |
| `model_mtp.safetensors` | MTP draft head (optional, for speculative decode) |
| `model.safetensors.index.json` | weight map |
| `config.json`, `generation_config.json` | model + generation config |
| `tokenizer*`, `processor_config.json`, `chat_template.jinja` | tokenizer + chat template |

## Serving (vLLM)

```bash
vllm serve Ex0bit/Qwen3.6-35B-A3B-PRISM-NVFP4 \
  --quantization compressed-tensors \
  --dtype auto \
  --max-model-len 32768 \
  --trust-remote-code
```

Requires vLLM with Blackwell NVFP4 kernels. On SM121 (DGX Spark), use a vLLM build with SM121-aware patches; stock PyPI wheels will fault on the missing `cvt.rn.satfinite.e2m1x2.f32` PTX instruction.

Known-working community Docker images (Apache 2.0, tested on GB10):

- `ghcr.io/aeon-7/vllm-spark-omni-q36`: vLLM HEAD + GB10 patches + flashinfer sm_120 kernels; also supports DFlash speculative decoding.
- `avarok/dgx-vllm-nvfp4-kernel`: generic NVFP4 MoE image with software-E2M1 conversion and Marlin-MoE default.

## License

Apache 2.0, inherited from the base model.
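
## Pre-flight GPU check

The SM120/SM121 guidance in the Serving section can be turned into a quick check before launching `vllm serve`. The sketch below is illustrative only: `nvfp4_advice` is a hypothetical helper name (not part of vLLM), and it assumes an NVIDIA driver recent enough for `nvidia-smi` to support the `compute_cap` query field. It maps compute capability 12.0 to SM120 and 12.1 to SM121 and echoes the matching note from this README.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a CUDA compute capability string (e.g. "12.1")
# to the serving note from this README. Not part of vLLM.
nvfp4_advice() {
  case "$1" in
    12.0) echo "SM120: use a vLLM build with Blackwell NVFP4 kernels" ;;
    12.1) echo "SM121 (DGX Spark / GB10): use a vLLM build with SM121-aware patches" ;;
    *)    echo "compute capability $1: not a documented target for this checkpoint" ;;
  esac
}

# Query the driver only when nvidia-smi is installed.
# Assumption: the driver supports the compute_cap query field.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader |
    while read -r cap; do
      nvfp4_advice "$cap"
    done
fi
```

On a multi-GPU host the loop prints one line per device. Any capability other than 12.0 or 12.1 is reported as undocumented rather than unsupported, since this card only lists SM120 and SM121 as targets.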
### ☕ Support Our Work

If you enjoy our work and find it useful, please consider sponsoring or supporting us!

[![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ex0bit)