Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS

Instructions for using connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS with libraries, inference providers, notebooks, and local apps.

Libraries

How to use connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS with Transformers:

```python
# Use a pipeline as a high-level helper.
# Image + text chat input needs the "image-text-to-text" task, whose call signature accepts `text=`.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
print(pipe(text=messages))
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS")
model = AutoModelForMultimodalLM.from_pretrained("connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt).
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (this modelopt-format checkpoint is served with --quantization modelopt):
vllm serve "connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS" \
    --quantization modelopt
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
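The same request can be sent from Python. The sketch below uses only the standard library and extends the curl call with an optional image part and a thinking toggle. It assumes the server started above is listening on `localhost:8000` and that it forwards the `chat_template_kwargs` request field to the chat template, whose `enable_thinking` flag (see `chat_template.jinja` below) pre-fills an empty `<think>` block when set to false. The helper names `build_payload` and `chat` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed local vLLM server
MODEL = "connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS"


def build_payload(prompt, image_url=None, thinking=True, max_tokens=256):
    """Build an OpenAI-style chat completions body (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        # Put the image part before the text, as in the Transformers example.
        content.insert(0, {"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        # Passed through to the chat template; False disables the reasoning block.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }


def chat(payload, url=BASE_URL):
    """POST the payload and return the first choice's message content."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat(build_payload("What is the capital of France?", thinking=False)))` sends the text-only request from the curl example with reasoning switched off.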

Use Docker

```
docker model run hf.co/connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS
```

SGLang

How to use connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "connorhzp/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


connorhzp

AEON-7 committed on May 15

Commit 272f4b0 (0 parents): Duplicate from AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS

Files changed (11):

.gitattributes +36 -0
README.md +244 -0
chat_template.jinja +154 -0
config.json +253 -0
generation_config.json +13 -0
hf_quant_config.json +63 -0
model.safetensors +3 -0
preprocessor_config.json +21 -0
tokenizer.json +3 -0
tokenizer_config.json +36 -0
video_preprocessor_config.json +21 -0

.gitattributes ADDED

	@@ -0,0 +1,36 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
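The `+3 -0` line counts for `model.safetensors` and `tokenizer.json` in the file list follow from these rules: both paths match an LFS filter (`*.safetensors`, `tokenizer.json`), so the Git history holds a three-line Git LFS pointer rather than the file itself. A minimal sketch of checking filenames against a subset of the patterns and parsing such a pointer, assuming basename-only matching (real `.gitattributes` matching has extra rules, e.g. for patterns containing `/`, which are left out here):

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the LFS patterns above; patterns containing "/" are omitted in this sketch.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.pt", "*.pth", "*.onnx", "tokenizer.json"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate .gitattributes matching: slash-free patterns match the basename."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in LFS_PATTERNS)


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (one "key value" pair per line)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}
```

A pointer has the shape `version https://git-lfs.github.com/spec/v1`, `oid sha256:<hex digest>`, `size <bytes>`, one field per line, which is why each LFS-tracked file shows exactly three added lines.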

README.md ADDED

	@@ -0,0 +1,244 @@

+---
+license: apache-2.0
+base_model: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16
+language:
+- en
+- zh
+- multilingual
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- abliterated
+- uncensored
+- qwen3
+- qwen3.6
+- nvfp4
+- modelopt
+- mtp
+- multi-token-prediction
+- speculative-decoding
+- hybrid-attention
+- mamba
+- gated-deltanet
+- multimodal
+- aeon
+- rtx-5090
+- rtx-pro-6000
+- b100
+- b200
+- dedicated-vram-blackwell
+- sm_120
+- sm_100
+- 32gb
+- conv1d-preserved
+---
+# Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS
+> **Deployment, operations & benchmarks → [github.com/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash](https://github.com/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash)**
+>
+> The GitHub repo is the source of truth for the production deployment guide, hardware-tuned docker-compose configs, full configuration reference, measured benchmarks, and `AGENTS.md` — an operator's manual that pre-empts common stale-documentation traps.
+> ## 🏆 DGX Spark performance — current production *(v3 image, 2026-04-29)*
+>
+> Served with **DFlash spec decode** *(not the MTP head)* on this XS body, the v3 image (`ghcr.io/aeon-7/vllm-aeon-ultimate-dflash:qwen36-v3`) clocks **38.5 tok/s median, 71.3 tok/s peak** thinking-on / **38.1 / 68.4** thinking-off — a **+18 % median / +26 % peak** lift over the prior v2.1 image and a **+17 % / +21 %** stacked lift vs the original `-NVFP4` (compressed-tensors) production. Median TTFT is **247 ms** (was 325 ms — −24 %). See the [GitHub Performance section](https://github.com/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash#performance) for the four-config comparison table.
+>
+> **🙏 Reference recipe credit:** The conv1d-preserved NVFP4 + MTP graft pipeline used to build this XS variant is based on [**sakamakismile**](https://huggingface.co/sakamakismile)'s validated [Qwen3.6-27B-NVFP4-MTP series](https://huggingface.co/sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP) (22K+ downloads). They worked out the modelopt config — including the strategic decision to quantize the GDN projection matmuls to NVFP4 while preserving `linear_attn.conv1d` at BF16 — and the MTP-head graft technique. We adapted the recipe to AEON-Ultimate's abliterated weights and ship both the conv1d-preserved-only XS variant (matching their footprint) and a heavier regular-MTP variant that additionally keeps the projections at BF16. Full credit for the underlying recipe → sakamakismile.
+## What "XS" means — and what it's *not*
+This is the **extra-small footprint** sibling of [`-Multimodal-NVFP4-MTP`](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP). XS is **not "everything to FP4."** It is a deliberate, principled split: the heavy GDN matmul projections drop to NVFP4 (they are bandwidth-bound, so FP4 wins big), while the SSM-critical `linear_attn.conv1d` kernel **stays BF16** (community testing has observed FP4 causing drift there on long-context recurrence).
+| | **Multimodal-NVFP4-MTP** (regular) | **Multimodal-NVFP4-MTP-XS** *(this repo)* |
+|---|---|---|
+| `linear_attn` projections (`in_proj_qkv`, `in_proj_z`, `in_proj_a/b`, `out_proj`) | preserved BF16 (~11 GB) | quantized to NVFP4 (~3 GB) |
+| **`linear_attn.conv1d`** *(SSM 1D convolution — recurrence-critical)* | **preserved BF16** | **preserved BF16** ✅ |
+| `linear_attn` SSM state vectors (`A_log`, `dt_bias`, `norm.weight`) | preserved BF16 | preserved BF16 ✅ |
+| `mtp.*` head *(grafted bf16 from base, bit-exact verified)* | yes | yes |
+| Vision tower | preserved BF16 | preserved BF16 |
+| **Total disk** | **~27 GB** | **~21 GB** |
+| **VRAM footprint at runtime** | ~28 GB | ~22 GB |
+**The split is targeted, not a blanket precision cut.** Preserving conv1d matters: the GatedDeltaNet recurrence depends on the 1D convolution behaving numerically as it did in training, and community testing has observed FP4 quantization of `conv1d` causing drift on long-context inference. Keeping conv1d at BF16 while quantizing the projections (bandwidth-limited matmuls, where FP4 is a clean win) yields the ~6 GB footprint reduction without touching the part of the model that is actually fragile under quantization. modelopt's `NVFP4_DEFAULT_CFG` applies the same principle by default, and it is the same recipe sakamakismile validated across their Qwen3.6-NVFP4-MTP series (22K+ downloads).
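To make "quantized to NVFP4" concrete: NVFP4 stores each weight as a 4-bit E2M1 value and shares one scale across every 16 consecutive values. The sketch below fake-quantizes a flat list that way. It is a simplification, not modelopt's kernel: the block scale stays a Python float instead of being rounded to FP8 (E4M3), and the per-tensor second-level scale is omitted.

```python
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # non-negative FP4 (E2M1) magnitudes
BLOCK = 16  # NVFP4 shares one scale per 16 consecutive values


def quantize_block(values):
    """Fake-quantize one block: scale by absmax/6, snap to the E2M1 grid, scale back."""
    absmax = max(abs(v) for v in values)
    if absmax == 0.0:
        return [0.0] * len(values)
    scale = absmax / 6.0  # 6.0 is the largest E2M1 magnitude
    out = []
    for v in values:
        # Round |v| / scale to the nearest grid point, then restore sign and scale.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append((mag if v >= 0 else -mag) * scale)
    return out


def fake_quantize(values):
    """Apply quantize_block to each 16-element block of a flat list."""
    out = []
    for i in range(0, len(values), BLOCK):
        out.extend(quantize_block(values[i:i + BLOCK]))
    return out
```

The worst-case rounding error in a block is half the widest grid gap (4 to 6) times the block scale, so small values that share a block with one large outlier lose most of their precision. That coarseness is tolerable for large matmul weights but is the kind of error the card avoids introducing into the recurrence-critical conv1d kernel.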
+**When to pick which:**
+- **Pick the regular variant** if you have ≥48 GB VRAM. Keeping the *projection* weights at BF16 as well gives a small additional safety margin for long-context recurrence stability.
+- **Pick this XS variant** if you have **24–32 GB VRAM** (RTX 5090, or any single GPU without headroom for full-BF16 GDN). The recurrence-critical conv1d kernel still runs at BF16, and the ~6 GB savings buy meaningful KV-cache headroom on tight GPUs.
+We ship both because we have the headroom on RTX PRO 6000 / B100/B200 to run the larger, more numerically-conservative version, and several users on tighter cards have asked for the smaller one. **Neither variant** quantizes `linear_attn.conv1d` — that would be a different (and not-recommended) variant we have explicitly chosen not to ship.
+## Variants
+| Format | Size | Use case |
+|---|---|---|
+| [BF16](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16) | 51 GB | Full-precision reference weights |
+| [NVFP4 (compressed-tensors + DFlash)](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4) | 26 GB | DGX Spark — DFlash spec decode, validated |
+| [Multimodal-NVFP4-MTP](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP) | 27 GB | RTX PRO 6000 / B100/B200 — MTP, GDN preserved BF16 |
+| [Text-NVFP4-MTP](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-Text-NVFP4-MTP) | 26 GB | Same as above without vision tower |
+| **Multimodal-NVFP4-MTP-XS** *(this repo)* | **21 GB** | RTX 5090 / smaller dedicated VRAM; MTP, GDN projections in NVFP4 (conv1d kept BF16) |
+| [Text-NVFP4-MTP-XS](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-Text-NVFP4-MTP-XS) | 20 GB | Same as this repo without vision tower |
+## What this is
+This is the **modelopt-format NVFP4 + MTP variant** of [AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16), with the multimodal stack preserved and the `linear_attn` projection matmuls quantized to NVFP4. The BF16 source is a near-lossless abliteration of Qwen 3.6 27B (KL 0.000492 vs base, 0/100 refusals, multimodal preserved, hybrid GDN-aware quantization).
+Specifically:
+- **Body quantized to NVFP4** via `nvidia-modelopt` 0.43.0 with `NVFP4_DEFAULT_CFG`. modelopt format, served by vLLM through `--quantization modelopt`.
+- **Linear-attn / GatedDeltaNet projections quantized to NVFP4** (this is the XS difference). Within `linear_attn`, only the `conv1d` kernel is excluded from quantization (modelopt's default). The same approach is used in sakamakismile's Qwen3.5/3.6-NVFP4 reference builds (22K+ downloads); we re-ran calibration on our abliterated weights and the model serves correctly.
+- **Vision tower preserved BF16** (333 keys). Multimodal inference fully functional.
+- **MTP head grafted from the base** `Qwen/Qwen3.6-27B` checkpoint (15 tensors, BF16, bit-exact verified). Powers `--speculative-config '{"method":"qwen3_5_mtp",...}'` for self-speculative decoding without a separate drafter.
+## Why MTP
+Multi-Token Prediction (MTP) lets the model draft multiple future tokens per forward pass via the trained `mtp.*` head, enabling **speculative decoding without a separate drafter model**. Acceptance rates are high because the draft head ships with the base checkpoint and conditions on the target model's own hidden states, so its proposals closely track the target distribution.
+Indicative published numbers (sakamakismile's reference recipe on RTX 5090):
+- Single-stream short prompts at `n=3`: ~132 tok/s
+- Single-stream long-form: ~105 tok/s
+- 2-parallel aggregate (256K context, FP8 KV cache): ~189-207 tok/s
+- Mean acceptance length: ~3.0-4.0 (compared to DFlash chains of ~2.0-2.3)
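One way to read the mean acceptance length: under the simplifying assumption that each drafted token is accepted independently with probability α (real acceptance varies by position), a verification step with k drafted tokens emits on average (1 − α^(k+1)) / (1 − α) tokens, counting the one token the target model always contributes (a correction or a bonus token). The ceiling is k + 1, so at `n=3` the ~3.0-4.0 figure sits close to the maximum of 4, assuming the reported length counts that extra token. The function names are illustrative.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target forward pass with k drafted tokens.

    Assumes each draft token is accepted independently with probability alpha;
    the target always contributes one token (a correction or a bonus token).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def implied_alpha(mean_len: float, k: int, tol: float = 1e-9) -> float:
    """Invert expected_tokens_per_step by bisection (it increases with alpha)."""
    if not 1.0 <= mean_len <= k + 1:
        raise ValueError("mean_len must lie in [1, k + 1]")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_tokens_per_step(mid, k) < mean_len:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Under this model, a mean length of 3.0 at k=3 implies a per-token acceptance of roughly 0.81, while 2.2 at DFlash's k=15 implies roughly 0.55: the longer chain compensates for lower per-token acceptance.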
+Validated benchmarks of the AEON-Ultimate XS variant are published in the [GitHub repo](https://github.com/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash#performance) as they are measured.
+## 🎯 When to pick this variant — measured hardware routing
+The right speculative-decode method depends on **memory architecture**:
+| Hardware tier | Recommended variant | Why |
+|---|---|---|
+| **DGX Spark / GB10** *(sm_121a, unified memory)* | Either: **[`-NVFP4` (DFlash)](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4)** *(simpler, validated)* **or this XS body served with `--speculative-config '{"method":"dflash",...}'`** *(highest measured throughput — see note below)* | Spark prefers DFlash regardless of body. The XS body **with DFlash spec** lands at **37.6 tok/s median, 68.7 tok/s peak** on Spark — the highest measured config. The grafted MTP head in this repo is *unused* in that path. **Never use `--speculative-config '{"method":"qwen3_5_mtp",...}'` on Spark** — that lands at only 24.1 tok/s median. |
+| **RTX PRO 6000 Blackwell** *(96 GB dedicated VRAM)* | [Multimodal-NVFP4-MTP](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP) — GDN BF16 for best long-context fidelity, *or* **this XS variant** for ~10 % faster decode | XS measured 111.4 tok/s median vs regular's 101.5 on RTX PRO 6000. Both win against DFlash on dedicated VRAM. |
+| **B100 / B200** *(sm_100, dedicated FP4)* | [Multimodal-NVFP4-MTP](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP) (preferred — GDN BF16 fits) or this XS | Native FP4 + dedicated VRAM = MTP territory. Whichever fits cleanly. |
+| **RTX 5090** *(sm_120, 32 GB dedicated VRAM)* | **This XS variant** ✅ if you use vision; [Text-XS](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-Text-NVFP4-MTP-XS) if text-only | XS variants fit comfortably in 32 GB; matches sakamakismile's reference footprint. |
+| **A100 / H100** *(no native FP4)* | [BF16](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16) | NVFP4 dequantizes to BF16 on Ampere/Hopper — no benefit. |
+Full bench numbers: [GitHub repo Performance section](https://github.com/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash#performance).
+## Usage
+### vLLM serve — dedicated-VRAM Blackwell (default: MTP via grafted head)
+```bash
+# One-time: pull this repo locally
+hf download AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS \
+  --local-dir ./aeon-ultimate-multimodal-nvfp4-mtp-xs
+# Serve
+export VLLM_NVFP4_GEMM_BACKEND=flashinfer-cutlass
+export VLLM_USE_FLASHINFER_MOE_FP4=0
+export VLLM_USE_FLASHINFER_SAMPLER=1
+vllm serve ./aeon-ultimate-multimodal-nvfp4-mtp-xs \
+  --quantization modelopt \
+  --trust-remote-code \
+  --max-model-len 262144 \
+  --max-num-seqs 32 \
+  --max-num-batched-tokens 32768 \
+  --gpu-memory-utilization 0.94 \
+  --enable-chunked-prefill \
+  --enable-prefix-caching \
+  --reasoning-parser qwen3 \
+  --tool-call-parser qwen3_coder \
+  --enable-auto-tool-choice \
+  --speculative-config '{"method":"qwen3_5_mtp","num_speculative_tokens":3}'
+```
+`num_speculative_tokens=3` is the recommended setting for `qwen3_5_mtp`. At higher values, each extra draft token is conditioned on earlier unverified drafts, so it drifts further from the target distribution and acceptance falls.
+### vLLM serve — DGX Spark (DFlash spec, *not* MTP — measured winning config)
+For DGX Spark, swap the spec method to DFlash. The XS body still benefits from FP4 silicon, but on unified memory DFlash's k=15 chains decisively beat MTP's n=3 (MTP measures only 24.1 tok/s median on Spark).
+```bash
+# Pull the DFlash drafter alongside this body
+hf download z-lab/Qwen3.6-27B-DFlash --local-dir ./qwen36-27b-dflash
+vllm serve ./aeon-ultimate-multimodal-nvfp4-mtp-xs \
+  --quantization modelopt \
+  --trust-remote-code \
+  --max-model-len 200000 \
+  --max-num-seqs 16 \
+  --max-num-batched-tokens 32768 \
+  --gpu-memory-utilization 0.85 \
+  --enable-chunked-prefill \
+  --enable-prefix-caching \
+  --reasoning-parser qwen3 \
+  --tool-call-parser qwen3_coder \
+  --enable-auto-tool-choice \
+  --attention-backend flash_attn \
+  --speculative-config '{"method":"dflash","model":"./qwen36-27b-dflash","num_speculative_tokens":15}'
+```
+Production-validated v3 image: [`ghcr.io/aeon-7/vllm-aeon-ultimate-dflash:qwen36-v3`](https://github.com/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash/pkgs/container/vllm-aeon-ultimate-dflash). Measured **38.1 tok/s median, 68.4 tok/s peak** thinking-off and **38.5 / 71.3** thinking-on — the highest single-stream config we've measured on Spark.
+### Configuration notes
+- **`--quantization modelopt`** is required for this body (not `compressed-tensors` — different format).
+- **`--speculative-config '{"method":"qwen3_5_mtp", ...}'`** uses the grafted MTP head; correct for **dedicated-VRAM Blackwell**. Don't use this on DGX Spark.
+- **`--speculative-config '{"method":"dflash", ...}'`** uses an external DFlash drafter; correct for **DGX Spark**. The grafted MTP head in this repo sits unused in this path (~0.85 GB dead weight). Don't use this on RTX PRO 6000 or B100/B200 — they prefer MTP.
+- **`--gpu-memory-utilization 0.94`** is the validated cap on RTX PRO 6000; `0.85` is the cap on DGX Spark, where higher values cause unified memory to thrash.
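The two serve configurations differ in only a handful of flags. As a sketch, a hypothetical helper can assemble the `vllm serve` argument list from the values used in the two commands above and serialize `--speculative-config` with `json.dumps`, which avoids hand-quoting JSON in the shell. The profile names `dedicated` and `spark` are made up for illustration; the flag values are copied from this card.

```python
import json
import shlex

# Flag values copied from the two serve commands above; profile names are hypothetical.
PROFILES = {
    "dedicated": {  # RTX PRO 6000 / B100 / B200: grafted MTP head
        "max_model_len": 262144,
        "max_num_seqs": 32,
        "gpu_memory_utilization": 0.94,
        "speculative": {"method": "qwen3_5_mtp", "num_speculative_tokens": 3},
        "extra": [],
    },
    "spark": {  # DGX Spark: external DFlash drafter
        "max_model_len": 200000,
        "max_num_seqs": 16,
        "gpu_memory_utilization": 0.85,
        "speculative": {
            "method": "dflash",
            "model": "./qwen36-27b-dflash",
            "num_speculative_tokens": 15,
        },
        "extra": ["--attention-backend", "flash_attn"],
    },
}


def build_serve_args(profile: str, model_dir: str) -> list[str]:
    """Return the argv for `vllm serve` for one hardware profile."""
    p = PROFILES[profile]
    return [
        "vllm", "serve", model_dir,
        "--quantization", "modelopt",  # required for this modelopt-format body
        "--trust-remote-code",
        "--max-model-len", str(p["max_model_len"]),
        "--max-num-seqs", str(p["max_num_seqs"]),
        "--max-num-batched-tokens", "32768",
        "--gpu-memory-utilization", str(p["gpu_memory_utilization"]),
        "--enable-chunked-prefill",
        "--enable-prefix-caching",
        "--reasoning-parser", "qwen3",
        "--tool-call-parser", "qwen3_coder",
        "--enable-auto-tool-choice",
        *p["extra"],
        "--speculative-config", json.dumps(p["speculative"], separators=(",", ":")),
    ]


def as_shell_command(args: list[str]) -> str:
    """Quote the argv so it can be pasted into a POSIX shell."""
    return shlex.join(args)
```

For example, `print(as_shell_command(build_serve_args("spark", "./aeon-ultimate-multimodal-nvfp4-mtp-xs")))` prints the DGX Spark command with the speculative config single-quoted. The environment exports from the dedicated-VRAM example are not included.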
+## Quantization recipe
+- **Tool**: `nvidia-modelopt` 0.43.0 with `NVFP4_DEFAULT_CFG`
+- **Loader**: `Qwen3_5ForConditionalGeneration.from_pretrained` (multimodal-preserved class)
+- **Calibration**: `neuralmagic/calibration` LLM split, 20 samples × 8192 tokens
+- **Excluded from quantization (kept BF16)**; entries that matter for the XS vs regular split are in **bold**:
+  - `lm_head`, `proj_out.*`, `*router*`, `*mlp.gate.*` (NVFP4_DEFAULT_CFG)
+  - **`*linear_attn.conv1d*`, `*mixer.conv1d*`** *(NVFP4_DEFAULT_CFG default — kept BF16 because FP4 quantization of the SSM 1D convolution causes drift on long-context recurrence; this is the recurrence-critical kernel of the GatedDeltaNet block. **Both regular and XS variants preserve this.**)*
+  - **`*linear_attn*` is NOT broadly excluded** (XS difference — the projection matmuls `in_proj_qkv`, `in_proj_z`, `in_proj_a/b`, `out_proj` get NVFP4-quantized; saves ~8 GB; FP4 is a clean win on bandwidth-bound matmuls)
+  - `*visual*` (vision tower preservation)
+  - `*mtp*` (MTP head preservation)
+  - `*output_layer*`, `output.*`
+- **MTP graft**: 15 tensors copied bf16 from `Qwen/Qwen3.6-27B` after modelopt export
+- **Pipeline**: lna-lab/GGUF-to-NVFP4-SM120 reference recipe, adapted for AEON-Ultimate-BF16 input + separate MTP source
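To see how these wildcard exclusions play out, here is a sketch that classifies module names against the patterns listed above using shell-style matching. Assumptions: modelopt actually matches quantizer names (such as `...in_proj_qkv.weight_quantizer`) through its own config machinery, `lm_head` is written as `*lm_head*` here, and the module names in the usage are illustrative rather than copied from this checkpoint.

```python
from fnmatch import fnmatchcase

# Exclusion patterns from the recipe above (XS variant); the "*lm_head*" spelling is assumed.
KEEP_BF16 = [
    "*lm_head*", "*proj_out.*", "*router*", "*mlp.gate.*",
    "*linear_attn.conv1d*", "*mixer.conv1d*",
    "*visual*", "*mtp*", "*output_layer*", "output.*",
]


def precision_for(module_name: str) -> str:
    """Return "BF16" if any exclusion pattern matches the name, else "NVFP4"."""
    if any(fnmatchcase(module_name, pat) for pat in KEEP_BF16):
        return "BF16"
    return "NVFP4"
```

Under these patterns, `model.language_model.layers.0.linear_attn.in_proj_qkv` resolves to NVFP4 while `model.language_model.layers.0.linear_attn.conv1d` stays BF16, which is exactly the XS split. The trailing dot in `*mlp.gate.*` is what keeps a dense MLP's `gate_proj` quantized while still excluding a module named exactly `mlp.gate`.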
+## Provenance & credits
+- **BF16 source**: [`AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16`](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16). See that card for the full abliteration pipeline.
+- **MTP graft technique**: [lna-lab/GGUF-to-NVFP4-SM120](https://github.com/lna-lab/GGUF-to-NVFP4-SM120) (`docs/MTP_GRAFT_RECIPE.md`)
+- **Reference benchmark recipes**: [`sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP`](https://huggingface.co/sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP)
+- **Quantization**: NVIDIA TensorRT Model Optimizer (`nvidia-modelopt` 0.43.0)
+- **Base**: Alibaba Qwen team — `Qwen/Qwen3.6-27B`
+## License + responsibility
+Apache 2.0, inherited from `Qwen/Qwen3.6-27B`. **This is an uncensored model.** Read the full [User Responsibility & Arbitration Clause](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16#user-responsibility--arbitration-clause) on the BF16 source card before deploying. Summary: you implement downstream safety layers (input validation, output filtering, content moderation, audit logging, rate limiting, access controls, human-in-the-loop for high-risk workflows). The model has no opinions of its own — you supply the opinions, the judgment, and the ethics.
+---
+## ☕ Support the work
+If this release has been useful, tips are deeply appreciated — they go directly toward more compute, more models, and more open releases.
+<table align="center">
+  <tr>
+    <td align="center" width="50%">
+      <strong>₿ Bitcoin (BTC)</strong><br/>
+      <img src="https://raw.githubusercontent.com/AEON-7/AEON-7/main/assets/qr/btc.png" alt="BTC QR" width="200"/><br/>
+      <sub><code>bc1q09xmzn00q4z3c5raene0f3pzn9d9pvawfm0py4</code></sub>
+    </td>
+    <td align="center" width="50%">
+      <strong>Ξ Ethereum (ETH)</strong><br/>
+      <img src="https://raw.githubusercontent.com/AEON-7/AEON-7/main/assets/qr/eth.png" alt="ETH QR" width="200"/><br/>
+      <sub><code>0x1512667F6D61454ad531d2E45C0a5d1fd82D0500</code></sub>
+    </td>
+  </tr>
+  <tr>
+    <td align="center" width="50%">
+      <strong>◎ Solana (SOL)</strong><br/>
+      <img src="https://raw.githubusercontent.com/AEON-7/AEON-7/main/assets/qr/sol.png" alt="SOL QR" width="200"/><br/>
+      <sub><code>DgQsjHdAnT5PNLQTNpJdpLS3tYGpVcsHQCkpoiAKsw8t</code></sub>
+    </td>
+    <td align="center" width="50%">
+      <strong>ⓜ Monero (XMR)</strong><br/>
+      <img src="https://raw.githubusercontent.com/AEON-7/AEON-7/main/assets/qr/xmr.png" alt="XMR QR" width="200"/><br/>
+      <sub><code>836XrSKw4R76vNi3QPJ5Fa9ugcyvE2cWmKSPv3AhpTNNKvqP8v5ba9JRL4Vh7UnFNjDz3E2GXZDVVenu3rkZaNdUFhjAvgd</code></sub>
+    </td>
+  </tr>
+</table>
+> **Ethereum L2s (Base, Arbitrum, Optimism, Polygon, etc.) and EVM-compatible tokens** can be sent to the same Ethereum address.

chat_template.jinja ADDED

	@@ -0,0 +1,154 @@

+{%- set image_count = namespace(value=0) %}
+{%- set video_count = namespace(value=0) %}
+{%- macro render_content(content, do_vision_count, is_system_content=false) %}
+    {%- if content is string %}
+        {{- content }}
+    {%- elif content is iterable and content is not mapping %}
+        {%- for item in content %}
+            {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain images.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set image_count.value = image_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Picture ' ~ image_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+            {%- elif 'video' in item or item.type == 'video' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain videos.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set video_count.value = video_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Video ' ~ video_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+            {%- elif 'text' in item %}
+                {{- item.text }}
+            {%- else %}
+                {{- raise_exception('Unexpected item type in content.') }}
+            {%- endif %}
+        {%- endfor %}
+    {%- elif content is none or content is undefined %}
+        {{- '' }}
+    {%- else %}
+        {{- raise_exception('Unexpected content type.') }}
+    {%- endif %}
+{%- endmacro %}
+{%- if not messages %}
+    {{- raise_exception('No messages provided.') }}
+{%- endif %}
+{%- if tools and tools is iterable and tools is not mapping %}
+    {{- '<|im_start|>system\n' }}
+    {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>" }}
+    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {%- if content %}
+            {{- '\n\n' + content }}
+        {%- endif %}
+    {%- endif %}
+    {{- '<|im_end|>\n' }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" %}
+        {%- set content = render_content(message.content, false)|trim %}
+        {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+            {%- set ns.multi_step_tool = false %}
+            {%- set ns.last_query_index = index %}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if ns.multi_step_tool %}
+    {{- raise_exception('No user query found in messages.') }}
+{%- endif %}
+{%- for message in messages %}
+    {%- set content = render_content(message.content, true)|trim %}
+    {%- if message.role == "system" %}
+        {%- if not loop.first %}
+            {{- raise_exception('System message must be at the beginning.') }}
+        {%- endif %}
+    {%- elif message.role == "user" %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- set reasoning_content = reasoning_content|trim %}
+        {%- if (preserve_thinking is defined and preserve_thinking is true) or (loop.index0 > ns.last_query_index) %}
+            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if tool_call.function is defined %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {%- if loop.first %}
+                    {%- if content|trim %}
+                        {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- else %}
+                        {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- endif %}
+                {%- else %}
+                    {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                {%- endif %}
+                {%- if tool_call.arguments is defined %}
+                    {%- for args_name, args_value in tool_call.arguments|items %}
+                        {{- '<parameter=' + args_name + '>\n' }}
+                        {%- set args_value = args_value | string if args_value is string else args_value | tojson | safe %}
+                        {{- args_value }}
+                        {{- '\n</parameter>\n' }}
+                    {%- endfor %}
+                {%- endif %}
+                {{- '</function>\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.previtem and loop.previtem.role != "tool" %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if not loop.last and loop.nextitem.role != "tool" %}
+            {{- '<|im_end|>\n' }}
+        {%- elif loop.last %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- else %}
+        {{- raise_exception('Unexpected message role.') }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- else %}
+        {{- '<think>\n' }}
+    {%- endif %}
+{%- endif %}
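The template emits two kinds of strings worth seeing directly: the tool-call markup (`<tool_call>` / `<function=...>` / `<parameter=...>`) and the generation-prompt tail. Both can be mirrored in plain Python. This is a hypothetical re-implementation for illustration only. `render_tool_calls` and `generation_prompt` are our names, not part of the repository. Encoding non-string arguments with `json.dumps(..., ensure_ascii=False)` is an assumption about how the template's `tojson` filter behaves when Transformers renders it.

```python
import json


def render_tool_calls(tool_calls, content=""):
    """Mirror the template's assistant tool-call serialization (hypothetical helper)."""
    parts = []
    for index, tool_call in enumerate(tool_calls):
        # OpenAI-style entries wrap the call in a "function" key; unwrap it like the template does.
        tool_call = tool_call.get("function", tool_call)
        if index == 0:
            # The first call is separated from non-empty assistant text by a blank line.
            opener = "\n\n<tool_call>\n" if content.strip() else "<tool_call>\n"
        else:
            opener = "\n<tool_call>\n"
        parts.append(opener + "<function=" + tool_call["name"] + ">\n")
        for name, value in tool_call.get("arguments", {}).items():
            # Strings are emitted verbatim; anything else is JSON-encoded (assumed tojson behavior).
            rendered = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
            parts.append("<parameter=" + name + ">\n" + rendered + "\n</parameter>\n")
        parts.append("</function>\n</tool_call>")
    return "".join(parts)


def generation_prompt(enable_thinking=None):
    """Mirror the add_generation_prompt tail: an empty think block switches reasoning off."""
    if enable_thinking is False:
        return "<|im_start|>assistant\n<think>\n\n</think>\n\n"
    return "<|im_start|>assistant\n<think>\n"


call = {"function": {"name": "get_weather", "arguments": {"city": "Paris", "days": 3}}}
print(render_tool_calls([call]))
print(repr(generation_prompt(enable_thinking=False)))
```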

config.json ADDED

	@@ -0,0 +1,253 @@

+{
+  "architectures": [
+    "Qwen3_5ForConditionalGeneration"
+  ],
+  "dtype": "bfloat16",
+  "image_token_id": 248056,
+  "language_model_only": false,
+  "model_type": "qwen3_5",
+  "text_config": {
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "attn_output_gate": true,
+    "bos_token_id": 248044,
+    "dtype": "bfloat16",
+    "eos_token_id": 248044,
+    "full_attention_interval": 4,
+    "head_dim": 256,
+    "hidden_act": "silu",
+    "hidden_size": 5120,
+    "initializer_range": 0.02,
+    "intermediate_size": 17408,
+    "layer_types": [
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention"
+    ],
+    "linear_conv_kernel_dim": 4,
+    "linear_key_head_dim": 128,
+    "linear_num_key_heads": 16,
+    "linear_num_value_heads": 48,
+    "linear_value_head_dim": 128,
+    "mamba_ssm_dtype": "float32",
+    "max_position_embeddings": 262144,
+    "model_type": "qwen3_5_text",
+    "mtp_num_hidden_layers": 1,
+    "mtp_use_dedicated_embeddings": false,
+    "num_attention_heads": 24,
+    "num_hidden_layers": 64,
+    "num_key_value_heads": 4,
+    "output_gate_type": "swish",
+    "pad_token_id": null,
+    "partial_rotary_factor": 0.25,
+    "rms_norm_eps": 1e-06,
+    "rope_parameters": {
+      "mrope_interleaved": true,
+      "mrope_section": [
+        11,
+        11,
+        10
+      ],
+      "partial_rotary_factor": 0.25,
+      "rope_theta": 10000000,
+      "rope_type": "default"
+    },
+    "tie_word_embeddings": false,
+    "use_cache": true,
+    "vocab_size": 248320
+  },
+  "tie_word_embeddings": false,
+  "transformers_version": "5.5.3",
+  "video_token_id": 248057,
+  "vision_config": {
+    "deepstack_visual_indexes": [],
+    "depth": 27,
+    "dtype": "bfloat16",
+    "hidden_act": "gelu_pytorch_tanh",
+    "hidden_size": 1152,
+    "in_channels": 3,
+    "initializer_range": 0.02,
+    "intermediate_size": 4304,
+    "model_type": "qwen3_5",
+    "num_heads": 16,
+    "num_position_embeddings": 2304,
+    "out_hidden_size": 5120,
+    "patch_size": 16,
+    "spatial_merge_size": 2,
+    "temporal_patch_size": 2
+  },
+  "vision_end_token_id": 248054,
+  "vision_start_token_id": 248053,
+  "quantization_config": {
+    "config_groups": {
+      "group_0": {
+        "input_activations": {
+          "dynamic": false,
+          "num_bits": 4,
+          "type": "float",
+          "group_size": 16
+        },
+        "weights": {
+          "dynamic": false,
+          "num_bits": 4,
+          "type": "float",
+          "group_size": 16
+        },
+        "targets": [
+          "Linear"
+        ]
+      }
+    },
+    "ignore": [
+      "lm_head",
+      "model.language_model.layers.0.linear_attn.conv1d",
+      "model.language_model.layers.1.linear_attn.conv1d",
+      "model.language_model.layers.10.linear_attn.conv1d",
+      "model.language_model.layers.12.linear_attn.conv1d",
+      "model.language_model.layers.13.linear_attn.conv1d",
+      "model.language_model.layers.14.linear_attn.conv1d",
+      "model.language_model.layers.16.linear_attn.conv1d",
+      "model.language_model.layers.17.linear_attn.conv1d",
+      "model.language_model.layers.18.linear_attn.conv1d",
+      "model.language_model.layers.2.linear_attn.conv1d",
+      "model.language_model.layers.20.linear_attn.conv1d",
+      "model.language_model.layers.21.linear_attn.conv1d",
+      "model.language_model.layers.22.linear_attn.conv1d",
+      "model.language_model.layers.24.linear_attn.conv1d",
+      "model.language_model.layers.25.linear_attn.conv1d",
+      "model.language_model.layers.26.linear_attn.conv1d",
+      "model.language_model.layers.28.linear_attn.conv1d",
+      "model.language_model.layers.29.linear_attn.conv1d",
+      "model.language_model.layers.30.linear_attn.conv1d",
+      "model.language_model.layers.32.linear_attn.conv1d",
+      "model.language_model.layers.33.linear_attn.conv1d",
+      "model.language_model.layers.34.linear_attn.conv1d",
+      "model.language_model.layers.36.linear_attn.conv1d",
+      "model.language_model.layers.37.linear_attn.conv1d",
+      "model.language_model.layers.38.linear_attn.conv1d",
+      "model.language_model.layers.4.linear_attn.conv1d",
+      "model.language_model.layers.40.linear_attn.conv1d",
+      "model.language_model.layers.41.linear_attn.conv1d",
+      "model.language_model.layers.42.linear_attn.conv1d",
+      "model.language_model.layers.44.linear_attn.conv1d",
+      "model.language_model.layers.45.linear_attn.conv1d",
+      "model.language_model.layers.46.linear_attn.conv1d",
+      "model.language_model.layers.48.linear_attn.conv1d",
+      "model.language_model.layers.49.linear_attn.conv1d",
+      "model.language_model.layers.5.linear_attn.conv1d",
+      "model.language_model.layers.50.linear_attn.conv1d",
+      "model.language_model.layers.52.linear_attn.conv1d",
+      "model.language_model.layers.53.linear_attn.conv1d",
+      "model.language_model.layers.54.linear_attn.conv1d",
+      "model.language_model.layers.56.linear_attn.conv1d",
+      "model.language_model.layers.57.linear_attn.conv1d",
+      "model.language_model.layers.58.linear_attn.conv1d",
+      "model.language_model.layers.6.linear_attn.conv1d",
+      "model.language_model.layers.60.linear_attn.conv1d",
+      "model.language_model.layers.61.linear_attn.conv1d",
+      "model.language_model.layers.62.linear_attn.conv1d",
+      "model.language_model.layers.8.linear_attn.conv1d",
+      "model.language_model.layers.9.linear_attn.conv1d",
+      "model.visual*",
+      "mtp.fc",
+      "mtp.layers.0.input_layernorm",
+      "mtp.layers.0.mlp.down_proj",
+      "mtp.layers.0.mlp.gate_proj",
+      "mtp.layers.0.mlp.up_proj",
+      "mtp.layers.0.post_attention_layernorm",
+      "mtp.layers.0.self_attn.k_norm",
+      "mtp.layers.0.self_attn.k_proj",
+      "mtp.layers.0.self_attn.o_proj",
+      "mtp.layers.0.self_attn.q_norm",
+      "mtp.layers.0.self_attn.q_proj",
+      "mtp.layers.0.self_attn.v_proj",
+      "mtp.norm",
+      "mtp.pre_fc_norm_embedding",
+      "mtp.pre_fc_norm_hidden"
+    ],
+    "quant_algo": "NVFP4",
+    "producer": {
+      "name": "modelopt",
+      "version": "0.43.0"
+    },
+    "quant_method": "modelopt",
+    "exclude_modules": [
+      "mtp.fc",
+      "mtp.layers.0.input_layernorm",
+      "mtp.layers.0.mlp.down_proj",
+      "mtp.layers.0.mlp.gate_proj",
+      "mtp.layers.0.mlp.up_proj",
+      "mtp.layers.0.post_attention_layernorm",
+      "mtp.layers.0.self_attn.k_norm",
+      "mtp.layers.0.self_attn.k_proj",
+      "mtp.layers.0.self_attn.o_proj",
+      "mtp.layers.0.self_attn.q_norm",
+      "mtp.layers.0.self_attn.q_proj",
+      "mtp.layers.0.self_attn.v_proj",
+      "mtp.norm",
+      "mtp.pre_fc_norm_embedding",
+      "mtp.pre_fc_norm_hidden"
+    ]
+  }
+}
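Several `text_config` fields in config.json overlap, which allows a quick sanity check. The plain-Python sketch below uses values copied by hand from the file. It assumes `layer_types` follows from `full_attention_interval` as it appears to, with every fourth layer using full attention. From that pattern it derives the 48 `linear_attn.conv1d` entries that the quantization `ignore` list keeps out of NVFP4. It also computes the rotary width implied by `head_dim` and `partial_rotary_factor`. Reading `mrope_section` as a split of half that width across the three M-RoPE axes is our interpretation; the config does not state it.

```python
# Selected values copied from text_config in config.json above.
NUM_HIDDEN_LAYERS = 64
FULL_ATTENTION_INTERVAL = 4
HEAD_DIM = 256
PARTIAL_ROTARY_FACTOR = 0.25
MROPE_SECTION = [11, 11, 10]
NUM_ATTENTION_HEADS = 24
NUM_KEY_VALUE_HEADS = 4


def build_layer_types(num_layers, interval):
    """Every `interval`-th layer (counting from 1) uses full attention; the rest use linear attention."""
    return [
        "full_attention" if (i + 1) % interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]


layer_types = build_layer_types(NUM_HIDDEN_LAYERS, FULL_ATTENTION_INTERVAL)
linear_layers = [i for i, kind in enumerate(layer_types) if kind == "linear_attention"]

# conv1d modules of the linear-attention layers, sorted as strings like the "ignore" list above.
conv1d_ignores = sorted(f"model.language_model.layers.{i}.linear_attn.conv1d" for i in linear_layers)

rotary_dim = int(HEAD_DIM * PARTIAL_ROTARY_FACTOR)  # 64 of the 256 head dims are rotated
gqa_group = NUM_ATTENTION_HEADS // NUM_KEY_VALUE_HEADS  # query heads sharing one KV head

print(layer_types.count("full_attention"), len(linear_layers))  # 16 48
print(rotary_dim, sum(MROPE_SECTION), gqa_group)  # 64 32 6
```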

generation_config.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "bos_token_id": 248044,
+  "do_sample": true,
+  "eos_token_id": [
+    248046,
+    248044
+  ],
+  "pad_token_id": 248044,
+  "temperature": 1.0,
+  "top_k": 20,
+  "top_p": 0.95,
+  "transformers_version": "5.5.3"
+}
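generation_config.json enables sampling (`do_sample: true`) with `temperature` 1.0, `top_k` 20 and `top_p` 0.95. The simplified pure-Python sketch below shows how these settings narrow the next-token distribution. It applies temperature, then top-k, then top-p, which is the order we understand Transformers applies its logits warpers. Edge cases such as ties and the exact top-p cutoff may differ from the library. `filter_logits` and `sample` are hypothetical helpers.

```python
import math
import random


def filter_logits(logits, temperature=1.0, top_k=20, top_p=0.95):
    """Return {token_index: probability} for tokens surviving temperature, top-k, then top-p."""
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token indices, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors, stabilized by subtracting the maximum.
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    # Top-p: keep the shortest best-first prefix whose probability mass reaches top_p.
    kept, mass = {}, 0.0
    for i, w in zip(ranked, weights):
        kept[i] = w / total
        mass += w / total
        if mass >= top_p:
            break
    # Renormalize the survivors so they sum to 1.
    return {i: p / mass for i, p in kept.items()}


def sample(logits, rng=random, **settings):
    """Draw one token index from the filtered distribution."""
    dist = filter_logits(logits, **settings)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


print(filter_logits([4.0, 3.0, 2.0, 1.0, 0.0], top_k=3, top_p=0.8))
```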

hf_quant_config.json ADDED

	@@ -0,0 +1,63 @@

+{
+    "producer": {
+        "name": "modelopt",
+        "version": "0.43.0"
+    },
+    "quantization": {
+        "quant_algo": "NVFP4",
+        "kv_cache_quant_algo": null,
+        "group_size": 16,
+        "exclude_modules": [
+            "lm_head",
+            "model.language_model.layers.0.linear_attn.conv1d",
+            "model.language_model.layers.1.linear_attn.conv1d",
+            "model.language_model.layers.10.linear_attn.conv1d",
+            "model.language_model.layers.12.linear_attn.conv1d",
+            "model.language_model.layers.13.linear_attn.conv1d",
+            "model.language_model.layers.14.linear_attn.conv1d",
+            "model.language_model.layers.16.linear_attn.conv1d",
+            "model.language_model.layers.17.linear_attn.conv1d",
+            "model.language_model.layers.18.linear_attn.conv1d",
+            "model.language_model.layers.2.linear_attn.conv1d",
+            "model.language_model.layers.20.linear_attn.conv1d",
+            "model.language_model.layers.21.linear_attn.conv1d",
+            "model.language_model.layers.22.linear_attn.conv1d",
+            "model.language_model.layers.24.linear_attn.conv1d",
+            "model.language_model.layers.25.linear_attn.conv1d",
+            "model.language_model.layers.26.linear_attn.conv1d",
+            "model.language_model.layers.28.linear_attn.conv1d",
+            "model.language_model.layers.29.linear_attn.conv1d",
+            "model.language_model.layers.30.linear_attn.conv1d",
+            "model.language_model.layers.32.linear_attn.conv1d",
+            "model.language_model.layers.33.linear_attn.conv1d",
+            "model.language_model.layers.34.linear_attn.conv1d",
+            "model.language_model.layers.36.linear_attn.conv1d",
+            "model.language_model.layers.37.linear_attn.conv1d",
+            "model.language_model.layers.38.linear_attn.conv1d",
+            "model.language_model.layers.4.linear_attn.conv1d",
+            "model.language_model.layers.40.linear_attn.conv1d",
+            "model.language_model.layers.41.linear_attn.conv1d",
+            "model.language_model.layers.42.linear_attn.conv1d",
+            "model.language_model.layers.44.linear_attn.conv1d",
+            "model.language_model.layers.45.linear_attn.conv1d",
+            "model.language_model.layers.46.linear_attn.conv1d",
+            "model.language_model.layers.48.linear_attn.conv1d",
+            "model.language_model.layers.49.linear_attn.conv1d",
+            "model.language_model.layers.5.linear_attn.conv1d",
+            "model.language_model.layers.50.linear_attn.conv1d",
+            "model.language_model.layers.52.linear_attn.conv1d",
+            "model.language_model.layers.53.linear_attn.conv1d",
+            "model.language_model.layers.54.linear_attn.conv1d",
+            "model.language_model.layers.56.linear_attn.conv1d",
+            "model.language_model.layers.57.linear_attn.conv1d",
+            "model.language_model.layers.58.linear_attn.conv1d",
+            "model.language_model.layers.6.linear_attn.conv1d",
+            "model.language_model.layers.60.linear_attn.conv1d",
+            "model.language_model.layers.61.linear_attn.conv1d",
+            "model.language_model.layers.62.linear_attn.conv1d",
+            "model.language_model.layers.8.linear_attn.conv1d",
+            "model.language_model.layers.9.linear_attn.conv1d",
+            "model.visual*"
+        ]
+    }
+}
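hf_quant_config.json declares NVFP4 with `group_size` 16. Weights are stored as 4-bit E2M1 floats, and each block of 16 values shares one scale. The fake-quantization sketch below illustrates only that block structure, under simplifying assumptions. The real format stores each block scale in FP8 (E4M3) alongside a per-tensor FP32 scale; here the scale stays in full precision, and ties round to the smaller magnitude. The hypothetical `is_excluded` helper assumes glob-style matching for entries such as `model.visual*`. The actual matching rules belong to whichever loader reads the file.

```python
from fnmatch import fnmatch

# Magnitudes representable by FP4 E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GROUP_SIZE = 16  # "group_size" in hf_quant_config.json


def quantize_block(block):
    """Pick a shared scale mapping the block's largest |value| to 6.0, then snap each value to the grid."""
    amax = max(abs(x) for x in block)
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        # Nearest grid magnitude; min() keeps the first (smaller) candidate on ties.
        magnitude = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(magnitude if x >= 0 else -magnitude)
    return scale, codes


def fake_quantize(values, group_size=GROUP_SIZE):
    """Quantize and immediately dequantize a flat list of floats, one block at a time."""
    out = []
    for start in range(0, len(values), group_size):
        scale, codes = quantize_block(values[start:start + group_size])
        out.extend(code * scale for code in codes)
    return out


def is_excluded(module_name, patterns):
    """True if the module matches any exclude pattern (assuming glob semantics)."""
    return any(fnmatch(module_name, pattern) for pattern in patterns)


print(fake_quantize([12.0, 7.5, -3.0, 0.4]))  # [12.0, 8.0, -3.0, 0.0]
```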

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4a1b465994f5ada331c3458098d4ccf80c4811720c5219ad466b2b0c1753ded4
+size 20559273880
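The model.safetensors and tokenizer.json entries are Git LFS pointer files. Each consists of three `key value` lines giving the spec version, the object's SHA-256, and its size in bytes. The actual weight file is therefore 20,559,273,880 bytes, about 19.15 GiB. Below is a small sketch for parsing such a pointer and checking a downloaded file against it. `parse_lfs_pointer` and `verify_download` are hypothetical helpers, and the path passed to `verify_download` is up to you.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest and byte size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "digest": digest, "size": int(fields["size"])}


def verify_download(path, pointer, chunk_size=1 << 20):
    """Stream the file and compare its byte count and SHA-256 with the pointer."""
    sha = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and sha.hexdigest() == pointer["digest"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4a1b465994f5ada331c3458098d4ccf80c4811720c5219ad466b2b0c1753ded4
size 20559273880
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['size'] / 2**30:.2f} GiB")  # 19.15 GiB
```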

preprocessor_config.json ADDED

	@@ -0,0 +1,21 @@

+{
+    "size": {
+        "longest_edge": 16777216,
+        "shortest_edge": 65536
+    },
+    "patch_size": 16,
+    "temporal_patch_size": 2,
+    "merge_size": 2,
+    "image_mean": [
+        0.5,
+        0.5,
+        0.5
+    ],
+    "image_std": [
+        0.5,
+        0.5,
+        0.5
+    ],
+    "processor_class": "Qwen3VLProcessor",
+    "image_processor_type": "Qwen2VLImageProcessorFast"
+}
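preprocessor_config.json determines how many visual tokens an image costs. In the Qwen2-VL family of image processors, the `size` entries act as pixel-count bounds: `shortest_edge` is the minimum area and `longest_edge` the maximum, here 256×256 and 4096×4096. Images are resized so both sides are multiples of `patch_size * merge_size` = 32. The sketch below follows the `smart_resize` helper those processors use, as we understand it. It is a re-implementation, not the library code, and the constant and function names are ours.

```python
import math

PATCH_SIZE = 16        # "patch_size"
MERGE_SIZE = 2         # "merge_size"
MIN_PIXELS = 65536     # size["shortest_edge"], read here as a minimum pixel count
MAX_PIXELS = 16777216  # size["longest_edge"], read here as a maximum pixel count


def smart_resize(height, width, factor=PATCH_SIZE * MERGE_SIZE,
                 min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS):
    """Snap (height, width) to multiples of `factor` while keeping the area inside the pixel budget."""
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down to the grid.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up to the grid.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def image_token_count(height, width):
    """Number of <|image_pad|> tokens: one per merged 2x2 group of 16x16 patches."""
    h_bar, w_bar = smart_resize(height, width)
    return (h_bar // PATCH_SIZE) * (w_bar // PATCH_SIZE) // (MERGE_SIZE ** 2)


print(smart_resize(1024, 768), image_token_count(1024, 768))  # (1024, 768) 768
```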

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:530dc3d0de71a4d102af7d2f92a2a9178f430b489b1d5b48feb56d9c37e6a54e
+size 11071634

tokenizer_config.json ADDED

	@@ -0,0 +1,36 @@

+{
+  "add_prefix_space": false,
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>",
+  "audio_token": "<|audio_pad|>",
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "image_token": "<|image_pad|>",
+  "is_local": true,
+  "local_files_only": false,
+  "max_length": null,
+  "model_max_length": 262144,
+  "model_specific_special_tokens": {
+    "audio_bos_token": "<|audio_start|>",
+    "audio_eos_token": "<|audio_end|>",
+    "audio_token": "<|audio_pad|>",
+    "image_token": "<|image_pad|>",
+    "video_token": "<|video_pad|>",
+    "vision_bos_token": "<|vision_start|>",
+    "vision_eos_token": "<|vision_end|>"
+  },
+  "pad_to_multiple_of": null,
+  "pad_token": "<|endoftext|>",
+  "pad_token_type_id": 0,
+  "padding_side": "left",
+  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+  "split_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": null,
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>"
+}
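tokenizer_config.json sets `padding_side` to `left`. In batched generation with a decoder-only model, left padding keeps each prompt's last real token in the final position, so newly generated tokens follow it directly, while the attention mask hides the padding. The sketch below shows that layout. `left_pad` is a hypothetical helper, and the pad id 248044 assumes that `<|endoftext|>` maps to the `pad_token_id` in generation_config.json.

```python
PAD_TOKEN_ID = 248044  # assumption: id of "<|endoftext|>", taken from pad_token_id in generation_config.json


def left_pad(batch, pad_id=PAD_TOKEN_ID):
    """Left-pad token-id sequences to a common length and build the matching attention mask."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        gap = width - len(seq)
        # Padding goes in front so every sequence ends at the same position.
        input_ids.append([pad_id] * gap + list(seq))
        attention_mask.append([0] * gap + [1] * len(seq))
    return input_ids, attention_mask


print(left_pad([[1, 2, 3], [4]]))  # ([[1, 2, 3], [248044, 248044, 4]], [[1, 1, 1], [0, 0, 1]])
```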

video_preprocessor_config.json ADDED

	@@ -0,0 +1,21 @@

+{
+    "size": {
+        "longest_edge": 25165824,
+        "shortest_edge": 4096
+    },
+    "patch_size": 16,
+    "temporal_patch_size": 2,
+    "merge_size": 2,
+    "image_mean": [
+        0.5,
+        0.5,
+        0.5
+    ],
+    "image_std": [
+        0.5,
+        0.5,
+        0.5
+    ],
+    "processor_class": "Qwen3VLProcessor",
+    "video_processor_type": "Qwen3VLVideoProcessor"
+}
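video_preprocessor_config.json reuses the 16-pixel patches and 2×2 merge and adds `temporal_patch_size` 2, so consecutive pairs of frames are patched together. Assuming a clip has already been sampled and resized to valid dimensions, its `<|video_pad|>` token count can be sketched as follows. The sketch ignores the processor's frame sampling, its resizing against the `size` budget, and any timestamp or separator tokens it may add. The helper names are hypothetical.

```python
PATCH_SIZE = 16           # "patch_size"
TEMPORAL_PATCH_SIZE = 2   # "temporal_patch_size"
MERGE_SIZE = 2            # "merge_size"


def video_grid(num_frames, height, width):
    """(t, h, w) patch grid for a clip already resized to multiples of patch_size * merge_size."""
    if num_frames % TEMPORAL_PATCH_SIZE:
        raise ValueError("frame count must be a multiple of temporal_patch_size")
    if height % (PATCH_SIZE * MERGE_SIZE) or width % (PATCH_SIZE * MERGE_SIZE):
        raise ValueError("height and width must be multiples of patch_size * merge_size")
    return num_frames // TEMPORAL_PATCH_SIZE, height // PATCH_SIZE, width // PATCH_SIZE


def video_token_count(num_frames, height, width):
    """Each 2-frame, 32x32-pixel tube becomes one token after the 2x2 spatial merge."""
    t, h, w = video_grid(num_frames, height, width)
    return t * h * w // (MERGE_SIZE ** 2)


print(video_token_count(16, 448, 448))  # 1568
```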