lyf committed on Apr 25

Commit 67a56fa

1 Parent(s): 2e152bf

Update README for root-only model layout

Remove references to deleted aggressive and conservative profile folders. Document the repository as a single root weight set for direct vLLM loading.
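
A quick way to confirm the "single root weight set" layout described above is to scan a repository file listing for anything still sitting under a profile folder. The following is a minimal sketch, not part of the commit: the helper names `find_profile_paths` and `is_root_only` are hypothetical, the folder names come from the removed README sections, and the example file listings are made up for illustration.

```python
# Folder names taken from the README sections removed in this commit.
PROFILE_DIRS = ("aggressive", "conservative")


def find_profile_paths(repo_files):
    """Return the paths that still live under a profile subfolder."""
    hits = []
    for path in repo_files:
        # Split off the top-level path component; sep is "" for root-level files.
        top, sep, _rest = path.partition("/")
        if sep and top in PROFILE_DIRS:
            hits.append(path)
    return hits


def is_root_only(repo_files):
    """True when no file lives under a profile subfolder."""
    return not find_profile_paths(repo_files)


# Hypothetical file listings, for illustration only.
old_layout = [
    "config.json",
    "model.safetensors",
    "aggressive/config.json",
    "aggressive/model.safetensors",
]
new_layout = ["config.json", "model.safetensors", "README.md"]

print(find_profile_paths(old_layout))
print(is_root_only(new_layout))
```

A real listing could be obtained with `huggingface_hub.list_repo_files(repo_id)` and passed to `is_root_only`; that call needs network access, so it is left out of the sketch.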

Files changed (1)

README.md +5 -43

@@ -30,22 +30,14 @@ Uncensored Qwen3.6 35B A3B MoE quantized to NVFP4 `compressed-tensors` for vLLM
 - **35B total / 3B active MoE**
 - **HauhauCS Aggressive uncensored source**
+- **Conservative NVFP4 profile**: linear attention and MTP kept in bf16 for quality
 - **NVFP4 W4A4 compressed-tensors**
 - **~22 GB**
 - **Runs on one RTX 5090**
 - **100K-131K text context target**
 - **vLLM native loading**
-The default model files are placed at the repository root so Hugging Face shows the weights in the right-side download panel and `vllm serve` can load the repo directly.
-## Which profile should I use?
-| Profile | Path | Use |
-| --- | --- | --- |
-| Conservative | repo root / `conservative/` | Recommended default. Linear attention and MTP kept bf16 for quality. |
-| Aggressive | `aggressive/` | More aggressive NVFP4 coverage for smaller footprint / longer context experiments. |
-Recommended default: **root / conservative**.
+The model files are placed at the repository root so Hugging Face shows the weights in the right-side download panel and `vllm serve` can load the repo directly. The repo intentionally keeps a single root weight set to avoid full-repo snapshot downloads pulling multiple profile variants.
 ## Download
@@ -54,14 +46,6 @@ hf download lyf/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-NVFP4 \
   --local-dir ./qwen36-35b-a3b-hauhaucs-nvfp4
 ```
-Aggressive profile only:
-```bash
-hf download lyf/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-NVFP4 \
-  --include "aggressive/*" \
-  --local-dir ./qwen36-35b-a3b-hauhaucs-nvfp4
-```
 ## vLLM quickstart
 ```bash
@@ -103,28 +87,6 @@ vllm serve ./qwen36-35b-a3b-hauhaucs-nvfp4 \
   --trust-remote-code
 ```
-Aggressive subfolder quickstart:
-```bash
-hf download lyf/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-NVFP4 \
-  --local-dir ./qwen36-35b-a3b-hauhaucs-nvfp4
-VLLM_NVFP4_GEMM_BACKEND=marlin \
-vllm serve ./qwen36-35b-a3b-hauhaucs-nvfp4/aggressive \
-  --served-model-name qwen36-35b-a3b-hauhaucs-nvfp4-aggressive \
-  --quantization compressed-tensors \
-  --kv-cache-dtype fp8 \
-  --max-model-len 131072 \
-  --max-num-seqs 1 \
-  --max-num-batched-tokens 4096 \
-  --gpu-memory-utilization 0.90 \
-  --enable-prefix-caching \
-  --enable-auto-tool-choice \
-  --tool-call-parser qwen3_coder \
-  --reasoning-parser qwen3 \
-  --trust-remote-code
-```
 ## Quantization recipe
 ```python
@@ -144,14 +106,14 @@ oneshot(
 )
 ```
-- Calibration: `HuggingFaceH4/ultrachat_200k`, 128 samples × 1024 tokens
+- Calibration: `HuggingFaceH4/ultrachat_200k`, 128 samples x 1024 tokens
 - MTP tensors copied from [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
 - Converted using [li-yifei/gguf-to-nvfp4](https://github.com/li-yifei/gguf-to-nvfp4)
 Pipeline:
 ```text
-Q8_K_P GGUF → step1_convert_qwen36_moe.py → HF bf16 → step2_quantize_qwen36_moe.py → NVFP4
+Q8_K_P GGUF -> step1_convert_qwen36_moe.py -> HF bf16 -> step2_quantize_qwen36_moe.py -> NVFP4
 ```
 ## Source models
@@ -163,4 +125,4 @@ Q8_K_P GGUF → step1_convert_qwen36_moe.py → HF bf16 → step2_quantize_qwen3
 - [HauhauCS](https://huggingface.co/HauhauCS) for the uncensored GGUF source
 - [Qwen](https://huggingface.co/Qwen) for the base model and MTP weights
-- [AEON-7](https://huggingface.co/AEON-7) and [RedHatAI](https://huggingface.co/RedHatAI) for conservative quantization approach reference
+- [AEON-7](https://huggingface.co/AEON-7) and [RedHatAI](https://huggingface.co/RedHatAI) for conservative quantization approach reference
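
The README bullets "NVFP4 W4A4", "~22 GB", and "Runs on one RTX 5090" can be related with some back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, not a measurement from this repository: it assumes NVFP4 stores 4-bit values with one 8-bit scale per block of 16 elements, a 32 GB RTX 5090, and the `--gpu-memory-utilization 0.90` value that appears only in the removed aggressive quickstart (the retained quickstart flags are elided in the diff). The function names are hypothetical, and GB versus GiB differences are ignored.

```python
def nvfp4_weight_gb(params_billion, block_size=16, scale_bits=8):
    """Approximate footprint in GB if every parameter were stored as NVFP4.

    Assumes 4-bit elements plus one `scale_bits`-wide scale per block of
    `block_size` elements; per-tensor scales are small enough to ignore.
    """
    bits_per_param = 4 + scale_bits / block_size  # 4.5 bits with the defaults
    return params_billion * bits_per_param / 8    # billions of bytes, i.e. GB


def headroom_gb(vram_gb, utilization, weights_gb):
    """Memory left for KV cache and activations inside the configured budget."""
    return vram_gb * utilization - weights_gb


# 35B parameters, all NVFP4: a lower bound on the weight footprint.
print(round(nvfp4_weight_gb(35), 2))        # 19.69

# README figure of ~22 GB on an assumed 32 GB card at 0.90 utilization.
print(round(headroom_gb(32, 0.90, 22), 1))  # 6.8
```

The gap between the roughly 19.7 GB lower bound and the README's ~22 GB is consistent with some tensors staying in bf16, as the "Conservative NVFP4 profile" bullet says of linear attention and MTP, though the exact breakdown is not given in the diff.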