Instructions to use llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4")
model = AutoModelForImageTextToText.from_pretrained("llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4

SGLang

How to use llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4 with Docker Model Runner:
```
docker model run hf.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GPTQ-Int4
```

Possibilities of NVFP4?

by johnlaborxxx - opened May 2

Discussion

johnlaborxxx

May 2

•

edited May 2

Hi, GGUF NVFP4 support was recently merged in a PR.
I wonder if you could also release NVFP4 versions of these GGUFs (qwen3.5 & 3.6 27b / gemma4 31b)?
I think NVFP4 is just a bit worse than Q6 in terms of intelligence but would be 3 times faster on Nvidia GPUs. Thanks.

llmfan46

Owner May 2

•

edited May 2

> Hi, GGUF NVFP4 support was recently merged in a PR.

Yes and no. It's not as simple a process as just creating an NVFP4 GGUF; this is actually what happens when you try to create NVFP4 with llama.cpp:

main: invalid ftype 'NVFP4'

Basically, NVFP4 is not recognized as a quantization type by the latest llama.cpp version (from 3 hours ago).

> I wonder if you could also release NVFP4 versions of these GGUFs (qwen3.5 & 3.6 27b / gemma4 31b)?

I actually spent the whole day yesterday working on this exact thing, but it's been very difficult, and way more complicated than simply creating GGUFs with llama.cpp.

> I think NVFP4 is just a bit worse than Q6 in terms of intelligence but would be 3 times faster on Nvidia GPUs. Thanks.

Yes, I will keep working on it today. I just haven't been able to find a recipe that gives both a small size and retained quality. I was able to create one, but it's 27.5 GiB. It's the best quality I can do, but I am not sure people will be eager to download an NVFP4 that is only about 1 GiB smaller than FP8 and GPTQ-8bit. I suspect people will see that the size is about 10 GiB bigger than expected and pass on it for a smaller NVFP4 from other uploaders, disregarding the quality tradeoffs (which makes sense, as a higher-quality version is not really useful if you cannot fit it on your hardware).
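
To put rough numbers on the size concern above, here is a back-of-the-envelope sketch in Python. It is not the recipe used for any of these uploads. It assumes that the FP8 build is about 28.5 GiB (the 27.5 GiB figure plus the roughly 1 GiB gap mentioned above), that FP8 costs exactly 8 bits per weight, and that NVFP4 costs 4.5 bits per weight (4-bit values plus one 8-bit scale per 16-value block). Per-tensor scales, metadata, and any tensors left unquantized are ignored, and the helper name `scaled_size_gib` is made up for this example.

```python
def scaled_size_gib(reference_gib, reference_bpw, target_bpw):
    """Scale a known file size to a different average bits-per-weight."""
    return reference_gib * target_bpw / reference_bpw


# Assumption: 4-bit values plus one 8-bit scale shared by each 16-value block.
NVFP4_BPW = 4 + 8 / 16

# Assumption taken from the post: the 27.5 GiB build is ~1 GiB below FP8.
RECIPE_GIB = 27.5
FP8_GIB = RECIPE_GIB + 1.0

# Size if every weight were stored as NVFP4.
all_nvfp4_gib = scaled_size_gib(FP8_GIB, 8, NVFP4_BPW)

# Average bits per weight implied by the 27.5 GiB build.
implied_bpw = 8 * RECIPE_GIB / FP8_GIB

print(f"all-NVFP4 estimate: {all_nvfp4_gib:.1f} GiB")
print(f"gap to the 27.5 GiB build: {RECIPE_GIB - all_nvfp4_gib:.1f} GiB")
print(f"implied average: {implied_bpw:.2f} bits per weight")
```

Under these assumptions the gap comes out in the same ballpark as the "about 10 GiB bigger than expected" figure, which would be consistent with a recipe that keeps most tensors at higher precision and quantizes only a small part to NVFP4.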

llmfan46

Owner May 2

> Hi, GGUF NVFP4 support was recently merged in a PR.
> I wonder if you could also release NVFP4 versions of these GGUFs (qwen3.5 & 3.6 27b / gemma4 31b)?
> I think NVFP4 is just a bit worse than Q6 in terms of intelligence but would be 3 times faster on Nvidia GPUs. Thanks.

It took a while, but I might finally have something for you later today.

llmfan46

Owner May 3

> Hi, GGUF NVFP4 support was recently merged in a PR.
> I wonder if you could also release NVFP4 versions of these GGUFs (qwen3.5 & 3.6 27b / gemma4 31b)?
> I think NVFP4 is just a bit worse than Q6 in terms of intelligence but would be 3 times faster on Nvidia GPUs. Thanks.

Finally done, here you go:

https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-GGUF

More NVFP4 coming.

johnlaborxxx

about 1 month ago

•

edited about 1 month ago

Hi @llmfan46,
Thank you!
Just as I was about to download, I saw your new upload: https://huggingface.co/lyf/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-MTP
I assume there will also be a GGUF for that?

From my limited research, it seems like NVFP4 MTP gguf > NVFP4 MLP gguf > NVFP4 gguf?
Thanks!

johnlaborxxx

about 1 month ago

nvm I realized the above one is from another person.
And yours: https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-MLP is already part of https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-GGUF/tree/main.

Thanks!

llmfan46

Owner about 1 month ago

•

edited about 1 month ago

Some more NVFP4 goodies for you:

https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-NVFP4-Experts-Only-GGUF

https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-NVFP4-Experts-Only

Hope that you like them and that they prove useful to you.

llmfan46

Owner about 1 month ago

> Hi @llmfan46,
> Thank you!
> Just as I was about to download, I saw your new upload: https://huggingface.co/lyf/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-MTP
> I assume there will also be a GGUF for that?
>
> From my limited research, it seems like NVFP4 MTP gguf > NVFP4 MLP gguf > NVFP4 gguf?
> Thanks!

@johnlaborxxx

Here you go:

https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-MTP-GGUF

johnlaborxxx

29 days ago

•

edited 29 days ago

Wow, this is wonderful! Thanks a lot @llmfan46 !

I've only downloaded the MLP version so far; gonna test the difference between normal MLP and q8 MLP, as the size difference is huge.
Will try MTP once I am ready, as I doubt Kobold currently supports MTP natively.

// NextN/MTP tensors are currently ignored (reserved for future MTP support)

My search shows that an MTP model would run, but the speed-up from the MTP layers just gets ignored.
Therefore, I might need to set up LM Studio to fully see the effect.

Thanks!

llmfan46

Owner 29 days ago

@johnlaborxxx

Yep sure thing, more models coming soon and be sure to let me know if you need some more NVFP4 GGUFs for some other models of mine, have fun with the models.

johnlaborxxx

29 days ago

•

edited 29 days ago

Hi,

Well, since you asked @llmfan46, I will shamelessly ask whether gemma4 is also possible, at least the 31b model, since q6 takes up roughly my entire VRAM.
An MLP NVFP4, q8 or not, would definitely help free up more space for context while keeping the attention layers at a high bit width 😍

This is NOT in any way urgent, as I can still work with the existing gemma4 heretic with 60K context.
Only do this if you have the time, resources, or interest. 👍

Thanks!

llmfan46

Owner 29 days ago

•

edited 29 days ago

> Hi,
>
> Well, since you asked @llmfan46, I will shamelessly ask whether gemma4 is also possible, at least the 31b model, since q6 takes up roughly my entire VRAM.
> An MLP NVFP4, q8 or not, would definitely help free up more space for context while keeping the attention layers at a high bit width 😍
>
> This is NOT in any way urgent, as I can still work with the existing gemma4 heretic with 60K context.
> Only do this if you have the time, resources, or interest. 👍
>
> Thanks!

I can do it, but not right away. I need to finish benchmarking and releasing my newest Gemma 4 31B it uncensored finetune, and I also have to redo and re-upload all the Qwen3.6 and Gemma 4 GGUFs due to changes in chat templates.

johnlaborxxx

28 days ago

•

edited 28 days ago

Seems like a lot more to redownload then :)
May I ask a question @llmfan46: if the chat template is updated, why do the GGUFs also need to be rebuilt?
Isn't the chat template only loaded in kobold/ST/LM Studio? Or am I thinking about this wrong? Thanks.

llmfan46

Owner 28 days ago

> Seems like a lot more to redownload then :)
> May I ask a question @llmfan46: if the chat template is updated, why do the GGUFs also need to be rebuilt?
> Isn't the chat template only loaded in kobold/ST/LM Studio? Or am I thinking about this wrong? Thanks.

Because it's packed into the GGUF file. With a safetensors repo it's easy to just update chat_template.jinja; that isn't the case here.
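
For illustration, here is a minimal sketch of where the template lives inside a GGUF file. It is a hand-written reader, not code from llama.cpp or any of the tools mentioned in this thread. It assumes a little-endian GGUF of version 2 or later and looks up the `tokenizer.chat_template` metadata key; the function name `read_chat_template` and the tiny in-memory demo file are made up for this example.

```python
import io
import struct

# GGUF metadata value type ids mapped to struct formats (scalars only).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size little-endian value from the stream."""
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by the bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_chat_template(f):
    """Return the embedded chat template, or None if the file has none."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    _read(f, "<I")             # format version (unused here)
    _read(f, "<Q")             # tensor count (unused here)
    kv_count = _read(f, "<Q")  # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "tokenizer.chat_template":
            return value
    return None


def _gguf_string(text):
    raw = text.encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw


# Tiny synthetic file: header, then two string-valued metadata entries.
demo = (b"GGUF" + struct.pack("<IQQ", 3, 0, 2)
        + _gguf_string("general.name") + struct.pack("<I", _STRING)
        + _gguf_string("demo")
        + _gguf_string("tokenizer.chat_template") + struct.pack("<I", _STRING)
        + _gguf_string("{{ messages }}"))

print(read_chat_template(io.BytesIO(demo)))
```

On a real file the same function could be called as `read_chat_template(open(path, "rb"))`. Since the template is stored in the metadata section of the same file as the tensors, a corrected template means producing and uploading a new file rather than editing a small sidecar file.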

johnlaborxxx

27 days ago

Seems like this is the best of both worlds:
https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4-GGUF

Even though kobold does not support MTP yet, I can still enjoy the MLP q8 nvfp4.
And if kobold starts to support it, then we rock!

Then I guess this can be retired?
https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-GGUF
Thanks.

llmfan46

Owner 27 days ago

> Seems like this is the best of both worlds:
> https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4-GGUF

Yeah, it doesn't get better than that now.

> Even though kobold does not support MTP yet, I can still enjoy the MLP q8 nvfp4.
> And if kobold starts to support it, then we rock!
>
> Then I guess this can be retired?
> https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-GGUF
> Thanks.

I'll just leave it for now in case someone doesn't care about MTP and just wants the smaller sizes.
