Possibilities of NVFP4?
Hi, NVFP4 support for GGUF was recently merged in a PR.
I wonder if you could also release NVFP4 versions of these GGUFs (Qwen3.5 and 3.6 27B, Gemma 4 31B)?
I think NVFP4 is only a bit worse than Q6 in terms of intelligence but would be 3 times faster on NVIDIA GPUs. Thanks.
Hi, NVFP4 support for GGUF was recently merged in a PR.
Yes and no. It's not as simple a process as just creating an NVFP4 GGUF; this is what actually happens when you try to create NVFP4 with llama.cpp:
main: invalid ftype 'NVFP4'
Basically, NVFP4 is not recognized as a quantization type by the latest version of llama.cpp, from 3 hours ago.
I wonder if you could also release NVFP4 versions of these GGUFs (Qwen3.5 and 3.6 27B, Gemma 4 31B)?
I actually spent the whole day yesterday working on this exact thing, but it's been very difficult, far more complicated than simply creating GGUFs with llama.cpp.
I think NVFP4 is only a bit worse than Q6 in terms of intelligence but would be 3 times faster on NVIDIA GPUs. Thanks.
Yes, I will keep working on it today. I just haven't been able to find a recipe that gives you both a small size and retained quality. I was able to create one, but it's 27.5 GiB. It's the best quality I can do, but I'm not sure people will be eager to download an NVFP4 that is only about 1 GiB smaller than FP8 and GPTQ-8bit. I suspect people will see that the size is about 10 GiB bigger than expected and pass on it for a smaller NVFP4 from other uploaders, disregarding the quality tradeoffs (which makes sense, as a higher-quality version is not really useful if you cannot fit it on your hardware).
It took a while, but I might finally have something for you later today.
Finally done, here you go:
https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-GGUF
More NVFP4 coming.
Hi @llmfan46,
Thank you!
Just as I was about to download, I saw your new upload: https://huggingface.co/lyf/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-MTP
I assume there will also be a GGUF for that?
From my limited research, it seems like NVFP4 MTP GGUF > NVFP4 MLP GGUF > NVFP4 GGUF?
Thanks!
Nvm, I realized the one above is from another person.
And yours: https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-MLP is already part of https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-GGUF/tree/main.
Thanks!
Some more NVFP4 goodies for you:
https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-NVFP4-Experts-Only-GGUF
https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-NVFP4-Experts-Only
Hope that you like them and that they prove useful to you.
Here you go:
https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-MTP-GGUF
Wow, this is wonderful! Thanks a lot, @llmfan46!
I've only downloaded the MLP version so far; I'm going to test the difference between normal MLP and Q8 MLP, as the size difference is huge.
I will try MTP once I'm ready, as I doubt Kobold currently supports MTP natively.
// NextN/MTP tensors are currently ignored (reserved for future MTP support)
My search shows that an MTP model would run, but the layers that provide the speedup just get ignored.
Therefore, I might need to set up LM Studio to fully see the effect.
Thanks!
Yep, sure thing. More models are coming soon, and be sure to let me know if you need NVFP4 GGUFs for any of my other models. Have fun with the models.
Hi,
Well, since you asked, @llmfan46, I will shamelessly ask whether Gemma 4 is also possible, at least the 31B model, since Q6 takes up roughly my entire VRAM.
An MLP NVFP4, Q8 or not, would definitely help reserve more space for context while keeping the attention layers at a high bit width.
This is NOT in any way urgent, as I can still work with the existing Gemma 4 heretic with 60K context.
Only do this if you have the time, resources, or interest.
Thanks!
I can do it, but not right away. I need to finish benchmarking and releasing my newest Gemma 4 31B it uncensored finetune, and I also have to redo and re-upload all the Qwen3.6 and Gemma 4 GGUFs due to changes in the chat templates.
Seems like a lot more to redownload then :)
May I ask a question, @llmfan46: if only the chat template is updated, why do the GGUFs also need to be rebuilt?
Isn't the chat template only loaded in Kobold/ST/LM Studio? Or am I thinking about this wrong? Thanks.
Because the chat template is packed into the GGUF file itself. With a safetensors release it's easy to just update chat_template.jinja; that isn't the case here.
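To illustrate why a template change means rebuilding the file, here is a minimal sketch in Python of how GGUF keeps metadata as key-value pairs inside the model file. It builds a toy, tensor-free header in memory and reads the template back. It handles only string values, and the helper names (`build_header`, `read_string_metadata`) are made up for this example; they are not part of llama.cpp or any library. The key name `tokenizer.chat_template` is, as far as I know, the one llama.cpp-based tools read.

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_VERSION = 3
GGUF_TYPE_STRING = 8  # value-type id for strings in the GGUF format

def _pack_string(s: str) -> bytes:
    """GGUF strings are a little-endian uint64 length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack("<Q", len(data)) + data

def build_header(metadata: dict[str, str]) -> bytes:
    """Toy header: magic, version, tensor count (0), KV count, then string KV pairs."""
    out = GGUF_MAGIC + struct.pack("<IQQ", GGUF_VERSION, 0, len(metadata))
    for key, value in metadata.items():
        out += _pack_string(key) + struct.pack("<I", GGUF_TYPE_STRING) + _pack_string(value)
    return out

def read_string_metadata(blob: bytes) -> dict[str, str]:
    """Read back string-valued metadata from a header built above."""
    if blob[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF header")
    _version, _n_tensors, n_kv = struct.unpack_from("<IQQ", blob, 4)
    pos = 4 + struct.calcsize("<IQQ")

    def read_string() -> str:
        nonlocal pos
        (length,) = struct.unpack_from("<Q", blob, pos)
        pos += 8
        text = blob[pos:pos + length].decode("utf-8")
        pos += length
        return text

    result = {}
    for _ in range(n_kv):
        key = read_string()
        (value_type,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        if value_type != GGUF_TYPE_STRING:
            raise NotImplementedError("this sketch only reads string values")
        result[key] = read_string()
    return result

old = build_header({"tokenizer.chat_template": "{{ old template }}"})
new = build_header({"tokenizer.chat_template": "{{ new template }}"})
print(read_string_metadata(old)["tokenizer.chat_template"])
print(old != new)  # the file bytes differ once the template differs
```

Since the template lives in the same binary as the weights, every quantized file that embeds the old template has to be regenerated and uploaded again. Some runtimes let you override the template at load time, but that does not change the copy stored in the file.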
Seems like this is the best of both worlds:
https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4-GGUF
Even though Kobold does not support MTP yet, I can still enjoy the MLP Q8 NVFP4.
And if Kobold starts supporting it, then we rock!
Then I guess this can be retired?
https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-GGUF
Thanks.
Seems like this is the best of both worlds:
https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4-GGUF
Yeah it doesn't get better than that now.
Even though Kobold does not support MTP yet, I can still enjoy the MLP Q8 NVFP4.
And if Kobold starts supporting it, then we rock!
Then I guess this can be retired?
https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-NVFP4-GGUF
Thanks.
I'll just leave it for now in case someone doesn't care about MTP and just wants the smaller sizes.