How to use from Unsloth Studio
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for shakedzy/QwQ-32b-Preview-bnb-4bit-wTags to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for shakedzy/QwQ-32b-Preview-bnb-4bit-wTags to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for shakedzy/QwQ-32b-Preview-bnb-4bit-wTags to start chatting
Load model with FastModel
# Install first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="shakedzy/QwQ-32b-Preview-bnb-4bit-wTags",
    max_seq_length=2048,
)

QwQ-32B-Preview LoRA for separating thinking/answer parts

This LoRA was fine-tuned to make QwQ consistently separate its private thoughts from its final answer using <THINKING>...</THINKING><ANSWER>...</ANSWER> tags.
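Because the output is wrapped in fixed tags, downstream code can separate the two parts with a small parser. The following is a minimal sketch, assuming the model emits exactly one <THINKING> block and one <ANSWER> block as described above; the names `TaggedOutput` and `split_tagged_output` are hypothetical helpers, not part of this model or any library.

```python
import re
from typing import NamedTuple, Optional


class TaggedOutput(NamedTuple):
    """Hypothetical container for the two parts of a completion."""
    thinking: Optional[str]
    answer: Optional[str]


def split_tagged_output(text: str) -> TaggedOutput:
    """Extract the THINKING and ANSWER sections from a completion.

    A section is returned as None when its opening or closing tag is
    missing, for example when generation was truncated.
    """
    def _extract(tag: str) -> Optional[str]:
        # DOTALL lets "." match newlines, since thoughts often span lines.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None

    return TaggedOutput(_extract("THINKING"), _extract("ANSWER"))


# Illustrative input only; this is not real model output.
sample = "<THINKING>2 + 2 is 4.</THINKING><ANSWER>4</ANSWER>"
result = split_tagged_output(sample)
print(result.answer)
```

Returning None for a missing section, rather than raising, lets the caller decide how to handle a completion that was cut off before the closing tag.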

A Q4_K_M GGUF version, which can be used as an adapter for Ollama, is available at shakedzy/QwQ-32B-Preview-with-Tags-LoRA-GGUF.
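Ollama applies a GGUF adapter through the `ADAPTER` instruction in a Modelfile, placed after a `FROM` line naming the base model. The sketch below only renders such a Modelfile as text. Both the base model name and the adapter file path are hypothetical placeholders, since neither is stated here; substitute the QwQ-32B-Preview base you have in Ollama and the actual file downloaded from the GGUF repository.

```python
def render_modelfile(base_model: str, adapter_path: str) -> str:
    """Build the text of an Ollama Modelfile that layers a LoRA adapter
    on top of a base model."""
    return f"FROM {base_model}\nADAPTER {adapter_path}\n"


# Both arguments are placeholders (assumptions), not real names.
modelfile_text = render_modelfile(
    base_model="qwq-32b-preview-base",
    adapter_path="./qwq-with-tags-lora.gguf",
)
print(modelfile_text)
```

Saving the rendered text to a file named Modelfile and running `ollama create <name> -f Modelfile` should then register the combined model, provided the base matches the model the adapter was trained on.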

Safetensors
Model size: 34B params
Tensor types: F32, F16, U8

Model tree for shakedzy/QwQ-32b-Preview-bnb-4bit-wTags

Base model: Qwen/Qwen2.5-32B
Adapter (25): this model