Instructions to use rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled with libraries, inference providers, notebooks, and local apps.
How to use rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled to start chatting
Load model with FastModel
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled",
    max_seq_length=2048,
)
Thanks and request for FP8 version
Hi @rico03,
thank you very much for this model. It works very well for my use cases, and even MTP (multi-token prediction) with 3 speculative tokens has a very decent acceptance ratio.
Would it be possible to prepare an FP8 version that also keeps MTP?
Also, would it be possible to create a version distilled from Opus 4.7? The current one is based on Opus 4.6.
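A rough way to reason about what a decent acceptance ratio buys with 3 draft tokens: under the common simplifying assumption that each drafted token is accepted independently with the same probability alpha, the expected number of tokens produced per verification step of the main model is 1 + alpha + ... + alpha^k = (1 - alpha^(k+1)) / (1 - alpha). The sketch below evaluates that geometric series; the alpha values are illustrative assumptions, not measurements of this model, and in practice acceptance usually drops for later draft positions.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification step with k draft tokens.

    Assumes each draft token is accepted independently with probability
    alpha (a simplification); includes the extra token the main model
    always contributes after the accepted prefix.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha**2 + ... + alpha**k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


# Illustrative (assumed) per-token acceptance probabilities with 3 draft tokens
for alpha in (0.6, 0.8, 0.9):
    print(f"alpha={alpha}: {expected_tokens_per_step(alpha, k=3):.2f} tokens/step")
```

The upper bound is k + 1 tokens per step, so with 3 draft tokens even a perfect drafter yields at most 4 tokens per forward pass of the main model.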
As far as I know, you need the full model for MTP, but you can try to quantize it. I already have a GGUF version in another repo. Quantizing is very easy; you only need llama.cpp.
As for Opus 4.7: a fine-tune like this one only adds a reasoning pattern, not new knowledge, and Opus 4.6 and Opus 4.7 reason in very similar ways. To create a fine-tuned version with Opus 4.7 I would need a dataset of samples, but the most reliable datasets are from Opus 4.6, so it would be a gamble.
Thank you for your comment!
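You can also use MTP with FP8. There are two ways to do it:
- On-the-fly quantization:
# Quantizes the BF16 weights to FP8 while loading
vllm serve rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --quantization fp8
- Pre-quantization:
pip install auto-fp8
python -c "
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig
name = 'rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled'
config = BaseQuantizeConfig(quant_method='fp8', activation_scheme='dynamic')  # FP8 weights, activation scales computed at runtime
tokenizer = AutoTokenizer.from_pretrained(name)
examples = tokenizer(['auto_fp8 calibration example'], return_tensors='pt').to('cuda')
model = AutoFP8ForCausalLM.from_pretrained(name, quantize_config=config)
model.quantize(examples)
model.save_quantized('./qwen36-fp8')
"
Then serve the quantized checkpoint, e.g. with vllm serve ./qwen36-fp8.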
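Both routes store the weights in FP8. To make the numerics concrete, the following pure-Python sketch simulates per-tensor FP8 E4M3 quantization: one scale maps the tensor's largest magnitude onto 448 (the largest finite E4M3 value), and each scaled weight is rounded to the nearest representable E4M3 number. With 3 mantissa bits, the worst-case relative rounding error for normal values is about 6%. This is a simplified model assumed for illustration, with hypothetical function names; it is not the code used by vLLM or AutoFP8, which run on the GPU and may choose scales differently.

```python
import math

E4M3_MAX = 448.0          # largest finite FP8 E4M3 value
E4M3_MIN_NORMAL_EXP = -6  # smallest normal exponent (exponent bias is 7)
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round a finite float to the nearest E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    mag = abs(x)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1;
    # clamping to the minimum normal exponent yields the uniform subnormal spacing
    exp = max(math.frexp(mag)[1] - 1, E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # gap between neighbouring values at this exponent
    return math.copysign(min(round(mag / step) * step, E4M3_MAX), x)


def quantize_per_tensor(weights: list[float]) -> tuple[float, list[float]]:
    """Hypothetical helper: one scale per tensor so that max|w| maps to 448."""
    amax = max((abs(w) for w in weights), default=0.0)
    scale = amax / E4M3_MAX if amax else 1.0
    return scale, [round_to_e4m3(w / scale) for w in weights]


weights = [0.0123, -0.5, 0.25, 1.7, -3.2]
scale, fp8_values = quantize_per_tensor(weights)
restored = [scale * v for v in fp8_values]
for w, r in zip(weights, restored):
    print(f"{w:+.4f} -> {r:+.4f}")
```

A single per-tensor scale is the simplest choice; finer-grained (per-channel or per-block) scales reduce error for tensors with outliers at the cost of storing more scale values.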
Thank you. I'm currently using online quantization because I have a GPU with 96 GB of VRAM, but I made this request for the sake of others who may not have enough capacity for online quantization.
In the next few days I will try to find time to create and publish such a quantization myself :)
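The capacity point can be made concrete with back-of-the-envelope arithmetic: weight memory is roughly the parameter count times the bytes stored per parameter. The sketch below is a hypothetical helper that counts weights only; the 27e9 parameter count is an assumption read off the model name, the FP8 figure ignores the small scale tensors, and KV cache, activations and runtime overhead come on top.

```python
# Approximate bytes per weight; Q8_0 stores a 2-byte scale per 32 int8 values
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "q8_0": 34 / 32}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Weight storage in GB (1e9 bytes); ignores KV cache, activations and overhead."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


PARAMS = 27e9  # assumption: taken from the "27B" in the model name
for fmt in BYTES_PER_PARAM:
    print(f"{fmt:>5}: ~{weight_memory_gb(PARAMS, fmt):.1f} GB of weights")
```

Under these assumptions the BF16 weights alone come to about 54 GB, while the FP8 weights come to about 27 GB.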