Instructions to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF", filename="Qwen3.6-27B-Claude-Opus-Reasoning-Distill.bf16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
Use Docker
docker model run hf.co/TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Ollama:
ollama run hf.co/TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
- Unsloth Studio
How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF to start chatting
- Pi
How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Docker Model Runner:
docker model run hf.co/TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
- Lemonade
How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF-Q4_K_M
List all available models
lemonade list
Removing the model's vision feature?
Hi friend,
I've been following your work for a while now, and the models you've created are truly remarkable examples of successful optimizations achieved through hard work.
I have an idea: since we're creating a model based on Claude Opus, could you perhaps develop it further with more coding-oriented features? Removing the Vision feature from the general-purpose model could reduce the space it occupies in VRAM. For someone using autonomous coding, the Vision feature isn't really necessary. Wouldn't removing the Vision feature and fine-tuning the coding result in a much better model?
Perhaps, if you have the time and resources, you could release a different fork as TeichAI/Qwen3.6-27B-Claude-Opus-CODER-Reasoning-Distill-v2-GGUF.
Finally, thank you so much for your efforts; you're guiding us.
It would result in a much better coding model, but then it wouldn't be qwen3.6 architecture and would also require much more compute to occupy the space the vision encoder used. 🤷
you can run without mmproj to save some VRAM, just download the GGUF manually or use --no-mmproj in llama.cpp
Well the vision encoder is a separate model that, if images are present, encodes them and projects them into Qwen’s hidden state, so it could be removed with no hit to non vision performance, but it would not increase performance without significant retraining, and the encoder is only a couple hundred million parameters anyway so VRAM savings would be marginal, though a dedicated agentic coding model trained on frontier coding traces would be great, though since coding quality mainly depends on the large amounts of code it sees in pretraining, not the few thousand examples it sees in a frontier finetune (that influences mainly how it structures its agentic workflow and reasoning), I would recommend Devstral 2 small 24B as a good model as it is a purely agentic coding focused model from the ground up.
I'm not sure I understand what you are suggesting since qwen 3.6 coding quality is miles ahead of devstral 2 small already?
He has expressed in other discussions that getting traces of something like opus not just creating new projects from scratch, but also spawning into an already populated git repo and tasked with exploring, auditing, fixing, or implementing things.
I'm working on a new version of agentic datagen that could make this nice and easy to do. we'll be back soon with some new data and a new model will follow soon after :)
Personally i think removing vision from the model is a terrible idea. Some of us use these models with vision to iterate through UI/UX along with coding, having automated screenshot generation and analysis as UI/UX development happens.