Instructions to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF",
	filename="Qwen3.6-27B-Claude-Opus-Reasoning-Distill.bf16.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M

Use Docker

docker model run hf.co/TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Ollama:
```
ollama run hf.co/TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
```

Unsloth Studio

How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF to start chatting

How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Docker Model Runner:
```
docker model run hf.co/TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M
```

Lemonade

How to use TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-GGUF-Q4_K_M

List all available models

lemonade list

Removing the model's vision feature?

by NovaYear - opened Apr 30

Discussion

NovaYear

Apr 30

Hi friend,
I've been following your work for a while now, and the models you've created are truly remarkable examples of successful optimizations achieved through hard work.
I have an idea: since we're creating a model based on Claude Opus, could you perhaps develop it further with more coding-oriented features? Removing the Vision feature from the general-purpose model could reduce the space it occupies in VRAM. For someone using autonomous coding, the Vision feature isn't really necessary. Wouldn't removing the Vision feature and fine-tuning the coding result in a much better model?
Perhaps, if you have the time and resources, you could release a different fork as TeichAI/Qwen3.6-27B-Claude-Opus-CODER-Reasoning-Distill-v2-GGUF.
Finally, thank you so much for your efforts; you're guiding us.

CompactAI

TeichAI org Apr 30

It would result in a much better coding model, but then it wouldn't be qwen3.6 architecture and would also require much more compute to occupy the space the vision encoder used. 🤷

gtrak

Apr 30

you can run without mmproj to save some VRAM, just download the GGUF manually or use --no-mmproj in llama.cpp

Bob-the-Koala

Apr 30

Well the vision encoder is a separate model that, if images are present, encodes them and projects them into Qwen’s hidden state, so it could be removed with no hit to non vision performance, but it would not increase performance without significant retraining, and the encoder is only a couple hundred million parameters anyway so VRAM savings would be marginal, though a dedicated agentic coding model trained on frontier coding traces would be great, though since coding quality mainly depends on the large amounts of code it sees in pretraining, not the few thousand examples it sees in a frontier finetune (that influences mainly how it structures its agentic workflow and reasoning), I would recommend Devstral 2 small 24B as a good model as it is a purely agentic coding focused model from the ground up.

gtrak

Apr 30

I'm not sure I understand what you are suggesting since qwen 3.6 coding quality is miles ahead of devstral 2 small already?

armand0e

TeichAI org Apr 30

•

edited Apr 30

He has expressed in other discussions that getting traces of something like opus not just creating new projects from scratch, but also spawning into an already populated git repo and tasked with exploring, auditing, fixing, or implementing things.

armand0e

TeichAI org Apr 30

I'm working on a new version of agentic datagen that could make this nice and easy to do. we'll be back soon with some new data and a new model will follow soon after :)

veldierin

14 days ago

Personally i think removing vision from the model is a terrible idea. Some of us use these models with vision to iterate through UI/UX along with coding, having automated screenshot generation and analysis as UI/UX development happens.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment