Instructions to use sphaela/Qwen3.6-27B-AutoRound-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use sphaela/Qwen3.6-27B-AutoRound-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="sphaela/Qwen3.6-27B-AutoRound-GGUF",
	filename="Qwen3.6-27B-Q2_K_MIXED.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use sphaela/Qwen3.6-27B-AutoRound-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M

Use Docker

docker model run hf.co/sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use sphaela/Qwen3.6-27B-AutoRound-GGUF with Ollama:
```
ollama run hf.co/sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M
```

Unsloth Studio

How to use sphaela/Qwen3.6-27B-AutoRound-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sphaela/Qwen3.6-27B-AutoRound-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sphaela/Qwen3.6-27B-AutoRound-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sphaela/Qwen3.6-27B-AutoRound-GGUF to start chatting

How to use sphaela/Qwen3.6-27B-AutoRound-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use sphaela/Qwen3.6-27B-AutoRound-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M

Run Hermes

hermes

Atomic Chat new
Docker Model Runner
How to use sphaela/Qwen3.6-27B-AutoRound-GGUF with Docker Model Runner:
```
docker model run hf.co/sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M
```

Lemonade

How to use sphaela/Qwen3.6-27B-AutoRound-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull sphaela/Qwen3.6-27B-AutoRound-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3.6-27B-AutoRound-GGUF-Q4_K_M

List all available models

lemonade list

MTP support

by Throghar - opened May 16

Discussion

Throghar

May 16

Hi, is it possible to reacreate these quants with MTP support?

tooltd

May 17

Me too, please.

soyalemujica

30 days ago

Hello, I'd love to see MTP support as well!

sphaela

Owner 29 days ago

I'll try to look into it, thanks for the suggestion :3

agsharathnaik

16 days ago

Yep this is the best small model that fits into 11GB GPU or reasonable with 32GB system ram. This with MTP should make it the best allrounder.

soyalemujica

14 days ago

No MTP yet? :(

sphaela

Owner 9 days ago

Hi guys thanks for the support :3 Autoround supports MTP with GGUF just merged so I'll try to requant the models with MTP enable.

soyalemujica

9 days ago

YES Please! I'd be willing to donate even 5 bucks if you could make MTP model for Q5KM and Q4KM 🤤

tooltd

9 days ago

Do Q3 for poor too, with Autoroundbest Recipe Settings 😁

sphaela

Owner 9 days ago

Hi everyone the requant has begun on my two DGX Spark and Qwen3.6-27B variant will be upload in 12hrs or so with the 35B following shortly :3
@soyalemujica I really appreciate but I currently don't have ways to accept donation ;-; thank you so much!

sphaela

Owner 8 days ago

Hi guys sorry for the wait, the models is being upload right now! I literary stayed up all night to quantize everything. If anyone want to support me you could buy me a coffee over https://ko-fi.com/sphaela (I set one up per request :3)

sphaela

Owner 8 days ago

Everything is uploaded, enjoy ;)

soyalemujica

8 days ago

I'd like to thank you so very much! I have tipped you for making this possible, these autoround quants are the best I have ever tried, for some reason they think less, and come up with solutions quicker.

sphaela

Owner 8 days ago

I'd like to thank you so very much! I have tipped you for making this possible, these autoround quants are the best I have ever tried, for some reason they think less, and come up with solutions quicker.

@soyalemujica Thank you so much for your generous tip! I'm glad you like it!

saadsafi

7 days ago

excellent work !

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment