Instructions to use aaardpark/Qwen3.6-27B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use aaardpark/Qwen3.6-27B-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="aaardpark/Qwen3.6-27B-GGUF", filename="qwen3.6-27B-aaardpark-uniform-Q3_K.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use aaardpark/Qwen3.6-27B-GGUF with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh # Start a local OpenAI-compatible server with a web UI: llama serve -hf aaardpark/Qwen3.6-27B-GGUF # Run inference directly in the terminal: llama cli -hf aaardpark/Qwen3.6-27B-GGUF
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama serve -hf aaardpark/Qwen3.6-27B-GGUF # Run inference directly in the terminal: llama cli -hf aaardpark/Qwen3.6-27B-GGUF
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf aaardpark/Qwen3.6-27B-GGUF # Run inference directly in the terminal: ./llama-cli -hf aaardpark/Qwen3.6-27B-GGUF
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf aaardpark/Qwen3.6-27B-GGUF # Run inference directly in the terminal: ./build/bin/llama-cli -hf aaardpark/Qwen3.6-27B-GGUF
Use Docker
docker model run hf.co/aaardpark/Qwen3.6-27B-GGUF
- LM Studio
- Jan
- Ollama
How to use aaardpark/Qwen3.6-27B-GGUF with Ollama:
ollama run hf.co/aaardpark/Qwen3.6-27B-GGUF
- Unsloth Studio
How to use aaardpark/Qwen3.6-27B-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for aaardpark/Qwen3.6-27B-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for aaardpark/Qwen3.6-27B-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for aaardpark/Qwen3.6-27B-GGUF to start chatting
- Pi
How to use aaardpark/Qwen3.6-27B-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf aaardpark/Qwen3.6-27B-GGUF
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "aaardpark/Qwen3.6-27B-GGUF" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use aaardpark/Qwen3.6-27B-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf aaardpark/Qwen3.6-27B-GGUF
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default aaardpark/Qwen3.6-27B-GGUF
Run Hermes
hermes
- Atomic Chat new
- Docker Model Runner
How to use aaardpark/Qwen3.6-27B-GGUF with Docker Model Runner:
docker model run hf.co/aaardpark/Qwen3.6-27B-GGUF
- Lemonade
How to use aaardpark/Qwen3.6-27B-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull aaardpark/Qwen3.6-27B-GGUF
Run and chat with the model
lemonade run user.Qwen3.6-27B-GGUF-{{QUANT_TAG}}List all available models
lemonade list
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf aaardpark/Qwen3.6-27B-GGUF# Run inference directly in the terminal:
llama cli -hf aaardpark/Qwen3.6-27B-GGUFUse pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf aaardpark/Qwen3.6-27B-GGUF# Run inference directly in the terminal:
./llama-cli -hf aaardpark/Qwen3.6-27B-GGUFBuild from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf aaardpark/Qwen3.6-27B-GGUF# Run inference directly in the terminal:
./build/bin/llama-cli -hf aaardpark/Qwen3.6-27B-GGUFUse Docker
docker model run hf.co/aaardpark/Qwen3.6-27B-GGUFQwen3.6-27B | aard-Q3
11 GB of Qwen 3.6-27B
| GSM8K (50 problems, 5-shot) | |
|---|---|
| FP16 (full precision) | 49 / 50 = 98% |
| aard-Q3 (this file) | 47 / 50 = 94% |
96% of the model's reasoning capacity carries through 3-bit compression.
Why this 3-bit pack isn't bad
I've been messing around with a new quant method that handles 3 bit pretty well by using a different way of preserving important weights.
Three sample outputs
Math:
Q: What is 5/8 of 2/3?
A: To find 5/8 of 2/3, multiply the two fractions: 5/8 ร 2/3 = (5 ร 2)/(8 ร 3) = 10/24 = 5/12.
Factual:
Q: What year did the Treaty of Westphalia end the Thirty Years' War, and name two of its key provisions.
A: The Peace of Westphalia was signed in 1648, ending the Thirty Years' War. Two key provisions:
- State sovereignty โ each state had exclusive right to govern its internal affairs (religion and military), laying the foundation for the modern sovereign-state system.
- Religious tolerance โ extended the Peace of Augsburg to recognize Calvinism alongside Lutheranism and Catholicism, ending decades of religious warfare in the Holy Roman Empire.
Code debug:
Q: Find the bug in this Python:
def merge(a, b): result = [] i, j = 0, 0 while i < len(a) and j < len(b): if a[i] < b[j]: result.append(a[i]) i += 1 else: result.append(b[j]) i += 1 result.extend(a[i:]) result.extend(b[j:]) return resultA: The bug is in the
elsebranch. Whena[i] >= b[j], the code appendsb[j]but incrementsiinstead ofj. The wrong pointer advances, later iterations re-readb[j], and the function eventually walks off the end ofa. Fix: changei += 1in theelsebranch toj += 1.
Quick stats
| File | qwen3.6-27B-aaardpark-uniform-Q3_K.gguf |
| Size | 11 GB |
| Format | GGUF, uniform Q3_K (497 ร Q3_K + 1 ร Q6_K, 3.59 BPW) |
| Min VRAM | 16 GB |
| Throughput | ~30 tok/s on Apple M-series |
| Native context | 262K (Gated Delta Net linear attention) |
Run it
huggingface-cli download aaardpark/Qwen3.6-27B-GGUF \
qwen3.6-27B-aaardpark-uniform-Q3_K.gguf --local-dir .
llama-cli -m qwen3.6-27B-aaardpark-uniform-Q3_K.gguf -ngl 99 -c 32768
Qwen 3.6 is a thinking model โ it emits a <think>โฆ</think> block before the final answer. Budget at least 2048 tokens (4096 for hard reasoning), or set enable_thinking=False if your runtime supports it. Needs llama.cpp build 8670 or later.
More from aaardpark
- Qwen 3.5 27B GGUF โ 11 GB, 96% GSM8K
- gemma-4-31B-it โ 15.3 GB, 96% GSM8K
- Qwen 2.5 72B Instruct โ 35 GB, 88% GSM8K
- Qwen 2.5 32B Instruct โ 15 GB
- Downloads last month
- 5
Model tree for aaardpark/Qwen3.6-27B-GGUF
Base model
Qwen/Qwen3.6-27B
Install (macOS, Linux)
# Start a local OpenAI-compatible server with a web UI: llama serve -hf aaardpark/Qwen3.6-27B-GGUF# Run inference directly in the terminal: llama cli -hf aaardpark/Qwen3.6-27B-GGUF