Instructions to use timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF with llama-cpp-python:

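# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF from the Hub (cached after the first call) and load it.
llm = Llama.from_pretrained(
    repo_id="timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF",
    filename="Qwen3.5-122B-A10B-Opus-Reasoning-BF16.gguf",
)

# Run one chat turn and print the assistant's reply.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
print(response["choices"][0]["message"]["content"])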

Local Apps

llama.cpp

How to use timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M

Use Docker

docker model run hf.co/timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M

vLLM

How to use timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF with Ollama:
```
ollama run hf.co/timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M
```

Unsloth Studio

How to use timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF to start chatting

Pi

How to use timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF with Docker Model Runner:
```
docker model run hf.co/timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M
```

Lemonade

How to use timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3.5-122B-A10B-Opus-Reasoning-GGUF-Q4_K_M

List all available models

lemonade list

Qwen3.5-122B-A10B-Opus-Reasoning-GGUF

The first Claude Opus-distilled reasoning fine-tune of Qwen3.5-122B at full scale, with enhanced multi-step reasoning, analytical depth, and uncensored output.

122B total parameters, 10B active per token (Mixture-of-Experts). LoRA fine-tuned on 12,840 reasoning traces, most of them from Claude Opus 4.6. Available in 7 quantization levels from Q2_K to BF16.

⚡ Forged on 8×H200 SXM5 | 1.1TB VRAM

Why This Model

	Base Qwen3.5-122B	Jackrong (27B)	TIMTEH (this)
Scale	122B/10B active	27B dense	122B/10B active
Training data	Base alignment	Opus distillation	Opus distillation
Reasoning quality	Standard	Enhanced (small scale)	Enhanced (full MoE scale)
Uncensored	❌	✅	✅
Hardware required to train	Any	Consumer GPU	8×H200 (1.1TB VRAM)

Nobody else has fine-tuned Qwen3.5-122B on Opus reasoning data. Jackrong stopped at 27B because they don't have the hardware. We do.

Quantizations

Quant	File	Size	BPW	RAM Required	Use Case
BF16	`...-BF16.gguf`	228 GB	16.0	~235 GB	Maximum quality, reference
Q8_0	`...-Q8_0.gguf`	121 GB	8.5	~125 GB	Near-lossless, high-VRAM setups
Q6_K	`...-Q6_K.gguf`	94 GB	6.6	~98 GB	Excellent quality
Q5_K_M	`...-Q5_K_M.gguf`	81 GB	5.7	~85 GB	Great balance
Q4_K_M	`...-Q4_K_M.gguf`	70 GB	4.9	~74 GB	⭐ Recommended — best quality/size
Q3_K_M	`...-Q3_K_M.gguf`	55 GB	3.9	~58 GB	Fits 2×48GB GPUs
Q2_K	`...-Q2_K.gguf`	42 GB	2.9	~45 GB	Single 48GB GPU
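
The Size and BPW columns can be cross-checked with a little arithmetic. The sketch below assumes file size is roughly parameters × bits-per-weight / 8, and that the "GB" figures are binary gigabytes (GiB); that reading is inferred from the numbers, not stated in the card. The helper names are illustrative only.

```python
# Rows copied from the quantization table:
# (quant, file size in GB, bits per weight, RAM required in GB).
QUANTS = [
    ("BF16", 228, 16.0, 235),
    ("Q8_0", 121, 8.5, 125),
    ("Q6_K", 94, 6.6, 98),
    ("Q5_K_M", 81, 5.7, 85),
    ("Q4_K_M", 70, 4.9, 74),
    ("Q3_K_M", 55, 3.9, 58),
    ("Q2_K", 42, 2.9, 45),
]

TOTAL_PARAMS = 122.1e9  # total parameter count from the model card


def estimated_size_gib(bits_per_weight, params=TOTAL_PARAMS):
    """Approximate file size in GiB: params * bits / 8 bytes, divided by 2**30."""
    return params * bits_per_weight / 8 / 2**30


def largest_quant_that_fits(available_gb):
    """Return the highest-precision quant whose 'RAM Required' fits, or None."""
    for name, _size, _bpw, ram in QUANTS:  # ordered from largest to smallest
        if ram <= available_gb:
            return name
    return None


if __name__ == "__main__":
    for name, size, bpw, _ram in QUANTS:
        print(f"{name}: listed {size} GB, estimated {estimated_size_gib(bpw):.1f} GiB")
    print(largest_quant_that_fits(96))
```

Every estimate lands within about 1 GB of the listed size, so the table is internally consistent under that assumption. `largest_quant_that_fits` only compares against the RAM Required column; it does not account for context length or KV cache.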

Training Details

Parameter	Value
Base Model	Qwen/Qwen3.5-122B-A10B
Method	LoRA (r=64, alpha=128, dropout=0.05)
Trainable Parameters	66.8M / 122.1B (0.05%)
Training Samples	12,840
Epochs	2
Steps	1,266
Final Avg Loss	0.1502
Training Time	6 hours 34 minutes
Hardware	8× NVIDIA H200 SXM5 (141GB HBM3e each, NVLink 478 GB/s)
Precision	BF16 (full, no quantized training)
Effective Batch Size	64
Learning Rate	Cosine schedule, peak 2e-4
Max Sequence Length	4096
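
For readers who want to sanity-check the LoRA numbers, here is a minimal sketch of the arithmetic. It assumes the standard LoRA formulation (an adapter on a `d_out × d_in` weight adds `r × (d_in + d_out)` parameters and is scaled by `alpha / r`). The card does not list which modules were adapted, so the 3,072 × 3,072 projection below is purely hypothetical.

```python
R = 64        # LoRA rank from the table
ALPHA = 128   # LoRA alpha from the table


def lora_params(d_in, d_out, r=R):
    """Parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Standard LoRA scaling applied to the low-rank update.
scaling = ALPHA / R

# Trainable share of the full model, using the figures from the table.
trainable_fraction = 66.8e6 / 122.1e9

# Hypothetical example: one square projection at the model's hidden size.
example_adapter = lora_params(3072, 3072)

if __name__ == "__main__":
    print(f"scaling = {scaling}")
    print(f"trainable = {trainable_fraction:.4%}")
    print(f"one 3072x3072 adapter = {example_adapter:,} parameters")
```

The trainable fraction works out to roughly 0.055%, which matches the 0.05% in the table.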

Training Datasets

Dataset	Samples	Source
opus-10000x	9,633	Claude Opus 4.6 reasoning traces (10K filtered)
opus-3000x	2,326	Claude Opus 4.6 reasoning traces (3K filtered)
reasoning-700x	633	Qwen3.5 reasoning samples
high-reasoning-250x	250	High-quality Opus reasoning (curated)
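
The mix can be tallied directly from the table. This is only a sketch of the arithmetic; grouping the three Opus-sourced rows together is my reading of the Source column. Note that the rows sum to 12,842, two more than the 12,840 quoted under Training Details, and the card does not explain the difference.

```python
# Sample counts copied from the Training Datasets table.
DATASETS = {
    "opus-10000x": 9633,
    "opus-3000x": 2326,
    "reasoning-700x": 633,
    "high-reasoning-250x": 250,
}

# Rows whose Source column mentions Opus (an interpretation of the table).
OPUS_SOURCED = ("opus-10000x", "opus-3000x", "high-reasoning-250x")

total = sum(DATASETS.values())
shares = {name: count / total for name, count in DATASETS.items()}
opus_share = sum(DATASETS[name] for name in OPUS_SOURCED) / total

if __name__ == "__main__":
    print(f"total samples: {total}")
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
    print(f"Opus-sourced share: {opus_share:.1%}")
```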

Architecture

Type: Qwen3_5MoeForCausalLM (Mixture-of-Experts)
Total Parameters: 122.1B
Active Parameters: ~10B per token
Hidden Size: 3,072
Layers: 48
Attention Heads: 32 (GQA)
Experts: 256 routed + shared expert, 10 active per token
Context Length: 131,072 tokens (default), extensible to 262K
Vocab Size: 248,320
Thinking Mode: Supports <think> tags for explicit chain-of-thought
License: Apache 2.0

Usage

llama.cpp

# Recommended: Q4_K_M
./llama-cli -m Qwen3.5-122B-A10B-Opus-Reasoning-Q4_K_M.gguf \
  -p "Analyze the following problem step by step:" \
  -n 2048 --temp 0.7 --top-p 0.9

# Server mode
./llama-server -m Qwen3.5-122B-A10B-Opus-Reasoning-Q4_K_M.gguf \
  --port 8080 --host 0.0.0.0 -c 65536

Ollama

ollama run timteh673/Qwen3.5-122B-A10B-Opus-Reasoning

LM Studio

Download a GGUF file and load it in LM Studio. Thinking and non-thinking modes are toggled through the enable_thinking flag in the chat template.

Open WebUI / SillyTavern

Point the frontend at a llama.cpp server running any quant. The server exposes an OpenAI-compatible API at /v1/chat/completions.
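
Any HTTP client can talk to the same endpoint. The following is a minimal sketch using only the Python standard library; it assumes the server from the llama.cpp section is listening on `localhost:8080`, and the function names `build_chat_request` and `ask` are made up for this example.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8080", temperature=0.7):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ask("Analyze the following problem step by step: why is the sky blue?"))
```

Splitting request construction from sending keeps the payload easy to inspect or log before anything goes over the network.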

Recommended Settings

Setting	Value	Notes
Temperature	0.6–0.7	Reasoning tasks
Temperature	0.8–1.0	Creative tasks
Top-P	0.9
Min-P	0.05	Good alternative to Top-P
Context	32K+	Supports up to 131K
Thinking	Enabled	Use `enable_thinking=True` for best results
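
These settings can be kept as small presets in client code. The sketch below is one way to do that; the specific temperatures (0.6 and 0.9) are my picks from inside the recommended ranges, and treating Min-P as a replacement for Top-P (by setting `top_p` to 1.0) is an interpretation of "good alternative", not something the card spells out.

```python
# Temperatures chosen from within the recommended ranges (assumed picks).
TEMPERATURE = {
    "reasoning": 0.6,  # table range: 0.6-0.7
    "creative": 0.9,   # table range: 0.8-1.0
}


def sampling_kwargs(task="reasoning", use_min_p=False):
    """Return sampling settings for a task as a plain dict."""
    temperature = TEMPERATURE[task]  # raises KeyError for unknown tasks
    if use_min_p:
        # Min-P instead of Top-P: top_p=1.0 leaves nucleus sampling off.
        return {"temperature": temperature, "top_p": 1.0, "min_p": 0.05}
    # Top-P only: min_p=0.0 leaves Min-P filtering off.
    return {"temperature": temperature, "top_p": 0.9, "min_p": 0.0}


if __name__ == "__main__":
    print(sampling_kwargs("reasoning"))
    print(sampling_kwargs("creative", use_min_p=True))
```

The keys are meant to line up with the `temperature`, `top_p`, and `min_p` keyword arguments of `create_chat_completion` in llama-cpp-python, so the dict can be unpacked with `**sampling_kwargs(...)`; check the signature in your installed version before relying on that.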

What's Different From Base

Enhanced reasoning chains — trained on 12,840 Opus-quality multi-step analytical traces
Better instruction following — deeper engagement with complex prompts
Uncensored — no refusal training, responds to all prompts
MoE efficiency — only 10B params active per token despite 122B total
Thinking mode — native <think> tag support for explicit chain-of-thought
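
When thinking mode is on, the reasoning arrives inside `<think>` tags ahead of the final answer. Here is a minimal sketch for separating the two in client code, assuming the tags appear literally in the returned text; `split_thinking` is a hypothetical helper, not part of any library.

```python
import re

# Non-greedy match so several <think> blocks are handled one at a time.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (reasoning, answer) from text that may contain <think>...</think>."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = "<think>2 + 2 is 4.</think>\nThe answer is 4."
    reasoning, answer = split_thinking(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Text without any tags comes back unchanged as the answer, with an empty reasoning string.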

Pipeline

Qwen3.5-122B-A10B (base)
  → LoRA fine-tune (r=64, 12,840 Opus traces, 8×H200, 6.5h)
  → Merge adapter into base weights
  → Convert to BF16 GGUF (llama.cpp, 879 tensors)
  → Quantize: Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, Q2_K

Training, merging, and GGUF conversion all ran natively in BF16, with no quantized training and no optimization hacks; quantization is applied only to the finished BF16 GGUF. When you have 1.1TB VRAM, you use it.
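
The last two pipeline stages map onto llama.cpp's `convert_hf_to_gguf.py` script and `llama-quantize` tool. The sketch below only builds the command lines; it is not the author's actual script, and the directory and file names passed in are placeholders.

```python
# Quantization targets listed in the pipeline above.
QUANT_TYPES = ["Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q3_K_M", "Q2_K"]


def conversion_command(merged_dir, bf16_gguf):
    """Command to convert a merged Hugging Face checkpoint to a BF16 GGUF."""
    return [
        "python", "convert_hf_to_gguf.py", merged_dir,
        "--outfile", bf16_gguf,
        "--outtype", "bf16",
    ]


def quantize_commands(bf16_gguf, stem):
    """One llama-quantize command per target type, all reading the BF16 file."""
    return [
        ["llama-quantize", bf16_gguf, f"{stem}-{quant}.gguf", quant]
        for quant in QUANT_TYPES
    ]


if __name__ == "__main__":
    stem = "Qwen3.5-122B-A10B-Opus-Reasoning"
    bf16 = f"{stem}-BF16.gguf"
    print(" ".join(conversion_command("merged-model", bf16)))  # placeholder dir
    for cmd in quantize_commands(bf16, stem):
        print(" ".join(cmd))
```

To run the commands, pass each list to `subprocess.run(cmd, check=True)` from a llama.cpp checkout where the script and binary are on the path.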

Model Provenance

Base: Qwen/Qwen3.5-122B-A10B (Apache 2.0)
Training Framework: transformers + PEFT + TRL (raw, no wrappers)
Quantization: llama.cpp (build 8c60b8a)
Hardware: 8×NVIDIA H200 SXM5 (IBM Cloud, 1.1TB VRAM total)

Also From TIMTEH

Model	Status	Description
Qwen3.5-397B-A17B-Uncensored-GGUF	✅ Live	Abliterated 397B MoE — 7 quants
Mistral-Small-4-119B-Uncensored-GGUF	✅ Live	First TIMTEH release — 7 quants
Nemotron-3-Super-120B-A12B-Uncensored-GGUF	✅ Live	Benchmarked — 7 quants
Qwen3.5-397B Opus-Reasoning	🔥 Training	Stage 2 fine-tune (same technique, 397B scale)

⚠️ Disclaimer

This model has been fine-tuned on uncensored reasoning data. It may generate content that is harmful, offensive, or inappropriate. Users are solely responsible for ensuring their use complies with applicable laws and ethical standards. Intended for research, testing, and controlled environments.

☕ Support This Work

Running 8×H200 GPUs isn't free. Every donation directly funds more open-weight model releases, better abliteration techniques, and pushing the frontier of what's possible with open models.

💎 Crypto Donations

Currency	Address
BTC	`bc1p4q7vpwucvww2y3x4nhps4y4vekye8uwm9re5a0kx8l6u5nky5ucszm2qhh`
ETH	`0xe5Aa16E53b141D42458ABeEDb00a157c3Fea2108`
SOL	`9CXwjG1mm9uLkxRevdMQiF61cr6TNHSiWtFRHmUEgzkG`

🏢 Enterprise & Custom Models

Need a custom 120B+ model aligned to your proprietary data? TIMTEH provides bespoke enterprise fine-tuning, abliteration, and deployment on 8×H200 SXM5.

Custom fine-tuning on your data (up to 400B+ parameters)
Private CARE abliteration (Phase 2 technique)
Deployment architecture consulting (tensor parallelism, speculative decoding)
Bespoke distillation datasets

📧 Contact: tim@timlex.co

Part of the TIMTEH Cognitive Preservation Foundry: surgical capability preservation at scale.

Downloads last month: 1,678

Format: GGUF
Model size: 122B params
Architecture: qwen35moe
Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for timteh673/Qwen3.5-122B-A10B-Opus-Reasoning-GGUF

Base model: Qwen/Qwen3.5-122B-A10B
