Qwen3.5-122B-A10B-Opus-Reasoning-GGUF
The first Claude Opus-distilled reasoning fine-tune of Qwen3.5-122B at full scale. Enhanced multi-step reasoning, analytical depth, and uncensored output — trained where competitors can't reach.
122B total parameters, 10B active per token (Mixture-of-Experts). LoRA fine-tuned on 12,840 Claude Opus 4.6 reasoning traces. 7 quantization levels from Q2_K to BF16.
⚡ Forged on 8×H200 SXM5 | 1.1TB VRAM
Why This Model
| | Base Qwen3.5-122B | Jackrong (27B) | TIMTEH (this) |
|---|---|---|---|
| Scale | 122B/10B active | 27B dense | 122B/10B active |
| Training data | Base alignment | Opus distillation | Opus distillation |
| Reasoning quality | Standard | Enhanced (small scale) | Enhanced (full MoE scale) |
| Uncensored | ❌ | ✅ | ✅ |
| Hardware required to train | Any | Consumer GPU | 8×H200 (1.1TB VRAM) |
Nobody else has fine-tuned Qwen3.5-122B on Opus reasoning data. Jackrong stopped at 27B because they don't have the hardware. We do.
Quantizations
| Quant | File | Size | BPW | RAM Required | Use Case |
|---|---|---|---|---|---|
| BF16 | ...-BF16.gguf | 228 GB | 16.0 | ~235 GB | Maximum quality, reference |
| Q8_0 | ...-Q8_0.gguf | 121 GB | 8.5 | ~125 GB | Near-lossless, high-VRAM setups |
| Q6_K | ...-Q6_K.gguf | 94 GB | 6.6 | ~98 GB | Excellent quality |
| Q5_K_M | ...-Q5_K_M.gguf | 81 GB | 5.7 | ~85 GB | Great balance |
| Q4_K_M | ...-Q4_K_M.gguf | 70 GB | 4.9 | ~74 GB | ⭐ Recommended: best quality/size |
| Q3_K_M | ...-Q3_K_M.gguf | 55 GB | 3.9 | ~58 GB | Fits 2×48GB GPUs |
| Q2_K | ...-Q2_K.gguf | 42 GB | 2.9 | ~45 GB | Single 48GB GPU |
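The BPW column can be roughly reproduced from the file sizes and the total parameter count. The sketch below assumes the "GB" figures are binary gigabytes (GiB); under that assumption the results land within about 0.1 of the table values. The `largest_quant_that_fits` helper and its flat `overhead_gib` allowance are hypothetical, loosely modelled on the gap between the Size and RAM Required columns.

```python
# Assumption: the "GB" sizes in the table are binary gigabytes (GiB).
GIB = 2**30
TOTAL_PARAMS = 122.1e9  # total parameter count stated in the model card

# File sizes copied from the quantization table.
QUANT_SIZES_GIB = {
    "BF16": 228, "Q8_0": 121, "Q6_K": 94, "Q5_K_M": 81,
    "Q4_K_M": 70, "Q3_K_M": 55, "Q2_K": 42,
}

def bits_per_weight(size_gib, params=TOTAL_PARAMS):
    """Average number of bits stored per parameter for a file of the given size."""
    return size_gib * GIB * 8 / params

def largest_quant_that_fits(budget_gib, overhead_gib=4):
    """Pick the largest quant whose file plus a flat overhead fits the memory budget.

    overhead_gib is an illustrative allowance for KV cache and buffers,
    not a measured value.
    """
    fitting = {q: s for q, s in QUANT_SIZES_GIB.items() if s + overhead_gib <= budget_gib}
    return max(fitting, key=fitting.get) if fitting else None

for name, size in QUANT_SIZES_GIB.items():
    print(f"{name:7s} {bits_per_weight(size):5.2f} bits/weight")
```

For example, `largest_quant_that_fits(96)` selects Q5_K_M under these assumptions. Real memory use also depends on context length and backend, so treat the result as a starting point.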
Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-122B-A10B |
| Method | LoRA (r=64, alpha=128, dropout=0.05) |
| Trainable Parameters | 66.8M / 122.1B (0.05%) |
| Training Samples | 12,840 |
| Epochs | 2 |
| Steps | 1,266 |
| Final Avg Loss | 0.1502 |
| Training Time | 6 hours 34 minutes |
| Hardware | 8× NVIDIA H200 SXM5 (141GB HBM3e each, NVLink 478 GB/s) |
| Precision | BF16 (full, no quantized training) |
| Effective Batch Size | 64 |
| Learning Rate | Cosine schedule, peak 2e-4 |
| Max Sequence Length | 4096 |
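A few of the numbers in this table can be related with simple arithmetic. The sketch below computes the standard LoRA scaling factor (alpha / r), the trainable-parameter percentage, and a plain cosine learning-rate curve. The card does not state a warmup phase or a minimum learning rate, so the schedule here assumes no warmup and decay to zero; the function names are illustrative.

```python
import math

# Values copied from the training table.
LORA_R, LORA_ALPHA = 64, 128
PEAK_LR, TOTAL_STEPS = 2e-4, 1266
TRAINABLE_PARAMS, TOTAL_PARAMS = 66.8e6, 122.1e9

def lora_scaling(r=LORA_R, alpha=LORA_ALPHA):
    """Factor applied to the low-rank update in standard LoRA: alpha / r."""
    return alpha / r

def trainable_percent(trainable=TRAINABLE_PARAMS, total=TOTAL_PARAMS):
    """Share of parameters updated during fine-tuning, in percent."""
    return 100 * trainable / total

def cosine_lr(step, peak=PEAK_LR, total=TOTAL_STEPS, min_lr=0.0):
    """Cosine decay from peak to min_lr. Assumes no warmup (not stated in the card)."""
    progress = min(max(step / total, 0.0), 1.0)
    return min_lr + 0.5 * (peak - min_lr) * (1 + math.cos(math.pi * progress))

print(f"LoRA scaling: {lora_scaling():.1f}")
print(f"Trainable: {trainable_percent():.3f}%")
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:4d}: lr = {cosine_lr(step):.2e}")
```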
Training Datasets
| Dataset | Samples | Source |
|---|---|---|
| opus-10000x | 9,633 | Claude Opus 4.6 reasoning traces (10K filtered) |
| opus-3000x | 2,326 | Claude Opus 4.6 reasoning traces (3K filtered) |
| reasoning-700x | 633 | Qwen3.5 reasoning samples |
| high-reasoning-250x | 250 | High-quality Opus reasoning (curated) |
Architecture
- Type: Qwen3_5MoeForCausalLM (Mixture-of-Experts)
- Total Parameters: 122.1B
- Active Parameters: ~10B per token
- Hidden Size: 3,072
- Layers: 48
- Attention Heads: 32 (GQA)
- Experts: 256 routed + shared expert, 10 active per token
- Context Length: 131,072 tokens (default), extensible to 262K
- Vocab Size: 248,320
- Thinking Mode: Supports `<think>` tags for explicit chain-of-thought
- License: Apache 2.0
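To illustrate how only a fraction of a Mixture-of-Experts model is active per token, here is a toy top-k router using the expert counts from the list above. It is a generic gating sketch (pick the k highest router scores, then softmax over those k), not a confirmed description of how Qwen3.5 routes tokens; the random logits stand in for a real router's output and the shared expert is omitted.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts, from the architecture list
TOP_K = 10         # active experts per token, as stated in the list

def route_token(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over only the selected logits, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # placeholder scores
selected = route_token(logits)
print(f"{len(selected)} of {NUM_EXPERTS} routed experts used for this token")
```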
Usage
llama.cpp
# Recommended: Q4_K_M
./llama-cli -m Qwen3.5-122B-A10B-Opus-Reasoning-Q4_K_M.gguf \
-p "Analyze the following problem step by step:" \
-n 2048 --temp 0.7 --top-p 0.9
# Server mode
./llama-server -m Qwen3.5-122B-A10B-Opus-Reasoning-Q4_K_M.gguf \
--port 8080 --host 0.0.0.0 -c 65536
Ollama
ollama run timteh673/Qwen3.5-122B-A10B-Opus-Reasoning
LM Studio
Download the GGUF file and load it in LM Studio. Thinking and non-thinking modes are switched via `enable_thinking` in the chat template.
Open WebUI / SillyTavern
Point your backend to a llama.cpp server running any quant. Full OpenAI-compatible API at /v1/chat/completions.
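As a rough illustration of calling that endpoint from code, the sketch below builds and sends a chat request using only the Python standard library. It assumes the server from the llama.cpp section is running locally on port 8080; the helper names `build_chat_request` and `chat` are illustrative, and the default sampling values mirror the CLI example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumes llama-server was started with --port 8080

def build_chat_request(prompt, base_url=BASE_URL, temperature=0.7,
                       top_p=0.9, max_tokens=2048):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt, **kwargs):
    """Send the request and return the assistant text. Needs a running server."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat("Analyze the following problem step by step: ..."))` would print the model's reply.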
Recommended Settings
| Setting | Value | Notes |
|---|---|---|
| Temperature | 0.6–0.7 | Reasoning tasks |
| Temperature | 0.8–1.0 | Creative tasks |
| Top-P | 0.9 | |
| Min-P | 0.05 | Good alternative to Top-P |
| Context | 32K+ | Supports up to 131K |
| Thinking | Enabled | Use enable_thinking=True for best results |
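These settings can be captured as a small helper that returns request parameters per task. In this sketch the temperature is clamped to the range given in the table, and Min-P is set instead of Top-P when requested, since the table lists it as an alternative. Passing `enable_thinking` through a `chat_template_kwargs` field is an assumption about the backend's request format and may need adjusting for your server.

```python
# Temperature ranges and sampler values copied from the settings table.
TEMPERATURE_RANGES = {"reasoning": (0.6, 0.7), "creative": (0.8, 1.0)}
TOP_P = 0.9
MIN_P = 0.05

def sampling_params(task, temperature=None, use_min_p=False, thinking=True):
    """Return sampling parameters for 'reasoning' or 'creative' tasks."""
    low, high = TEMPERATURE_RANGES[task]
    if temperature is None:
        temperature = low
    # Keep the temperature inside the recommended range for this task.
    params = {"temperature": min(max(temperature, low), high)}
    # Min-P is listed as an alternative to Top-P, so only one of them is set.
    if use_min_p:
        params["min_p"] = MIN_P
    else:
        params["top_p"] = TOP_P
    # Assumption: the backend forwards chat_template_kwargs to the chat template.
    params["chat_template_kwargs"] = {"enable_thinking": thinking}
    return params

print(sampling_params("reasoning"))
print(sampling_params("creative", temperature=0.9, use_min_p=True))
```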
What's Different From Base
- Enhanced reasoning chains: trained on 12,840 Opus-quality multi-step analytical traces
- Better instruction following: deeper engagement with complex prompts
- Uncensored: no refusal training, responds to all prompts
- MoE efficiency: only 10B params active per token despite 122B total
- Thinking mode: native `<think>` tag support for explicit chain-of-thought
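When thinking mode is on, the reasoning typically arrives wrapped in `<think>` tags ahead of the final answer. A minimal sketch for separating the two in client code follows; it assumes at most one well-formed `<think>...</think>` block per completion, and the function name is illustrative.

```python
import re

# Non-greedy match so only the first <think> block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text):
    """Return (reasoning, answer) from a completion that may contain a <think> block."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_thinking("<think>2 + 2 = 4</think>\nThe answer is 4.")
print("reasoning:", reasoning)
print("answer:", answer)
```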
Pipeline
Qwen3.5-122B-A10B (base)
→ LoRA fine-tune (r=64, 12,840 Opus traces, 8×H200, 6.5h)
→ Merge adapter into base weights
→ Convert to BF16 GGUF (llama.cpp, 879 tensors)
→ Quantize: Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, Q2_K
All steps executed natively in BF16 — no quantized training, no optimization hacks. When you have 1.1TB VRAM, you use it.
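The last two pipeline stages can be sketched as a list of llama.cpp commands. This is an illustration rather than the exact commands used for this release: the `merged-model` and `llama.cpp` paths are hypothetical, and it assumes the merged Hugging Face checkpoint already exists on disk. The script only prints the commands; it does not run them.

```python
import shlex

# Quant types from the pipeline description; BF16 is the conversion output.
QUANT_TYPES = ["Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q3_K_M", "Q2_K"]
STEM = "Qwen3.5-122B-A10B-Opus-Reasoning"

def conversion_commands(merged_dir="merged-model", llama_cpp_dir="llama.cpp"):
    """Build the convert and quantize commands. Both paths are hypothetical."""
    bf16_file = f"{STEM}-BF16.gguf"
    commands = [[
        "python", f"{llama_cpp_dir}/convert_hf_to_gguf.py", merged_dir,
        "--outfile", bf16_file, "--outtype", "bf16",
    ]]
    for quant in QUANT_TYPES:
        commands.append([
            f"{llama_cpp_dir}/build/bin/llama-quantize",
            bf16_file, f"{STEM}-{quant}.gguf", quant,
        ])
    return commands

for command in conversion_commands():
    print(shlex.join(command))
```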
Model Provenance
- Base: Qwen/Qwen3.5-122B-A10B (Apache 2.0)
- Training Framework: transformers + PEFT + TRL (raw, no wrappers)
- Quantization: llama.cpp (build 8c60b8a)
- Hardware: 8×NVIDIA H200 SXM5 (IBM Cloud, 1.1TB VRAM total)
Also From TIMTEH
| Model | Status | Description |
|---|---|---|
| Qwen3.5-397B-A17B-Uncensored-GGUF | ✅ Live | Abliterated 397B MoE — 7 quants |
| Mistral-Small-4-119B-Uncensored-GGUF | ✅ Live | First TIMTEH release — 7 quants |
| Nemotron-3-Super-120B-A12B-Uncensored-GGUF | ✅ Live | Benchmarked — 7 quants |
| Qwen3.5-397B Opus-Reasoning | 🔥 Training | Stage 2 fine-tune (same technique, 397B scale) |
⚠️ Disclaimer
This model has been fine-tuned on uncensored reasoning data. It may generate content that is harmful, offensive, or inappropriate. Users are solely responsible for ensuring their use complies with applicable laws and ethical standards. Intended for research, testing, and controlled environments.
☕ Support This Work
Running 8×H200 GPUs isn't free. Every donation directly funds more open-weight model releases, better abliteration techniques, and pushing the frontier of what's possible with open models.
💎 Crypto Donations
| Currency | Address |
|---|---|
| BTC | bc1p4q7vpwucvww2y3x4nhps4y4vekye8uwm9re5a0kx8l6u5nky5ucszm2qhh |
| ETH | 0xe5Aa16E53b141D42458ABeEDb00a157c3Fea2108 |
| SOL | 9CXwjG1mm9uLkxRevdMQiF61cr6TNHSiWtFRHmUEgzkG |
🏢 Enterprise & Custom Models
Need a custom 120B+ model aligned to your proprietary data? TIMTEH provides bespoke enterprise fine-tuning, abliteration, and deployment on 8×H200 SXM5.
- Custom fine-tuning on your data (up to 400B+ parameters)
- Private CARE abliteration (Phase 2 technique)
- Deployment architecture consulting (tensor parallelism, speculative decoding)
- Bespoke distillation datasets
📧 Contact: tim@timlex.co
Part of the TIMTEH Cognitive Preservation Foundry: surgical capability preservation at scale.