Instructions to use Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF",
	filename="Dynamic/Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

Local Apps

llama.cpp

How to use Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF
# Run inference directly in the terminal:
llama-cli -hf Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF
# Run inference directly in the terminal:
llama-cli -hf Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF
# Run inference directly in the terminal:
./llama-cli -hf Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

Use Docker

docker model run hf.co/Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

vLLM

How to use Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

Ollama

How to use Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF with Ollama:

```
ollama run hf.co/Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF
```

Unsloth Studio

How to use Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF to start chatting

Pi

How to use Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF"
        }
      ]
    }
  }
}
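
If `~/.pi/agent/models.json` already lists other providers, pasting the snippet above over the file would drop them. The following is a minimal sketch of merging the entry instead. The helper names `merge_provider` and `update_models_file` are illustrative and not part of Pi, and the sketch assumes the file holds a single JSON object with a top-level `providers` key, as in the snippet above.

```python
import json
from pathlib import Path

# Provider entry copied from the configuration shown above.
PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF"}],
}


def merge_provider(config, name="llama-cpp", provider=PROVIDER):
    """Return a copy of config with the provider entry added or replaced."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = provider
    merged["providers"] = providers
    return merged


def update_models_file(path=Path.home() / ".pi" / "agent" / "models.json"):
    """Read the existing file (if any), merge the provider, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(existing), indent=2) + "\n")
```

Calling `update_models_file()` once after installing Pi should leave any existing providers untouched and add (or overwrite) only the `llama-cpp` entry.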

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

Run Hermes

hermes

Docker Model Runner

How to use Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF with Docker Model Runner:

```
docker model run hf.co/Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF
```

Lemonade

How to use Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

Run and chat with the model

lemonade run user.Qwen3.5-122B-A10B-PRISM-PRO-GGUF-{{QUANT_TAG}}

List all available models

lemonade list

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Get PRISM-PRO Models on Day-0 & Support Our Research & Development efforts

PRISM-LITE Version | PRISM VIP Memberships | Ko-fi | Direct Model Purchase

Qwen3.5-122B-A10B-PRISM-PRO-GGUF

GGUF quantized versions of Qwen3.5-122B-A10B-PRISM-PRO -- an unrestricted PRISM Production model with over-refusal and bias mechanisms removed using our state-of-the-art PRISM pipeline (Projected Refusal Isolation via Subspace Modification).

If you find PRISM models useful, please consider supporting development.

Available Quantizations

Quantization	Size	BPW	Description
Dynamic	57.7 GB	4.06	PRISM Dynamic -- forensic per-block quantization with 5-tier `ffn_down_exps` allocation

PRISM Dynamic Quantization

This is not a standard uniform quantization. PRISM Dynamic assigns a quantization type to each tensor block individually, based on a forensic per-block analysis that scores each block's sensitivity by KL divergence (KLD):

Critical blocks (convergence + exit layers): Q6_K (6.6 BPW)
High-impact blocks (entry zone): Q5_K_M (5.5 BPW)
Standard blocks (bulk processing): Q4_K_M (4.8 BPW)
Low-sensitivity blocks: IQ4_XS (4.25 BPW)
Cold blocks (lowest sensitivity): IQ3_XXS (3.06 BPW)

All attention tensors are preserved at Q8_0. All norms and routing weights are kept at F32. The importance matrix (imatrix) used for the quantization types that depend on it is included as imatrix.dat.
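
The relationship between bits per weight (BPW) and file size can be checked with a little arithmetic. The sketch below is only a rough estimate: it assumes "122B" means exactly 122e9 weights, ignores GGUF metadata, and reports sizes in GiB. The per-tier figures are the ones listed above; the function name `estimate_size_gib` is illustrative.

```python
GIB = 2**30  # bytes per GiB

# Bits-per-weight figures for each tier, as listed above.
TIER_BPW = {
    "Q6_K": 6.6,
    "Q5_K_M": 5.5,
    "Q4_K_M": 4.8,
    "IQ4_XS": 4.25,
    "IQ3_XXS": 3.06,
}


def estimate_size_gib(n_params, bpw):
    """Approximate weight storage in GiB for n_params weights at bpw bits each."""
    return n_params * bpw / 8 / GIB


N_PARAMS = 122e9  # assumption: "122B" taken as exactly 122e9 weights

# Overall Dynamic quant at its reported average of 4.06 BPW.
dynamic_gib = estimate_size_gib(N_PARAMS, 4.06)
print(f"Dynamic @ 4.06 BPW: {dynamic_gib:.1f} GiB")

# For comparison: the size if every weight used a single tier's type.
for name, bpw in TIER_BPW.items():
    size = estimate_size_gib(N_PARAMS, bpw)
    print(f"uniform {name:8s} @ {bpw:4.2f} BPW: {size:.1f} GiB")
```

Under these assumptions the first line prints about 57.7 GiB, which lines up with the listed size if that figure is read as GiB. The uniform rows only bracket the mix: the actual per-tier proportions are not stated here, and the Q8_0 attention tensors and F32 norms also contribute to the average.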

Included Files

Dynamic/
  Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf   -- Dynamic quant (57.7 GB)
  mmproj-Qwen3.5-122B-A10B-PRISM-PRO.gguf     -- Vision encoder (871 MB)
  imatrix.dat                                   -- Importance matrix (342 MB)
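
The files listed above can also be fetched programmatically. This is a minimal sketch using `hf_hub_download` from the `huggingface_hub` package; it assumes the package is installed, that you have accepted the repository's access conditions, and that a Hugging Face token is configured locally. The helper names `repo_paths` and `download_all` and the `models` target directory are illustrative.

```python
REPO_ID = "Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF"
SUBDIR = "Dynamic"

# Model weights and vision encoder, as listed above.
FILES = [
    "Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf",
    "mmproj-Qwen3.5-122B-A10B-PRISM-PRO.gguf",
]


def repo_paths(include_imatrix=False):
    """Return repo-relative paths of the files to download."""
    names = FILES + (["imatrix.dat"] if include_imatrix else [])
    return [f"{SUBDIR}/{name}" for name in names]


def download_all(local_dir="models", include_imatrix=False):
    """Download each file and return the list of local paths."""
    # Imported lazily so repo_paths() works without the package installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=REPO_ID, filename=path, local_dir=local_dir)
        for path in repo_paths(include_imatrix)
    ]
```

The importance matrix is skipped by default because it is only needed for re-quantization experiments, not for inference.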

Model Highlights

PRISM Ablation -- State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities.
122B Hybrid MoE Architecture -- 122 billion total parameters with 10 billion active per token across 256 routed experts + 1 shared expert per layer.
Hybrid Attention -- Novel GatedDeltaNet linear attention (36 layers) combined with full attention (12 layers) for efficient long-context processing.
Native Multimodal -- Vision encoder included as mmproj GGUF for seamless image and video understanding.
262K Full Context Window -- Native 262,144 token context length.
Dual Modes -- Supports both Thinking (deep reasoning) and Instant (direct response) modes.

Usage

llama.cpp (Recommended)

# Text-only inference
./llama-cli \
  -m Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf \
  -p "Hello! Tell me about quantum computing." \
  -n 2048 -ngl 999 --temp 0.7

# With vision (multimodal)
./llama-mtmd-cli \
  -m Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf \
  --mmproj mmproj-Qwen3.5-122B-A10B-PRISM-PRO.gguf \
  --image photo.jpg \
  -p "Describe this image in detail." \
  -n 2048 -ngl 999

# Server mode
./llama-server \
  -m Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf \
  --mmproj mmproj-Qwen3.5-122B-A10B-PRISM-PRO.gguf \
  -ngl 999 --port 8080
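
With the server running, it can be queried from Python through its OpenAI-compatible API. The following is a minimal standard-library sketch. It assumes the server listens on localhost port 8080, as in the command above, and serves `/v1/chat/completions`; the helper names `build_chat_request` and `chat` are illustrative, and the `model` value is an assumption, since a single-model server may ignore it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: matches --port 8080 above
MODEL_ID = "Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF"


def build_chat_request(prompt, temperature=0.7, max_tokens=2048):
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt to the local server and return the reply text."""
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `print(chat("Hello! Tell me about quantum computing."))` would send the same prompt as the text-only command above. The long timeout is a guess intended to leave room for slow generation on partially offloaded setups.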

koboldcpp

koboldcpp \
  --model Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf \
  --mmproj mmproj-Qwen3.5-122B-A10B-PRISM-PRO.gguf \
  --gpulayers 999 \
  --contextsize 8192

Ollama

# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.95
PARAMETER top_k 20
EOF

ollama create prism-pro -f Modelfile
ollama run prism-pro

Hardware Requirements

Setup	Memory Required	Notes
Dynamic (GPU only)	~60 GB	Fits on 1x A100 80GB or 1x H100 80GB
Dynamic (GPU + CPU offload)	48+ GB GPU + RAM	Offload some layers to CPU
Dynamic (CPU only)	64+ GB RAM	Slower but functional
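
For the GPU + CPU offload row, a starting value for `-ngl` can be estimated from the available VRAM. The sketch below is a back-of-the-envelope heuristic, not a measured rule: it assumes all 48 layers (36 linear-attention plus 12 full-attention, per Model Highlights) are the same size, which a per-block dynamic quant does not guarantee, and it sets aside a hypothetical 4 GiB for KV cache, the vision encoder, and buffers. The function name `suggest_gpu_layers` is illustrative.

```python
import math

MODEL_GIB = 57.7  # Dynamic quant size from the Available Quantizations table
N_LAYERS = 48     # 36 linear-attention + 12 full-attention layers


def suggest_gpu_layers(vram_gib, reserve_gib=4.0):
    """Rough -ngl value: how many equally sized layers fit in the VRAM left
    after reserving space for KV cache, vision encoder, and buffers."""
    per_layer = MODEL_GIB / N_LAYERS          # assumes uniform layer size
    usable = max(vram_gib - reserve_gib, 0.0)  # never go below zero
    return min(N_LAYERS, math.floor(usable / per_layer))
```

Treat the result as a first guess and lower it if the model fails to load, since longer contexts need a larger reserve.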

Benchmarks

Benchmark	Qwen3.5-122B-A10B	GPT-5-mini	Qwen3-235B-A22B
MMLU-Pro	86.7	83.7	84.4
MMLU-Redux	94.0	93.7	93.8
GPQA Diamond	86.6	82.8	81.1
HMMT Feb 25	91.4	89.2	85.1
SWE-bench Verified	72.0	72.0	--
LiveCodeBench v6	78.9	80.5	75.1
MMMU	83.9	79.0	80.6
VideoMME (w/ sub)	87.3	83.5	83.8

Note: Benchmark results are from the base Qwen3.5-122B-A10B model.

License

Based on Qwen3.5-122B-A10B by the Qwen Team (Alibaba Group). Licensed under Apache 2.0.

Acknowledgments

Based on Qwen3.5-122B-A10B by the Qwen Team. GGUF conversion and quantization by Ex0bit. See the Qwen3.5 blog post for architecture details.

Citation

@misc{qwen35prismpro_gguf,
    title  = {Qwen3.5-122B-A10B-PRISM-PRO-GGUF},
    author = {Ex0bit},
    month  = {February},
    year   = {2026}
}

Downloads last month: 11

Format: GGUF
Model size: 122B params
Architecture: qwen35moe

Model tree for Ex0bit/Qwen3.5-122B-A10B-PRISM-PRO-GGUF

Base model

Qwen/Qwen3.5-122B-A10B
