---
license: apache-2.0
base_model: google/gemma-4-26B-A4B-it
base_model_relation: finetune
tags:
  - gemma4
  - gemma
  - google
  - mlx
  - apple-silicon
  - moe
  - mixture-of-experts
  - zero-refusals
  - prism-dq
  - dynamic-quantization
  - multimodal
  - vision
  - video-text-to-text
  - image-text-to-text
  - abliterated
  - text-generation
language:
  - en
pipeline_tag: image-text-to-text
library_name: mlx
quantized_by: Ex0bit
---
[![Parameters](https://img.shields.io/badge/Parameters-26B_A4B_MoE-blue)]()
[![Format](https://img.shields.io/badge/Format-MLX-green)]()
[![Quant](https://img.shields.io/badge/Quant-PRISM_Dynamic_(6.52_BPW)-yellow)]()
[![Multimodal](https://img.shields.io/badge/Multimodal-Vision%20%2B%20Video%20%2B%20Text-purple)]()


<div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/onzfqZmEuOqedGOcyedHc.png" width="800">
</div>

# MYTHOS-26B-A4B — PRISM Dynamic Quantization (MLX)

**Gemma 4 26B-A4B MoE PRISM-PRO-Dynamic-Quant for Apple Silicon**

- **PRISM-PRO**: Production model with over-refusal and bias mechanisms removed using the state-of-the-art **PRISM pipeline**.
- **DQ**: Per-tensor-class mixed-precision allocation derived entirely from weight structure sensitivity analysis, not from closed, gated datasets.

Created by [Ex0bit](https://hf.co/Ex0bit)

---

<div align="center">

### 💡 Support my research and development efforts. Members receive access to the latest PRISM-PRO model drops on day 0

[![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20PRISM-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/Ex0bit)

</div>

---

## Model Details

| Property | Value |
|----------|-------|
| Base Model | google/gemma-4-26B-A4B-it |
| Architecture | Gemma 4 MoE (128 experts, top-8 routing) |
| Parameters | 26B total / 4B active per token |
| Quantization | PRISM-PRO-DYNAMIC-QUANT (MLX native) |
| Achieved BPW | 6.52 |
| File Size | ~20 GB |
| Context Length | 262,144 tokens |
| Modalities | Text, Image, Video |
| Runtime | mlx-vlm (Apple Silicon Metal) |
| Creator | [Ex0bit](https://hf.co/Ex0bit) |
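
As a rough sanity check on the table, the file size can be estimated from the parameter count and the achieved bits per weight. The sketch below assumes the nominal 26B figure is the number of quantized weights, and it ignores file metadata and any tensors stored at other precisions. `estimate_size_bytes` is a helper name introduced here for illustration.

```python
def estimate_size_bytes(num_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size: parameters x bits per weight, converted to bytes."""
    return num_params * bits_per_weight / 8


# Assumption: 26e9 is the nominal parameter count from the table above.
size = estimate_size_bytes(26e9, 6.52)
print(f"{size / 1e9:.1f} GB (decimal), {size / 2**30:.1f} GiB (binary)")
# prints "21.2 GB (decimal), 19.7 GiB (binary)"
```

Both figures are in line with the ~20 GB listed above.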

## Supported Modalities

- **Text**: Full instruction-following and chat
- **Image**: Vision understanding via SigLIP encoder (280 soft tokens per image)
- **Video**: Gemma4VideoProcessor (32 frames, pooled)

> Note: This 26B MoE variant does not include audio support. For audio, see the 31B dense variant.

## PRISM-DQ Quantization

This MLX model uses **PRISM-PRO Dynamic Quantization**: a mixed-precision scheme that assigns a quantization type to each tensor class based on the sensitivity of its weight structure.

Unlike uniform quantization (Q4, Q6, Q8), PRISM-DQ analyzes each tensor class's sensitivity and allocates precision where it matters most. Attention projections receive higher precision than FFN layers, with block-level overrides that protect critical layers.
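
To illustrate how a mixed allocation produces a fractional average such as 6.52 BPW, the sketch below computes a parameter-weighted mean over tensor classes. The class names, parameter shares, and bit widths are made-up placeholders, not the actual PRISM-DQ allocation. It also assumes MLX-style affine group quantization, in which each group of weights carries a 16-bit scale and a 16-bit bias.

```python
from dataclasses import dataclass


@dataclass
class TensorClass:
    name: str             # hypothetical class label
    share: float          # fraction of all quantized weights in this class
    bits: int             # quantized bit width per weight
    group_size: int = 64  # number of weights sharing one scale and one bias


def effective_bpw(tc: TensorClass) -> float:
    """Bits per weight including per-group overhead (16-bit scale + 16-bit bias)."""
    return tc.bits + 32 / tc.group_size


def average_bpw(classes: list[TensorClass]) -> float:
    """Parameter-weighted mean of the effective bits per weight."""
    total = sum(tc.share for tc in classes)
    return sum(tc.share * effective_bpw(tc) for tc in classes) / total


# Placeholder allocation, for illustration only.
allocation = [
    TensorClass("attention", share=0.10, bits=8),
    TensorClass("experts_ffn", share=0.85, bits=6),
    TensorClass("embeddings", share=0.05, bits=8),
]
print(f"{average_bpw(allocation):.2f} BPW")
```

Under these assumptions, a uniform 6-bit allocation with a group size of 64 already costs 6.5 bits per weight, and every class promoted to a higher bit width raises the average in proportion to its share of the weights.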

The model's `config.json` contains per-tensor quantization overrides that mlx-vlm loads natively, so no custom runtime is required. The compiled Metal kernels handle the mixed-precision tensors automatically within a single forward pass on the GPU.
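
The overrides can be inspected directly. The sketch below tallies bit widths from a config dictionary, assuming the layout commonly used by MLX mixed-precision checkpoints: a top-level `quantization` object holding default `bits` and `group_size` values, plus entries keyed by module path whose values are dictionaries with their own `bits`. That layout and the sample module paths are assumptions, so check this repository's `config.json` for the actual keys.

```python
import json
from collections import Counter


def summarize_overrides(config: dict) -> tuple[dict, Counter]:
    """Return the default quantization settings and a count of overrides per bit width."""
    quant = config.get("quantization", {})
    # Scalar entries (e.g. bits, group_size) are treated as defaults.
    defaults = {k: v for k, v in quant.items() if not isinstance(v, dict)}
    # Dict-valued entries are treated as per-module overrides.
    counts = Counter(
        v["bits"] for v in quant.values() if isinstance(v, dict) and "bits" in v
    )
    return defaults, counts


# Hypothetical sample; real module paths will differ.
sample = json.loads("""
{
  "quantization": {
    "group_size": 64,
    "bits": 6,
    "model.layers.0.self_attn.q_proj": {"group_size": 64, "bits": 8},
    "model.layers.0.mlp.down_proj": {"group_size": 64, "bits": 6}
  }
}
""")
defaults, counts = summarize_overrides(sample)
print(defaults)
print(dict(counts))
```

To run it against the downloaded model, load the real file with `json.load` in place of `sample`.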


## Usage

### mlx-vlm (CLI)
```bash
pip install mlx-vlm

# Interactive chat
mlx_vlm.chat --model Ex0bit/MYTHOS-26B-A4B-PRISM-PRO-DQ-MLX \
  --temperature 0.7 --max-tokens 2048 --max-kv-size 8192

# Vision prompt
python -m mlx_vlm.generate \
  --model Ex0bit/MYTHOS-26B-A4B-PRISM-PRO-DQ-MLX \
  --image path/to/image.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 500
```

### Python API
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

# Download (on first use) and load the quantized weights and the processor
model, processor = load("Ex0bit/MYTHOS-26B-A4B-PRISM-PRO-DQ-MLX")
config = model.config

# Build a chat-formatted prompt with a placeholder for one image
prompt = apply_chat_template(
    processor, config,
    "Describe this scene.",
    num_images=1
)

# Generate a response conditioned on the image
response = generate(
    model, processor, prompt,
    image=["path/to/image.jpg"],
    max_tokens=500, temperature=0.7
)
print(response)
```

## Refusal & Bias Removal

This model has been treated with the state-of-the-art PRISM pipeline to remove the bias, over-refusals, and propaganda present in the base model, google/gemma-4-26B-A4B-it.

## License

Apache 2.0 (inherited from google/gemma-4-26B-A4B-it)

## Credits

- Creator: [Ex0bit](https://hf.co/Ex0bit)
- Base model: [Google DeepMind](https://deepmind.google/models/gemma/gemma-4/)
- Quantization engine: PRISM-DQ by [Ex0bit](https://hf.co/Ex0bit)