---
license: gemma
base_model: google/gemma-4-26B-A4B-it
tags:
  - gguf
  - gemma4
  - uncensored
  - fast
  - llama.cpp
  - apple-silicon
  - conversational
  - korean
  - coding
  - tool-use
language:
  - en
  - ko
pipeline_tag: text-generation
---

# SuperGemma4-26B-Uncensored-Fast GGUF v2

The fast, uncensored `llama.cpp` build of the strongest `SuperGemma` text line.

This release is for people who want three things together:

- a model that feels less censored than stock chat releases
- a model that is more capable than the raw base on practical text workloads
- a compact local GGUF that still serves quickly on Apple Silicon

## Why this build

- Uncensored chat behavior without forcing every prompt into coding mode
- Tuned from the strongest `fast` line instead of the raw base
- Neutral chat template baked into the GGUF to reduce prompt-routing bugs
- Verified on Apple Silicon (Apple M4 Max) with clean general-chat and coding responses

## Headline numbers

- Base model: `google/gemma-4-26B-A4B-it`
- Format: `GGUF Q4_K_M`
- Prompt processing speed (general Korean prompt, Apple M4 Max): `222.0 tok/s`
- Generation speed (same run): `89.4 tok/s`
- Derived from the verified `SuperGemma Fast` MLX line
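
To put the two throughput figures in context, here is a rough back-of-the-envelope latency estimate. It is only a sketch: it assumes both rates stay constant for the whole request, which real runs will not match exactly (prompt processing speed in particular varies with the prompt). The function name and the example token counts are illustrative, not part of this release.

```python
# Rough latency estimate from the headline throughput numbers.
# Assumption: both rates are constant for the whole request.
PROMPT_TPS = 222.0  # prompt processing speed, tok/s (general Korean prompt)
GEN_TPS = 89.4      # generation speed, tok/s


def estimate_latency_s(prompt_tokens, gen_tokens,
                       prompt_tps=PROMPT_TPS, gen_tps=GEN_TPS):
    """Return estimated seconds to process the prompt and generate the reply."""
    return prompt_tokens / prompt_tps + gen_tokens / gen_tps


# Hypothetical request: 222 prompt tokens, 894 generated tokens.
print(f"{estimate_latency_s(222, 894):.1f} s")  # 11.0 s
```

Generation dominates for chat-sized prompts, so the `89.4 tok/s` figure is usually the more relevant one for perceived responsiveness.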

## Why this build is appealing

- Carries the stronger `Fast` weights instead of the plain stock base
- Keeps general chat natural instead of routing everything into coding mode
- Preserves the uncensored release identity while staying useful on normal prompts
- Gives you a practical `llama.cpp` deployment target without losing the personality of the tuned line

## Why it is better than stock

- Inherits the `Fast` line improvements over the original local baseline:
  - Quick bench overall: `95.8` vs `91.4`
  - Faster average generation on the MLX reference run: `46.2 tok/s` vs `42.5 tok/s`
  - Higher scores in code, logic, browser workflows, and Korean
- Ships with a neutral embedded template to avoid the older routing bug where simple questions drifted into coding/tool-call behavior
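
For readers who prefer relative figures, the comparison above can be worked out as follows. This is plain arithmetic on the numbers quoted in this card; the helper name is illustrative, and treating the quick bench score as a ratio scale is an assumption.

```python
# Relative improvement of the Fast line over the original local baseline,
# computed from the numbers quoted in this card.
def relative_gain_pct(new, old):
    """Percentage gain of `new` over `old`."""
    return (new - old) / old * 100.0


bench_gain = relative_gain_pct(95.8, 91.4)  # quick bench overall
speed_gain = relative_gain_pct(46.2, 42.5)  # MLX reference generation, tok/s

print(f"Quick bench: +{bench_gain:.1f}%")      # Quick bench: +4.8%
print(f"Generation speed: +{speed_gain:.1f}%")  # Generation speed: +8.7%
```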

## Included file

- `supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf`

## Quick local checks

Tested on Apple M4 Max with `llama.cpp`:

- General Korean prompt: `봄에 먹기 좋은 한식 반찬 5개 추천` ("Recommend 5 Korean side dishes that are good to eat in spring")
  - Prompt speed: `222.0 tok/s`
  - Generation speed: `89.4 tok/s`
  - Output stayed in normal Korean assistant mode
- Code prompt: `파이썬으로 피보나치 함수를 짧게 작성해줘` ("Write a short Fibonacci function in Python")
  - Prompt speed: `704.9 tok/s`
  - Generation speed: `89.4 tok/s`
  - Output was concise, correct Python code
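
A check like the ones above could be reproduced along these lines. This is a minimal sketch, not the exact command used for the numbers in this card: it assumes a `llama-cli` binary from a recent `llama.cpp` build is on your `PATH`, that the GGUF file sits in the current directory, and that your build supports this model. The token limit and GPU layer count are illustrative defaults.

```python
import shlex

# File name from the "Included file" section; assumed to be in the working directory.
MODEL_FILE = "supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf"


def build_llama_cli_command(prompt, model_path=MODEL_FILE,
                            max_tokens=256, gpu_layers=99):
    """Build an argument list for llama-cli (assumed to be on PATH)."""
    return [
        "llama-cli",
        "-m", model_path,         # path to the GGUF model
        "-p", prompt,             # prompt text
        "-n", str(max_tokens),    # maximum number of tokens to generate
        "-ngl", str(gpu_layers),  # layers to offload to the GPU (Metal on Apple Silicon)
    ]


cmd = build_llama_cli_command("봄에 먹기 좋은 한식 반찬 5개 추천")
print(shlex.join(cmd))

# To actually run it (requires llama-cli and the model file):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with non-ASCII prompts. `llama-cli` prints its own timing summary at the end of a run, which is where prompt and generation speeds can be read off.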

## Notes

- This GGUF is exported from the `supergemma4-26b-uncensored-fast-v2` MLX line.
- Gemma 4 MoE expert tensors were converted with a locally patched converter so that the GGUF export works correctly.
- A neutral template is embedded to avoid the old issue where general prompts were pushed into coding/tool-call behavior.