---
license: gemma
language:
  - en
pipeline_tag: text-generation
base_model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated
tags:
  - mnn
  - gemma4
  - gemma-4
  - mobile
  - on-device
  - tokforge
---

# Huihui-gemma-4-E4B-it-abliterated-MNN

Pre-converted [Huihui Gemma 4 E4B Abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) in MNN format for on-device inference with [TokForge](https://tokforge.ai).

> **Original model by [huihui-ai](https://huggingface.co/huihui-ai)** — converted to MNN Q4 (with int4 per-layer embeddings) for mobile deployment.

## ⚠️ REQUIRES TOKFORGE 3.4.9

**This model requires TokForge 3.4.9 or later** — the first build with Gemma 4 runtime support. Earlier TokForge versions do NOT include the Gemma 4 CPUAttention implementation and will fail to load this model.

- **TokForge 3.4.9 release**: Coming tomorrow
- **Download**: [tokforge.ai](https://tokforge.ai)
- **Community**: [TokForge Discord](https://discord.gg/EDmD8tspGu)

This is huihui-ai's abliterated Gemma 4 E4B. The abliteration is applied directly to the weights (weight surgery), and the per-layer embeddings (PLE) are stored as int4 to reduce memory use on mobile devices.

## Model Details

| Field | Value |
|---|---|
| **Architecture** | Gemma 4 (shared-KV attention, 35 layers, per-layer embeddings) |
| **Parameters** | E4B (3B active params, ~4B effective) |
| **Vocab Size** | 262,144 |
| **Weight Quantization** | MNN Q4 (128-block) |
| **PLE Quantization** | int4 |
| **Total Size** | 4.3 GB |
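
As a rough illustration of what "Q4 (128-block)" implies for storage, the sketch below computes the average bits per weight for block-wise 4-bit quantization. It assumes one fp16 scale and one fp16 zero point per 128-weight block; that metadata layout is an assumption for illustration, and MNN's actual on-disk format may differ.

```python
def blockwise_bits_per_weight(weight_bits: int = 4,
                              block_size: int = 128,
                              scale_bits: int = 16,
                              zero_point_bits: int = 16) -> float:
    """Average storage cost per weight, including per-block metadata.

    The scale/zero-point widths are assumed values, not taken from MNN.
    """
    return weight_bits + (scale_bits + zero_point_bits) / block_size


def quantized_size_bytes(num_weights: int, bits_per_weight: float) -> int:
    """Approximate size in bytes of a tensor stored at the given average bit width."""
    return int(num_weights * bits_per_weight / 8)


bpw = blockwise_bits_per_weight()
print(f"bits per weight: {bpw}")
print(f"bytes per billion weights: {quantized_size_bytes(1_000_000_000, bpw)}")
```

Under these assumptions the per-block metadata adds 0.25 bits per weight on top of the 4-bit payload. The total file size also depends on the embedding tables and the model graph, so this figure is not a prediction of the 4.3 GB total.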

## Performance

### Estimated performance

Not yet benchmarked directly on this exact variant. The figure below is extrapolated from same-size Gemma 4 variants on the same SoC.

| Device | SoC | Backend | tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite Gen 5) | CPU | ~15-16 (estimated from the E4B baseline) |

> **Why CPU?** Gemma 4's per-layer embeddings (PLE) architecture runs better with the CPU's direct memory access than with the OpenCL GPU path, which incurs memory transfer overhead. CPU is the recommended backend for all Gemma 4 models.

## Files

| File | Purpose |
|---|---|
| `llm.mnn` | Model graph |
| `llm.mnn.weight` | Q4 quantized weights |
| `per_layer_embeddings_int4.bin` | Per-Layer Embeddings (int4) |
| `embeddings_int4.bin` | Token embeddings (int4) |
| `tokenizer.txt` | BPE tokens (262K vocab) |
| `llm_config.json` | Runtime config + jinja chat template |
| `config.json` | Device backend defaults |
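
If you copy the files to a device or a local folder by hand, a quick completeness check can catch a partial download. The sketch below uses only the file names from the table above; the directory path in the example is hypothetical.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "per_layer_embeddings_int4.bin",
    "embeddings_int4.bin",
    "tokenizer.txt",
    "llm_config.json",
    "config.json",
]


def missing_model_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Hypothetical local path; replace with wherever the files were saved.
missing = missing_model_files("./Huihui-gemma-4-E4B-it-abliterated-MNN")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected files are present.")
```

This only checks that the files exist. It does not validate their sizes or contents.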

## Usage in TokForge

**Requires TokForge 3.4.9+** (releases tomorrow).

1. Update to TokForge 3.4.9 from [tokforge.ai](https://tokforge.ai)
2. In the Models tab, add this model by its Hugging Face repo ID: `darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN`
3. Load the model and select the **CPU backend** (recommended for Gemma 4)
4. Start chatting
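
For reference, the repo ID in step 2 maps onto Hugging Face's public `resolve` URL pattern. The sketch below builds a direct file URL from it. It assumes the files live on the `main` revision, and the helper `resolve_url` is a name made up for this example; how TokForge itself fetches the files is not described here.

```python
from urllib.parse import quote

REPO_ID = "darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN"


def resolve_url(repo_id, filename, revision="main"):
    """Build a direct download URL for one file in a Hugging Face model repo.

    The "main" default revision is an assumption for this example.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


print(resolve_url(REPO_ID, "llm_config.json"))
```

Fetching the small `llm_config.json` first is a cheap way to confirm the repo ID is typed correctly before starting the multi-gigabyte weight download.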

## Attribution

- **Base model**: [huihui-ai/Huihui-gemma-4-E4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) by huihui-ai
- **MNN conversion**: [darkmaniac7](https://huggingface.co/darkmaniac7) for TokForge
- **MNN framework**: [alibaba/MNN](https://github.com/alibaba/MNN) + [TokForge fork](https://github.com/darkmaniac7/MNN-TokForge) (Gemma 4 runtime)

## Links

- **TokForge**: [tokforge.ai](https://tokforge.ai)
- **Discord**: [discord.gg/EDmD8tspGu](https://discord.gg/EDmD8tspGu)
- **Base model**: [huihui-ai/Huihui-gemma-4-E4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated)

## License

Gemma Community License — see the [base model](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) for full terms.