license: gemma
language:
- en
pipeline_tag: text-generation
base_model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated
tags:
- mnn
- gemma4
- gemma-4
- mobile
- on-device
- tokforge
Huihui-gemma-4-E4B-it-abliterated-MNN
Pre-converted Huihui Gemma 4 E4B Abliterated in MNN format for on-device inference with TokForge.
Original model by huihui-ai — converted to MNN Q4 (with int4 per-layer embeddings) for mobile deployment.
⚠️ REQUIRES TOKFORGE 3.4.9
This model requires TokForge 3.4.9 or later — the first build with Gemma 4 runtime support. Earlier TokForge versions do NOT include the Gemma 4 CPUAttention implementation and will fail to load this model.
- TokForge 3.4.9 release: Coming tomorrow
- Download: tokforge.ai
- Community: TokForge Discord
huihui-ai's abliterated Gemma 4 E4B — true weight-surgery abliteration with int4 PLE for optimal mobile performance.
Model Details
| Field | Value |
|---|---|
| Architecture | Gemma 4 (shared-KV attention, 35 layers, per-layer embeddings) |
| Parameters | E4B (3B active params, ~4B effective) |
| Vocab Size | 262,144 |
| Weight Quantization | MNN Q4 (128-block) |
| PLE Quantization | int4 |
| Total Size | 4.3 GB |
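As a rough illustration of what "Q4 (128-block)" implies for storage, the sketch below computes the average bits per weight under block-wise quantization. The per-block metadata size (one fp16 scale plus one fp16 offset, 32 bits in total) is an assumption made for illustration, not a confirmed detail of the MNN file format, and `effective_bits_per_weight` is a hypothetical helper.

```python
def effective_bits_per_weight(weight_bits: int, block_size: int, metadata_bits: int) -> float:
    """Average storage cost per weight when metadata is shared by a block of weights."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return weight_bits + metadata_bits / block_size


# Assumption: each 128-weight block carries 32 bits of metadata (fp16 scale + fp16 offset).
bits = effective_bits_per_weight(weight_bits=4, block_size=128, metadata_bits=32)
print(bits)  # 4.25
```

Under that assumption the overhead is small: larger blocks amortize the metadata over more weights, at the cost of coarser scaling.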
Performance
Estimated performance: this exact variant has not yet been benchmarked directly, so the figure below is extrapolated from same-size Gemma 4 variants on the same SoC.
| Device | SoC | Backend | tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite Gen 5) | CPU | ~15-16 (est., based on E4B baseline) |
Why CPU? Gemma 4's per-layer embeddings (PLE) architecture benefits from the CPU's direct memory access, whereas the OpenCL backend incurs GPU memory transfer overhead. CPU is the recommended backend for all Gemma 4 models.
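To put the estimated throughput in concrete terms, here is a minimal sketch that converts a decode rate into wall-clock time for a reply of a given length. It covers token generation only and ignores prompt processing (prefill); `decode_seconds` and the 480-token reply length are hypothetical, chosen only for illustration.

```python
def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decode rate (prefill not included)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# The estimated range from the table above: roughly 15 to 16 tok/s on CPU.
for rate in (15.0, 16.0):
    print(f"{rate:.0f} tok/s -> {decode_seconds(480, rate):.1f} s for 480 tokens")
```

At the estimated rates, a 480-token reply would take about 30 to 32 seconds of generation time.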
Files
| File | Purpose |
|---|---|
| llm.mnn | Model graph |
| llm.mnn.weight | Q4 quantized weights |
| per_layer_embeddings_int4.bin | Per-Layer Embeddings (int4) |
| embeddings_int4.bin | Token embeddings (int4) |
| tokenizer.txt | BPE tokens (262K vocab) |
| llm_config.json | Runtime config + jinja chat template |
| config.json | Device backend defaults |
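If you download the repository manually, a quick check that every file listed above is present can save a failed load. The following is a minimal sketch using only the Python standard library; `missing_model_files` and the example directory path are hypothetical and not part of TokForge or MNN.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = (
    "llm.mnn",
    "llm.mnn.weight",
    "per_layer_embeddings_int4.bin",
    "embeddings_int4.bin",
    "tokenizer.txt",
    "llm_config.json",
    "config.json",
)


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical local path; replace with wherever the repo was downloaded.
    missing = missing_model_files("./Huihui-gemma-4-E4B-it-abliterated-MNN")
    print("All files present" if not missing else f"Missing: {missing}")
```

The check only confirms that the files exist; it does not validate their contents or sizes.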
Usage in TokForge
Requires TokForge 3.4.9+ (releases tomorrow).
- Update to TokForge 3.4.9 from tokforge.ai
- In the Models tab, add this model via its HuggingFace repo ID: darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN
- Load the model and select the CPU backend (recommended for Gemma 4)
- Start chatting
Attribution
- Base model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated by huihui-ai
- MNN conversion: darkmaniac7 for TokForge
- MNN framework: alibaba/MNN + TokForge fork (Gemma 4 runtime)
Links
- TokForge: tokforge.ai
- Discord: discord.gg/EDmD8tspGu
- Base model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated
License
Gemma Community License — see the base model for full terms.