---
license: gemma
language:
- en
pipeline_tag: text-generation
base_model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated
tags:
- mnn
- gemma4
- gemma-4
- mobile
- on-device
- tokforge
---

# Huihui-gemma-4-E4B-it-abliterated-MNN

Pre-converted [Huihui Gemma 4 E4B Abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) in MNN format for on-device inference with [TokForge](https://tokforge.ai).

> **Original model by [huihui-ai](https://huggingface.co/huihui-ai)**, converted to MNN Q4 (with int4 per-layer embeddings) for mobile deployment.

## ⚠️ REQUIRES TOKFORGE 3.4.9

**This model requires TokForge 3.4.9 or later**, the first build with Gemma 4 runtime support. Earlier TokForge versions do NOT include the Gemma 4 CPUAttention implementation and will fail to load this model.

- **TokForge 3.4.9 release**: Coming tomorrow
- **Download**: [tokforge.ai](https://tokforge.ai)
- **Community**: [TokForge Discord](https://discord.gg/EDmD8tspGu)

huihui-ai's abliterated Gemma 4 E4B: true weight-surgery abliteration with int4 PLE for optimal mobile performance.

## Model Details

| Field | Value |
|---|---|
| **Architecture** | Gemma 4 (shared-KV attention, 35 layers, per-layer embeddings) |
| **Parameters** | E4B (3B active params, ~4B effective) |
| **Vocab Size** | 262,144 |
| **Weight Quantization** | MNN Q4 (128-block) |
| **PLE Quantization** | int4 |
| **Total Size** | 4.3 GB |

## Performance

### Estimated performance

This exact variant has not yet been benchmarked directly. The figure below is extrapolated from same-size Gemma 4 variants on the same SoC.

| Device | SoC | Backend | tok/s (estimated) |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite 2) | CPU | ~15-16 (based on E4B baseline) |

> **Why CPU?** Gemma 4's per-layer embeddings (PLE) architecture benefits more from the CPU's direct memory access than it loses to OpenCL's GPU memory transfer overhead. CPU is the recommended backend for all Gemma 4 models.

## Files

| File | Purpose |
|---|---|
| `llm.mnn` | Model graph |
| `llm.mnn.weight` | Q4 quantized weights |
| `per_layer_embeddings_int4.bin` | Per-Layer Embeddings (int4) |
| `embeddings_int4.bin` | Token embeddings (int4) |
| `tokenizer.txt` | BPE tokens (262K vocab) |
| `llm_config.json` | Runtime config + jinja chat template |
| `config.json` | Device backend defaults |

## Usage in TokForge

**Requires TokForge 3.4.9+** (releases tomorrow).

1. Update to TokForge 3.4.9 from [tokforge.ai](https://tokforge.ai)
2. In the Models tab, add this model via HuggingFace repo ID: `darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN`
3. Load the model, select **CPU backend** (recommended for Gemma 4)
4. Start chatting

## Attribution

- **Base model**: [huihui-ai/Huihui-gemma-4-E4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) by huihui-ai
- **MNN conversion**: [darkmaniac7](https://huggingface.co/darkmaniac7) for TokForge
- **MNN framework**: [alibaba/MNN](https://github.com/alibaba/MNN) + [TokForge fork](https://github.com/darkmaniac7/MNN-TokForge) (Gemma 4 runtime)

## Links

- **TokForge**: [tokforge.ai](https://tokforge.ai)
- **Discord**: [discord.gg/EDmD8tspGu](https://discord.gg/EDmD8tspGu)
- **Base model**: [huihui-ai/Huihui-gemma-4-E4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated)

## License

Gemma Community License. See the [base model](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) for full terms.
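
## Verifying a local copy

TokForge fetches the model itself when you add the repo ID, so this step is optional. If you download the repo by hand instead, the sketch below is one way to confirm that every file named in the Files table is present before pointing a runtime at the folder. The helper names `missing_files` and `download_model` are hypothetical, and the download path assumes the `huggingface_hub` Python package is installed; neither is part of TokForge or MNN.

```python
from pathlib import Path

# File names copied from the Files table in this README.
EXPECTED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "per_layer_embeddings_int4.bin",
    "embeddings_int4.bin",
    "tokenizer.txt",
    "llm_config.json",
    "config.json",
]

# Repo ID as given in the "Usage in TokForge" steps.
REPO_ID = "darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN"


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def download_model(local_dir):
    """Download the repo into local_dir (hypothetical helper).

    Assumes the huggingface_hub package is installed and that network
    access is available. The import is done lazily so that
    missing_files() works without the package.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `missing_files("path/to/model")` returns an empty list when all seven files are in place, and otherwise lists the names that still need to be downloaded. It only checks that the files exist; it does not validate their contents or sizes.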