| --- |
| license: gemma |
| language: |
| - en |
| pipeline_tag: text-generation |
| base_model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated |
| tags: |
| - mnn |
| - gemma4 |
| - gemma-4 |
| - mobile |
| - on-device |
| - tokforge |
| --- |
| |
| # Huihui-gemma-4-E4B-it-abliterated-MNN |
|
|
| Pre-converted [Huihui Gemma 4 E4B Abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) in MNN format for on-device inference with [TokForge](https://tokforge.ai). |
|
|
| > **Original model by [huihui-ai](https://huggingface.co/huihui-ai)** — converted to MNN Q4 (with int4 per-layer embeddings) for mobile deployment. |
|
|
| ## ⚠️ REQUIRES TOKFORGE 3.4.9 |
|
|
| **This model requires TokForge 3.4.9 or later** — the first build with Gemma 4 runtime support. Earlier TokForge versions do NOT include the Gemma 4 CPUAttention implementation and will fail to load this model. |
|
|
| - **TokForge 3.4.9 release**: Coming tomorrow |
| - **Download**: [tokforge.ai](https://tokforge.ai) |
| - **Community**: [TokForge Discord](https://discord.gg/EDmD8tspGu) |
|
|
| This is huihui-ai's abliterated Gemma 4 E4B: the abliteration is applied directly to the model weights, and the per-layer embeddings (PLE) are stored as int4 for better on-device performance. |
|
|
| ## Model Details |
|
|
| | Field | Value | |
| |---|---| |
| | **Architecture** | Gemma 4 (shared-KV attention, 35 layers, per-layer embeddings) | |
| | **Parameters** | E4B (3B active params, ~4B effective) | |
| | **Vocab Size** | 262,144 | |
| | **Weight Quantization** | MNN Q4 (block size 128) | |
| | **PLE Quantization** | int4 | |
| | **Total Size** | 4.3 GB | |
|
|
| ## Performance |
|
|
| ### Estimated performance |
|
|
| These figures have not yet been benchmarked directly on this exact variant. They are extrapolated from same-size Gemma 4 variants on the same SoC. |
|
|
| | Device | SoC | Backend | tok/s | |
| |---|---|---|---| |
| | RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite Gen 5) | CPU | ~15-16 (estimated from the E4B baseline) | |
|
|
| > **Why CPU?** Gemma 4's per-layer embeddings (PLE) architecture benefits from the CPU's direct memory access, whereas the OpenCL backend incurs GPU memory transfer overhead. CPU is the recommended backend for all Gemma 4 models. |
|
|
| ## Files |
|
|
| | File | Purpose | |
| |---|---| |
| | `llm.mnn` | Model graph | |
| | `llm.mnn.weight` | Q4 quantized weights | |
| | `per_layer_embeddings_int4.bin` | Per-Layer Embeddings (int4) | |
| | `embeddings_int4.bin` | Token embeddings (int4) | |
| | `tokenizer.txt` | BPE tokens (262K vocab) | |
| | `llm_config.json` | Runtime config + jinja chat template | |
| | `config.json` | Device backend defaults | |
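
For anyone fetching the repository manually rather than through the TokForge Models tab, the file list above can be turned into a quick completeness check. This is a minimal sketch, not part of TokForge: the helper names `missing_files` and `download_model` are hypothetical, and the download step assumes the third-party `huggingface_hub` package is installed.

```python
from pathlib import Path

# File names copied from the Files table above.
EXPECTED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "per_layer_embeddings_int4.bin",
    "embeddings_int4.bin",
    "tokenizer.txt",
    "llm_config.json",
    "config.json",
]

# Repo ID as given in the usage steps of this card.
REPO_ID = "darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN"


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def download_model(local_dir):
    """Fetch the repository with huggingface_hub (assumed to be installed)."""
    # Imported lazily so missing_files() works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `missing_files("path/to/model")` on a complete download returns an empty list. A non-empty result names the files to re-fetch before loading the model.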
|
|
| ## Usage in TokForge |
|
|
| **Requires TokForge 3.4.9+** (releases tomorrow). |
|
|
| 1. Update to TokForge 3.4.9 from [tokforge.ai](https://tokforge.ai) |
| 2. In the Models tab, add this model by its Hugging Face repo ID: `darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN` |
| 3. Load the model and select the **CPU backend** (recommended for Gemma 4) |
| 4. Start chatting |
|
|
| ## Attribution |
|
|
| - **Base model**: [huihui-ai/Huihui-gemma-4-E4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) by huihui-ai |
| - **MNN conversion**: [darkmaniac7](https://huggingface.co/darkmaniac7) for TokForge |
| - **MNN framework**: [alibaba/MNN](https://github.com/alibaba/MNN) + [TokForge fork](https://github.com/darkmaniac7/MNN-TokForge) (Gemma 4 runtime) |
|
|
| ## Links |
|
|
| - **TokForge**: [tokforge.ai](https://tokforge.ai) |
| - **Discord**: [discord.gg/EDmD8tspGu](https://discord.gg/EDmD8tspGu) |
| - **Base model**: [huihui-ai/Huihui-gemma-4-E4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) |
|
|
| ## License |
|
|
| Gemma Community License — see the [base model](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) for full terms. |
|
|