---
license: gemma
language:
- en
pipeline_tag: text-generation
base_model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated
tags:
- mnn
- gemma4
- gemma-4
- mobile
- on-device
- tokforge
---
# Huihui-gemma-4-E4B-it-abliterated-MNN
Pre-converted [Huihui Gemma 4 E4B Abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) in MNN format for on-device inference with [TokForge](https://tokforge.ai).
> **Original model by [huihui-ai](https://huggingface.co/huihui-ai)** — converted to MNN Q4 (with int4 per-layer embeddings) for mobile deployment.
## ⚠️ REQUIRES TOKFORGE 3.4.9
**This model requires TokForge 3.4.9 or later** — the first build with Gemma 4 runtime support. Earlier TokForge versions do NOT include the Gemma 4 CPUAttention implementation and will fail to load this model.
- **TokForge 3.4.9 release**: Coming tomorrow
- **Download**: [tokforge.ai](https://tokforge.ai)
- **Community**: [TokForge Discord](https://discord.gg/EDmD8tspGu)
This is huihui-ai's abliterated Gemma 4 E4B: the abliteration is applied directly to the weights (weight surgery), and the per-layer embeddings (PLE) are stored as int4 for better mobile performance.
## Model Details
| Field | Value |
|---|---|
| **Architecture** | Gemma 4 (shared-KV attention, 35 layers, per-layer embeddings) |
| **Parameters** | E4B (3B active params, ~4B effective) |
| **Vocab Size** | 262,144 |
| **Weight Quantization** | MNN Q4 (128-block) |
| **PLE Quantization** | int4 |
| **Total Size** | 4.3 GB |
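
The "128-block" entry indicates that weights are quantized in groups of 128 values that share quantization metadata. The sketch below illustrates the general idea of block-wise 4-bit quantization only; it is not MNN's actual kernel or on-disk layout. The asymmetric min/scale scheme, the fp16 metadata size, and all function names are assumptions made for illustration.

```python
# Illustrative block-wise 4-bit quantization.
# NOTE: a sketch of the general technique, not MNN's real storage format.

BLOCK_SIZE = 128  # matches the "128-block" entry in the table above


def quantize_block(values):
    """Map one block of floats to 4-bit codes (0..15) plus a scale and minimum."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for constant blocks
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_block(codes, scale, lo):
    """Reconstruct approximate floats from 4-bit codes."""
    return [lo + c * scale for c in codes]


def quantize(weights, block_size=BLOCK_SIZE):
    """Split a flat weight list into blocks and quantize each independently."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def bits_per_weight(block_size=BLOCK_SIZE, meta_bits=32):
    """Effective storage cost: 4 bits per code plus per-block metadata.

    meta_bits=32 assumes one fp16 scale and one fp16 minimum per block
    (an assumption, not a documented MNN detail).
    """
    return 4 + meta_bits / block_size
```

Under these assumptions, a 128-value block costs 4.25 bits per weight, so the per-block metadata adds only a small overhead on top of the 4-bit codes.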
## Performance
### Estimated performance
These figures have not yet been benchmarked directly on this exact variant; they are extrapolated from same-size Gemma 4 variants on the same SoC.

| Device | SoC | Backend | tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite Gen 5) | CPU | ~15-16 (est., based on E4B baseline) |
> **Why CPU?** Gemma 4's per-layer embeddings (PLE) architecture runs better with the CPU's direct memory access than through OpenCL, where GPU memory-transfer overhead outweighs the compute gain. CPU is the recommended backend for all Gemma 4 models.
## Files
| File | Purpose |
|---|---|
| `llm.mnn` | Model graph |
| `llm.mnn.weight` | Q4 quantized weights |
| `per_layer_embeddings_int4.bin` | Per-Layer Embeddings (int4) |
| `embeddings_int4.bin` | Token embeddings (int4) |
| `tokenizer.txt` | BPE tokens (262K vocab) |
| `llm_config.json` | Runtime config + jinja chat template |
| `config.json` | Device backend defaults |
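
If you download the repository manually, a quick completeness check can catch a partial download before you try to load the model. The following is a minimal sketch that uses only the file names from the table above; the helper name `missing_files` is hypothetical and is not part of TokForge or MNN.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "per_layer_embeddings_int4.bin",
    "embeddings_int4.bin",
    "tokenizer.txt",
    "llm_config.json",
    "config.json",
]


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files("path/to/model")` returns an empty list when all seven files are present. It checks names only, not file sizes or checksums.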
## Usage in TokForge
**Requires TokForge 3.4.9+** (releases tomorrow).
1. Update to TokForge 3.4.9 from [tokforge.ai](https://tokforge.ai)
2. In the Models tab, add this model via its HuggingFace repo ID: `darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN`
3. Load the model and select the **CPU backend** (recommended for Gemma 4)
4. Start chatting
## Attribution
- **Base model**: [huihui-ai/Huihui-gemma-4-E4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) by huihui-ai
- **MNN conversion**: [darkmaniac7](https://huggingface.co/darkmaniac7) for TokForge
- **MNN framework**: [alibaba/MNN](https://github.com/alibaba/MNN) + [TokForge fork](https://github.com/darkmaniac7/MNN-TokForge) (Gemma 4 runtime)
## Links
- **TokForge**: [tokforge.ai](https://tokforge.ai)
- **Discord**: [discord.gg/EDmD8tspGu](https://discord.gg/EDmD8tspGu)
- **Base model**: [huihui-ai/Huihui-gemma-4-E4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated)
## License
Gemma Community License — see the [base model](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) for full terms.