metadata
license: gemma
language:
  - en
pipeline_tag: text-generation
base_model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated
tags:
  - mnn
  - gemma4
  - gemma-4
  - mobile
  - on-device
  - tokforge

Huihui-gemma-4-E4B-it-abliterated-MNN

Pre-converted Huihui Gemma 4 E4B Abliterated in MNN format for on-device inference with TokForge.

Original model by huihui-ai — converted to MNN Q4 (with int4 per-layer embeddings) for mobile deployment.

⚠️ REQUIRES TOKFORGE 3.4.9

This model requires TokForge 3.4.9 or later — the first build with Gemma 4 runtime support. Earlier TokForge versions do NOT include the Gemma 4 CPUAttention implementation and will fail to load this model.
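The version requirement above can be expressed as a simple comparison. This is a minimal sketch, assuming TokForge versions are plain dotted integers such as `3.4.9`; the helper names are hypothetical, and how the installed version is obtained from the app is not covered here.

```python
# Minimum version stated in this README.
MIN_TOKFORGE = (3, 4, 9)


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '3.4.9' into a tuple of ints, (3, 4, 9)."""
    return tuple(int(part) for part in text.strip().split("."))


def supports_gemma4(installed: str) -> bool:
    """Return True if the installed version is 3.4.9 or later.

    Tuples compare element by element, so '3.4.10' correctly ranks
    above '3.4.9' (a plain string comparison would get this wrong).
    """
    return parse_version(installed) >= MIN_TOKFORGE
```

For example, `supports_gemma4("3.4.8")` is false and `supports_gemma4("3.4.10")` is true.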

huihui-ai's abliterated Gemma 4 E4B — true weight-surgery abliteration with int4 PLE for optimal mobile performance.

Model Details

| Field | Value |
|---|---|
| Architecture | Gemma 4 (shared-KV attention, 35 layers, per-layer embeddings) |
| Parameters | E4B (3B active params, ~4B effective) |
| Vocab Size | 262,144 |
| Weight Quantization | MNN Q4 (128-block) |
| PLE Quantization | int4 |
| Total Size | 4.3 GB |
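To illustrate what "Q4 (128-block)" means in general terms, here is a minimal sketch of block-wise 4-bit quantization. It assumes an asymmetric scheme with one scale and one offset per block of 128 weights; MNN's actual on-disk layout and rounding rules may differ, and all function names are hypothetical.

```python
BLOCK = 128  # block size from the table above


def quantize_block(values):
    """Map one block of floats to 4-bit codes (0..15) plus a scale and offset."""
    lo, hi = min(values), max(values)
    # A constant block has hi == lo; fall back to 1.0 to avoid dividing by zero.
    scale = (hi - lo) / 15 or 1.0
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_block(codes, scale, lo):
    """Recover approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]


def quantize(weights, block=BLOCK):
    """Quantize a flat list of weights block by block (last block may be short)."""
    return [quantize_block(weights[i:i + block])
            for i in range(0, len(weights), block)]
```

Under this assumed layout, each full block stores 128 × 4 bits = 64 bytes of codes plus its scale and offset, and the round-trip error for any weight is at most half of that block's scale.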

Performance

Estimated performance: this exact variant has not yet been benchmarked directly. The figure below is extrapolated from same-size Gemma 4 variants on the same SoC.

| Device | SoC | Backend | tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite Gen 5) | CPU | ~15-16 (est., based on E4B baseline) |

Why CPU? Gemma 4's per-layer embeddings (PLE) architecture runs better with the CPU's direct memory access than with the GPU memory transfer overhead that the OpenCL backend incurs. CPU is the recommended backend for all Gemma 4 models.

Files

| File | Purpose |
|---|---|
| llm.mnn | Model graph |
| llm.mnn.weight | Q4 quantized weights |
| per_layer_embeddings_int4.bin | Per-Layer Embeddings (int4) |
| embeddings_int4.bin | Token embeddings (int4) |
| tokenizer.txt | BPE tokens (262K vocab) |
| llm_config.json | Runtime config + jinja chat template |
| config.json | Device backend defaults |
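If you copy the model to a device manually, a quick check that every file listed above is present can catch an incomplete download. This is a minimal sketch: the file names come from the table, while the `missing_files` helper and the directory you pass to it are hypothetical.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "per_layer_embeddings_int4.bin",
    "embeddings_int4.bin",
    "tokenizer.txt",
    "llm_config.json",
    "config.json",
]


def missing_files(model_dir):
    """Return the expected files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty result means all seven files were found. It does not verify file sizes or contents.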

Usage in TokForge

Requires TokForge 3.4.9+ (releases tomorrow).

  1. Update to TokForge 3.4.9 from tokforge.ai
  2. In Models tab, add this model via HuggingFace repo ID: darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN
  3. Load the model, select CPU backend (recommended for Gemma 4)
  4. Start chatting

Attribution

Links

License

Gemma Community License — see the base model for full terms.