---
license: gemma
language:
  - en
pipeline_tag: text-generation
base_model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated
tags:
  - mnn
  - gemma4
  - gemma-4
  - mobile
  - on-device
  - tokforge
---

Huihui-gemma-4-E4B-it-abliterated-MNN

Huihui Gemma 4 E4B Abliterated, pre-converted to MNN format for on-device inference with TokForge.

Original model by huihui-ai, converted to MNN Q4 (with int4 per-layer embeddings) for mobile deployment.

⚠️ REQUIRES TOKFORGE 3.4.9

This model requires TokForge 3.4.9 or later — the first build with Gemma 4 runtime support. Earlier TokForge versions do NOT include the Gemma 4 CPUAttention implementation and will fail to load this model.

TokForge 3.4.9 release: Coming tomorrow
Download: tokforge.ai
Community: TokForge Discord

huihui-ai's abliterated Gemma 4 E4B: abliteration applied directly to the model weights, paired here with int4 PLE to improve performance on mobile devices.

Model Details

| Field | Value |
|---|---|
| Architecture | Gemma 4 (shared-KV attention, 35 layers, per-layer embeddings) |
| Parameters | E4B (3B active params, ~4B effective) |
| Vocab Size | 262,144 |
| Weight Quantization | MNN Q4 (block size 128) |
| PLE Quantization | int4 |
| Total Size | 4.3 GB |
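The Q4 weight quantization listed above most likely means block-wise 4-bit quantization: weights are split into groups of 128, and each group stores its own scale. The Python sketch below illustrates that idea only. It assumes symmetric codes in the range [-8, 7], one float scale per block, and low-nibble-first byte packing; the function names are hypothetical, and MNN's actual on-disk Q4 layout (scale dtype, zero points, packing order) may differ.

```python
# Hypothetical sketch of block-wise int4 quantization (block size 128).
# Assumptions: symmetric codes in [-8, 7], one float scale per block,
# low nibble first when packing. MNN's real Q4 format may differ.
from typing import List, Tuple

BLOCK_SIZE = 128


def quantize_int4_blocks(
    weights: List[float], block_size: int = BLOCK_SIZE
) -> Tuple[List[int], List[float]]:
    """Quantize floats to signed 4-bit codes with one scale per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the largest magnitude in the block to +/-7; avoid dividing by zero.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in block:
            codes.append(max(-8, min(7, round(w / scale))))
    return codes, scales


def dequantize_int4_blocks(
    codes: List[int], scales: List[float], block_size: int = BLOCK_SIZE
) -> List[float]:
    """Recover approximate floats: each code times its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def pack_nibbles(codes: List[int]) -> bytes:
    """Store two 4-bit codes per byte (low nibble first)."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes(
        (lo & 0x0F) | ((hi & 0x0F) << 4) for lo, hi in zip(codes[0::2], codes[1::2])
    )


def unpack_nibbles(data: bytes, count: int) -> List[int]:
    """Inverse of pack_nibbles, sign-extending each 4-bit value."""
    codes: List[int] = []
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):
            codes.append(nibble - 16 if nibble >= 8 else nibble)
    return codes[:count]


weights = [i / 100 for i in range(-128, 128)]  # two blocks of 128 toy weights
codes, scales = quantize_int4_blocks(weights)
packed = pack_nibbles(codes)
restored = dequantize_int4_blocks(unpack_nibbles(packed, len(codes)), scales)
print(len(weights), "weights ->", len(packed), "bytes of codes +", len(scales), "scales")
# prints: 256 weights -> 128 bytes of codes + 2 scales
```

Each weight costs 4 bits plus its share of the block scale. If the scale were stored in 16 bits (an assumption), the overhead would be 16/128 = 0.125 bits, for 4.125 bits per weight in total.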

Performance

Estimated performance: not yet benchmarked directly on this exact variant; extrapolated from same-size Gemma 4 variants on the same SoC.

| Device | SoC | Backend | Speed (est.) |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite Gen 5) | CPU | ~15-16 tok/s (estimated from E4B baseline) |

Why CPU? Gemma 4's per-layer embeddings (PLE) favor the CPU backend: the CPU reads the embedding tables directly from memory, while the OpenCL (GPU) backend adds memory-transfer overhead. CPU is the recommended backend for all Gemma 4 models.
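Conceptually, per-layer embeddings give each token a separate small embedding for every layer, fetched from one large table, so this step is dominated by memory reads rather than arithmetic. The Python sketch below models that lookup and estimates the table's size at 16-bit and 4-bit precision. The table layout and the per-layer width `PLE_DIM = 256` are hypothetical placeholders chosen for illustration; only the vocabulary size and layer count come from the Model Details table, and the resulting sizes are illustrative, not measurements of the shipped files.

```python
# Hypothetical sketch: per-layer embeddings (PLE) as a table lookup.
# Assumption: one row per token, holding NUM_LAYERS consecutive slices of
# PLE_DIM values each. PLE_DIM is a placeholder, not a published Gemma 4 value.
from typing import List, Sequence

VOCAB_SIZE = 262_144  # from the Model Details table
NUM_LAYERS = 35       # from the Model Details table
PLE_DIM = 256         # hypothetical per-layer embedding width


def ple_table_bytes(vocab_size: int, num_layers: int, ple_dim: int, bits: int) -> int:
    """Size of the full PLE table in bytes, ignoring quantization scales."""
    return vocab_size * num_layers * ple_dim * bits // 8


def lookup_ple(
    table: Sequence[Sequence[float]], token_id: int, layer: int, ple_dim: int
) -> List[float]:
    """Return the slice of a token's row that feeds one layer.

    Nothing is computed beyond indexing: fetching a PLE vector is a
    memory read from the token's row.
    """
    start = layer * ple_dim
    return list(table[token_id][start:start + ple_dim])


# Toy table: 4 tokens, 3 layers, 2 values per layer.
toy = [[t * 100 + i for i in range(3 * 2)] for t in range(4)]
print(lookup_ple(toy, token_id=2, layer=1, ple_dim=2))  # prints [202, 203]

for bits in (16, 4):
    size = ple_table_bytes(VOCAB_SIZE, NUM_LAYERS, PLE_DIM, bits)
    print(f"{bits}-bit PLE table: {size / 1e9:.2f} GB")
```

Under these assumptions the 4-bit table is a quarter of the 16-bit size (about 1.17 GB versus 4.70 GB), which is consistent with int4 PLE mattering for on-device memory.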

Files

| File | Purpose |
|---|---|
| `llm.mnn` | Model graph |
| `llm.mnn.weight` | Q4 quantized weights |
| `per_layer_embeddings_int4.bin` | Per-layer embeddings (int4) |
| `embeddings_int4.bin` | Token embeddings (int4) |
| `tokenizer.txt` | BPE tokens (262,144-token vocab) |
| `llm_config.json` | Runtime config + Jinja chat template |
| `config.json` | Device backend defaults |
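When inspecting a local copy of this repository, a quick sanity check is to compare the folder against the file table above. This Python sketch does only that. The file names and descriptions come from the Files table; the helper names `missing_files` and `report` are hypothetical and are not part of TokForge or MNN.

```python
# Sketch: check that a local copy of this repo contains every file listed in
# the Files table. The directory path passed in is up to the caller.
from pathlib import Path
from typing import List

EXPECTED_FILES = {
    "llm.mnn": "Model graph",
    "llm.mnn.weight": "Q4 quantized weights",
    "per_layer_embeddings_int4.bin": "Per-layer embeddings (int4)",
    "embeddings_int4.bin": "Token embeddings (int4)",
    "tokenizer.txt": "BPE tokens",
    "llm_config.json": "Runtime config + chat template",
    "config.json": "Device backend defaults",
}


def missing_files(model_dir: str) -> List[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def report(model_dir: str) -> str:
    """Summarize which expected files, if any, are missing."""
    missing = missing_files(model_dir)
    if not missing:
        return "all expected files present"
    return "missing: " + ", ".join(f"{name} ({EXPECTED_FILES[name]})" for name in missing)
```

For example, `print(report("path/to/Huihui-gemma-4-E4B-it-abliterated-MNN"))`, where the path is a placeholder for wherever the files were saved.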

Usage in TokForge

Requires TokForge 3.4.9+ (releases tomorrow).

1. Update to TokForge 3.4.9 from tokforge.ai.
2. In the Models tab, add this model via its Hugging Face repo ID: `darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN`
3. Load the model and select the CPU backend (recommended for Gemma 4).
4. Start chatting.
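The backend default also lives in the files themselves, in `config.json`. The sketch below reads that file and flags any backend other than CPU. It assumes the file follows the MNN LLM convention of a top-level `backend_type` key (for example `"cpu"` or `"opencl"`) and treats a missing key as CPU; this card does not show the file's actual contents, so check the key name against your copy. The function names are hypothetical.

```python
# Sketch: read the backend from config.json and warn if it is not CPU.
# Assumption: an MNN-style top-level "backend_type" key; a missing key is
# treated as "cpu". The real file for this repo may use different keys.
import json
from typing import Optional

RECOMMENDED_BACKEND = "cpu"  # the card recommends CPU for all Gemma 4 models


def read_backend(config_path: str) -> str:
    """Return the configured backend name in lowercase."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return str(config.get("backend_type", RECOMMENDED_BACKEND)).lower()


def backend_warning(config_path: str) -> Optional[str]:
    """Return a warning string if the backend is not the recommended one."""
    backend = read_backend(config_path)
    if backend != RECOMMENDED_BACKEND:
        return f"config.json selects '{backend}'; CPU is recommended for Gemma 4"
    return None
```

Inside TokForge the backend is chosen in the app, so this check is only useful when inspecting the files directly.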

Attribution

Base model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated by huihui-ai
MNN conversion: darkmaniac7 for TokForge
MNN framework: alibaba/MNN + TokForge fork (Gemma 4 runtime)

License

Gemma Community License — see the base model for full terms.
