	---
	license: gemma
	language:
	- en
	pipeline_tag: text-generation
	base_model: huihui-ai/Huihui-gemma-4-E4B-it-abliterated
	tags:
	- mnn
	- gemma4
	- gemma-4
	- mobile
	- on-device
	- tokforge
	---

	# Huihui-gemma-4-E4B-it-abliterated-MNN

	Pre-converted [Huihui Gemma 4 E4B Abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) in MNN format for on-device inference with [TokForge](https://tokforge.ai).

	> Original model by [huihui-ai](https://huggingface.co/huihui-ai) — converted to MNN Q4 (with int4 per-layer embeddings) for mobile deployment.

	## ⚠️ REQUIRES TOKFORGE 3.4.9

	This model requires TokForge 3.4.9 or later — the first build with Gemma 4 runtime support. Earlier TokForge versions do NOT include the Gemma 4 CPUAttention implementation and will fail to load this model.

	- TokForge 3.4.9 release: Coming tomorrow
	- Download: [tokforge.ai](https://tokforge.ai)
	- Community: [TokForge Discord](https://discord.gg/EDmD8tspGu)
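
The version requirement above can be checked numerically rather than by string comparison. The sketch below is only an illustration: the function names are hypothetical, and how TokForge reports its own version is not documented here.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "3.4.9" into (3, 4, 9)."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum version stated in this README.
MIN_TOKFORGE = parse_version("3.4.9")


def supports_gemma4(installed: str) -> bool:
    """True when the installed version is 3.4.9 or later (hypothetical helper)."""
    return parse_version(installed) >= MIN_TOKFORGE
```

Comparing integer tuples avoids the trap where `"3.4.10" < "3.4.9"` as plain strings.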

	This is huihui-ai's abliterated Gemma 4 E4B: the abliteration is applied directly to the model weights, and the per-layer embeddings (PLE) are quantized to int4 for mobile performance.

	## Model Details

	| Field | Value |
	|---|---|
	| Architecture | Gemma 4 (shared-KV attention, 35 layers, per-layer embeddings) |
	| Parameters | E4B (3B active params, ~4B effective) |
	| Vocab Size | 262,144 |
	| Weight Quantization | MNN Q4 (128-block) |
	| PLE Quantization | int4 |
	| Total Size | 4.3 GB |
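
To give a feel for what "Q4 (128-block)" means, here is a minimal sketch of generic block-wise 4-bit quantization. It assumes a simple min/max scheme with one scale and one offset per block of 128 values. MNN's actual on-disk layout and rounding may differ, and all names below are illustrative.

```python
BLOCK = 128  # block size from the table above
LEVELS = 15  # 4 bits give integer codes 0..15


def quantize_block(values: list[float]) -> tuple[list[int], float, float]:
    """Map one block of floats to 4-bit codes plus a (scale, minimum) pair."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / LEVELS or 1.0  # fall back to 1.0 when the block is flat
    # Clamp so floating-point drift can never produce a code outside 0..15.
    codes = [min(LEVELS, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_block(codes: list[int], scale: float, lo: float) -> list[float]:
    """Recover approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]


def quantize(weights: list[float], block: int = BLOCK):
    """Quantize a flat weight list block by block."""
    return [quantize_block(weights[i:i + block]) for i in range(0, len(weights), block)]
```

In this scheme each restored value is within half a quantization step (`scale / 2`) of the original, which is why smaller blocks, with their tighter min/max ranges, tend to lose less precision.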

	## Performance

	### Estimated performance

	Not yet benchmarked directly on this exact variant. The figure below is extrapolated from same-size Gemma 4 variants on the same SoC.

	| Device | SoC | Backend | tok/s |
	|---|---|---|---|
	| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite Gen 5) | CPU | ~15-16 (est., based on E4B baseline) |

	> Why CPU? Gemma 4's per-layer embeddings (PLE) are read directly from memory on the CPU, whereas the OpenCL backend pays a transfer overhead to move them into GPU memory. CPU is the recommended backend for all Gemma 4 models.
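
As a rough guide to what the estimated rate means in practice, the arithmetic below converts tokens per second into wall-clock time for a reply. It assumes a steady decode rate and ignores prompt processing (prefill), so treat it as a back-of-the-envelope sketch only.

```python
def decode_seconds(n_tokens: int, tok_per_s: float) -> float:
    """Seconds to generate n_tokens at a steady decode rate."""
    return n_tokens / tok_per_s


# Estimated range from the table above, in tokens per second.
LOW, HIGH = 15.0, 16.0

for n in (100, 300):
    slowest = decode_seconds(n, LOW)
    fastest = decode_seconds(n, HIGH)
    print(f"{n} tokens: between {fastest:.2f} s and {slowest:.2f} s")
```

For example, a 300-token reply would take 300 / 15 = 20 seconds at the low end of the estimate.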

	## Files

	| File | Purpose |
	|---|---|
	| `llm.mnn` | Model graph |
	| `llm.mnn.weight` | Q4 quantized weights |
	| `per_layer_embeddings_int4.bin` | Per-Layer Embeddings (int4) |
	| `embeddings_int4.bin` | Token embeddings (int4) |
	| `tokenizer.txt` | BPE tokens (262K vocab) |
	| `llm_config.json` | Runtime config + Jinja chat template |
	| `config.json` | Device backend defaults |

	## Usage in TokForge

	Requires TokForge 3.4.9+ (releases tomorrow).

	1. Update to TokForge 3.4.9 from [tokforge.ai](https://tokforge.ai).
	2. In the Models tab, add this model by its Hugging Face repo ID: `darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN`.
	3. Load the model and select the CPU backend (recommended for Gemma 4).
	4. Start chatting.
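
For inspecting or fetching individual files outside the app, the repo ID from step 2 maps onto Hugging Face's standard `resolve` URL pattern. This is a sketch under two assumptions: that the files live on the `main` revision, and that you want direct URLs at all. Whether TokForge downloads files this way is not stated here.

```python
from urllib.parse import quote

REPO_ID = "darkmaniac7/Huihui-gemma-4-E4B-it-abliterated-MNN"  # from step 2


def resolve_url(filename: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Build a direct-download URL for one file in a Hugging Face repo.

    The revision defaults to "main", which is an assumption for this repo.
    """
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )
```

For instance, `resolve_url("llm_config.json")` points at the runtime config, which is small enough to read before committing to the full 4.3 GB download.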

	## Attribution

	- Base model: [huihui-ai/Huihui-gemma-4-E4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) by huihui-ai
	- MNN conversion: [darkmaniac7](https://huggingface.co/darkmaniac7) for TokForge
	- MNN framework: [alibaba/MNN](https://github.com/alibaba/MNN) + [TokForge fork](https://github.com/darkmaniac7/MNN-TokForge) (Gemma 4 runtime)

	## Links

	- TokForge: [tokforge.ai](https://tokforge.ai)
	- Discord: [discord.gg/EDmD8tspGu](https://discord.gg/EDmD8tspGu)
	- Base model: [huihui-ai/Huihui-gemma-4-E4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated)

	## License

	Gemma Community License — see the [base model](https://huggingface.co/huihui-ai/Huihui-gemma-4-E4B-it-abliterated) for full terms.
