	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: text-generation
	base_model: mistralai/Mistral-7B-Instruct-v0.3
	tags:
	- mnn
	- mistral
	- mobile
	- on-device
	- tokforge
	- uncensored
	- abliterated
	---

	# Mistral-7B-Instruct-v0.3-MNN

	Pre-converted [Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) in MNN format for on-device inference with [TokForge](https://tokforge.ai).

	> Original model by [Mistral AI](https://huggingface.co/mistralai), converted to MNN Q4 for mobile deployment.

	## Model Details

	| Property | Value |
	|---|---|
	| Architecture | Mistral (32 layers, grouped-query attention; v0.3 does not use sliding window attention) |
	| Parameters | 7B (4-bit quantized) |
	| Format | MNN (Alibaba Mobile Neural Network) |
	| Quantization | W4A16 (4-bit weights, 16-bit activations, block size 128) |
	| Vocab | 32,768 tokens |
	| Source | [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) |
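As a rough back-of-the-envelope check on download size, the sketch below estimates the weight footprint of block-wise 4-bit quantization. It assumes exactly 7e9 parameters, that every weight is quantized, and 4 bytes of scale/offset metadata per block of 128 weights; all three are illustrative assumptions, not measured properties of this bundle.

```python
def estimate_q4_size_gb(n_params: float, bits: int = 4, block: int = 128,
                        overhead_bytes_per_block: int = 4) -> float:
    """Approximate size in GB (1e9 bytes) of block-wise quantized weights."""
    weight_bytes = n_params * bits / 8            # packed 4-bit weights
    n_blocks = n_params / block                   # one metadata record per block
    overhead_bytes = n_blocks * overhead_bytes_per_block  # assumed scale + offset
    return (weight_bytes + overhead_bytes) / 1e9


# 7e9 is the rounded "7B" figure from the table, not an exact parameter count.
size = estimate_q4_size_gb(7e9)
print(f"Estimated weight size: ~{size:.2f} GB")
```

The actual `llm.mnn.weight` file can differ from this estimate, for example if some layers are stored at a different precision.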

	## Description

	Mistral 7B Instruct v0.3 is Mistral AI's instruction-tuned 7B model and one of the most widely used open-weight LLMs. Version 0.3 extends the vocabulary to 32,768 tokens and adds function calling support. It offers a good balance of output quality and speed.

	## Files

	| File | Description |
	|------|-------------|
	| `llm.mnn` | Model computation graph |
	| `llm.mnn.weight` | Quantized weight data (Q4, block=128) |
	| `llm_config.json` | Model config with Jinja chat template |
	| `tokenizer.txt` | Tokenizer vocabulary |
	| `config.json` | MNN runtime config |
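If you download the bundle manually, a quick completeness check against the file list above can save a failed load. This is a minimal sketch: `missing_bundle_files` is a hypothetical helper, and the demo directory is a temporary stand-in for wherever you store the model.

```python
import tempfile
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "llm_config.json",
    "tokenizer.txt",
    "config.json",
]


def missing_bundle_files(bundle_dir):
    """Return the expected file names that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demo: create every file except config.json in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in EXPECTED_FILES[:-1]:
        (Path(demo_dir) / name).touch()
    demo_missing = missing_bundle_files(demo_dir)

print("Missing files:", demo_missing)
```

The check only confirms that the files exist; it does not validate their contents.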

	## Usage with TokForge

	This model is optimized for [TokForge](https://tokforge.ai) — a free Android app for private, on-device LLM inference.

	1. Download [TokForge from the Play Store](https://tokforge.ai)
	2. Open the app → Models → Download this model
	3. Start chatting. Once downloaded, the model runs entirely on-device and needs no internet connection

	### Recommended Settings

	| Setting | Value |
	|---------|-------|
	| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
	| Precision | Low |
	| Threads | 4 |
	| Thinking | Off (or On for thinking-capable models) |
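The backend recommendation in the table can be expressed as a small selection rule. The sketch below is illustrative only: `pick_backend` and `recommended_settings` are hypothetical helpers, and the key names `backend_type`, `precision`, and `thread_num` are assumed to follow MNN's LLM runtime config; TokForge's own settings storage may differ.

```python
import json


def pick_backend(soc_vendor: str) -> str:
    """Map an SoC vendor name to the backend recommended in the table."""
    vendor = soc_vendor.strip().lower()
    if vendor == "qualcomm":
        return "opencl"
    if vendor == "mediatek":
        return "vulkan"
    return "cpu"  # fallback for any other vendor


def recommended_settings(soc_vendor: str) -> dict:
    """Build a settings dict; key names are assumptions, not TokForge's API."""
    return {
        "backend_type": pick_backend(soc_vendor),
        "precision": "low",
        "thread_num": 4,
    }


print(json.dumps(recommended_settings("Qualcomm"), indent=2))
```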



	## Performance

	Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:

	| Device | SoC | Backend | Speed (tok/s) |
	|---|---|---|---|
	| RedMagic 11 Pro | SM8850 | OpenCL | ~15-18 |
	| Lenovo TB520FU | SM8650 | OpenCL | ~10-12 |
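To translate these rates into waiting time, the sketch below estimates how long a reply of a given length takes. It assumes the table figures are steady generation speeds and ignores prompt processing time, so real latency will be somewhat higher; the 256-token reply length is an arbitrary example.

```python
def reply_seconds(n_tokens: int, tok_per_s_low: float, tok_per_s_high: float):
    """Return (best_case, worst_case) generation time in seconds."""
    best = n_tokens / tok_per_s_high   # fastest rate gives the shortest time
    worst = n_tokens / tok_per_s_low   # slowest rate gives the longest time
    return best, worst


# Example: a 256-token reply at the ~15-18 tok/s range from the first row.
fast, slow = reply_seconds(256, 15, 18)
print(f"256 tokens: about {fast:.1f} to {slow:.1f} seconds")
```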

	## Attribution

	This is an MNN conversion of [Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) by [Mistral AI](https://huggingface.co/mistralai). All credit for the model architecture, training, and fine-tuning goes to the original authors. This conversion only changes the runtime format for mobile deployment.

	## Limitations

	- Intended for TokForge / MNN on-device inference on Android
	- This is a runtime bundle, not a standard Transformers training checkpoint
	- Quantization (Q4) may slightly reduce quality compared to the full-precision original
	- Abliterated/uncensored models have had safety filters removed — use responsibly

	## Community

	- Website: [tokforge.ai](https://tokforge.ai)
	- Discord: [Join our Discord](https://discord.gg/Acv3CBtfVm)
	- GitHub: [TokForge on GitHub](https://github.com/darkmaniac7/Elysium)

	## Export Details

	Converted using MNN's `llmexport` pipeline:
	```bash
	python llmexport.py --path mistralai/Mistral-7B-Instruct-v0.3 --export mnn --quant_bit 4 --quant_block 128
	```