metadata
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
  - mnn
  - mistral
  - mobile
  - on-device
  - tokforge
  - uncensored
  - abliterated

Mistral-7B-Instruct-v0.3-MNN

Pre-converted Mistral 7B Instruct v0.3 in MNN format for on-device inference with TokForge.

Original model by [Mistral AI](https://huggingface.co/mistralai), converted to MNN Q4 for mobile deployment.

Model Details

| Property | Value |
| --- | --- |
| Architecture | Mistral (32 layers, GQA; sliding window attention is disabled in v0.3) |
| Parameters | 7B (4-bit quantized) |
| Format | MNN (Alibaba Mobile Neural Network) |
| Quantization | W4A16 (4-bit weights, 16-bit activations, block size 128) |
| Vocab | 32,768 tokens |
| Source | mistralai/Mistral-7B-Instruct-v0.3 |
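
To make the "4-bit weights, block size 128" entry concrete, here is a minimal sketch of block-wise 4-bit quantization in plain Python. It assumes an asymmetric scheme with one scale and one minimum stored per block; the function names are hypothetical, and MNN's actual kernels and storage layout may differ.

```python
def quantize_block(values, bits=4):
    """Map one block of floats to unsigned integer codes (assumed asymmetric scheme)."""
    levels = (1 << bits) - 1               # 15 distinct steps for 4-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0      # fall back to 1.0 for a constant block
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def quantize_row(row, block_size=128, bits=4):
    """Split a weight row into blocks and quantize each block independently."""
    return [
        quantize_block(row[start:start + block_size], bits)
        for start in range(0, len(row), block_size)
    ]


def dequantize_row(blocks):
    """Rebuild approximate float weights from (codes, scale, minimum) triples."""
    restored = []
    for codes, scale, lo in blocks:
        restored.extend(code * scale + lo for code in codes)
    return restored
```

Because every block carries its own scale, an outlier weight only coarsens the resolution of the 128 values in its block rather than the whole row. The reconstruction error per weight is at most half of that block's scale.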

Description

Mistral AI's 7B Instruct v0.3 is one of the most widely used open-weight instruction-tuned models. Version 0.3 extends the vocabulary to 32,768 tokens and adds function calling support. At 7B parameters with 4-bit weights, it offers a practical balance of output quality and generation speed on recent high-end phones.

Files

| File | Description |
| --- | --- |
| llm.mnn | Model computation graph |
| llm.mnn.weight | Quantized weight data (Q4, block=128) |
| llm_config.json | Model config with Jinja chat template |
| tokenizer.txt | Tokenizer vocabulary |
| config.json | MNN runtime config |
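
If you download the bundle manually, a quick completeness check against the file list above can save a failed load later. This is a small sketch; `missing_files` is a hypothetical helper, not part of MNN or TokForge.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = (
    "llm.mnn",
    "llm.mnn.weight",
    "llm_config.json",
    "tokenizer.txt",
    "config.json",
)


def missing_files(model_dir):
    """Return the expected bundle files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all five files are present; it does not verify that their contents are intact.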

Usage with TokForge

This model is optimized for TokForge — a free Android app for private, on-device LLM inference.

  1. Download TokForge from the Play Store
  2. Open the app → Models → Download this model
  3. Start chatting — runs 100% locally, no internet required

Recommended Settings

| Setting | Value |
| --- | --- |
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | Off (or On for thinking-capable models) |
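
The backend recommendation above can be read as a simple rule keyed on the SoC vendor. The sketch below encodes it in Python; the key names `backend_type`, `precision`, and `thread_num` are assumptions modeled on MNN-style runtime configs, and how TokForge stores these settings internally is not documented here.

```python
import json


def pick_backend(soc_vendor):
    """Choose a backend from the SoC vendor, following the table above."""
    vendor = soc_vendor.strip().lower()
    if vendor == "qualcomm":
        return "opencl"
    if vendor == "mediatek":
        return "vulkan"
    return "cpu"  # fallback for any other or unknown vendor


def runtime_settings(soc_vendor):
    """Build a settings dict; the key names are illustrative assumptions."""
    return {
        "backend_type": pick_backend(soc_vendor),
        "precision": "low",
        "thread_num": 4,
    }


if __name__ == "__main__":
    print(json.dumps(runtime_settings("Qualcomm"), indent=2))
```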

Performance

Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:

| Device | SoC | Backend | Speed (tok/s) |
| --- | --- | --- | --- |
| RedMagic 11 Pro | SM8850 | OpenCL | ~15-18 |
| Lenovo TB520FU | SM8650 | OpenCL | ~10-12 |
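
To translate these rates into waiting time, divide the reply length by the decode speed. The sketch below does that arithmetic; it covers token generation only and ignores prompt processing, which adds to the total. The 300-token reply length is an arbitrary example.

```python
def generation_seconds(num_tokens, tok_per_s_low, tok_per_s_high):
    """Return (best_case, worst_case) seconds to generate num_tokens."""
    return num_tokens / tok_per_s_high, num_tokens / tok_per_s_low


if __name__ == "__main__":
    for device, low, high in (("SM8850", 15, 18), ("SM8650", 10, 12)):
        best, worst = generation_seconds(300, low, high)
        print(f"{device}: {best:.1f}-{worst:.1f} s for a 300-token reply")
```

For a 300-token reply this gives roughly 17 to 20 seconds on the SM8850 and 25 to 30 seconds on the SM8650.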

Attribution

This is an MNN conversion of Mistral 7B Instruct v0.3 by [Mistral AI](https://huggingface.co/mistralai). All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.

Limitations

  • Intended for TokForge / MNN on-device inference on Android
  • This is a runtime bundle, not a standard Transformers training checkpoint
  • Quantization (Q4) may slightly reduce quality compared to the full-precision original
  • Abliterated/uncensored models have had safety filters removed — use responsibly

Export Details

Converted using MNN's llmexport pipeline:

python llmexport.py --path mistralai/Mistral-7B-Instruct-v0.3 --export mnn --quant_bit 4 --quant_block 128
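
The `--quant_bit 4` and `--quant_block 128` flags largely determine the size of the weight file. The sketch below estimates it under two loudly stated assumptions: each block of 128 weights stores 32 bits of metadata (for example a 16-bit scale and a 16-bit offset), and the model has about 7.25 billion parameters that are all quantized the same way. The real file can differ, for instance if the embedding or output layers are handled separately.

```python
def bits_per_weight(quant_bit=4, quant_block=128, metadata_bits_per_block=32):
    """Effective bits per weight, including assumed per-block metadata."""
    return quant_bit + metadata_bits_per_block / quant_block


def weight_bytes(num_params, bpw):
    """Approximate storage in bytes for num_params weights at bpw bits each."""
    return num_params * bpw / 8


if __name__ == "__main__":
    bpw = bits_per_weight()            # 4 + 32/128 = 4.25 bits per weight
    size = weight_bytes(7.25e9, bpw)   # parameter count is an approximation
    print(f"{bpw} bits/weight, about {size / 1e9:.2f} GB")
```

Under these assumptions the estimate comes to about 3.85 GB, which is a useful rough bound when checking whether a device has enough free RAM.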