---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
- mnn
- mistral
- mobile
- on-device
- tokforge
- uncensored
- abliterated
---

# Mistral-7B-Instruct-v0.3-MNN

Pre-converted [Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) in MNN format for on-device inference with [TokForge](https://tokforge.ai).

> **Original model by [Mistral AI](https://huggingface.co/mistralai)**, converted to MNN Q4 for mobile deployment.

## Model Details

| | |
|---|---|
| **Architecture** | Mistral (sliding window attention, 32 layers, GQA) |
| **Parameters** | 7B (4-bit quantized) |
| **Format** | MNN (Alibaba Mobile Neural Network) |
| **Quantization** | W4A16 (4-bit weights, block size 128) |
| **Vocab** | 32,768 tokens |
| **Source** | [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) |

## Description

Mistral 7B Instruct v0.3 is the instruction-tuned release of Mistral AI's 7B model. Version 0.3 adds an extended vocabulary and function calling support. It is one of the most popular and battle-tested open models, with an excellent balance of quality and speed.

## Files

| File | Description |
|------|-------------|
| `llm.mnn` | Model computation graph |
| `llm.mnn.weight` | Quantized weight data (Q4, block=128) |
| `llm_config.json` | Model config with Jinja chat template |
| `tokenizer.txt` | Tokenizer vocabulary |
| `config.json` | MNN runtime config |

## Usage with TokForge

This model is optimized for **[TokForge](https://tokforge.ai)**, a free Android app for private, on-device LLM inference.

1. Download [TokForge from the Play Store](https://tokforge.ai)
2. Open the app → Models → Download this model
3. Start chatting. Inference runs 100% locally, with no internet connection required

### Recommended Settings

| Setting | Value |
|---------|-------|
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | Off (or On for thinking-capable models) |

## Performance

Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:

| Device | SoC | Backend | tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 | OpenCL | ~15-18 |
| Lenovo TB520FU | SM8650 | OpenCL | ~10-12 |

## Attribution

This is an MNN conversion of **[Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)** by **[Mistral AI](https://huggingface.co/mistralai)**. All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.

## Limitations

- Intended for TokForge / MNN on-device inference on Android
- This is a runtime bundle, not a standard Transformers training checkpoint
- Quantization (Q4) may slightly reduce quality compared to the full-precision original
- Abliterated/uncensored models have had safety filters removed: **use responsibly**

## Community

- **Website:** [tokforge.ai](https://tokforge.ai)
- **Discord:** [Join our Discord](https://discord.gg/Acv3CBtfVm)
- **GitHub:** [TokForge on GitHub](https://github.com/darkmaniac7/Elysium)

## Export Details

Converted using MNN's `llmexport` pipeline:

```bash
python llmexport.py --path mistralai/Mistral-7B-Instruct-v0.3 --export mnn --quant_bit 4 --quant_block 128
```
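
To make the `--quant_bit 4 --quant_block 128` options more concrete, here is a minimal Python sketch of block-wise 4-bit weight quantization. It is an illustration only and not MNN's actual implementation: the asymmetric min/max scheme and the function names (`quantize_block`, `quantize_row`, `dequantize_row`) are assumptions chosen for clarity. The idea it shows is that each block of 128 weights gets its own scale and offset, so an outlier only affects the precision of its own block.

```python
# Illustrative sketch of block-wise 4-bit quantization.
# Assumption: asymmetric min/max quantization per block; MNN's real
# kernels and on-disk layout may differ.

def quantize_block(values, bits=4):
    """Map one block of floats to unsigned integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1              # 15 for 4-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def quantize_row(row, block=128, bits=4):
    """Split a weight row into blocks, each with its own scale and offset."""
    return [quantize_block(row[i:i + block], bits)
            for i in range(0, len(row), block)]


def dequantize_row(blocks):
    """Reconstruct approximate float weights from the quantized blocks."""
    restored = []
    for codes, scale, lo in blocks:
        restored.extend(c * scale + lo for c in codes)
    return restored
```

For a row of 4,096 weights this produces 32 blocks, and the reconstruction error of any weight is at most half of its block's scale. That bounded, per-block error is the reason Q4 output can differ slightly from the full-precision original.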