| --- |
| license: apache-2.0 |
| language: |
| - en |
| pipeline_tag: text-generation |
| base_model: mistralai/Mistral-7B-Instruct-v0.3 |
| tags: |
| - mnn |
| - mistral |
| - mobile |
| - on-device |
| - tokforge |
| - uncensored |
| - abliterated |
| --- |
| |
| # Mistral-7B-Instruct-v0.3-MNN |
|
|
| Pre-converted [Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) in MNN format for on-device inference with [TokForge](https://tokforge.ai). |
|
|
| > **Original model by [Mistral AI](https://huggingface.co/mistralai)**, converted to MNN Q4 for mobile deployment. |
|
|
| ## Model Details |
|
|
| | | | |
| |---|---| |
| | **Architecture** | Mistral (32 layers, GQA) | |
| | **Parameters** | 7B (4-bit quantized) | |
| | **Format** | MNN (Alibaba Mobile Neural Network) | |
| | **Quantization** | W4A16 (4-bit weights, block size 128) | |
| | **Vocab** | 32,768 tokens | |
| | **Source** | [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) | |
|
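The W4A16 entry above means weights are stored as 4-bit integer codes in blocks of 128, each block carrying its own scaling information, while activations stay in 16-bit floats. The sketch below illustrates the general idea with a simple min/max scheme in plain Python. It is an assumption-laden illustration, not MNN's actual quantizer: the function names are hypothetical, and MNN's rounding and bit packing may differ.

```python
import math

# Illustrative block-wise 4-bit quantization (asymmetric, min/max based).
# Assumption: this mirrors only the general idea of "4-bit weights, block size 128";
# it is not MNN's real packing or rounding scheme.

BLOCK_SIZE = 128  # weights per block, as listed in the table above
LEVELS = 15       # 4 bits give integer codes 0..15


def quantize_block(weights):
    """Map one block of floats to 4-bit codes plus a (scale, minimum) pair."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS or 1.0  # fall back to 1.0 for constant blocks
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_block(codes, scale, lo):
    """Recover approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]


def quantize(weights, block_size=BLOCK_SIZE):
    """Quantize a flat list of weights block by block."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


# Usage: quantize 256 synthetic weights (two blocks) and measure the round-trip error.
weights = [math.sin(i / 10) for i in range(256)]
blocks = quantize(weights)
restored = [x for codes, scale, lo in blocks for x in dequantize_block(codes, scale, lo)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"blocks={len(blocks)} max_abs_error={max_err:.4f}")
```

In this scheme the round-trip error per weight is bounded by half the block's scale, which is why smaller blocks (more scales) generally cost more storage but lose less precision.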
|
| ## Description |
|
|
| Mistral AI's 7B Instruct v0.3 is an instruction-tuned 7B model and one of the most widely used open-weight LLMs. Compared with v0.2, v0.3 extends the vocabulary to 32,768 tokens and adds function calling support. In this Q4 MNN build it offers a practical balance of quality and speed for on-device use. |
|
|
| ## Files |
|
|
| | File | Description | |
| |------|-------------| |
| | `llm.mnn` | Model computation graph | |
| | `llm.mnn.weight` | Quantized weight data (Q4, block=128) | |
| | `llm_config.json` | Model config with Jinja chat template | |
| | `tokenizer.txt` | Tokenizer vocabulary | |
| | `config.json` | MNN runtime config | |
|
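If you download the bundle manually rather than through the app, a quick check that all five files from the table above are present can save a failed load. This is a minimal sketch: the helper name `missing_files` and the directory path are hypothetical, and only the file names come from this card.

```python
from pathlib import Path

# File names taken from the Files table above.
EXPECTED_FILES = (
    "llm.mnn",
    "llm.mnn.weight",
    "llm_config.json",
    "tokenizer.txt",
    "config.json",
)


def missing_files(model_dir):
    """Return the expected bundle files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Usage: the directory name below is a placeholder for wherever the bundle was saved.
missing = missing_files("./Mistral-7B-Instruct-v0.3-MNN")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Bundle looks complete.")
```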
|
| ## Usage with TokForge |
|
|
| This model is optimized for **[TokForge](https://tokforge.ai)**, a free Android app for private, on-device LLM inference. |
|
|
| 1. Download [TokForge from the Play Store](https://tokforge.ai) |
| 2. Open the app, go to Models, and download this model |
| 3. Start chatting: once downloaded, the model runs 100% locally, no internet required |
|
|
| ### Recommended Settings |
|
|
| | Setting | Value | |
| |---------|-------| |
| | Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) | |
| | Precision | Low | |
| | Threads | 4 | |
| | Thinking | Off (this model has no thinking mode) | |
|
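The backend choice in the table above depends on the SoC vendor. As a rough sketch, the selection logic can be written as a small lookup with a CPU fallback. The key names `backend_type`, `precision`, and `thread_num` are assumed here to follow the MNN LLM runtime `config.json` convention; TokForge may expose these settings only through its UI, and the function name is hypothetical.

```python
import json

# Assumption: key names follow the MNN LLM runtime config.json convention.
# TokForge itself may manage these values through its settings screen instead.
BACKEND_BY_VENDOR = {"qualcomm": "opencl", "mediatek": "vulkan"}


def recommended_settings(soc_vendor):
    """Return the recommended settings for a SoC vendor, falling back to CPU."""
    backend = BACKEND_BY_VENDOR.get(soc_vendor.lower(), "cpu")
    return {"backend_type": backend, "precision": "low", "thread_num": 4}


# Usage: print the settings for a Qualcomm device as JSON.
print(json.dumps(recommended_settings("Qualcomm"), indent=2))
```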
|
|
| ## Performance |
|
|
| Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size: |
|
|
| | Device | SoC | Backend | Speed (tok/s) | |
| |---|---|---|---| |
| | RedMagic 11 Pro | SM8850 | OpenCL | ~15-18 | |
| | Lenovo TB520FU | SM8650 | OpenCL | ~10-12 | |
|
|
| ## Attribution |
|
|
| This is an MNN conversion of **[Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)** by **[Mistral AI](https://huggingface.co/mistralai)**. All credit for the model architecture, training, and fine-tuning goes to the original authors. This conversion only changes the runtime format for mobile deployment. |
|
|
| ## Limitations |
|
|
| - Intended for TokForge / MNN on-device inference on Android |
| - This is a runtime bundle, not a standard Transformers training checkpoint |
| - Quantization (Q4) may slightly reduce quality compared to the full-precision original |
| - Abliterated/uncensored models have had safety filters removed; **use responsibly** |
|
|
| ## Community |
|
|
| - **Website:** [tokforge.ai](https://tokforge.ai) |
| - **Discord:** [Join our Discord](https://discord.gg/Acv3CBtfVm) |
| - **GitHub:** [TokForge on GitHub](https://github.com/darkmaniac7/Elysium) |
|
|
| ## Export Details |
|
|
| Converted using MNN's `llmexport` pipeline: |
| ```bash |
| python llmexport.py --path mistralai/Mistral-7B-Instruct-v0.3 --export mnn --quant_bit 4 --quant_block 128 |
| ``` |
|
|