---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
- mnn
- mistral
- mobile
- on-device
- tokforge
- uncensored
- abliterated
---
# Mistral-7B-Instruct-v0.3-MNN
Pre-converted [Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) in MNN format for on-device inference with [TokForge](https://tokforge.ai).
> **Original model by [Mistral AI](https://huggingface.co/mistralai)**, converted to MNN Q4 for mobile deployment.
## Model Details
| | |
|---|---|
| **Architecture** | Mistral (32 layers, GQA, 32k context; no sliding-window attention since v0.2) |
| **Parameters** | 7B (4-bit quantized) |
| **Format** | MNN (Alibaba Mobile Neural Network) |
| **Quantization** | W4A16 (4-bit weights, block size 128) |
| **Vocab** | 32,768 tokens |
| **Source** | [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) |
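The quantization row can be turned into a rough estimate of the weight file size. The sketch below assumes about 7.25 billion parameters, that every weight is quantized to 4 bits, and that each 128-weight block stores one fp16 scale plus one fp16 zero point. None of these assumptions come from MNN's source, so the actual `llm.mnn.weight` size will differ (for example, if embeddings are stored at a different precision).

```bash
#!/usr/bin/env bash
# Back-of-envelope size estimate for blockwise 4-bit weight quantization.
# ASSUMPTIONS (not taken from MNN): ~7.25e9 parameters, all weights quantized,
# and 32 bits of metadata (fp16 scale + fp16 zero point) per 128-weight block.
params=7250000000
bits=4
block=128
meta_bits=32

# Effective storage cost per weight, including per-block metadata.
bits_per_weight() {
  awk -v b="$bits" -v blk="$block" -v m="$meta_bits" 'BEGIN { print b + m / blk }'
}

# Total estimated bytes for the quantized weights.
estimate_bytes() {
  awk -v p="$params" -v b="$bits" -v blk="$block" -v m="$meta_bits" \
    'BEGIN { printf "%.0f\n", p * (b + m / blk) / 8 }'
}

bytes=$(estimate_bytes)
echo "bits per weight: $(bits_per_weight)"
echo "estimated weight data: ${bytes} bytes (~$(awk -v x="$bytes" 'BEGIN { printf "%.2f", x / 1e9 }') GB)"
```

Under these assumptions the metadata adds 0.25 bits per weight, which is why a block size of 128 is a common trade-off: smaller blocks track weight ranges more closely but cost more metadata.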
## Description
Mistral AI's 7B Instruct v0.3, one of the most widely used open-weight 7B instruction-tuned models. Compared with v0.2, v0.3 extends the vocabulary to 32,768 tokens and adds function-calling support. It offers a good balance of output quality and inference speed for its size.
## Files
| File | Description |
|------|-------------|
| `llm.mnn` | Model computation graph |
| `llm.mnn.weight` | Quantized weight data (Q4, block=128) |
| `llm_config.json` | Model config with Jinja chat template |
| `tokenizer.txt` | Tokenizer vocabulary |
| `config.json` | MNN runtime config |
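A partially downloaded bundle is a common cause of load failures. As a minimal sketch, the shell function below checks that a directory holds every file in the table above and that none of them is empty; the directory you pass in is up to you, and the function name `check_bundle` is hypothetical.

```bash
#!/usr/bin/env bash
# Verify that an MNN bundle directory contains every expected file, non-empty.
required_files=(llm.mnn llm.mnn.weight llm_config.json tokenizer.txt config.json)

check_bundle() {
  local dir="$1" missing=0 f
  for f in "${required_files[@]}"; do
    # -s: file exists and has a size greater than zero
    if [[ ! -s "$dir/$f" ]]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run directly with a directory argument: ./check_bundle.sh path/to/bundle
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  check_bundle "$1"
fi
```

The function returns non-zero if anything is missing, so it can gate a script, e.g. `check_bundle ./Mistral-7B-Instruct-v0.3-MNN && echo "bundle looks complete"`.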
## Usage with TokForge
This model is optimized for **[TokForge](https://tokforge.ai)** — a free Android app for private, on-device LLM inference.
1. Download [TokForge from the Play Store](https://tokforge.ai)
2. Open the app → Models → Download this model
3. Start chatting. Inference runs 100% locally; no internet connection is required once the model is downloaded.
### Recommended Settings
| Setting | Value |
|---------|-------|
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | Off (this model has no dedicated thinking mode) |
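If you run the bundle with MNN directly instead of through TokForge, the same settings would normally live in the runtime `config.json`. The sketch below writes such a file; it assumes the key names `llm_model`, `llm_weight`, `backend_type`, `thread_num` and `precision` from MNN's LLM documentation, so check them against the MNN version you use. It writes to a path you choose, so it does not overwrite the shipped `config.json` unless you ask it to.

```bash
#!/usr/bin/env bash
# Sketch: write an MNN LLM runtime config mirroring the recommended settings.
# ASSUMPTION: key names follow the MNN LLM docs; verify for your MNN version.
# Usage: write_config OUTPUT_PATH [BACKEND] [THREADS]
write_config() {
  local out="$1" backend="${2:-opencl}" threads="${3:-4}"
  cat > "$out" <<EOF
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "${backend}",
  "thread_num": ${threads},
  "precision": "low"
}
EOF
}
```

For example, `write_config config.cpu.json cpu 4` produces a CPU-fallback variant alongside the default file.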
## Performance
Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:
| Device | SoC | Backend | Generation speed (tok/s) |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 | OpenCL | ~15-18 |
| Lenovo TB520FU | SM8650 | OpenCL | ~10-12 |
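One way to read these rates is as wall-clock time for a reply of a given length. The sketch below does that arithmetic for a 512-token reply (an arbitrary example length). It ignores prompt prefill time and thermal throttling, so real runs will take somewhat longer.

```bash
#!/usr/bin/env bash
# Rough generation time for N tokens at a steady decode rate (tokens/second).
# Excludes prefill and throttling, so treat the result as a lower bound.
gen_seconds() {
  local tokens="$1" rate="$2"
  awk -v n="$tokens" -v r="$rate" 'BEGIN { printf "%.1f\n", n / r }'
}

# Low and high ends of the ranges in the table above.
for rate in 10 12 15 18; do
  echo "512 tokens at ${rate} tok/s: ~$(gen_seconds 512 "$rate") s"
done
```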
## Attribution
This is an MNN conversion of **[Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)** by **[Mistral AI](https://huggingface.co/mistralai)**. All credit for the model architecture, training, and fine-tuning goes to the original authors. This conversion only changes the runtime format for mobile deployment.
## Limitations
- Intended for TokForge / MNN on-device inference on Android
- This is a runtime bundle, not a standard Transformers training checkpoint
- Quantization (Q4) may slightly reduce quality compared to the full-precision original
- Abliterated/uncensored models have had safety filters removed — **use responsibly**
## Community
- **Website:** [tokforge.ai](https://tokforge.ai)
- **Discord:** [Join our Discord](https://discord.gg/Acv3CBtfVm)
- **GitHub:** [TokForge on GitHub](https://github.com/darkmaniac7/Elysium)
## Export Details
Converted using MNN's `llmexport` pipeline:
```bash
python llmexport.py --path mistralai/Mistral-7B-Instruct-v0.3 --export mnn --quant_bit 4 --quant_block 128
```