---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
  - mnn
  - mistral
  - mobile
  - on-device
  - tokforge
  - uncensored
  - abliterated
---

# Mistral-7B-Instruct-v0.3-MNN

Pre-converted [Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) in MNN format for on-device inference with [TokForge](https://tokforge.ai).

> **Original model by [Mistral AI](https://huggingface.co/mistralai)**, converted to MNN Q4 for mobile deployment.

## Model Details

| | |
|---|---|
| **Architecture** | Mistral (32 layers, grouped-query attention, 32k context) |
| **Parameters** | 7B (4-bit quantized) |
| **Format** | MNN (Alibaba Mobile Neural Network) |
| **Quantization** | W4A16 (4-bit weights, block size 128) |
| **Vocab** | 32,768 tokens |
| **Source** | [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) |
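To make the W4A16 / block-size-128 entry concrete, here is a minimal, generic sketch of block-wise asymmetric 4-bit weight quantization: every block of 128 weights is mapped to integer codes 0..15 with one scale and one minimum per block. This illustrates the general technique only and is not MNN's actual quantizer. The function names and the 32 bits of per-block metadata (for example, a 16-bit scale plus a 16-bit minimum) are assumptions for the example. Activations (the "A16" part) stay in 16-bit floating point at runtime and are not modeled here.

```python
import random
from typing import List, Tuple

BLOCK_SIZE = 128  # matches --quant_block 128 used for this bundle
QMAX = 2**4 - 1   # 4-bit unsigned codes span 0..15

Block = Tuple[List[int], float, float]  # (codes, scale, minimum)


def quantize_block(values: List[float]) -> Block:
    """Map one block of floats to 4-bit codes plus a per-block scale and minimum."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / QMAX or 1.0  # constant block: any nonzero scale works
    codes = [min(QMAX, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def quantize(weights: List[float], block_size: int = BLOCK_SIZE) -> List[Block]:
    """Split a flat weight vector into blocks and quantize each independently."""
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]


def dequantize(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate float weights: value = minimum + code * scale."""
    out: List[float] = []
    for codes, scale, lo in blocks:
        out.extend(lo + c * scale for c in codes)
    return out


def bits_per_weight(bits: int = 4, block_size: int = BLOCK_SIZE,
                    metadata_bits: int = 32) -> float:
    """Storage per weight: code bits plus per-block metadata amortized over the block.

    metadata_bits=32 is an assumption (a 16-bit scale and a 16-bit minimum per block).
    """
    return bits + metadata_bits / block_size


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(1024)]
    restored = dequantize(quantize(weights))
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"max abs error: {max_err:.6f}")
    print(f"bits per weight: {bits_per_weight()}")
```

Rounding to the nearest code keeps each reconstructed weight within half a step (`scale / 2`) of the original, and a smaller block size tightens each block's range at the cost of more metadata. Under the metadata assumption above, storage works out to about 4.25 bits per weight; the real figure depends on how MNN stores scales and which layers it quantizes.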

## Description

Mistral AI's 7B Instruct v0.3 is the instruction-tuned version of Mistral 7B v0.3. Compared with v0.2, it extends the vocabulary to 32,768 tokens and adds function calling support. It is one of the most widely used open-weight models and offers a good balance of quality and speed for its size.

## Files

| File | Description |
|------|-------------|
| `llm.mnn` | Model computation graph |
| `llm.mnn.weight` | Quantized weight data (Q4, block=128) |
| `llm_config.json` | Model config with Jinja chat template |
| `tokenizer.txt` | Tokenizer vocabulary |
| `config.json` | MNN runtime config |

## Usage with TokForge

This model is optimized for **[TokForge](https://tokforge.ai)** — a free Android app for private, on-device LLM inference.

1. Install TokForge from the Play Store (download link at [tokforge.ai](https://tokforge.ai))
2. Open the app → Models → Download this model
3. Start chatting — runs 100% locally, no internet required

### Recommended Settings

| Setting | Value |
|---------|-------|
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | Off (Mistral 7B Instruct v0.3 has no thinking mode) |
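If you run the bundle with MNN directly instead of through TokForge, these settings live in the bundled `config.json`. The sketch below merges them into that file while keeping its other entries. The key names (`backend_type`, `thread_num`, `precision`) reflect our reading of MNN's LLM runtime documentation and may differ between MNN versions; `apply_settings` is a hypothetical helper, not part of MNN. MNN may also interpret `thread_num` differently on GPU backends such as OpenCL, so check its docs before reusing `4` there.

```python
import json
from pathlib import Path

# CPU-fallback values from the Recommended Settings table, expressed as MNN LLM
# config keys. The key names are an assumption; verify them for your MNN version.
RECOMMENDED = {"backend_type": "cpu", "thread_num": 4, "precision": "low"}


def apply_settings(config_path: str, **overrides) -> dict:
    """Merge the recommended runtime settings (plus any overrides) into config.json.

    Keys not being set here, such as the model and weight file names, are preserved.
    """
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.update({**RECOMMENDED, **overrides})
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `apply_settings("path/to/config.json", backend_type="opencl")` would target the GPU on a Qualcomm device while keeping `precision` at `low`.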



## Performance

Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:

| Device | SoC | Backend | Speed (tok/s) |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 | OpenCL | ~15-18 |
| Lenovo TB520FU | SM8650 | OpenCL | ~10-12 |
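As a rough planning aid, reply time scales inversely with generation speed. This sketch assumes the tok/s figures in the table are steady-state generation speeds and ignores prompt processing (prefill) and throttling, so real replies will take somewhat longer; `decode_seconds` is a hypothetical helper.

```python
def decode_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Approximate wall-clock time to generate num_tokens at a steady decode rate.

    Ignores prompt processing (prefill) and thermal throttling.
    """
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return num_tokens / tok_per_s


# Speed ranges (tok/s) taken from the Performance table.
DEVICES = [("RedMagic 11 Pro", 15, 18), ("Lenovo TB520FU", 10, 12)]

if __name__ == "__main__":
    for name, slow, fast in DEVICES:
        best, worst = decode_seconds(300, fast), decode_seconds(300, slow)
        print(f"{name}: ~{best:.0f}-{worst:.0f} s for a 300-token reply")
```

For a 300-token reply this works out to roughly 17-20 s on the RedMagic 11 Pro and 25-30 s on the Lenovo TB520FU.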

## Attribution

This is an MNN conversion of **[Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)** by **[Mistral AI](https://huggingface.co/mistralai)**. All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.

## Limitations

- Intended for TokForge / MNN on-device inference on Android
- This is a runtime bundle, not a standard Transformers training checkpoint
- Quantization (Q4) may slightly reduce quality compared to the full-precision original
- The original model has no built-in moderation mechanisms, and abliterated/uncensored models have had safety filters removed; **use responsibly**

## Community

- **Website:** [tokforge.ai](https://tokforge.ai)
- **Discord:** [Join our Discord](https://discord.gg/Acv3CBtfVm)
- **GitHub:** [TokForge on GitHub](https://github.com/darkmaniac7/Elysium)

## Export Details

Converted using MNN's `llmexport` pipeline:
```bash
python llmexport.py --path mistralai/Mistral-7B-Instruct-v0.3 --export mnn --quant_bit 4 --quant_block 128
```
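After exporting, a quick sanity check is to confirm that the output directory contains the five files listed under Files. The script below is a hypothetical convenience helper, not part of MNN, and its file list comes from this card; depending on the MNN version, `llmexport` may write additional files, which the check simply ignores.

```python
import sys
from pathlib import Path

# File names from the Files table of this model card.
EXPECTED_FILES = ("llm.mnn", "llm.mnn.weight", "llm_config.json",
                  "tokenizer.txt", "config.json")


def missing_files(bundle_dir: str) -> list:
    """Return the expected bundle files that are not present in bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Usage: python check_bundle.py <export-output-dir>  (defaults to the current dir)
    missing = missing_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("bundle complete" if not missing else f"missing: {', '.join(missing)}")
```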