---
license: apache-2.0
tags:
- mnn
- qwen3
- mobile
- on-device
- tokforge
- abliterated
base_model: Qwen/Qwen3-8B
---

# Qwen3-8B-abliterated-v2 (MNN)

An abliterated [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) model, pre-converted to MNN format for on-device inference.

## Model Details
- **Architecture:** Qwen3 (standard attention, 36 layers)
- **Parameters:** 8B (4-bit quantized)
- **Format:** MNN (Alibaba Mobile Neural Network)
- **Vocab:** 151,936 tokens
- **Quantization:** W4A16 (4-bit weights, 16-bit activations)

## Files
| File | Size | Description |
|------|------|-------------|
| `llm.mnn` | 631KB | Model graph |
| `llm.mnn.weight` | 4.4GB | Quantized weights |
| `embeddings_bf16.bin` | 1.2GB | BF16 embedding table (required) |
| `llm_config.json` | 4.5KB | Model config with Jinja chat template |
| `tokenizer.txt` | 3.0MB | Tokenizer |
| `config.json` | 210B | MNN runtime config |

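As a rough sanity check on the file sizes, the size of `embeddings_bf16.bin` can be estimated from the vocabulary size. The sketch below assumes a hidden size of 4096 (taken from the upstream Qwen3-8B configuration, not stated in this card) and 2 bytes per BF16 value. The constant and function names are illustrative only.

```python
# Estimate the raw size of a dense BF16 embedding table.
VOCAB_SIZE = 151_936   # vocabulary size listed under Model Details
HIDDEN_SIZE = 4_096    # assumption: hidden size of upstream Qwen3-8B
BYTES_PER_BF16 = 2     # bfloat16 is a 16-bit format


def embedding_table_bytes(vocab_size: int, hidden_size: int,
                          bytes_per_value: int = BYTES_PER_BF16) -> int:
    """Return the size in bytes of a vocab_size x hidden_size table."""
    return vocab_size * hidden_size * bytes_per_value


size_bytes = embedding_table_bytes(VOCAB_SIZE, HIDDEN_SIZE)
print(f"{size_bytes:,} bytes = {size_bytes / 1e9:.2f} GB")
```

Under these assumptions the estimate comes to roughly 1.24 GB, which is consistent with the 1.2GB listed for the embedding file.
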
## Usage with TokForge
This model is optimized for [TokForge](https://tokforge.ai), an Android app for on-device LLM inference.

### Performance (Speculative Decoding)
| Device | SoC | Backend | Autoregressive (tok/s) | Speculative decoding (tok/s) | Uplift |
|--------|-----|---------|------------------------|------------------------------|--------|
| S26 Ultra | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| RedMagic 11 Pro | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| Lenovo TB520FU | SM8650 | OpenCL | 9.9 | 12.2 | +23% |

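The uplift column is the relative gain of speculative decoding over plain autoregressive decoding. A minimal sketch of that arithmetic, treating the approximate "~14" baseline as exactly 14.0 (an assumption made only for this calculation):

```python
def uplift_percent(baseline_tps: float, spec_tps: float) -> float:
    """Relative throughput gain of speculative decoding, in percent."""
    return (spec_tps / baseline_tps - 1.0) * 100.0


# (device, autoregressive tok/s, speculative decoding tok/s) from the table.
rows = [
    ("S26 Ultra", 14.0, 17.8),        # baseline listed as ~14
    ("RedMagic 11 Pro", 14.0, 17.8),  # baseline listed as ~14
    ("Lenovo TB520FU", 9.9, 12.2),
]

for device, ar_tps, spec_tps in rows:
    print(f"{device}: +{uplift_percent(ar_tps, spec_tps):.0f}%")
```

Because two of the baselines are approximate, the derived percentages for those rows are approximate as well.
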
Draft model: [Qwen3-0.6B](https://huggingface.co/darkmaniac7/TokForge-AccelerationPack-Draft)

## Abliteration
This model has been abliterated (its built-in refusal behavior has been removed) for unrestricted conversation. Use responsibly.

## Limitations and Intended Use

- Intended for TokForge / MNN on-device inference, especially on Android phones and tablets.
- The best uplift known for this model comes from pairing it with a small draft model running on the CPU for speculative decoding.
- Real throughput varies by SoC, thermal state, backend, and generation length.
- This repo is a runtime bundle, not a standard Transformers training checkpoint.

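To illustrate why a small draft model can speed up generation without changing the output, here is a toy sketch of greedy speculative decoding. It is not TokForge's or MNN's implementation: `target_next` and `draft_next` are hypothetical stand-ins for the 8B model and the draft model, and a real runtime would verify all proposed tokens in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def target_next(seq: List[int]) -> int:
    """Hypothetical 'large' model: deterministic next-token rule."""
    return (seq[-1] * 7 + 3) % 11


def draft_next(seq: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target most of the time."""
    guess = (seq[-1] * 7 + 3) % 11
    return guess if len(seq) % 4 else (guess + 1) % 11


def greedy_decode(next_token: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive decoding: one model call per new token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding with k drafted tokens per round."""
    seq = list(prompt)
    goal = len(prompt) + n
    while len(seq) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed token in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch, stop
                break
            accepted.append(tok)
        else:
            # Every proposal matched: the target adds one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal]


prompt = [1, 2, 3]
fast = speculative_decode(target_next, draft_next, prompt, 20)
slow = greedy_decode(target_next, prompt, 20)
print(fast == slow)
```

In this greedy variant every emitted token is one the target model would have chosen itself, so the result matches plain decoding. The speedup depends on how often the draft's guesses are accepted.
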
## Community

- Website: [tokforge.ai](https://tokforge.ai)
- Discord: [Join the Discord](https://discord.gg/Acv3CBtfVm)

## Export
Converted using MNN's `llmexport` pipeline with `--quant_bit 4 --quant_block 128`.

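To give an intuition for the two export flags, the sketch below shows a generic block-wise 4-bit quantizer in which each group of 128 weights shares one scale and one offset. It assumes a simple asymmetric min/max scheme and is not MNN's actual quantizer or on-disk packing. All function names are illustrative.

```python
import math
from typing import List, Tuple

Block = Tuple[List[int], float, float]  # (codes, scale, offset)


def quantize_block(block: List[float], bits: int = 4) -> Block:
    """Map one block of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(block), max(block)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in block]
    return codes, scale, lo


def quantize(weights: List[float], block_size: int = 128, bits: int = 4) -> List[Block]:
    """Quantize a flat weight list block by block."""
    return [quantize_block(weights[i:i + block_size], bits)
            for i in range(0, len(weights), block_size)]


def dequantize(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate floats from codes, scales, and offsets."""
    return [lo + c * scale for codes, scale, lo in blocks for c in codes]


weights = [math.sin(0.1 * i) for i in range(256)]  # toy data, two blocks
blocks = quantize(weights, block_size=128, bits=4)
restored = dequantize(blocks)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{len(blocks)} blocks, max abs error {max_err:.4f}")
```

Smaller blocks track local weight ranges more closely at the cost of storing more scales and offsets, which is the trade-off a block-size setting controls in schemes of this kind.
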