---
license: apache-2.0
tags:
- mnn
- qwen3
- mobile
- on-device
- tokforge
- abliterated
base_model: Qwen/Qwen3-8B
---

# Qwen3-8B-abliterated-v2 (MNN)

Pre-converted [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) abliterated model in MNN format for on-device inference.

## Model Details

- **Architecture:** Qwen3 (standard attention, 36 layers)
- **Parameters:** 8B (4-bit quantized)
- **Format:** MNN (Alibaba Mobile Neural Network)
- **Vocab:** 151,936 tokens
- **Quantization:** W4A16 (4-bit weights, 16-bit activations)

## Files

| File | Size | Description |
|------|------|-------------|
| `llm.mnn` | 631KB | Model graph |
| `llm.mnn.weight` | 4.4GB | Quantized weights |
| `embeddings_bf16.bin` | 1.2GB | BF16 embedding table (required) |
| `llm_config.json` | 4.5KB | Model config with jinja chat template |
| `tokenizer.txt` | 3.0MB | Tokenizer |
| `config.json` | 210B | MNN runtime config |

## Usage with TokForge

This model is optimized for [TokForge](https://tokforge.ai), an Android app for on-device LLM inference.

### Performance (Speculative Decoding)

| Device | SoC | Backend | AR tok/s | Spec Decode tok/s | Uplift |
|--------|-----|---------|----------|-------------------|--------|
| S26 Ultra | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| RedMagic 11 Pro | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| Lenovo TB520FU | SM8650 | OpenCL | 9.9 | 12.2 | +23% |

Draft model: [Qwen3-0.6B](https://huggingface.co/darkmaniac7/TokForge-AccelerationPack-Draft)

## Abliteration

This model has been abliterated (safety filters removed) for unrestricted conversation. Use responsibly.

## Limitations and Intended Use

- Intended for TokForge / MNN on-device inference, especially Android phones and tablets.
- The best-known uplift for this model comes from pairing it with a small CPU draft model for speculative decoding.
- Real throughput varies by SoC, thermal state, backend, and generation length.
- This repo is a runtime bundle, not a standard Transformers training checkpoint.

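
The Uplift column in the performance table is the relative gain of speculative-decoding throughput over plain autoregressive (AR) throughput. A minimal sketch of that arithmetic follows, so you can compare your own measurements against the table. The helper name `uplift_percent` is hypothetical, and the `~14` tok/s entries are treated as exactly 14.0 here, which is an assumption.

```python
def uplift_percent(ar_tok_s: float, spec_tok_s: float) -> float:
    """Relative throughput gain of speculative decoding over AR decoding, in percent."""
    return (spec_tok_s / ar_tok_s - 1.0) * 100.0


# Figures copied from the performance table; the ~14 tok/s rows are approximate,
# so 14.0 is an assumed value.
rows = [
    ("S26 Ultra", 14.0, 17.8),
    ("RedMagic 11 Pro", 14.0, 17.8),
    ("Lenovo TB520FU", 9.9, 12.2),
]

for device, ar, spec in rows:
    print(f"{device}: +{uplift_percent(ar, spec):.0f}%")
```

Since throughput depends on thermal state and generation length, it is probably worth averaging several runs before computing the ratio.
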
## Community

- Website: [tokforge.ai](https://tokforge.ai)
- Discord: [Join the Discord](https://discord.gg/Acv3CBtfVm)

## Export

Converted using MNN's `llmexport` pipeline with `--quant_bit 4 --quant_block 128`.
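
To give a feel for what 4-bit weights with a block size of 128 mean, the sketch below quantizes a row of weights in blocks of 128 values, each block with its own scale and offset. This is a conceptual illustration only: the function names are hypothetical, the asymmetric min/max scheme is an assumption, and MNN's actual quantizer and packing format may differ.

```python
import math


def quantize_block(values, bits=4):
    """Asymmetric min/max quantization of one block (assumed scheme, not MNN's code)."""
    levels = (1 << bits) - 1  # 15 distinct steps for 4-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_block(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


def quantize_row(row, block=128, bits=4):
    """Split a row into fixed-size blocks and quantize each one independently."""
    return [quantize_block(row[i:i + block], bits) for i in range(0, len(row), block)]


# Toy "weights": 256 values, so two blocks of 128.
row = [math.sin(i * 0.1) for i in range(256)]
blocks = quantize_row(row)

restored = [w for codes, scale, lo in blocks for w in dequantize_block(codes, scale, lo)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"blocks: {len(blocks)}, max reconstruction error: {max_err:.4f}")
```

Smaller blocks track local weight ranges more closely at the cost of storing more per-block scales, which is the trade-off a block-size setting controls.
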