---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model: Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2
tags:
- mnn
- qwen3
- mobile
- on-device
- tokforge
- uncensored
- abliterated
---
# Josiefied-Qwen3-4B-abliterated-v2-MNN
Pre-converted [Josiefied-Qwen3-4B-abliterated-v2](https://huggingface.co/Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2) in MNN format for on-device inference with [TokForge](https://tokforge.ai).
> **Original model by [Goekdeniz-Guelmez](https://huggingface.co/Goekdeniz-Guelmez)** — converted to MNN Q4 for mobile deployment.
## Model Details
| | |
|---|---|
| **Architecture** | Qwen3 (grouped-query attention, 36 layers) |
| **Parameters** | 4B (4-bit quantized) |
| **Format** | MNN (Alibaba Mobile Neural Network) |
| **Quantization** | W4A16 (4-bit weights, block size 128) |
| **Vocab** | 151,936 tokens |
| **Source** | [Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2](https://huggingface.co/Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2) |
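
To make the W4A16 row concrete, here is a minimal Python sketch of block-wise 4-bit weight quantization. It assumes an asymmetric min/max scheme with one scale and one offset per block of 128 weights; MNN's actual encoding may differ, and the function names are illustrative only.

```python
import random

BLOCK_SIZE = 128   # matches the block size listed in the table above
LEVELS = 15        # 4 bits give integer codes 0..15

def quantize_block(block):
    """Map one block of floats to 4-bit codes plus a per-block scale and offset."""
    lo, hi = min(block), max(block)
    scale = (hi - lo) / LEVELS or 1.0   # fall back to 1.0 for a constant block
    codes = [round((w - lo) / scale) for w in block]
    return codes, scale, lo

def dequantize_block(codes, scale, lo):
    """Recover approximate float weights from the 4-bit codes."""
    return [lo + c * scale for c in codes]

# Synthetic weights, for illustration only (not taken from the model).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]
codes, scale, lo = quantize_block(weights)
restored = dequantize_block(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

In this scheme the round-trip error per weight is bounded by half a quantization step (`scale / 2`), which is the source of the small quality loss that Q4 conversion can introduce.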
## Description
Josiefied abliterated v2 by Goekdeniz Guelmez is a refined 4B Qwen3 fine-tune with its safety filters abliterated. The v2 iteration improves on the original with more thorough uncensoring and better instruction following. It offers a good balance of speed and quality for everyday mobile use.
## Files
| File | Description |
|------|-------------|
| `llm.mnn` | Model computation graph |
| `llm.mnn.weight` | Quantized weight data (Q4, block=128) |
| `llm_config.json` | Model config with Jinja chat template |
| `tokenizer.txt` | Tokenizer vocabulary |
| `config.json` | MNN runtime config |
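
If you copy the bundle to a device or folder by hand, a quick completeness check can catch a partial download. The sketch below only tests that the five files from the table exist; `missing_files` is a hypothetical helper, not part of MNN or TokForge.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "llm_config.json",
    "tokenizer.txt",
    "config.json",
]

def missing_files(model_dir):
    """Return the expected bundle files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_files("path/to/model")` means all five files are present. It does not validate their contents.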
## Usage with TokForge
This model is optimized for **[TokForge](https://tokforge.ai)** — a free Android app for private, on-device LLM inference.
1. Download [TokForge from the Play Store](https://tokforge.ai)
2. Open the app → Models → Download this model
3. Start chatting — runs 100% locally, no internet required
### Recommended Settings
| Setting | Value |
|---------|-------|
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | Off (or On for thinking-capable models) |
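
The backend row can be read as a simple lookup by SoC vendor. The following sketch encodes the table as data; the dictionary keys and the `recommended_settings` helper are illustrative names, not TokForge or MNN configuration keys.

```python
# Values copied from the table above; key names are assumptions for illustration.
BACKEND_BY_VENDOR = {"qualcomm": "OpenCL", "mediatek": "Vulkan"}

def recommended_settings(soc_vendor):
    """Return the recommended settings for a SoC vendor, with CPU as the fallback."""
    backend = BACKEND_BY_VENDOR.get(soc_vendor.strip().lower(), "CPU")
    return {"backend": backend, "precision": "low", "threads": 4, "thinking": False}
```

For example, `recommended_settings("Qualcomm")` selects OpenCL, while an unrecognized vendor falls back to CPU.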
### Speculative Decoding
Pair with the [TokForge Acceleration Pack](https://huggingface.co/darkmaniac7/TokForge-AccelerationPack-Draft) for **20-38% faster generation** on supported devices.
| Device | SoC | Backend | tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite Gen 5) | OpenCL | **22.4 tok/s** |
| Lenovo TB520FU | SM8650 (Snapdragon 8 Gen 3) | OpenCL | **16.9 tok/s** |
| OnePlus Ace 5 Ultra | D9400+ (Dimensity 9400+) | OpenCL | **15.9 tok/s** |
| Xiaomi Pad 7 Pro | SM8635 (Snapdragon 8s Gen 3) | OpenCL | **9.3 tok/s** |
## Performance
Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:
| Device | SoC | Backend | Approx. tok/s |
|---|---|---|---|
| SM8850 (RedMagic) | Snapdragon 8 Elite Gen 5 | OpenCL | ~17-24 tok/s |
| SM8650 (Lenovo) | Snapdragon 8 Gen 3 | OpenCL | ~15-17 tok/s |
| SM8635 (Xiaomi) | Snapdragon 8s Gen 3 | OpenCL | ~9-12 tok/s |
| D9400+ (OnePlus) | Dimensity 9400+ | OpenCL | ~9-15 tok/s |
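
To translate these rates into waiting time, divide the reply length by the decode rate. The sketch below does that arithmetic for the approximate ranges in the table. It assumes a steady decode rate and ignores prompt processing and thermal throttling, so treat the results as rough estimates; the helper names are illustrative.

```python
# Approximate (low, high) tok/s ranges copied from the table above.
RANGES = {
    "SM8850": (17, 24),
    "SM8650": (15, 17),
    "SM8635": (9, 12),
    "D9400+": (9, 15),
}

def generation_seconds(num_tokens, tokens_per_second):
    """Estimate decode time for num_tokens at a steady tokens-per-second rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second

def reply_time_range(soc, num_tokens=256):
    """Return (best, worst) decode time in seconds for a reply of num_tokens."""
    low, high = RANGES[soc]
    # The higher rate gives the shorter (best-case) time.
    return generation_seconds(num_tokens, high), generation_seconds(num_tokens, low)
```

For instance, `reply_time_range("SM8850", 240)` gives a best case of 10 seconds at 24 tok/s and a longer worst case at 17 tok/s.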
## Attribution
This is an MNN conversion of **[Josiefied-Qwen3-4B-abliterated-v2](https://huggingface.co/Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2)** by **[Goekdeniz-Guelmez](https://huggingface.co/Goekdeniz-Guelmez)**. All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.
## Limitations
- Intended for TokForge / MNN on-device inference on Android
- This is a runtime bundle, not a standard Transformers training checkpoint
- Quantization (Q4) may slightly reduce quality compared to the full-precision original
- Abliterated/uncensored models have had safety filters removed — **use responsibly**
## Community
- **Website:** [tokforge.ai](https://tokforge.ai)
- **Discord:** [Join our Discord](https://discord.gg/Acv3CBtfVm)
- **GitHub:** [TokForge on GitHub](https://github.com/darkmaniac7/Elysium)
## Export Details
Converted using MNN's `llmexport` pipeline:
```bash
python llmexport.py --path Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2 --export mnn --quant_bit 4 --quant_block 128
```