---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
base_model: Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2
tags:
  - mnn
  - qwen3
  - mobile
  - on-device
  - tokforge
  - uncensored
  - abliterated
---

# Josiefied-Qwen3-4B-abliterated-v2-MNN

Pre-converted [Josiefied-Qwen3-4B-abliterated-v2](https://huggingface.co/Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2) in MNN format for on-device inference with [TokForge](https://tokforge.ai).

> **Original model by [Goekdeniz-Guelmez](https://huggingface.co/Goekdeniz-Guelmez)** — converted to MNN Q4 for mobile deployment.

## Model Details

| | |
|---|---|
| **Architecture** | Qwen3 (grouped-query attention, 36 layers) |
| **Parameters** | 4B (4-bit quantized) |
| **Format** | MNN (Alibaba Mobile Neural Network) |
| **Quantization** | W4A16 (4-bit weights, block size 128) |
| **Vocab** | 151,936 tokens |
| **Source** | [Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2](https://huggingface.co/Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2) |
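
To make "4-bit weights, block size 128" concrete, here is a minimal sketch of block-wise asymmetric quantization in plain Python. It illustrates the general technique only; the exact rounding, packing, and scale storage used by MNN are not described in this card, and the function names below are hypothetical.

```python
from typing import List, Tuple

Block = Tuple[List[int], float, float]  # (4-bit codes, scale, block minimum)


def quantize_block(block: List[float]) -> Block:
    """Map one block of weights to integer codes in 0..15 plus (scale, minimum)."""
    lo, hi = min(block), max(block)
    # A constant block has zero range; fall back to scale 1.0 to avoid dividing by zero.
    scale = (hi - lo) / 15 or 1.0
    codes = [round((w - lo) / scale) for w in block]
    return codes, scale, lo


def dequantize_block(codes: List[int], scale: float, lo: float) -> List[float]:
    """Reconstruct approximate weights from the codes of one block."""
    return [lo + c * scale for c in codes]


def quantize(weights: List[float], block_size: int = 128) -> List[Block]:
    """Quantize a flat weight list block by block; each block gets its own scale."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]
```

Because every block of 128 weights carries its own scale and minimum, an outlier only degrades the precision of its own block rather than the whole tensor. The reconstruction error per weight is at most half a quantization step of that block.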

## Description

Josiefied abliterated v2 by Goekdeniz Guelmez is a refined 4B Qwen3 fine-tune with its safety filters abliterated. The v2 iteration improves on the original with better uncensoring and instruction following. With 4-bit weights, it offers a good balance of speed and quality for everyday mobile use.

## Files

| File | Description |
|------|-------------|
| `llm.mnn` | Model computation graph |
| `llm.mnn.weight` | Quantized weight data (Q4, block=128) |
| `llm_config.json` | Model config with Jinja chat template |
| `tokenizer.txt` | Tokenizer vocabulary |
| `config.json` | MNN runtime config |
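
If you download the bundle manually rather than through the app, a quick completeness check can catch a partial download. The sketch below only tests that the five files listed above exist in a local directory; the helper name `missing_files` and the directory path in the usage note are hypothetical.

```python
from pathlib import Path
from typing import List

# File names taken from the table above.
EXPECTED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "llm_config.json",
    "tokenizer.txt",
    "config.json",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the expected bundle files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_files("./Josiefied-Qwen3-4B-abliterated-v2-MNN")` returns an empty list when the bundle is complete. It does not validate file contents or sizes.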

## Usage with TokForge

This model is optimized for **[TokForge](https://tokforge.ai)** — a free Android app for private, on-device LLM inference.

1. Download [TokForge from the Play Store](https://tokforge.ai)
2. Open the app → Models → Download this model
3. Start chatting — runs 100% locally, no internet required

### Recommended Settings

| Setting | Value |
|---------|-------|
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | Off (or On for thinking-capable models) |
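
For use outside the app, the CPU fallback row could be expressed as overrides on the bundled `config.json`. This is a sketch under assumptions: the key names `backend_type`, `thread_num`, and `precision` are assumed to follow MNN's LLM runtime config, and this card does not say whether TokForge reads them from the file or keeps its own settings. Check the bundled `config.json` before relying on them. The helper name is hypothetical.

```python
import copy
from typing import Any, Dict


def apply_cpu_fallback(config: Dict[str, Any], threads: int = 4) -> Dict[str, Any]:
    """Return a copy of an MNN-style runtime config set to the CPU fallback values.

    Key names are assumptions; verify them against the bundled config.json.
    """
    updated = copy.deepcopy(config)
    updated["backend_type"] = "cpu"   # "Backend: CPU (fallback)"
    updated["thread_num"] = threads   # "Threads: 4"
    updated["precision"] = "low"      # "Precision: Low"
    return updated
```

The function returns a modified copy and leaves the input untouched, so the original settings can be restored by simply discarding the result.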

### Speculative Decoding

Pair with the [TokForge Acceleration Pack](https://huggingface.co/darkmaniac7/TokForge-AccelerationPack-Draft) for **+20-38% faster generation** on supported devices.

| Device | SoC | Backend | tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite 2) | OpenCL | **22.4** |
| Lenovo TB520FU | SM8650 (Snapdragon 8 Gen 3) | OpenCL | **16.9** |
| OnePlus Ace 5 Ultra | D9400+ (Dimensity 9400+) | OpenCL | **15.9** |
| Xiaomi Pad 7 Pro | SM8635 (Snapdragon 8s Gen 3) | OpenCL | **9.3** |

## Performance

Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:

| Device | SoC | Backend | Approx. tok/s |
|---|---|---|---|
| RedMagic | SM8850 (Snapdragon 8 Elite 2) | OpenCL | ~17-24 |
| Lenovo | SM8650 (Snapdragon 8 Gen 3) | OpenCL | ~15-17 |
| Xiaomi | SM8635 (Snapdragon 8s Gen 3) | OpenCL | ~9-12 |
| OnePlus | D9400+ (Dimensity 9400+) | OpenCL | ~9-15 |
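
To translate these rates into waiting time, divide the reply length by the decode speed. The sketch below is plain arithmetic with hypothetical helper names; it covers token generation only and ignores prompt processing (prefill) and thermal throttling, neither of which is quantified in this card.

```python
from typing import Tuple


def generation_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Seconds needed to generate num_tokens at a steady decode rate."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return num_tokens / tok_per_s


def seconds_range(num_tokens: int, slow: float, fast: float) -> Tuple[float, float]:
    """Best-case and worst-case seconds for a tok/s range from the table."""
    return generation_seconds(num_tokens, fast), generation_seconds(num_tokens, slow)


# A 300-token reply at 15 tok/s takes 300 / 15 = 20.0 seconds.
print(generation_seconds(300, 15))
```

Calling `seconds_range(300, 15, 17)` gives the corresponding bounds for the ~15-17 tok/s row.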

## Attribution

This is an MNN conversion of **[Josiefied-Qwen3-4B-abliterated-v2](https://huggingface.co/Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2)** by **[Goekdeniz-Guelmez](https://huggingface.co/Goekdeniz-Guelmez)**. All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.

## Limitations

- Intended for TokForge / MNN on-device inference on Android
- This is a runtime bundle, not a standard Transformers training checkpoint
- Quantization (Q4) may slightly reduce quality compared to the full-precision original
- Abliterated/uncensored models have had safety filters removed — **use responsibly**

## Community

- **Website:** [tokforge.ai](https://tokforge.ai)
- **Discord:** [Join our Discord](https://discord.gg/Acv3CBtfVm)
- **GitHub:** [TokForge on GitHub](https://github.com/darkmaniac7/Elysium)

## Export Details

Converted using MNN's `llmexport` pipeline:
```bash
python llmexport.py --path Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2 --export mnn --quant_bit 4 --quant_block 128
```