darkmaniac7 committed
Commit 91fb598 · verified · 1 Parent(s): 828564c

Add MNN Q4 conversion for TokForge mobile inference

Files changed (8)
  1. .gitattributes +2 -0
  2. README.md +107 -0
  3. config.json +10 -0
  4. export_args.json +42 -0
  5. llm.mnn +3 -0
  6. llm.mnn.weight +3 -0
  7. llm_config.json +18 -0
  8. tokenizer.txt +0 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ llm.mnn filter=lfs diff=lfs merge=lfs -text
+ llm.mnn.weight filter=lfs diff=lfs merge=lfs -text
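As a quick sanity check on the hunk above, the sketch below tests which of the eight committed files match the LFS patterns visible in it. It is only an approximation: Python's `fnmatch` stands in for Git's attribute matching (the two differ for path patterns such as `saved_model/**/*`, which is left out on purpose), patterns outside this hunk are not considered, and `lfs_tracked` is a hypothetical helper name rather than part of Git or the Hub tooling.

```python
from fnmatch import fnmatchcase

# Basename-only LFS patterns visible in the .gitattributes hunk.
# Assumption: fnmatch is close enough to Git's matching for these simple
# patterns; it is NOT a faithful gitattributes implementation.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "llm.mnn", "llm.mnn.weight"]

# The eight files listed under "Files changed".
COMMITTED_FILES = [
    ".gitattributes", "README.md", "config.json", "export_args.json",
    "llm.mnn", "llm.mnn.weight", "llm_config.json", "tokenizer.txt",
]


def lfs_tracked(filename, patterns=LFS_PATTERNS):
    """Return True if the file name matches any pattern (hypothetical helper)."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


tracked = [name for name in COMMITTED_FILES if lfs_tracked(name)]
print(tracked)
```

Under these assumptions only the two model binaries are routed through LFS, which matches the pointer files that appear later in this commit.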
README.md ADDED
@@ -0,0 +1,107 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ pipeline_tag: text-generation
+ base_model: Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2
+ tags:
+ - mnn
+ - qwen3
+ - mobile
+ - on-device
+ - tokforge
+ - uncensored
+ - abliterated
+ ---
+
+ # Josiefied-Qwen3-4B-abliterated-v2-MNN
+
+ Pre-converted [Josiefied-Qwen3-4B-abliterated-v2](https://huggingface.co/Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2) in MNN format for on-device inference with [TokForge](https://tokforge.ai).
+
+ > **Original model by [Goekdeniz-Guelmez](https://huggingface.co/Goekdeniz-Guelmez)**, converted to MNN Q4 for mobile deployment.
+
+ ## Model Details
+
+ | | |
+ |---|---|
+ | **Architecture** | Qwen3 (full attention with grouped-query heads, 36 layers) |
+ | **Parameters** | 4B (4-bit quantized) |
+ | **Format** | MNN (Alibaba Mobile Neural Network) |
+ | **Quantization** | W4A16 (4-bit weights, 16-bit activations, block size 128) |
+ | **Vocab** | 151,936 tokens |
+ | **Source** | [Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2](https://huggingface.co/Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2) |
+
+ ## Description
+
+ Josiefied abliterated v2 by Goekdeniz Guelmez is a refined 4B Qwen3 model with abliterated safety filters. The v2 iteration improves on the original with better uncensoring and instruction following. It offers a good balance of speed and quality for everyday mobile use.
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `llm.mnn` | Model computation graph |
+ | `llm.mnn.weight` | Quantized weight data (Q4, block=128) |
+ | `llm_config.json` | Model config with Jinja chat template |
+ | `tokenizer.txt` | Tokenizer vocabulary |
+ | `config.json` | MNN runtime config |
+
+ ## Usage with TokForge
+
+ This model is optimized for **[TokForge](https://tokforge.ai)**, a free Android app for private, on-device LLM inference.
+
+ 1. Download [TokForge from the Play Store](https://tokforge.ai)
+ 2. Open the app → Models → Download this model
+ 3. Start chatting: inference runs 100% locally, no internet required
+
+ ### Recommended Settings
+
+ | Setting | Value |
+ |---------|-------|
+ | Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
+ | Precision | Low |
+ | Threads | 4 |
+ | Thinking | Off (or On for thinking-capable models) |
+
+ ### Speculative Decoding
+
+ Pair with the [TokForge Acceleration Pack](https://huggingface.co/darkmaniac7/TokForge-AccelerationPack-Draft) for **+17-38% faster generation** on supported devices.
+
+ | Device | SoC | Baseline (autoregressive) | With Draft | Uplift |
+ |---|---|---|---|---|
+ | Lenovo TB520FU | SM8650 | 17.7 tok/s | 24.4 tok/s | **+38%** |
+ | RedMagic 11 Pro | SM8850 | 24.0 tok/s | ~28 tok/s | **+17%** |
+
+ ## Performance
+
+ Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:
+
+ | Device Class | SoC Example | Backend | Approx. tok/s |
+ |---|---|---|---|
+ | Flagship 2025 | SM8850 | OpenCL | ~17-24 tok/s |
+ | Mid-range 2024 | SM8650 | OpenCL | ~14-18 tok/s |
+ | Budget 2024 | SM8635 | CPU | ~9-12 tok/s |
+ | MediaTek D9400 | MT6991 | Vulkan | ~14 tok/s |
+
+ ## Attribution
+
+ This is an MNN conversion of **[Josiefied-Qwen3-4B-abliterated-v2](https://huggingface.co/Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2)** by **[Goekdeniz-Guelmez](https://huggingface.co/Goekdeniz-Guelmez)**. All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.
+
+ ## Limitations
+
+ - Intended for TokForge / MNN on-device inference on Android
+ - This is a runtime bundle, not a standard Transformers training checkpoint
+ - Quantization (Q4) may slightly reduce quality compared to the full-precision original
+ - Abliterated/uncensored models have had safety filters removed; **use responsibly**
+
+ ## Community
+
+ - **Website:** [tokforge.ai](https://tokforge.ai)
+ - **Discord:** [Join our Discord](https://discord.gg/Acv3CBtfVm)
+ - **GitHub:** [TokForge on GitHub](https://github.com/darkmaniac7/Elysium)
+
+ ## Export Details
+
+ Converted using MNN's `llmexport` pipeline:
+ ```bash
+ python llmexport.py --path Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2 --export mnn --quant_bit 4 --quant_block 128
+ ```
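The uplift column in the README's Speculative Decoding table can be re-derived from the two throughput columns. The sketch below does that arithmetic; `uplift_percent` is a hypothetical helper name, and the RedMagic row uses 28.0 as a stand-in for the README's approximate "~28 tok/s".

```python
def uplift_percent(baseline_tps, draft_tps):
    """Relative throughput gain of draft-assisted decoding over the baseline, in percent."""
    return (draft_tps / baseline_tps - 1.0) * 100.0


# (device, baseline tok/s, with-draft tok/s) as reported in the README table.
rows = [
    ("Lenovo TB520FU", 17.7, 24.4),
    ("RedMagic 11 Pro", 24.0, 28.0),  # README gives "~28", so this row is approximate
]

uplifts = {device: round(uplift_percent(base, draft)) for device, base, draft in rows}
for device, pct in uplifts.items():
    print(f"{device}: +{pct}%")
```

The rounded results reproduce the table's +38% and +17%, so the measured range across these two devices runs from roughly 17% to 38%.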
config.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "llm_model": "llm.mnn",
+     "llm_weight": "llm.mnn.weight",
+     "backend_type": "cpu",
+     "thread_num": 4,
+     "precision": "low",
+     "memory": "low",
+     "sampler_type": "penalty",
+     "penalty": 1.1
+ }
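The runtime config selects `"sampler_type": "penalty"` with `"penalty": 1.1`. To illustrate what a penalty of this kind typically does, here is a minimal sketch of the widely used repetition-penalty rule followed by a greedy pick. It is an assumption that MNN's sampler behaves like this; its actual implementation may differ, and both function names below are hypothetical.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Return a copy of logits with previously generated tokens made less likely.

    Common rule (assumed here, not taken from MNN's source): positive logits
    are divided by the penalty and negative logits are multiplied by it.
    """
    adjusted = list(logits)
    for token_id in set(generated_ids):
        value = adjusted[token_id]
        adjusted[token_id] = value / penalty if value > 0 else value * penalty
    return adjusted


def greedy_pick(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy 4-token vocabulary: token 2 leads narrowly but was already generated.
logits = [1.0, 2.0, 2.1, -1.0]
history = [2, 3]
penalised = apply_repetition_penalty(logits, history, penalty=1.1)
print(greedy_pick(logits), greedy_pick(penalised))
```

In this toy case the penalty is enough to move the greedy choice from the repeated token 2 to the unseen token 1, which is the kind of mild anti-repetition nudge a value close to 1.0 is meant to give.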
export_args.json ADDED
@@ -0,0 +1,42 @@
+ {
+     "path": "/root/models/hf_convert_queue/Josiefied-Qwen3-4B-abliterated-v2",
+     "type": null,
+     "tokenizer_path": "/root/models/hf_convert_queue/Josiefied-Qwen3-4B-abliterated-v2",
+     "eagle_path": null,
+     "lora_path": null,
+     "gptq_path": null,
+     "dst_path": "/root/models/hf_uploads/Josiefied-Qwen3-4B-abliterated-v2-MNN",
+     "verbose": false,
+     "test": null,
+     "export": "mnn",
+     "onnx_slim": false,
+     "quant_bit": 4,
+     "quant_block": 128,
+     "visual_quant_bit": null,
+     "visual_quant_block": null,
+     "lm_quant_bit": 4,
+     "lm_quant_block": 128,
+     "mnnconvert": "../../../build/MNNConvert",
+     "ppl": false,
+     "awq": false,
+     "hqq": false,
+     "omni": false,
+     "transformer_fuse": false,
+     "group_conv_native": false,
+     "smooth": false,
+     "sym": false,
+     "visual_sym": false,
+     "seperate_embed": false,
+     "lora_split": false,
+     "calib_data": null,
+     "act_bit": 16,
+     "embed_bit": 16,
+     "act_sym": false,
+     "quant_config": null,
+     "generate_for_npu": false,
+     "skip_weight": false,
+     "omni_epochs": 20,
+     "omni_lr": 0.005,
+     "omni_wd": 0.0001,
+     "tie_word_embeddings": true
+ }
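The export arguments above combine `"quant_bit": 4`, `"quant_block": 128` and `"sym": false`, which suggests asymmetric 4-bit quantization applied per block of 128 weights. The following is a generic pure-Python sketch of that technique, not MNN's actual code: the min/max scheme, the rounding, and all function names are assumptions made for illustration.

```python
import math


def quantize_block(values, bits=4):
    """Asymmetric min/max quantization of one block to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1           # 15 distinct steps for 4-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant blocks
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_block(codes, scale, lo):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + lo for c in codes]


def quantize_row(row, block=128, bits=4):
    """Quantize a weight row block by block; each block keeps its own (scale, min)."""
    return [quantize_block(row[i:i + block], bits) for i in range(0, len(row), block)]


# Toy "weight row" of 256 values, i.e. two blocks of 128.
row = [math.sin(i * 0.05) for i in range(256)]
blocks = quantize_row(row)

max_error = 0.0
for index, (codes, scale, lo) in enumerate(blocks):
    restored = dequantize_block(codes, scale, lo)
    original = row[index * 128:(index + 1) * 128]
    max_error = max(max_error, max(abs(a - b) for a, b in zip(original, restored)))
print(len(blocks), max_error)
```

Because each block stores its own scale and minimum, the worst-case rounding error is bounded by half of that block's step size, which is why smaller blocks trade extra metadata for better accuracy.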
llm.mnn ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:91abf52c44be37545762d78439afb0c26f5deb031567c262b7126d3410b9102f
+ size 645760
llm.mnn.weight ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f5b620b46e043ad6700ae55e063bbf933fe4c6fd378f5dd7211195abdbfb37c
+ size 2264102338
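Both binaries are committed as Git LFS pointer files, which record only a SHA-256 digest and a byte size. A downloaded copy can be checked against those values; the sketch below parses the pointer shown above and offers a `matches_pointer` helper for the comparison. Both function names are hypothetical, and the local file path is whatever the reader downloaded, not something defined by this commit.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file's size and digest against a parsed pointer (hypothetical helper)."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte weight file never sits fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2f5b620b46e043ad6700ae55e063bbf933fe4c6fd378f5dd7211195abdbfb37c
size 2264102338
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"] / 2**30)  # size expressed in GiB
```

The weight file comes to about 2.1 GiB. Calling `matches_pointer("llm.mnn.weight", pointer)` on a local download would confirm that the transfer was complete and uncorrupted.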
llm_config.json ADDED
@@ -0,0 +1,18 @@
+ {
+     "model_type": "qwen3",
+     "hidden_size": 2560,
+     "attention_mask": "float",
+     "attention_type": "full",
+     "is_mrope": false,
+     "jinja": {
+         "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = '' %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
+         "eos": "<|im_end|>"
+     },
+     "tie_embeddings": [
+         2045314498,
+         2239792578,
+         24309760,
+         4,
+         128
+     ]
+ }
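The five numbers in `tie_embeddings` are not explained anywhere in this commit, but they line up with other values in it. The sketch below checks one plausible reading: a byte offset for the packed embedding weights, a byte offset and length for the per-block quantization parameters, then the bit width and block size. That field order is an assumption inferred from the arithmetic, not something the source documents.

```python
# Values copied from this commit.
VOCAB_SIZE = 151_936              # README "Vocab" row
HIDDEN_SIZE = 2_560               # llm_config.json "hidden_size"
WEIGHT_FILE_SIZE = 2_264_102_338  # LFS pointer size of llm.mnn.weight
tie_embeddings = [2_045_314_498, 2_239_792_578, 24_309_760, 4, 128]

# Assumed field order (inferred, not documented in the source):
# [weight_offset, scale_offset, scale_bytes, quant_bit, quant_block]
weight_offset, scale_offset, scale_bytes, quant_bit, quant_block = tie_embeddings

elements = VOCAB_SIZE * HIDDEN_SIZE              # entries in the embedding table
packed_weight_bytes = elements * quant_bit // 8  # two 4-bit codes per byte
blocks = elements // quant_block                 # quantization blocks of 128 weights
bytes_per_block = scale_bytes / blocks           # metadata stored per block

print(scale_offset - weight_offset == packed_weight_bytes)
print(scale_offset + scale_bytes == WEIGHT_FILE_SIZE)
print(bytes_per_block)
```

Under this reading the packed 4-bit embedding table occupies exactly the gap between the two offsets, and the per-block parameters end exactly at the end of `llm.mnn.weight`. The 8 bytes per block would be consistent with two 32-bit values (for example a scale and a minimum) for asymmetric quantization, although the actual layout is not stated here.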
tokenizer.txt ADDED
The diff for this file is too large to render.