metadata
license: apache-2.0
tags:
  - mnn
  - qwen3
  - mobile
  - on-device
  - tokforge
  - abliterated
base_model: Qwen/Qwen3-8B

Qwen3-8B-abliterated-v2 (MNN)

Pre-converted Qwen3-8B abliterated model in MNN format for on-device inference.

Model Details

  • Architecture: Qwen3 (standard attention, 36 layers)
  • Parameters: 8B (4-bit quantized)
  • Format: MNN (Alibaba Mobile Neural Network)
  • Vocab: 151,936 tokens
  • Quantization: W4A16 (4-bit weights, 16-bit activations)

Files

| File | Size | Description |
|------|------|-------------|
| llm.mnn | 631KB | Model graph |
| llm.mnn.weight | 4.4GB | Quantized weights |
| embeddings_bf16.bin | 1.2GB | BF16 embedding table (required) |
| llm_config.json | 4.5KB | Model config with jinja chat template |
| tokenizer.txt | 3.0MB | Tokenizer |
| config.json | 210B | MNN runtime config |
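
As a sanity check on the listed sizes, the BF16 embedding table can be estimated from the vocabulary size. This is a rough sketch: the hidden dimension of 4096 is an assumption taken from the Qwen3-8B base model and is not stated in this card, and the function name is hypothetical.

```python
VOCAB_SIZE = 151_936   # from Model Details
HIDDEN_SIZE = 4096     # assumption: Qwen3-8B hidden dimension, not stated in this card
BYTES_PER_BF16 = 2     # bfloat16 stores each value in 2 bytes


def embedding_table_bytes(vocab=VOCAB_SIZE, hidden=HIDDEN_SIZE,
                          bytes_per_value=BYTES_PER_BF16):
    """Size of a dense vocab x hidden embedding table in bytes."""
    return vocab * hidden * bytes_per_value


size = embedding_table_bytes()
print(f"{size / 1e9:.2f} GB")  # 1.24 GB
```

Under that assumption the estimate agrees with the 1.2GB listed for embeddings_bf16.bin.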

Usage with TokForge

This model is optimized for TokForge — an Android app for on-device LLM inference.

Performance (Speculative Decoding)

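| Device | SoC | Backend | AR tok/s | Spec Decode tok/s | Uplift |
|--------|-----|---------|----------|-------------------|--------|
| S26 Ultra | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| RedMagic 11 Pro | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| Lenovo TB520FU | SM8650 | OpenCL | 9.9 | 12.2 | +23% |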

Draft model: Qwen3-0.6B
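
The Uplift column is the relative gain of speculative decoding over plain autoregressive (AR) decoding. A minimal sketch of that arithmetic, using the figures from the table (the ~14 tok/s entries are approximate in the source, so they are treated as 14.0 here; the function name is hypothetical):

```python
def uplift_percent(ar_tok_s, spec_tok_s):
    """Relative throughput gain of speculative decoding over AR decoding, in percent."""
    return (spec_tok_s / ar_tok_s - 1.0) * 100.0


# (AR tok/s, speculative decoding tok/s) as reported in the table
rows = {
    "S26 Ultra": (14.0, 17.8),        # AR figure is approximate (~14)
    "RedMagic 11 Pro": (14.0, 17.8),  # AR figure is approximate (~14)
    "Lenovo TB520FU": (9.9, 12.2),
}

for device, (ar, spec) in rows.items():
    print(f"{device}: +{uplift_percent(ar, spec):.0f}%")
```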

Abliteration

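This model has been abliterated: the base model's refusal behavior has been removed by editing the weights, so it no longer declines requests the way the original Qwen3-8B does. It is intended for unrestricted conversation. Use responsibly.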

Limitations and Intended Use

  • Intended for TokForge / MNN on-device inference, especially Android phones and tablets.
  • The best-known uplift for this model comes from pairing it with a small CPU draft model for speculative decoding.
  • Real throughput varies by SoC, thermal state, backend, and generation length.
  • This repo is a runtime bundle, not a standard Transformers training checkpoint.
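
To illustrate why pairing with a small draft model helps, here is a toy sketch of greedy speculative decoding. It is not TokForge's or MNN's implementation: the `NextToken` interface and function names are hypothetical, both models are modeled as plain functions that return the next token for a context, and the usual bonus token after a fully accepted draft is omitted for brevity.

```python
from typing import Callable, List

# Hypothetical interface: a greedy "model" maps a token context to its next token.
NextToken = Callable[[List[int]], int]


def speculative_generate(target: NextToken, draft: NextToken,
                         prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; output matches decoding with `target` alone."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target verifies the proposal position by position.
        #    (A real runtime scores all k positions in one batched pass.)
        for tok in proposal:
            expected = target(out)
            if expected != tok:
                # First mismatch: keep the target's token, drop the rest of the draft.
                out.append(expected)
                break
            out.append(tok)
    return out


def greedy_generate(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain autoregressive decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out
```

Every accepted draft token saves one sequential step of the large model, which is where the throughput gain comes from; the more often the draft agrees with the target, the larger the uplift.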

Export

Converted using MNN's llmexport pipeline with --quant_bit 4 --quant_block 128.
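
The `--quant_bit 4 --quant_block 128` settings mean weights are stored as 4-bit codes, with quantization parameters shared by each block of 128 values. The sketch below shows a generic asymmetric block-wise scheme in plain Python purely to illustrate the idea; it is not MNN's actual kernel or storage layout, and the function names are hypothetical.

```python
def quantize_blockwise(weights, bits=4, block=128):
    """Quantize a flat list of floats; each block keeps its own scale and minimum."""
    levels = (1 << bits) - 1  # 15 distinct steps for 4-bit codes (0..15)
    blocks = []
    for start in range(0, len(weights), block):
        chunk = weights[start:start + block]
        lo, hi = min(chunk), max(chunk)
        # Fall back to 1.0 for constant blocks to avoid dividing by zero.
        scale = (hi - lo) / levels or 1.0
        codes = [round((w - lo) / scale) for w in chunk]
        blocks.append((scale, lo, codes))
    return blocks


def dequantize_blockwise(blocks):
    """Reconstruct approximate floats from (scale, minimum, codes) blocks."""
    return [lo + scale * c for scale, lo, codes in blocks for c in codes]
```

Smaller blocks track local weight ranges more closely, so the rounding error per weight (at most half a block's scale) shrinks, at the cost of storing more scale and minimum values.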