---
license: apache-2.0
tags:
- mnn
- qwen3
- mobile
- on-device
- tokforge
- abliterated
base_model: Qwen/Qwen3-8B
---
# Qwen3-8B-abliterated-v2 (MNN)
An abliterated variant of [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), pre-converted to MNN format for on-device inference.
## Model Details
- **Architecture:** Qwen3 (standard attention, 36 layers)
- **Parameters:** 8B (4-bit quantized)
- **Format:** MNN (Alibaba Mobile Neural Network)
- **Vocab:** 151,936 tokens
- **Quantization:** W4A16 (4-bit weights, 16-bit activations)
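The file sizes in this bundle can be sanity-checked with simple arithmetic. The sketch below assumes a hidden size of 4096, taken from the upstream Qwen3-8B config rather than from this README. With that assumption, a dense BF16 embedding table is 151,936 × 4096 × 2 = 1,244,659,712 bytes (about 1.24 GB), consistent with the 1.2 GB `embeddings_bf16.bin`. The int4 estimate uses a hypothetical 2-byte scale per 128-weight block and ignores which tensors MNN leaves unquantized, so treat it only as a ballpark for `llm.mnn.weight`.

```python
# Back-of-the-envelope size check for this bundle.
# Assumption: HIDDEN = 4096 comes from the upstream Qwen3-8B config, not this README.
VOCAB = 151_936   # from "Model Details"
HIDDEN = 4_096    # assumed (upstream Qwen3-8B config)
BF16_BYTES = 2    # BF16 is a 16-bit format


def embedding_table_bytes(vocab: int = VOCAB, hidden: int = HIDDEN) -> int:
    """Size in bytes of a dense [vocab, hidden] table stored as BF16."""
    return vocab * hidden * BF16_BYTES


def int4_payload_bytes(num_params: float, block: int = 128, scale_bytes: int = 2) -> float:
    """4-bit codes (half a byte each) plus one scale per block.

    scale_bytes is a hypothetical width; the real MNN layout may also store
    zero-points or keep some tensors in higher precision.
    """
    return num_params * 0.5 + (num_params / block) * scale_bytes


if __name__ == "__main__":
    emb = embedding_table_bytes()
    print(f"embeddings: {emb:,} bytes ~= {emb / 1e9:.2f} GB")
    w = int4_payload_bytes(8e9)
    print(f"int4 weights (rough): ~{w / 1e9:.2f} GB")
```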
## Files
| File | Size | Description |
|------|------|-------------|
| `llm.mnn` | 631 KB | Model graph |
| `llm.mnn.weight` | 4.4 GB | Quantized weights |
| `embeddings_bf16.bin` | 1.2 GB | BF16 embedding table (required) |
| `llm_config.json` | 4.5 KB | Model config with Jinja chat template |
| `tokenizer.txt` | 3.0 MB | Tokenizer |
| `config.json` | 210 B | MNN runtime config |
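BF16 keeps only the upper 16 bits of an IEEE-754 float32, so widening a BF16 value back to float32 is a 16-bit left shift. The sketch below reads one token's embedding row from a BF16 table. It assumes a headerless, row-major `[vocab, hidden]`, little-endian layout and a hidden size of 4096 (from the upstream Qwen3-8B config); neither is documented in this README, and `read_embedding_row` is a hypothetical helper, not part of MNN or TokForge.

```python
import struct
from typing import BinaryIO, List

HIDDEN = 4096  # assumed hidden size (upstream Qwen3-8B config), not stated in this README


def bf16_to_float(bits: int) -> float:
    """Widen one BF16 value (the upper 16 bits of a float32) to a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


def read_embedding_row(f: BinaryIO, token_id: int, hidden: int = HIDDEN) -> List[float]:
    """Return the embedding vector for token_id.

    Assumes a headerless, row-major [vocab, hidden] little-endian BF16 file,
    so row i starts at byte offset i * hidden * 2.
    """
    if token_id < 0:
        raise ValueError("token_id must be non-negative")
    row_bytes = hidden * 2
    f.seek(token_id * row_bytes)
    raw = f.read(row_bytes)
    if len(raw) != row_bytes:
        raise ValueError(f"token_id {token_id} is out of range for this file")
    halves = struct.unpack(f"<{hidden}H", raw)  # raw 16-bit patterns
    return [bf16_to_float(h) for h in halves]
```

Example: `with open("embeddings_bf16.bin", "rb") as f: vec = read_embedding_row(f, 0)`. Seeking to a single row avoids loading the whole 1.2 GB table into memory.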
## Usage with TokForge
This model is optimized for [TokForge](https://tokforge.ai), an Android app for on-device LLM inference.
### Performance (Speculative Decoding)
| Device | SoC | Backend | Autoregressive (AR) tok/s | Speculative decode tok/s | Uplift |
|--------|-----|---------|----------|-------------------|--------|
| S26 Ultra | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| RedMagic 11 Pro | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| Lenovo TB520FU | SM8650 | OpenCL | 9.9 | 12.2 | +23% |
Draft model: [Qwen3-0.6B](https://huggingface.co/darkmaniac7/TokForge-AccelerationPack-Draft)
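The Uplift column is the speculative-decoding throughput divided by the autoregressive (AR) throughput, minus one. The short check below reproduces it from the table's figures, treating "~14" as 14; `uplift_percent` is just an illustrative helper.

```python
# Reproduce the "Uplift" column from the table's throughput figures.
# "~14" tok/s is treated as exactly 14.0 here.
ROWS = {
    "S26 Ultra": (14.0, 17.8),
    "RedMagic 11 Pro": (14.0, 17.8),
    "Lenovo TB520FU": (9.9, 12.2),
}


def uplift_percent(ar_tps: float, spec_tps: float) -> int:
    """Relative speedup of speculative over plain autoregressive decoding, rounded to %."""
    return round((spec_tps / ar_tps - 1.0) * 100)


for device, (ar, spec) in ROWS.items():
    print(f"{device}: +{uplift_percent(ar, spec)}%")
```

This prints +27%, +27% and +23%, matching the table.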
## Abliteration
This model has been abliterated: its built-in refusal behavior has been removed to allow unrestricted conversation. Use responsibly.
## Limitations and Intended Use
- Intended for TokForge / MNN on-device inference, especially Android phones and tablets.
- The best-known uplift for this model comes from pairing it with a small CPU draft model for speculative decoding.
- Real throughput varies by SoC, thermal state, backend, and generation length.
- This repo is a runtime bundle, not a standard Transformers training checkpoint.
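Why a small draft model helps: in greedy speculative decoding, the cheap draft proposes several tokens and the large target model only has to confirm them, so each expensive target step can yield more than one token while the output stays identical to plain greedy decoding. The sketch below is the textbook greedy variant with toy stand-in "models" (plain functions from a token prefix to the next token). The names `greedy_decode` and `speculative_decode`, the toy models, and the proposal length `k=4` are all hypothetical; TokForge's actual implementation is not described here and may differ, for example by using sampling-based acceptance.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification rounds).

    Each round the draft proposes k tokens. The target checks them (a real
    runtime scores all k positions in one batched forward pass), keeps the
    longest agreeing prefix, then adds one token of its own. The result
    equals greedy_decode(target, prompt, n_new).
    """
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # 1. Draft proposes k tokens autoregressively (cheap model).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. Target verifies; stop at the first disagreement (or after all k).
        for i in range(k + 1):
            expected = target(out + proposal[:i])
            if i == k or proposal[i] != expected:
                out.extend(proposal[:i])  # accepted draft tokens
                out.append(expected)      # target's correction or bonus token
                break
    return out[:goal], rounds


if __name__ == "__main__":
    def target(seq: List[int]) -> int:
        return (seq[-1] * 3 + 1) % 17

    def draft(seq: List[int]) -> int:
        # Agrees with the target except after multiples of 5.
        return target(seq) if seq[-1] % 5 else 0

    tokens, rounds = speculative_decode(target, draft, [2], 20)
    assert tokens == greedy_decode(target, [2], 20)
    print(f"20 tokens in {rounds} target rounds")
```

The higher the draft's acceptance rate, the fewer target rounds are needed, which is why throughput depends on how closely the 0.6B draft tracks the 8B model on a given prompt.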
## Community
- Website: [tokforge.ai](https://tokforge.ai)
- Discord: [Join the Discord](https://discord.gg/Acv3CBtfVm)
## Export
Converted using MNN's `llmexport` pipeline with `--quant_bit 4 --quant_block 128`.
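`--quant_block 128` sets how many weights share one quantization scale: smaller blocks track local weight ranges more closely at the cost of storing more scales. The sketch below illustrates that idea with symmetric per-block int4 quantization in pure Python. It is not MNN's exact scheme, which may, for example, store a zero-point per block and pack two 4-bit codes per byte. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_blockwise_int4(weights: List[float], block: int = 128) -> Tuple[List[int], List[float]]:
    """Symmetric per-block int4 quantization.

    Each block of `block` weights shares one scale (max |w| / 7), and every
    weight becomes an integer code in [-7, 7]; the int4 value -8 is unused in
    this symmetric variant, and the clamp below is purely defensive.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block):
        chunk = weights[start:start + block]
        max_abs = max(abs(w) for w in chunk)
        scale = max_abs / 7 if max_abs > 0 else 1.0  # avoid dividing by zero
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in chunk)
    return codes, scales


def dequantize_blockwise_int4(codes: List[int], scales: List[float], block: int = 128) -> List[float]:
    """Reconstruct approximate weights: code times the scale of its block."""
    return [c * scales[i // block] for i, c in enumerate(codes)]
```

Each reconstructed weight is within half a block scale of the original, so a block containing one large outlier loses precision for all of its other weights; this is the trade-off the block size controls.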