Qwen3.5-4B-Claude-4.6-Opus-abliterated (MNN)
Pre-converted Qwen3.5-4B abliterated model in MNN format for on-device inference.
Model Details
- Architecture: Qwen3.5 (hybrid LinearAttention + standard attention)
- Parameters: 4B (4-bit quantized)
- Format: MNN (Alibaba Mobile Neural Network)
- Quantization: W4A16 (4-bit weights, 16-bit activations)
- Attention: LinearAttention (gated_delta_rule) + standard attention hybrid
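As a rough sanity check on what W4A16 means for on-device storage, the sketch below estimates raw weight size from the nominal parameter count. It assumes exactly 4e9 parameters (the card only says "4B") and ignores quantization scales, zero-points, and any layers kept at higher precision, so the real MNN file size will differ. The function name `estimate_weight_bytes` is illustrative and not part of MNN or TokForge.

```python
def estimate_weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Raw weight storage in bytes.

    Ignores per-group scales/zero-points and any unquantized layers,
    so this is a lower-bound style estimate, not the actual file size.
    """
    return num_params * bits_per_weight / 8


# Assumption: "4B" is treated as exactly 4e9 parameters.
params = 4e9

w4_bytes = estimate_weight_bytes(params, 4)     # 4-bit weights (W4)
fp16_bytes = estimate_weight_bytes(params, 16)  # 16-bit baseline for comparison

print(f"4-bit weights:  {w4_bytes / 1e9:.1f} GB")   # prints 2.0 GB
print(f"16-bit weights: {fp16_bytes / 1e9:.1f} GB")  # prints 8.0 GB
```

Under these assumptions, 4-bit weights take about a quarter of the space of a 16-bit checkpoint. Activations remain 16-bit at runtime, so memory use during inference is higher than the weight size alone.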
Performance
| Device | SoC | Backend | tok/s |
|---|---|---|---|
| S26 Ultra | SM8850 | CPU | 20.5 |
| Lenovo TB520FU | SM8650 | CPU | 14.0 |
| Xiaomi Pad 7 Pro | SM8635 | CPU | 11.8 |
CPU is recommended for Qwen3.5 models (LinearAttention runs natively on CPU).
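To put the tok/s figures in practical terms, the sketch below converts them into approximate wall-clock time for a reply of a given length. It assumes the table reports steady-state generation speed and ignores prompt processing time, which the card does not specify. The names `TOK_PER_S` and `generation_seconds` are illustrative only.

```python
# Throughput figures copied from the table above (CPU backend).
TOK_PER_S = {
    "S26 Ultra": 20.5,
    "Lenovo TB520FU": 14.0,
    "Xiaomi Pad 7 Pro": 11.8,
}


def generation_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Approximate time to generate num_tokens at a constant rate.

    Assumption: the rate is constant and prompt processing is excluded.
    """
    return num_tokens / tok_per_s


reply_length = 256  # hypothetical reply length in tokens
for device, rate in TOK_PER_S.items():
    seconds = generation_seconds(reply_length, rate)
    print(f"{device}: about {seconds:.1f} s for {reply_length} tokens")
```

Actual times may be longer for long generations, since the card notes that performance varies with generation length.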
Usage with TokForge
Optimized for TokForge, an Android app for on-device LLM inference.
Abliteration
Safety filters removed for unrestricted conversation. Use responsibly.
Limitations and Intended Use
- Intended for TokForge / MNN on-device inference, especially creative writing and roleplay use.
- Qwen3.5 hybrid LinearAttention models generally route best to CPU in current TokForge builds.
- Performance varies by device class and generation length.
- This repo is a packaged runtime artifact, not a standard Transformers training checkpoint.
Community
- Website: tokforge.ai
- Discord: Join the Discord