Qwen3.5-4B-Claude-4.6-Opus-abliterated (MNN)

Pre-converted Qwen3.5-4B abliterated model in MNN format for on-device inference.

Model Details

  • Architecture: Qwen3.5 (hybrid LinearAttention + standard attention)
  • Parameters: 4B (4-bit quantized)
  • Format: MNN (Alibaba Mobile Neural Network)
  • Quantization: W4A16 (4-bit weights, 16-bit activations)
  • Attention: hybrid of LinearAttention (gated_delta_rule) and standard attention
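
W4A16 means the weights are stored as 4-bit integers while activations stay in 16-bit floating point. The sketch below is a generic illustration, assuming symmetric per-group quantization with a hypothetical group size of 32. MNN's actual packing layout, group size, and zero-point handling are not documented in this card and may differ. It also shows the rough storage arithmetic: at 4 bits per weight, 4B parameters need about 2 GB before scales and any unquantized tensors are counted.

```python
from typing import List, Tuple

GROUP_SIZE = 32  # hypothetical group size, chosen for illustration only


def quantize_int4(weights: List[float], group_size: int = GROUP_SIZE) -> Tuple[List[int], List[float]]:
    """Symmetric per-group quantization to signed 4-bit codes in [-8, 7]."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to code 7; avoid dividing by zero.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            codes.append(max(-8, min(7, round(w / scale))))  # clamp to the int4 range
    return codes, scales


def dequantize_int4(codes: List[int], scales: List[float], group_size: int = GROUP_SIZE) -> List[float]:
    """Recover approximate weights: code * scale of the group the code belongs to."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def dot_w4a16(codes: List[int], scales: List[float], activations: List[float],
              group_size: int = GROUP_SIZE) -> float:
    """Dot product of dequantized 4-bit weights with floating-point activations.

    Python floats stand in for the 16-bit activations of a real W4A16 kernel.
    """
    weights = dequantize_int4(codes, scales, group_size)
    return sum(w * a for w, a in zip(weights, activations))


def weight_storage_gb(num_params: float, bits_per_weight: float = 4.0) -> float:
    """Raw storage of the quantized weights only (excludes scales and other tensors)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    w = [0.5, -0.25, 0.125, -1.0, 0.0, 0.75, -0.6, 0.3]
    codes, scales = quantize_int4(w, group_size=4)
    print(codes, dequantize_int4(codes, scales, group_size=4))
    print(f"{weight_storage_gb(4e9):.1f} GB")  # 4e9 weights x 4 bits = 2.0 GB
```

Each reconstructed weight is within half a quantization step (scale / 2) of the original, which is the precision traded for the 4x smaller footprint compared with 16-bit weights.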

Performance

Device            SoC     Backend  tok/s
S26 Ultra         SM8850  CPU      20.5
Lenovo TB520FU    SM8650  CPU      14.0
Xiaomi Pad 7 Pro  SM8635  CPU      11.8

CPU is recommended for Qwen3.5 models (LinearAttention runs natively on CPU).
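
The LinearAttention layers use what the card calls gated_delta_rule. As a generic illustration, assuming the Gated DeltaNet formulation S_t = α_t · S_{t-1} (I - β_t k_t k_tᵀ) + β_t v_t k_tᵀ with output o_t = S_t q_t, one token of the recurrence can be sketched as below. The state S has a fixed size (d_v × d_k) however long the context grows, whereas a standard attention KV cache grows with every token. MNN's actual kernel (gate parameterization, key normalization, chunked evaluation) is not described here and may differ; the helper names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # recurrent state, shape d_v x d_k


def zeros(d_v: int, d_k: int) -> Matrix:
    return [[0.0] * d_k for _ in range(d_v)]


def gated_delta_step(S: Matrix, q: Vector, k: Vector, v: Vector,
                     alpha: float, beta: float) -> Tuple[Matrix, Vector]:
    """One token of the gated delta rule.

    S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T
    o_t = S_t q
    alpha in (0, 1] decays old memory; beta in [0, 1] is the write strength.
    """
    d_v, d_k = len(S), len(k)
    # Value the old state currently associates with key k: S_{t-1} k
    recalled = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # S (I - beta k k^T) = S - beta (S k) k^T; decay by alpha, then write beta v k^T
    new_S = [
        [alpha * (S[i][j] - beta * recalled[i] * k[j]) + beta * v[i] * k[j] for j in range(d_k)]
        for i in range(d_v)
    ]
    out = [sum(new_S[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return new_S, out


def run_sequence(qs: List[Vector], ks: List[Vector], vs: List[Vector],
                 alphas: List[float], betas: List[float]) -> List[Vector]:
    """Process tokens one at a time, carrying only the constant-size state."""
    S = zeros(len(vs[0]), len(ks[0]))
    outputs: List[Vector] = []
    for q, k, v, a, b in zip(qs, ks, vs, alphas, betas):
        S, o = gated_delta_step(S, q, k, v, a, b)
        outputs.append(o)
    return outputs


if __name__ == "__main__":
    key = [1.0, 0.0]
    # Write two different values under the same unit key, then read it back each time.
    print(run_sequence([key, key], [key, key], [[2.0, 3.0], [5.0, 7.0]], [1.0, 1.0], [1.0, 1.0]))
    # prints [[2.0, 3.0], [5.0, 7.0]]
```

With alpha = beta = 1 and a unit-norm key, the second write replaces the stored value rather than adding to it; plain (non-delta) linear attention would have summed both writes and returned [7.0, 10.0] for the second query.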

Usage with TokForge

Optimized for TokForge, an Android app for on-device LLM inference.

Abliteration

Safety filters removed for unrestricted conversation. Use responsibly.
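
In general, abliteration refers to directional ablation: a "refusal direction" r is estimated in the residual stream (commonly as the difference between mean activations on refused and accepted prompts), and weight matrices that write into the residual stream are orthogonalized against it, W' = (I - r̂ r̂ᵀ) W, so those layers can no longer write along that direction. The card does not say which layers, prompts, or tooling were used for this model. The toy sketch below only illustrates the projection step on small lists; the function names are hypothetical and this is not the procedure behind this checkpoint.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows index the residual-stream dimension (d_model x d_in)


def normalize(r: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in r))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in r]


def mean_difference(a: List[Vector], b: List[Vector]) -> Vector:
    """Difference of mean activation vectors between two prompt sets (a minus b)."""
    dim = len(a[0])
    mean_a = [sum(v[i] for v in a) / len(a) for i in range(dim)]
    mean_b = [sum(v[i] for v in b) / len(b) for i in range(dim)]
    return [x - y for x, y in zip(mean_a, mean_b)]


def orthogonalize(W: Matrix, r: Vector) -> Matrix:
    """Return (I - r r^T) W for unit r: remove the r-component of every output column."""
    r = normalize(r)
    d_model, d_in = len(W), len(W[0])
    # proj[j] = r . W[:, j], the component of column j along r
    proj = [sum(r[i] * W[i][j] for i in range(d_model)) for j in range(d_in)]
    return [[W[i][j] - r[i] * proj[j] for j in range(d_in)] for i in range(d_model)]
```

After the projection, r̂ᵀ W' is zero, so any input to that layer produces an output with no component along r̂.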

Limitations and Intended Use

  • Intended for TokForge / MNN on-device inference, especially for creative writing and roleplay.
  • Qwen3.5 hybrid LinearAttention models generally route best to CPU in current TokForge builds.
  • Performance varies by device class and generation length.
  • This repo is a packaged runtime artifact, not a standard Transformers training checkpoint.
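
Because throughput depends on device class and generation length, the tok/s figures in the Performance table are best read as rough guides. A naive wall-clock estimate for a reply, assuming the tok/s figures hold constant for the whole generation and ignoring prompt prefill and any slowdown as context grows, is simply tokens divided by tok/s. The helper names below are hypothetical.

```python
# tok/s figures copied from the Performance table (CPU backend)
MEASURED_TOK_PER_S = {
    "S26 Ultra (SM8850)": 20.5,
    "Lenovo TB520FU (SM8650)": 14.0,
    "Xiaomi Pad 7 Pro (SM8635)": 11.8,
}


def naive_generation_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Constant-rate estimate; ignores prefill time and length-dependent slowdown."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return num_tokens / tok_per_s


def ms_per_token(tok_per_s: float) -> float:
    """Average time per generated token, in milliseconds."""
    return 1000.0 / tok_per_s


if __name__ == "__main__":
    for device, rate in MEASURED_TOK_PER_S.items():
        secs = naive_generation_seconds(512, rate)
        print(f"{device}: {ms_per_token(rate):.1f} ms/token, ~{secs:.0f} s for 512 tokens")
```

Real replies will usually take longer than this estimate, since prefill and longer contexts add time that a single tok/s number does not capture.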

Model tree for darkmaniac7/Qwen3.5-4B-Claude-4.6-Opus-abliterated-MNN

Finetuned from: Qwen/Qwen3.5-4B