Qwen3-0.6B-lk-alpha-20k-MNN

Qwen3-0.6B draft model exported for TokForge + MNN speculative decoding, trained with an LK Alpha objective instead of standard KL.

This bundle is aimed at people who want to experiment with acceptance-oriented draft training on mobile.

Why this repo exists

This is the LK-loss variant of our 20K Qwen3 draft lane:

  • Qwen3-0.6B student
  • Qwen3-8B teacher
  • 20K teacher dataset
  • LK Alpha training objective
  • exported as a ready-to-use MNN draft bundle

Best-known use

  • Draft model backend: CPU
  • Draft threads: 2
  • Draft predict length: d=3
  • Target pairing: usually Qwen3-8B in TokForge

Benchmark snapshot

On a RedMagic SM8850 device with a Qwen3-8B target:

  • AR baseline: 13.9 tok/s
  • This draft model: 17.7 tok/s
  • Uplift: about +27%
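The uplift figure follows directly from the two throughput numbers. A minimal sketch of that arithmetic, where the function name `uplift_percent` is ours and not part of TokForge or MNN:

```python
def uplift_percent(baseline_tps: float, spec_tps: float) -> float:
    """Relative throughput gain over the autoregressive baseline, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (spec_tps / baseline_tps - 1.0) * 100.0


# Figures from the benchmark snapshot: 13.9 tok/s AR, 17.7 tok/s with this draft.
gain = uplift_percent(13.9, 17.7)
print(f"uplift: {gain:+.1f}%")  # prints "uplift: +27.3%"
```

The same helper can be reused to compare your own device numbers against the snapshot.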

Training acceptance (alpha) at the final logged epoch:

  • 0.7350
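To get a rough feel for what this acceptance value could mean at d=3, the sketch below uses the common simplification that each drafted token is accepted independently with probability alpha. This is an assumption, not something this bundle measures: training alpha need not match on-device acceptance, and the estimate ignores the cost of running the draft model, so it does not predict tok/s. The function name `expected_tokens_per_step` is ours.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target verification pass.

    Assumes i.i.d. per-token acceptance with probability `alpha`:
    E = 1 + alpha + alpha^2 + ... + alpha^draft_len
      = (1 - alpha^(draft_len + 1)) / (1 - alpha)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if alpha == 1.0:
        # Every drafted token is accepted, plus the target's own token.
        return float(draft_len + 1)
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)


# Training alpha from this card; draft lengths around the recommended d=3.
for d in range(1, 6):
    print(d, round(expected_tokens_per_step(0.7350, d), 2))
```

Under these assumptions the gain from each extra drafted token shrinks as d grows, which is consistent with a short draft length being the recommended setting.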

Included files

  • llm.mnn
  • llm.mnn.weight
  • llm_config.json
  • config.json
  • config_cpu.json
  • tokenizer files
  • ONNX export artifact for reference

Usage

This bundle is meant for TokForge / MNN, not for standard Hugging Face inference.

Typical TokForge recipe (target side, pointing at the draft config):

{
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}

Known-good draft-side config:

{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy"
}
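The two configs above are linked only through "draft_config_path". As a minimal sketch, the target-side recipe can be generated so the path and draft length stay in one place. The keys and values are taken from this card; the helper name `build_target_recipe` is hypothetical, and how TokForge loads the resulting JSON is not covered here.

```python
import json


def build_target_recipe(draft_config_path: str, draft_predict_length: int = 3) -> dict:
    """Build the target-side recipe from this card, pointing at a draft config."""
    if draft_predict_length < 1:
        raise ValueError("draft_predict_length must be at least 1")
    return {
        "backend_type": "opencl",
        "thread_num": 4,
        "precision": "low",
        "memory": "low",
        "sampler_type": "greedy",
        "speculative_type": "draftmodel",
        "draft_predict_length": draft_predict_length,
        "draft_config_path": draft_config_path,
    }


# "/path/to/config_cpu.json" is the placeholder used above; substitute the
# location of config_cpu.json from your copy of the bundle.
recipe = build_target_recipe("/path/to/config_cpu.json")
print(json.dumps(recipe, indent=2))
```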

Notes

  • In our testing, this training objective improved acceptance over the KL baseline.
  • On short device benchmarks, the runtime win was in the same general band as the KL model.
  • This makes it a good experimental alternative, but not a guaranteed universal replacement.

Limitations and Intended Use

  • Intended for speculative decoding with larger Qwen3 targets inside TokForge.
  • Training acceptance improved over the KL baseline, but device throughput gains stayed in a similar band on short runs.
  • Current evidence is strongest for Qwen3-8B targets.
  • This is a specialized runtime artifact, not a general-purpose pretrained release.

Collection

TokForge

If you benchmark this on your own device, feel free to share results in Discord.
