darkmaniac7/Qwen3-0.6B-lk-alpha-20k-MNN


Qwen3-0.6B-lk-alpha-20k-MNN

A Qwen3-0.6B draft model exported for speculative decoding with TokForge + MNN, trained with an LK Alpha objective instead of a standard KL-divergence loss.

This bundle is aimed at people who want to experiment with acceptance-oriented draft models on mobile devices.

Why this repo exists

This is the LK-loss variant of our 20K Qwen3 draft lane:

Qwen3-0.6B student
Qwen3-8B teacher
20K teacher dataset
LK Alpha training objective
exported as a ready-to-use MNN draft bundle

Best-known use

Draft model backend: CPU
Draft threads: 2
Draft predict length: d=3
Target pairing: usually Qwen3-8B in TokForge

Benchmark snapshot

On RedMagic SM8850 with Qwen3-8B target:

AR (autoregressive, no draft model) baseline: 13.9 tok/s
With this draft model: 17.7 tok/s
Uplift: about +27%
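
The uplift figure follows directly from the two throughput numbers above. A minimal sketch of the arithmetic, useful if you want to report your own device results the same way (the function name `uplift_percent` is illustrative, not part of TokForge or MNN):

```python
def uplift_percent(baseline_tps: float, speculative_tps: float) -> float:
    """Relative throughput gain over the AR baseline, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (speculative_tps / baseline_tps - 1.0) * 100.0


# Figures reported above: RedMagic SM8850, Qwen3-8B target.
gain = uplift_percent(13.9, 17.7)
print(f"uplift: {gain:+.1f}%")  # prints "uplift: +27.3%"
```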

Training acceptance (alpha) at the final logged epoch:

0.7350
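
To get a rough feel for what this acceptance value means at d=3, the sketch below uses the common idealization from the speculative decoding literature that each drafted token is accepted independently with probability alpha. This is an assumption, not a measurement: the logged training alpha may be computed differently from runtime acceptance, and the function name is illustrative.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target verification pass.

    Assumes every drafted token is accepted independently with probability
    `alpha`. The value is the geometric sum 1 + alpha + ... + alpha**draft_len,
    where the leading 1 is the token the target model always contributes.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if alpha == 1.0:
        return float(draft_len + 1)
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)


# alpha from the final logged epoch, d=3 from the best-known settings.
print(round(expected_tokens_per_step(0.7350, 3), 2))  # prints 2.67
```

Tokens per verification pass is not the same as end-to-end speedup, because the draft model's own runtime and the verification overhead also count. That is consistent with the measured uplift above being well below this idealized figure.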

Included files

llm.mnn
llm.mnn.weight
llm_config.json
config.json
config_cpu.json
tokenizer files
ONNX export artifact for reference

Usage

This bundle is meant for TokForge / MNN, not for standard Hugging Face inference.

Typical TokForge recipe (target-side config, pointing at the draft config):

{
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}

Known-good draft-side config:

{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy"
}
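
If you prefer to generate the two configs rather than edit them by hand, the following sketch writes both files and links them through `draft_config_path`. It only handles JSON and calls no TokForge or MNN API. The helper names and the output file name `target_config.json` are hypothetical, and the field values are copied from the two configs above.

```python
import json
from pathlib import Path


def build_configs(draft_config_path: str, draft_len: int = 3) -> tuple[dict, dict]:
    """Return (target_config, draft_config) mirroring the recipe above."""
    draft = {
        "backend_type": "cpu",
        "thread_num": 2,
        "precision": "low",
        "memory": "low",
        "sampler_type": "greedy",
    }
    target = {
        "backend_type": "opencl",
        "thread_num": 4,
        "precision": "low",
        "memory": "low",
        "sampler_type": "greedy",
        "speculative_type": "draftmodel",
        "draft_predict_length": draft_len,
        "draft_config_path": draft_config_path,
    }
    return target, draft


def write_configs(out_dir: Path, draft_len: int = 3) -> Path:
    """Write both configs into out_dir and return the target config path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    draft_path = out_dir / "config_cpu.json"
    target_path = out_dir / "target_config.json"  # hypothetical file name
    target, draft = build_configs(str(draft_path), draft_len)
    draft_path.write_text(json.dumps(draft, indent=2))
    target_path.write_text(json.dumps(target, indent=2))
    return target_path
```

For example, `write_configs(Path("./tokforge_configs"))` produces a target config whose `draft_config_path` points at the `config_cpu.json` written next to it. Adjust the directory to wherever the bundle lives on your device.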

Notes

In our testing, the LK Alpha objective improved training acceptance over the KL baseline.
On short device benchmarks, the runtime gain was in the same general band as that of the KL model.
This makes it a worthwhile experimental alternative, but not a guaranteed universal replacement.

Limitations and Intended Use

Intended for speculative decoding with larger Qwen3 targets inside TokForge.
Training acceptance improved over the KL baseline, but device throughput gains stayed in a similar band on short runs.
Current evidence is strongest for the Qwen3-8B target.
This is a specialized runtime artifact, not a general-purpose pretrained release.

Collection

TokForge Mobile Draft Models

TokForge

Website: tokforge.ai
Discord: Join the Discord

If you benchmark this on your own device, feel free to share results in Discord.


Model tree

Base model: Qwen/Qwen3-0.6B-Base
Finetuned: Qwen/Qwen3-0.6B
Quantized: this model
