Experimental Qwen3.5-0.8B draft bundle for TokForge + MNN speculative decoding research.
This repo captures an acceptance-oriented Qwen3.5-0.8B draft experiment exported into a ready-to-run MNN bundle.
It is here because people asked for the actual artifacts behind the work, not because it is the current default recommendation.
For the associated LK Alpha training lane:
alpha) was 0.6972 on the small Qwen3.5 dataset
This bundle is meant for TokForge / MNN, not standard HF Inference.
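To give a rough sense of what an alpha near 0.6972 could mean in practice, here is a minimal sketch. It assumes alpha is a per-token acceptance rate (the truncated line above does not confirm this) and that each drafted token is accepted independently with that probability, which is a simplification. The function name `expected_tokens_per_step` and its parameters are hypothetical and are not part of TokForge or MNN.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Assumption: every drafted token is accepted independently with
    probability alpha, and the target model always contributes one
    token of its own per step.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    # Geometric sum: 1 + alpha + alpha^2 + ... + alpha^draft_len
    return sum(alpha ** i for i in range(draft_len + 1))


if __name__ == "__main__":
    # 0.6972 is the figure quoted above; a draft length of 2 is an example value.
    print(f"{expected_tokens_per_step(0.6972, 2):.2f}")
```

Under these assumptions a draft length of 2 gives about 2.18 tokens per verification step. This is not a speedup figure: it ignores the cost of running the draft model, so on-device benchmarks remain the real test.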
Typical TokForge recipe:
{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 2,
  "draft_config_path": "/path/to/config_cpu.json"
}
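If you keep the draft bundle in a local directory, the recipe above can be generated rather than edited by hand. The following is a minimal sketch: the keys and values are copied from the recipe, while `write_recipe`, `bundle_dir`, and `out_path` are hypothetical names chosen for illustration. Where TokForge expects this file to live is not stated here, so the output path is left to the caller.

```python
import json
from pathlib import Path

# Keys and values copied from the recipe above; the placeholder path is
# replaced per bundle directory in write_recipe().
RECIPE = {
    "backend_type": "cpu",
    "thread_num": 2,
    "precision": "low",
    "memory": "low",
    "sampler_type": "greedy",
    "speculative_type": "draftmodel",
    "draft_predict_length": 2,
    "draft_config_path": "/path/to/config_cpu.json",
}


def write_recipe(bundle_dir, out_path):
    """Write the recipe as JSON, pointing draft_config_path at the bundle.

    Assumption: config_cpu.json sits directly inside bundle_dir.
    Returns the dict that was written.
    """
    config = dict(RECIPE)
    config["draft_config_path"] = str(Path(bundle_dir) / "config_cpu.json")
    Path(out_path).write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `write_recipe("/path/to/bundle", "recipe.json")` would produce the same JSON as above with only `draft_config_path` changed. The function does not check that the bundle files exist.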
This is currently best treated as experimental:
Qwen3.5-0.8B draft path
Qwen3-0.6B draft lane
Qwen3.5 draft lane, not the strongest practical mobile draft we have.
Qwen3-0.6B -> Qwen3 draft lane.

llm.mnn
llm.mnn.weight
llm_config.json
config_cpu.json

MNN runtime bundle for TokForge-style use.

If you benchmark this on your own device, feel free to share results in Discord.