metadata
license: apache-2.0
language:
- en
tags:
- tokforge
- mnn
- android
- mobile
- speculative-decoding
- qwen3.5
- draft-model
- experimental
- text-generation
pipeline_tag: text-generation
inference: false
Qwen3.5-0.8B-lk-alpha-ep4-MNN
Experimental Qwen3.5-0.8B draft bundle for TokForge + MNN speculative decoding research.
Why this repo exists
This repo captures an acceptance-oriented Qwen3.5-0.8B draft experiment exported into a ready-to-run MNN bundle.
It is here because people asked for the actual artifacts behind the work, not because it is the current default recommendation.
Training snapshot
For the associated LK Alpha training lane:
- final reported acceptance (alpha) was 0.6972 on the small Qwen3.5 dataset
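To put the acceptance figure in context, here is a rough sketch of how a per-token acceptance rate translates into tokens produced per verification step. It assumes that the reported alpha is a per-token acceptance probability and that acceptances are independent and identically distributed, which is the usual simplification in speculative decoding analyses; neither assumption is confirmed by this repo. The function name `expected_tokens_per_step` is hypothetical, and the draft length of 2 is taken from the `draft_predict_length` value in the recipe under Usage.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Assumption (not from this repo): each drafted token is accepted
    independently with probability `alpha`. Under that model the
    expectation is the geometric sum 1 + alpha + ... + alpha**draft_len.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return sum(alpha ** i for i in range(draft_len + 1))


# Reported alpha with a draft length of 2 (illustrative only).
print(round(expected_tokens_per_step(0.6972, 2), 2))  # about 2.18
```

This is an idealized upper-level estimate, not a measured speedup: it ignores the cost of running the draft model and any device-specific overhead.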
Usage
This bundle is meant for TokForge / MNN, not standard HF Inference.
Typical TokForge recipe:
{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 2,
  "draft_config_path": "/path/to/config_cpu.json"
}
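If you prefer to generate the recipe rather than edit it by hand, the following is a minimal sketch that fills in the draft config path and serializes the result. The keys and values mirror the recipe above; the helper name `build_recipe` is hypothetical, and how TokForge actually loads such a file is not specified here.

```python
import json


def build_recipe(draft_config_path: str, draft_predict_length: int = 2) -> dict:
    """Return the recipe above as a dict, with the draft config path filled in.

    Hypothetical helper: only the keys and default values come from this README.
    """
    if draft_predict_length < 1:
        raise ValueError("draft_predict_length must be at least 1")
    return {
        "backend_type": "cpu",
        "thread_num": 2,
        "precision": "low",
        "memory": "low",
        "sampler_type": "greedy",
        "speculative_type": "draftmodel",
        "draft_predict_length": draft_predict_length,
        "draft_config_path": draft_config_path,
    }


# Placeholder path, as in the recipe above; replace with your own location.
recipe = build_recipe("/path/to/config_cpu.json")
print(json.dumps(recipe, indent=2))
```

Keeping the recipe in code makes it easier to sweep `draft_predict_length` when benchmarking on a device.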
Status
This is currently best treated as experimental:
- useful if you want to inspect the Qwen3.5-0.8B draft path
- useful for reproducing training/export experiments
- not currently the top practical mobile recommendation versus the stronger Qwen3-0.6B draft lane
Limitations and Intended Use
- This is an experimental Qwen3.5 draft lane, not the strongest practical mobile draft we have.
- Cross-family drafting was generally weaker than the same-architecture Qwen3-0.6B -> Qwen3 draft lane.
- Use this for reproducibility and research rather than as the default recommended draft.
Included files
- llm.mnn
- llm.mnn.weight
- llm_config.json
- config_cpu.json
- tokenizer files
- ONNX export artifact for reference
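After downloading, it can help to confirm that the core files are present before pointing `draft_config_path` at the bundle. The sketch below checks only the four files named explicitly in this list; tokenizer and ONNX file names are not listed here, so they are left out. The function name `missing_bundle_files` is hypothetical, and the flat directory layout is an assumption.

```python
from pathlib import Path

# File names taken from the "Included files" list; layout is assumed to be flat.
EXPECTED_FILES = ("llm.mnn", "llm.mnn.weight", "llm_config.json", "config_cpu.json")


def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the expected bundle files that are not present in bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Example usage with a placeholder directory:
# missing = missing_bundle_files("/path/to/bundle")
# if missing:
#     print("Missing files:", ", ".join(missing))
```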
Notes
- This is an MNN runtime bundle for TokForge-style use.
- It is not a standard HF Transformers checkpoint.
TokForge
- Website: tokforge.ai
- Discord: Join the Discord
If you benchmark this on your own device, feel free to share results in Discord.