Qwen3.5-0.8B-lk-alpha-ep4-MNN

Experimental Qwen3.5-0.8B draft bundle for TokForge + MNN speculative decoding research.

Why this repo exists

This repo captures an acceptance-oriented Qwen3.5-0.8B draft experiment exported into a ready-to-run MNN bundle.

It is here because people asked for the actual artifacts behind the work, not because it is the current default recommendation.

Training snapshot

For the associated LK Alpha training lane:

  • final reported acceptance (alpha) was 0.6972 on the small Qwen3.5 dataset
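One rough way to read the alpha figure comes from the idealized model in the speculative decoding literature. Assume each drafted token is accepted independently with probability alpha, and the target adds one token per verification pass. With gamma drafted tokens, the expected number of tokens emitted per target pass is then (1 - alpha^(gamma+1)) / (1 - alpha). The sketch below assumes alpha means exactly this per-token acceptance probability and ignores the draft model's own cost, so it is an idealized estimate, not a measured result. `expected_tokens_per_step` is a hypothetical helper name.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass (idealized).

    Assumes each of the gamma drafted tokens is accepted independently with
    probability alpha, and the target contributes one extra token per pass.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    # Closed form of the geometric series 1 + alpha + ... + alpha**gamma.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


for gamma in (1, 2, 4):
    print(gamma, round(expected_tokens_per_step(0.6972, gamma), 3))
```

With alpha = 0.6972 and gamma = 2 (assuming `draft_predict_length` plays the role of gamma), this works out to about 2.18 tokens per target pass. Real on-device speedups will be lower once draft latency and runtime overheads are counted.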

Usage

This bundle is meant for TokForge / MNN, not for standard Hugging Face inference.

Typical TokForge recipe:

{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 2,
  "draft_config_path": "/path/to/config_cpu.json"
}
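The `"speculative_type": "draftmodel"` plus `"sampler_type": "greedy"` combination follows the usual draft-then-verify pattern. The toy sketch below illustrates that pattern in plain Python, assuming `draft_predict_length` corresponds to the number of tokens `k` the draft proposes per step. It is not MNN's or TokForge's implementation: the models are stand-in functions, and verification is a loop rather than one batched forward pass. The names `speculative_step`, `generate`, `target` and `draft` are hypothetical.

```python
from typing import Callable, List

# Toy stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens it emits."""
    # 1. The draft proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposed position in order. A real engine
    #    scores all k positions in one batched forward pass.
    emitted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            emitted.append(expected)  # first mismatch: keep target's token, stop
            return emitted
        emitted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target's pass yields one bonus token.
    emitted.append(target(ctx))
    return emitted


def generate(target: NextToken, draft: NextToken,
             prompt: List[int], n_new: int, k: int = 2) -> List[int]:
    """Repeat speculative steps until n_new tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[len(prompt):len(prompt) + n_new]


def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    # Returns 0 whenever the last token is a multiple of 3, so it disagrees
    # with the target after 0, 3 and 6.
    return (ctx[-1] + 1) % 10 if ctx[-1] % 3 else 0


print(generate(target, draft, [0], 8, k=2))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the target's greedy choice always wins on a mismatch, the output is identical to decoding with the target alone. In a real engine, a better draft only reduces how many target forward passes are needed.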

Status

This is currently best treated as experimental:

  • useful if you want to inspect the Qwen3.5-0.8B draft path
  • useful for reproducing training/export experiments
  • not currently the top practical mobile recommendation versus the stronger Qwen3-0.6B draft lane

Limitations and Intended Use

  • This is an experimental Qwen3.5 draft lane, not the strongest practical mobile draft we have.
  • Cross-family drafting was generally weaker than the same-architecture Qwen3-0.6B -> Qwen3 draft lane.
  • Use this for reproducibility and research rather than as the default recommended draft.


Included files

  • llm.mnn
  • llm.mnn.weight
  • llm_config.json
  • config_cpu.json
  • tokenizer files
  • ONNX export artifact for reference
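To fill in `draft_config_path` programmatically, a small helper can check that the named bundle files are present. It can then emit the recipe from the Usage section with an absolute path to this bundle's `config_cpu.json`. This is a hypothetical sketch. `build_recipe` is not part of TokForge or MNN, and tokenizer file names are not checked because the list above does not spell them out. Whether TokForge accepts an absolute path here is also an assumption.

```python
import json
from pathlib import Path

# Files listed under "Included files". Tokenizer and ONNX file names are
# not spelled out there, so this sketch does not check them.
REQUIRED_FILES = ["llm.mnn", "llm.mnn.weight", "llm_config.json", "config_cpu.json"]


def build_recipe(bundle_dir: str) -> str:
    """Hypothetical helper: verify the bundle, return a TokForge-style recipe as JSON."""
    root = Path(bundle_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    recipe = {
        "backend_type": "cpu",
        "thread_num": 2,
        "precision": "low",
        "memory": "low",
        "sampler_type": "greedy",
        "speculative_type": "draftmodel",
        "draft_predict_length": 2,
        # Assumption: an absolute path to this bundle's draft config is accepted.
        "draft_config_path": str((root / "config_cpu.json").resolve()),
    }
    return json.dumps(recipe, indent=2)
```

For example, `print(build_recipe("/path/to/Qwen3.5-0.8B-lk-alpha-ep4-MNN"))` prints the recipe with the path filled in, or raises if one of the listed files is missing.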

Notes

  • This is an MNN runtime bundle for TokForge-style use.
  • It is not a standard HF Transformers checkpoint.

TokForge

If you benchmark this on your own device, feel free to share results in Discord.

Collection including darkmaniac7/Qwen3.5-0.8B-lk-alpha-ep4-MNN