---
license: apache-2.0
language:
- en
tags:
- tokforge
- mnn
- android
- mobile
- speculative-decoding
- qwen3.5
- draft-model
- experimental
- text-generation
pipeline_tag: text-generation
inference: false
---

# Qwen3.5-0.8B-lk-alpha-ep4-MNN

Experimental `Qwen3.5-0.8B` draft bundle for **TokForge + MNN** speculative decoding research.

## Why this repo exists

This repo captures an acceptance-oriented `Qwen3.5-0.8B` draft experiment exported into a ready-to-run `MNN` bundle.

It is here because people asked for the actual artifacts behind the work, not because it is the current default recommendation.

## Training snapshot

For the associated `LK Alpha` training lane:

- final reported acceptance (`alpha`) was `0.6972` on the small Qwen3.5 dataset

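To put the acceptance figure in context, the sketch below estimates how many tokens one draft-and-verify step would emit on average. It rests on assumptions that this repo does not state: that `alpha` is a per-token acceptance probability, that acceptances are independent across draft positions, and that each step drafts a fixed number of tokens (2 here, matching `draft_predict_length` in the usage recipe). The function name `expected_tokens_per_step` is hypothetical.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per verification step.

    Assumption: each drafted token is accepted independently with
    probability `alpha`. The target model always contributes one token
    per step (a correction or a bonus token), hence the leading 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    # 1 + alpha + alpha^2 + ... + alpha^draft_len
    return sum(alpha ** k for k in range(draft_len + 1))


# Reported alpha from this card, with an assumed draft length of 2.
estimate = expected_tokens_per_step(0.6972, 2)
print(f"{estimate:.4f}")
```

Under those assumptions the estimate comes out at roughly 2.18 tokens per step. This is an idealized figure: it ignores the cost of running the draft model, so it should not be read as a measured speedup.
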
## Usage

This bundle is meant for **TokForge / MNN**, not standard HF Inference.

Typical TokForge recipe:

```json
{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 2,
  "draft_config_path": "/path/to/config_cpu.json"
}
```

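If you generate the recipe programmatically, a small helper can fill in the draft config path and emit the JSON. The sketch below is illustrative only: `build_recipe` is a hypothetical name, and it simply reproduces the keys shown in the recipe above rather than any documented TokForge or MNN schema.

```python
import json


def build_recipe(draft_config_path: str, draft_predict_length: int = 2) -> dict:
    """Return a recipe dict with the same keys as the example recipe."""
    # Assumption: a draft length below 1 is not meaningful for drafting.
    if draft_predict_length < 1:
        raise ValueError("draft_predict_length must be at least 1")
    return {
        "backend_type": "cpu",
        "thread_num": 2,
        "precision": "low",
        "memory": "low",
        "sampler_type": "greedy",
        "speculative_type": "draftmodel",
        "draft_predict_length": draft_predict_length,
        "draft_config_path": draft_config_path,
    }


# Replace the placeholder path with the location of config_cpu.json on your device.
recipe = build_recipe("/path/to/config_cpu.json")
print(json.dumps(recipe, indent=2))
```

Keeping the values in one function makes it easier to sweep `draft_predict_length` across benchmark runs without hand-editing the JSON.
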
## Status

This is currently best treated as **experimental**:

- useful if you want to inspect the `Qwen3.5-0.8B` draft path
- useful for reproducing training/export experiments
- not currently the top practical mobile recommendation versus the stronger `Qwen3-0.6B` draft lane

## Limitations and Intended Use

- This is an experimental `Qwen3.5` draft lane, not the strongest practical mobile draft we have.
- Cross-family drafting was generally weaker than the same-architecture `Qwen3-0.6B -> Qwen3` draft lane.
- Use this for reproducibility and research rather than as the default recommended draft.

## Collection

- [TokForge Mobile Draft Models](https://huggingface.co/collections/darkmaniac7/tokforge-mobile-draft-models-69c36153ea7084ce78329665)

## Included files

- `llm.mnn`
- `llm.mnn.weight`
- `llm_config.json`
- `config_cpu.json`
- tokenizer files
- ONNX export artifact for reference

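After downloading, it can help to confirm that the named files are actually present before pointing a recipe at the bundle. The sketch below is a minimal check, assuming the bundle sits in a single local directory; `missing_bundle_files` and `REQUIRED_FILES` are hypothetical names, and only the four files named explicitly in the list above are checked.

```python
from pathlib import Path

# File names taken from the "Included files" list. Tokenizer and ONNX file
# names are not specified there, so they are deliberately not checked.
REQUIRED_FILES = ("llm.mnn", "llm.mnn.weight", "llm_config.json", "config_cpu.json")


def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the required file names that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_bundle_files("/path/to/bundle")` returns an empty list when all four files are in place, and the names of the absent files otherwise.
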
## Notes

- This is an `MNN` runtime bundle for TokForge-style use.
- It is not a standard HF Transformers checkpoint.

## TokForge

- Website: [tokforge.ai](https://tokforge.ai)
- Discord: [Join the Discord](https://discord.gg/Acv3CBtfVm)

If you benchmark this on your own device, feel free to share results in Discord.