---
license: apache-2.0
language:
- en
tags:
- tokforge
- mnn
- android
- mobile
- speculative-decoding
- qwen3.5
- draft-model
- experimental
- text-generation
pipeline_tag: text-generation
inference: false
---

# Qwen3.5-0.8B-lk-alpha-ep4-MNN

Experimental `Qwen3.5-0.8B` draft bundle for **TokForge + MNN** speculative decoding research.

## Why this repo exists

This repo captures an acceptance-oriented `Qwen3.5-0.8B` draft experiment exported into a ready-to-run `MNN` bundle.

It is here because people asked for the actual artifacts behind the work, not because it is the current default recommendation.

## Training snapshot

For the associated `LK Alpha` training lane:

- final reported acceptance (`alpha`) was `0.6972` on the small Qwen3.5 dataset
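As a rough way to read that number, the sketch below estimates how many tokens one target-model verification pass would emit. It rests on two assumptions that this card does not confirm: that `alpha` is a per-token acceptance probability, and that each drafted token is accepted independently of the others. The function name is hypothetical, and the result is an idealized estimate rather than a measured on-device figure.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Idealized tokens emitted per target verification pass.

    Assumption (not from this card): each drafted token is accepted
    independently with probability `alpha`. Under that model the expected
    count is the geometric sum 1 + alpha + ... + alpha**draft_len, where
    the leading 1 is the token the target model always contributes.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return sum(alpha ** k for k in range(draft_len + 1))


if __name__ == "__main__":
    # alpha from the training snapshot, draft length 2 as in the Usage recipe.
    estimate = expected_tokens_per_step(0.6972, 2)
    print(f"{estimate:.2f} tokens per verification pass (idealized)")
```

With `alpha = 0.6972` and a draft length of 2 this comes to roughly 2.18 tokens per pass. Real throughput also depends on the cost of running the draft model, which this estimate ignores.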

## Usage

This bundle is meant for **TokForge / MNN**, not for standard Hugging Face inference.

Typical TokForge recipe:

```json
{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 2,
  "draft_config_path": "/path/to/config_cpu.json"
}
```
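If you generate this recipe from a script, a minimal sketch could look like the following. The keys and default values are copied from the recipe above; the helper names `build_recipe` and `write_recipe` and the output file name are hypothetical, and how TokForge actually loads the file is not covered here.

```python
import json
from pathlib import Path


def build_recipe(draft_config_path: str,
                 draft_predict_length: int = 2,
                 thread_num: int = 2) -> dict:
    """Return the recipe above as a dict, with the draft path filled in."""
    return {
        "backend_type": "cpu",
        "thread_num": thread_num,
        "precision": "low",
        "memory": "low",
        "sampler_type": "greedy",
        "speculative_type": "draftmodel",
        "draft_predict_length": draft_predict_length,
        "draft_config_path": draft_config_path,
    }


def write_recipe(recipe: dict, out_path: str) -> None:
    """Serialize the recipe as indented JSON."""
    Path(out_path).write_text(json.dumps(recipe, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Placeholder path, as in the recipe above; point it at this bundle's config_cpu.json.
    recipe = build_recipe("/path/to/config_cpu.json")
    write_recipe(recipe, "tokforge_recipe.json")  # hypothetical output name
```

Keeping the draft path as a parameter makes it easier to reuse the same recipe across devices where the bundle is unpacked in different locations.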

## Status

This is currently best treated as **experimental**:

- useful if you want to inspect the `Qwen3.5-0.8B` draft path
- useful for reproducing training/export experiments
- not currently the recommended draft for practical mobile use; the `Qwen3-0.6B` draft lane is stronger

## Limitations and Intended Use

- This is an experimental `Qwen3.5` draft lane, not the strongest practical mobile draft available.
- Cross-family drafting was generally weaker than the same-architecture `Qwen3-0.6B -> Qwen3` draft lane.
- Use this for reproducibility and research rather than as the default recommended draft.

## Collection

- [TokForge Mobile Draft Models](https://huggingface.co/collections/darkmaniac7/tokforge-mobile-draft-models-69c36153ea7084ce78329665)

## Included files

- `llm.mnn`
- `llm.mnn.weight`
- `llm_config.json`
- `config_cpu.json`
- tokenizer files
- ONNX export artifact for reference
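Before pointing `draft_config_path` at a downloaded copy, it can help to confirm that the named files are present. The sketch below checks only the four files listed by name above; tokenizer and ONNX file names are not specified in this card, so they are deliberately left out. The function name and the example directory are hypothetical.

```python
from pathlib import Path

# Only the files this card lists by exact name.
REQUIRED_FILES = ("llm.mnn", "llm.mnn.weight", "llm_config.json", "config_cpu.json")


def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the required file names that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical local directory holding the downloaded bundle.
    missing = missing_bundle_files("./Qwen3.5-0.8B-lk-alpha-ep4-MNN")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All named bundle files are present.")
```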

## Notes

- This is an `MNN` runtime bundle for TokForge-style use.
- It is not a standard Hugging Face Transformers checkpoint.

## TokForge

- Website: [tokforge.ai](https://tokforge.ai)
- Discord: [Join the Discord](https://discord.gg/Acv3CBtfVm)

If you benchmark this on your own device, feel free to share results in Discord.