---
license: apache-2.0
language:
- en
tags:
- tokforge
- mnn
- android
- mobile
- speculative-decoding
- qwen3.5
- draft-model
- experimental
- text-generation
pipeline_tag: text-generation
inference: false
---
# Qwen3.5-0.8B-lk-alpha-ep4-MNN
Experimental `Qwen3.5-0.8B` draft bundle for **TokForge + MNN** speculative decoding research.
## Why this repo exists
This repo captures an acceptance-oriented `Qwen3.5-0.8B` draft experiment exported into a ready-to-run `MNN` bundle.
It is here because people asked for the actual artifacts behind the work, not because it is the current default recommendation.
## Training snapshot
For the associated `LK Alpha` training lane:
- final reported acceptance (`alpha`) was `0.6972` on the small Qwen3.5 dataset
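To put the acceptance figure in context, here is a rough back-of-the-envelope sketch of how many tokens one verification pass might yield. It assumes `alpha` can be read as an independent per-token acceptance probability, which this repo does not confirm, and it uses the standard geometric-series estimate from the speculative decoding literature. The function name `expected_tokens_per_step` is made up for illustration.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Idealized expected tokens produced per target-model verification pass.

    Assumption (not confirmed by this repo): every drafted token is accepted
    independently with probability `alpha`. Under that assumption the expected
    count is 1 + alpha + alpha^2 + ... + alpha^draft_len.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return sum(alpha ** i for i in range(draft_len + 1))


# Reported alpha with a draft length of 2 tokens.
estimate = expected_tokens_per_step(0.6972, 2)
print(f"{estimate:.3f}")  # prints 2.183
```

This is only an upper-level estimate of accepted tokens, not a speedup figure: real gains also depend on how expensive the draft model is relative to the target on a given device.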
## Usage
This bundle is meant for **TokForge / MNN**, not standard HF Inference.
Typical TokForge recipe:
```json
{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 2,
  "draft_config_path": "/path/to/config_cpu.json"
}
```
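If you generate configs per device, a small script can fill in the draft path for you. The following is a minimal sketch that reuses only the keys from the recipe above; the helper names `build_recipe` and `write_recipe` are hypothetical and not part of TokForge or MNN, and whether TokForge accepts other values for these keys is not covered here.

```python
import json
from pathlib import Path

# Values copied from the recipe in this README.
BASE_RECIPE = {
    "backend_type": "cpu",
    "thread_num": 2,
    "precision": "low",
    "memory": "low",
    "sampler_type": "greedy",
    "speculative_type": "draftmodel",
    "draft_predict_length": 2,
}


def build_recipe(draft_config_path: str, draft_predict_length: int = 2) -> dict:
    """Return a copy of the recipe pointing at a specific draft config file."""
    if draft_predict_length < 1:
        raise ValueError("draft_predict_length must be at least 1")
    recipe = dict(BASE_RECIPE)
    recipe["draft_predict_length"] = draft_predict_length
    recipe["draft_config_path"] = str(draft_config_path)
    return recipe


def write_recipe(recipe: dict, out_path: str) -> Path:
    """Write the recipe as indented JSON and return the output path."""
    out = Path(out_path)
    out.write_text(json.dumps(recipe, indent=2) + "\n", encoding="utf-8")
    return out
```

For example, `write_recipe(build_recipe("/path/to/config_cpu.json"), "config.json")` writes a file equivalent to the recipe shown above.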
## Status
This is currently best treated as **experimental**:
- useful if you want to inspect the `Qwen3.5-0.8B` draft path
- useful for reproducing training/export experiments
- not currently the top practical mobile recommendation; the `Qwen3-0.6B` draft lane is stronger
## Limitations and Intended Use
- This is an experimental `Qwen3.5` draft lane, not the strongest practical mobile draft we have.
- Cross-family drafting was generally weaker than the same-architecture `Qwen3-0.6B -> Qwen3` draft lane.
- Use this for reproducibility and research rather than as the default recommended draft.
## Collection
- [TokForge Mobile Draft Models](https://huggingface.co/collections/darkmaniac7/tokforge-mobile-draft-models-69c36153ea7084ce78329665)
## Included files
- `llm.mnn`
- `llm.mnn.weight`
- `llm_config.json`
- `config_cpu.json`
- tokenizer files
- ONNX export artifact for reference
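After downloading, you may want to confirm the bundle is complete before pointing TokForge at it. This is a small sketch that checks only the four explicitly named files from the list above; tokenizer and ONNX file names are not spelled out in this README, so they are left unchecked. `missing_bundle_files` is a hypothetical helper, not a TokForge or MNN API.

```python
from pathlib import Path

# File names taken from the "Included files" list in this README.
EXPECTED_FILES = (
    "llm.mnn",
    "llm.mnn.weight",
    "llm_config.json",
    "config_cpu.json",
)


def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the expected bundle files that are absent from `bundle_dir`."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_bundle_files("/path/to/bundle")` means all four named files are present.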
## Notes
- This is an `MNN` runtime bundle for TokForge-style use.
- It is not a standard HF Transformers checkpoint.
## TokForge
- Website: [tokforge.ai](https://tokforge.ai)
- Discord: [Join the Discord](https://discord.gg/Acv3CBtfVm)
If you benchmark this on your own device, feel free to share results in Discord.