license: apache-2.0
language:
  - en
tags:
  - tokforge
  - mnn
  - android
  - mobile
  - speculative-decoding
  - qwen3.5
  - draft-model
  - experimental
  - text-generation
pipeline_tag: text-generation
inference: false

Qwen3.5-0.8B-lk-alpha-ep4-MNN

Experimental Qwen3.5-0.8B draft bundle for TokForge + MNN speculative decoding research.

Why this repo exists

This repo captures an acceptance-oriented Qwen3.5-0.8B draft experiment exported into a ready-to-run MNN bundle.

It is here because people asked for the actual artifacts behind the work, not because it is the current default recommendation.

Training snapshot

For the associated LK Alpha training lane:

final reported acceptance rate (alpha): 0.6972 on the small Qwen3.5 dataset
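
To put that figure in context, here is a rough sketch. It assumes alpha can be read as a per-token acceptance probability and that acceptances are independent. Both are common simplifications in speculative decoding analyses and neither is verified for this bundle. Under those assumptions, a draft length of k gives an expected (1 - alpha^(k+1)) / (1 - alpha) tokens per target-model verification pass. The function name is illustrative only.

```python
def expected_tokens_per_pass(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target verification pass.

    Assumes each drafted token is accepted independently with
    probability alpha (a simplification, not a measured property).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if alpha == 1.0:
        # Every drafted token is accepted, plus one token from the target.
        return float(draft_len + 1)
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # alpha from the training snapshot, draft length of 2 as an example.
    print(round(expected_tokens_per_pass(0.6972, 2), 3))
```

With alpha = 0.6972 and a draft length of 2 this comes to roughly 2.18 tokens per pass. That is an idealized ceiling on tokens per verification step, not a measured speedup, since it ignores the cost of running the draft model.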

Usage

This bundle is meant for TokForge / MNN, not standard HF Inference.

Typical TokForge recipe:

{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 2,
  "draft_config_path": "/path/to/config_cpu.json"
}
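
A minimal sketch of generating that recipe from Python, so that `draft_config_path` points at a file that actually exists. The keys and values are copied from the recipe above. The function name, the output file, and the idea of writing the recipe to disk as JSON are assumptions for illustration, since how TokForge loads this recipe is not described here.

```python
import json
from pathlib import Path


def write_recipe(draft_config: Path, out_path: Path, draft_len: int = 2) -> dict:
    """Write a TokForge-style recipe that points at draft_config.

    Hypothetical helper: only the key names and default values come
    from the recipe in this README.
    """
    if not draft_config.is_file():
        raise FileNotFoundError(f"draft config not found: {draft_config}")
    recipe = {
        "backend_type": "cpu",
        "thread_num": 2,
        "precision": "low",
        "memory": "low",
        "sampler_type": "greedy",
        "speculative_type": "draftmodel",
        "draft_predict_length": draft_len,
        # Store an absolute path so the recipe does not depend on the
        # working directory of whatever reads it.
        "draft_config_path": str(draft_config.resolve()),
    }
    out_path.write_text(json.dumps(recipe, indent=2), encoding="utf-8")
    return recipe
```

Failing early on a missing `config_cpu.json` is a design choice of this sketch, intended to surface path mistakes before the runtime is involved.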

Status

This is currently best treated as experimental:

useful if you want to inspect the Qwen3.5-0.8B draft path
useful for reproducing training/export experiments
not currently the top practical mobile recommendation; the Qwen3-0.6B draft lane is stronger

Limitations and Intended Use

This is an experimental Qwen3.5 draft lane, not the strongest practical mobile draft we have.
Cross-family drafting was generally weaker than the same-architecture Qwen3-0.6B -> Qwen3 draft lane.
Use this for reproducibility and research rather than as the default recommended draft.

Collection

TokForge Mobile Draft Models

Included files

llm.mnn
llm.mnn.weight
llm_config.json
config_cpu.json
tokenizer files
ONNX export artifact for reference
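
A small sketch for sanity-checking a local copy of the bundle. It checks only the four files named explicitly above; the tokenizer and ONNX file names are not spelled out here, so they are left out. The function name is hypothetical.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = ("llm.mnn", "llm.mnn.weight", "llm_config.json", "config_cpu.json")


def missing_bundle_files(bundle_dir: Path) -> list[str]:
    """Return the required file names that are absent from bundle_dir."""
    return [name for name in REQUIRED_FILES if not (bundle_dir / name).is_file()]
```

An empty list means the named files are present. It says nothing about whether their contents are valid.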

Notes

This is an MNN runtime bundle for TokForge-style use.
It is not a standard HF Transformers checkpoint.

TokForge

Website: tokforge.ai
Discord: Join the Discord

If you benchmark this on your own device, feel free to share results in Discord.