Create README.md
Browse files
README.md
ADDED
|
@@ -0,0 +1,47 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
base_model:
|
| 4 |
+
- Qwen/Qwen3-8B
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Introduce
|
| 8 |
+
We adapted the official speculative sampling training method, Eagle3, for training on Qwen3-30B-A3B
|
| 9 |
+
|
| 10 |
+
After implementing Eagle3, the inference performance of Qwen3-30B-Moe using the SGLang framework on 8*H200 GPU improved from 183 tokens/s to 325 tokens/s.
|
| 11 |
+
|
| 12 |
+
The TPS (tokens per second) improvement reached nearly 70%.
|
| 13 |
+
|
| 14 |
+
On a single RTX 5090, the TPS (transactions per second) of Qwen3-8B-Eagle3 increased from 164 to 268.
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
| model | gpu | tps |
|
| 18 |
+
|---------|---------|---------|
|
| 19 |
+
| qwen3-30b_moe | h200 | 147 |
|
| 20 |
+
| qwen3-30b-moe_eagle3 | h200 | 231 |
|
| 21 |
+
| qwen3-30b_moe | 8*h200 | 183 |
|
| 22 |
+
| qwen3-30b_moe-eagle3 | 8*h200 | 325 |
|
| 23 |
+
| qwen3-30b_moe | 8*5090 | 164 |
|
| 24 |
+
| qwen3-30b_moe-eagle3 | 8*5090 | 268 |
|
| 25 |
+
## How to use
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
The launch command for using Eagle3 with SGLang is:
|
| 29 |
+
|
| 30 |
+
```python3
|
| 31 |
+
|
| 32 |
+
python3 -m sglang.launch_server --model Qwen/Qwen3-30B-A3B --speculative-algorithm EAGLE3 --speculative-draft-model-path Tengyunw/qwen3_30b_moe_eagle3 --speculative-num-steps 6 --speculative-eagle-topk 10 --speculative-num-draft-tokens 32 --mem-fraction 0.9 --cuda-graph-max-bs 2 --dtype bfloat16
|
| 33 |
+
|
| 34 |
+
```
|
| 35 |
+
|
| 36 |
+
## How to train
|
| 37 |
+
|
| 38 |
+
Training Dataset:
|
| 39 |
+
ultrachat_200k.
|
| 40 |
+
Only the prompts from these datasets were utilized for data synthesis. This synthesized data is used to train the Eagle modules.
|
| 41 |
+
|
| 42 |
+
dataset nums: 600K samples,1B tokens
|
| 43 |
+
|
| 44 |
+
Evaluation Dataset:
|
| 45 |
+
ShareGPT,GSM8K,HUAMEVAL,MT-BENCH,APLCA
|
| 46 |
+
|
| 47 |
+
Our Sharegpt test data is located in the eagle_data.jsonl file under this directory.
|