OWSCloud
/

qwen3_30b_moe_eagle3

Model card Files Files and versions

Lil2J commited on Aug 4, 2025

Commit

624d38a

·

verified ·

1 Parent(s): bcdc513

Create README.md

Files changed (1) hide show

README.md +47 -0

README.md ADDED Viewed

	@@ -0,0 +1,47 @@

+---
+license: mit
+base_model:
+- Qwen/Qwen3-8B
+---
+## Introduce
+We adapted the official speculative sampling training method, Eagle3, for training on Qwen3-30B-A3B
+After implementing Eagle3, the inference performance of Qwen3-30B-Moe using the SGLang framework on 8*H200 GPU improved from 183 tokens/s to 325 tokens/s.
+The TPS (tokens per second) improvement reached nearly 70%.
+On a single RTX 5090, the TPS (transactions per second) of Qwen3-8B-Eagle3 increased from 164 to 268.
+| model | gpu | tps |
+|---------|---------|---------|
+| qwen3-30b_moe   | h200   | 147  |
+| qwen3-30b-moe_eagle3   | h200   | 231   |
+| qwen3-30b_moe   | 8*h200   | 183   |
+| qwen3-30b_moe-eagle3  | 8*h200  | 325  |
+| qwen3-30b_moe   | 8*5090   | 164   |
+| qwen3-30b_moe-eagle3  | 8*5090  | 268  |
+## How to use
+The launch command for using Eagle3 with SGLang is:
+```python3
+python3 -m sglang.launch_server --model Qwen/Qwen3-30B-A3B --speculative-algorithm EAGLE3 --speculative-draft-model-path Tengyunw/qwen3_30b_moe_eagle3 --speculative-num-steps 6 --speculative-eagle-topk 10 --speculative-num-draft-tokens 32 --mem-fraction 0.9 --cuda-graph-max-bs 2 --dtype bfloat16
+```
+## How to train
+Training Dataset:
+ultrachat_200k.
+Only the prompts from these datasets were utilized for data synthesis. This synthesized data is used to train the Eagle modules.
+dataset nums: 600K samples,1B tokens
+Evaluation Dataset:
+ShareGPT,GSM8K,HUAMEVAL,MT-BENCH,APLCA
+Our Sharegpt test data is located in the eagle_data.jsonl file under this directory.