---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- speculative-decoding
- eagle3
- sglang
- multilingual
- indic
- hindi
- gujarati
language:
- hi
- gu
- en
library_name: sglang
---

# Multilingual EAGLE-3 Draft Head for Qwen3-8B (Hindi / Gujarati / English): Research Preview

An [EAGLE-3](https://github.com/SafeAILab/EAGLE) speculative-decoding **draft head** for [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B), trained to recover the acceptance length (τ) that public **English-only** EAGLE-3 heads lose on Indic languages. Pair it with Qwen3-8B in [SGLang](https://github.com/sgl-project/sglang) for **lossless** faster generation on Hindi and Gujarati.

> ⚠️ **Research preview / proof-of-concept, not a production-tuned head.** It was trained on a small (~2,100-example) FLORES-derived dataset. It recovers Indic acceptance but **regresses on English** and carries a **training-domain bias**. Please read the **Limitations** before use.

To our knowledge this is the first publicly released Indic EAGLE-3 head; it accompanies the study described below.

## Results: acceptance length τ

Config `steps=3, topk=1, draft_tokens=4`, temperature 0, 50 parallel prompts/language. τ = mean accepted tokens per verification step (higher = faster). **EAGLE-3 is lossless**: outputs are identical to standard decoding; only speed changes.

**vs. the public English EAGLE-3 head, on FLORES-200 prompts:**

| language | English head | **this head** |
|---|---|---|
| English | 2.37 | 1.47 (1.40 ± 0.09 across 3 seeds) |
| Hindi | 1.36 | **1.86 ± 0.20** |
| Gujarati | 1.07 | **2.16 ± 0.29** |

**Held-out, out-of-domain (Aya instruction prompts), the recovery generalizes:** Gujarati 1.08 → **2.31**, Hindi 1.40 → **1.92** (English head is domain-robust, confirming the comparison is fair).

## Why the English head fails on Indic (mechanism)

EAGLE-3 heads emit over a reduced **32k "draft vocabulary"** chosen by token frequency.
An English-trained head's 32k vocabulary **excludes about half of all Hindi/Gujarati tokens** (it covers only ~50% of Hindi and ~46% of Gujarati tokens), so it cannot propose them and acceptance collapses toward 1. This head rebuilds the draft vocab from multilingual data (~100% Indic coverage). Across 8 languages, τ correlates with draft-vocab coverage (Pearson r = **+0.95**) and inversely with tokenization inflation (r = **−0.87**).

## Usage (SGLang)

```bash
python -m sglang.launch_server \
  --model Qwen/Qwen3-8B \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path SwitchXDDD/multilingual-eagle3-qwen3-8b \
  --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
  --dtype bfloat16
```

## Training

- **Target:** `Qwen/Qwen3-8B` (frozen). **Framework:** [SpecForge](https://github.com/sgl-project/SpecForge), EAGLE-3 **online** mode. Draft config: `qwen3-8b-eagle3.json` (1-layer `LlamaForCausalLMEagle3`, `draft_vocab_size=32000`).
- **Data:** ~2,100 conversations = **Qwen3-8B's own responses** to **FLORES-200** prompts: 700 Hindi + 700 Gujarati + 700 English. Target-regenerated so the draft matches the target's distribution.
- **Recipe:** 5 epochs, lr 1e-4, max-length 4096, bf16, 1× H100.

## Limitations (please read)

- **English regression** (2.37 → ~1.40). A *same-recipe English-only control* also reaches ~1.45, so the cause is **limited/narrow English training data, not multilingual interference**. The head is still worse at English than the off-the-shelf head. Mitigation: mix in diverse English (e.g. ShareGPT) when training your own.
- **Training-domain bias:** trained on FLORES (wiki-news). The held-out Aya results above show the recovery largely holds, but expect some domain sensitivity.
- **Single seed released:** seed-to-seed τ varies (Gujarati ± 0.29 over 3 seeds). This is one representative run.
- **Small dataset, not quality/safety-tuned:** a proof-of-concept, not a maximally-optimized head.
- **Lossless:** it does not change model outputs, only decoding speed.

## License & provenance

Weights released under **Apache-2.0** (consistent with Qwen3 and SpecForge). Training prompts are derived from **FLORES-200** (CC-BY-SA-4.0); responses generated by Qwen3-8B (Apache-2.0). Please retain attribution.

## Citation

If you use this head, please cite EAGLE-3, SpecForge, Qwen3, and FLORES-200:

- Li et al., *EAGLE-3* (NeurIPS 2025). SGLang team, *SpecForge*. Qwen team, *Qwen3*. NLLB team, *FLORES-200 / No Language Left Behind*.
- This work: *Cross-Lingual EAGLE-3 for Indic Languages* (link TBD).

*Companion 32B result:* the same degradation→recovery pattern replicates at Qwen3-32B (Gujarati 1.03 → 2.47); that head is validated but **not yet publicly released** (pending a held-out + multi-seed pass).
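
## Appendix: toy illustration of draft-vocab coverage

The coverage argument in the mechanism section can be made concrete with a toy calculation. The sketch below is a minimal illustration, not the pipeline used to build this head. `build_draft_vocab` and `coverage` are hypothetical helpers, and the token IDs are made up. It assumes the draft vocabulary is simply the top-k token IDs by frequency in the training corpus, and it measures coverage as the share of token occurrences in an evaluation corpus that fall inside that set.

```python
from collections import Counter
from typing import Iterable, List, Set


def build_draft_vocab(train_ids: Iterable[List[int]], size: int) -> Set[int]:
    """Keep the `size` most frequent token IDs seen in the training sequences."""
    counts = Counter(tok for seq in train_ids for tok in seq)
    return {tok for tok, _ in counts.most_common(size)}


def coverage(draft_vocab: Set[int], eval_ids: Iterable[List[int]]) -> float:
    """Fraction of token occurrences in `eval_ids` that the draft head could emit."""
    total = 0
    hits = 0
    for seq in eval_ids:
        for tok in seq:
            total += 1
            if tok in draft_vocab:
                hits += 1
    return hits / total if total else 0.0


# Toy data with made-up token IDs (assumption for illustration only):
# IDs 1-4 stand in for English tokens, IDs 10-13 for Hindi tokens.
english_train = [[1, 2, 3, 1, 2], [1, 4, 2, 3]]
mixed_train = english_train + [[10, 11, 10, 12], [11, 13, 10]]
hindi_eval = [[10, 11, 12, 13], [10, 2, 11, 1]]

english_vocab = build_draft_vocab(english_train, size=4)
mixed_vocab = build_draft_vocab(mixed_train, size=8)

print(f"English-only draft vocab coverage: {coverage(english_vocab, hindi_eval):.2f}")
print(f"Multilingual draft vocab coverage: {coverage(mixed_vocab, hindi_eval):.2f}")
```

On this toy data the English-only vocabulary covers 0.25 of the Hindi evaluation tokens and the multilingual one covers 1.00. These numbers only illustrate the mechanism and are unrelated to the measured ~50% / ~46% / ~100% figures reported above. Any token outside the draft vocabulary can never be proposed, so each such position ends the accepted run at that verification step.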