---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
  - speculative-decoding
  - eagle3
  - sglang
  - multilingual
  - indic
  - hindi
  - gujarati
language:
  - hi
  - gu
  - en
library_name: sglang
---

# Multilingual EAGLE-3 Draft Head for Qwen3-8B (Hindi / Gujarati / English) — Research Preview

An [EAGLE-3](https://github.com/SafeAILab/EAGLE) speculative-decoding **draft head** for [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B), trained to recover the acceptance length (τ) that public **English-only** EAGLE-3 heads lose on Indic languages. Pair it with Qwen3-8B in [SGLang](https://github.com/sgl-project/sglang) for **lossless** faster generation on Hindi and Gujarati.

> ⚠️ **Research preview / proof-of-concept — not a production-tuned head.** It was trained on a small (~2,100-example) FLORES-derived dataset. It recovers Indic acceptance but **regresses on English** and carries a **training-domain bias**. Please read the **Limitations** before use. To our knowledge this is the first publicly released Indic EAGLE-3 head; it accompanies the study described below.

## Results — acceptance length τ
Config `steps=3, topk=1, draft_tokens=4`, temperature 0, 50 parallel prompts/language. τ = mean accepted tokens per verification step (higher = faster). **EAGLE-3 is lossless** — outputs are identical to standard decoding; only speed changes.

**vs. the public English EAGLE-3 head, on FLORES-200 prompts:**

| language | English head | **this head** |
|---|---|---|
| English | 2.37 | 1.47  (1.40 ± 0.09 across 3 seeds) |
| Hindi | 1.36 | **1.86 ± 0.20** |
| Gujarati | 1.07 | **2.16 ± 0.29** |

**Held-out, out-of-domain (Aya instruction prompts) — the recovery generalizes:** Gujarati 1.08 → **2.31**, Hindi 1.40 → **1.92** (English head is domain-robust, confirming the comparison is fair).

## Why the English head fails on Indic (mechanism)
EAGLE-3 heads emit over a reduced **32k "draft vocabulary"** chosen by token frequency. An English-trained head's 32k **excludes ~half of all Hindi/Gujarati tokens** (it covers only ~50% / ~46%), so it literally cannot propose them → acceptance collapses toward 1. This head rebuilds the draft vocab from multilingual data (~100% Indic coverage). Across 8 languages, τ correlates with draft-vocab coverage (Pearson r = **+0.95**) and inversely with tokenization inflation (r = **−0.87**).

## Usage (SGLang)
```bash
python -m sglang.launch_server \
  --model Qwen/Qwen3-8B \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path SwitchXDDD/multilingual-eagle3-qwen3-8b \
  --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
  --dtype bfloat16
```

## Training
- **Target:** `Qwen/Qwen3-8B` (frozen). **Framework:** [SpecForge](https://github.com/sgl-project/SpecForge), EAGLE-3 **online** mode. Draft config: `qwen3-8b-eagle3.json` (1-layer `LlamaForCausalLMEagle3`, `draft_vocab_size=32000`).
- **Data:** ~2,100 conversations = **Qwen3-8B's own responses** to **FLORES-200** prompts — 700 Hindi + 700 Gujarati + 700 English. Target-regenerated so the draft matches the target's distribution.
- **Recipe:** 5 epochs, lr 1e-4, max-length 4096, bf16, 1× H100.

## Limitations (please read)
- **English regression** (2.37 → ~1.40). A *same-recipe English-only control* also reaches ~1.45, so this is **limited/narrow English training data, not multilingual interference** — but the head is still worse at English than the off-the-shelf head. Mitigation: mix in diverse English (e.g. ShareGPT) when training your own.
- **Training-domain bias:** trained on FLORES (wiki-news). The held-out Aya results above show the recovery largely holds, but expect some domain sensitivity.
- **Single seed released:** seed-to-seed τ varies (Gujarati ± 0.29 over 3 seeds). This is one representative run.
- **Small dataset, not quality/safety-tuned** — a proof-of-concept, not a maximally-optimized head.
- **Lossless:** it does not change model outputs, only decoding speed.

## License & provenance
Weights released under **Apache-2.0** (consistent with Qwen3 and SpecForge). Training prompts are derived from **FLORES-200** (CC-BY-SA-4.0); responses generated by Qwen3-8B (Apache-2.0). Please retain attribution.

## Citation
If you use this head, please cite EAGLE-3, SpecForge, Qwen3, and FLORES-200:
- Li et al., *EAGLE-3* (NeurIPS 2025). SGLang team, *SpecForge*. Qwen team, *Qwen3*. NLLB team, *FLORES-200 / No Language Left Behind*.
- This work: *Cross-Lingual EAGLE-3 for Indic Languages* (link TBD).

*Companion 32B result:* the same degradation→recovery pattern replicates at Qwen3-32B (Gujarati 1.03 → 2.47); that head is validated but **not yet publicly released** (pending a held-out + multi-seed pass).