---
library_name: transformers
license: apache-2.0
base_model: swiss-ai/Apertus-8B-Instruct-2509
tags:
  - eagle3
  - speculative-decoding
  - draft-model
  - llama
language:
  - en
  - de
  - fr
  - it
pipeline_tag: text-generation
---

# EAGLE3-Apertus-8B-Instruct-2509

An [Eagle3](https://arxiv.org/abs/2503.01840) draft model for speculative decoding with [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509).

## Model Description

This is a lightweight draft model trained to accelerate inference of Apertus-8B-Instruct through speculative decoding. Eagle3 uses a single-layer architecture that predicts future tokens by leveraging the target model's hidden states.

| Property | Value |
|----------|-------|
| Architecture | `LlamaForCausalLMEagle3` |
| Hidden Size | 4096 |
| Intermediate Size | 21504 |
| Attention Heads | 32 |
| KV Heads | 8 |
| Layers | 1 |
| Vocab Size | 131,072 |
| Draft Vocab Size | 32,000 |
| Precision | bfloat16 |
| Parameters | ~513M |

## Training Details

- **Framework**: [SpecForge](https://github.com/sgl-project/SpecForge)
- **Target Model**: swiss-ai/Apertus-8B-Instruct-2509
- **Epochs**: 10
- **Batch Size**: 1 per GPU
- **Learning Rate**: 1e-4
- **Max Sequence Length**: 4096
- **Hardware**: 64 GPUs (16 nodes × 4 GPUs)
- **Precision**: bfloat16

### Training Data

The model was trained on ~375k samples of regenerated conversation data. The dataset consists of prompts from:
- [UltraChat](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
- [ShareGPT](https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered)
- [OpenThoughts-114k-math](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math)

The responses were regenerated using Apertus-8B-Instruct-2509 to ensure the draft model learns from the target model's own output distribution.

See: [thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data](https://huggingface.co/datasets/thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data)

## Usage

### With vLLM

```bash
VLLM_USE_V1=1 vllm serve swiss-ai/Apertus-8B-Instruct-2509 \
    --speculative-config '{"model": "thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509", "num_speculative_tokens": 3, "method": "eagle3"}'
```

Or in Python:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="swiss-ai/Apertus-8B-Instruct-2509",
    speculative_config={
        "model": "thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509",
        "num_speculative_tokens": 3,
        "method": "eagle3",
    },
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello, how are you?"], sampling_params)
print(outputs[0].outputs[0].text)
```

### With SGLang

```bash
python -m sglang.launch_server \
    --model swiss-ai/Apertus-8B-Instruct-2509 \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509 \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 32
```

## Continue Training

To resume training from this checkpoint:

1. Clone [SpecForge](https://github.com/sgl-project/SpecForge)
2. Download the training dataset from [thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data](https://huggingface.co/datasets/thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data)
3. Download this checkpoint and place it in a subdirectory of your output directory (e.g., `outputs/apertus-8b-eagle3/epoch_9_step_55000/`)
4. Run with `--resume` (it will automatically find the last checkpoint in `--output-dir`):

```bash
NUM_GPUS=4
TP_SIZE=1

torchrun \
    --standalone \
    --nproc_per_node $NUM_GPUS \
    scripts/train_eagle3.py \
    --target-model-path swiss-ai/Apertus-8B-Instruct-2509 \
    --draft-model-config /path/to/configs/apertus-8b-eagle3.json \
    --train-data-path /path/to/merged_train_regen.jsonl \
    --output-dir /path/to/outputs/apertus-8b-eagle3 \
    --num-epochs 15 \
    --batch-size 1 \
    --tp-size $TP_SIZE \
    --learning-rate 1e-4 \
    --max-length 4096 \
    --chat-template apertus \
    --cache-dir /path/to/cache \
    --target-model-backend sglang \
    --resume
```

The `--resume` flag uses `get_last_checkpoint()` to automatically find the most recent checkpoint in the output directory.

## License

Apache 2.0

## Citation

If you use this model, please cite Eagle3:

```bibtex
@article{li2025eagle3,
  title={Eagle 3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2503.01840},
  year={2025}
}
```

## Acknowledgments

Trained on the [Alps supercomputer](https://www.cscs.ch/computers/alps) at CSCS (Swiss National Supercomputing Centre).