File size: 3,709 Bytes

---
base_model:
- allenai/OLMoE-1B-7B-0125-Instruct
datasets:
- anon8231489123/ShareGPT_Vicuna_unfiltered
---
# OLMoE-1B-7B-Eagle3 Draft Model

This repository provides the EAGLE Draft model weights, related code, and training data based on OLMoE-1B-7B-Eagle3.

---

## 📦 Included Files

- `pytorch_model.bin`: Trained EAGLE Draft model weights
- `config.json`: Model configuration file (OLMoE architecture)
- `tokenizer_config.json`: Tokenizer configuration file
- `modeling_olmoe_kv.py`: OLMoE-specific model code (required for EAGLE inference)
- `eagle_data.json`: Training dataset (ShareGPT questions + OLMoE-generated answers)
- `.gitattributes`: Git LFS settings, etc.

---

## 🦅 What is the EAGLE Draft Model?

EAGLE is a framework designed to dramatically accelerate inference for large language models (LLMs)  
by training a **draft decoder layer** separately.

- Fully compatible with **OLMoE-1B-7B-0125-Instruct** architecture
- The EAGLE Draft layer is structurally similar to the main model’s decoder
- During inference, the draft layer generates multiple tokens in advance, which are then verified/accepted by the main model

---

## 📝 Training Data Description

- **eagle_data.json**  
  - Only **questions (prompts)** are extracted from the ShareGPT dataset
  - For each question, the **allenai/OLMoE-1B-7B-0125-Instruct** model generates its own answer
  - Thus, the **model’s self-generated answers** are used as ground truth to train the draft layer
  - This approach ensures the draft layer learns a distribution very close to the main model’s decoder,  
    maximizing EAGLE inference performance

---

## 🛠️ Usage

### 1. Using Model Weights/Config Files

- `pytorch_model.bin`, `config.json`, and `tokenizer_config.json`  
  can be used directly with HuggingFace Transformers or EAGLE code.

### 2. Integrating with EAGLE Inference Code

- Copy `modeling_olmoe_kv.py`  
  into the official EAGLE repo at `EAGLE/eagle/model/`.
- In your EAGLE inference script, import as:
  ```python
  from eagle.model.modeling_olmoe_kv import OlmoeForCausalLM
  ```

### 3. Example Code

```python
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
 
tokenizer = AutoTokenizer.from_pretrained('allenai/OLMoE-1B-7B-0125-Instruct')
model = EaModel.from_pretrained(
    base_model_path='allenai/OLMoE-1B-7B-0125-Instruct',
    ea_model_path='wantsleep/OLMoE_1B_7B_Eagle3',
    torch_dtype='bfloat16',
    low_cpu_mem_usage=True,
    total_token=-1
)

your_message = "Why we study math?"
conv = get_conversation_template("vicuna")
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).to(DEVICE)

output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512, top_k=8)
output = model.tokenizer.decode(output_ids[0])
print(output)
```

---

## ⚠️ Notes

- **eagle_data.json** contains only OLMoE-generated answers for public ShareGPT questions.
- The EAGLE Draft layer should be designed as close as possible to the main model’s decoder  
  for optimal inference efficiency.
- `modeling_olmoe_kv.py` **must** be included in your EAGLE inference code for correct operation.

---

## 📚 References

- [EAGLE: Fast Decoding for Large Language Models](https://github.com/SafeAILab/EAGLE)
- [allenai/OLMoE-1B-7B-0125-Instruct](https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct)
- [ShareGPT Dataset](https://huggingface.co/datasets/sharegpt)

---

For questions or feedback, please open an issue!