File size: 3,709 Bytes
a81b86f
 
 
1dc0866
 
a81b86f
0237301
 
 
 
d0c3269
0237301
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b12cfe7
 
 
 
0237301
 
 
b12cfe7
 
 
 
0237301
b12cfe7
 
 
 
 
 
 
 
 
 
 
 
0237301
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
a81b86f
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
---
base_model:
- allenai/OLMoE-1B-7B-0125-Instruct
datasets:
- anon8231489123/ShareGPT_Vicuna_unfiltered
---
# OLMoE-1B-7B-Eagle3 Draft Model

This repository provides the EAGLE Draft model weights, related code, and training data based on OLMoE-1B-7B-Eagle3.

---

## 📦 Included Files

- `pytorch_model.bin`: Trained EAGLE Draft model weights
- `config.json`: Model configuration file (OLMoE architecture)
- `tokenizer_config.json`: Tokenizer configuration file
- `modeling_olmoe_kv.py`: OLMoE-specific model code (required for EAGLE inference)
- `eagle_data.json`: Training dataset (ShareGPT questions + OLMoE-generated answers)
- `.gitattributes`: Git LFS settings, etc.

---

## 🦅 What is the EAGLE Draft Model?

EAGLE is a framework designed to dramatically accelerate inference for large language models (LLMs)  
by training a **draft decoder layer** separately.

- Fully compatible with **OLMoE-1B-7B-0125-Instruct** architecture
- The EAGLE Draft layer is structurally similar to the main model’s decoder
- During inference, the draft layer generates multiple tokens in advance, which are then verified/accepted by the main model

---

## 📝 Training Data Description

- **eagle_data.json**  
  - Only **questions (prompts)** are extracted from the ShareGPT dataset
  - For each question, the **allenai/OLMoE-1B-7B-0125-Instruct** model generates its own answer
  - Thus, the **model’s self-generated answers** are used as ground truth to train the draft layer
  - This approach ensures the draft layer learns a distribution very close to the main model’s decoder,  
    maximizing EAGLE inference performance

---

## 🛠️ Usage

### 1. Using Model Weights/Config Files

- `pytorch_model.bin`, `config.json`, and `tokenizer_config.json`  
  can be used directly with HuggingFace Transformers or EAGLE code.

### 2. Integrating with EAGLE Inference Code

- Copy `modeling_olmoe_kv.py`  
  into the official EAGLE repo at `EAGLE/eagle/model/`.
- In your EAGLE inference script, import as:
  ```python
  from eagle.model.modeling_olmoe_kv import OlmoeForCausalLM
  ```

### 3. Example Code

```python
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
 
tokenizer = AutoTokenizer.from_pretrained('allenai/OLMoE-1B-7B-0125-Instruct')
model = EaModel.from_pretrained(
    base_model_path='allenai/OLMoE-1B-7B-0125-Instruct',
    ea_model_path='wantsleep/OLMoE_1B_7B_Eagle3',
    torch_dtype='bfloat16',
    low_cpu_mem_usage=True,
    total_token=-1
)

your_message = "Why we study math?"
conv = get_conversation_template("vicuna")
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).to(DEVICE)

output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512, top_k=8)
output = model.tokenizer.decode(output_ids[0])
print(output)
```

---

## ⚠️ Notes

- **eagle_data.json** contains only OLMoE-generated answers for public ShareGPT questions.
- The EAGLE Draft layer should be designed as close as possible to the main model’s decoder  
  for optimal inference efficiency.
- `modeling_olmoe_kv.py` **must** be included in your EAGLE inference code for correct operation.

---

## 📚 References

- [EAGLE: Fast Decoding for Large Language Models](https://github.com/SafeAILab/EAGLE)
- [allenai/OLMoE-1B-7B-0125-Instruct](https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct)
- [ShareGPT Dataset](https://huggingface.co/datasets/sharegpt)

---

For questions or feedback, please open an issue!