File size: 3,709 Bytes
a81b86f 1dc0866 a81b86f 0237301 d0c3269 0237301 b12cfe7 0237301 b12cfe7 0237301 b12cfe7 0237301 a81b86f | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 | ---
base_model:
- allenai/OLMoE-1B-7B-0125-Instruct
datasets:
- anon8231489123/ShareGPT_Vicuna_unfiltered
---
# OLMoE-1B-7B-Eagle3 Draft Model
This repository provides the EAGLE Draft model weights, related code, and training data based on OLMoE-1B-7B-Eagle3.
---
## 📦 Included Files
- `pytorch_model.bin`: Trained EAGLE Draft model weights
- `config.json`: Model configuration file (OLMoE architecture)
- `tokenizer_config.json`: Tokenizer configuration file
- `modeling_olmoe_kv.py`: OLMoE-specific model code (required for EAGLE inference)
- `eagle_data.json`: Training dataset (ShareGPT questions + OLMoE-generated answers)
- `.gitattributes`: Git LFS settings, etc.
---
## 🦅 What is the EAGLE Draft Model?
EAGLE is a framework designed to dramatically accelerate inference for large language models (LLMs)
by training a **draft decoder layer** separately.
- Fully compatible with **OLMoE-1B-7B-0125-Instruct** architecture
- The EAGLE Draft layer is structurally similar to the main model’s decoder
- During inference, the draft layer generates multiple tokens in advance, which are then verified/accepted by the main model
---
## 📝 Training Data Description
- **eagle_data.json**
- Only **questions (prompts)** are extracted from the ShareGPT dataset
- For each question, the **allenai/OLMoE-1B-7B-0125-Instruct** model generates its own answer
- Thus, the **model’s self-generated answers** are used as ground truth to train the draft layer
- This approach ensures the draft layer learns a distribution very close to the main model’s decoder,
maximizing EAGLE inference performance
---
## 🛠️ Usage
### 1. Using Model Weights/Config Files
- `pytorch_model.bin`, `config.json`, and `tokenizer_config.json`
can be used directly with HuggingFace Transformers or EAGLE code.
### 2. Integrating with EAGLE Inference Code
- Copy `modeling_olmoe_kv.py`
into the official EAGLE repo at `EAGLE/eagle/model/`.
- In your EAGLE inference script, import as:
```python
from eagle.model.modeling_olmoe_kv import OlmoeForCausalLM
```
### 3. Example Code
```python
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained('allenai/OLMoE-1B-7B-0125-Instruct')
model = EaModel.from_pretrained(
base_model_path='allenai/OLMoE-1B-7B-0125-Instruct',
ea_model_path='wantsleep/OLMoE_1B_7B_Eagle3',
torch_dtype='bfloat16',
low_cpu_mem_usage=True,
total_token=-1
)
your_message = "Why we study math?"
conv = get_conversation_template("vicuna")
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).to(DEVICE)
output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512, top_k=8)
output = model.tokenizer.decode(output_ids[0])
print(output)
```
---
## ⚠️ Notes
- **eagle_data.json** contains only OLMoE-generated answers for public ShareGPT questions.
- The EAGLE Draft layer should be designed as close as possible to the main model’s decoder
for optimal inference efficiency.
- `modeling_olmoe_kv.py` **must** be included in your EAGLE inference code for correct operation.
---
## 📚 References
- [EAGLE: Fast Decoding for Large Language Models](https://github.com/SafeAILab/EAGLE)
- [allenai/OLMoE-1B-7B-0125-Instruct](https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct)
- [ShareGPT Dataset](https://huggingface.co/datasets/sharegpt)
---
For questions or feedback, please open an issue! |