---
base_model:
- allenai/OLMoE-1B-7B-0125-Instruct
datasets:
- anon8231489123/ShareGPT_Vicuna_unfiltered
---

# OLMoE-1B-7B-Eagle3 Draft Model

This repository provides the EAGLE Draft model weights, related code, and training data for the **allenai/OLMoE-1B-7B-0125-Instruct** base model.

---

## 📦 Included Files

- `pytorch_model.bin`: Trained EAGLE Draft model weights
- `config.json`: Model configuration file (OLMoE architecture)
- `tokenizer_config.json`: Tokenizer configuration file
- `modeling_olmoe_kv.py`: OLMoE-specific model code (required for EAGLE inference)
- `eagle_data.json`: Training dataset (ShareGPT questions + OLMoE-generated answers)
- `.gitattributes`: Git LFS settings, etc.

---

## 🦅 What is the EAGLE Draft Model?

EAGLE is a framework that accelerates inference for large language models (LLMs) by training a separate **draft decoder layer**.

- Fully compatible with the **OLMoE-1B-7B-0125-Instruct** architecture
- The EAGLE Draft layer is structurally similar to the main model’s decoder
- During inference, the draft layer generates multiple tokens in advance, which are then verified and accepted or rejected by the main model

---

## 📝 Training Data Description

- **eagle_data.json**
  - Only the **questions (prompts)** are extracted from the ShareGPT dataset
  - For each question, the **allenai/OLMoE-1B-7B-0125-Instruct** model generates its own answer
  - Thus, the **model’s self-generated answers** are used as ground truth to train the draft layer
  - This approach lets the draft layer learn a distribution very close to that of the main model’s decoder, which improves EAGLE inference performance

---

## 🛠️ Usage

### 1. Using Model Weights/Config Files

- `pytorch_model.bin`, `config.json`, and `tokenizer_config.json` can be used directly with HuggingFace Transformers or EAGLE code.

### 2. Integrating with EAGLE Inference Code

- Copy `modeling_olmoe_kv.py` into the official EAGLE repo at `EAGLE/eagle/model/`.
- In your EAGLE inference script, import it as:

```python
from eagle.model.modeling_olmoe_kv import OlmoeForCausalLM
```

### 3. Example Code

```python
import torch
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template

# Device for the input tensors; falls back to CPU when no GPU is available.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model together with the EAGLE draft weights.
model = EaModel.from_pretrained(
    base_model_path='allenai/OLMoE-1B-7B-0125-Instruct',
    ea_model_path='wantsleep/OLMoE_1B_7B_Eagle3',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map="auto",
    total_token=-1
)
model.eval()

# Build the prompt from a conversation template.
your_message = "Why do we study math?"
conv = get_conversation_template("vicuna")
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# Tokenize with the tokenizer bundled in the EAGLE model wrapper.
input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).to(DEVICE)

# Generate with EAGLE speculative decoding and decode the result.
output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512, top_k=8)
output = model.tokenizer.decode(output_ids[0])
print(output)
```

---

## ⚠️ Notes

- **eagle_data.json** contains only OLMoE-generated answers for public ShareGPT questions.
- The EAGLE Draft layer should be designed to be as close as possible to the main model’s decoder for optimal inference efficiency.
- `modeling_olmoe_kv.py` **must** be included in your EAGLE inference code for correct operation.

---

## 📚 References

- [EAGLE: Fast Decoding for Large Language Models](https://github.com/SafeAILab/EAGLE)
- [allenai/OLMoE-1B-7B-0125-Instruct](https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct)
- [ShareGPT Dataset](https://huggingface.co/datasets/sharegpt)

---

For questions or feedback, please open an issue!
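
---

## 🧪 Appendix: Draft-and-Verify Sketch

The draft-then-verify idea described in this README (the draft layer proposes several tokens, the main model accepts or rejects them) can be illustrated with a minimal, greedy, single-chain sketch. This is not the EAGLE implementation: EAGLE drafts at the feature level and verifies a token tree in one forward pass, while the code below uses toy next-token functions. All names here (`speculative_step`, `generate`, `toy_target`, `toy_draft`) are hypothetical and exist only for illustration.

```python
from typing import Callable, List

# A "model" here is just a function mapping a token context to the next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target_next: NextToken, draft_next: NextToken,
                     context: List[int], num_draft: int = 4) -> List[int]:
    """Run one greedy draft-and-verify step and return the accepted tokens."""
    # 1) The draft model proposes num_draft tokens autoregressively.
    proposed: List[int] = []
    for _ in range(num_draft):
        proposed.append(draft_next(context + proposed))

    # 2) The target model checks each proposal in order. A real system scores
    #    all positions in one forward pass; here we call it token by token.
    accepted: List[int] = []
    for tok in proposed:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3) Every proposal matched, so the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(target_next: NextToken, draft_next: NextToken,
             prompt: List[int], max_new_tokens: int,
             num_draft: int = 4) -> List[int]:
    """Generate tokens with repeated draft-and-verify steps."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target_next, draft_next, out, num_draft))
    return out[: len(prompt) + max_new_tokens]


def toy_target(ctx: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Toy draft model: agrees with the target except right after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, prompt=[2], max_new_tokens=6))
```

With greedy verification, the output is identical to what the target model would produce on its own; the draft only changes how many target steps are needed. This is also why training the draft layer on the main model’s self-generated answers helps: the more often the draft agrees with the target, the more tokens are accepted per step.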