---
license: apache-2.0
tags:
- eagle3
- speculative-decoding
- llama4
- vllm
- testing
---

# Llama4 Scout 17B Eagle3 Dummy Drafter

This is a **dummy/test drafter model** for testing the Eagle3 speculative decoding implementation with Llama4 Scout 17B Instruct models in vLLM.

⚠️ **WARNING**: This is not a real model and should not be used for actual inference. It contains random weights and is only for testing purposes.

## Model Details

- **Architecture**: Llama4ForCausalLM (Eagle3 drafter variant)
- **Target Model**: Llama4 Scout 17B Instruct (specifically `RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16`)
- **Base Model**: Instruct version of the Llama4 Scout 17B model
- **Hidden Size**: 2048
- **Layers**: 1 (single decoder layer, as per the Eagle3 design)
- **Vocabulary**: 128256 tokens
- **Parameters**: ~322M

## Configuration

This drafter model is designed specifically for the Instruct version of Llama4 Scout 17B and uses:

- Eagle3 speculative decoding architecture
- Single-layer transformer with auxiliary hidden state combination
- Llama4 layer structure with RoPE (Rotary Position Embedding)
- SGLang-compatible weight naming (`midlayer.*`)
- Vocabulary mappings (`t2d`/`d2t`) for draft-to-target token conversion

## Usage

This model is intended only for testing the vLLM Eagle3 implementation:

```bash
# Use with vLLM for testing Eagle3 speculative decoding with Llama4 Scout
vllm serve RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 \
  --speculative-config '{"method": "eagle3", "model": "nm-testing/llama4-scout-17b-eagle3-dummy-drafter", ...}'
```

## Testing Purpose Only

This model:

- Contains random weights
- Is not trained on any data
- Should not be used for actual inference
- Is only for vLLM development and testing

## Related

- vLLM: https://github.com/vllm-project/vllm
- Eagle3: Speculative decoding method
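
## Vocabulary Mapping Sketch

The Configuration section mentions `t2d`/`d2t` vocabulary mappings for draft-to-target token conversion. The following is a minimal, illustrative sketch of how such mappings can work. It assumes `d2t` stores a per-token offset (target id = draft id + offset) and `t2d` is a boolean mask over the target vocabulary. Both conventions are assumptions for illustration and are not confirmed by this model card. The function names and the toy vocabulary sizes are hypothetical.

```python
from typing import List


def draft_to_target_ids(draft_ids: List[int], d2t: List[int]) -> List[int]:
    """Map draft-vocabulary token ids to target-vocabulary ids.

    Assumption: d2t[i] is the offset to add to draft id i.
    """
    return [i + d2t[i] for i in draft_ids]


def build_t2d_mask(d2t: List[int], target_vocab_size: int) -> List[bool]:
    """Build a mask marking which target ids are reachable from the draft vocabulary.

    Assumption: t2d is a boolean mask of length target_vocab_size.
    """
    mask = [False] * target_vocab_size
    for draft_id, offset in enumerate(d2t):
        mask[draft_id + offset] = True
    return mask


# Toy example: a 4-token draft vocabulary inside an 8-token target vocabulary.
# Draft ids 0, 1, 2, 3 correspond to target ids 0, 2, 5, 7.
toy_d2t = [0, 1, 3, 4]

target_ids = draft_to_target_ids([2, 0, 3], toy_d2t)
t2d_mask = build_t2d_mask(toy_d2t, target_vocab_size=8)
```

In a real checkpoint these mappings would be stored as tensors and applied to the drafter's logits. Plain lists are used here only to keep the sketch dependency-free.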