nm-testing/llama4-scout-17b-eagle3-dummy-drafter

RelaxingSnorlax committed on Aug 27, 2025

Commit de55a01 (verified) · 1 parent: ecdf199

Add detailed model card

Files changed (1)

README.md +47 -17

@@ -1,27 +1,57 @@
-# Llama4 Eagle Drafter Model (Test)
-This is a test Eagle drafter model for Llama4 with proper configuration and vocabulary mappings.
 ## Model Details
-- **Architecture**: Llama4ForCausalLM (Eagle draft variant)
-- **Hidden size**: 2048
-- **Layers**: 1 (single decoder layer for Eagle draft)
-- **Vocabulary**: 128256 tokens (Llama4)
-- **Includes**: d2t and t2d vocabulary mappings
 ## Configuration
-- Uses standard Llama4 architecture
-- Includes Eagle auxiliary state configuration
-- Has vocabulary mapping tensors (d2t/t2d) for draft-to-target conversion
-- Extended context support (262k max position embeddings)
 ## Usage
-This model is for testing Eagle speculative decoding with Llama4 in vLLM:
-```bash
-vllm serve <llama4-target-model> \
-    --speculative-config '{"method": "eagle", "model": "nm-testing/llama4-eagle-drafter", ...}'
 ```
-## Testing Purpose
-This model contains random weights and is only for vLLM Eagle implementation testing.

+---
+license: apache-2.0
+tags:
+- eagle3
+- speculative-decoding
+- llama4
+- vllm
+- testing
+---
+# Llama4 Scout 17B Eagle3 Dummy Drafter
+This is a **dummy/test drafter model** for testing the Eagle3 speculative decoding implementation with Llama4 Scout 17B Instruct models in vLLM.
+⚠️ **WARNING**: This is not a real model and should not be used for actual inference. It contains random weights and is only for testing purposes.
 ## Model Details
+- **Architecture**: Llama4ForCausalLM (Eagle3 drafter variant)
+- **Target Model**: Llama4 Scout 17B Instruct (specifically `RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16`)
+- **Base Model**: Instruct version of the Llama4 Scout 17B model
+- **Hidden Size**: 2048
+- **Layers**: 1 (single decoder layer as per Eagle3 design)
+- **Vocabulary**: 128256 tokens
+- **Parameters**: ~322M
 ## Configuration
+This drafter model is specifically designed for the Instruct version of Llama4 Scout 17B and uses:
+- Eagle3 speculative decoding architecture
+- Single-layer transformer with auxiliary hidden state combination
+- Llama4 layer structure with RoPE (Rotary Position Embedding)
+- SGLang-compatible weight naming (midlayer.*)
+- Vocabulary mappings (t2d/d2t) for draft-to-target token conversion
 ## Usage
+This model is designed specifically for testing the vLLM Eagle3 implementation:
+```bash
+# Use with vLLM for testing Eagle3 speculative decoding with Llama4 Scout
+vllm serve RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 \
+    --speculative-config '{"method": "eagle3", "model": "nm-testing/llama4-scout-17b-eagle3-dummy-drafter", ...}'
 ```
+## Testing Purpose Only
+This model:
+- Contains random weights
+- Is not trained on any data
+- Should not be used for actual inference
+- Is only for vLLM development and testing
+## Related
+- vLLM: https://github.com/vllm-project/vllm
+- Eagle3: Speculative decoding method
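
The new README lists vocabulary mappings (t2d/d2t) for draft-to-target token conversion but does not say how they are encoded. The sketch below shows one plausible encoding, assuming the offset convention commonly seen in Eagle3 checkpoints: `d2t[i]` holds the offset that turns draft id `i` into a target id, and `t2d` is a boolean mask over the target vocabulary. The helper names `build_mappings` and `draft_to_target` and the toy vocabulary are hypothetical and are not taken from this repository or from vLLM.

```python
# Minimal sketch of Eagle3-style vocabulary mappings (assumed convention,
# not confirmed by this model card).

def build_mappings(kept_target_ids, target_vocab_size):
    """Build d2t offsets and a t2d mask from the target ids the drafter keeps."""
    kept = sorted(kept_target_ids)
    # d2t[d] is the offset added to draft id d to obtain the target id.
    d2t = [target_id - draft_id for draft_id, target_id in enumerate(kept)]
    kept_set = set(kept)
    # t2d[t] is True when target id t is representable in the draft vocabulary.
    t2d = [t in kept_set for t in range(target_vocab_size)]
    return d2t, t2d


def draft_to_target(draft_id, d2t):
    """Convert a draft-vocabulary id into the matching target-vocabulary id."""
    return draft_id + d2t[draft_id]


# Toy example: a target vocabulary of 8 tokens, of which the drafter keeps 4.
d2t, t2d = build_mappings([0, 2, 5, 7], target_vocab_size=8)
target_ids = [draft_to_target(d, d2t) for d in range(len(d2t))]
print(d2t)         # offsets per draft id
print(target_ids)  # recovered target ids
```

When the draft and target vocabularies are the same size, every offset under this convention is zero and the mask is all `True`, so the mapping reduces to the identity.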