RelaxingSnorlax committed on
Commit de55a01 · verified · 1 Parent(s): ecdf199

Add detailed model card

  1. README.md +47 -17
README.md CHANGED
@@ -1,27 +1,57 @@
1
- # Llama4 Eagle Drafter Model (Test)
 
 
 
 
 
 
 
 
2
 
3
- This is a test Eagle drafter model for Llama4 with proper configuration and vocabulary mappings.
 
 
 
 
4
 
5
  ## Model Details
6
- - **Architecture**: Llama4ForCausalLM (Eagle draft variant)
7
- - **Hidden size**: 2048
8
- - **Layers**: 1 (single decoder layer for Eagle draft)
9
- - **Vocabulary**: 128256 tokens (Llama4)
10
- - **Includes**: d2t and t2d vocabulary mappings
 
 
 
11
 
12
  ## Configuration
13
- - Uses standard Llama4 architecture
14
- - Includes Eagle auxiliary state configuration
15
- - Has vocabulary mapping tensors (d2t/t2d) for draft-to-target conversion
16
- - Extended context support (262k max position embeddings)
 
 
 
17
 
18
  ## Usage
19
- This model is for testing Eagle speculative decoding with Llama4 in vLLM:
20
 
21
- ```bash
22
- vllm serve <llama4-target-model> \
23
- --speculative-config '{"method": "eagle", "model": "nm-testing/llama4-eagle-drafter", ...}'
 
 
 
24
  ```
25
 
26
- ## Testing Purpose
27
- This model contains random weights and is only for vLLM Eagle implementation testing.
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - eagle3
5
+ - speculative-decoding
6
+ - llama4
7
+ - vllm
8
+ - testing
9
+ ---
10
 
11
+ # Llama4 Scout 17B Eagle3 Dummy Drafter
12
+
13
+ This is a **dummy drafter model** for testing the Eagle3 speculative decoding implementation with Llama4 Scout 17B Instruct models in vLLM.
14
+
15
+ ⚠️ **WARNING**: This is not a real model and should not be used for actual inference. It contains random weights and is only for testing purposes.
16
 
17
  ## Model Details
18
+
19
+ - **Architecture**: Llama4ForCausalLM (Eagle3 drafter variant)
20
+ - **Target Model**: Llama4 Scout 17B Instruct (specifically `RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16`)
21
+ - **Base Model**: Based on the Instruct version of the Llama4 Scout 17B model
22
+ - **Hidden Size**: 2048
23
+ - **Layers**: 1 (single decoder layer as per Eagle3 design)
24
+ - **Vocabulary**: 128256 tokens
25
+ - **Parameters**: ~322M
26
 
27
  ## Configuration
28
+
29
+ This drafter model is specifically designed for the Instruct version of Llama4 Scout 17B and uses:
30
+ - Eagle3 speculative decoding architecture
31
+ - Single-layer transformer with auxiliary hidden state combination
32
+ - Llama4 layer structure with RoPE (Rotary Position Embedding)
33
+ - SGLang-compatible weight naming (midlayer.*)
34
+ - Vocabulary mappings (t2d/d2t) for draft-to-target token conversion
35
 
36
  ## Usage
 
37
 
38
+ This model is designed specifically for testing the vLLM Eagle3 implementation:
39
+
40
+ ```bash
41
+ # Use with vLLM for testing Eagle3 speculative decoding with Llama4 Scout
42
+ vllm serve RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 \
43
+ --speculative-config '{"method": "eagle3", "model": "nm-testing/llama4-scout-17b-eagle3-dummy-drafter", ...}'
44
  ```
45
 
46
+ ## Testing Purpose Only
47
+
48
+ This model:
49
+ - Contains random weights
50
+ - Is not trained on any data
51
+ - Should not be used for actual inference
52
+ - Is only for vLLM development and testing
53
+
54
+ ## Related
55
+
56
+ - vLLM: https://github.com/vllm-project/vllm
57
+ - Eagle3: Speculative decoding method
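
The usage example in the new README passes `--speculative-config` as a JSON string inside shell quotes, which is easy to get wrong by hand. The sketch below is one possible way to assemble that command from Python. The helper `build_serve_command` and its `extra` parameter are hypothetical names introduced here for illustration; only the `method` and `model` keys come from the README, and the keys it elides with `...` are deliberately left out.

```python
import json
import shlex
from typing import Optional

# Model names copied from the README's usage example.
TARGET_MODEL = "RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16"
DRAFTER_MODEL = "nm-testing/llama4-scout-17b-eagle3-dummy-drafter"


def build_serve_command(target: str, drafter: str, extra: Optional[dict] = None) -> str:
    """Return a shell-quoted `vllm serve` command line (hypothetical helper)."""
    # Only these two keys appear in the README; the rest are elided there.
    spec_config = {"method": "eagle3", "model": drafter}
    # Any additional keys must be supplied by the caller via `extra`.
    spec_config.update(extra or {})
    args = ["vllm", "serve", target, "--speculative-config", json.dumps(spec_config)]
    # shlex.join quotes each argument, so the JSON survives the shell intact.
    return shlex.join(args)


command = build_serve_command(TARGET_MODEL, DRAFTER_MODEL)
print(command)
```

Serializing with `json.dumps` and quoting with `shlex.join` avoids hand-written nested quotes; the printed line can be pasted into a shell once the remaining configuration keys are filled in.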