INC4AI committed
Commit fcc0541 · verified · 1 Parent(s): 476f6c3

Upload quantized model TinyMoE-100m-2x8-ultrachat-AutoRound-NVFP4-Tuning

README.md ADDED
@@ -0,0 +1,178 @@
+ ---
+ base_model:
+ - FlameF0X/TinyMoE-100m-2x8-ultrachat
+ pipeline_tag: text-generation
+ tags:
+ - quantized
+ - nvfp4
+ - tuning
+ - low-bit-open-llm-leaderboard
+ ---
+
+ # TinyMoE-100m-2x8-ultrachat-AutoRound-NVFP4-Tuning
+
+ ## Model Details
+
+ This model is an NVFP4 (NVIDIA FP4) quantization of [FlameF0X/TinyMoE-100m-2x8-ultrachat](https://huggingface.co/FlameF0X/TinyMoE-100m-2x8-ultrachat) generated with [AutoRound](https://github.com/intel/auto-round) (TUNING). Please follow the license of the original model.
+
+ ## Quantization Details
+
+ | Attribute | Value |
+ |-----------|-------|
+ | Base Model | [FlameF0X/TinyMoE-100m-2x8-ultrachat](https://huggingface.co/FlameF0X/TinyMoE-100m-2x8-ultrachat) |
+ | Quantization Tool | AutoRound 0.13.1 (TUNING) |
+ | Quantization Scheme | NVFP4 |
+ | Quantized Size | 94 MiB (98,124,864 bytes) |
+
+ ## Evaluation Results
+
+ | Task | Accuracy |
+ |------|----------|
+ | hellaswag | 0.2568 |
+ | mmlu | 0.2295 |
+ | mmlu_abstract_algebra | 0.2200 |
+ | mmlu_anatomy | 0.1852 |
+ | mmlu_astronomy | 0.1776 |
+ | mmlu_business_ethics | 0.3000 |
+ | mmlu_clinical_knowledge | 0.2151 |
+ | mmlu_college_biology | 0.2569 |
+ | mmlu_college_chemistry | 0.1800 |
+ | mmlu_college_computer_science | 0.2600 |
+ | mmlu_college_mathematics | 0.2100 |
+ | mmlu_college_medicine | 0.2081 |
+ | mmlu_college_physics | 0.2157 |
+ | mmlu_computer_security | 0.2800 |
+ | mmlu_conceptual_physics | 0.2638 |
+ | mmlu_econometrics | 0.2368 |
+ | mmlu_electrical_engineering | 0.2414 |
+ | mmlu_elementary_mathematics | 0.2116 |
+ | mmlu_formal_logic | 0.2778 |
+ | mmlu_global_facts | 0.1800 |
+ | mmlu_high_school_biology | 0.1774 |
+ | mmlu_high_school_chemistry | 0.1527 |
+ | mmlu_high_school_computer_science | 0.2500 |
+ | mmlu_high_school_european_history | 0.2182 |
+ | mmlu_high_school_geography | 0.1768 |
+ | mmlu_high_school_government_and_politics | 0.1969 |
+ | mmlu_high_school_macroeconomics | 0.2026 |
+ | mmlu_high_school_mathematics | 0.2111 |
+ | mmlu_high_school_microeconomics | 0.2101 |
+ | mmlu_high_school_physics | 0.1987 |
+ | mmlu_high_school_psychology | 0.1927 |
+ | mmlu_high_school_statistics | 0.1528 |
+ | mmlu_high_school_us_history | 0.2500 |
+ | mmlu_high_school_world_history | 0.2700 |
+ | mmlu_human_aging | 0.3184 |
+ | mmlu_human_sexuality | 0.2595 |
+ | mmlu_humanities | 0.2419 |
+ | mmlu_international_law | 0.2397 |
+ | mmlu_jurisprudence | 0.2593 |
+ | mmlu_logical_fallacies | 0.2209 |
+ | mmlu_machine_learning | 0.3125 |
+ | mmlu_management | 0.1748 |
+ | mmlu_marketing | 0.2906 |
+ | mmlu_medical_genetics | 0.3000 |
+ | mmlu_miscellaneous | 0.2375 |
+ | mmlu_moral_disputes | 0.2486 |
+ | mmlu_moral_scenarios | 0.2380 |
+ | mmlu_nutrition | 0.2288 |
+ | mmlu_other | 0.2404 |
+ | mmlu_philosophy | 0.1865 |
+ | mmlu_prehistory | 0.2160 |
+ | mmlu_professional_accounting | 0.2340 |
+ | mmlu_professional_law | 0.2458 |
+ | mmlu_professional_medicine | 0.1838 |
+ | mmlu_professional_psychology | 0.2500 |
+ | mmlu_public_relations | 0.2182 |
+ | mmlu_security_studies | 0.1878 |
+ | mmlu_social_sciences | 0.2171 |
+ | mmlu_sociology | 0.2438 |
+ | mmlu_stem | 0.2122 |
+ | mmlu_us_foreign_policy | 0.2800 |
+ | mmlu_virology | 0.2831 |
+ | mmlu_world_religions | 0.3216 |
+ | piqa | 0.5256 |
+
+ ## How to Use
+
+ ### HF Usage
+
+ **Step 1: Install [AutoRound](https://github.com/intel/auto-round)**
+
+ ```bash
+ pip install auto-round
+ ```
+
+ **Step 2: Load and run the quantized model**
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Hub repo id or local path of the quantized checkpoint
+ model_name = "TinyMoE-100m-2x8-ultrachat-AutoRound-NVFP4-Tuning"
+
+ # Load the tokenizer and the model
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
+
+ # Build the chat-formatted prompt
+ prompt = "Write a quick sort algorithm."
+ messages = [{"role": "user", "content": prompt}]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # Generate, then keep only the newly generated tokens
+ generated_ids = model.generate(**model_inputs, max_new_tokens=512)
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :].tolist()
+
+ content = tokenizer.decode(output_ids, skip_special_tokens=True)
+ print("content:", content)
+ ```
+
+ ### vLLM Usage
+
+ ```bash
+ vllm serve TinyMoE-100m-2x8-ultrachat-AutoRound-NVFP4-Tuning \
+     --trust-remote-code \
+     --dtype bfloat16 \
+     --tensor-parallel-size 1
+ ```
+
+ If you encounter any issues, feel free to open an issue on the [AutoRound GitHub repo](https://github.com/intel/auto-round/issues) or provide feedback on the [Low-Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard).
+
+ ## Ethical Considerations and Limitations
+
+ The model can produce factually incorrect output and should not be relied on for factually accurate information. Because of the limitations of the pretrained model and the fine-tuning datasets, this model may generate lewd, biased, or otherwise offensive outputs.
+ Therefore, developers should perform safety testing before deploying any application of the model.
+
+ ## Caveats and Recommendations
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
+ Here are a couple of useful links to learn more about Intel's AI software:
+
+ - [Intel Neural Compressor](https://github.com/intel/neural-compressor)
+ - [AutoRound](https://github.com/intel/auto-round)
+
+ ## Disclaimer
+
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+
+ ## Cite
+
+ ```
+ @article{cheng2023optimize,
+   title={Optimize weight rounding via signed gradient descent for the quantization of {LLMs}},
+   author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
+   journal={arXiv preprint arXiv:2309.05516},
+   year={2023}
+ }
+ ```
+
+ [arXiv](https://arxiv.org/abs/2309.05516) [GitHub](https://github.com/intel/auto-round)
+
+ ---
+
+ *This model is part of the [Intel Low-Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard) initiative.*
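
Reviewer note on the README's Evaluation Results table: one way to read the headline scores is to compare each against uniform random guessing. The sketch below assumes the usual answer counts for these benchmarks (four options for hellaswag and mmlu, two for piqa); `NUM_CHOICES` and `margin_over_chance` are illustrative names, not part of the model card or of any evaluation harness.

```python
# Headline accuracies copied from the README's Evaluation Results table.
SCORES = {"hellaswag": 0.2568, "mmlu": 0.2295, "piqa": 0.5256}

# Assumption: answer options per question (4 for hellaswag/mmlu, 2 for piqa).
NUM_CHOICES = {"hellaswag": 4, "mmlu": 4, "piqa": 2}


def margin_over_chance(task: str) -> float:
    """Return the reported accuracy minus the random-guess baseline."""
    return SCORES[task] - 1.0 / NUM_CHOICES[task]


if __name__ == "__main__":
    for task in SCORES:
        margin = margin_over_chance(task)
        print(f"{task:10s} acc={SCORES[task]:.4f} margin={margin:+.4f}")
```

Under these assumptions every headline score sits within a few points of its chance baseline. The commit does not include scores for the unquantized base model, so the table alone does not show how much, if anything, quantization changed.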
chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+     {{- '<|im_start|>system\n' }}
+     {%- if messages[0]['role'] == 'system' %}
+         {{- messages[0]['content'] }}
+     {%- else %}
+         {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+     {%- endif %}
+     {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+     {%- for tool in tools %}
+         {{- "\n" }}
+         {{- tool | tojson }}
+     {%- endfor %}
+     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+     {%- if messages[0]['role'] == 'system' %}
+         {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+     {%- else %}
+         {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+     {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+     {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+     {%- elif message.role == "assistant" %}
+         {{- '<|im_start|>' + message.role }}
+         {%- if message.content %}
+             {{- '\n' + message.content }}
+         {%- endif %}
+         {%- for tool_call in message.tool_calls %}
+             {%- if tool_call.function is defined %}
+                 {%- set tool_call = tool_call.function %}
+             {%- endif %}
+             {{- '\n<tool_call>\n{"name": "' }}
+             {{- tool_call.name }}
+             {{- '", "arguments": ' }}
+             {{- tool_call.arguments | tojson }}
+             {{- '}\n</tool_call>' }}
+         {%- endfor %}
+         {{- '<|im_end|>\n' }}
+     {%- elif message.role == "tool" %}
+         {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+             {{- '<|im_start|>user' }}
+         {%- endif %}
+         {{- '\n<tool_response>\n' }}
+         {{- message.content }}
+         {{- '\n</tool_response>' }}
+         {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+             {{- '<|im_end|>\n' }}
+         {%- endif %}
+     {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+     {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
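
Reviewer note on chat_template.jinja: to make the prompt layout easier to inspect, here is a minimal pure-Python sketch of what the template's no-tools branch appears to produce. `render_chat` and `DEFAULT_SYSTEM` are hypothetical names introduced for illustration; the helper covers only system, user, and assistant messages without tool calls, and it is not a substitute for `tokenizer.apply_chat_template`.

```python
# Default system prompt, copied from the template's fallback branch.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chat(messages, add_generation_prompt=True):
    """Approximate the template output when no tools are passed.

    Assumption: every message is a dict with "role" and "content" keys,
    and no message carries tool calls or has the "tool" role.
    """
    if messages and messages[0]["role"] == "system":
        system_text = messages[0]["content"]
    else:
        system_text = DEFAULT_SYSTEM
    parts = [f"<|im_start|>system\n{system_text}<|im_end|>\n"]

    for index, message in enumerate(messages):
        if index == 0 and message["role"] == "system":
            continue  # the leading system message was emitted above
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )

    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chat([{"role": "user", "content": "Hi"}]))
```

Note that the fallback system prompt names Qwen even though the base model is TinyMoE; passing an explicit system message as the first entry replaces it.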
config.json ADDED
@@ -0,0 +1,312 @@
+ {
+   "architectures": [
+     "MixtralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "dtype": "bfloat16",
+   "eos_token_id": 32002,
+   "head_dim": null,
+   "hidden_act": "silu",
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 768,
+   "max_position_embeddings": 1024,
+   "model_type": "mixtral",
+   "num_attention_heads": 8,
+   "num_experts_per_tok": 2,
+   "num_hidden_layers": 10,
+   "num_key_value_heads": 4,
+   "num_local_experts": 8,
+   "output_router_logits": false,
+   "pad_token_id": 2,
+   "quantization_config": {
+     "act_bits": 4,
+     "act_data_type": "nv_fp4_with_static_gs",
+     "act_dynamic": true,
+     "act_group_size": 16,
+     "act_sym": true,
+     "autoround_version": "0.13.1",
+     "bits": 4,
+     "block_name_to_quantize": "model.layers",
+     "data_type": "nv_fp",
+     "extra_config": {
+       ".*mlp\\.gate.*": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       ".*model\\.layers\\.[0-9]\\.mlp\\.gate.*": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       ".*self_attn.*": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.0.self_attn.k_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.0.self_attn.o_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.0.self_attn.q_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.0.self_attn.v_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.1.self_attn.k_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.1.self_attn.o_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.1.self_attn.q_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.1.self_attn.v_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.2.self_attn.k_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.2.self_attn.o_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.2.self_attn.q_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.2.self_attn.v_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.3.self_attn.k_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.3.self_attn.o_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.3.self_attn.q_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.3.self_attn.v_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.4.self_attn.k_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.4.self_attn.o_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.4.self_attn.q_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.4.self_attn.v_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.5.self_attn.k_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.5.self_attn.o_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.5.self_attn.q_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.5.self_attn.v_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.6.self_attn.k_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.6.self_attn.o_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.6.self_attn.q_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.6.self_attn.v_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.7.self_attn.k_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.7.self_attn.o_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.7.self_attn.q_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.7.self_attn.v_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.8.self_attn.k_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.8.self_attn.o_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.8.self_attn.q_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.8.self_attn.v_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.9.self_attn.k_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.9.self_attn.o_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.9.self_attn.q_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       },
+       "model.layers.9.self_attn.v_proj": {
+         "act_bits": 16,
+         "act_data_type": "float",
+         "bits": 16,
+         "data_type": "float"
+       }
+     },
+     "group_size": 16,
+     "low_gpu_mem_usage": true,
+     "packing_format": "auto_round:llm_compressor",
+     "quant_method": "auto-round",
+     "seqlen": 1024,
+     "sym": true
+   },
+   "rms_norm_eps": 1e-06,
+   "rope_parameters": {
+     "rope_theta": 1000000.0,
+     "rope_type": "default"
+   },
+   "router_aux_loss_coef": 0.001,
+   "router_jitter_noise": 0.0,
+   "sliding_window": 1024,
+   "tie_word_embeddings": false,
+   "transformers_version": "5.12.1",
+   "use_cache": false,
+   "vocab_size": 32064
+ }
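
Reviewer note on config.json: the architecture fields are enough for a rough parameter count. The sketch below assumes the standard Mixtral layout (bias-free attention projections, three weight matrices per expert, two RMSNorm weights per layer plus a final norm) and that a null `head_dim` falls back to `hidden_size // num_attention_heads`. `estimate_parameters` is a hypothetical helper written for this note, not a library function.

```python
# Architecture values copied from config.json above.
CONFIG = {
    "hidden_size": 384,
    "intermediate_size": 768,
    "num_hidden_layers": 10,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "num_local_experts": 8,
    "vocab_size": 32064,
}


def estimate_parameters(cfg):
    """Estimate the parameter count under the assumptions stated above."""
    hidden = cfg["hidden_size"]
    head_dim = hidden // cfg["num_attention_heads"]  # assumed fallback for null head_dim
    kv_dim = head_dim * cfg["num_key_value_heads"]

    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q_proj, o_proj + k_proj, v_proj
    router = hidden * cfg["num_local_experts"]              # per-layer routing gate
    experts = cfg["num_local_experts"] * 3 * hidden * cfg["intermediate_size"]
    norms = 2 * hidden                                      # two RMSNorm weights per layer
    per_layer = attention + router + experts + norms

    embeddings = 2 * cfg["vocab_size"] * hidden  # input + output, since tie_word_embeddings is false
    final_norm = hidden
    return cfg["num_hidden_layers"] * per_layer + embeddings + final_norm


if __name__ == "__main__":
    print(f"estimated parameters: {estimate_parameters(CONFIG):,}")
```

Under these assumptions the estimate lands just under 100 million, in line with the "100m" in the model name. Most of that total sits in the expert matrices, which are the layers not covered by the 16-bit overrides in `extra_config`.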
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "do_sample": true,
+   "eos_token_id": [
+     32002
+   ],
+   "output_attentions": false,
+   "output_hidden_states": false,
+   "pad_token_id": 2,
+   "transformers_version": "5.12.1",
+   "use_cache": false
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:481651ce6bc3298ecef69d379b72d6d859426615cef0475a4b9328cc36650c79
+ size 98124864
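
Reviewer note on model.safetensors: the three lines above are a Git LFS pointer, not the weights themselves. A downloaded file can be checked against the pointer's digest and size; the sketch below shows one way to do that with the standard library. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the file path passed to `matches_pointer` is whatever local copy you want to verify.

```python
import hashlib

# Pointer text copied from the diff above (without the leading "+ ").
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:481651ce6bc3298ecef69d379b72d6d859426615cef0475a4b9328cc36650c79
size 98124864
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its hash algorithm, hex digest, and byte size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {"algorithm": algorithm, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algorithm"])
    size = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large checkpoints are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(pointer["algorithm"], pointer["size"], f"{pointer['size'] / 2**20:.1f} MiB")
```

The size field works out to about 93.6 MiB, which matches the README's rounded "94 MB" figure if that figure is read in binary units.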
quantization_config.json ADDED
@@ -0,0 +1,277 @@
+ {
+   "bits": 4,
+   "act_bits": 4,
+   "data_type": "nv_fp",
+   "act_data_type": "nv_fp4_with_static_gs",
+   "group_size": 16,
+   "act_group_size": 16,
+   "sym": true,
+   "act_sym": true,
+   "act_dynamic": true,
+   "low_gpu_mem_usage": true,
+   "seqlen": 1024,
+   "autoround_version": "0.13.1",
+   "block_name_to_quantize": "model.layers",
+   "quant_method": "auto-round",
+   "packing_format": "auto_round:llm_compressor",
+   "extra_config": {
+     "model.layers.0.self_attn.q_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.0.self_attn.k_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.0.self_attn.v_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.0.self_attn.o_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.1.self_attn.q_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.1.self_attn.k_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.1.self_attn.v_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.1.self_attn.o_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.2.self_attn.q_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.2.self_attn.k_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.2.self_attn.v_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.2.self_attn.o_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.3.self_attn.q_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.3.self_attn.k_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.3.self_attn.v_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.3.self_attn.o_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.4.self_attn.q_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.4.self_attn.k_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.4.self_attn.v_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.4.self_attn.o_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.5.self_attn.q_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.5.self_attn.k_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.5.self_attn.v_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.5.self_attn.o_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.6.self_attn.q_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.6.self_attn.k_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.6.self_attn.v_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.6.self_attn.o_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.7.self_attn.q_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.7.self_attn.k_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.7.self_attn.v_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.7.self_attn.o_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.8.self_attn.q_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.8.self_attn.k_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.8.self_attn.v_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.8.self_attn.o_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.9.self_attn.q_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.9.self_attn.k_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.9.self_attn.v_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     "model.layers.9.self_attn.o_proj": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     ".*mlp\\.gate.*": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     ".*self_attn.*": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     },
+     ".*model\\.layers\\.[0-9]\\.mlp\\.gate.*": {
+       "bits": 16,
+       "data_type": "float",
+       "act_bits": 16,
+       "act_data_type": "float"
+     }
+   }
+ }
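
Reviewer note on quantization_config.json: the last three `extra_config` keys are regular expressions rather than literal layer names. The sketch below illustrates which layer names they would select, assuming each pattern must match the whole name (`re.fullmatch`); how AutoRound actually applies these keys is not shown in this commit. `kept_in_16_bit` is a hypothetical helper, and the router and expert layer names in the demo are illustrative guesses. Only the `self_attn` names are taken from the config itself.

```python
import re

# Regex keys from extra_config, with the JSON escaping ("\\.") decoded.
PATTERNS = [
    r".*mlp\.gate.*",
    r".*self_attn.*",
    r".*model\.layers\.[0-9]\.mlp\.gate.*",
]


def kept_in_16_bit(layer_name):
    """Return True if any override pattern matches the full layer name.

    Assumption: full-string matching; the real matching rule may differ.
    """
    return any(re.fullmatch(pattern, layer_name) for pattern in PATTERNS)


if __name__ == "__main__":
    names = [
        "model.layers.3.self_attn.q_proj",   # name form that appears in the config
        "model.layers.3.mlp.gate",           # assumed router name
        "model.layers.3.mlp.experts.0.w1",   # hypothetical expert weight name
    ]
    for name in names:
        label = "16-bit float" if kept_in_16_bit(name) else "default 4-bit NVFP4"
        print(f"{name}: {label}")
```

Read this way, the per-layer `self_attn` entries duplicate what the `.*self_attn.*` pattern already covers, and the third pattern is a narrower form of the first.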
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "add_prefix_space": null,
+   "backend": "tokenizers",
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "is_local": false,
+   "legacy": false,
+   "local_files_only": false,
+   "max_length": 1024,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "stride": 0,
+   "tokenizer_class": "TokenizersBackend",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
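
Reviewer note on tokenizer_config.json: the `model_max_length` value equals what `int(1e30)` evaluates to in Python, which suggests a "no limit set" placeholder rather than a real context length. A caller that truncates inputs may therefore want to cap lengths using `max_position_embeddings` from config.json instead. The sketch below shows that idea; `effective_max_length` is a hypothetical helper, not a tokenizer API.

```python
# Values copied from tokenizer_config.json and config.json in this commit.
TOKENIZER_MODEL_MAX_LENGTH = 1000000000000000019884624838656
MAX_POSITION_EMBEDDINGS = 1024


def effective_max_length(tokenizer_limit, position_limit):
    """Return the smaller of the tokenizer limit and the model's position limit."""
    return min(tokenizer_limit, position_limit)


if __name__ == "__main__":
    # The tokenizer value is the float 1e30 converted to an integer.
    print(TOKENIZER_MODEL_MAX_LENGTH == int(1e30))
    print(effective_max_length(TOKENIZER_MODEL_MAX_LENGTH, MAX_POSITION_EMBEDDINGS))
```

With the values above the cap resolves to 1024 tokens, the same number the config lists for `max_length`, `seqlen`, and `sliding_window`.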