INC4AI committed on
Commit 248dc1e · verified · 1 Parent(s): 2aee7c3

Upload quantized model MiniCPM5-1B-AutoRound-NVFP4-RTN

README.md ADDED
@@ -0,0 +1,179 @@
1
+ ---
2
+ base_model:
3
+ - openbmb/MiniCPM5-1B
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - quantized
7
+ - nvfp4
8
+ - autoround
9
+ - low-bit-open-llm-leaderboard
10
+ ---
11
+
12
+ # MiniCPM5-1B-AutoRound-NVFP4-RTN
13
+
14
+ ## Model Details
15
+
16
+ This model is an NVFP4 (NVIDIA FP4) quantization of [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) generated by [AutoRound](https://github.com/intel/auto-round). Please follow the license of the original model.
17
+
18
+ ## Quantization Details
19
+
20
+ | Attribute | Value |
21
+ |-----------|-------|
22
+ | Base Model | [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) |
23
+ | Quantization Tool | [AutoRound](https://github.com/intel/auto-round) |
24
+ | Quantization Scheme | NVFP4 |
25
+ | Original Size | 1089 MB |
26
+ | Quantized Size | 1363 MB |
27
+
28
+ ## Evaluation Results
29
+
30
+ | Task | Accuracy |
31
+ |------|----------|
32
+ | hellaswag | 0.3691 |
33
+ | mmlu | 0.4870 |
34
+ | mmlu_abstract_algebra | 0.3000 |
35
+ | mmlu_anatomy | 0.5407 |
36
+ | mmlu_astronomy | 0.5395 |
37
+ | mmlu_business_ethics | 0.4500 |
38
+ | mmlu_clinical_knowledge | 0.5434 |
39
+ | mmlu_college_biology | 0.5486 |
40
+ | mmlu_college_chemistry | 0.3800 |
41
+ | mmlu_college_computer_science | 0.4600 |
42
+ | mmlu_college_mathematics | 0.3800 |
43
+ | mmlu_college_medicine | 0.4855 |
44
+ | mmlu_college_physics | 0.3235 |
45
+ | mmlu_computer_security | 0.5700 |
46
+ | mmlu_conceptual_physics | 0.4000 |
47
+ | mmlu_econometrics | 0.2982 |
48
+ | mmlu_electrical_engineering | 0.5517 |
49
+ | mmlu_elementary_mathematics | 0.3519 |
50
+ | mmlu_formal_logic | 0.3413 |
51
+ | mmlu_global_facts | 0.2100 |
52
+ | mmlu_high_school_biology | 0.5806 |
53
+ | mmlu_high_school_chemistry | 0.4236 |
54
+ | mmlu_high_school_computer_science | 0.4500 |
55
+ | mmlu_high_school_european_history | 0.6061 |
56
+ | mmlu_high_school_geography | 0.5859 |
57
+ | mmlu_high_school_government_and_politics | 0.6321 |
58
+ | mmlu_high_school_macroeconomics | 0.4692 |
59
+ | mmlu_high_school_mathematics | 0.2926 |
60
+ | mmlu_high_school_microeconomics | 0.5042 |
61
+ | mmlu_high_school_physics | 0.2649 |
62
+ | mmlu_high_school_psychology | 0.6624 |
63
+ | mmlu_high_school_statistics | 0.3426 |
64
+ | mmlu_high_school_us_history | 0.5588 |
65
+ | mmlu_high_school_world_history | 0.6160 |
66
+ | mmlu_human_aging | 0.4888 |
67
+ | mmlu_human_sexuality | 0.6260 |
68
+ | mmlu_humanities | 0.4389 |
69
+ | mmlu_international_law | 0.7107 |
70
+ | mmlu_jurisprudence | 0.5926 |
71
+ | mmlu_logical_fallacies | 0.5828 |
72
+ | mmlu_machine_learning | 0.3839 |
73
+ | mmlu_management | 0.6408 |
74
+ | mmlu_marketing | 0.7521 |
75
+ | mmlu_medical_genetics | 0.6600 |
76
+ | mmlu_miscellaneous | 0.6564 |
77
+ | mmlu_moral_disputes | 0.5087 |
78
+ | mmlu_moral_scenarios | 0.2380 |
79
+ | mmlu_nutrition | 0.6307 |
80
+ | mmlu_other | 0.5555 |
81
+ | mmlu_philosophy | 0.5466 |
82
+ | mmlu_prehistory | 0.5278 |
83
+ | mmlu_professional_accounting | 0.3723 |
84
+ | mmlu_professional_law | 0.3677 |
85
+ | mmlu_professional_medicine | 0.4669 |
86
+ | mmlu_professional_psychology | 0.4935 |
87
+ | mmlu_public_relations | 0.5000 |
88
+ | mmlu_security_studies | 0.5714 |
89
+ | mmlu_social_sciences | 0.5583 |
90
+ | mmlu_sociology | 0.6766 |
91
+ | mmlu_stem | 0.4218 |
92
+ | mmlu_us_foreign_policy | 0.6700 |
93
+ | mmlu_virology | 0.4578 |
94
+ | mmlu_world_religions | 0.7193 |
95
+ | piqa | 0.6670 |
96
+
97
+ ## How to Use
98
+
99
+ ### HF Usage
100
+
101
+ **Step 1: Install [AutoRound](https://github.com/intel/auto-round)**
102
+
103
+ ```bash
104
+ pip install auto-round
105
+ ```
106
+
107
+ **Step 2: Load and run the quantized model**
108
+
109
+ ```python
110
+ from transformers import AutoModelForCausalLM, AutoTokenizer
111
+
112
+ model_name = "MiniCPM5-1B-AutoRound-NVFP4-RTN"
113
+
114
+ # load the tokenizer and the model
115
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
116
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
117
+
118
+ # prepare the model input
119
+ prompt = "Write a quick sort algorithm."
120
+ messages = [{"role": "user", "content": prompt}]
121
+ text = tokenizer.apply_chat_template(
122
+ messages,
123
+ tokenize=False,
124
+ add_generation_prompt=True,
125
+ )
126
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
127
+
128
+ # conduct text completion
129
+ generated_ids = model.generate(**model_inputs, max_new_tokens=512)
130
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :].tolist()
131
+
132
+ content = tokenizer.decode(output_ids, skip_special_tokens=True)
133
+ print("content:", content)
134
+ ```
135
+
136
+ ### vLLM Usage
137
+
138
+ ```bash
139
+ vllm serve MiniCPM5-1B-AutoRound-NVFP4-RTN \
140
+ --trust-remote-code \
141
+ --dtype bfloat16 \
142
+ --tensor-parallel-size 1
143
+ ```
144
+
145
+ If you encounter any issues, feel free to open an issue on the [AutoRound GitHub repo](https://github.com/intel/auto-round/issues) or provide feedback on the [Low-Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard).
146
+
147
+ ## Ethical Considerations and Limitations
148
+
149
+ The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
150
+ Therefore, before deploying any applications of the model, developers should perform safety testing.
151
+
152
+ ## Caveats and Recommendations
153
+
154
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
155
+ Here are a couple of useful links to learn more about Intel's AI software:
156
+
157
+ - [Intel Neural Compressor](https://github.com/intel/neural-compressor)
158
+ - [AutoRound](https://github.com/intel/auto-round)
159
+
160
+ ## Disclaimer
161
+
162
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
163
+
164
+ ## Cite
165
+
166
+ ```
167
+ @article{cheng2023optimize,
168
+ title={Optimize weight rounding via signed gradient descent for the quantization of llms},
169
+ author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
170
+ journal={arXiv preprint arXiv:2309.05516},
171
+ year={2023}
172
+ }
173
+ ```
174
+
175
+ [arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
176
+
177
+ ---
178
+
179
+ *This model is part of the [Intel Low-Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard) initiative.*
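The README's Quantization Details name NVFP4 as the scheme, and the `-RTN` suffix together with `"iters": 0` in the quantization config suggests plain round-to-nearest rather than tuned rounding. As a rough illustration only, and not AutoRound's implementation, the sketch below rounds one group of 16 weights onto the FP4 E2M1 value grid using a per-group scale. It is simplified: real NVFP4 stores the group scale in FP8 E4M3 and adds a per-tensor FP32 scale, whereas here the scale stays a Python float. `quantize_group` and `E2M1_GRID` are names made up for this example.

```python
import math

# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GROUP_SIZE = 16  # matches "group_size": 16 in the quantization config


def quantize_group(values):
    """Round-to-nearest fake quantization of one group (hypothetical helper).

    Returns the dequantized values and the scale that was used.
    """
    if len(values) != GROUP_SIZE:
        raise ValueError(f"expected {GROUP_SIZE} values, got {len(values)}")
    amax = max(abs(v) for v in values)
    # Map the largest magnitude in the group onto the largest grid point (6.0).
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    dequantized = []
    for v in values:
        target = abs(v) / scale
        # Nearest grid point; ties go to the smaller magnitude here,
        # which is a simplification of hardware rounding rules.
        magnitude = min(E2M1_GRID, key=lambda g: abs(target - g))
        dequantized.append(math.copysign(magnitude, v) * scale)
    return dequantized, scale


weights = [6.0, 3.2, -0.2, 5.1, -2.4] + [0.0] * 11
dequantized, scale = quantize_group(weights)
print(scale, dequantized[:5])
```

Because the grid is coarse near its top end (3, 4, 6), the largest weights in a group carry the largest absolute rounding error, which is one reason small group sizes such as 16 are used.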
chat_template.jinja ADDED
@@ -0,0 +1,179 @@
1
+ {{- bos_token }}{%- if tools %}
2
+ {%- set tool_definitions %}
3
+ {{- "# Tools\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
4
+ {%- for tool in tools %}
5
+ {{- "\n" }}
6
+ {{- tool | tojson(ensure_ascii=False) }}
7
+ {%- endfor %}
8
+ {{- '\n</tools>\n\nTool usage guidelines:\n- You may call zero or more functions. If no function calls are needed, just answer normally and do not include any <function ... </function>.\n- When calling a function, return an XML object within <function ... </function> using:\n<function name="function-name"><param name="param-name">param-value</param></function>\n- param-value may be multi-line. If it contains <, & or newline characters, wrap it in a CDATA block: <param name="param-name"><![CDATA[...multi-line value...]]></param>' }}
9
+ {%- endset %}
10
+
11
+ {{- '<|im_start|>system\n' }}
12
+ {%- if messages[0].role == 'system' %}
13
+ {%- if '<tool_def_sep>' in messages[0].content %}
14
+ {{- messages[0].content.replace('<tool_def_sep>', tool_definitions) }}
15
+ {%- else %}
16
+ {{- messages[0].content + '\n\n' + tool_definitions }}
17
+ {%- endif %}
18
+ {%- else %}
19
+ {{- tool_definitions.lstrip() }}
20
+ {%- endif %}
21
+ {{- '<|im_end|>\n' }}
22
+ {%- else %}
23
+ {%- if messages[0].role == 'system' %}
24
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
25
+ {%- endif %}
26
+ {%- endif %}
27
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
28
+ {%- for message in messages[::-1] %}
29
+ {%- set index = (messages|length - 1) - loop.index0 %}
30
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
31
+ {%- set ns.multi_step_tool = false %}
32
+ {%- set ns.last_query_index = index %}
33
+ {%- endif %}
34
+ {%- endfor %}
35
+ {%- for message in messages %}
36
+ {%- if message.content is string %}
37
+ {%- set content = message.content %}
38
+ {%- else %}
39
+ {%- set content = '' %}
40
+ {%- endif %}
41
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
42
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
43
+ {%- elif message.role == "assistant" %}
44
+ {%- set reasoning_content = '' %}
45
+ {%- if message.reasoning_content is string %}
46
+ {%- set reasoning_content = message.reasoning_content %}
47
+ {%- else %}
48
+ {%- if '</think>' in content %}
49
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
50
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
51
+ {%- endif %}
52
+ {%- endif %}
53
+
54
+ {%- if message.tool_calls %}
55
+ {%- set content_parts = content.split('<tool_sep>') %}
56
+ {%- set processed_content = content_parts[0] %}
57
+ {%- set tool_calls_count = message.tool_calls|length %}
58
+ {%- set tool_sep_count = content_parts|length - 1 %}
59
+ {%- set min_count = [tool_calls_count, tool_sep_count]|min %}
60
+
61
+ {%- for i in range(1, content_parts|length) %}
62
+ {%- set tool_index = i - 1 %}
63
+ {%- if tool_index < tool_calls_count %}
64
+ {%- set tool_call = message.tool_calls[tool_index] %}
65
+ {%- if tool_call.function %}
66
+ {%- set tool_call = tool_call.function %}
67
+ {%- endif %}
68
+ {%- set single_tool_xml %}
69
+ {{- '<function name="' ~ tool_call.name ~ '">' }}
70
+ {%- if tool_call.arguments %}
71
+ {%- set args_dict = tool_call.arguments %}
72
+ {%- for param_name, param_value in args_dict.items() %}
73
+ {{- '<param name="' ~ param_name ~ '">' }}
74
+ {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
75
+ {{- '<![CDATA[' + param_value + ']]>' }}
76
+ {%- else %}
77
+ {{- param_value }}
78
+ {%- endif %}
79
+ {{- '</param>' }}
80
+ {%- endfor %}
81
+ {%- endif %}
82
+ {{- '</function>' }}
83
+ {%- endset %}
84
+ {%- set processed_content = processed_content + single_tool_xml + content_parts[i] %}
85
+ {%- else %}
86
+ {%- set processed_content = processed_content + content_parts[i] %}
87
+ {%- endif %}
88
+ {%- endfor %}
89
+
90
+ {%- if tool_calls_count > tool_sep_count %}
91
+ {%- for remaining_index in range(tool_sep_count, tool_calls_count) %}
92
+ {%- set tool_call = message.tool_calls[remaining_index] %}
93
+ {%- if tool_call.function %}
94
+ {%- set tool_call = tool_call.function %}
95
+ {%- endif %}
96
+ {%- set remaining_tool_xml %}
97
+ {{- '<function name="' ~ tool_call.name ~ '">' }}
98
+ {%- if tool_call.arguments %}
99
+ {%- set args_dict = tool_call.arguments %}
100
+ {%- for param_name, param_value in args_dict.items() %}
101
+ {{- '<param name="' ~ param_name ~ '">' }}
102
+ {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
103
+ {{- '<![CDATA[' + param_value + ']]>' }}
104
+ {%- else %}
105
+ {{- param_value }}
106
+ {%- endif %}
107
+ {{- '</param>' }}
108
+ {%- endfor %}
109
+ {%- endif %}
110
+ {{- '</function>' }}
111
+ {%- endset %}
112
+ {%- set processed_content = processed_content + remaining_tool_xml %}
113
+ {%- endfor %}
114
+ {%- endif %}
115
+
116
+ {%- set content = processed_content %}
117
+ {%- endif %}
118
+
119
+ {%- if loop.index0 > ns.last_query_index %}
120
+ {%- if reasoning_content %}
121
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
122
+ {%- else %}
123
+ {{- '<|im_start|>' + message.role + '\n' + content }}
124
+ {%- endif %}
125
+ {%- else %}
126
+ {{- '<|im_start|>' + message.role + '\n' + content }}
127
+ {%- endif %}
128
+
129
+ {%- if message.tool_calls and not has_tool_sep %}
130
+ {%- for tool_call in message.tool_calls %}
131
+ {%- if (loop.first and content) or (not loop.first) %}
132
+ {{- '\n' }}
133
+ {%- endif %}
134
+ {%- if tool_call.function %}
135
+ {%- set tool_call = tool_call.function %}
136
+ {%- endif %}
137
+ {{- '<function name="' ~ tool_call.name ~ '">' }}
138
+ {%- if tool_call.arguments %}
139
+ {%- set args_dict = tool_call.arguments %}
140
+ {%- for param_name, param_value in args_dict.items() %}
141
+ {{- '<param name="' ~ param_name ~ '">' }}
142
+ {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
143
+ {{- '<![CDATA[' + param_value + ']]>' }}
144
+ {%- else %}
145
+ {{- param_value }}
146
+ {%- endif %}
147
+ {{- '</param>' }}
148
+ {%- endfor %}
149
+ {%- endif %}
150
+ {{- '</function>' }}
151
+ {%- endfor %}
152
+ {%- endif %}
153
+ {{- '<|im_end|>\n' }}
154
+ {%- elif message.role == "tool" %}
155
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
156
+ {{- '<|im_start|>user' }}
157
+ {%- endif %}
158
+ {{- '\n<tool_response>\n' }}
159
+ {%- if message.content is string %}
160
+ {{- content }}
161
+ {%- else %}
162
+ {{- message.content | tojson(ensure_ascii=False) }}
163
+ {%- endif %}
164
+ {{- '\n</tool_response>' }}
165
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
166
+ {{- '<|im_end|>\n' }}
167
+ {%- endif %}
168
+ {%- endif %}
169
+ {%- endfor %}
170
+ {%- if add_generation_prompt %}
171
+ {{- '<|im_start|>assistant\n' }}
172
+ {%- if enable_thinking is defined %}
173
+ {%- if enable_thinking is false %}
174
+ {{- '<think>\n\n</think>\n\n' }}
175
+ {%- elif enable_thinking is true %}
176
+ {{- '<think>\n' }}
177
+ {%- endif %}
178
+ {%- endif %}
179
+ {%- endif %}
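To make the template above easier to follow, here is a minimal Python sketch of the prompt it appears to produce for the simplest case: no `tools`, no `tool` messages, no `tool_calls`, and every `content` a plain string. It is a hand-written approximation for reading purposes, not a replacement for `tokenizer.apply_chat_template`. The function name `render_prompt` is hypothetical, the `bos_token` value must come from the tokenizer, and the separate `reasoning_content` message field that the template also accepts is ignored here.

```python
def render_prompt(messages, bos_token="", add_generation_prompt=True,
                  enable_thinking=None):
    """Approximate the no-tools path of the chat template (hypothetical helper)."""
    out = [bos_token]
    if messages and messages[0]["role"] == "system":
        out.append("<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n")

    # Index of the last genuine user query (not a wrapped tool response).
    last_query_index = len(messages) - 1
    for i in range(len(messages) - 1, -1, -1):
        text = messages[i]["content"]
        is_tool_reply = (text.startswith("<tool_response>")
                         and text.endswith("</tool_response>"))
        if messages[i]["role"] == "user" and not is_tool_reply:
            last_query_index = i
            break

    for i, message in enumerate(messages):
        role, content = message["role"], message["content"]
        if role == "user" or (role == "system" and i > 0):
            out.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
        elif role == "assistant":
            reasoning = ""
            if "</think>" in content:
                head = content.split("</think>")[0].rstrip("\n")
                reasoning = head.split("<think>")[-1].lstrip("\n")
                content = content.split("</think>")[-1].lstrip("\n")
            # Reasoning is kept only for turns after the last user query.
            if i > last_query_index and reasoning:
                out.append("<|im_start|>assistant\n<think>\n" + reasoning.strip("\n")
                           + "\n</think>\n\n" + content.lstrip("\n"))
            else:
                out.append("<|im_start|>assistant\n" + content)
            out.append("<|im_end|>\n")

    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
        if enable_thinking is False:
            out.append("<think>\n\n</think>\n\n")  # empty block: skip reasoning
        elif enable_thinking is True:
            out.append("<think>\n")  # open block: model continues the reasoning
    return "".join(out)


print(render_prompt([{"role": "user", "content": "Hi"}], enable_thinking=False))
```

Note that leaving `enable_thinking` undefined emits neither think prefix, so the model decides for itself whether to open a `<think>` block.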
config.json ADDED
@@ -0,0 +1,637 @@
1
+ {
2
+ "architectures": [
3
+ "LlamaForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 0,
8
+ "dtype": "bfloat16",
9
+ "eos_token_id": [
10
+ 1,
11
+ 130073
12
+ ],
13
+ "head_dim": 128,
14
+ "hidden_act": "silu",
15
+ "hidden_size": 1536,
16
+ "initializer_range": 0.02,
17
+ "intermediate_size": 4608,
18
+ "max_position_embeddings": 131072,
19
+ "mlp_bias": false,
20
+ "model_type": "llama",
21
+ "num_attention_heads": 16,
22
+ "num_hidden_layers": 24,
23
+ "num_key_value_heads": 2,
24
+ "pad_token_id": 1,
25
+ "pretraining_tp": 1,
26
+ "quantization_config": {
27
+ "act_bits": 4,
28
+ "act_data_type": "nv_fp4_with_static_gs",
29
+ "act_dynamic": true,
30
+ "act_group_size": 16,
31
+ "act_sym": true,
32
+ "autoround_version": "0.13.0",
33
+ "bits": 4,
34
+ "block_name_to_quantize": "model.layers",
35
+ "data_type": "nv_fp",
36
+ "enable_quanted_input": false,
37
+ "extra_config": {
38
+ ".*self_attn.*": {
39
+ "act_bits": 16,
40
+ "act_data_type": "float",
41
+ "bits": 16,
42
+ "data_type": "float"
43
+ },
44
+ "model.layers.0.self_attn.k_proj": {
45
+ "act_bits": 16,
46
+ "act_data_type": "float",
47
+ "bits": 16,
48
+ "data_type": "float"
49
+ },
50
+ "model.layers.0.self_attn.o_proj": {
51
+ "act_bits": 16,
52
+ "act_data_type": "float",
53
+ "bits": 16,
54
+ "data_type": "float"
55
+ },
56
+ "model.layers.0.self_attn.q_proj": {
57
+ "act_bits": 16,
58
+ "act_data_type": "float",
59
+ "bits": 16,
60
+ "data_type": "float"
61
+ },
62
+ "model.layers.0.self_attn.v_proj": {
63
+ "act_bits": 16,
64
+ "act_data_type": "float",
65
+ "bits": 16,
66
+ "data_type": "float"
67
+ },
68
+ "model.layers.1.self_attn.k_proj": {
69
+ "act_bits": 16,
70
+ "act_data_type": "float",
71
+ "bits": 16,
72
+ "data_type": "float"
73
+ },
74
+ "model.layers.1.self_attn.o_proj": {
75
+ "act_bits": 16,
76
+ "act_data_type": "float",
77
+ "bits": 16,
78
+ "data_type": "float"
79
+ },
80
+ "model.layers.1.self_attn.q_proj": {
81
+ "act_bits": 16,
82
+ "act_data_type": "float",
83
+ "bits": 16,
84
+ "data_type": "float"
85
+ },
86
+ "model.layers.1.self_attn.v_proj": {
87
+ "act_bits": 16,
88
+ "act_data_type": "float",
89
+ "bits": 16,
90
+ "data_type": "float"
91
+ },
92
+ "model.layers.10.self_attn.k_proj": {
93
+ "act_bits": 16,
94
+ "act_data_type": "float",
95
+ "bits": 16,
96
+ "data_type": "float"
97
+ },
98
+ "model.layers.10.self_attn.o_proj": {
99
+ "act_bits": 16,
100
+ "act_data_type": "float",
101
+ "bits": 16,
102
+ "data_type": "float"
103
+ },
104
+ "model.layers.10.self_attn.q_proj": {
105
+ "act_bits": 16,
106
+ "act_data_type": "float",
107
+ "bits": 16,
108
+ "data_type": "float"
109
+ },
110
+ "model.layers.10.self_attn.v_proj": {
111
+ "act_bits": 16,
112
+ "act_data_type": "float",
113
+ "bits": 16,
114
+ "data_type": "float"
115
+ },
116
+ "model.layers.11.self_attn.k_proj": {
117
+ "act_bits": 16,
118
+ "act_data_type": "float",
119
+ "bits": 16,
120
+ "data_type": "float"
121
+ },
122
+ "model.layers.11.self_attn.o_proj": {
123
+ "act_bits": 16,
124
+ "act_data_type": "float",
125
+ "bits": 16,
126
+ "data_type": "float"
127
+ },
128
+ "model.layers.11.self_attn.q_proj": {
129
+ "act_bits": 16,
130
+ "act_data_type": "float",
131
+ "bits": 16,
132
+ "data_type": "float"
133
+ },
134
+ "model.layers.11.self_attn.v_proj": {
135
+ "act_bits": 16,
136
+ "act_data_type": "float",
137
+ "bits": 16,
138
+ "data_type": "float"
139
+ },
140
+ "model.layers.12.self_attn.k_proj": {
141
+ "act_bits": 16,
142
+ "act_data_type": "float",
143
+ "bits": 16,
144
+ "data_type": "float"
145
+ },
146
+ "model.layers.12.self_attn.o_proj": {
147
+ "act_bits": 16,
148
+ "act_data_type": "float",
149
+ "bits": 16,
150
+ "data_type": "float"
151
+ },
152
+ "model.layers.12.self_attn.q_proj": {
153
+ "act_bits": 16,
154
+ "act_data_type": "float",
155
+ "bits": 16,
156
+ "data_type": "float"
157
+ },
158
+ "model.layers.12.self_attn.v_proj": {
159
+ "act_bits": 16,
160
+ "act_data_type": "float",
161
+ "bits": 16,
162
+ "data_type": "float"
163
+ },
164
+ "model.layers.13.self_attn.k_proj": {
165
+ "act_bits": 16,
166
+ "act_data_type": "float",
167
+ "bits": 16,
168
+ "data_type": "float"
169
+ },
170
+ "model.layers.13.self_attn.o_proj": {
171
+ "act_bits": 16,
172
+ "act_data_type": "float",
173
+ "bits": 16,
174
+ "data_type": "float"
175
+ },
176
+ "model.layers.13.self_attn.q_proj": {
177
+ "act_bits": 16,
178
+ "act_data_type": "float",
179
+ "bits": 16,
180
+ "data_type": "float"
181
+ },
182
+ "model.layers.13.self_attn.v_proj": {
183
+ "act_bits": 16,
184
+ "act_data_type": "float",
185
+ "bits": 16,
186
+ "data_type": "float"
187
+ },
188
+ "model.layers.14.self_attn.k_proj": {
189
+ "act_bits": 16,
190
+ "act_data_type": "float",
191
+ "bits": 16,
192
+ "data_type": "float"
193
+ },
194
+ "model.layers.14.self_attn.o_proj": {
195
+ "act_bits": 16,
196
+ "act_data_type": "float",
197
+ "bits": 16,
198
+ "data_type": "float"
199
+ },
200
+ "model.layers.14.self_attn.q_proj": {
201
+ "act_bits": 16,
202
+ "act_data_type": "float",
203
+ "bits": 16,
204
+ "data_type": "float"
205
+ },
206
+ "model.layers.14.self_attn.v_proj": {
207
+ "act_bits": 16,
208
+ "act_data_type": "float",
209
+ "bits": 16,
210
+ "data_type": "float"
211
+ },
212
+ "model.layers.15.self_attn.k_proj": {
213
+ "act_bits": 16,
214
+ "act_data_type": "float",
215
+ "bits": 16,
216
+ "data_type": "float"
217
+ },
218
+ "model.layers.15.self_attn.o_proj": {
219
+ "act_bits": 16,
220
+ "act_data_type": "float",
221
+ "bits": 16,
222
+ "data_type": "float"
223
+ },
224
+ "model.layers.15.self_attn.q_proj": {
225
+ "act_bits": 16,
226
+ "act_data_type": "float",
227
+ "bits": 16,
228
+ "data_type": "float"
229
+ },
230
+ "model.layers.15.self_attn.v_proj": {
231
+ "act_bits": 16,
232
+ "act_data_type": "float",
233
+ "bits": 16,
234
+ "data_type": "float"
235
+ },
236
+ "model.layers.16.self_attn.k_proj": {
237
+ "act_bits": 16,
238
+ "act_data_type": "float",
239
+ "bits": 16,
240
+ "data_type": "float"
241
+ },
242
+ "model.layers.16.self_attn.o_proj": {
243
+ "act_bits": 16,
244
+ "act_data_type": "float",
245
+ "bits": 16,
246
+ "data_type": "float"
247
+ },
248
+ "model.layers.16.self_attn.q_proj": {
249
+ "act_bits": 16,
250
+ "act_data_type": "float",
251
+ "bits": 16,
252
+ "data_type": "float"
253
+ },
254
+ "model.layers.16.self_attn.v_proj": {
255
+ "act_bits": 16,
256
+ "act_data_type": "float",
257
+ "bits": 16,
258
+ "data_type": "float"
259
+ },
260
+ "model.layers.17.self_attn.k_proj": {
261
+ "act_bits": 16,
262
+ "act_data_type": "float",
263
+ "bits": 16,
264
+ "data_type": "float"
265
+ },
266
+ "model.layers.17.self_attn.o_proj": {
267
+ "act_bits": 16,
268
+ "act_data_type": "float",
269
+ "bits": 16,
270
+ "data_type": "float"
271
+ },
272
+ "model.layers.17.self_attn.q_proj": {
273
+ "act_bits": 16,
274
+ "act_data_type": "float",
275
+ "bits": 16,
276
+ "data_type": "float"
277
+ },
278
+ "model.layers.17.self_attn.v_proj": {
279
+ "act_bits": 16,
280
+ "act_data_type": "float",
281
+ "bits": 16,
282
+ "data_type": "float"
283
+ },
284
+ "model.layers.18.self_attn.k_proj": {
285
+ "act_bits": 16,
286
+ "act_data_type": "float",
287
+ "bits": 16,
288
+ "data_type": "float"
289
+ },
290
+ "model.layers.18.self_attn.o_proj": {
291
+ "act_bits": 16,
292
+ "act_data_type": "float",
293
+ "bits": 16,
294
+ "data_type": "float"
295
+ },
296
+ "model.layers.18.self_attn.q_proj": {
297
+ "act_bits": 16,
298
+ "act_data_type": "float",
299
+ "bits": 16,
300
+ "data_type": "float"
301
+ },
302
+ "model.layers.18.self_attn.v_proj": {
303
+ "act_bits": 16,
304
+ "act_data_type": "float",
305
+ "bits": 16,
306
+ "data_type": "float"
307
+ },
308
+ "model.layers.19.self_attn.k_proj": {
309
+ "act_bits": 16,
310
+ "act_data_type": "float",
311
+ "bits": 16,
312
+ "data_type": "float"
313
+ },
314
+ "model.layers.19.self_attn.o_proj": {
315
+ "act_bits": 16,
316
+ "act_data_type": "float",
317
+ "bits": 16,
318
+ "data_type": "float"
319
+ },
320
+ "model.layers.19.self_attn.q_proj": {
321
+ "act_bits": 16,
322
+ "act_data_type": "float",
323
+ "bits": 16,
324
+ "data_type": "float"
325
+ },
326
+ "model.layers.19.self_attn.v_proj": {
327
+ "act_bits": 16,
328
+ "act_data_type": "float",
329
+ "bits": 16,
330
+ "data_type": "float"
331
+ },
332
+ "model.layers.2.self_attn.k_proj": {
333
+ "act_bits": 16,
334
+ "act_data_type": "float",
335
+ "bits": 16,
336
+ "data_type": "float"
337
+ },
338
+ "model.layers.2.self_attn.o_proj": {
339
+ "act_bits": 16,
340
+ "act_data_type": "float",
341
+ "bits": 16,
342
+ "data_type": "float"
343
+ },
344
+ "model.layers.2.self_attn.q_proj": {
345
+ "act_bits": 16,
346
+ "act_data_type": "float",
347
+ "bits": 16,
348
+ "data_type": "float"
349
+ },
350
+ "model.layers.2.self_attn.v_proj": {
351
+ "act_bits": 16,
352
+ "act_data_type": "float",
353
+ "bits": 16,
354
+ "data_type": "float"
355
+ },
356
+ "model.layers.20.self_attn.k_proj": {
357
+ "act_bits": 16,
358
+ "act_data_type": "float",
359
+ "bits": 16,
360
+ "data_type": "float"
361
+ },
362
+ "model.layers.20.self_attn.o_proj": {
363
+ "act_bits": 16,
364
+ "act_data_type": "float",
365
+ "bits": 16,
366
+ "data_type": "float"
367
+ },
368
+ "model.layers.20.self_attn.q_proj": {
369
+ "act_bits": 16,
370
+ "act_data_type": "float",
371
+ "bits": 16,
372
+ "data_type": "float"
373
+ },
374
+ "model.layers.20.self_attn.v_proj": {
375
+ "act_bits": 16,
376
+ "act_data_type": "float",
377
+ "bits": 16,
378
+ "data_type": "float"
379
+ },
380
+ "model.layers.21.self_attn.k_proj": {
381
+ "act_bits": 16,
382
+ "act_data_type": "float",
383
+ "bits": 16,
384
+ "data_type": "float"
385
+ },
386
+ "model.layers.21.self_attn.o_proj": {
387
+ "act_bits": 16,
388
+ "act_data_type": "float",
389
+ "bits": 16,
390
+ "data_type": "float"
391
+ },
392
+ "model.layers.21.self_attn.q_proj": {
393
+ "act_bits": 16,
394
+ "act_data_type": "float",
395
+ "bits": 16,
396
+ "data_type": "float"
397
+ },
398
+ "model.layers.21.self_attn.v_proj": {
399
+ "act_bits": 16,
400
+ "act_data_type": "float",
401
+ "bits": 16,
402
+ "data_type": "float"
403
+ },
404
+ "model.layers.22.self_attn.k_proj": {
405
+ "act_bits": 16,
406
+ "act_data_type": "float",
407
+ "bits": 16,
408
+ "data_type": "float"
409
+ },
410
+ "model.layers.22.self_attn.o_proj": {
411
+ "act_bits": 16,
412
+ "act_data_type": "float",
413
+ "bits": 16,
414
+ "data_type": "float"
415
+ },
416
+ "model.layers.22.self_attn.q_proj": {
417
+ "act_bits": 16,
418
+ "act_data_type": "float",
419
+ "bits": 16,
420
+ "data_type": "float"
421
+ },
422
+ "model.layers.22.self_attn.v_proj": {
423
+ "act_bits": 16,
424
+ "act_data_type": "float",
425
+ "bits": 16,
426
+ "data_type": "float"
427
+ },
428
+ "model.layers.23.self_attn.k_proj": {
429
+ "act_bits": 16,
430
+ "act_data_type": "float",
431
+ "bits": 16,
432
+ "data_type": "float"
433
+ },
434
+ "model.layers.23.self_attn.o_proj": {
435
+ "act_bits": 16,
436
+ "act_data_type": "float",
437
+ "bits": 16,
438
+ "data_type": "float"
439
+ },
440
+ "model.layers.23.self_attn.q_proj": {
441
+ "act_bits": 16,
442
+ "act_data_type": "float",
443
+ "bits": 16,
444
+ "data_type": "float"
445
+ },
446
+ "model.layers.23.self_attn.v_proj": {
447
+ "act_bits": 16,
448
+ "act_data_type": "float",
449
+ "bits": 16,
450
+ "data_type": "float"
451
+ },
452
+ "model.layers.3.self_attn.k_proj": {
453
+ "act_bits": 16,
454
+ "act_data_type": "float",
455
+ "bits": 16,
456
+ "data_type": "float"
457
+ },
458
+ "model.layers.3.self_attn.o_proj": {
459
+ "act_bits": 16,
460
+ "act_data_type": "float",
461
+ "bits": 16,
462
+ "data_type": "float"
463
+ },
464
+ "model.layers.3.self_attn.q_proj": {
465
+ "act_bits": 16,
466
+ "act_data_type": "float",
467
+ "bits": 16,
468
+ "data_type": "float"
469
+ },
470
+ "model.layers.3.self_attn.v_proj": {
471
+ "act_bits": 16,
472
+ "act_data_type": "float",
473
+ "bits": 16,
474
+ "data_type": "float"
475
+ },
476
+ "model.layers.4.self_attn.k_proj": {
477
+ "act_bits": 16,
478
+ "act_data_type": "float",
479
+ "bits": 16,
480
+ "data_type": "float"
481
+ },
482
+ "model.layers.4.self_attn.o_proj": {
483
+ "act_bits": 16,
484
+ "act_data_type": "float",
485
+ "bits": 16,
486
+ "data_type": "float"
487
+ },
488
+ "model.layers.4.self_attn.q_proj": {
489
+ "act_bits": 16,
490
+ "act_data_type": "float",
491
+ "bits": 16,
492
+ "data_type": "float"
493
+ },
494
+ "model.layers.4.self_attn.v_proj": {
495
+ "act_bits": 16,
496
+ "act_data_type": "float",
497
+ "bits": 16,
498
+ "data_type": "float"
499
+ },
500
+ "model.layers.5.self_attn.k_proj": {
501
+ "act_bits": 16,
502
+ "act_data_type": "float",
503
+ "bits": 16,
504
+ "data_type": "float"
505
+ },
506
+ "model.layers.5.self_attn.o_proj": {
507
+ "act_bits": 16,
508
+ "act_data_type": "float",
509
+ "bits": 16,
510
+ "data_type": "float"
511
+ },
512
+ "model.layers.5.self_attn.q_proj": {
513
+ "act_bits": 16,
514
+ "act_data_type": "float",
515
+ "bits": 16,
516
+ "data_type": "float"
517
+ },
518
+ "model.layers.5.self_attn.v_proj": {
519
+ "act_bits": 16,
520
+ "act_data_type": "float",
521
+ "bits": 16,
522
+ "data_type": "float"
523
+ },
524
+ "model.layers.6.self_attn.k_proj": {
525
+ "act_bits": 16,
526
+ "act_data_type": "float",
527
+ "bits": 16,
528
+ "data_type": "float"
529
+ },
530
+ "model.layers.6.self_attn.o_proj": {
531
+ "act_bits": 16,
532
+ "act_data_type": "float",
533
+ "bits": 16,
534
+ "data_type": "float"
535
+ },
536
+ "model.layers.6.self_attn.q_proj": {
537
+ "act_bits": 16,
538
+ "act_data_type": "float",
539
+ "bits": 16,
540
+ "data_type": "float"
541
+ },
542
+ "model.layers.6.self_attn.v_proj": {
543
+ "act_bits": 16,
544
+ "act_data_type": "float",
545
+ "bits": 16,
546
+ "data_type": "float"
547
+ },
548
+ "model.layers.7.self_attn.k_proj": {
549
+ "act_bits": 16,
550
+ "act_data_type": "float",
551
+ "bits": 16,
552
+ "data_type": "float"
553
+ },
554
+ "model.layers.7.self_attn.o_proj": {
555
+ "act_bits": 16,
556
+ "act_data_type": "float",
557
+ "bits": 16,
558
+ "data_type": "float"
559
+ },
560
+ "model.layers.7.self_attn.q_proj": {
561
+ "act_bits": 16,
562
+ "act_data_type": "float",
563
+ "bits": 16,
564
+ "data_type": "float"
565
+ },
566
+ "model.layers.7.self_attn.v_proj": {
567
+ "act_bits": 16,
568
+ "act_data_type": "float",
569
+ "bits": 16,
570
+ "data_type": "float"
571
+ },
572
+ "model.layers.8.self_attn.k_proj": {
573
+ "act_bits": 16,
574
+ "act_data_type": "float",
575
+ "bits": 16,
576
+ "data_type": "float"
577
+ },
578
+ "model.layers.8.self_attn.o_proj": {
579
+ "act_bits": 16,
580
+ "act_data_type": "float",
581
+ "bits": 16,
582
+ "data_type": "float"
583
+ },
584
+ "model.layers.8.self_attn.q_proj": {
585
+ "act_bits": 16,
586
+ "act_data_type": "float",
587
+ "bits": 16,
588
+ "data_type": "float"
589
+ },
590
+ "model.layers.8.self_attn.v_proj": {
591
+ "act_bits": 16,
592
+ "act_data_type": "float",
593
+ "bits": 16,
594
+ "data_type": "float"
595
+ },
596
+ "model.layers.9.self_attn.k_proj": {
597
+ "act_bits": 16,
598
+ "act_data_type": "float",
599
+ "bits": 16,
600
+ "data_type": "float"
601
+ },
602
+ "model.layers.9.self_attn.o_proj": {
603
+ "act_bits": 16,
604
+ "act_data_type": "float",
605
+ "bits": 16,
606
+ "data_type": "float"
607
+ },
608
+ "model.layers.9.self_attn.q_proj": {
609
+ "act_bits": 16,
610
+ "act_data_type": "float",
611
+ "bits": 16,
612
+ "data_type": "float"
613
+ },
614
+ "model.layers.9.self_attn.v_proj": {
615
+ "act_bits": 16,
616
+ "act_data_type": "float",
617
+ "bits": 16,
618
+ "data_type": "float"
619
+ }
620
+ },
621
+ "group_size": 16,
622
+ "iters": 0,
623
+ "low_gpu_mem_usage": true,
624
+ "packing_format": "auto_round:llm_compressor",
625
+ "quant_method": "auto-round",
626
+ "sym": true
627
+ },
628
+ "rms_norm_eps": 1e-06,
629
+ "rope_parameters": {
630
+ "rope_theta": 5000000,
631
+ "rope_type": "default"
632
+ },
633
+ "tie_word_embeddings": false,
634
+ "transformers_version": "5.9.0",
635
+ "use_cache": true,
636
+ "vocab_size": 130560
637
+ }
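The `quantization_config` block that closes above records `"group_size": 16`, `"iters": 0` and `"sym": true`. With zero tuning iterations the weights are presumably rounded to nearest (the "RTN" in the model name). The following is a minimal sketch of symmetric round-to-nearest onto the FP4 (E2M1) grid, one scale per group of 16. It is not AutoRound's implementation: the function names are hypothetical, and the scale is kept as a plain Python float, whereas NVFP4, to my understanding, stores per-group scales in FP8 together with a per-tensor scale.

```python
# Magnitudes representable in FP4 (E2M1); the sign is stored separately.
FP4_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GROUP_SIZE = 16  # "group_size": 16 in the config


def quantize_group_rtn(values):
    """Round one group to the FP4 grid; return (scale, dequantized values).

    Simplification (assumption): the scale stays a Python float instead of
    being rounded to FP8 as a real NVFP4 kernel would do.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0, [0.0] * len(values)
    scale = amax / FP4_LEVELS[-1]  # map the largest magnitude onto 6.0
    dequantized = []
    for v in values:
        target = abs(v) / scale
        # Round to nearest: pick the grid level closest to the scaled value.
        nearest = min(FP4_LEVELS, key=lambda level: abs(level - target))
        dequantized.append((nearest if v >= 0 else -nearest) * scale)
    return scale, dequantized


def quantize_rtn(weights, group_size=GROUP_SIZE):
    """Quantize a flat list of weights group by group, with no tuning step."""
    result = []
    for start in range(0, len(weights), group_size):
        _, dequantized = quantize_group_rtn(weights[start:start + group_size])
        result.extend(dequantized)
    return result
```

Each group of 16 gets its own scale, so a single outlier only degrades the resolution of its own group rather than the whole tensor.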
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "do_sample": true,
+   "eos_token_id": [
+     1,
+     130073
+   ],
+   "pad_token_id": 1,
+   "temperature": 0.9,
+   "top_p": 0.95,
+   "transformers_version": "5.9.0"
+ }
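To illustrate what the sampling fields in `generation_config.json` mean, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is not the `transformers` implementation; the function names are hypothetical and the logits are made up for the example. Only `do_sample`, `temperature`, `top_p` and the two `eos_token_id` values come from the file.

```python
import math
import random

# Values copied from generation_config.json.
GENERATION_CONFIG = {"do_sample": True, "temperature": 0.9, "top_p": 0.95}
EOS_TOKEN_IDS = {1, 130073}


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize to probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_candidates(probs, top_p):
    """Return the smallest set of token ids whose cumulative mass >= top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(logits, config=GENERATION_CONFIG, rng=random):
    """Greedy pick when do_sample is false, otherwise sample from the nucleus."""
    if not config["do_sample"]:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, config["temperature"])
    kept = top_p_candidates(probs, config["top_p"])
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def is_end_of_sequence(token_id):
    """Generation stops when either listed EOS id is produced."""
    return token_id in EOS_TOKEN_IDS
```

A temperature below 1.0 sharpens the distribution slightly, and `top_p` of 0.95 discards only the least likely tail, so these defaults stay close to plain sampling.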
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a35ec15e47e75f775d396182823bc3d013c209a4b5ef74d4079cbb67aae2a898
+ size 1428753888
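The three lines committed for `model.safetensors` are a Git LFS pointer, not the weights themselves: a spec version, a SHA-256 object id, and the size in bytes. The sketch below parses that pointer text and converts the size to MiB. The function name `parse_lfs_pointer` is hypothetical, and the parser assumes the simple `key value` layout shown in the diff.

```python
# Pointer text as it appears in the diff for model.safetensors.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a35ec15e47e75f775d396182823bc3d013c209a4b5ef74d4079cbb67aae2a898
size 1428753888
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and return the fields in typed form."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size_bytes"] / 2**20  # bytes to MiB
print(f"{pointer['algorithm']} digest, {size_mib:.1f} MiB")
```

Rounded to the nearest whole number, the size agrees with the 1363 MB "Quantized Size" reported in the README table of this commit.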
quantization_config.json ADDED
@@ -0,0 +1,602 @@
1
+ {
2
+ "bits": 4,
3
+ "act_bits": 4,
4
+ "data_type": "nv_fp",
5
+ "act_data_type": "nv_fp4_with_static_gs",
6
+ "group_size": 16,
7
+ "act_group_size": 16,
8
+ "sym": true,
9
+ "act_sym": true,
10
+ "act_dynamic": true,
11
+ "enable_quanted_input": false,
12
+ "iters": 0,
13
+ "low_gpu_mem_usage": true,
14
+ "autoround_version": "0.13.0",
15
+ "block_name_to_quantize": "model.layers",
16
+ "quant_method": "auto-round",
17
+ "packing_format": "auto_round:llm_compressor",
18
+ "extra_config": {
19
+ "model.layers.0.self_attn.q_proj": {
20
+ "bits": 16,
21
+ "data_type": "float",
22
+ "act_bits": 16,
23
+ "act_data_type": "float"
24
+ },
25
+ "model.layers.0.self_attn.k_proj": {
26
+ "bits": 16,
27
+ "data_type": "float",
28
+ "act_bits": 16,
29
+ "act_data_type": "float"
30
+ },
31
+ "model.layers.0.self_attn.v_proj": {
32
+ "bits": 16,
33
+ "data_type": "float",
34
+ "act_bits": 16,
35
+ "act_data_type": "float"
36
+ },
37
+ "model.layers.0.self_attn.o_proj": {
38
+ "bits": 16,
39
+ "data_type": "float",
40
+ "act_bits": 16,
41
+ "act_data_type": "float"
42
+ },
43
+ "model.layers.1.self_attn.q_proj": {
44
+ "bits": 16,
45
+ "data_type": "float",
46
+ "act_bits": 16,
47
+ "act_data_type": "float"
48
+ },
49
+ "model.layers.1.self_attn.k_proj": {
50
+ "bits": 16,
51
+ "data_type": "float",
52
+ "act_bits": 16,
53
+ "act_data_type": "float"
54
+ },
55
+ "model.layers.1.self_attn.v_proj": {
56
+ "bits": 16,
57
+ "data_type": "float",
58
+ "act_bits": 16,
59
+ "act_data_type": "float"
60
+ },
61
+ "model.layers.1.self_attn.o_proj": {
62
+ "bits": 16,
63
+ "data_type": "float",
64
+ "act_bits": 16,
65
+ "act_data_type": "float"
66
+ },
67
+ "model.layers.2.self_attn.q_proj": {
68
+ "bits": 16,
69
+ "data_type": "float",
70
+ "act_bits": 16,
71
+ "act_data_type": "float"
72
+ },
73
+ "model.layers.2.self_attn.k_proj": {
74
+ "bits": 16,
75
+ "data_type": "float",
76
+ "act_bits": 16,
77
+ "act_data_type": "float"
78
+ },
79
+ "model.layers.2.self_attn.v_proj": {
80
+ "bits": 16,
81
+ "data_type": "float",
82
+ "act_bits": 16,
83
+ "act_data_type": "float"
84
+ },
85
+ "model.layers.2.self_attn.o_proj": {
86
+ "bits": 16,
87
+ "data_type": "float",
88
+ "act_bits": 16,
89
+ "act_data_type": "float"
90
+ },
91
+ "model.layers.3.self_attn.q_proj": {
92
+ "bits": 16,
93
+ "data_type": "float",
94
+ "act_bits": 16,
95
+ "act_data_type": "float"
96
+ },
97
+ "model.layers.3.self_attn.k_proj": {
98
+ "bits": 16,
99
+ "data_type": "float",
100
+ "act_bits": 16,
101
+ "act_data_type": "float"
102
+ },
103
+ "model.layers.3.self_attn.v_proj": {
104
+ "bits": 16,
105
+ "data_type": "float",
106
+ "act_bits": 16,
107
+ "act_data_type": "float"
108
+ },
109
+ "model.layers.3.self_attn.o_proj": {
110
+ "bits": 16,
111
+ "data_type": "float",
112
+ "act_bits": 16,
113
+ "act_data_type": "float"
114
+ },
115
+ "model.layers.4.self_attn.q_proj": {
116
+ "bits": 16,
117
+ "data_type": "float",
118
+ "act_bits": 16,
119
+ "act_data_type": "float"
120
+ },
121
+ "model.layers.4.self_attn.k_proj": {
122
+ "bits": 16,
123
+ "data_type": "float",
124
+ "act_bits": 16,
125
+ "act_data_type": "float"
126
+ },
127
+ "model.layers.4.self_attn.v_proj": {
128
+ "bits": 16,
129
+ "data_type": "float",
130
+ "act_bits": 16,
131
+ "act_data_type": "float"
132
+ },
133
+ "model.layers.4.self_attn.o_proj": {
134
+ "bits": 16,
135
+ "data_type": "float",
136
+ "act_bits": 16,
137
+ "act_data_type": "float"
138
+ },
139
+ "model.layers.5.self_attn.q_proj": {
140
+ "bits": 16,
141
+ "data_type": "float",
142
+ "act_bits": 16,
143
+ "act_data_type": "float"
144
+ },
145
+ "model.layers.5.self_attn.k_proj": {
146
+ "bits": 16,
147
+ "data_type": "float",
148
+ "act_bits": 16,
149
+ "act_data_type": "float"
150
+ },
151
+ "model.layers.5.self_attn.v_proj": {
152
+ "bits": 16,
153
+ "data_type": "float",
154
+ "act_bits": 16,
155
+ "act_data_type": "float"
156
+ },
157
+ "model.layers.5.self_attn.o_proj": {
158
+ "bits": 16,
159
+ "data_type": "float",
160
+ "act_bits": 16,
161
+ "act_data_type": "float"
162
+ },
163
+ "model.layers.6.self_attn.q_proj": {
164
+ "bits": 16,
165
+ "data_type": "float",
166
+ "act_bits": 16,
167
+ "act_data_type": "float"
168
+ },
169
+ "model.layers.6.self_attn.k_proj": {
170
+ "bits": 16,
171
+ "data_type": "float",
172
+ "act_bits": 16,
173
+ "act_data_type": "float"
174
+ },
175
+ "model.layers.6.self_attn.v_proj": {
176
+ "bits": 16,
177
+ "data_type": "float",
178
+ "act_bits": 16,
179
+ "act_data_type": "float"
180
+ },
181
+ "model.layers.6.self_attn.o_proj": {
182
+ "bits": 16,
183
+ "data_type": "float",
184
+ "act_bits": 16,
185
+ "act_data_type": "float"
186
+ },
187
+ "model.layers.7.self_attn.q_proj": {
188
+ "bits": 16,
189
+ "data_type": "float",
190
+ "act_bits": 16,
191
+ "act_data_type": "float"
192
+ },
193
+ "model.layers.7.self_attn.k_proj": {
194
+ "bits": 16,
195
+ "data_type": "float",
196
+ "act_bits": 16,
197
+ "act_data_type": "float"
198
+ },
199
+ "model.layers.7.self_attn.v_proj": {
200
+ "bits": 16,
201
+ "data_type": "float",
202
+ "act_bits": 16,
203
+ "act_data_type": "float"
204
+ },
205
+ "model.layers.7.self_attn.o_proj": {
206
+ "bits": 16,
207
+ "data_type": "float",
208
+ "act_bits": 16,
209
+ "act_data_type": "float"
210
+ },
211
+ "model.layers.8.self_attn.q_proj": {
212
+ "bits": 16,
213
+ "data_type": "float",
214
+ "act_bits": 16,
215
+ "act_data_type": "float"
216
+ },
217
+ "model.layers.8.self_attn.k_proj": {
218
+ "bits": 16,
219
+ "data_type": "float",
220
+ "act_bits": 16,
221
+ "act_data_type": "float"
222
+ },
223
+ "model.layers.8.self_attn.v_proj": {
224
+ "bits": 16,
225
+ "data_type": "float",
226
+ "act_bits": 16,
227
+ "act_data_type": "float"
228
+ },
229
+ "model.layers.8.self_attn.o_proj": {
230
+ "bits": 16,
231
+ "data_type": "float",
232
+ "act_bits": 16,
233
+ "act_data_type": "float"
234
+ },
235
+ "model.layers.9.self_attn.q_proj": {
236
+ "bits": 16,
237
+ "data_type": "float",
238
+ "act_bits": 16,
239
+ "act_data_type": "float"
240
+ },
241
+ "model.layers.9.self_attn.k_proj": {
242
+ "bits": 16,
243
+ "data_type": "float",
244
+ "act_bits": 16,
245
+ "act_data_type": "float"
246
+ },
247
+ "model.layers.9.self_attn.v_proj": {
248
+ "bits": 16,
249
+ "data_type": "float",
250
+ "act_bits": 16,
251
+ "act_data_type": "float"
252
+ },
253
+ "model.layers.9.self_attn.o_proj": {
254
+ "bits": 16,
255
+ "data_type": "float",
256
+ "act_bits": 16,
257
+ "act_data_type": "float"
258
+ },
259
+ "model.layers.10.self_attn.q_proj": {
260
+ "bits": 16,
261
+ "data_type": "float",
262
+ "act_bits": 16,
263
+ "act_data_type": "float"
264
+ },
265
+ "model.layers.10.self_attn.k_proj": {
266
+ "bits": 16,
267
+ "data_type": "float",
268
+ "act_bits": 16,
269
+ "act_data_type": "float"
270
+ },
271
+ "model.layers.10.self_attn.v_proj": {
272
+ "bits": 16,
273
+ "data_type": "float",
274
+ "act_bits": 16,
275
+ "act_data_type": "float"
276
+ },
277
+ "model.layers.10.self_attn.o_proj": {
278
+ "bits": 16,
279
+ "data_type": "float",
280
+ "act_bits": 16,
281
+ "act_data_type": "float"
282
+ },
283
+ "model.layers.11.self_attn.q_proj": {
284
+ "bits": 16,
285
+ "data_type": "float",
286
+ "act_bits": 16,
287
+ "act_data_type": "float"
288
+ },
289
+ "model.layers.11.self_attn.k_proj": {
290
+ "bits": 16,
291
+ "data_type": "float",
292
+ "act_bits": 16,
293
+ "act_data_type": "float"
294
+ },
295
+ "model.layers.11.self_attn.v_proj": {
296
+ "bits": 16,
297
+ "data_type": "float",
298
+ "act_bits": 16,
299
+ "act_data_type": "float"
300
+ },
301
+ "model.layers.11.self_attn.o_proj": {
302
+ "bits": 16,
303
+ "data_type": "float",
304
+ "act_bits": 16,
305
+ "act_data_type": "float"
306
+ },
307
+ "model.layers.12.self_attn.q_proj": {
308
+ "bits": 16,
309
+ "data_type": "float",
310
+ "act_bits": 16,
311
+ "act_data_type": "float"
312
+ },
313
+ "model.layers.12.self_attn.k_proj": {
314
+ "bits": 16,
315
+ "data_type": "float",
316
+ "act_bits": 16,
317
+ "act_data_type": "float"
318
+ },
319
+ "model.layers.12.self_attn.v_proj": {
320
+ "bits": 16,
321
+ "data_type": "float",
322
+ "act_bits": 16,
323
+ "act_data_type": "float"
324
+ },
325
+ "model.layers.12.self_attn.o_proj": {
326
+ "bits": 16,
327
+ "data_type": "float",
328
+ "act_bits": 16,
329
+ "act_data_type": "float"
330
+ },
331
+ "model.layers.13.self_attn.q_proj": {
332
+ "bits": 16,
333
+ "data_type": "float",
334
+ "act_bits": 16,
335
+ "act_data_type": "float"
336
+ },
337
+ "model.layers.13.self_attn.k_proj": {
338
+ "bits": 16,
339
+ "data_type": "float",
340
+ "act_bits": 16,
341
+ "act_data_type": "float"
342
+ },
343
+ "model.layers.13.self_attn.v_proj": {
344
+ "bits": 16,
345
+ "data_type": "float",
346
+ "act_bits": 16,
347
+ "act_data_type": "float"
348
+ },
349
+ "model.layers.13.self_attn.o_proj": {
350
+ "bits": 16,
351
+ "data_type": "float",
352
+ "act_bits": 16,
353
+ "act_data_type": "float"
354
+ },
355
+ "model.layers.14.self_attn.q_proj": {
356
+ "bits": 16,
357
+ "data_type": "float",
358
+ "act_bits": 16,
359
+ "act_data_type": "float"
360
+ },
361
+ "model.layers.14.self_attn.k_proj": {
362
+ "bits": 16,
363
+ "data_type": "float",
364
+ "act_bits": 16,
365
+ "act_data_type": "float"
366
+ },
367
+ "model.layers.14.self_attn.v_proj": {
368
+ "bits": 16,
369
+ "data_type": "float",
370
+ "act_bits": 16,
371
+ "act_data_type": "float"
372
+ },
373
+ "model.layers.14.self_attn.o_proj": {
374
+ "bits": 16,
375
+ "data_type": "float",
376
+ "act_bits": 16,
377
+ "act_data_type": "float"
378
+ },
379
+ "model.layers.15.self_attn.q_proj": {
380
+ "bits": 16,
381
+ "data_type": "float",
382
+ "act_bits": 16,
383
+ "act_data_type": "float"
384
+ },
385
+ "model.layers.15.self_attn.k_proj": {
386
+ "bits": 16,
387
+ "data_type": "float",
388
+ "act_bits": 16,
389
+ "act_data_type": "float"
390
+ },
391
+ "model.layers.15.self_attn.v_proj": {
392
+ "bits": 16,
393
+ "data_type": "float",
394
+ "act_bits": 16,
395
+ "act_data_type": "float"
396
+ },
397
+ "model.layers.15.self_attn.o_proj": {
398
+ "bits": 16,
399
+ "data_type": "float",
400
+ "act_bits": 16,
401
+ "act_data_type": "float"
402
+ },
403
+ "model.layers.16.self_attn.q_proj": {
404
+ "bits": 16,
405
+ "data_type": "float",
406
+ "act_bits": 16,
407
+ "act_data_type": "float"
408
+ },
409
+ "model.layers.16.self_attn.k_proj": {
410
+ "bits": 16,
411
+ "data_type": "float",
412
+ "act_bits": 16,
413
+ "act_data_type": "float"
414
+ },
415
+ "model.layers.16.self_attn.v_proj": {
416
+ "bits": 16,
417
+ "data_type": "float",
418
+ "act_bits": 16,
419
+ "act_data_type": "float"
420
+ },
421
+ "model.layers.16.self_attn.o_proj": {
422
+ "bits": 16,
423
+ "data_type": "float",
424
+ "act_bits": 16,
425
+ "act_data_type": "float"
426
+ },
427
+ "model.layers.17.self_attn.q_proj": {
428
+ "bits": 16,
429
+ "data_type": "float",
430
+ "act_bits": 16,
431
+ "act_data_type": "float"
432
+ },
433
+ "model.layers.17.self_attn.k_proj": {
434
+ "bits": 16,
435
+ "data_type": "float",
436
+ "act_bits": 16,
437
+ "act_data_type": "float"
438
+ },
439
+ "model.layers.17.self_attn.v_proj": {
440
+ "bits": 16,
441
+ "data_type": "float",
442
+ "act_bits": 16,
443
+ "act_data_type": "float"
444
+ },
445
+ "model.layers.17.self_attn.o_proj": {
446
+ "bits": 16,
447
+ "data_type": "float",
448
+ "act_bits": 16,
449
+ "act_data_type": "float"
450
+ },
451
+ "model.layers.18.self_attn.q_proj": {
452
+ "bits": 16,
453
+ "data_type": "float",
454
+ "act_bits": 16,
455
+ "act_data_type": "float"
456
+ },
457
+ "model.layers.18.self_attn.k_proj": {
458
+ "bits": 16,
459
+ "data_type": "float",
460
+ "act_bits": 16,
461
+ "act_data_type": "float"
462
+ },
463
+ "model.layers.18.self_attn.v_proj": {
464
+ "bits": 16,
465
+ "data_type": "float",
466
+ "act_bits": 16,
467
+ "act_data_type": "float"
468
+ },
469
+ "model.layers.18.self_attn.o_proj": {
470
+ "bits": 16,
471
+ "data_type": "float",
472
+ "act_bits": 16,
473
+ "act_data_type": "float"
474
+ },
475
+ "model.layers.19.self_attn.q_proj": {
476
+ "bits": 16,
477
+ "data_type": "float",
478
+ "act_bits": 16,
479
+ "act_data_type": "float"
480
+ },
481
+ "model.layers.19.self_attn.k_proj": {
482
+ "bits": 16,
483
+ "data_type": "float",
484
+ "act_bits": 16,
485
+ "act_data_type": "float"
486
+ },
487
+ "model.layers.19.self_attn.v_proj": {
488
+ "bits": 16,
489
+ "data_type": "float",
490
+ "act_bits": 16,
491
+ "act_data_type": "float"
492
+ },
493
+ "model.layers.19.self_attn.o_proj": {
494
+ "bits": 16,
495
+ "data_type": "float",
496
+ "act_bits": 16,
497
+ "act_data_type": "float"
498
+ },
499
+ "model.layers.20.self_attn.q_proj": {
500
+ "bits": 16,
501
+ "data_type": "float",
502
+ "act_bits": 16,
503
+ "act_data_type": "float"
504
+ },
505
+ "model.layers.20.self_attn.k_proj": {
506
+ "bits": 16,
507
+ "data_type": "float",
508
+ "act_bits": 16,
509
+ "act_data_type": "float"
510
+ },
511
+ "model.layers.20.self_attn.v_proj": {
512
+ "bits": 16,
513
+ "data_type": "float",
514
+ "act_bits": 16,
515
+ "act_data_type": "float"
516
+ },
517
+ "model.layers.20.self_attn.o_proj": {
518
+ "bits": 16,
519
+ "data_type": "float",
520
+ "act_bits": 16,
521
+ "act_data_type": "float"
522
+ },
523
+ "model.layers.21.self_attn.q_proj": {
524
+ "bits": 16,
525
+ "data_type": "float",
526
+ "act_bits": 16,
527
+ "act_data_type": "float"
528
+ },
529
+ "model.layers.21.self_attn.k_proj": {
530
+ "bits": 16,
531
+ "data_type": "float",
532
+ "act_bits": 16,
533
+ "act_data_type": "float"
534
+ },
535
+ "model.layers.21.self_attn.v_proj": {
536
+ "bits": 16,
537
+ "data_type": "float",
538
+ "act_bits": 16,
539
+ "act_data_type": "float"
540
+ },
541
+ "model.layers.21.self_attn.o_proj": {
542
+ "bits": 16,
543
+ "data_type": "float",
544
+ "act_bits": 16,
545
+ "act_data_type": "float"
546
+ },
547
+ "model.layers.22.self_attn.q_proj": {
548
+ "bits": 16,
549
+ "data_type": "float",
550
+ "act_bits": 16,
551
+ "act_data_type": "float"
552
+ },
553
+ "model.layers.22.self_attn.k_proj": {
554
+ "bits": 16,
555
+ "data_type": "float",
556
+ "act_bits": 16,
557
+ "act_data_type": "float"
558
+ },
559
+ "model.layers.22.self_attn.v_proj": {
560
+ "bits": 16,
561
+ "data_type": "float",
562
+ "act_bits": 16,
563
+ "act_data_type": "float"
564
+ },
565
+ "model.layers.22.self_attn.o_proj": {
566
+ "bits": 16,
567
+ "data_type": "float",
568
+ "act_bits": 16,
569
+ "act_data_type": "float"
570
+ },
571
+ "model.layers.23.self_attn.q_proj": {
572
+ "bits": 16,
573
+ "data_type": "float",
574
+ "act_bits": 16,
575
+ "act_data_type": "float"
576
+ },
577
+ "model.layers.23.self_attn.k_proj": {
578
+ "bits": 16,
579
+ "data_type": "float",
580
+ "act_bits": 16,
581
+ "act_data_type": "float"
582
+ },
583
+ "model.layers.23.self_attn.v_proj": {
584
+ "bits": 16,
585
+ "data_type": "float",
586
+ "act_bits": 16,
587
+ "act_data_type": "float"
588
+ },
589
+ "model.layers.23.self_attn.o_proj": {
590
+ "bits": 16,
591
+ "data_type": "float",
592
+ "act_bits": 16,
593
+ "act_data_type": "float"
594
+ },
595
+ ".*self_attn.*": {
596
+ "bits": 16,
597
+ "data_type": "float",
598
+ "act_bits": 16,
599
+ "act_data_type": "float"
600
+ }
601
+ }
602
+ }
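`quantization_config.json` sets a global 4-bit `nv_fp` scheme and then overrides every `self_attn` projection back to 16-bit `float`, both by exact module name and through the catch-all pattern `.*self_attn.*`. The sketch below shows one plausible way such a lookup could be resolved: exact name first, then regular-expression match, then the global default. This ordering is an assumption, not AutoRound's documented behavior; `resolve_layer_scheme` is a hypothetical helper, and the `mlp.down_proj` name in the usage lines is an assumed module name that does not appear in the diff.

```python
import re

# Global settings from the top of quantization_config.json.
GLOBAL_SCHEME = {
    "bits": 4,
    "data_type": "nv_fp",
    "act_bits": 4,
    "act_data_type": "nv_fp4_with_static_gs",
}
# Override used for every self_attn entry in "extra_config".
SIXTEEN_BIT_SCHEME = {
    "bits": 16,
    "data_type": "float",
    "act_bits": 16,
    "act_data_type": "float",
}
# Abbreviated: the real file lists q/k/v/o_proj for layers 0 through 23.
EXTRA_CONFIG = {
    "model.layers.0.self_attn.q_proj": SIXTEEN_BIT_SCHEME,
    ".*self_attn.*": SIXTEEN_BIT_SCHEME,
}


def resolve_layer_scheme(layer_name, extra_config=EXTRA_CONFIG,
                         default=GLOBAL_SCHEME):
    """Return the effective scheme for one module (assumed lookup order)."""
    if layer_name in extra_config:  # 1. exact module name
        return {**default, **extra_config[layer_name]}
    for pattern, override in extra_config.items():  # 2. regex entries
        if re.fullmatch(pattern, layer_name):
            return {**default, **override}
    return dict(default)  # 3. global default


attn = resolve_layer_scheme("model.layers.5.self_attn.k_proj")
mlp = resolve_layer_scheme("model.layers.5.mlp.down_proj")  # assumed name
print(attn["bits"], mlp["bits"])
```

Under this reading, the explicit per-layer entries and the regex entry are redundant with each other: the pattern alone already keeps all attention projections unquantized.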
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "add_prefix_space": null,
+   "backend": "tokenizers",
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "is_local": false,
+   "legacy": true,
+   "local_files_only": false,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "TokenizersBackend",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
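Two details in `tokenizer_config.json` are easy to misread. The `model_max_length` value is exactly `int(1e30)`, which is best treated as "no limit declared" rather than a real context length, and `pad_token` is the same string as `eos_token`. The sketch below checks both; the helper names and the fallback of 4096 tokens are hypothetical and only there to make the example concrete.

```python
# Selected fields copied from tokenizer_config.json.
TOKENIZER_CONFIG = {
    "bos_token": "<s>",
    "eos_token": "</s>",
    "pad_token": "</s>",
    "unk_token": "<unk>",
    "model_max_length": 1000000000000000019884624838656,
}

NO_LIMIT_SENTINEL = int(1e30)  # the float 1e30 converted to an exact integer
FALLBACK_MAX_LENGTH = 4096     # hypothetical; take the real limit from the model


def effective_max_length(config, fallback=FALLBACK_MAX_LENGTH):
    """Return a usable truncation length, ignoring the 'no limit' sentinel."""
    declared = config["model_max_length"]
    if declared >= NO_LIMIT_SENTINEL:
        return fallback
    return declared


def pad_shares_eos(config):
    """True when padding reuses the EOS token, as it does in this file."""
    return config["pad_token"] == config["eos_token"]


print(effective_max_length(TOKENIZER_CONFIG), pad_shares_eos(TOKENIZER_CONFIG))
```

When padding shares the EOS token, downstream code generally needs an attention mask to tell padding apart from a genuine end of sequence.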