Upload quantized model MiniCPM5-1B-AutoRound-NVFP4-RTN

Files changed (8)

README.md +179 -0
chat_template.jinja +179 -0
config.json +637 -0
generation_config.json +13 -0
model.safetensors +3 -0
quantization_config.json +602 -0
tokenizer.json +0 -0
tokenizer_config.json +17 -0

README.md ADDED

	@@ -0,0 +1,179 @@

+---
+base_model:
+- openbmb/MiniCPM5-1B
+pipeline_tag: text-generation
+tags:
+- quantized
+- nvfp4
+- autoround
+- low-bit-open-llm-leaderboard
+---
+# MiniCPM5-1B-AutoRound-NVFP4-RTN
+## Model Details
+This model is an NVFP4 (NVIDIA FP4) quantization of [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) generated by [AutoRound](https://github.com/intel/auto-round). Please follow the license of the original model.
+## Quantization Details
+| Attribute | Value |
+|-----------|-------|
+| Base Model | [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) |
+| Quantization Tool | [AutoRound](https://github.com/intel/auto-round) |
+| Quantization Scheme | NVFP4 |
+| Original Size | 1089 MB |
+| Quantized Size | 1363 MB |
+## Evaluation Results
+| Task | Accuracy |
+|------|----------|
+| hellaswag | 0.3691 |
+| mmlu | 0.4870 |
+| mmlu_abstract_algebra | 0.3000 |
+| mmlu_anatomy | 0.5407 |
+| mmlu_astronomy | 0.5395 |
+| mmlu_business_ethics | 0.4500 |
+| mmlu_clinical_knowledge | 0.5434 |
+| mmlu_college_biology | 0.5486 |
+| mmlu_college_chemistry | 0.3800 |
+| mmlu_college_computer_science | 0.4600 |
+| mmlu_college_mathematics | 0.3800 |
+| mmlu_college_medicine | 0.4855 |
+| mmlu_college_physics | 0.3235 |
+| mmlu_computer_security | 0.5700 |
+| mmlu_conceptual_physics | 0.4000 |
+| mmlu_econometrics | 0.2982 |
+| mmlu_electrical_engineering | 0.5517 |
+| mmlu_elementary_mathematics | 0.3519 |
+| mmlu_formal_logic | 0.3413 |
+| mmlu_global_facts | 0.2100 |
+| mmlu_high_school_biology | 0.5806 |
+| mmlu_high_school_chemistry | 0.4236 |
+| mmlu_high_school_computer_science | 0.4500 |
+| mmlu_high_school_european_history | 0.6061 |
+| mmlu_high_school_geography | 0.5859 |
+| mmlu_high_school_government_and_politics | 0.6321 |
+| mmlu_high_school_macroeconomics | 0.4692 |
+| mmlu_high_school_mathematics | 0.2926 |
+| mmlu_high_school_microeconomics | 0.5042 |
+| mmlu_high_school_physics | 0.2649 |
+| mmlu_high_school_psychology | 0.6624 |
+| mmlu_high_school_statistics | 0.3426 |
+| mmlu_high_school_us_history | 0.5588 |
+| mmlu_high_school_world_history | 0.6160 |
+| mmlu_human_aging | 0.4888 |
+| mmlu_human_sexuality | 0.6260 |
+| mmlu_humanities | 0.4389 |
+| mmlu_international_law | 0.7107 |
+| mmlu_jurisprudence | 0.5926 |
+| mmlu_logical_fallacies | 0.5828 |
+| mmlu_machine_learning | 0.3839 |
+| mmlu_management | 0.6408 |
+| mmlu_marketing | 0.7521 |
+| mmlu_medical_genetics | 0.6600 |
+| mmlu_miscellaneous | 0.6564 |
+| mmlu_moral_disputes | 0.5087 |
+| mmlu_moral_scenarios | 0.2380 |
+| mmlu_nutrition | 0.6307 |
+| mmlu_other | 0.5555 |
+| mmlu_philosophy | 0.5466 |
+| mmlu_prehistory | 0.5278 |
+| mmlu_professional_accounting | 0.3723 |
+| mmlu_professional_law | 0.3677 |
+| mmlu_professional_medicine | 0.4669 |
+| mmlu_professional_psychology | 0.4935 |
+| mmlu_public_relations | 0.5000 |
+| mmlu_security_studies | 0.5714 |
+| mmlu_social_sciences | 0.5583 |
+| mmlu_sociology | 0.6766 |
+| mmlu_stem | 0.4218 |
+| mmlu_us_foreign_policy | 0.6700 |
+| mmlu_virology | 0.4578 |
+| mmlu_world_religions | 0.7193 |
+| piqa | 0.6670 |
+## How to Use
+### HF Usage
+**Step 1: Install [AutoRound](https://github.com/intel/auto-round)**
+```bash
+pip install auto-round
+```
+**Step 2: Load and run the quantized model**
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "MiniCPM5-1B-AutoRound-NVFP4-RTN"
+# load the tokenizer and the model
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
+# prepare the model input
+prompt = "Write a quick sort algorithm."
+messages = [{"role": "user", "content": prompt}]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+# conduct text completion
+generated_ids = model.generate(**model_inputs, max_new_tokens=512)
+output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :].tolist()
+content = tokenizer.decode(output_ids, skip_special_tokens=True)
+print("content:", content)
+```
+### vLLM Usage
+```bash
+vllm serve MiniCPM5-1B-AutoRound-NVFP4-RTN \
+    --trust-remote-code \
+    --dtype bfloat16 \
+    --tensor-parallel-size 1
+```
+If you encounter any issues, feel free to open an issue on the [AutoRound GitHub repo](https://github.com/intel/auto-round/issues) or provide feedback on the [Low-Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard).
+## Ethical Considerations and Limitations
+The model can produce factually incorrect output and should not be relied on for factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, this model may generate lewd, biased, or otherwise offensive outputs.
+Therefore, before deploying any applications of the model, developers should perform safety testing.
+## Caveats and Recommendations
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
+Here are a couple of useful links to learn more about Intel's AI software:
+- [Intel Neural Compressor](https://github.com/intel/neural-compressor)
+- [AutoRound](https://github.com/intel/auto-round)
+## Disclaimer
+The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+## Cite
+```bibtex
+@article{cheng2023optimize,
+  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
+  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
+  journal={arXiv preprint arXiv:2309.05516},
+  year={2023}
+}
+```
+[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
+---
+*This model is part of the [Intel Low-Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard) initiative.*

chat_template.jinja ADDED

	@@ -0,0 +1,179 @@

+{{- bos_token }}{%- if tools %}
+    {%- set tool_definitions %}
+        {{- "# Tools\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+        {%- for tool in tools %}
+            {{- "\n" }}
+            {{- tool | tojson(ensure_ascii=False) }}
+        {%- endfor %}
+        {{- '\n</tools>\n\nTool usage guidelines:\n- You may call zero or more functions. If no function calls are needed, just answer normally and do not include any <function ... </function>.\n- When calling a function, return an XML object within <function ... </function> using:\n<function name="function-name"><param name="param-name">param-value</param></function>\n- param-value may be multi-line. If it contains <, & or newline characters, wrap it in a CDATA block: <param name="param-name"><![CDATA[...multi-line value...]]></param>' }}
+    {%- endset %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {%- if '<tool_def_sep>' in messages[0].content %}
+            {{- messages[0].content.replace('<tool_def_sep>', tool_definitions) }}
+        {%- else %}
+            {{- messages[0].content + '\n\n' + tool_definitions }}
+        {%- endif %}
+    {%- else %}
+        {{- tool_definitions.lstrip() }}
+    {%- endif %}
+    {{- '<|im_end|>\n' }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- set content_parts = content.split('<tool_sep>') %}
+            {%- set processed_content = content_parts[0] %}
+            {%- set tool_calls_count = message.tool_calls|length %}
+            {%- set tool_sep_count = content_parts|length - 1 %}
+            {%- set min_count = [tool_calls_count, tool_sep_count]|min %}
+            {%- for i in range(1, content_parts|length) %}
+                {%- set tool_index = i - 1 %}
+                {%- if tool_index < tool_calls_count %}
+                    {%- set tool_call = message.tool_calls[tool_index] %}
+                    {%- if tool_call.function %}
+                        {%- set tool_call = tool_call.function %}
+                    {%- endif %}
+                    {%- set single_tool_xml %}
+                        {{- '<function name="' ~ tool_call.name ~ '">' }}
+                        {%- if tool_call.arguments %}
+                            {%- set args_dict = tool_call.arguments %}
+                            {%- for param_name, param_value in args_dict.items() %}
+                                {{- '<param name="' ~ param_name ~ '">' }}
+                                {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
+                                    {{- '<![CDATA[' + param_value + ']]>' }}
+                                {%- else %}
+                                    {{- param_value }}
+                                {%- endif %}
+                                {{- '</param>' }}
+                            {%- endfor %}
+                        {%- endif %}
+                        {{- '</function>' }}
+                    {%- endset %}
+                    {%- set processed_content = processed_content + single_tool_xml + content_parts[i] %}
+                {%- else %}
+                    {%- set processed_content = processed_content + content_parts[i] %}
+                {%- endif %}
+            {%- endfor %}
+            {%- if tool_calls_count > tool_sep_count %}
+                {%- for remaining_index in range(tool_sep_count, tool_calls_count) %}
+                    {%- set tool_call = message.tool_calls[remaining_index] %}
+                    {%- if tool_call.function %}
+                        {%- set tool_call = tool_call.function %}
+                    {%- endif %}
+                    {%- set remaining_tool_xml %}
+                        {{- '<function name="' ~ tool_call.name ~ '">' }}
+                        {%- if tool_call.arguments %}
+                            {%- set args_dict = tool_call.arguments %}
+                            {%- for param_name, param_value in args_dict.items() %}
+                                {{- '<param name="' ~ param_name ~ '">' }}
+                                {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
+                                    {{- '<![CDATA[' + param_value + ']]>' }}
+                                {%- else %}
+                                    {{- param_value }}
+                                {%- endif %}
+                                {{- '</param>' }}
+                            {%- endfor %}
+                        {%- endif %}
+                        {{- '</function>' }}
+                    {%- endset %}
+                    {%- set processed_content = processed_content + remaining_tool_xml %}
+                {%- endfor %}
+            {%- endif %}
+            {%- set content = processed_content %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if reasoning_content %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls and not has_tool_sep %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<function name="' ~ tool_call.name ~ '">' }}
+                {%- if tool_call.arguments %}
+                    {%- set args_dict = tool_call.arguments %}
+                    {%- for param_name, param_value in args_dict.items() %}
+                        {{- '<param name="' ~ param_name ~ '">' }}
+                        {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
+                            {{- '<![CDATA[' + param_value + ']]>' }}
+                        {%- else %}
+                            {{- param_value }}
+                        {%- endif %}
+                        {{- '</param>' }}
+                    {%- endfor %}
+                {%- endif %}
+                {{- '</function>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {%- if message.content is string %}
+            {{- content }}
+        {%- else %}
+            {{- message.content | tojson(ensure_ascii=False) }}
+        {%- endif %}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined %}
+        {%- if enable_thinking is false %}
+            {{- '<think>\n\n</think>\n\n' }}
+        {%- elif enable_thinking is true %}
+            {{- '<think>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endif %}
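The last branch of the template decides how the assistant turn is opened. As a rough illustration only, the plain-Python sketch below mirrors that final `add_generation_prompt` branch. `generation_prompt` is a hypothetical helper written for this note (it is not part of Transformers or of this repository), and `None` stands in for `enable_thinking` not being defined.

```python
from typing import Optional


def generation_prompt(enable_thinking: Optional[bool] = None) -> str:
    """Mirror the template's final branch (hypothetical helper)."""
    prompt = "<|im_start|>assistant\n"
    if enable_thinking is False:
        # Thinking disabled: pre-fill an empty reasoning block.
        prompt += "<think>\n\n</think>\n\n"
    elif enable_thinking is True:
        # Thinking enabled: open the reasoning block for the model to fill.
        prompt += "<think>\n"
    # None (variable not passed): no <think> tag is emitted at all.
    return prompt


for flag in (None, True, False):
    print(repr(flag), "->", repr(generation_prompt(flag)))
```

With Transformers, extra keyword arguments given to `tokenizer.apply_chat_template(...)` are forwarded to the template, so passing `enable_thinking=False` there should select the pre-filled empty `<think>` block.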

config.json ADDED

	@@ -0,0 +1,637 @@

+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dtype": "bfloat16",
+  "eos_token_id": [
+    1,
+    130073
+  ],
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 1536,
+  "initializer_range": 0.02,
+  "intermediate_size": 4608,
+  "max_position_embeddings": 131072,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "num_key_value_heads": 2,
+  "pad_token_id": 1,
+  "pretraining_tp": 1,
+  "quantization_config": {
+    "act_bits": 4,
+    "act_data_type": "nv_fp4_with_static_gs",
+    "act_dynamic": true,
+    "act_group_size": 16,
+    "act_sym": true,
+    "autoround_version": "0.13.0",
+    "bits": 4,
+    "block_name_to_quantize": "model.layers",
+    "data_type": "nv_fp",
+    "enable_quanted_input": false,
+    "extra_config": {
+      ".*self_attn.*": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.0.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.0.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.0.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.0.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.1.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.1.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.1.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.1.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.10.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.10.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.10.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.10.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.11.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.11.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.11.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.11.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.12.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.12.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.12.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.12.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.13.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.13.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.13.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.13.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.14.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.14.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.14.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.14.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.15.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.15.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.15.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.15.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.16.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.16.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.16.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.16.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.17.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.17.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.17.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.17.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.18.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.18.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.18.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.18.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.19.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.19.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.19.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.19.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.2.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.2.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.2.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.2.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.20.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.20.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.20.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.20.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.21.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.21.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.21.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.21.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.22.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.22.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.22.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.22.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.23.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.23.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.23.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.23.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.3.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.3.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.3.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.3.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.4.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.4.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.4.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.4.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.5.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.5.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.5.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.5.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.6.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.6.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.6.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.6.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.7.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.7.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.7.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.7.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.8.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.8.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.8.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.8.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.9.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.9.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.9.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.9.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      }
+    },
+    "group_size": 16,
+    "iters": 0,
+    "low_gpu_mem_usage": true,
+    "packing_format": "auto_round:llm_compressor",
+    "quant_method": "auto-round",
+    "sym": true
+  },
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 5000000,
+    "rope_type": "default"
+  },
+  "tie_word_embeddings": false,
+  "transformers_version": "5.9.0",
+  "use_cache": true,
+  "vocab_size": 130560
+}
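
The `group_size` of 16 above means each block of 16 quantized weights shares one scale, so the real storage cost is a little over 4 bits per weight. The sketch below works through that arithmetic. The 8-bit scale width is an assumption based on the usual NVFP4 layout (one FP8 scale per group), not something this config states, and per-tensor scales and the 16-bit attention layers are ignored.

```python
# Rough storage cost per quantized weight, ignoring per-tensor scales and metadata.
WEIGHT_BITS = 4   # 4-bit NVFP4 weights
GROUP_SIZE = 16   # "group_size" in the quantization config
SCALE_BITS = 8    # assumption: one FP8 scale per group (not stated in the config)

def effective_bits_per_weight(weight_bits, group_size, scale_bits):
    """Weight bits plus the shared group scale amortized over the group."""
    return weight_bits + scale_bits / group_size

bits = effective_bits_per_weight(WEIGHT_BITS, GROUP_SIZE, SCALE_BITS)
print(bits)
```

Layers listed under `extra_config` stay at 16 bits, so the average over the whole checkpoint is higher than this per-layer figure.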

generation_config.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "do_sample": true,
+  "eos_token_id": [
+    1,
+    130073
+  ],
+  "pad_token_id": 1,
+  "temperature": 0.9,
+  "top_p": 0.95,
+  "transformers_version": "5.9.0"
+}
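
Each added line in this diff carries a leading `+`, so the JSON cannot be parsed as shown. A minimal sketch of stripping the prefix and reading the stop tokens follows. The helper name `parse_added_json` is made up for illustration, and treating `eos_token_id` as "stop on any of these ids" is the usual reading of a list value, stated here as an assumption.

```python
import json

DIFF_TEXT = """\
+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "do_sample": true,
+  "eos_token_id": [
+    1,
+    130073
+  ],
+  "pad_token_id": 1,
+  "temperature": 0.9,
+  "top_p": 0.95,
+  "transformers_version": "5.9.0"
+}
"""

def parse_added_json(diff_text):
    """Drop the leading '+' from added diff lines and parse the result as JSON."""
    added = [line[1:] for line in diff_text.splitlines() if line.startswith("+")]
    return json.loads("\n".join(added))

gen_cfg = parse_added_json(DIFF_TEXT)

# eos_token_id may be a single int or a list; normalize to a set of stop ids.
eos = gen_cfg["eos_token_id"]
stop_ids = set(eos) if isinstance(eos, list) else {eos}
print(sorted(stop_ids), gen_cfg["pad_token_id"] in stop_ids)
```

Note that `pad_token_id` reuses one of the end-of-sequence ids, which matters when masking padding during batched generation.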

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a35ec15e47e75f775d396182823bc3d013c209a4b5ef74d4079cbb67aae2a898
+size 1428753888
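
The three lines above are a Git LFS pointer, not the weights themselves: a spec version, the SHA-256 of the real file, and its size in bytes. A small sketch of reading the pointer and converting the size is below. `parse_lfs_pointer` is an illustrative helper, not part of any Git LFS tooling.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a35ec15e47e75f775d396182823bc3d013c209a4b5ef74d4079cbb67aae2a898
size 1428753888
"""

def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = parse_lfs_pointer(POINTER)
algo, _, digest = pointer["oid"].partition(":")
size_bytes = int(pointer["size"])
size_mib = size_bytes / 2**20  # bytes -> MiB

print(algo, len(digest), f"{size_mib:.1f} MiB")
```

The result rounds to 1363 MiB, which lines up with the "Quantized Size" row in the README if that figure is read as MiB rather than decimal megabytes.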

quantization_config.json ADDED

	@@ -0,0 +1,602 @@

+{
+  "bits": 4,
+  "act_bits": 4,
+  "data_type": "nv_fp",
+  "act_data_type": "nv_fp4_with_static_gs",
+  "group_size": 16,
+  "act_group_size": 16,
+  "sym": true,
+  "act_sym": true,
+  "act_dynamic": true,
+  "enable_quanted_input": false,
+  "iters": 0,
+  "low_gpu_mem_usage": true,
+  "autoround_version": "0.13.0",
+  "block_name_to_quantize": "model.layers",
+  "quant_method": "auto-round",
+  "packing_format": "auto_round:llm_compressor",
+  "extra_config": {
+    "model.layers.0.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.0.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.0.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.0.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.1.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.1.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.1.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.1.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.2.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.2.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.2.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.2.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.3.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.3.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.3.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.3.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.4.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.4.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.4.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.4.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.5.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.5.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.5.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.5.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.6.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.6.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.6.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.6.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.7.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.7.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.7.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.7.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.8.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.8.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.8.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.8.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.9.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.9.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.9.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.9.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.10.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.10.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.10.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.10.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.11.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.11.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.11.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.11.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.12.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.12.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.12.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.12.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.13.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.13.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.13.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.13.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.14.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.14.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.14.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.14.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.15.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.15.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.15.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.15.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.16.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.16.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.16.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.16.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.17.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.17.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.17.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.17.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.18.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.18.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.18.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.18.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.19.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.19.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.19.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.19.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.20.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.20.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.20.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.20.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.21.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.21.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.21.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.21.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.22.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.22.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.22.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.22.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.23.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.23.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.23.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.23.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    ".*self_attn.*": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    }
+  }
+}
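
The config above sets a global 4-bit NVFP4 scheme and then overrides every `self_attn` projection back to 16-bit float, both by exact layer name and through the `.*self_attn.*` pattern. The sketch below shows one plausible way such overrides could be resolved for a given layer. The lookup order (exact name, then regex, then global default) and the function `resolve_scheme` are assumptions for illustration, not AutoRound's actual implementation, and the MLP layer name in the usage lines is hypothetical.

```python
import re

# Global defaults, copied from the top-level keys of the config.
GLOBAL_SCHEME = {
    "bits": 4, "data_type": "nv_fp",
    "act_bits": 4, "act_data_type": "nv_fp4_with_static_gs",
}
SIXTEEN_BIT_FLOAT = {
    "bits": 16, "data_type": "float",
    "act_bits": 16, "act_data_type": "float",
}
# A two-entry excerpt of "extra_config": one exact name and the regex entry.
EXTRA_CONFIG = {
    "model.layers.0.self_attn.q_proj": SIXTEEN_BIT_FLOAT,
    ".*self_attn.*": SIXTEEN_BIT_FLOAT,
}

def resolve_scheme(layer_name):
    """Assumed lookup order: exact name, then regex match, then global default."""
    if layer_name in EXTRA_CONFIG:
        return EXTRA_CONFIG[layer_name]
    for pattern, override in EXTRA_CONFIG.items():
        if re.fullmatch(pattern, layer_name):
            return override
    return GLOBAL_SCHEME

attn = resolve_scheme("model.layers.3.self_attn.k_proj")
mlp = resolve_scheme("model.layers.3.mlp.down_proj")  # hypothetical layer name
print(attn["bits"], mlp["bits"])
```

Under this reading the 96 explicit entries (24 layers times 4 projections) and the regex select the same layers, so only the non-attention linear layers inside `model.layers` are actually quantized to 4 bits.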

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,17 @@

+{
+  "add_prefix_space": null,
+  "backend": "tokenizers",
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "is_local": false,
+  "legacy": true,
+  "local_files_only": false,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
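
The `model_max_length` value above is not a real context limit: it equals `int(1e30)`, the placeholder Transformers writes when no maximum length was set for the tokenizer. A minimal sketch of detecting that placeholder follows. The helper `usable_max_length` and the fallback value in the usage lines are hypothetical, and the real limit would have to come from elsewhere (for example the model config).

```python
# int(1e30) is not exactly 10**30 because 1e30 is first rounded to a double.
UNSET_MAX_LENGTH = int(1e30)

def usable_max_length(model_max_length, fallback):
    """Return the tokenizer limit, or `fallback` when the placeholder is present."""
    if model_max_length >= UNSET_MAX_LENGTH:
        return fallback
    return model_max_length

# The fallback of 4096 is an arbitrary illustrative number, not this model's limit.
limit = usable_max_length(1000000000000000019884624838656, fallback=4096)
print(limit)
```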