Upload quantized model TinyMoE-100m-2x8-ultrachat-AutoRound-NVFP4-Tuning

Files changed (8)

README.md +178 -0
chat_template.jinja +54 -0
config.json +312 -0
generation_config.json +13 -0
model.safetensors +3 -0
quantization_config.json +277 -0
tokenizer.json +0 -0
tokenizer_config.json +21 -0

README.md ADDED

	@@ -0,0 +1,178 @@

+---
+base_model:
+- FlameF0X/TinyMoE-100m-2x8-ultrachat
+pipeline_tag: text-generation
+tags:
+- quantized
+- nvfp4
+- tuning
+- low-bit-open-llm-leaderboard
+---
+# TinyMoE-100m-2x8-ultrachat-AutoRound-NVFP4-Tuning
+## Model Details
+This model is an NVFP4 (NVIDIA FP4) quantization of [FlameF0X/TinyMoE-100m-2x8-ultrachat](https://huggingface.co/FlameF0X/TinyMoE-100m-2x8-ultrachat) generated by TUNING. Please follow the license of the original model.
+## Quantization Details
+| Attribute | Value |
+|-----------|-------|
+| Base Model | [FlameF0X/TinyMoE-100m-2x8-ultrachat](https://huggingface.co/FlameF0X/TinyMoE-100m-2x8-ultrachat) |
+| Quantization Tool | TUNING |
+| Quantization Scheme | NVFP4 |
+| Quantized Size | 94 MiB (98,124,864 bytes) |
+## Evaluation Results
+| Task | Accuracy |
+|------|----------|
+| hellaswag | 0.2568 |
+| mmlu | 0.2295 |
+| mmlu_abstract_algebra | 0.2200 |
+| mmlu_anatomy | 0.1852 |
+| mmlu_astronomy | 0.1776 |
+| mmlu_business_ethics | 0.3000 |
+| mmlu_clinical_knowledge | 0.2151 |
+| mmlu_college_biology | 0.2569 |
+| mmlu_college_chemistry | 0.1800 |
+| mmlu_college_computer_science | 0.2600 |
+| mmlu_college_mathematics | 0.2100 |
+| mmlu_college_medicine | 0.2081 |
+| mmlu_college_physics | 0.2157 |
+| mmlu_computer_security | 0.2800 |
+| mmlu_conceptual_physics | 0.2638 |
+| mmlu_econometrics | 0.2368 |
+| mmlu_electrical_engineering | 0.2414 |
+| mmlu_elementary_mathematics | 0.2116 |
+| mmlu_formal_logic | 0.2778 |
+| mmlu_global_facts | 0.1800 |
+| mmlu_high_school_biology | 0.1774 |
+| mmlu_high_school_chemistry | 0.1527 |
+| mmlu_high_school_computer_science | 0.2500 |
+| mmlu_high_school_european_history | 0.2182 |
+| mmlu_high_school_geography | 0.1768 |
+| mmlu_high_school_government_and_politics | 0.1969 |
+| mmlu_high_school_macroeconomics | 0.2026 |
+| mmlu_high_school_mathematics | 0.2111 |
+| mmlu_high_school_microeconomics | 0.2101 |
+| mmlu_high_school_physics | 0.1987 |
+| mmlu_high_school_psychology | 0.1927 |
+| mmlu_high_school_statistics | 0.1528 |
+| mmlu_high_school_us_history | 0.2500 |
+| mmlu_high_school_world_history | 0.2700 |
+| mmlu_human_aging | 0.3184 |
+| mmlu_human_sexuality | 0.2595 |
+| mmlu_humanities | 0.2419 |
+| mmlu_international_law | 0.2397 |
+| mmlu_jurisprudence | 0.2593 |
+| mmlu_logical_fallacies | 0.2209 |
+| mmlu_machine_learning | 0.3125 |
+| mmlu_management | 0.1748 |
+| mmlu_marketing | 0.2906 |
+| mmlu_medical_genetics | 0.3000 |
+| mmlu_miscellaneous | 0.2375 |
+| mmlu_moral_disputes | 0.2486 |
+| mmlu_moral_scenarios | 0.2380 |
+| mmlu_nutrition | 0.2288 |
+| mmlu_other | 0.2404 |
+| mmlu_philosophy | 0.1865 |
+| mmlu_prehistory | 0.2160 |
+| mmlu_professional_accounting | 0.2340 |
+| mmlu_professional_law | 0.2458 |
+| mmlu_professional_medicine | 0.1838 |
+| mmlu_professional_psychology | 0.2500 |
+| mmlu_public_relations | 0.2182 |
+| mmlu_security_studies | 0.1878 |
+| mmlu_social_sciences | 0.2171 |
+| mmlu_sociology | 0.2438 |
+| mmlu_stem | 0.2122 |
+| mmlu_us_foreign_policy | 0.2800 |
+| mmlu_virology | 0.2831 |
+| mmlu_world_religions | 0.3216 |
+| piqa | 0.5256 |
+## How to Use
+### HF Usage
+**Step 1: Install [AutoRound](https://github.com/intel/auto-round)**
+```bash
+pip install auto-round
+```
+**Step 2: Load and run the quantized model**
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "TinyMoE-100m-2x8-ultrachat-AutoRound-NVFP4-Tuning"  # prepend the Hub namespace, or use a local path
+# load the tokenizer and the model
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
+# prepare the model input
+prompt = "Write a quick sort algorithm."
+messages = [{"role": "user", "content": prompt}]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+# conduct text completion
+generated_ids = model.generate(**model_inputs, max_new_tokens=512)
+output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :].tolist()
+content = tokenizer.decode(output_ids, skip_special_tokens=True)
+print("content:", content)
+```
+### vLLM Usage
+```bash
+vllm serve TinyMoE-100m-2x8-ultrachat-AutoRound-NVFP4-Tuning \
+    --trust-remote-code \
+    --dtype bfloat16 \
+    --tensor_parallel_size 1
+```
+If you encounter any issues, feel free to open an issue on the [AutoRound GitHub repo](https://github.com/intel/auto-round/issues) or provide feedback on the [Low-Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard).
+## Ethical Considerations and Limitations
+The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
+Therefore, before deploying any applications of the model, developers should perform safety testing.
+## Caveats and Recommendations
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
+Here are a couple of useful links to learn more about Intel's AI software:
+- [Intel Neural Compressor](https://github.com/intel/neural-compressor)
+- [AutoRound](https://github.com/intel/auto-round)
+## Disclaimer
+The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+## Cite
+```bibtex
+@article{cheng2023optimize,
+  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
+  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
+  journal={arXiv preprint arXiv:2309.05516},
+  year={2023}
+}
+```
+[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
+---
+*This model is part of the [Intel Low-Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard) initiative.*
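
To put the README's evaluation table in context, the headline scores can be set against random-guess baselines. The sketch below assumes the standard task formats (four answer options for HellaSwag and MMLU, two for PIQA); the `CHANCE` table and the helper name are illustrative and not part of the upload.

```python
# Scores copied from the README's evaluation table.
REPORTED = {"hellaswag": 0.2568, "mmlu": 0.2295, "piqa": 0.5256}

# Assumed random-guess accuracy: 1 / number of answer choices per task.
CHANCE = {"hellaswag": 1 / 4, "mmlu": 1 / 4, "piqa": 1 / 2}


def margin_over_chance(task: str) -> float:
    """Return reported accuracy minus the assumed random-guess baseline."""
    return REPORTED[task] - CHANCE[task]


for task in REPORTED:
    print(f"{task}: {margin_over_chance(task):+.4f} vs. chance")
```

Under these assumptions all three headline scores sit within about three percentage points of chance, so differences between quantization recipes on this model are probably best read with caution.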

chat_template.jinja ADDED

	@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
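
As a reading aid for the template above, here is a minimal Python sketch of its no-tools path only (plain `system`, `user` and `assistant` messages). The function name `render_chat` is hypothetical; tool definitions, tool calls and `tool` messages are deliberately left out, and the real rendering is done by `tokenizer.apply_chat_template`.

```python
# Default system prompt, copied verbatim from the template.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chat(messages, add_generation_prompt=True):
    """Approximate the template's output when no tools are supplied."""
    parts = []
    has_system = bool(messages) and messages[0]["role"] == "system"
    system_text = messages[0]["content"] if has_system else DEFAULT_SYSTEM
    parts.append(f"<|im_start|>system\n{system_text}<|im_end|>\n")

    for index, message in enumerate(messages):
        # A leading system message was already emitted above.
        if index == 0 and has_system:
            continue
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")

    if add_generation_prompt:
        # Open the assistant turn so generation continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chat([{"role": "user", "content": "Write a quick sort algorithm."}]))
```

Note that the template injects the Qwen default system prompt whenever the conversation does not start with its own `system` message.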

config.json ADDED

	@@ -0,0 +1,312 @@

+{
+  "architectures": [
+    "MixtralForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "dtype": "bfloat16",
+  "eos_token_id": 32002,
+  "head_dim": null,
+  "hidden_act": "silu",
+  "hidden_size": 384,
+  "initializer_range": 0.02,
+  "intermediate_size": 768,
+  "max_position_embeddings": 1024,
+  "model_type": "mixtral",
+  "num_attention_heads": 8,
+  "num_experts_per_tok": 2,
+  "num_hidden_layers": 10,
+  "num_key_value_heads": 4,
+  "num_local_experts": 8,
+  "output_router_logits": false,
+  "pad_token_id": 2,
+  "quantization_config": {
+    "act_bits": 4,
+    "act_data_type": "nv_fp4_with_static_gs",
+    "act_dynamic": true,
+    "act_group_size": 16,
+    "act_sym": true,
+    "autoround_version": "0.13.1",
+    "bits": 4,
+    "block_name_to_quantize": "model.layers",
+    "data_type": "nv_fp",
+    "extra_config": {
+      ".*mlp\\.gate.*": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      ".*model\\.layers\\.[0-9]\\.mlp\\.gate.*": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      ".*self_attn.*": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.0.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.0.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.0.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.0.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.1.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.1.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.1.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.1.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.2.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.2.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.2.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.2.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.3.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.3.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.3.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.3.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.4.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.4.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.4.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.4.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.5.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.5.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.5.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.5.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.6.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.6.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.6.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.6.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.7.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.7.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.7.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.7.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.8.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.8.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.8.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.8.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.9.self_attn.k_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.9.self_attn.o_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.9.self_attn.q_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      },
+      "model.layers.9.self_attn.v_proj": {
+        "act_bits": 16,
+        "act_data_type": "float",
+        "bits": 16,
+        "data_type": "float"
+      }
+    },
+    "group_size": 16,
+    "low_gpu_mem_usage": true,
+    "packing_format": "auto_round:llm_compressor",
+    "quant_method": "auto-round",
+    "seqlen": 1024,
+    "sym": true
+  },
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 1000000.0,
+    "rope_type": "default"
+  },
+  "router_aux_loss_coef": 0.001,
+  "router_jitter_noise": 0.0,
+  "sliding_window": 1024,
+  "tie_word_embeddings": false,
+  "transformers_version": "5.12.1",
+  "use_cache": false,
+  "vocab_size": 32064
+}
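
The architecture fields above are enough for a back-of-the-envelope parameter count. The sketch below assumes the usual Mixtral layout (bias-free linear layers, three weight matrices per expert, two RMSNorm weights per layer plus a final norm) and, because `head_dim` is `null`, a head dimension of `hidden_size / num_attention_heads`; none of this is verified against the checkpoint.

```python
# Values copied from config.json.
hidden_size = 384
intermediate_size = 768
num_hidden_layers = 10
num_attention_heads = 8
num_key_value_heads = 4
num_local_experts = 8
vocab_size = 32064

# Assumption: head_dim falls back to hidden_size / num_attention_heads.
head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim

# q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
router = hidden_size * num_local_experts
# Assumption: three projection matrices per expert.
experts = num_local_experts * 3 * hidden_size * intermediate_size
norms = 2 * hidden_size
per_layer = attention + router + experts + norms

# tie_word_embeddings is false, so input embeddings and LM head are separate.
embeddings = 2 * vocab_size * hidden_size
total = embeddings + num_hidden_layers * per_layer + hidden_size  # + final norm

print(f"{total:,} parameters")
```

Under these assumptions the total is 99,866,496, which is consistent with the "100m" in the model name; about 71% of that sits in the expert matrices.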

generation_config.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "do_sample": true,
+  "eos_token_id": [
+    32002
+  ],
+  "output_attentions": false,
+  "output_hidden_states": false,
+  "pad_token_id": 2,
+  "transformers_version": "5.12.1",
+  "use_cache": false
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:481651ce6bc3298ecef69d379b72d6d859426615cef0475a4b9328cc36650c79
+size 98124864
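
The pointer above records the checkpoint size in bytes, which makes it possible to sanity-check the README's size figure and the quantization layout. The sketch below parses the pointer and compares the size with a rough estimate; it assumes that only the expert matrices are stored in NVFP4 (4-bit values plus one 8-bit scale per group of 16, i.e. 4.5 bits per weight) and that everything else stays in bfloat16. The parameter split follows the usual Mixtral layout and is an assumption, not something read from the file.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:481651ce6bc3298ecef69d379b72d6d859426615cef0475a4b9328cc36650c79
size 98124864
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())


size_bytes = int(parse_lfs_pointer(POINTER)["size"])
size_mb = size_bytes / 1e6      # decimal megabytes
size_mib = size_bytes / 2**20   # binary mebibytes

# Architecture values from config.json; kv_dim = 4 KV heads x assumed head_dim of 48.
hidden, inter, layers, experts, vocab, kv_dim = 384, 768, 10, 8, 32064, 192
expert_params = layers * experts * 3 * hidden * inter
other_params = (
    2 * vocab * hidden                      # embeddings + LM head
    + layers * (2 * hidden * hidden         # q_proj, o_proj
                + 2 * hidden * kv_dim       # k_proj, v_proj
                + hidden * experts          # router gate
                + 2 * hidden)               # two norms per layer
    + hidden                                # final norm
)

# Assumed storage: 4.5 bits per NVFP4 weight, 2 bytes per bfloat16 weight.
estimate_bytes = int(expert_params * 4.5 / 8 + other_params * 2)

print(f"actual:   {size_bytes:,} bytes ({size_mb:.2f} MB, {size_mib:.2f} MiB)")
print(f"estimate: {estimate_bytes:,} bytes")
```

The file is about 98.12 MB or 93.58 MiB, so the README's rounded figure of 94 matches the binary unit. The estimate falls within roughly 0.2% of the actual size; the remainder is plausibly file metadata and per-tensor scales, though that is not verified here.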

quantization_config.json ADDED

	@@ -0,0 +1,277 @@

+{
+  "bits": 4,
+  "act_bits": 4,
+  "data_type": "nv_fp",
+  "act_data_type": "nv_fp4_with_static_gs",
+  "group_size": 16,
+  "act_group_size": 16,
+  "sym": true,
+  "act_sym": true,
+  "act_dynamic": true,
+  "low_gpu_mem_usage": true,
+  "seqlen": 1024,
+  "autoround_version": "0.13.1",
+  "block_name_to_quantize": "model.layers",
+  "quant_method": "auto-round",
+  "packing_format": "auto_round:llm_compressor",
+  "extra_config": {
+    "model.layers.0.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.0.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.0.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.0.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.1.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.1.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.1.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.1.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.2.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.2.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.2.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.2.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.3.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.3.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.3.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.3.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.4.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.4.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.4.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.4.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.5.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.5.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.5.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.5.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.6.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.6.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.6.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.6.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.7.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.7.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.7.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.7.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.8.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.8.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.8.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.8.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.9.self_attn.q_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.9.self_attn.k_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.9.self_attn.v_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    "model.layers.9.self_attn.o_proj": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    ".*mlp\\.gate.*": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    ".*self_attn.*": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    },
+    ".*model\\.layers\\.[0-9]\\.mlp\\.gate.*": {
+      "bits": 16,
+      "data_type": "float",
+      "act_bits": 16,
+      "act_data_type": "float"
+    }
+  }
+}
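
The `extra_config` block mixes 40 explicit layer names with three regex-style keys. The sketch below shows how such keys could decide which modules stay in 16-bit, assuming full-string regex matching; AutoRound's actual matching rule may differ, the helper name `stays_16bit` is hypothetical, and the expert module name in the example is illustrative rather than taken from the checkpoint.

```python
import re

# Regex-style keys copied from extra_config.
PATTERNS = [
    r".*mlp\.gate.*",
    r".*self_attn.*",
    r".*model\.layers\.[0-9]\.mlp\.gate.*",
]


def stays_16bit(module_name: str) -> bool:
    """Return True if any pattern matches the full module name (assumed rule)."""
    return any(re.fullmatch(pattern, module_name) for pattern in PATTERNS)


examples = [
    "model.layers.0.self_attn.q_proj",   # also listed explicitly in extra_config
    "model.layers.9.mlp.gate",           # router gate
    "model.layers.3.mlp.experts.0.w1",   # hypothetical expert weight name
]
for name in examples:
    precision = "16-bit float" if stays_16bit(name) else "NVFP4"
    print(f"{name}: {precision}")
```

Read this way, `.*self_attn.*` already covers all 40 explicit attention entries and `.*mlp\.gate.*` covers everything the third pattern can match, so the effective rule is: attention projections and router gates stay in 16-bit, and the remaining linear layers under `model.layers` are quantized.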

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,21 @@

+{
+  "add_prefix_space": null,
+  "backend": "tokenizers",
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "is_local": false,
+  "legacy": false,
+  "local_files_only": false,
+  "max_length": 1024,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "stride": 0,
+  "tokenizer_class": "TokenizersBackend",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
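
The `model_max_length` value above equals `int(1e30)`, the placeholder Transformers typically writes when no tokenizer limit has been set, whereas `config.json` gives `max_position_embeddings` as 1024. The sketch below, with the hypothetical helper `effective_max_length`, shows one way to derive a usable limit from the two values.

```python
# Values copied from tokenizer_config.json and config.json.
model_max_length = 1000000000000000019884624838656
max_position_embeddings = 1024

# The tokenizer value is exactly int(1e30), i.e. effectively "unlimited".
NO_LIMIT_PLACEHOLDER = int(1e30)


def effective_max_length(tokenizer_limit: int, model_limit: int) -> int:
    """Use the smaller of the tokenizer limit and the model's position limit."""
    return min(tokenizer_limit, model_limit)


limit = effective_max_length(model_max_length, max_position_embeddings)
print(model_max_length == NO_LIMIT_PLACEHOLDER, limit)
```

In practice this suggests passing `truncation=True, max_length=1024` to the tokenizer, and keeping prompt length plus `max_new_tokens` within 1024 positions, rather than relying on the tokenizer's own limit.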