Upload folder using huggingface_hub
This view is limited to 50 files because it contains too many changes.
- .gitattributes +1 -0
- LICENSE +202 -0
- README.md +241 -0
- added_tokens.json +28 -0
- chat_template.jinja +99 -0
- config.json +39 -0
- generation_config.json +13 -0
- merges.txt +0 -0
- model-00001-of-00118.safetensors +3 -0
- model-00002-of-00118.safetensors +3 -0
- model-00003-of-00118.safetensors +3 -0
- model-00004-of-00118.safetensors +3 -0
- model-00005-of-00118.safetensors +3 -0
- model-00006-of-00118.safetensors +3 -0
- model-00007-of-00118.safetensors +3 -0
- model-00008-of-00118.safetensors +3 -0
- model-00009-of-00118.safetensors +3 -0
- model-00010-of-00118.safetensors +3 -0
- model-00011-of-00118.safetensors +3 -0
- model-00012-of-00118.safetensors +3 -0
- model-00013-of-00118.safetensors +3 -0
- model-00014-of-00118.safetensors +3 -0
- model-00015-of-00118.safetensors +3 -0
- model-00016-of-00118.safetensors +3 -0
- model-00017-of-00118.safetensors +3 -0
- model-00018-of-00118.safetensors +3 -0
- model-00019-of-00118.safetensors +3 -0
- model-00020-of-00118.safetensors +3 -0
- model-00021-of-00118.safetensors +3 -0
- model-00022-of-00118.safetensors +3 -0
- model-00023-of-00118.safetensors +3 -0
- model-00024-of-00118.safetensors +3 -0
- model-00025-of-00118.safetensors +3 -0
- model-00026-of-00118.safetensors +3 -0
- model-00027-of-00118.safetensors +3 -0
- model-00028-of-00118.safetensors +3 -0
- model-00029-of-00118.safetensors +3 -0
- model-00030-of-00118.safetensors +3 -0
- model-00031-of-00118.safetensors +3 -0
- model-00032-of-00118.safetensors +3 -0
- model-00033-of-00118.safetensors +3 -0
- model-00034-of-00118.safetensors +3 -0
- model-00035-of-00118.safetensors +3 -0
- model-00036-of-00118.safetensors +3 -0
- model-00037-of-00118.safetensors +3 -0
- model-00038-of-00118.safetensors +3 -0
- model-00039-of-00118.safetensors +3 -0
- model-00040-of-00118.safetensors +3 -0
- model-00041-of-00118.safetensors +3 -0
- model-00042-of-00118.safetensors +3 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
LICENSE
ADDED
|
@@ -0,0 +1,202 @@
| 1 |
+
|
| 2 |
+
Apache License
|
| 3 |
+
Version 2.0, January 2004
|
| 4 |
+
http://www.apache.org/licenses/
|
| 5 |
+
|
| 6 |
+
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
| 7 |
+
|
| 8 |
+
1. Definitions.
|
| 9 |
+
|
| 10 |
+
"License" shall mean the terms and conditions for use, reproduction,
|
| 11 |
+
and distribution as defined by Sections 1 through 9 of this document.
|
| 12 |
+
|
| 13 |
+
"Licensor" shall mean the copyright owner or entity authorized by
|
| 14 |
+
the copyright owner that is granting the License.
|
| 15 |
+
|
| 16 |
+
"Legal Entity" shall mean the union of the acting entity and all
|
| 17 |
+
other entities that control, are controlled by, or are under common
|
| 18 |
+
control with that entity. For the purposes of this definition,
|
| 19 |
+
"control" means (i) the power, direct or indirect, to cause the
|
| 20 |
+
direction or management of such entity, whether by contract or
|
| 21 |
+
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
| 22 |
+
outstanding shares, or (iii) beneficial ownership of such entity.
|
| 23 |
+
|
| 24 |
+
"You" (or "Your") shall mean an individual or Legal Entity
|
| 25 |
+
exercising permissions granted by this License.
|
| 26 |
+
|
| 27 |
+
"Source" form shall mean the preferred form for making modifications,
|
| 28 |
+
including but not limited to software source code, documentation
|
| 29 |
+
source, and configuration files.
|
| 30 |
+
|
| 31 |
+
"Object" form shall mean any form resulting from mechanical
|
| 32 |
+
transformation or translation of a Source form, including but
|
| 33 |
+
not limited to compiled object code, generated documentation,
|
| 34 |
+
and conversions to other media types.
|
| 35 |
+
|
| 36 |
+
"Work" shall mean the work of authorship, whether in Source or
|
| 37 |
+
Object form, made available under the License, as indicated by a
|
| 38 |
+
copyright notice that is included in or attached to the work
|
| 39 |
+
(an example is provided in the Appendix below).
|
| 40 |
+
|
| 41 |
+
"Derivative Works" shall mean any work, whether in Source or Object
|
| 42 |
+
form, that is based on (or derived from) the Work and for which the
|
| 43 |
+
editorial revisions, annotations, elaborations, or other modifications
|
| 44 |
+
represent, as a whole, an original work of authorship. For the purposes
|
| 45 |
+
of this License, Derivative Works shall not include works that remain
|
| 46 |
+
separable from, or merely link (or bind by name) to the interfaces of,
|
| 47 |
+
the Work and Derivative Works thereof.
|
| 48 |
+
|
| 49 |
+
"Contribution" shall mean any work of authorship, including
|
| 50 |
+
the original version of the Work and any modifications or additions
|
| 51 |
+
to that Work or Derivative Works thereof, that is intentionally
|
| 52 |
+
submitted to Licensor for inclusion in the Work by the copyright owner
|
| 53 |
+
or by an individual or Legal Entity authorized to submit on behalf of
|
| 54 |
+
the copyright owner. For the purposes of this definition, "submitted"
|
| 55 |
+
means any form of electronic, verbal, or written communication sent
|
| 56 |
+
to the Licensor or its representatives, including but not limited to
|
| 57 |
+
communication on electronic mailing lists, source code control systems,
|
| 58 |
+
and issue tracking systems that are managed by, or on behalf of, the
|
| 59 |
+
Licensor for the purpose of discussing and improving the Work, but
|
| 60 |
+
excluding communication that is conspicuously marked or otherwise
|
| 61 |
+
designated in writing by the copyright owner as "Not a Contribution."
|
| 62 |
+
|
| 63 |
+
"Contributor" shall mean Licensor and any individual or Legal Entity
|
| 64 |
+
on behalf of whom a Contribution has been received by Licensor and
|
| 65 |
+
subsequently incorporated within the Work.
|
| 66 |
+
|
| 67 |
+
2. Grant of Copyright License. Subject to the terms and conditions of
|
| 68 |
+
this License, each Contributor hereby grants to You a perpetual,
|
| 69 |
+
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
| 70 |
+
copyright license to reproduce, prepare Derivative Works of,
|
| 71 |
+
publicly display, publicly perform, sublicense, and distribute the
|
| 72 |
+
Work and such Derivative Works in Source or Object form.
|
| 73 |
+
|
| 74 |
+
3. Grant of Patent License. Subject to the terms and conditions of
|
| 75 |
+
this License, each Contributor hereby grants to You a perpetual,
|
| 76 |
+
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
| 77 |
+
(except as stated in this section) patent license to make, have made,
|
| 78 |
+
use, offer to sell, sell, import, and otherwise transfer the Work,
|
| 79 |
+
where such license applies only to those patent claims licensable
|
| 80 |
+
by such Contributor that are necessarily infringed by their
|
| 81 |
+
Contribution(s) alone or by combination of their Contribution(s)
|
| 82 |
+
with the Work to which such Contribution(s) was submitted. If You
|
| 83 |
+
institute patent litigation against any entity (including a
|
| 84 |
+
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
| 85 |
+
or a Contribution incorporated within the Work constitutes direct
|
| 86 |
+
or contributory patent infringement, then any patent licenses
|
| 87 |
+
granted to You under this License for that Work shall terminate
|
| 88 |
+
as of the date such litigation is filed.
|
| 89 |
+
|
| 90 |
+
4. Redistribution. You may reproduce and distribute copies of the
|
| 91 |
+
Work or Derivative Works thereof in any medium, with or without
|
| 92 |
+
modifications, and in Source or Object form, provided that You
|
| 93 |
+
meet the following conditions:
|
| 94 |
+
|
| 95 |
+
(a) You must give any other recipients of the Work or
|
| 96 |
+
Derivative Works a copy of this License; and
|
| 97 |
+
|
| 98 |
+
(b) You must cause any modified files to carry prominent notices
|
| 99 |
+
stating that You changed the files; and
|
| 100 |
+
|
| 101 |
+
(c) You must retain, in the Source form of any Derivative Works
|
| 102 |
+
that You distribute, all copyright, patent, trademark, and
|
| 103 |
+
attribution notices from the Source form of the Work,
|
| 104 |
+
excluding those notices that do not pertain to any part of
|
| 105 |
+
the Derivative Works; and
|
| 106 |
+
|
| 107 |
+
(d) If the Work includes a "NOTICE" text file as part of its
|
| 108 |
+
distribution, then any Derivative Works that You distribute must
|
| 109 |
+
include a readable copy of the attribution notices contained
|
| 110 |
+
within such NOTICE file, excluding those notices that do not
|
| 111 |
+
pertain to any part of the Derivative Works, in at least one
|
| 112 |
+
of the following places: within a NOTICE text file distributed
|
| 113 |
+
as part of the Derivative Works; within the Source form or
|
| 114 |
+
documentation, if provided along with the Derivative Works; or,
|
| 115 |
+
within a display generated by the Derivative Works, if and
|
| 116 |
+
wherever such third-party notices normally appear. The contents
|
| 117 |
+
of the NOTICE file are for informational purposes only and
|
| 118 |
+
do not modify the License. You may add Your own attribution
|
| 119 |
+
notices within Derivative Works that You distribute, alongside
|
| 120 |
+
or as an addendum to the NOTICE text from the Work, provided
|
| 121 |
+
that such additional attribution notices cannot be construed
|
| 122 |
+
as modifying the License.
|
| 123 |
+
|
| 124 |
+
You may add Your own copyright statement to Your modifications and
|
| 125 |
+
may provide additional or different license terms and conditions
|
| 126 |
+
for use, reproduction, or distribution of Your modifications, or
|
| 127 |
+
for any such Derivative Works as a whole, provided Your use,
|
| 128 |
+
reproduction, and distribution of the Work otherwise complies with
|
| 129 |
+
the conditions stated in this License.
|
| 130 |
+
|
| 131 |
+
5. Submission of Contributions. Unless You explicitly state otherwise,
|
| 132 |
+
any Contribution intentionally submitted for inclusion in the Work
|
| 133 |
+
by You to the Licensor shall be under the terms and conditions of
|
| 134 |
+
this License, without any additional terms or conditions.
|
| 135 |
+
Notwithstanding the above, nothing herein shall supersede or modify
|
| 136 |
+
the terms of any separate license agreement you may have executed
|
| 137 |
+
with Licensor regarding such Contributions.
|
| 138 |
+
|
| 139 |
+
6. Trademarks. This License does not grant permission to use the trade
|
| 140 |
+
names, trademarks, service marks, or product names of the Licensor,
|
| 141 |
+
except as required for reasonable and customary use in describing the
|
| 142 |
+
origin of the Work and reproducing the content of the NOTICE file.
|
| 143 |
+
|
| 144 |
+
7. Disclaimer of Warranty. Unless required by applicable law or
|
| 145 |
+
agreed to in writing, Licensor provides the Work (and each
|
| 146 |
+
Contributor provides its Contributions) on an "AS IS" BASIS,
|
| 147 |
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
| 148 |
+
implied, including, without limitation, any warranties or conditions
|
| 149 |
+
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
| 150 |
+
PARTICULAR PURPOSE. You are solely responsible for determining the
|
| 151 |
+
appropriateness of using or redistributing the Work and assume any
|
| 152 |
+
risks associated with Your exercise of permissions under this License.
|
| 153 |
+
|
| 154 |
+
8. Limitation of Liability. In no event and under no legal theory,
|
| 155 |
+
whether in tort (including negligence), contract, or otherwise,
|
| 156 |
+
unless required by applicable law (such as deliberate and grossly
|
| 157 |
+
negligent acts) or agreed to in writing, shall any Contributor be
|
| 158 |
+
liable to You for damages, including any direct, indirect, special,
|
| 159 |
+
incidental, or consequential damages of any character arising as a
|
| 160 |
+
result of this License or out of the use or inability to use the
|
| 161 |
+
Work (including but not limited to damages for loss of goodwill,
|
| 162 |
+
work stoppage, computer failure or malfunction, or any and all
|
| 163 |
+
other commercial damages or losses), even if such Contributor
|
| 164 |
+
has been advised of the possibility of such damages.
|
| 165 |
+
|
| 166 |
+
9. Accepting Warranty or Additional Liability. While redistributing
|
| 167 |
+
the Work or Derivative Works thereof, You may choose to offer,
|
| 168 |
+
and charge a fee for, acceptance of support, warranty, indemnity,
|
| 169 |
+
or other liability obligations and/or rights consistent with this
|
| 170 |
+
License. However, in accepting such obligations, You may act only
|
| 171 |
+
on Your own behalf and on Your sole responsibility, not on behalf
|
| 172 |
+
of any other Contributor, and only if You agree to indemnify,
|
| 173 |
+
defend, and hold each Contributor harmless for any liability
|
| 174 |
+
incurred by, or claims asserted against, such Contributor by reason
|
| 175 |
+
of your accepting any such warranty or additional liability.
|
| 176 |
+
|
| 177 |
+
END OF TERMS AND CONDITIONS
|
| 178 |
+
|
| 179 |
+
APPENDIX: How to apply the Apache License to your work.
|
| 180 |
+
|
| 181 |
+
To apply the Apache License to your work, attach the following
|
| 182 |
+
boilerplate notice, with the fields enclosed by brackets "[]"
|
| 183 |
+
replaced with your own identifying information. (Don't include
|
| 184 |
+
the brackets!) The text should be enclosed in the appropriate
|
| 185 |
+
comment syntax for the file format. We also recommend that a
|
| 186 |
+
file or class name and description of purpose be included on the
|
| 187 |
+
same "printed page" as the copyright notice for easier
|
| 188 |
+
identification within third-party archives.
|
| 189 |
+
|
| 190 |
+
Copyright 2024 Alibaba Cloud
|
| 191 |
+
|
| 192 |
+
Licensed under the Apache License, Version 2.0 (the "License");
|
| 193 |
+
you may not use this file except in compliance with the License.
|
| 194 |
+
You may obtain a copy of the License at
|
| 195 |
+
|
| 196 |
+
http://www.apache.org/licenses/LICENSE-2.0
|
| 197 |
+
|
| 198 |
+
Unless required by applicable law or agreed to in writing, software
|
| 199 |
+
distributed under the License is distributed on an "AS IS" BASIS,
|
| 200 |
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
| 201 |
+
See the License for the specific language governing permissions and
|
| 202 |
+
limitations under the License.
|
README.md
ADDED
|
@@ -0,0 +1,241 @@
| 1 |
+
---
|
| 2 |
+
tags:
|
| 3 |
+
- unsloth
|
| 4 |
+
base_model:
|
| 5 |
+
- Qwen/Qwen3-235B-A22B-Instruct-2507
|
| 6 |
+
library_name: transformers
|
| 7 |
+
license: apache-2.0
|
| 8 |
+
license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507/blob/main/LICENSE
|
| 9 |
+
pipeline_tag: text-generation
|
| 10 |
+
---
|
| 11 |
+
> [!NOTE]
|
| 12 |
+
> Includes Unsloth **chat template fixes**! <br> For `llama.cpp`, use `--jinja`
|
| 13 |
+
>
|
| 14 |
+
|
| 15 |
+
<div>
|
| 16 |
+
<p style="margin-top: 0;margin-bottom: 0;">
|
| 17 |
+
<em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
|
| 18 |
+
</p>
|
| 19 |
+
<div style="display: flex; gap: 5px; align-items: center; ">
|
| 20 |
+
<a href="https://github.com/unslothai/unsloth/">
|
| 21 |
+
<img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
|
| 22 |
+
</a>
|
| 23 |
+
<a href="https://discord.gg/unsloth">
|
| 24 |
+
<img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
|
| 25 |
+
</a>
|
| 26 |
+
<a href="https://docs.unsloth.ai/">
|
| 27 |
+
<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
|
| 28 |
+
</a>
|
| 29 |
+
</div>
|
| 30 |
+
</div>
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
# Qwen3-235B-A22B-Instruct-2507
|
| 34 |
+
<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
|
| 35 |
+
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
|
| 36 |
+
</a>
|
| 37 |
+
|
| 38 |
+
## Highlights
|
| 39 |
+
|
| 40 |
+
We introduce the updated version of the **Qwen3-235B-A22B non-thinking mode**, named **Qwen3-235B-A22B-Instruct-2507**, featuring the following key enhancements:
|
| 41 |
+
|
| 42 |
+
- **Significant improvements** in general capabilities, including **instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage**.
|
| 43 |
+
- **Substantial gains** in long-tail knowledge coverage across **multiple languages**.
|
| 44 |
+
- **Markedly better alignment** with user preferences in **subjective and open-ended tasks**, enabling more helpful responses and higher-quality text generation.
|
| 45 |
+
- **Enhanced capabilities** in **256K long-context understanding**.
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+

|
| 49 |
+
|
| 50 |
+
## Model Overview
|
| 51 |
+
|
| 52 |
+
**Qwen3-235B-A22B-Instruct-2507** has the following features:
|
| 53 |
+
- Type: Causal Language Models
|
| 54 |
+
- Training Stage: Pretraining & Post-training
|
| 55 |
+
- Number of Parameters: 235B in total and 22B activated
|
| 56 |
+
- Number of Parameters (Non-Embedding): 234B
|
| 57 |
+
- Number of Layers: 94
|
| 58 |
+
- Number of Attention Heads (GQA): 64 for Q and 4 for KV
|
| 59 |
+
- Number of Experts: 128
|
| 60 |
+
- Number of Activated Experts: 8
|
| 61 |
+
- Context Length: **262,144 natively**.
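
As a rough illustration of what these figures imply, the sketch below derives the share of experts and parameters active per token and an approximate KV-cache size at full context. The head dimension (128) and 2-byte cache entries are assumptions that are not stated above, and the constant and function names are illustrative only.

```python
# Figures taken from the feature list above.
TOTAL_PARAMS_B = 235
ACTIVE_PARAMS_B = 22
NUM_LAYERS = 94
NUM_Q_HEADS = 64
NUM_KV_HEADS = 4
NUM_EXPERTS = 128
NUM_ACTIVE_EXPERTS = 8
CONTEXT_LENGTH = 262_144

# Assumptions (not stated above): head dimension and cache precision.
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # bf16 / fp16


def kv_cache_bytes(num_tokens: int, kv_heads: int = NUM_KV_HEADS) -> int:
    """Approximate KV-cache size for one sequence of `num_tokens` tokens."""
    per_token_per_layer = 2 * kv_heads * HEAD_DIM * BYTES_PER_VALUE  # K and V
    return num_tokens * NUM_LAYERS * per_token_per_layer


expert_fraction = NUM_ACTIVE_EXPERTS / NUM_EXPERTS
param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
gqa_saving = NUM_Q_HEADS // NUM_KV_HEADS

print(f"experts active per token: {expert_fraction:.2%}")
print(f"parameters active per token: {param_fraction:.1%}")
print(f"KV cache is {gqa_saving}x smaller than with one KV head per query head")
print(f"KV cache at full context: {kv_cache_bytes(CONTEXT_LENGTH) / 2**30:.1f} GiB")
```

Under these assumptions, the cache for a single full-length sequence comes to roughly 47 GiB in addition to the weights, so memory use scales directly with the context length you configure.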
|
| 62 |
+
|
| 63 |
+
**NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.**
|
| 64 |
+
|
| 65 |
+
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
|
| 66 |
+
|
| 67 |
+
|
| 68 |
+
## Performance
|
| 69 |
+
|
| 70 |
+
| | Deepseek-V3-0324 | GPT-4o-0327 | Claude Opus 4 Non-thinking | Kimi K2 | Qwen3-235B-A22B Non-thinking | Qwen3-235B-A22B-Instruct-2507 |
|
| 71 |
+
|--- | --- | --- | --- | --- | --- | ---|
|
| 72 |
+
| **Knowledge** | | | | | | |
|
| 73 |
+
| MMLU-Pro | 81.2 | 79.8 | **86.6** | 81.1 | 75.2 | 83.0 |
|
| 74 |
+
| MMLU-Redux | 90.4 | 91.3 | **94.2** | 92.7 | 89.2 | 93.1 |
|
| 75 |
+
| GPQA | 68.4 | 66.9 | 74.9 | 75.1 | 62.9 | **77.5** |
|
| 76 |
+
| SuperGPQA | 57.3 | 51.0 | 56.5 | 57.2 | 48.2 | **62.6** |
|
| 77 |
+
| SimpleQA | 27.2 | 40.3 | 22.8 | 31.0 | 12.2 | **54.3** |
|
| 78 |
+
| CSimpleQA | 71.1 | 60.2 | 68.0 | 74.5 | 60.8 | **84.3** |
|
| 79 |
+
| **Reasoning** | | | | | | |
|
| 80 |
+
| AIME25 | 46.6 | 26.7 | 33.9 | 49.5 | 24.7 | **70.3** |
|
| 81 |
+
| HMMT25 | 27.5 | 7.9 | 15.9 | 38.8 | 10.0 | **55.4** |
|
| 82 |
+
| ARC-AGI | 9.0 | 8.8 | 30.3 | 13.3 | 4.3 | **41.8** |
|
| 83 |
+
| ZebraLogic | 83.4 | 52.6 | - | 89.0 | 37.7 | **95.0** |
|
| 84 |
+
| LiveBench 20241125 | 66.9 | 63.7 | 74.6 | **76.4** | 62.5 | 75.4 |
|
| 85 |
+
| **Coding** | | | | | | |
|
| 86 |
+
| LiveCodeBench v6 (25.02-25.05) | 45.2 | 35.8 | 44.6 | 48.9 | 32.9 | **51.8** |
|
| 87 |
+
| MultiPL-E | 82.2 | 82.7 | **88.5** | 85.7 | 79.3 | 87.9 |
|
| 88 |
+
| Aider-Polyglot | 55.1 | 45.3 | **70.7** | 59.0 | 59.6 | 57.3 |
|
| 89 |
+
| **Alignment** | | | | | | |
|
| 90 |
+
| IFEval | 82.3 | 83.9 | 87.4 | **89.8** | 83.2 | 88.7 |
|
| 91 |
+
| Arena-Hard v2* | 45.6 | 61.9 | 51.5 | 66.1 | 52.0 | **79.2** |
|
| 92 |
+
| Creative Writing v3 | 81.6 | 84.9 | 83.8 | **88.1** | 80.4 | 87.5 |
|
| 93 |
+
| WritingBench | 74.5 | 75.5 | 79.2 | **86.2** | 77.0 | 85.2 |
|
| 94 |
+
| **Agent** | | | | | | |
|
| 95 |
+
| BFCL-v3 | 64.7 | 66.5 | 60.1 | 65.2 | 68.0 | **70.9** |
|
| 96 |
+
| TAU-Retail | 49.6 | 60.3# | **81.4** | 70.7 | 65.2 | 71.3 |
|
| 97 |
+
| TAU-Airline | 32.0 | 42.8# | **59.6** | 53.5 | 32.0 | 44.0 |
|
| 98 |
+
| **Multilingualism** | | | | | | |
|
| 99 |
+
| MultiIF | 66.5 | 70.4 | - | 76.2 | 70.2 | **77.5** |
|
| 100 |
+
| MMLU-ProX | 75.8 | 76.2 | - | 74.5 | 73.2 | **79.4** |
|
| 101 |
+
| INCLUDE | 80.1 | **82.1** | - | 76.9 | 75.6 | 79.5 |
|
| 102 |
+
| PolyMATH | 32.2 | 25.5 | 30.0 | 44.8 | 27.0 | **50.2** |
|
| 103 |
+
|
| 104 |
+
*: For reproducibility, we report the win rates evaluated by GPT-4.1.
|
| 105 |
+
|
| 106 |
+
\#: Results were generated using GPT-4o-20241120, as access to the native function calling API of GPT-4o-0327 was unavailable.
|
| 107 |
+
|
| 108 |
+
|
| 109 |
+
## Quickstart
|
| 110 |
+
|
| 111 |
+
Support for Qwen3-MoE is included in recent releases of Hugging Face `transformers`; we advise you to use the latest version.
|
| 112 |
+
|
| 113 |
+
With `transformers<4.51.0`, you will encounter the following error:
|
| 114 |
+
```
|
| 115 |
+
KeyError: 'qwen3_moe'
|
| 116 |
+
```
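
A minimal sketch of a guard that fails early with a clearer message than the `KeyError`. The helper names `parse_version`, `supports_qwen3_moe`, and `check_installed_transformers` are hypothetical and not part of `transformers`; the parsing is deliberately simple and ignores pre-release suffixes.

```python
import re

MIN_TRANSFORMERS = (4, 51, 0)


def parse_version(version: str) -> tuple:
    """Return the first numeric components of a version string, padded to 3 ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes such as "dev0"
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def supports_qwen3_moe(version: str) -> bool:
    """True if `version` is at least 4.51.0 (pre-release suffixes are ignored)."""
    return parse_version(version) >= MIN_TRANSFORMERS


def check_installed_transformers() -> None:
    """Raise a descriptive error if the installed transformers is too old."""
    import transformers  # imported lazily so the helpers above work without it

    if not supports_qwen3_moe(transformers.__version__):
        raise RuntimeError(
            f"transformers {transformers.__version__} is older than 4.51.0; "
            "upgrade before loading a Qwen3-MoE checkpoint."
        )
```

Calling `check_installed_transformers()` before `from_pretrained` turns the opaque `KeyError` into an actionable upgrade message.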
|
| 117 |
+
|
| 118 |
+
The following code snippet illustrates how to use the model to generate content based on given inputs.
|
| 119 |
+
```python
|
| 120 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 121 |
+
|
| 122 |
+
model_name = "Qwen/Qwen3-235B-A22B-Instruct-2507"
|
| 123 |
+
|
| 124 |
+
# load the tokenizer and the model
|
| 125 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 126 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 127 |
+
    model_name,
|
| 128 |
+
    torch_dtype="auto",
|
| 129 |
+
    device_map="auto"
|
| 130 |
+
)
|
| 131 |
+
|
| 132 |
+
# prepare the model input
|
| 133 |
+
prompt = "Give me a short introduction to large language model."
|
| 134 |
+
messages = [
|
| 135 |
+
    {"role": "user", "content": prompt}
|
| 136 |
+
]
|
| 137 |
+
text = tokenizer.apply_chat_template(
|
| 138 |
+
    messages,
|
| 139 |
+
    tokenize=False,
|
| 140 |
+
    add_generation_prompt=True,
|
| 141 |
+
)
|
| 142 |
+
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
|
| 143 |
+
|
| 144 |
+
# conduct text completion
|
| 145 |
+
generated_ids = model.generate(
|
| 146 |
+
    **model_inputs,
|
| 147 |
+
    max_new_tokens=16384
|
| 148 |
+
)
|
| 149 |
+
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
|
| 150 |
+
|
| 151 |
+
content = tokenizer.decode(output_ids, skip_special_tokens=True)
|
| 152 |
+
|
| 153 |
+
print("content:", content)
|
| 154 |
+
```
|
| 155 |
+
|
| 156 |
+
For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
|
| 157 |
+
- SGLang:
|
| 158 |
+
```shell
|
| 159 |
+
python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B-Instruct-2507 --tp 8 --context-length 262144
|
| 160 |
+
```
|
| 161 |
+
- vLLM:
|
| 162 |
+
```shell
|
| 163 |
+
vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 --tensor-parallel-size 8 --max-model-len 262144
|
| 164 |
+
```
|
| 165 |
+
|
| 166 |
+
**Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
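
Once either server is running, the endpoint can be queried with any OpenAI-compatible client. The sketch below uses only the Python standard library. The base URL `http://localhost:8000/v1` (vLLM's default port; SGLang listens on a different port by default), the helper names, and the `max_tokens` value are assumptions to adjust for your deployment.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM default port
MODEL = "Qwen/Qwen3-235B-A22B-Instruct-2507"


def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build the JSON body for a /chat/completions call."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send one user message and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

With a server listening, `print(chat("Give me a short introduction to large language models."))` prints the reply.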
|
| 167 |
+
|
| 168 |
+
For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
|
| 169 |
+
|
| 170 |
+
## Agentic Use
|
| 171 |
+
|
| 172 |
+
Qwen3 excels at tool calling. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of Qwen3's agentic abilities. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.
|
| 173 |
+
|
| 174 |
+
To define the available tools, you can use an MCP configuration file, use the tools built into Qwen-Agent, or integrate other tools yourself.
|
| 175 |
+
```python
|
| 176 |
+
from qwen_agent.agents import Assistant
|
| 177 |
+
|
| 178 |
+
# Define LLM
|
| 179 |
+
llm_cfg = {
|
| 180 |
+
    'model': 'Qwen3-235B-A22B-Instruct-2507',
|
| 181 |
+
|
| 182 |
+
    # Use a custom endpoint compatible with OpenAI API:
|
| 183 |
+
    'model_server': 'http://localhost:8000/v1',  # api_base
|
| 184 |
+
    'api_key': 'EMPTY',
|
| 185 |
+
}
|
| 186 |
+
|
| 187 |
+
# Define Tools
|
| 188 |
+
tools = [
|
| 189 |
+
    {'mcpServers': {  # You can specify the MCP configuration file
|
| 190 |
+
            'time': {
|
| 191 |
+
                'command': 'uvx',
|
| 192 |
+
                'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
|
| 193 |
+
            },
|
| 194 |
+
            "fetch": {
|
| 195 |
+
                "command": "uvx",
|
| 196 |
+
                "args": ["mcp-server-fetch"]
|
| 197 |
+
            }
|
| 198 |
+
        }
|
| 199 |
+
    },
|
| 200 |
+
    'code_interpreter',  # Built-in tools
|
| 201 |
+
]
|
| 202 |
+
|
| 203 |
+
# Define Agent
|
| 204 |
+
bot = Assistant(llm=llm_cfg, function_list=tools)
|
| 205 |
+
|
| 206 |
+
# Streaming generation
|
| 207 |
+
messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
|
| 208 |
+
for responses in bot.run(messages=messages):
|
| 209 |
+
pass
|
| 210 |
+
print(responses)
|
| 211 |
+
```

## Best Practices

To achieve optimal performance, we recommend the following settings:

1. **Sampling Parameters**:
   - We suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`.
   - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.

2. **Adequate Output Length**: We recommend using an output length of 16,384 tokens for most queries, which is adequate for instruct models.

3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
   - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
   - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`."
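
As an illustration of the settings above, the sketch below builds the JSON body for an OpenAI-compatible `/v1/chat/completions` request, such as the endpoint exposed by the vLLM or SGLang servers. The helper name `build_chat_request` is hypothetical. `top_k` and `min_p` are not part of the standard OpenAI schema; sending them as top-level fields is an assumption that only holds for servers that accept extra sampling parameters, so check your framework's documentation.

```python
import json

MODEL = "Qwen/Qwen3-235B-A22B-Instruct-2507"


def build_chat_request(prompt: str, presence_penalty: float = 0.0) -> dict:
    """Return a chat-completions request body using the recommended sampling settings."""
    if not 0.0 <= presence_penalty <= 2.0:
        raise ValueError("presence_penalty should stay between 0 and 2")
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,   # assumption: the server accepts this extra field
        "min_p": 0.0,  # assumption: the server accepts this extra field
        "presence_penalty": presence_penalty,
        "max_tokens": 16384,  # recommended output length for most queries
    }


# Math benchmark prompt using the suggested answer-format instruction.
math_prompt = (
    "What is 17 * 24?\n"
    "Please reason step by step, and put your final answer within \\boxed{}."
)
print(json.dumps(build_chat_request(math_prompt), indent=2))
```

The resulting dictionary can be sent as the JSON body of a POST request to the server's chat-completions route.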

### Citation

If you find our work helpful, feel free to cite us.

```
@misc{qwen3technicalreport,
    title={Qwen3 Technical Report},
    author={Qwen Team},
    year={2025},
    eprint={2505.09388},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.09388},
}
```

added_tokens.json
ADDED
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}

chat_template.jinja
ADDED
@@ -0,0 +1,99 @@
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for forward_message in messages %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- set message = messages[index] %}
    {%- set current_content = message.content if message.content is defined and message.content is not none else '' %}
    {%- set tool_start = '<tool_response>' %}
    {%- set tool_start_length = tool_start|length %}
    {%- set start_of_message = current_content[:tool_start_length] %}
    {%- set tool_end = '</tool_response>' %}
    {%- set tool_end_length = tool_end|length %}
    {%- set start_pos = (current_content|length) - tool_end_length %}
    {%- if start_pos < 0 %}
        {%- set start_pos = 0 %}
    {%- endif %}
    {%- set end_of_message = current_content[start_pos:] %}
    {%- if ns.multi_step_tool and message.role == "user" and not(start_of_message == tool_start and end_of_message == tool_end) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set m_content = message.content if message.content is defined and message.content is not none else '' %}
        {%- set content = m_content %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in m_content %}
                {%- set content = (m_content.split('</think>')|last).lstrip('\n') %}
                {%- set reasoning_content = (m_content.split('</think>')|first).rstrip('\n') %}
                {%- set reasoning_content = (reasoning_content.split('<think>')|last).lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and (not reasoning_content.strip() == '')) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
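
For the simplest case, the output of the template above can be reproduced in a few lines of plain Python. This is a deliberately reduced sketch, not a replacement for the template: the function name `render_plain_chat` is hypothetical, and it assumes no tools, no `reasoning_content` or `</think>` markers in assistant messages, a conversation that ends with a user turn, and `add_generation_prompt=True` with `enable_thinking` left undefined.

```python
def render_plain_chat(messages: list[dict]) -> str:
    """Render a tool-free conversation in the same ChatML layout as the template.

    Simplifying assumptions (not the full template): no tools, no reasoning
    content, and the conversation ends with a user message.
    """
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("this sketch only handles conversations ending in a user turn")
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role in this sketch: {role}")
        # Every turn is wrapped as <|im_start|>role\ncontent<|im_end|>\n.
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")
    # Equivalent of add_generation_prompt=True: open the assistant turn.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_plain_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]))
```

Tool calls, tool responses, and reasoning content follow the additional branches of the template and are intentionally left out here.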

config.json
ADDED
@@ -0,0 +1,39 @@
{
  "architectures": [
    "Qwen3MoeForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "decoder_sparse_step": 1,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "max_position_embeddings": 262144,
  "max_window_layers": 94,
  "mlp_only_layers": [],
  "model_type": "qwen3_moe",
  "moe_intermediate_size": 1536,
  "norm_topk_prob": true,
  "num_attention_heads": 64,
  "num_experts": 128,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 94,
  "num_key_value_heads": 4,
  "output_router_logits": false,
  "pad_token_id": 151654,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 5000000,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.53.2",
  "unsloth_fixed": true,
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
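
The config.json values above can be used for a rough parameter count. The sketch below is an estimate under stated assumptions, not an official figure: it assumes every layer is a mixture-of-experts layer (consistent with `decoder_sparse_step: 1` and an empty `mlp_only_layers`), that each expert is a gated MLP with three projection matrices, that queries and keys each have a per-head RMSNorm weight, and that no projection has a bias. The function name `count_params` is made up for illustration.

```python
# Values copied from the config.json above.
cfg = {
    "hidden_size": 4096,
    "head_dim": 128,
    "num_attention_heads": 64,
    "num_key_value_heads": 4,
    "num_hidden_layers": 94,
    "num_experts": 128,
    "num_experts_per_tok": 8,
    "moe_intermediate_size": 1536,
    "vocab_size": 151936,
}


def count_params(cfg: dict, experts_used: int) -> int:
    """Estimate parameters when `experts_used` experts per layer are counted."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    qk_norm = 2 * cfg["head_dim"]  # assumption: RMSNorm weights on q and k
    layer_norms = 2 * h  # pre-attention and pre-MLP norms
    router = h * cfg["num_experts"]
    expert = 3 * h * cfg["moe_intermediate_size"]  # assumption: gate, up, down
    per_layer = attention + qk_norm + layer_norms + router + experts_used * expert
    embeddings = 2 * cfg["vocab_size"] * h  # untied input and output embeddings
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + final_norm


total = count_params(cfg, cfg["num_experts"])
active = count_params(cfg, cfg["num_experts_per_tok"])
print(f"total  ~ {total / 1e9:.1f}B parameters")
print(f"active ~ {active / 1e9:.1f}B parameters per token")
```

Under these assumptions the estimate comes to roughly 235B total and 22B active parameters per token, in line with the `235B-A22B` part of the model name.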

generation_config.json
ADDED
@@ -0,0 +1,13 @@
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8,
  "transformers_version": "4.51.0"
}

merges.txt
ADDED
The diff for this file is too large to render.

model-00001-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:447f8fbf68721a2d5cecffaa61a4526ba084b1c9de0283cca8984bc14fdbf3ab
size 3991955880

model-00002-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:72af83e8f0d43e54e33cd2714a977510cf9dab3c06d90500a7f23dc785a97cd9
size 3994081856

model-00003-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7f9ec818fea58d6598e8bc0973ca2fbdddb52d26588901005366a6617e138fb2
size 3994081856

model-00004-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:01e5c196c6f83ef050524ce9da26f22c026cefa4513b31cff06039e70a7bf40a
size 3988822448

model-00005-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c22789b4759a7935ca9dbb51dc44b9dd71656793fa9882c7320a43daee4d98a8
size 3994081784

model-00006-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2e563083b775a29aee5bcb867d4a422d9e72a3b410b2432d04776b9928b5a97b
size 3994081856

model-00007-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c76ad1a70056019ee0dfc8e51ce4fb6f0850abad3c230f305ca2bf67998cb829
size 3994081856

model-00008-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a459ca3ff449a4a75d4aa380cad0af895a92e1a063ec457f4dc27a8e92ae81cb
size 3994081856

model-00009-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b16606cedcc0f7a0fc15629231a30a07b5637d47da90ca43681ee2057ec65b8
size 3988822456

model-00010-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bf7a8fbcf4ada6f67c351ae294ca34487bb7c2089571c058c46b31fce2101fc2
size 3994081784

model-00011-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7025ba6dd9370fe368ea782b5c28d03695d97bc2dd148c1f4e19059cdb213aab
size 3994081856

model-00012-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5bd59b7c8516fcfe29de641d9cc5fee5c1651504c0cbd251049145c04d6f6a07
size 3994081856

model-00013-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e713c4dc7bbd3318181370e5541811264cf3695937e509e14d8ccaa910029cc
size 3994081944

model-00014-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b30e3a053e0d944f968a8f260b15f5bb2bd85ae6672f56aac30040d296cf8ba5
size 3988822776

model-00015-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:062424a25df6b24be8841882eb5d2abb957148cf09f2e9d29f865b73568834e7
size 3994082088

model-00016-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:df7b88e1c2d5ea7289af8a0880d239563f62b6ecb5e4dc0a34b20d5700a0bf48
size 3994082176

model-00017-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:410fd5557ce7eec0c40a15d09ad256680d65e28128ab0a180bd69049fbe05c2c
size 3994082176

model-00018-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ccfcd2bfce65451b4fabfac37aba72ec0d4ba9f16cf77705320091fc72de9a56
size 3994082176

model-00019-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:18cbd01dc37c03e793922128a23d44f04b80e1b5bd29c3eefbe068d8cc6155d8
size 3988822776

model-00020-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cd40c113b9e6870c9f941f0efae194fc224a689133106e7d53ad48526d11cc47
size 3994082088

model-00021-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:82bfeab2fd2d5a1e678a2d659ec0b8d0e1a231690e159ed7bd943b59a667ea33
size 3994082160

model-00022-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:40d92f95fc073f1d46ac9f089669417d1eb44e9e4046e91ec5213a10286a34ff
size 3994082176

model-00023-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b5a74d161c35daabd49d90c7e2d9a73f049ab51575a244ffa90fb869e0f31dee
size 3994082176

model-00024-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6d75e33a85b5bfd8de1ad5b61b0e54c5f04bed8b5782c79ee03fd22f8744abb3
size 3939555920

model-00025-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6d8acb27ac1bac598ba35f97831945dcf07557e0da30fdca8443259d7aacca42
size 3993016808

model-00026-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ea263fc399e9473aea8cc42adb4280bdc9cd4a763c2362a37e61040f149131bd
size 3994082160

model-00027-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e4f3de86d634b02aeee58382dcff68f405abf4afbe0617c4a56bf070c9f3c7f
size 3994082176

model-00028-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb1ea120c42c761084f05988c5955980bfaa5aafe3339ac0b141b09529aaa4d1
size 3994082176

model-00029-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:189ce7c5d4ca21450bc2964225b74180b8ad772c1e828fc45c0cfcc779287c60
size 3998276288

model-00030-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:03da01a0f239b5325e0fcf253813de2e9507faff036525df197c3634e8ea24f9
size 3997211632

model-00031-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:09667e2b0f389c0481780da715b5951e873f717f0bf03da61341c6a2475611d7
size 3994082160

model-00032-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:799b6ed83c23e2706a3bcdfd700c7a2f552f70bacb891ce58cf9d109fedaf0a8
size 3994082176

model-00033-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:50e7c2b84d772958111cd115066605aa55db550365423cd57f61b9bb3bb88cd6
size 3994082176

model-00034-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:36a4ffe48ee7b4a5d8f2fbba2eebbedd3f61f5450b0a22cff03222b4b29f8550
size 3994082200

model-00035-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5b077a58a27872786ffff22bf0aeb58de25d372c3206bf7b214f1bdb60b45201
size 3988822680

model-00036-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:674872e3bb982c32268f4c237469f9dc93bac314d1de941729c0dbc6ec88a0e1
size 3994082152

model-00037-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a94df010405f7936b1576306496be65ecd4dce223a621a42d0331b79d65b7480
size 3994082176

model-00038-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:11104df709a30e6a35b70a781ba88e8411f58dac0d2be1615edb1f6e4c450c38
size 3994082176

model-00039-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c98c5bcbcbf22f7aefb99d9ae19d972ac55e9f247047b5b9435209a1ed18d3ed
size 3994082200

model-00040-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c315d4d6cfdfcabe880be7eaa3da9ccb43f1089a6ac142f3cfbba7b383cf822e
size 3988822696

model-00041-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7ff05deb8ae1d9f2895d36231eb48cd715efef7be62b7fb14d9b35756e5742fb
size 3994082144

model-00042-of-00118.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d4dba05f7e8166b8d3077624686d07dd0ea7ed8746edaa29253eba8904842dc
size 3994082176