iGenius-AI-Team, Federico D'Ambrosio, federico d'ambrosio, and martin cimmino committed
Commit 96389c0 · 0 Parent(s)

squash commits

Co-authored-by: Federico D'Ambrosio <Federico D'Ambrosio@users.noreply.huggingface.co>
Co-authored-by: federico d'ambrosio <federico d'ambrosio@users.noreply.huggingface.co>
Co-authored-by: martin cimmino <martin cimmino@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,36 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,378 @@
1
+ ---
2
+ library_name: transformers
3
+ license: other
4
+ license_name: mit
5
+ license_link: https://www.domyn.com/legal/software-licenses/domyn-small
6
+ pipeline_tag: text-generation
7
+ language:
8
+ - en
9
+ - it
10
+ - es
11
+ - fr
12
+ - de
13
+ tags:
14
+ - reasoning
15
+ - dual-mode
16
+ - thinking
17
+ - tool-calling
18
+ - agentic
19
+ - multilingual
20
+ ---
21
+
22
+ <p align="center">
23
+ <picture>
24
+ <source media="(prefers-color-scheme: dark)" srcset="https://cdn.prod.website-files.com/682dcb35e7fd7a313cae803a/6835cb0fb6bd22a7cba7d645_domyn-logo-payoff-primary-white.svg">
25
+ <img alt="Domyn" src="https://cdn.prod.website-files.com/682dcb35e7fd7a313cae803a/6835ca86622d317630fa3861_domyn-logo-primary.svg" width="400">
26
+ </picture>
27
+ </p>
28
+
29
+ # Domyn Small
30
+
31
+ Domyn Small is a 10B-parameter open-weight reasoning model designed for resource-constrained, agentic, and fine-tunable deployments. It pairs a dual-mode (thinking on/off) inference design with grouped-query attention, a native 32k context window (extensible to 131k via YaRN), and tool calling. On reasoning benchmarks it reaches accuracy comparable to leading 7–10B reasoning peers while spending roughly **2–4× fewer reasoning tokens** — placing it on a favourable accuracy/cost Pareto frontier for production inference and downstream fine-tuning.
32
+
33
+ Fine-tune Domyn Small to your domain to unlock its real power and to retain full ownership and control over the resulting model.
34
+
35
+ ## Highlights
36
+
37
+ - **Token-efficient reasoning** — ~32% of Qwen3.5-9B's reasoning-token budget and ~35% of OLMo-3-7B-Think's at comparable accuracy on several reasoning tasks ([Token Efficiency](#token-efficiency)).
38
+ - **Dual-mode inference** — `thinking on` for deep multi-step reasoning, `thinking off` for fast, compact output. Toggleable from the system prompt or the API.
39
+ - **Tool calling** — first-class function calling via `<tool_call>` XML tags, with a chat template that handles tool injection automatically. Strong BFCL V3 single-turn results (75.9 Non-Live / 68.3 Live) at ~280 mean tokens per problem.
40
+ - **Expandable context** — 32,768 tokens natively, extensible to 131,072 (128k) via YaRN at inference time.
41
+ - **Multilingual** — 50+ languages with explicit coverage; optimised for English and the Tier-A European set (Italian, Spanish, French, German).
42
+
43
+ ## Model Overview
44
+
45
+ - **Developed by**: Domyn S.p.A.
46
+ - **Version**: 1.0
47
+ - **Released and last updated on**: May 2026
48
+ - **Input / Output**: Text-only / Text-only
49
+ - **Model size**: ~10B parameters
50
+ - **Attention**: Grouped-Query Attention (48 query heads, 8 KV heads)
51
+ - **Tokenizer**: 256,000-token SentencePiece BPE vocabulary
52
+ - **Native context**: 32,768 tokens
53
+ - **Extended context**: 131,072 tokens (YaRN, 4× at inference time)
54
+ - **Language(s)**: 50+ languages; optimised for English and the Tier-A European set (Italian, Spanish, French, German)
55
+ - **Base model**: Initialised from Italia 10B and continually pre-trained on 503B tokens
56
+ - **Knowledge cut-off date**: September 2024 (based on pre-training dataset cut-off)
57
+ - **License**: MIT
58
+
59
+ A full architecture and training-recipe specification is available in the Domyn Small technical report.
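To make the memory effect of grouped-query attention concrete, the following sketch estimates the KV-cache footprint per token. It uses the 8 KV heads listed above together with 40 layers and a head dimension of 128 taken from the `config.json` in this commit. The bfloat16 storage size (2 bytes per value) and the helper name `kv_cache_bytes_per_token` are assumptions for illustration, and real serving engines add their own paging overhead on top.

```python
def kv_cache_bytes_per_token(
    num_layers: int, num_kv_heads: int, head_dim: int, bytes_per_value: int = 2
) -> int:
    """Bytes of KV cache that one token occupies (keys + values, all layers)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Values from config.json: 40 layers, 8 KV heads, head_dim 128.
per_token = kv_cache_bytes_per_token(num_layers=40, num_kv_heads=8, head_dim=128)

# Footprint of one full sequence at the native and the YaRN-extended window.
native_bytes = per_token * 32_768
extended_bytes = per_token * 131_072

# Hypothetical comparison: the same model with one KV head per query head (48).
mha_per_token = kv_cache_bytes_per_token(num_layers=40, num_kv_heads=48, head_dim=128)

print(per_token)                  # 163840 bytes, i.e. 160 KiB per token
print(native_bytes / 2**30)       # 5.0 GiB at 32,768 tokens
print(extended_bytes / 2**30)     # 20.0 GiB at 131,072 tokens
print(mha_per_token // per_token) # 6x larger without grouped-query attention
```

Under these assumptions a single 32k-token sequence needs about 5 GiB of cache, which is worth keeping in mind when choosing `--max-num-seqs` and `--gpu-memory-utilization`.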
60
+
61
+ ## Quickstart
62
+
63
+ ```python
64
+ from openai import OpenAI
65
+
66
+ client = OpenAI(
67
+ base_url="http://<your-vllm-host>/v1",
68
+ api_key="none",
69
+ )
70
+
71
+ response = client.chat.completions.create(
72
+ model="domyn/Domyn-Small-v1.0",
73
+ messages=[
74
+ {"role": "system", "content": "You are Domyn Small, a helpful assistant."},
75
+ {"role": "user", "content": "What is the capital of Italy?"},
76
+ ],
77
+ )
78
+ print(response.choices[0].message.content)
79
+ ```
80
+
81
+ ## Deployment
82
+
83
+ > We recommend **vLLM ≥ 0.9.2** for all the snippets below.
84
+
85
+ ### vLLM — Basic
86
+
87
+ ```bash
88
+ vllm serve domyn/Domyn-Small-v1.0 \
89
+ --tensor-parallel-size 1 \
90
+ --dtype bfloat16 \
91
+ --max-model-len 32768 \
92
+ --max-num-seqs 256 \
93
+ --gpu-memory-utilization 0.9
94
+ ```
95
+
96
+ ### vLLM — With Reasoning Parsing
97
+
98
+ To have vLLM automatically extract the model's `<think>` blocks and expose them as a structured `reasoning_content` field, add `--reasoning-parser olmo3`. Domyn Small emits the identical `<think>…</think>` format as OLMo 3, so the OLMo 3 parser plugin works directly — no Domyn-specific parser is required.
99
+
100
+ ```bash
101
+ vllm serve domyn/Domyn-Small-v1.0 \
102
+ --tensor-parallel-size 1 \
103
+ --dtype bfloat16 \
104
+ --max-model-len 32768 \
105
+ --max-num-seqs 256 \
106
+ --gpu-memory-utilization 0.9 \
107
+ --reasoning-parser olmo3
108
+ ```
109
+
110
+ ### vLLM — Extended Context with YaRN
111
+
112
+ > YaRN scaling may impact model quality on inputs shorter than 32k. Enable it only when you actually need contexts beyond the native 32,768-token window.
113
+
114
+ ```bash
+ # Run only one of the two commands, depending on the installed vLLM version.
+ # vLLM < 0.12.0
+ vllm serve domyn/Domyn-Small-v1.0 \
+   --tensor-parallel-size 1 \
+   --dtype bfloat16 \
+   --rope-scaling '{"rope_type": "yarn", "factor": 4, "original_max_position_embeddings": 32768}' \
+   --max-model-len 131072
+
+ # vLLM >= 0.12.0
+ vllm serve domyn/Domyn-Small-v1.0 \
+   --tensor-parallel-size 1 \
+   --dtype bfloat16 \
+   --hf-overrides '{"rope_parameters": {"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}}' \
+   --max-model-len 131072
+ ```
124
+
125
+ ### vLLM — With Tool Calling
126
+
127
+ Tool calling requires three extra flags and the bundled plugin files (shipped with this model checkpoint):
128
+
129
+ ```bash
130
+ vllm serve domyn/Domyn-Small-v1.0 \
131
+ --tensor-parallel-size 1 \
132
+ --dtype bfloat16 \
133
+ --max-model-len 32768 \
134
+ --max-num-seqs 256 \
135
+ --gpu-memory-utilization 0.9 \
136
+ --enable-auto-tool-choice \
137
+ --tool-call-parser xml_tool_call \
138
+ --tool-parser-plugin /path/to/tool_parser_plugin.py \
139
+ --chat-template /path/to/chat_template.jinja
140
+ ```
141
+
142
+ Replace `/path/to/` with the actual paths to the files bundled with the checkpoint.
143
+
144
+ ### Transformers
145
+
146
+ ```python
147
+ import torch
148
+ from transformers import AutoModelForCausalLM, AutoTokenizer
149
+
150
+ model_id = "domyn/Domyn-Small-v1.0"
151
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
152
+ model = AutoModelForCausalLM.from_pretrained(
153
+ model_id, dtype=torch.bfloat16, device_map="auto"
154
+ )
155
+
156
+ messages = [
157
+ {
158
+ "role": "system",
159
+ "content": "You are Domyn Small, a helpful assistant. thinking on",
160
+ },
161
+ {"role": "user", "content": "Solve step by step: what is 17 × 24?"},
162
+ ]
163
+
164
+ # return_dict=True yields a mapping with input_ids and attention_mask,
+ # which is what generate(**inputs) and inputs["input_ids"] below expect.
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_dict=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ outputs = model.generate(**inputs, max_new_tokens=128)
+ print(
+     tokenizer.decode(
+         outputs[0][inputs["input_ids"].shape[-1] :], skip_special_tokens=True
+     )
+ )
174
+ ```
175
+
176
+ ## Thinking Mode
177
+
178
+ Domyn Small supports chain-of-thought reasoning controlled by a directive in the system prompt:
179
+
180
+ - **Thinking off** (default): omit the directive, or include `thinking off`.
181
+ - **Thinking on**: append `thinking on` to your system prompt.
182
+
183
+ ```python
184
+ messages = [
185
+ {"role": "system", "content": "You are Domyn Small, a helpful assistant. thinking on"},
186
+ {"role": "user", "content": "Solve step by step: what is 17 × 24?"},
187
+ ]
188
+ ```
189
+
190
+ When thinking is on, the model emits its reasoning inside `<think>…</think>` tags before the final answer.
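When the server is started without `--reasoning-parser`, the reasoning arrives inline in the completion text. The sketch below separates it from the final answer. It assumes, based on the bundled `chat_template.jinja`, that the opening `<think>` tag is emitted as part of the generation prompt, so the raw completion may contain only the closing `</think>`. The function name `split_reasoning` is hypothetical and not part of any Domyn or vLLM API.

```python
import re

# Optional opening tag, lazy reasoning body, mandatory closing tag.
_THINK_RE = re.compile(r"^\s*(?:<think>)?(.*?)</think>\s*", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no </think> is found."""
    match = _THINK_RE.match(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


# Completion that still carries the opening tag.
full = "<think>\n17 * 24 = 408\n</think>\n\nThe answer is 408."
# Completion where the opening tag was part of the prompt.
tail_only = "17 * 20 = 340, 17 * 4 = 68\n</think>\n408"

print(split_reasoning(full))       # ('17 * 24 = 408', 'The answer is 408.')
print(split_reasoning(tail_only))  # ('17 * 20 = 340, 17 * 4 = 68', '408')
print(split_reasoning("Rome."))    # ('', 'Rome.')
```

With `--reasoning-parser` enabled this post-processing should not be needed, since the reasoning is exposed in a separate field.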
191
+
192
+ Alternatively, you can enable reasoning by passing `enable_thinking` as an extra request parameter. Setting it to `True` has the same effect as adding `thinking on` to the system prompt; in the bundled chat template, setting it to `False` does not override a `thinking on` directive that is already present in the system prompt. Because `enable_thinking` is not part of the standard OpenAI schema, it must be forwarded to vLLM via the OpenAI client's `extra_body` field:
193
+
194
+ ```python
195
+ response = client.chat.completions.create(
196
+ model="domyn/Domyn-Small-v1.0",
197
+ messages=[
198
+ {"role": "user", "content": "Solve step by step: what is 17 × 24?"},
199
+ ],
200
+ extra_body={"chat_template_kwargs": {"enable_thinking": True}},
201
+ )
202
+ ```
203
+
204
+ ### Recommended Sampling Parameters
205
+
206
+ | Mode | temperature | top_p | top_k | min_p |
207
+ |------|-------------|-------|-------|-------|
208
+ | Thinking **off** | 0.1 | 0.95 | 50 | 0.1 |
209
+ | Thinking **on** | 0.6 | 0.90 | 25 | 0.1 |
210
+
211
+ > Do **not** use greedy decoding in thinking mode — it degrades reasoning quality and may cause repetition.
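One way to keep these settings consistent across calls is a small helper that returns the request arguments for each mode. This is a sketch only: it assumes the vLLM OpenAI-compatible server accepts `top_k` and `min_p` as extra body fields (they are not part of the standard OpenAI schema), and the names `RECOMMENDED_SAMPLING` and `sampling_kwargs` are hypothetical.

```python
# Values copied from the table above, keyed by whether thinking is enabled.
RECOMMENDED_SAMPLING = {
    False: {"temperature": 0.1, "top_p": 0.95, "top_k": 50, "min_p": 0.1},
    True: {"temperature": 0.6, "top_p": 0.90, "top_k": 25, "min_p": 0.1},
}


def sampling_kwargs(thinking: bool) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    params = RECOMMENDED_SAMPLING[thinking]
    return {
        # Standard OpenAI parameters go at the top level.
        "temperature": params["temperature"],
        "top_p": params["top_p"],
        # Non-standard parameters are forwarded through extra_body.
        "extra_body": {
            "top_k": params["top_k"],
            "min_p": params["min_p"],
            "chat_template_kwargs": {"enable_thinking": thinking},
        },
    }


print(sampling_kwargs(thinking=True)["temperature"])  # 0.6
```

A call would then look like `client.chat.completions.create(model=..., messages=..., **sampling_kwargs(thinking=True))`.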
212
+
213
+ ## Tool Calling
214
+
215
+ ### How It Works
216
+
217
+ Domyn Small has been trained to call functions using `<tool_call>` XML tags. The chat template handles tool formatting automatically: **you do not need to write tool instructions in your system prompt.**
218
+
219
+ When you pass a `tools` list to the API, the chat template prepends a structured tool-instruction block to the system prompt. Your own system message (for persona or context) follows that block, and the thinking directive is re-appended as the last line. The final rendered system block looks like:
220
+
221
+ ```
222
+ <auto-generated tool instruction containing the tools JSON>
223
+ <your system message>
224
+ thinking on/off
225
+ ```
226
+
227
+ This means your system prompt stays clean — just describe the assistant's persona or context.
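For cases where you work with raw completions rather than the parsed `tool_calls` field, the sketch below extracts `<tool_call>` blocks by hand. It assumes the JSON shape stated in the bundled chat template (`{"name": ..., "arguments": ...}`). It is an illustration only, not the bundled `tool_parser_plugin.py`, and the function name `extract_tool_calls` is hypothetical.

```python
import json
import re

# Capture everything between a <tool_call> tag pair, across newlines.
_TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return a list of {"name", "arguments"} dicts found in raw model output."""
    calls = []
    for payload in _TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of raising
        if isinstance(call, dict) and "name" in call:
            calls.append(
                {"name": call["name"], "arguments": call.get("arguments", {})}
            )
    return calls


raw = (
    "<tool_call>\n"
    '{"name": "get_weather_forecast", '
    '"arguments": {"location": "Rome", "date": "2026-05-01"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(raw))
```

Skipping malformed blocks is a design choice for this sketch; a production parser might prefer to surface the error so the request can be retried.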
228
+
229
+ ### Python Example
230
+
231
+ ```python
232
+ from openai import OpenAI
233
+
234
+ client = OpenAI(
235
+ base_url="http://<your-vllm-host>/v1",
236
+ api_key="none",
237
+ )
238
+
239
+ tools = [
240
+ {
241
+ "type": "function",
242
+ "function": {
243
+ "name": "get_weather_forecast",
244
+ "description": "Get the weather forecast for a location on a given date.",
245
+ "parameters": {
246
+ "type": "object",
247
+ "properties": {
248
+ "location": {"type": "string", "description": "City name"},
249
+ "date": {"type": "string", "description": "Date in YYYY-MM-DD format"},
250
+ },
251
+ "required": ["location", "date"],
252
+ },
253
+ },
254
+ }
255
+ ]
256
+
257
+ response = client.chat.completions.create(
258
+ model="domyn/Domyn-Small-v1.0",
259
+ messages=[
260
+ {"role": "system", "content": "You are Domyn Small, a helpful assistant."},
261
+ {"role": "user", "content": "What's the weather like in Rome today?"},
262
+ ],
263
+ tools=tools,
264
+ temperature=0.0,
265
+ )
266
+
267
+ choice = response.choices[0]
+ if choice.finish_reason == "tool_calls":
+     for tc in choice.message.tool_calls:
+         print(f"Function: {tc.function.name}")
+         print(f"Arguments: {tc.function.arguments}")
272
+ ```
273
+
274
+ ## Evaluations
275
+
276
+ Domyn Small is evaluated against four peer models in the 7–10B parameter class: **Qwen3.5-9B**, **OLMo-3-7B-Think**, **Llama-3.1-Nemotron-Nano-8B-v1**, and **Ministral-3-8B-Reasoning**. All scores are in thinking-on mode at 32,768-token sequence length (RULER extends to 131,072 via YaRN). Bold marks the Domyn Small column for readability, not the best score in each row.
277
+
278
+ | Category | Benchmark | Domyn Small | Qwen3.5-9B | OLMo-3-7B-Think | Nemotron-Nano | Ministral-3-8B |
279
+ |---|---|---|---|---|---|---|
280
+ | Reasoning | MATH-500 | **93.2** | 97.4 | 96.8 | 95.4 | 89.2 |
281
+ | | AIME 2025 (avg@48) | **35.7** | 90.0 | 70.4 | 51.2 | 32.3 |
282
+ | | GPQA-Diamond | **50.0** | 82.7 | 50.8 | 42.4 | 43.9 |
283
+ | Code | HumanEval (pass@1) | **96.3** | 93.3 | 95.7 | 91.5 | 86.6 |
284
+ | | LiveCodeBench (pass@1)| **55.0** | 86.2 | 74.8 | 67.2 | 46.0 |
285
+ | | MBPP (pass@1) | **76.8** | 76.8 | 86.6 | 77.6 | 66.6 |
286
+ | General Knowledge | MMLU | **80.3** | 84.6 | 75.2 | 56.0 | 75.3 |
287
+ | | MMLU-PRO | **67.7** | 84.4 | 64.0 | 28.8 | 62.0 |
288
+ | Instruction | IFEval (strict) | **79.9** | 91.0 | 83.7 | 70.4 | 62.5 |
289
+ | Multilingual | MGSM | **73.1** | 88.9 | 64.0 | 19.9 | 75.5 |
290
+ | Long context | RULER 32k | **59.5** | 89.8 | 69.8 | 34.0 | 88.7 |
291
+ | | RULER 64k | **29.6** | 87.9 | 17.2 | 18.7 | 85.9 |
292
+ | Tool calling | BFCL V3 Non-Live | **75.9** | 78.1 | 61.1 | 63.3 | — |
293
+ | | BFCL V3 Live | **68.3** | 78.4 | 66.9 | 40.2 | — |
294
+ | | BFCL V3 Multi-Turn | **7.0** | 50.6 | 2.1 | 0.1 | — |
295
+
296
+ Domyn Small attains its single-turn BFCL results at ~280 mean tokens per problem against ~590 for Qwen3.5-9B and ~2,429 for OLMo-3-7B-Think — the best accuracy-per-token tool-calling profile in the peer set among models that fully engage the reasoning path. Ministral-3-8B is excluded from the BFCL comparison: during evaluation it consistently failed to close the `[/THINK]` reasoning delimiter, making its structured outputs unparseable by the benchmark.
297
+
298
+ ## Token Efficiency
299
+
300
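+ The table below compares mean generated tokens per problem (thinking on, lower is better) against the strongest accuracy peer in the set, Qwen3.5-9B. Grand means weight each benchmark by its problem count. Bold marks the Domyn Small column, not the lower value in each row: on HumanEval and MBPP, Qwen3.5-9B generates fewer tokens.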
301
+
302
+ | Category | Benchmark | Domyn Small | Qwen3.5-9B |
303
+ |---|---|---|---|
304
+ | Reasoning | MATH-500 | **2,261** | 7,614 |
305
+ | | AIME 2025 | **5,190** | 18,668 |
306
+ | | GPQA-Diamond | **3,396** | 8,976 |
307
+ | | **Grand mean** | **2,690** | 8,440 |
308
+ | Code | HumanEval | **1,884** | 1,144 |
309
+ | | LCB-Gen | **5,010** | 12,739 |
310
+ | | MBPP | **2,420** | 1,927 |
311
+ | | **Grand mean** | **3,312** | 5,870 |
312
+ | General Knowledge | MMLU | **1,236** | 3,262 |
313
+ | | MMLU-PRO | **2,947** | 4,666 |
314
+ | | **Grand mean** | **2,026** | 3,910 |
315
+ | Instruction | IFEval | **775** | 3,874 |
316
+ | Multilingual | MGSM | **796** | 3,140 |
317
+
318
+ On the reasoning suite Domyn Small produces approximately **32% of Qwen3.5-9B's** token budget — a 3.1× saving at comparable accuracy on several benchmarks.
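The reasoning grand means and the 32% figure can be reproduced with a problem-count-weighted mean. The per-benchmark token counts below come from the table; the problem counts (500, 30, and 198) are the usual sizes of these benchmarks and are an assumption here, since the table does not state them.

```python
def weighted_mean(values_and_counts: list[tuple[float, int]]) -> float:
    """Mean of the values, each weighted by its problem count."""
    total = sum(count for _, count in values_and_counts)
    return sum(value * count for value, count in values_and_counts) / total


# Assumed problem counts per benchmark (not stated in the table).
PROBLEM_COUNTS = {"MATH-500": 500, "AIME 2025": 30, "GPQA-Diamond": 198}

# Mean generated tokens per problem, copied from the table.
domyn = {"MATH-500": 2261, "AIME 2025": 5190, "GPQA-Diamond": 3396}
qwen = {"MATH-500": 7614, "AIME 2025": 18668, "GPQA-Diamond": 8976}

domyn_mean = weighted_mean([(domyn[b], n) for b, n in PROBLEM_COUNTS.items()])
qwen_mean = weighted_mean([(qwen[b], n) for b, n in PROBLEM_COUNTS.items()])

print(round(domyn_mean), round(qwen_mean))  # 2690 8440
print(round(domyn_mean / qwen_mean, 2))     # 0.32
print(round(qwen_mean / domyn_mean, 1))     # 3.1
```

With these counts the weighted means match the table's reasoning grand means, which suggests the assumed problem counts are the ones used.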
319
+
320
+ ## Dual-Mode Comparison (Thinking ON vs. OFF)
321
+
322
+ The table below shows the effect of the reasoning toggle on Domyn Small. The same evaluation harness is used throughout. Thinking-on AIME 2025 is reported as avg@48; all other thinking-on entries are single-pass.
323
+
324
+ | Benchmark | Thinking off | Thinking on | Δ |
325
+ |---|---|---|---|
326
+ | MATH-500 | 91.4 | 93.2 | +1.8 |
327
+ | AIME 2025 | 31.0 | 35.7 | +4.7 |
328
+ | LiveCodeBench | 33.8 | 55.0 | +21.2 |
329
+ | MBPP | 54.6 | 76.8 | +22.2 |
330
+ | HumanEval | 69.5 | 96.3 | +26.8 |
331
+ | GPQA-Diamond | 40.0 | 50.0 | +10.0 |
332
+ | MMLU-PRO | 60.0 | 67.7 | +7.7 |
333
+ | MGSM | 59.7 | 73.1 | +13.4 |
334
+ | IFEval (prompt strict) | 78.6 | 79.9 | +1.3 |
335
+
336
+ The toggle helps most when the bottleneck is multi-step search or program synthesis (code, science reasoning, multilingual math); it helps least when the bottleneck is recall or format compliance.
337
+
338
+ ## Intended Uses
339
+
340
+ ### Primary Use Cases
341
+
342
+ Domyn Small is intended for commercial and research use in multiple languages:
343
+
344
+ - Regulated-industry use cases in **resource-constrained environments** that need reduced computational cost and faster response times in production.
345
+ - **Fine-tuning to any desired domain knowledge** across industries, to equip the model with the context and expertise needed to excel on real-world applications.
346
+ - **Agentic applications**, especially agents that need to solve coding and mathematical problems and perform sequential, tool-calling tasks.
347
+
348
+ ### Out-of-Scope Use Cases
349
+
350
+ Domyn Small is not specifically designed or evaluated for all downstream purposes. As with any language model, developers should carefully evaluate accuracy, safety, and fairness before applying it to specific downstream scenarios, particularly high-risk ones. Developers should also ensure compliance with all applicable laws and regulations (including, but not limited to, privacy and trade compliance) relevant to their use case.
351
+
352
+
353
+ ## EU AI Act Compliance
354
+
355
+ Domyn Small is released as a general-purpose AI (GPAI) model under the EU AI Act. Article 53 transparency obligations are discharged via this model card, the Domyn Small technical report (architecture, training data composition, training stages, evaluations, and known limitations end-to-end), and the MIT-licensed open-weights release. The training-data summary required by Article 53(1)(d) is provided as a companion artefact to the model release.
356
+
357
+ To uphold data-subject rights and comply with the AI Act and EU copyright framework, we operate an opt-out procedure for rights holders. Anyone who believes their copyrighted material was inadvertently included in our training corpora can contact `copyright@domyn.com`, and we will exclude the affected data from subsequent model iterations.
358
+
359
+ ## Citation
360
+
361
+ If you find this work valuable, please consider citing it:
362
+
363
+ ```bibtex
364
+ @misc{domynsmall2026,
365
+ title = {Domyn Small},
366
+ author = {Domyn S.p.A.},
367
+ year = {2026},
368
+ eprint = {TBD},
369
+ note = {Technical report, forthcoming},
370
+ }
371
+ ```
372
+
373
+ ## Contacts
374
+
375
+ - For general inquiries about Domyn Small, please contact: `models@domyn.com`
376
+ - For copyright-related complaints, please contact: `copyright@domyn.com`
377
+
378
+ *Affected rightsholders and their authorised representatives, including collective management organisations, may submit sufficiently precise and adequately substantiated complaints electronically concerning any non-compliance with our commitments under the Copyright Chapter of the GPAI Code of Practice. We commit to handling such complaints diligently, impartially, and within a reasonable timeframe, except in cases where the complaint is manifestly unfounded or has already been addressed. This mechanism complements, but does not limit, the available legal measures, remedies, and sanctions under Union and national copyright law.*
chat_template.jinja ADDED
@@ -0,0 +1,123 @@
1
+ {#- This template extends the base model template to support tool calling.
2
+ Tools are injected into the system prompt using <tools> XML tags.
3
+ Tool calls use <tool_call> XML tags, tool responses use <tool_response> tags.
4
+
5
+ Handles all combinations:
6
+ 1. tools + user system message → single system block: tool instructions + user content
7
+ 2. tools + NO system message → single synthetic system block with tool instructions
8
+ 3. no tools + user system message → single system block with user content
9
+ 4. no tools + no system message → minimal default system block
10
+
11
+ Thinking mode: if any system message contains "thinking on", all system
12
+ blocks end with "thinking on" and the generation prompt emits an open <think>.
13
+ Otherwise "thinking off" is used and <think></think> is immediately closed.
14
+
15
+ "thinking on" / "thinking off" is stripped from user system content and
16
+ re-appended as the final line of the system block to keep it consistent.
17
+ -#}
18
+ {%- set loop_messages = messages %}
19
+ {%- set ns = namespace(thinking=false, has_tools=false, has_system=false, system_emitted=false) %}
20
+
21
+ {#- ===== Method A: Check kwarg ===== -#}
22
+ {%- if enable_thinking is defined and enable_thinking %}
23
+ {%- set ns.thinking = true %}
24
+ {%- endif %}
25
+
26
+ {#- ===== PASS 1: Scan messages to detect thinking mode and system presence ===== -#}
27
+ {%- for message in loop_messages %}
28
+ {%- if message['role'] == 'system' %}
29
+ {%- set ns.has_system = true %}
30
+ {%- if 'thinking on' in message['content'] %}
31
+ {%- set ns.thinking = true %}
32
+ {%- endif %}
33
+ {%- endif %}
34
+ {%- endfor %}
35
+
36
+ {#- ===== Build tool instruction block if tools provided ===== -#}
37
+ {%- if tools is defined and tools is not none and tools | length > 0 %}
38
+ {%- set ns.has_tools = true %}
39
+ {%- set tool_instruction %}
40
+ You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.
41
+ You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
42
+ Here are the available tools:
43
+ <tools>
44
+ {{ tools | tojson }}
45
+ </tools>
46
+ For each function call, return a JSON object with function name and arguments within <tool_call></tool_call> XML tags as follows:
47
+ <tool_call>
48
+ {"name": <function-name>, "arguments": <args-dict>}
49
+ </tool_call>
50
+ {%- endset %}
51
+ {%- endif %}
52
+
53
+ {#- ===== Thinking mode suffix — appended to every system block ===== -#}
54
+ {%- set thinking_suffix = "thinking on" if ns.thinking else "thinking off" %}
55
+
56
+ {#- ===== SYNTHETIC SYSTEM BLOCK (only if NO system messages in conversation) ===== -#}
57
+ {%- if not ns.has_system %}
58
+ {%- if ns.has_tools %}
59
+ {{- '<extra_id_0>System\n' }}
60
+ {{- tool_instruction + '\n\n' }}
61
+ {{- thinking_suffix + '\n' }}
62
+ {%- else %}
63
+ {{- '<extra_id_0>System\n' }}
64
+ {{- thinking_suffix + '\n' }}
65
+ {%- endif %}
66
+ {%- set ns.system_emitted = true %}
67
+ {%- endif %}
68
+
69
+ {#- ===== PASS 2: Render messages ===== -#}
70
+ {%- for message in loop_messages %}
71
+
72
+ {#- ---- SYSTEM MESSAGE ---- -#}
73
+ {%- if message['role'] == 'system' %}
74
+ {#- Strip thinking directives from the content — we handle them via thinking_suffix -#}
75
+ {%- set clean_content = message['content'] | replace('thinking on', '') | replace('thinking off', '') | trim %}
76
+ {{- '<extra_id_0>System\n' }}
77
+ {%- if ns.has_tools %}
78
+ {{- tool_instruction + '\n' }}
79
+ {%- endif %}
80
+ {%- if clean_content %}
81
+ {{- clean_content + '\n' }}
82
+ {%- endif %}
83
+ {{- thinking_suffix + '\n' }}
84
+ {%- set ns.system_emitted = true %}
85
+
86
+ {#- ---- USER MESSAGE ---- -#}
87
+ {%- elif message['role'] == 'user' %}
88
+ {{- '<extra_id_1>User\n' }}
89
+ {{- message['content'] + '\n' }}
90
+
91
+ {#- ---- ASSISTANT MESSAGE ---- -#}
92
+ {%- elif message['role'] == 'assistant' %}
93
+ {%- if message.tool_calls is defined and message.tool_calls | length > 0 %}
94
+ {{- '<extra_id_1>Assistant\n' }}
95
+ {%- for tool_call in message.tool_calls %}
96
+ {{- '<tool_call>\n' }}
97
+ {{- {"name": tool_call.function.name, "arguments": tool_call.function.arguments} | tojson + '\n' }}
98
+ {{- '</tool_call>\n' }}
99
+ {%- endfor %}
100
+ {%- else %}
101
+ {{- '<extra_id_1>Assistant\n' }}
102
+ {{- message['content'] + '\n' }}
103
+ {%- endif %}
104
+
105
+ {#- ---- TOOL RESPONSE MESSAGE ---- -#}
106
+ {%- elif message['role'] == 'tool' %}
107
+ {{- '<extra_id_1>User\n' }}
108
+ {{- '<tool_response>\n' }}
109
+ {{- message['content'] + '\n' }}
110
+ {{- '</tool_response>\n' }}
111
+
112
+ {%- endif %}
113
+ {%- endfor %}
114
+
115
+ {#- ===== Generation prompt ===== -#}
116
+ {%- if add_generation_prompt %}
117
+ {{- '<extra_id_1>Assistant\n' }}
118
+ {%- if ns.thinking %}
119
+ {{- '<think>\n' }}
120
+ {%- else %}
121
+ {{- '<think>\n</think>\n\n' }}
122
+ {%- endif %}
123
+ {%- endif %}
config.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "architectures": [
3
+ "NemotronForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 2,
8
+ "dtype": "bfloat16",
9
+ "eos_token_id": 5,
10
+ "head_dim": 128,
11
+ "hidden_act": "relu2",
12
+ "hidden_size": 4096,
13
+ "initializer_range": 0.0134,
14
+ "intermediate_size": 16384,
15
+ "max_position_embeddings": 32768,
16
+ "mlp_bias": false,
17
+ "model_type": "nemotron",
18
+ "nemo_version": "0.2.0",
19
+ "norm_eps": 1e-05,
20
+ "num_attention_heads": 48,
21
+ "num_hidden_layers": 40,
22
+ "num_key_value_heads": 8,
23
+ "pad_token_id": 0,
24
+ "partial_rotary_factor": 0.5,
25
+ "rope_theta": 500000,
26
+ "tie_word_embeddings": false,
27
+ "transformers_version": "4.57.1",
28
+ "use_cache": false,
29
+ "vocab_size": 256000
30
+ }
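As a sanity check on the "~10B parameters" figure, the parameter count can be derived from the config values above. The sketch assumes the per-layer tensor layout visible in `model.safetensors.index.json` (separate q/k/v/o projections, an ungated MLP with `up_proj` and `down_proj`, and two LayerNorms with weight and bias). The final norm with weight and bias is an assumption, since it is not visible in the index excerpt, and the function name is hypothetical.

```python
def nemotron_param_count(
    vocab: int, hidden: int, layers: int,
    q_heads: int, kv_heads: int, head_dim: int, intermediate: int,
) -> int:
    """Count parameters for the layout described in the lead-in."""
    embeddings = 2 * vocab * hidden             # untied embed_tokens + lm_head
    attn = 2 * hidden * q_heads * head_dim      # q_proj + o_proj
    attn += 2 * hidden * kv_heads * head_dim    # k_proj + v_proj
    mlp = 2 * hidden * intermediate             # up_proj + down_proj (no gate)
    norms = 2 * 2 * hidden                      # two LayerNorms, weight + bias
    final_norm = 2 * hidden                     # assumed: weight + bias
    return embeddings + layers * (attn + mlp + norms) + final_norm


total = nemotron_param_count(
    vocab=256_000, hidden=4096, layers=40,
    q_heads=48, kv_heads=8, head_dim=128, intermediate=16_384,
)
print(total)      # 9815334912
print(total * 2)  # 19630669824 bytes in bfloat16
```

Both numbers agree with `total_parameters` and `total_size` in the index metadata of this commit.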
generation_config.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 2,
4
+ "eos_token_id": [
5
+ 5
6
+ ],
7
+ "pad_token_id": 0,
8
+ "transformers_version": "4.57.1"
9
+ }
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6f2f368aab9b9d6a047a1f1f7d19e4b7e8531edda3690b2eda63c047afd9108c
3
+ size 4915962352
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4bb6e10249871a9647a138170b75f788c7d65fd01e8a3a25945cd10e4ec2a02e
3
+ size 4966496880
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:08a15f8b372b3e05f01a21127912407b712088ab192eda97eecf791189bead22
3
+ size 4949719448
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d6156ef734c8705b9aa73693cd4179eba0d315c7f96c4ffa8077450d323dce28
3
+ size 4798537992
model.safetensors.index.json ADDED
@@ -0,0 +1,412 @@
1
+ {
2
+ "metadata": {
3
+ "total_parameters": 9815334912,
4
+ "total_size": 19630669824
5
+ },
6
+ "weight_map": {
7
+ "lm_head.weight": "model-00004-of-00004.safetensors",
8
+ "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
9
+ "model.layers.0.input_layernorm.bias": "model-00001-of-00004.safetensors",
10
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
11
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
12
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
13
+ "model.layers.0.post_attention_layernorm.bias": "model-00001-of-00004.safetensors",
14
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
15
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
16
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
17
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
18
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
19
+ "model.layers.1.input_layernorm.bias": "model-00001-of-00004.safetensors",
20
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
21
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
22
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
23
+ "model.layers.1.post_attention_layernorm.bias": "model-00001-of-00004.safetensors",
24
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
25
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
26
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
27
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
28
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
29
+ "model.layers.10.input_layernorm.bias": "model-00002-of-00004.safetensors",
30
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
31
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
32
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
33
+ "model.layers.10.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
34
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
35
+ "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
36
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
37
+ "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
38
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
39
+ "model.layers.11.input_layernorm.bias": "model-00002-of-00004.safetensors",
40
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
41
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
42
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
43
+ "model.layers.11.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
44
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
45
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
46
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
47
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
48
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
49
+ "model.layers.12.input_layernorm.bias": "model-00002-of-00004.safetensors",
50
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
51
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
52
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
53
+ "model.layers.12.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
54
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
55
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
56
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
57
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
58
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
59
+ "model.layers.13.input_layernorm.bias": "model-00002-of-00004.safetensors",
60
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
61
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
62
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
63
+ "model.layers.13.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
64
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
65
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
66
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
67
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
68
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
69
+ "model.layers.14.input_layernorm.bias": "model-00002-of-00004.safetensors",
70
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
71
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
72
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
73
+ "model.layers.14.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
74
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
75
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
76
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
77
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
78
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
79
+ "model.layers.15.input_layernorm.bias": "model-00002-of-00004.safetensors",
80
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
81
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
82
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
83
+ "model.layers.15.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
84
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
85
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
86
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
87
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
88
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
89
+ "model.layers.16.input_layernorm.bias": "model-00002-of-00004.safetensors",
90
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
91
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
92
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
93
+ "model.layers.16.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
94
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
95
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
96
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
97
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
98
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
99
+ "model.layers.17.input_layernorm.bias": "model-00002-of-00004.safetensors",
100
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
101
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
102
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
103
+ "model.layers.17.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
104
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
105
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
106
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
107
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
108
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
109
+ "model.layers.18.input_layernorm.bias": "model-00002-of-00004.safetensors",
110
+ "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
111
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
112
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
113
+ "model.layers.18.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
114
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
115
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
116
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
117
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
118
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
119
+ "model.layers.19.input_layernorm.bias": "model-00002-of-00004.safetensors",
120
+ "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
121
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
122
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
123
+ "model.layers.19.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
124
+ "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
125
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
126
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
127
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
128
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
129
+ "model.layers.2.input_layernorm.bias": "model-00001-of-00004.safetensors",
130
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
131
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
132
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
133
+ "model.layers.2.post_attention_layernorm.bias": "model-00001-of-00004.safetensors",
134
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
135
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
136
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
137
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
138
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
139
+ "model.layers.20.input_layernorm.bias": "model-00003-of-00004.safetensors",
140
+ "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
141
+ "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
142
+ "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
143
+ "model.layers.20.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
144
+ "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
145
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
146
+ "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
147
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
148
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
149
+ "model.layers.21.input_layernorm.bias": "model-00003-of-00004.safetensors",
150
+ "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
151
+ "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
152
+ "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
153
+ "model.layers.21.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
154
+ "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
155
+ "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
156
+ "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
157
+ "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
158
+ "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
159
+ "model.layers.22.input_layernorm.bias": "model-00003-of-00004.safetensors",
160
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
161
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
162
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
163
+ "model.layers.22.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
164
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
165
+ "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
166
+ "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
167
+ "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
168
+ "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
169
+ "model.layers.23.input_layernorm.bias": "model-00003-of-00004.safetensors",
170
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
171
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
172
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
173
+ "model.layers.23.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
174
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
175
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
176
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
177
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
178
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
179
+ "model.layers.24.input_layernorm.bias": "model-00003-of-00004.safetensors",
180
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
181
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
182
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
183
+ "model.layers.24.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
184
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
185
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
186
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
187
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
188
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
189
+ "model.layers.25.input_layernorm.bias": "model-00003-of-00004.safetensors",
190
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
191
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
192
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
193
+ "model.layers.25.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
194
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
195
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
196
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
197
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
198
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
199
+ "model.layers.26.input_layernorm.bias": "model-00003-of-00004.safetensors",
200
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
201
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
202
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
203
+ "model.layers.26.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
204
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
205
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
206
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
207
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
208
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
209
+ "model.layers.27.input_layernorm.bias": "model-00003-of-00004.safetensors",
210
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
211
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
212
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
213
+ "model.layers.27.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
214
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
215
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
216
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
217
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
218
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
219
+ "model.layers.28.input_layernorm.bias": "model-00003-of-00004.safetensors",
220
+ "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
221
+ "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
222
+ "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
223
+ "model.layers.28.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
224
+ "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
225
+ "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
226
+ "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
227
+ "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
228
+ "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
229
+ "model.layers.29.input_layernorm.bias": "model-00003-of-00004.safetensors",
230
+ "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
231
+ "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
232
+ "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
233
+ "model.layers.29.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
234
+ "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
235
+ "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
236
+ "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
237
+ "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
238
+ "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
239
+ "model.layers.3.input_layernorm.bias": "model-00001-of-00004.safetensors",
240
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
241
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
242
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
243
+ "model.layers.3.post_attention_layernorm.bias": "model-00001-of-00004.safetensors",
244
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
245
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
246
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
247
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
248
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
249
+ "model.layers.30.input_layernorm.bias": "model-00003-of-00004.safetensors",
250
+ "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
251
+ "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
252
+ "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
253
+ "model.layers.30.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
254
+ "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
255
+ "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
256
+ "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
257
+ "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
258
+ "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
259
+ "model.layers.31.input_layernorm.bias": "model-00003-of-00004.safetensors",
260
+ "model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
261
+ "model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
262
+ "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
263
+ "model.layers.31.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
264
+ "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
265
+ "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
266
+ "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
267
+ "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
268
+ "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
269
+ "model.layers.32.input_layernorm.bias": "model-00003-of-00004.safetensors",
270
+ "model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
271
+ "model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
272
+ "model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
273
+ "model.layers.32.post_attention_layernorm.bias": "model-00003-of-00004.safetensors",
274
+ "model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
275
+ "model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
276
+ "model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
277
+ "model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
278
+ "model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
279
+ "model.layers.33.input_layernorm.bias": "model-00004-of-00004.safetensors",
280
+ "model.layers.33.input_layernorm.weight": "model-00004-of-00004.safetensors",
281
+ "model.layers.33.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
282
+ "model.layers.33.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
283
+ "model.layers.33.post_attention_layernorm.bias": "model-00004-of-00004.safetensors",
284
+ "model.layers.33.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
285
+ "model.layers.33.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
286
+ "model.layers.33.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
287
+ "model.layers.33.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
288
+ "model.layers.33.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
289
+ "model.layers.34.input_layernorm.bias": "model-00004-of-00004.safetensors",
290
+ "model.layers.34.input_layernorm.weight": "model-00004-of-00004.safetensors",
291
+ "model.layers.34.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
292
+ "model.layers.34.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
293
+ "model.layers.34.post_attention_layernorm.bias": "model-00004-of-00004.safetensors",
294
+ "model.layers.34.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
295
+ "model.layers.34.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
296
+ "model.layers.34.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
297
+ "model.layers.34.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
298
+ "model.layers.34.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
299
+ "model.layers.35.input_layernorm.bias": "model-00004-of-00004.safetensors",
300
+ "model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
301
+ "model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
302
+ "model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
303
+ "model.layers.35.post_attention_layernorm.bias": "model-00004-of-00004.safetensors",
304
+ "model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
305
+ "model.layers.35.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
306
+ "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
307
+ "model.layers.35.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
308
+ "model.layers.35.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
309
+ "model.layers.36.input_layernorm.bias": "model-00004-of-00004.safetensors",
310
+ "model.layers.36.input_layernorm.weight": "model-00004-of-00004.safetensors",
311
+ "model.layers.36.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
312
+ "model.layers.36.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
313
+ "model.layers.36.post_attention_layernorm.bias": "model-00004-of-00004.safetensors",
314
+ "model.layers.36.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
315
+ "model.layers.36.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
316
+ "model.layers.36.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
317
+ "model.layers.36.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
318
+ "model.layers.36.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
319
+ "model.layers.37.input_layernorm.bias": "model-00004-of-00004.safetensors",
320
+ "model.layers.37.input_layernorm.weight": "model-00004-of-00004.safetensors",
321
+ "model.layers.37.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
322
+ "model.layers.37.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
323
+ "model.layers.37.post_attention_layernorm.bias": "model-00004-of-00004.safetensors",
324
+ "model.layers.37.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
325
+ "model.layers.37.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
326
+ "model.layers.37.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
327
+ "model.layers.37.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
328
+ "model.layers.37.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
329
+ "model.layers.38.input_layernorm.bias": "model-00004-of-00004.safetensors",
330
+ "model.layers.38.input_layernorm.weight": "model-00004-of-00004.safetensors",
331
+ "model.layers.38.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
332
+ "model.layers.38.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
333
+ "model.layers.38.post_attention_layernorm.bias": "model-00004-of-00004.safetensors",
334
+ "model.layers.38.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
335
+ "model.layers.38.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
336
+ "model.layers.38.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
337
+ "model.layers.38.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
338
+ "model.layers.38.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
339
+ "model.layers.39.input_layernorm.bias": "model-00004-of-00004.safetensors",
340
+ "model.layers.39.input_layernorm.weight": "model-00004-of-00004.safetensors",
341
+ "model.layers.39.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
342
+ "model.layers.39.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
343
+ "model.layers.39.post_attention_layernorm.bias": "model-00004-of-00004.safetensors",
344
+ "model.layers.39.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
345
+ "model.layers.39.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
346
+ "model.layers.39.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
347
+ "model.layers.39.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
348
+ "model.layers.39.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
349
+ "model.layers.4.input_layernorm.bias": "model-00001-of-00004.safetensors",
350
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
351
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
352
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
353
+ "model.layers.4.post_attention_layernorm.bias": "model-00001-of-00004.safetensors",
354
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
355
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
356
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
357
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
358
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
359
+ "model.layers.5.input_layernorm.bias": "model-00001-of-00004.safetensors",
360
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
361
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
362
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
363
+ "model.layers.5.post_attention_layernorm.bias": "model-00001-of-00004.safetensors",
364
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
365
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
366
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
367
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
368
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
369
+ "model.layers.6.input_layernorm.bias": "model-00001-of-00004.safetensors",
370
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
371
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
372
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
373
+ "model.layers.6.post_attention_layernorm.bias": "model-00001-of-00004.safetensors",
374
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
375
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
376
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
377
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
378
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
379
+ "model.layers.7.input_layernorm.bias": "model-00002-of-00004.safetensors",
380
+ "model.layers.7.input_layernorm.weight": "model-00002-of-00004.safetensors",
381
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
382
+ "model.layers.7.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
383
+ "model.layers.7.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
384
+ "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
385
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
386
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
387
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
388
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
389
+ "model.layers.8.input_layernorm.bias": "model-00002-of-00004.safetensors",
390
+ "model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
391
+ "model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
392
+ "model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
393
+ "model.layers.8.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
394
+ "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
395
+ "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
396
+ "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
397
+ "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
398
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
399
+ "model.layers.9.input_layernorm.bias": "model-00002-of-00004.safetensors",
400
+ "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
401
+ "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
402
+ "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
403
+ "model.layers.9.post_attention_layernorm.bias": "model-00002-of-00004.safetensors",
404
+ "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
405
+ "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
406
+ "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
407
+ "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
408
+ "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
409
+ "model.norm.bias": "model-00004-of-00004.safetensors",
410
+ "model.norm.weight": "model-00004-of-00004.safetensors"
411
+ }
412
+ }
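Editorial note: the index above maps every tensor name to the shard file that stores it, and a decoder layer's tensors are not always kept together (layers 7 and 20 are each split across two shards). The sketch below shows one way to group such a map by shard and to find layers that cross a shard boundary. It assumes the enclosing key is `weight_map`, as in the usual Hugging Face sharded-checkpoint index layout; the helper names and the four-entry excerpt are hypothetical and used only for illustration.

```python
import json
import re
from collections import defaultdict

# Hypothetical excerpt that mirrors a few entries of the index above.
# The "weight_map" key is an assumption based on the common index layout.
INDEX_JSON = """
{
  "weight_map": {
    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors"
  }
}
"""

LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def group_by_shard(weight_map):
    """Return {shard_file: sorted list of tensor names stored in it}."""
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


def layers_spanning_shards(weight_map):
    """Return {layer_index: sorted shard files} for layers in more than one shard."""
    per_layer = defaultdict(set)
    for tensor_name, shard_file in weight_map.items():
        match = LAYER_RE.match(tensor_name)
        if match:  # skip non-layer tensors such as model.norm.weight
            per_layer[int(match.group(1))].add(shard_file)
    return {
        layer: sorted(files)
        for layer, files in per_layer.items()
        if len(files) > 1
    }


weight_map = json.loads(INDEX_JSON)["weight_map"]
grouped = group_by_shard(weight_map)
split_layers = layers_spanning_shards(weight_map)
```

A loader that wants a single layer therefore has to consult the index rather than assume one shard per layer; on the excerpt, `split_layers` reports layer 20 as stored in shards 2 and 3.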
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "cls_token": {
10
+ "content": "<s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "<extra_id_1>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "pad_token": {
24
+ "content": "<pad>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "sep_token": {
31
+ "content": "<extra_id_1>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:85c33485220c152141b14e438e9bf16141eb14ad4b17b6e9329ab35fc96d1137
3
+ size 34809687
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
tool_parser_plugin.py ADDED
@@ -0,0 +1,314 @@
1
+ """
2
+ Custom vLLM tool parser plugin for models that use <tool_call> XML tags.
3
+
4
+ The model outputs tool calls in this format:
5
+ <tool_call>
6
+ {"name": "function_name", "arguments": {"arg1": "val1"}}
7
+ </tool_call>
8
+
9
+ Multiple tool calls can appear in a single response (parallel tool calling).
10
+
11
+ Usage:
12
+ vllm serve <model> \
13
+ --enable-auto-tool-choice \
14
+ --tool-parser-plugin /absolute/path/to/tool_parser_plugin.py \
15
+ --tool-call-parser xml_tool_call \
16
+ --chat-template /absolute/path/to/tool_chat_template.jinja
17
+ """
18
+
19
+ import ast
20
+ import json
21
+ import re
22
+ import uuid
23
+ from typing import Sequence, Union
24
+
25
+ # ---------------------------------------------------------------------------
26
+ # Import compatibility: newer vLLM releases relocated the protocol classes and
27
+ # moved tool_parsers to vllm.tool_parsers; older ones use vllm.entrypoints.openai.tool_parsers.
28
+ # ---------------------------------------------------------------------------
29
+ try:
30
+ # Newer vLLM, roughly 0.15+
31
+ from vllm.entrypoints.openai.chat_completion.protocol import ChatCompletionRequest
32
+ from vllm.entrypoints.openai.engine.protocol import (
33
+ DeltaFunctionCall,
34
+ DeltaMessage,
35
+ DeltaToolCall,
36
+ ExtractedToolCallInformation,
37
+ FunctionCall,
38
+ ToolCall,
39
+ )
40
+ except ImportError:
41
+ # Older vLLM
42
+ from vllm.entrypoints.openai.protocol import (
43
+ ChatCompletionRequest,
44
+ DeltaFunctionCall,
45
+ DeltaMessage,
46
+ DeltaToolCall,
47
+ ExtractedToolCallInformation,
48
+ FunctionCall,
49
+ ToolCall,
50
+ )
51
+
52
+ try:
53
+ from vllm.tool_parsers.abstract_tool_parser import ToolParser, ToolParserManager
54
+ except ImportError:
55
+ from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import (
56
+ ToolParser,
57
+ ToolParserManager,
58
+ )
59
+
60
+ from vllm.logger import init_logger
61
+
62
+ logger = init_logger(__name__)
63
+
64
+
65
+ def _generate_tool_call_id() -> str:
66
+ """Generate a unique tool-call ID in the format expected by OpenAI."""
67
+ return f"call_{uuid.uuid4().hex[:24]}"
68
+
69
+
70
+ # ---------------------------------------------------------------------------
71
+ # Register the parser so it can be referenced via --tool-call-parser
72
+ # ---------------------------------------------------------------------------
73
+ @ToolParserManager.register_module(["xml_tool_call"])
74
+ class XMLToolCallParser(ToolParser):
75
+ """
76
+ Parses tool calls wrapped in <tool_call>...</tool_call> XML tags.
77
+
78
+ Handles both single and parallel (multiple) tool calls in one response.
79
+ Supports streaming and non-streaming extraction.
80
+ """
81
+
82
+ # Regex to match complete <tool_call>...</tool_call> blocks
83
+ TOOL_CALL_RE = re.compile(
84
+ r"<tool_call>\s*(.*?)\s*</tool_call>",
85
+ re.DOTALL,
86
+ )
87
+
88
+ # Regex that also matches an incomplete (still-streaming) block
89
+ TOOL_CALL_OPEN_RE = re.compile(
90
+ r"<tool_call>\s*(.*?)(?:</tool_call>|$)",
91
+ re.DOTALL,
92
+ )
93
+
94
+ TOOL_CALL_START = "<tool_call>"
95
+ TOOL_CALL_END = "</tool_call>"
96
+
97
+ def __init__(self, tokenizer, tools=None):
98
+ # vLLM newer versions: ToolParser.__init__(tokenizer, tools)
99
+ # vLLM older versions: ToolParser.__init__(tokenizer)
100
+ try:
101
+ super().__init__(tokenizer, tools)
102
+ except TypeError:
103
+ super().__init__(tokenizer)
104
+ self.tools = tools or []
105
+
106
+ # ---- streaming state ----
107
+ self.current_tool_id: int = -1
108
+ self.current_tool_name_sent: bool = False
109
+ self.prev_tool_call_arr: list[dict] = []
110
+ self.streamed_args_for_tool: list[str] = []
111
+
112
+ # ------------------------------------------------------------------
113
+ # Helper: lenient parsing of a tool-call payload (JSON, then Python literal)
114
+ # ------------------------------------------------------------------
115
+ @staticmethod
116
+ def _parse_tool_json(raw: str) -> dict | None:
117
+ """Parse a tool call JSON block, handling Python-style single quotes."""
118
+ # Try standard JSON first
119
+ try:
120
+ return json.loads(raw)
121
+ except (json.JSONDecodeError, ValueError):
122
+ pass
123
+ # Fall back to ast.literal_eval for Python-style dicts with single quotes
124
+ try:
125
+ result = ast.literal_eval(raw)
126
+ if isinstance(result, dict):
127
+ return result
128
+ except (ValueError, SyntaxError):
129
+ pass
130
+ return None
131
+
132
+ def adjust_request(
133
+ self, request: ChatCompletionRequest
134
+ ) -> ChatCompletionRequest:
135
+ return request
136
+
137
+ # ------------------------------------------------------------------
138
+ # NON-STREAMING extraction
139
+ # ------------------------------------------------------------------
140
+ def extract_tool_calls(
141
+ self,
142
+ model_output: str,
143
+ request: ChatCompletionRequest,
144
+ ) -> ExtractedToolCallInformation:
145
+ """
146
+ Parse all <tool_call>...</tool_call> blocks from the full model
147
+ output and convert them to OpenAI ToolCall objects.
148
+ """
149
+
150
+ # Find all complete tool-call blocks
151
+ raw_matches = self.TOOL_CALL_RE.findall(model_output)
152
+
153
+ if not raw_matches:
154
+ # No tool calls found — return the text as-is
155
+ return ExtractedToolCallInformation(
156
+ tools_called=False,
157
+ tool_calls=[],
158
+ content=model_output,
159
+ )
160
+
161
+ tool_calls: list[ToolCall] = []
162
+ for raw_json in raw_matches:
163
+ parsed = self._parse_tool_json(raw_json)
164
+ if parsed is None:
165
+ logger.warning(
166
+ "Failed to parse tool call JSON: %s", raw_json
167
+ )
168
+ continue
169
+
170
+ fn_name = parsed.get("name", "")
171
+ fn_args = parsed.get("arguments", {})
172
+
173
+ # Ensure arguments is a JSON string (OpenAI format)
174
+ if isinstance(fn_args, dict):
175
+ fn_args_str = json.dumps(fn_args)
176
+ elif isinstance(fn_args, str):
177
+ # Model may emit arguments as a JSON string — validate and pass through
178
+ try:
179
+ json.loads(fn_args)
180
+ fn_args_str = fn_args
181
+ except (json.JSONDecodeError, ValueError):
182
+ # Try ast.literal_eval for Python-style dicts (e.g. single quotes,
183
+ # True/False/None literals). If that also fails, emit an empty dict so
184
+ # downstream json.loads never sees an invalid string.
185
+ try:
186
+ recovered = ast.literal_eval(fn_args)
187
+ fn_args_str = json.dumps(recovered) if isinstance(recovered, dict) else json.dumps({})
188
+ except (ValueError, SyntaxError):
189
+ fn_args_str = "{}"
190
+ else:
191
+ fn_args_str = json.dumps(fn_args, default=str)
192
+
193
+ tool_calls.append(
194
+ ToolCall(
195
+ id=_generate_tool_call_id(),
196
+ type="function",
197
+ function=FunctionCall(
198
+ name=fn_name,
199
+ arguments=fn_args_str,
200
+ ),
201
+ )
202
+ )
203
+
204
+ # Strip tool-call blocks from content to get any surrounding text
205
+ remaining_content = self.TOOL_CALL_RE.sub("", model_output).strip()
206
+
207
+ return ExtractedToolCallInformation(
208
+ tools_called=True,
209
+ tool_calls=tool_calls,
210
+ content=remaining_content if remaining_content else None,
211
+ )
212
+
213
+ # ------------------------------------------------------------------
214
+ # STREAMING extraction
215
+ # ------------------------------------------------------------------
216
+ def extract_tool_calls_streaming(
217
+ self,
218
+ previous_text: str,
219
+ current_text: str,
220
+ delta_text: str,
221
+ previous_token_ids: Sequence[int],
222
+ current_token_ids: Sequence[int],
223
+ delta_token_ids: Sequence[int],
224
+ request: ChatCompletionRequest,
225
+ ) -> Union[DeltaMessage, None]:
226
+ """
227
+ Incrementally parse tool calls from the streaming token output.
228
+
229
+ Strategy:
230
+ - Before seeing <tool_call>, stream tokens as regular content.
231
+ - Once <tool_call> is detected, buffer until </tool_call>.
232
+ - On </tool_call>, emit the complete tool call delta.
233
+ - Support multiple sequential tool calls.
234
+ """
235
+
236
+ # If we haven't seen a tool_call opening tag yet, pass through as
237
+ # regular content (unless the start tag is partially forming).
238
+ if self.TOOL_CALL_START not in current_text:
239
+ # Check if the current text ends with a partial match of the
240
+ # start tag — if so, hold back to avoid emitting partial tags.
241
+ for i in range(1, len(self.TOOL_CALL_START)):
242
+ if current_text.endswith(self.TOOL_CALL_START[:i]):
243
+ # Possibly forming the start tag — hold delta
244
+ return None
245
+ return DeltaMessage(content=delta_text)
246
+
247
+ # ---- We are inside or past a <tool_call> block ----
248
+
249
+ # Find all *complete* tool call blocks so far
250
+ complete_matches = self.TOOL_CALL_RE.findall(current_text)
251
+ num_complete = len(complete_matches)
252
+
253
+ # Determine how many we've already streamed
254
+ num_already_sent = len(self.prev_tool_call_arr)
255
+
256
+ if num_complete > num_already_sent:
257
+ # A new tool call just completed — emit it
258
+ new_raw = complete_matches[num_already_sent]
259
+ parsed = self._parse_tool_json(new_raw)
260
+ if parsed is None:
261
+ logger.warning(
262
+ "Streaming: failed to parse tool call JSON: %s",
263
+ new_raw,
264
+ )
265
+ return None
266
+
267
+ fn_name = parsed.get("name", "")
268
+ fn_args = parsed.get("arguments", {})
269
+ if isinstance(fn_args, dict):
270
+ fn_args_str = json.dumps(fn_args)
271
+ elif isinstance(fn_args, str):
272
+ try:
273
+ json.loads(fn_args)
274
+ fn_args_str = fn_args
275
+ except (json.JSONDecodeError, ValueError):
276
+ try:
277
+ recovered = ast.literal_eval(fn_args)
278
+ fn_args_str = json.dumps(recovered) if isinstance(recovered, dict) else json.dumps({})
279
+ except (ValueError, SyntaxError):
280
+ fn_args_str = "{}"
281
+ else:
282
+ fn_args_str = json.dumps(fn_args, default=str)
283
+
284
+ self.current_tool_id += 1
285
+ self.prev_tool_call_arr.append(parsed)
286
+ self.streamed_args_for_tool.append(fn_args_str)
287
+ self.current_tool_name_sent = True
288
+
289
+ return DeltaMessage(
290
+ tool_calls=[
291
+ DeltaToolCall(
292
+ index=self.current_tool_id,
293
+ id=_generate_tool_call_id(),
294
+ type="function",
295
+ function=DeltaFunctionCall(
296
+ name=fn_name,
297
+ arguments=fn_args_str,
298
+ ),
299
+ )
300
+ ]
301
+ )
302
+
303
+ # If we're currently inside an incomplete tool call block,
304
+ # don't emit anything — wait for it to complete.
305
+ # Check if there's an open <tool_call> without a matching close
306
+ open_count = current_text.count(self.TOOL_CALL_START)
307
+ close_count = current_text.count(self.TOOL_CALL_END)
308
+ if open_count > close_count:
309
+ # Still buffering inside a tool call
310
+ return None
311
+
312
+ # Past all completed tool-call blocks: any trailing text is not streamed
313
+ # (rare for most models), so return None and emit no delta.
314
+ return None
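The non-streaming path of the plugin above can be exercised without vLLM. The following is a minimal sketch under stated assumptions: it reuses the same regular expression and the JSON-then-`ast.literal_eval` fallback, but returns plain dictionaries instead of vLLM's `ToolCall` objects. `parse_payload` and `extract` are hypothetical names, the sample tool names are invented for illustration, and the sketch assumes `arguments` is a dict (the plugin also handles string-valued arguments).

```python
import ast
import json
import re

# Same pattern as TOOL_CALL_RE in the plugin: one complete <tool_call> block.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_payload(raw):
    """Try strict JSON first, then fall back to a Python-literal dict."""
    for loader in (json.loads, ast.literal_eval):
        try:
            result = loader(raw)
        except (ValueError, SyntaxError):
            continue  # this loader rejected the text; try the next one
        if isinstance(result, dict):
            return result
    return None


def extract(model_output):
    """Return (tool_calls, remaining_text) for a complete model output."""
    calls = []
    for raw in TOOL_CALL_RE.findall(model_output):
        payload = parse_payload(raw)
        if payload is None:
            continue  # skip malformed blocks, as the plugin does
        args = payload.get("arguments", {})
        calls.append({"name": payload.get("name", ""), "arguments": json.dumps(args)})
    # Whatever is left after removing the blocks is ordinary assistant text.
    remaining = TOOL_CALL_RE.sub("", model_output).strip()
    return calls, remaining


sample = (
    "Checking the weather.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Rome"}}\n</tool_call>\n'
    "<tool_call>\n{'name': 'get_time', 'arguments': {'tz': 'CET'}}\n</tool_call>"
)
calls, remaining = extract(sample)
print(calls)
print(repr(remaining))
```

The second block in `sample` uses single quotes on purpose: strict JSON parsing rejects it, and the literal-eval fallback recovers it, which is the case the plugin's `_parse_tool_json` is written for.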