Aremaki committed on
Commit
a916ad2
·
verified ·
1 Parent(s): 877658f

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ text_to_code.json filter=lfs diff=lfs merge=lfs -text
37
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,114 @@
1
+ LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
2
+ Llama 3.1 Version Release Date: July 23, 2024
3
+
4
+ “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the
5
+ Llama Materials set forth herein.
6
+
7
+ “Documentation” means the specifications, manuals and documentation accompanying Llama 3.1
8
+ distributed by Meta at https://llama.meta.com/doc/overview.
9
+
10
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into
11
+ this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or
12
+ regulations to provide legal consent and that has legal authority to bind your employer or such other
13
+ person or entity if you are entering in this Agreement on their behalf.
14
+
15
+ “Llama 3.1” means the foundational large language models and software and algorithms, including
16
+ machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
17
+ fine-tuning enabling code and other elements of the foregoing distributed by Meta at
18
+ https://llama.meta.com/llama-downloads.
19
+
20
+ “Llama Materials” means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any
21
+ portion thereof) made available under this Agreement.
22
+
23
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your
24
+ principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located
25
+ outside of the EEA or Switzerland).
26
+
27
+ By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
28
+ you agree to be bound by this Agreement.
29
+
30
+ 1. License Rights and Redistribution.
31
+
32
+ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free
33
+ limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama
34
+ Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the
35
+ Llama Materials.
36
+
37
+ b. Redistribution and Use.
38
+
39
+ i. If you distribute or make available the Llama Materials (or any derivative works
40
+ thereof), or a product or service (including another AI model) that contains any of them, you shall (A)
41
+ provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with
42
+ Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use
43
+ the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or
44
+ otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at
45
+ the beginning of any such AI model name.
46
+
47
+ ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
48
+ of an integrated end user product, then Section 2 of this Agreement will not apply to you.
49
+
50
+ iii. You must retain in all copies of the Llama Materials that you distribute the following
51
+ attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is
52
+ licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights
53
+ Reserved.”
54
+
55
+ iv. Your use of the Llama Materials must comply with applicable laws and regulations
56
+ (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama
57
+ Materials (available at https://llama.meta.com/llama3_1/use-policy), which is hereby incorporated by
58
+ reference into this Agreement.
59
+
60
+ 2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users
61
+ of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700
62
+ million monthly active users in the preceding calendar month, you must request a license from Meta,
63
+ which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the
64
+ rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
65
+
66
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY
67
+ OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF
68
+ ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED,
69
+ INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT,
70
+ MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR
71
+ DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND
72
+ ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND
73
+ RESULTS.
74
+
75
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF
76
+ LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING
77
+ OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL,
78
+ INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED
79
+ OF THE POSSIBILITY OF ANY OF THE FOREGOING.
80
+
81
+ 5. Intellectual Property.
82
+
83
+ a. No trademark licenses are granted under this Agreement, and in connection with the Llama
84
+ Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other
85
+ or any of its affiliates, except as required for reasonable and customary use in describing and
86
+ redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to
87
+ use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will
88
+ comply with Meta’s brand guidelines (currently accessible at
89
+ https://about.meta.com/brand/resources/meta/company-brand/ ). All goodwill arising out of your use
90
+ of the Mark will inure to the benefit of Meta.
91
+
92
+ b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with
93
+ respect to any derivative works and modifications of the Llama Materials that are made by you, as
94
+ between you and Meta, you are and will be the owner of such derivative works and modifications.
95
+
96
+ c. If you institute litigation or other proceedings against Meta or any entity (including a
97
+ cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or
98
+ results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other
99
+ rights owned or licensable by you, then any licenses granted to you under this Agreement shall
100
+ terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold
101
+ harmless Meta from and against any claim by any third party arising out of or related to your use or
102
+ distribution of the Llama Materials.
103
+
104
+ 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this
105
+ Agreement or access to the Llama Materials and will continue in full force and effect until terminated in
106
+ accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in
107
+ breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete
108
+ and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this
109
+ Agreement.
110
+
111
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of
112
+ the State of California without regard to choice of law principles, and the UN Convention on Contracts
113
+ for the International Sale of Goods does not apply to this Agreement. The courts of California shall have
114
+ exclusive jurisdiction of any dispute arising out of this Agreement.
README.md ADDED
@@ -0,0 +1,348 @@
1
+ ---
2
+ license: llama3.1
3
+
4
+ base_model:
5
+ - meta-llama/Llama-3.1-8B-Instruct
6
+
7
+ language:
8
+ - en
9
+
10
+ tags:
11
+ - biomedical-entity-linking
12
+ - entity-linking
13
+ - entity-disambiguation
14
+ - named-entity-linking
15
+ - biomedical
16
+ - healthcare
17
+ - umls
18
+ - medmentions
19
+ - text-generation
20
+ - constrained-decoding
21
+ - causal-lm
22
+ - llm
23
+
24
+ library_name: transformers
25
+ pipeline_tag: text-generation
26
+
27
+ datasets:
28
+ - AnonymousARR42/MedMentions
29
+
30
+ finetuning_task:
31
+ - entity-linking
32
+
33
+ metrics:
34
+ - recall
35
+
36
+ model-index:
+ - name: LongBEL-8B-MedMentions-ST21pv
+   results:
+   - task:
+       type: entity-linking
+       name: Biomedical Entity Linking
+     dataset:
+       type: AnonymousARR42/MedMentions
+       name: MedMentions-ST21pv
+     metrics:
+     - type: recall
+       name: Recall@1
+       value: 0.793
49
+ ---
50
+
51
+ # LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking
52
+
53
+ ## LongBEL
54
+
55
+ **LongBEL** is a document-level framework for biomedical entity linking (BEL). Instead of normalizing each mention independently, LongBEL conditions each prediction on the full document context and on the normalizations already produced for earlier mentions in the same document. This design promotes document-level consistency, and a **robust memory** training scheme limits the influence of earlier mistaken predictions. The method is introduced in our paper, which is currently under review.
56
+
57
+ ## LongBEL (MedMentions Edition)
58
+
59
+ This is a **fine-tuned version of Llama-3.1-8B-Instruct** trained on **MedMentions** (ST21pv subset) with the LongBEL framework, so that each prediction uses the long document context and a memory of previous normalizations.
60
+
61
+ | Field | Value |
62
+ |---|---|
63
+ | Base model | `meta-llama/Llama-3.1-8B-Instruct` |
64
+ | Task | Biomedical Entity Linking |
65
+ | Dataset | MedMentions-ST21pv |
66
+ | Knowledge base | UMLS 2017AA, ST21pv subset |
67
+ | Input | BigBio-like documents with mention spans and semantic groups |
68
+ | Output | Ranked UMLS concept predictions |
69
+ | Decoding | Semantic-guided constrained decoding |
70
+ | Main metric | Recall@1 |
71
+
72
+
73
+ ## Intended Use
74
+
75
+ This model is intended for research on biomedical entity linking and document-level consistency.
76
+
77
+ It assumes that mention spans and semantic groups are already provided. It does **not** perform named entity recognition. In a full pipeline, a NER model should first detect mentions and assign semantic groups, then LongBEL can normalize these mentions to UMLS concepts.
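The hand-off from an upstream NER step can be sketched as follows. This is a minimal sketch and not part of the LongBEL code: the helper name `to_bigbio_page` and the `(start, end, semantic_group)` mention tuples are assumptions made for illustration. Only the output layout follows the BigBio-like structure used in the inference example of this model card.

```python
def to_bigbio_page(doc_id, text, mentions):
    """Wrap raw text and NER spans into one BigBio-like page (hypothetical helper).

    `mentions` is a list of (start, end, semantic_group) character spans,
    as an upstream NER model might produce them.
    """
    entities = []
    for i, (start, end, group) in enumerate(mentions, start=1):
        entities.append(
            {
                "id": f"T{i}",
                "type": group,  # semantic group, e.g. "Disorders"
                "text": [text[start:end]],
                "offsets": [[start, end]],
            }
        )
    return {
        "id": doc_id,
        "document_id": doc_id,
        "passages": [
            {"id": "0", "type": "paragraph", "text": [text], "offsets": [[0, len(text)]]}
        ],
        "entities": entities,
        "events": [],
        "coreferences": [],
        "relations": [],
    }


page = to_bigbio_page("doc_001", "Suspected PET was treated.", [(10, 13, "Disorders")])
print(page["entities"][0]["text"])
```

A list of such pages is what the `bigbio_pages` argument in the inference example receives.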
78
+
79
+ ## Usage
80
+
81
+ ### Loading the model
82
+
83
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "AnonymousARR42/LongBEL_8B_MedMentions_st21pv",
+     trust_remote_code=True,
+     device_map="auto",
+ )
+ ```
93
+
94
+ ### Inference example
95
+
96
+ The model expects BigBio-like documents. Each entity should include a mention text, character offsets, and a semantic group in the `type` field.
97
+
98
+ ```python
+ num_beams = 5
+
+ bigbio_pages = [
+     {
+         "id": "001",
+         "document_id": "doc_001",
+         "passages": [
+             {
+                 "id": "0",
+                 "type": "paragraph",
+                 "text": [
+                     "A 29-year-old pregnant woman presented with severe-range hypertension, "
+                     "headache, and epigastric pain. Laboratory testing showed proteinuria "
+                     "and mildly elevated liver enzymes. She was admitted overnight with "
+                     "suspected PET and was started on urgent treatment."
+                 ],
+                 "offsets": [[0, 257]],
+             }
+         ],
+         "entities": [
+             {
+                 "id": "T1",
+                 "type": "Living Beings",
+                 "text": ["pregnant woman"],
+                 "offsets": [[14, 28]],
+             },
+             {
+                 "id": "T2",
+                 "type": "Disorders",
+                 "text": ["severe-range hypertension"],
+                 "offsets": [[44, 69]],
+             },
+             {
+                 "id": "T3",
+                 "type": "Disorders",
+                 "text": ["proteinuria"],
+                 "offsets": [[128, 139]],
+             },
+             {
+                 "id": "T4",
+                 "type": "Disorders",
+                 "text": ["PET"],
+                 "offsets": [[217, 220]],
+             },
+         ],
+         "events": [],
+         "coreferences": [],
+         "relations": [],
+     }
+ ]
+
+ predictions = model.sample(
+     bigbio_pages=bigbio_pages,
+     num_beams=num_beams,
+ )
+
+ # Predictions are returned as a flat list: num_beams entries per mention.
+ for i in range(0, len(predictions), num_beams):
+     mention = predictions[i]["mention"]
+     print(f"## Mention {(i // num_beams) + 1}: {mention}")
+
+     for j in range(num_beams):
+         pred = predictions[i + j]
+         print(
+             f" - Beam {j + 1}:\n"
+             f" Predicted concept name: {pred['pred_concept_name']}\n"
+             f" Predicted code: {pred['pred_concept_code']}\n"
+             f" Beam score: {pred['beam_score']:.3f}\n"
+         )
+ ```
168
+
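Because the entity offsets index into the document text, a wrong offset silently marks the wrong characters. Before running inference it can help to check that every offset really selects its mention text. The following is a minimal sketch under stated assumptions: `check_offsets` is a hypothetical helper, not a LongBEL function, and it assumes one text string per passage, as in the example above.

```python
def check_offsets(page):
    """Return (entity id, expected, found) for each span whose offsets do not match its text."""
    problems = []
    for entity in page["entities"]:
        for mention, (start, end) in zip(entity["text"], entity["offsets"]):
            found = None
            # Offsets are document-level, so locate the passage that covers the span.
            for passage in page["passages"]:
                p_start, p_end = passage["offsets"][0]
                if p_start <= start and end <= p_end:
                    found = passage["text"][0][start - p_start : end - p_start]
                    break
            if found != mention:
                problems.append((entity["id"], mention, found))
    return problems


toy_page = {
    "passages": [{"text": ["Suspected PET was treated."], "offsets": [[0, 26]]}],
    "entities": [
        {"id": "T1", "text": ["PET"], "offsets": [[10, 13]]},  # correct span
        {"id": "T2", "text": ["PET"], "offsets": [[9, 12]]},   # shifted by one character
    ],
}
print(check_offsets(toy_page))
```

An empty result means every mention text agrees with its offsets.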
169
+
170
+ **Example Output:**
171
+
172
+ ```text
173
+ ## Mention 1: pregnant woman
174
+ - Beam 1:
175
+ - Predicted concept name:Pregnant Woman
176
+ - Predicted code: C0033011
177
+ - Beam score: 1.000
178
+
179
+ - Beam 2:
180
+ - Predicted concept name:Pregnant woman
181
+ - Predicted code: C0033011
182
+ - Beam score: 0.003
183
+
184
+ - Beam 3:
185
+ - Predicted concept name:Pregnant woman (person)
186
+ - Predicted code: C0033011
187
+ - Beam score: 0.001
188
+
189
+ - Beam 4:
190
+ - Predicted concept name:Pregnancy Partner
191
+ - Predicted code: C3538996
192
+ - Beam score: 0.000
193
+
194
+ - Beam 5:
195
+ - Predicted concept name:Pregnant woman (person)
196
+ - Predicted code: C0033011
197
+ - Beam score: 0.000
198
+
199
+ ## Mention 2: severe-range hypertension
200
+ - Beam 1:
201
+ - Predicted concept name:Hypertensive disease
202
+ - Predicted code: C0020538
203
+ - Beam score: 0.078
204
+
205
+ - Beam 2:
206
+ - Predicted concept name:Hypertension (in some patients)
207
+ - Predicted code: C3280936
208
+ - Beam score: 0.022
209
+
210
+ - Beam 3:
211
+ - Predicted concept name:Hypertensive disease (disorder)
212
+ - Predicted code: C0020538
213
+ - Beam score: 0.010
214
+
215
+ - Beam 4:
216
+ - Predicted concept name:Hypertension, severe
217
+ - Predicted code: C4013784
218
+ - Beam score: 0.010
219
+
220
+ - Beam 5:
221
+ - Predicted concept name:Hypertension (patient A)
222
+ - Predicted code: C4313262
223
+ - Beam score: 0.004
224
+
225
+ ## Mention 3: proteinuria
226
+ - Beam 1:
227
+ - Predicted concept name:Proteinurias
228
+ - Predicted code: C0033687
229
+ - Beam score: 1.000
230
+
231
+ - Beam 2:
232
+ - Predicted concept name:Proteinuric diabetic nephropathy (disorder)
233
+ - Predicted code: C0403519
234
+ - Beam score: 0.003
235
+
236
+ - Beam 3:
237
+ - Predicted concept name:Proteinuria
238
+ - Predicted code: C0033687
239
+ - Beam score: 0.003
240
+
241
+ - Beam 4:
242
+ - Predicted concept name:Proteinuric diabetic nephropathy
243
+ - Predicted code: C0403519
244
+ - Beam score: 0.002
245
+
246
+ - Beam 5:
247
+ - Predicted concept name:Proteinuric hypertension of pregnancy (disorder)
248
+ - Predicted code: C0032914
249
+ - Beam score: 0.001
250
+
251
+ ## Mention 4: PET
252
+ - Beam 1:
253
+ - Predicted concept name:PET - Pre-eclamptic toxemia
254
+ - Predicted code: C0032914
255
+ - Beam score: 0.075
256
+
257
+ - Beam 2:
258
+ - Predicted concept name:PET - Pre-eclamptic toxaemia
259
+ - Predicted code: C0032914
260
+ - Beam score: 0.039
261
+
262
+ - Beam 3:
263
+ - Predicted concept name:Preeclamptic toxemia
264
+ - Predicted code: C2931877
265
+ - Beam score: 0.027
266
+
267
+ - Beam 4:
268
+ - Predicted concept name:Preeclampsia
269
+ - Predicted code: C0032914
270
+ - Beam score: 0.023
271
+
272
+ - Beam 5:
273
+ - Predicted concept name:Preeclampsia with Severe Features
274
+ - Predicted code: C0341950
275
+ - Beam score: 0.019
276
+ ```
277
+
278
+ ### Saliency map example
279
+
280
+ The model can also return token-level saliency maps during inference.
281
+
282
+ ```python
+ predictions, saliency_maps = model.sample(
+     bigbio_pages=bigbio_pages,
+     num_beams=num_beams,
+     with_saliency_maps=True,
+ )
+
+ model.display_saliency_map(saliency_maps[3])
+ ```
291
+
292
+ Example saliency map for the mention `PET`:
293
+
294
+ <p align="center">
295
+ <img src="saliency_map.png" alt="Saliency map for PET prediction" width="900">
296
+ </p>
297
+
298
+ ## Evaluation
299
+
300
+ Entity linking performance is reported as Recall@1 (in %) with bootstrap confidence intervals. The best result in each column is shown in **bold** and the second-best is <u>underlined</u>; ⭐ marks the main LongBEL-8B model.
301
+
302
+ | Model | MM-ST21PV<br>(English) | QUAERO-EMEA<br>(French) | SympTEMIST<br>(Spanish) | DisTEMIST<br>(Spanish) | MedProcNER<br>(Spanish) |
303
+ | :--- | :---: | :---: | :---: | :---: | :---: |
304
+ | **Context-Free BEL** ||||| |
305
+ | SciSpacy | 53.8 ± 1.0 | 37.1 ± 4.3 | 9.8 ± 1.3 | 21.1 ± 1.9 | 10.3 ± 1.2 |
306
+ | SapBERT | 65.6 ± 1.0 | 59.7 ± 3.8 | 34.2 ± 2.0 | 38.6 ± 2.6 | 30.4 ± 2.1 |
307
+ | CODER-all | 62.9 ± 1.1 | 66.9 ± 4.0 | 42.2 ± 2.2 | 47.0 ± 2.6 | 42.7 ± 2.1 |
308
+ | SapBERT-all | 64.6 ± 1.1 | 67.9 ± 3.9 | 49.8 ± 2.4 | 49.6 ± 2.6 | 45.1 ± 2.2 |
309
+ | BERGAMOT | 60.9 ± 1.1 | 63.8 ± 4.9 | 48.0 ± 2.7 | 48.9 ± 2.4 | 42.3 ± 2.2 |
310
+ | **Local-Context BEL** ||||| |
311
+ | ArboEL | 76.9 ± 0.9 | 63.0 ± 3.9 | 55.4 ± 2.5 | 54.7 ± 2.6 | 59.7 ± 2.6 |
312
+ | GENRE / mBART-large | 69.6 ± 1.0 | 69.3 ± 5.4 | 59.8 ± 2.7 | 58.7 ± 2.7 | 66.0 ± 2.3 |
313
+ | GENRE / Llama-1B | 73.1 ± 1.0 | 75.1 ± 3.6 | 60.5 ± 2.4 | 62.5 ± 2.3 | 67.4 ± 2.1 |
314
+ | GENRE / Llama-8B | 75.0 ± 0.9 | 73.8 ± 4.0 | 61.7 ± 2.5 | 63.2 ± 2.5 | 68.3 ± 2.2 |
315
+ | **Global-Context BEL: LongBEL** ||||| |
316
+ | LongBEL-1B | 77.6 ± 0.9 | 74.5 ± 3.7 | 59.8 ± 2.5 | 61.9 ± 2.4 | 66.6 ± 2.1 |
317
+ | LongBEL-1B + Ensemble | 78.6 ± 0.8 | <u>77.2 ± 3.0</u> | 61.8 ± 2.5 | <u>64.3 ± 2.2</u> | <u>69.0 ± 2.0</u> |
318
+ | **⭐ LongBEL-8B** | <u>79.3 ± 0.8</u> | 75.4 ± 3.4 | <u>62.0 ± 2.6</u> | 63.6 ± 2.1 | <u>69.0 ± 2.1</u> |
319
+ | LongBEL-8B + Ensemble | **80.0 ± 0.8** | **77.6 ± 3.0** | **63.3 ± 2.5** | **65.8 ± 2.2** | **71.0 ± 2.0** |
320
+
321
+ The score reported for this checkpoint is the **single LongBEL-8B model**. The ensemble result requires fusing several LongBEL input configurations and is not produced by this checkpoint alone.
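For readers who want to compute the same kind of score on their own predictions, Recall@1 with a bootstrap interval can be sketched as below. This is an illustration under assumptions, not the evaluation script behind the table: it uses a percentile bootstrap over mentions with 1000 resamples, and the exact procedure used for the reported intervals is not specified here. The codes in the toy data are placeholders.

```python
import random


def recall_at_1(gold, pred):
    """Fraction of mentions whose top-ranked predicted code equals the gold code."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def bootstrap_ci(gold, pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Recall@1, resampling mentions with replacement."""
    rng = random.Random(seed)
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(sum(gold[i] == pred[i] for i in idx) / n)
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data: placeholder codes, three of four top-1 predictions are correct.
gold = ["C1", "C2", "C3", "C4"]
pred = ["C1", "C2", "C3", "C9"]
score = recall_at_1(gold, pred)
lower, upper = bootstrap_ci(gold, pred)
print(score, lower, upper)
```

With only four mentions the interval is very wide; on a full test set it narrows to the order of magnitude shown in the table.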
322
+
323
+ ## Speed and Memory
324
+
325
+ Measured on a single NVIDIA H100 80GB GPU.
326
+
327
+ | Model | Model memory | Candidate memory | Speed |
328
+ | ----------------------- | -----------: | ---------------: | --------------: |
329
+ | GENRE-Llama-8B baseline | 28.6 GB | 5.4 GB | 38.2 mentions/s |
330
+ | LongBEL-8B | 28.6 GB | 5.4 GB | 15.2 mentions/s |
331
+
332
+ LongBEL has the same model memory footprint as the sentence-level Llama-8B baseline, but it is slower because it processes longer contexts and updates document-level memory during inference.
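The throughput figures translate into wall-clock time as follows. The two speeds come from the table above; the workload size of 10,000 mentions is a hypothetical value chosen only to make the arithmetic concrete.

```python
baseline_speed = 38.2  # mentions/s, GENRE-Llama-8B baseline (from the table)
longbel_speed = 15.2   # mentions/s, LongBEL-8B (from the table)

slowdown = baseline_speed / longbel_speed
print(f"slowdown: {slowdown:.2f}x")  # slowdown: 2.51x

n_mentions = 10_000  # hypothetical workload size
baseline_seconds = n_mentions / baseline_speed
longbel_seconds = n_mentions / longbel_speed
print(f"baseline: {baseline_seconds:.0f} s, LongBEL: {longbel_seconds:.0f} s")
# baseline: 262 s, LongBEL: 658 s
```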
333
+
334
+ ## Limitations
335
+
336
+ This model assumes that mention spans and semantic groups are given. It does not perform mention detection.
337
+
338
+ LongBEL is most useful when concepts recur within a document. When most concepts appear only once, the memory mechanism has less information to exploit.
339
+
340
+ Because LongBEL uses previous predictions as memory, early mistakes can still influence later predictions. Robust memory training reduces this risk but does not remove it completely.
341
+
342
+ This model is intended for research use. It should not be used for clinical decision-making without additional validation and human oversight.
343
+
344
+ ## Reproducibility
345
+
346
+ Code and evaluation scripts are available in this [anonymized repository](https://anonymous.4open.science/r/LongBEL-31AD).
347
+
348
+ Trained model checkpoints and processed datasets are available in the anonymous Hugging Face collection associated with LongBEL.
__init__.py ADDED
@@ -0,0 +1,4 @@
1
+ # __init__.py
2
+ from .longbel import LLamaLongBEL, LLamaLongBELConfig
3
+
4
+ __all__ = ["LLamaLongBEL", "LLamaLongBELConfig"]
candidate_trie.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:88d7443bd939bd25b0672b560c882d6bfc3427fe0637224686313124f60cd6a5
3
+ size 164120599
chat_template.jinja ADDED
@@ -0,0 +1,5 @@
1
+ {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
2
+
3
+ '+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>
4
+
5
+ ' }}
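What this template produces is easier to see in plain Python. The sketch below mirrors the Jinja logic step by step; `render_chat` is an illustrative helper, and the default `bos_token` value is an assumption (the Llama 3 begin-of-text token), since the tokenizer configuration is not shown in this part of the diff.

```python
def render_chat(messages, bos_token="<|begin_of_text|>"):
    """Plain-Python equivalent of the Jinja chat template (illustrative helper)."""
    parts = []
    for i, message in enumerate(messages):
        # Jinja's `trim` filter applies to the message content only.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip() + "<|eot_id|>"
        )
        if i == 0:
            content = bos_token + content  # BOS is added once, before the first message
        parts.append(content)
    # The template always ends with the assistant header as a generation prompt.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


rendered = render_chat([{"role": "user", "content": " hi \n"}])
print(rendered)
```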
config.json ADDED
@@ -0,0 +1,40 @@
1
+ {
2
+ "architectures": [
3
+ "LLamaLongBEL"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 128000,
8
+ "dtype": "bfloat16",
9
+ "eos_token_id": 128009,
10
+ "head_dim": 128,
11
+ "hidden_act": "silu",
12
+ "hidden_size": 4096,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 14336,
15
+ "max_position_embeddings": 131072,
16
+ "mlp_bias": false,
17
+ "model_type": "llama_longbel",
18
+ "auto_map": {
19
+ "AutoConfig": "longbel.LLamaLongBELConfig",
20
+ "AutoModelForCausalLM": "longbel.LLamaLongBEL"
21
+ },
22
+ "num_attention_heads": 32,
23
+ "num_hidden_layers": 32,
24
+ "num_key_value_heads": 8,
25
+ "pad_token_id": 128009,
26
+ "pretraining_tp": 1,
27
+ "rms_norm_eps": 1e-05,
28
+ "rope_scaling": {
29
+ "factor": 8.0,
30
+ "high_freq_factor": 4.0,
31
+ "low_freq_factor": 1.0,
32
+ "original_max_position_embeddings": 8192,
33
+ "rope_type": "llama3"
34
+ },
35
+ "rope_theta": 500000.0,
36
+ "tie_word_embeddings": false,
37
+ "transformers_version": "4.57.1",
38
+ "use_cache": true,
39
+ "vocab_size": 128257
40
+ }
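Since LongBEL feeds whole documents plus previous normalizations to the model, the key/value cache grows with prompt length. The attention settings in this config give a quick estimate. This is a back-of-the-envelope sketch that assumes the cache is stored in bfloat16 (the `dtype` in the config); the 8192-token prompt length is a hypothetical value, and beam search multiplies the figure by the number of beams.

```python
num_hidden_layers = 32
num_key_value_heads = 8   # grouped-query attention: 8 KV heads for 32 query heads
head_dim = 128
bytes_per_value = 2       # bfloat16 (assumed cache dtype)

# Keys and values are both cached, hence the leading factor 2.
kv_bytes_per_token = 2 * num_hidden_layers * num_key_value_heads * head_dim * bytes_per_value
print(kv_bytes_per_token)  # 131072 bytes, i.e. 128 KiB per token

context_tokens = 8192  # hypothetical document-plus-memory prompt length
kv_gib = kv_bytes_per_token * context_tokens / 2**30
print(kv_gib)  # 1.0 GiB per sequence
```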
generation_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "bos_token_id": 128000,
3
+ "do_sample": true,
4
+ "eos_token_id": [
5
+ 128009,
6
+ 128001,
7
+ 128008,
8
+ 128009
9
+ ],
10
+ "pad_token_id": 128009,
11
+ "temperature": 0.6,
12
+ "top_p": 0.9,
13
+ "transformers_version": "4.57.1"
14
+ }
longbel.py ADDED
@@ -0,0 +1,905 @@
+ """
+ Core models for LongBEL
+ """
+ # Copyright (c) Facebook, Inc. and its affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ import json
+ import logging
+ import os
+ import pickle
+ import re
+ from html import escape
+ from typing import Optional
+
+ import torch
+ import torch.nn.functional as F
+ from huggingface_hub import hf_hub_download
+ from IPython.display import HTML, display
+ from tqdm.auto import tqdm
+ from transformers import (
+     AutoTokenizer,
+     LlamaForCausalLM,
+     PretrainedConfig,
+ )
+
+ logger = logging.getLogger(__name__)
+ logging.basicConfig(
+     level=logging.INFO,  # Display INFO and above
+     format="%(levelname)s - %(message)s",
+ )
+
+
+ # Define a simple config class that inherits from PretrainedConfig
+ class LLamaLongBELConfig(PretrainedConfig):
+     model_type = "llama_longbel"
+
+     def __init__(self, **kwargs):
+         # Ensure it has llama as base
+         kwargs.setdefault("model_type", "llama")
+         super().__init__(**kwargs)
+
+
+ def clean_natural(text):
+     return (
+         text.replace("\xa0", " ")
+         .replace("{", "(")
+         .replace("}", ")")
+         .replace("[", "(")
+         .replace("]", ")")
+         .replace("\n", " ")
+     )
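Every replacement in `clean_natural` swaps one character for one character, so character offsets computed on the raw text remain valid on the cleaned text. The brackets are presumably removed because they double as markers in the prompt (the decoding code further down looks up `}` as the end-of-group token). The short check below repeats the function so that it runs on its own; the input string is a made-up example.

```python
def clean_natural(text):
    # Same replacements as in longbel.py, repeated here so the example is self-contained.
    return (
        text.replace("\xa0", " ")
        .replace("{", "(")
        .replace("}", ")")
        .replace("[", "(")
        .replace("]", ")")
        .replace("\n", " ")
    )


raw = "IL-2 [interleukin]\n{high}\xa0dose"
cleaned = clean_natural(raw)
print(cleaned)  # IL-2 (interleukin) (high) dose
print(len(cleaned) == len(raw))  # True: offsets are preserved
```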
+
+
+ def parse_text(
+     data,
+     start_entity,
+     end_entity,
+     start_group,
+     end_group,
+ ) -> tuple[list[str], list[str], list[dict[str, str]]]:
+     """Create one (source, target) pair per entity of a BigBio page.
+
+     For each entity, returns:
+     - source: the full document text (the entity's passage with the mention
+       wrapped in start_entity/end_entity, plus the other passages as context)
+     - target: the prompt prefix
+       start_entity + mention + end_entity + start_group + semantic group + end_group
+     - tsv line: metadata for the mention (document id, span, semantic group,
+       and the gold concept when the entity is already normalized)
+     """
+     source_sentences: list[str] = []
+     tsv_lines: list[dict[str, str]] = []
+     target_texts_dict: dict[tuple[tuple[int, int], ...], str] = {}
+     source_texts_dict: dict[tuple[tuple[int, int], ...], str] = {}
+     tsv_lines_dict: dict[tuple[tuple[int, int], ...], dict[str, str]] = {}
+     all_passages = {}
+     for i, passage in enumerate(data.get("passages", [])):
+         all_passages[i] = clean_natural(passage["text"][0])
+     for passage_id, passage in enumerate(data.get("passages", [])):
+         passage_text = passage["text"][0]
+         start_offset_passage = passage["offsets"][0][0]
+         end_offset_passage = passage["offsets"][0][1]
+
+         passage_text = clean_natural(passage_text)
+
+         # Iterate over entities and emit one pair per entity found in this passage
+         for entity in data.get("entities", []):
+             # min and max of all entity offsets to get the global span of the entity for filtering sentences
+             global_start = min(off[0] for off in entity["offsets"])
+             global_end = max(off[1] for off in entity["offsets"])
+             # Keep only entities whose start falls inside this passage
+             if not (start_offset_passage <= global_start < end_offset_passage):
+                 continue
+             entity_text = " ".join(entity["text"])
+             entity_text = clean_natural(entity_text)
+             # Define entity group
+             group_annotation = entity.get("type")
+             # Get all offsets, convert to relative, and filter for this sentence
+             relative_entity_spans = []
+             for off in entity["offsets"]:
+                 global_start_off, global_end_off = off
+                 if not (start_offset_passage <= global_start_off < end_offset_passage):
+                     continue
+
+                 rel_start_off = global_start_off - start_offset_passage
+                 rel_end_off = global_end_off - start_offset_passage
+                 relative_entity_spans.append((rel_start_off, rel_end_off))
+             relative_entity_spans.sort(key=lambda x: x[0])
+
+             # NOTE: spans are marked left to right without re-adjusting offsets, so for a
+             # discontinuous entity the markers of later spans land a few characters early.
+             marked_text = passage_text
+             for start_in_sent, end_in_sent in relative_entity_spans:
+                 marked_text = (
+                     marked_text[:start_in_sent]
+                     + start_entity
+                     + marked_text[start_in_sent:end_in_sent]
+                     + end_entity
+                     + marked_text[end_in_sent:]
+                 )
+
+             # NOTE: earlier passages are prepended one by one, so they end up in
+             # reverse order before the current passage.
+             for other_passage_id, other_passage_text in all_passages.items():
+                 if other_passage_id < passage_id:
+                     marked_text = other_passage_text + "\n" + marked_text
+                 elif other_passage_id > passage_id:
+                     marked_text = marked_text + "\n" + other_passage_text
+             # Emit the pair
+             doc_id = data.get("id", "")
+             tsv_line = {
+                 "doc_id": doc_id,
+                 "semantic_group": group_annotation,
+                 "start_span": global_start,
+                 "end_span": global_end,
+                 "mention": entity_text,
+             }
+             if entity.get("normalized"):
+                 tsv_line["gold_concept_code"] = entity["normalized"][0]["db_id"]
+                 tsv_line["gold_concept_name"] = entity["normalized"][0]["db_match"]
+
+             tsv_lines_dict[(global_start, global_end)] = tsv_line
+             source_texts_dict[(global_start, global_end)] = marked_text
+             target_entity_text = (
+                 start_entity
+                 + entity_text
+                 + end_entity
+                 + start_group
+                 + group_annotation
+                 + end_group
+             )
+             target_texts_dict[(global_start, global_end)] = target_entity_text
+     # Sort keys to have a deterministic order
+     target_texts = []
+     sorted_keys = sorted(tsv_lines_dict.keys(), key=lambda x: (x[0], x[1]))
+     for entity_id, entity_span in enumerate(sorted_keys):
+         tsv_line = tsv_lines_dict[entity_span]
+         tsv_line["mention_id"] = f"{data.get('id', '')}.{entity_id + 1}"
+         tsv_lines.append(tsv_line)
+         source_sentences.append(source_texts_dict[entity_span])
+         target_texts.append(target_texts_dict[entity_span])
+
+     return source_sentences, target_texts, tsv_lines  # type: ignore
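The marker loop in `parse_text` inserts `start_entity` and `end_entity` around each span from left to right. Each insertion lengthens the string, so for an entity with several spans the later offsets no longer point at the intended characters. One common way around this is to insert from right to left. The sketch below shows that alternative; `mark_spans` is a hypothetical helper, the bracket markers are assumed values, and this is not the behavior of the committed code, which the released checkpoint may depend on.

```python
def mark_spans(text, spans, start_marker="[", end_marker="]"):
    """Wrap each (start, end) span of `text` in markers (illustrative helper).

    Spans are processed from right to left so that inserting markers never
    shifts the offsets of the spans that are still to be processed.
    """
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + start_marker + text[start:end] + end_marker + text[end:]
    return text


text = "severe hypertension and renal failure"
marked = mark_spans(text, [(0, 6), (24, 29)])
print(marked)  # [severe] hypertension and [renal] failure
```

For entities with a single span, which is the common case, both orders give the same result.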
160
+
161
+
162
+ def get_prefix_allowed_tokens_fn(
163
+ model,
164
+ sources: list[str],
165
+ sem_groups: list[str],
166
+ multiple_answers: bool = False,
167
+ ):
168
+ candidates_trie = model.candidate_trie # type: ignore
169
+ sep_token_id = model.tokenizer.sep_token_id
170
+ eos_token_id = model.tokenizer.eos_token_id
171
+ pad_token_id = model.tokenizer.pad_token_id
172
+ plus_token_id = model.tokenizer.convert_tokens_to_ids("<+>") # type: ignore
173
+ end_group_token_id = model.tokenizer.convert_tokens_to_ids("}") # type: ignore
174
+
175
+ def prefix_allowed_tokens_fn(batch_id, sent):
176
+ sent = sent.tolist()
177
+ if len(sent) > 1 and sent[-1] in [eos_token_id, pad_token_id, sep_token_id]:
178
+ if sep_token_id:
179
+ return [sep_token_id, pad_token_id, eos_token_id]
180
+ else:
181
+ return [pad_token_id, eos_token_id]
182
+
183
+ # Remove the prefix from the sent
184
+ index_sep = len(sent) - 1 - sent[::-1].index(end_group_token_id)
185
+ sent = sent[index_sep:]
186
+
187
+ sem_group = sem_groups[batch_id]
188
+ # With multiple answers, keep only the tokens after the last plus_token_id and re-add the prefix
189
+ if multiple_answers and plus_token_id in sent:
190
+ index_plus = len(sent) - 1 - sent[::-1].index(plus_token_id)
191
+ # Start fresh with decoder start
192
+ if index_plus == len(sent) - 1:
193
+ sent = [end_group_token_id]
194
+ # If there are tokens after the last plus_token_id, keep them
195
+ else:
196
+ sent = [end_group_token_id] + sent[index_plus + 1 :]
197
+ trie_out = candidates_trie[
198
+ sem_group # type: ignore
199
+ ].get(sent)
200
+ if eos_token_id in trie_out:
201
+ if sep_token_id:
202
+ trie_out += [sep_token_id]
203
+ if multiple_answers:
204
+ trie_out += [plus_token_id]
205
+ elif not trie_out:
206
+ if sep_token_id:
207
+ return [sep_token_id, pad_token_id, eos_token_id]
208
+ else:
209
+ return [pad_token_id, eos_token_id]
210
+ return trie_out
211
+
212
+ return prefix_allowed_tokens_fn
213
+
214
+
215
+ def add_headers_to_prompt(source: str, target: str, previous_targets: str):
216
+ if not previous_targets:
217
+ previous_targets = "None"
218
+ input_sentence = f"### Context\n{source.rstrip()}\n\n### Previous Normalizations\n{previous_targets.rstrip()}\n\n### Prediction\n{target.rstrip()}"
219
+ return input_sentence
220
+
221
+
222
+ def parse_prediction(
223
+ outputs: list[str],
224
+ sem_groups: list[str],
225
+ text_to_code: Optional[dict[str, dict[str, str]]] = None,
226
+ multiple_answers: bool = False,
227
+ ) -> tuple[list[str], list[str]]:
228
+ codes = []
229
+ predictions = []
230
+ for output, group in zip(outputs, sem_groups):
231
+ splits = output.split("} ") # type: ignore
232
+ if len(splits) > 1 and splits[-1].strip():
233
+ prediction = splits[-1].strip().replace("<SEP>", "")
234
+ if text_to_code:
235
+ if multiple_answers:
236
+ prediction_list = prediction.split("<+>") # type: ignore
237
+ code_list = set()
238
+ for pred in prediction_list:
239
+ code_list.add(text_to_code[group].get(pred.strip(), "NO_CODE"))
240
+ if len(code_list) > 1 and "NO_CODE" in code_list:
241
+ code_list.remove("NO_CODE")
242
+ code = "+".join(sorted(code_list))  # sort: set iteration order is not deterministic
243
+ else:
244
+ code = text_to_code[group].get(prediction, "NO_CODE")
245
+ else:
246
+ code = "NO_CODE"
247
+ else:
248
+ print(
248
+ "Splitting failed or prediction was empty; using NO_PREDICTION as the prediction."
249
+ )
251
+ prediction = "NO_PREDICTION"
252
+ code = "NO_CODE"
253
+ codes.append(code)
254
+ predictions.append(prediction)
255
+ return codes, predictions
256
+
257
+
258
+ def compute_score(outputs, tokenizer, prefix_len=0):
259
+ sequences = outputs.sequences # (N, seq_len)
260
+ scores = outputs.scores # list length T = # generated tokens
261
+
262
+ N, total_len = sequences.shape
263
+ T = len(scores)
264
+
265
+ # keep only the generated part (completion)
266
+ sequences = sequences[:, prefix_len : prefix_len + T]
267
+
268
+ # Truncate scores so it has no more steps than sequences has tokens
269
+ if len(scores) > sequences.size(1):
270
+ scores = scores[: sequences.size(1)]
271
+
272
+ # Compute as usual but now only for completion tokens
273
+ mask = (
274
+ (sequences != tokenizer.pad_token_id)
275
+ & (sequences != tokenizer.eos_token_id)
276
+ & (sequences != tokenizer.bos_token_id)
277
+ )
278
+
279
+ # log-prob for each generated token
280
+ logprob_steps = []
281
+ for t, logits in enumerate(scores):
282
+ log_probs_t = F.log_softmax(logits, dim=-1)
283
+ token_t = sequences[:, t]
284
+ idx = torch.arange(N, device=sequences.device)
285
+ logprob_steps.append(log_probs_t[idx, token_t])
286
+
287
+ logprobs = torch.stack(logprob_steps, dim=1)
288
+ logprobs.masked_fill_(~mask, 0)
289
+
290
+ lengths = mask.sum(dim=1).clamp(min=1)
291
+ confidence = torch.exp(logprobs.sum(dim=1) / lengths)
292
+
293
+ return confidence.tolist()
294
+
295
+
296
+ def skip_undesired_tokens(outputs, tokenizer):
297
+ sep_token = "<SEP>"
298
+ plus_token = "<+>"
299
+ # Build the list of special tokens to remove
300
+ tokens_to_remove = tokenizer.all_special_tokens[:2]
301
+
302
+ cleaned_outputs = []
303
+ for sequence in outputs:
304
+ # Remove undesired special tokens
305
+ for token in tokens_to_remove:
306
+ sequence = sequence.replace(token, "")
307
+
308
+ # Remove spaces *immediately* after the sep_token and plus_token (e.g. "<SEP> text" → "<SEP>text")
309
+ sequence = re.sub(rf"({re.escape(plus_token)})\s+", r"\1", sequence)
310
+ sequence = re.sub(rf"({re.escape(sep_token)})\s+", r"\1", sequence)
311
+
312
+ cleaned_outputs.append(sequence.strip())
313
+
314
+ return cleaned_outputs
315
+
316
+
317
+ def _score_to_rgb(score: float) -> tuple[int, int, int]:
318
+ clipped_score = max(0.0, min(1.0, score))
319
+ red = 255
320
+ channel = int(255 * (1.0 - clipped_score))
321
+ return red, channel, channel
322
+
323
+
324
+ def _build_ansi_saliency_text(
325
+ token_texts: list[str], saliency_scores: list[float]
326
+ ) -> str:
327
+ chunks = []
328
+ for token_text, score in zip(token_texts, saliency_scores):
329
+ red, green, blue = _score_to_rgb(score)
330
+ chunks.append(f"\x1b[48;2;{red};{green};{blue}m{token_text}\x1b[0m")
331
+ return "".join(chunks)
332
+
333
+
334
+ def _build_html_saliency_text(
335
+ token_texts: list[str], saliency_scores: list[float]
336
+ ) -> str:
337
+ chunks = []
338
+ for token_text, score in zip(token_texts, saliency_scores):
339
+ red, green, blue = _score_to_rgb(score)
340
+ chunks.append(
341
+ f'<span style="background-color: rgb({red}, {green}, {blue});">{escape(token_text)}</span>'
342
+ )
343
+ return "".join(chunks)
344
+
345
+
346
+ class LLamaLongBEL(LlamaForCausalLM):
347
+ config_class = LLamaLongBELConfig
348
+
349
+ def __init__(self, config, *args, **kwargs):
350
+ # Initialize the parent LlamaForCausalLM
351
+ super().__init__(config, *args, **kwargs)
352
+
353
+ self.text_to_code = None
354
+ self.candidate_trie = None
355
+ self.tokenizer = None
356
+
357
+ @classmethod
358
+ def from_pretrained(
359
+ cls,
360
+ pretrained_model_name_or_path,
361
+ *args,
362
+ text_to_code_path=None,
363
+ candidate_trie_path=None,
364
+ **kwargs,
365
+ ):
366
+ # Remove custom kwargs before passing to parent
367
+ custom_kwargs = {
368
+ "text_to_code_path": text_to_code_path,
369
+ "candidate_trie_path": candidate_trie_path,
370
+ }
371
+
372
+ # Call parent's from_pretrained
373
+ model = super().from_pretrained(
374
+ pretrained_model_name_or_path,
375
+ *args,
376
+ **{k: v for k, v in kwargs.items() if k not in custom_kwargs},
377
+ )
378
+
379
+ # Set up tokenizer
380
+ model.tokenizer = AutoTokenizer.from_pretrained(
381
+ pretrained_model_name_or_path, use_fast=True
382
+ )
383
+ model.tokenizer.padding_side = "left"
384
+
385
+ # Load text_to_code
386
+ text_to_code_file_local = (
387
+ text_to_code_path
388
+ if text_to_code_path is not None
389
+ else os.path.join(pretrained_model_name_or_path, "text_to_code.json")
390
+ )
391
+ try:
392
+ if os.path.exists(text_to_code_file_local):
393
+ with open(text_to_code_file_local, encoding="utf-8") as f:
394
+ model.text_to_code = json.load(f)
395
+ logger.info(
396
+ f"Loaded text_to_code.json from local path: {text_to_code_file_local}"
397
+ )
398
+ else:
399
+ text_to_code_path_hf = hf_hub_download(
400
+ repo_id=pretrained_model_name_or_path,
401
+ filename="text_to_code.json",
402
+ )
403
+ with open(text_to_code_path_hf, encoding="utf-8") as f:
404
+ model.text_to_code = json.load(f)
405
+ logger.info(
406
+ f"Loaded text_to_code.json from HF Hub: {text_to_code_path_hf}"
407
+ )
408
+ except Exception:
409
+ logger.warning("text_to_code.json not found (local or HF hub)")
410
+ model.text_to_code = None
411
+
412
+ # Load candidate_trie
413
+ candidate_trie_file_local = (
414
+ candidate_trie_path
415
+ if candidate_trie_path is not None
416
+ else os.path.join(pretrained_model_name_or_path, "candidate_trie.pkl")
417
+ )
418
+ try:
419
+ if os.path.exists(candidate_trie_file_local):
420
+ with open(candidate_trie_file_local, "rb") as f:
421
+ model.candidate_trie = pickle.load(f)
422
+ logger.info(
423
+ f"Loaded candidate_trie.pkl from local path: {candidate_trie_file_local}"
424
+ )
425
+ else:
426
+ candidate_trie_path_hf = hf_hub_download(
427
+ repo_id=pretrained_model_name_or_path,
428
+ filename="candidate_trie.pkl",
429
+ )
430
+ with open(candidate_trie_path_hf, "rb") as f:
431
+ model.candidate_trie = pickle.load(f)
432
+ logger.info(
433
+ f"Loaded candidate_trie.pkl from HF Hub: {candidate_trie_path_hf}"
434
+ )
435
+ except Exception:
436
+ logger.warning("candidate_trie.pkl not found (local or HF hub)")
437
+ model.candidate_trie = None
438
+
439
+ return model
440
+
441
+ def _compute_gradient_saliency(
442
+ self,
443
+ input_sentences: list[str],
444
+ generated_sequences: torch.Tensor,
445
+ num_beams: int,
446
+ prefix_len: int,
447
+ ) -> list[dict[str, object]]:
448
+ if not input_sentences:
449
+ return []
450
+
451
+ top_sequence_indices = (
452
+ torch.arange(
453
+ len(input_sentences),
454
+ device=generated_sequences.device,
455
+ )
456
+ * num_beams
457
+ )
458
+ top_sequences = generated_sequences.index_select(0, top_sequence_indices)
459
+
460
+ attention_mask = (top_sequences != self.tokenizer.pad_token_id).long() # type: ignore
461
+ input_embeddings = self.get_input_embeddings()(top_sequences).detach() # type: ignore
462
+
463
+ next_tokens = top_sequences[:, 1:]
464
+ output_token_mask = torch.zeros_like(next_tokens, dtype=torch.bool)
465
+ if prefix_len > 0:
466
+ output_token_mask[:, prefix_len - 1 :] = True
467
+
468
+ valid_token_mask = output_token_mask & (
469
+ (next_tokens != self.tokenizer.pad_token_id) # type: ignore
470
+ & (next_tokens != self.tokenizer.eos_token_id) # type: ignore
471
+ & (next_tokens != self.tokenizer.bos_token_id) # type: ignore
472
+ )
473
+
474
+ def _objective_from_embeddings(embeddings: torch.Tensor) -> torch.Tensor:
475
+ forward_outputs = self( # type: ignore
476
+ inputs_embeds=embeddings,
477
+ attention_mask=attention_mask,
478
+ use_cache=False,
479
+ return_dict=True,
480
+ )
481
+ logits = forward_outputs.logits[:, :-1, :]
482
+ log_probs = F.log_softmax(logits, dim=-1)
483
+ token_log_probs = log_probs.gather(
484
+ dim=-1,
485
+ index=next_tokens.unsqueeze(-1),
486
+ ).squeeze(-1)
487
+ return token_log_probs.masked_select(valid_token_mask).sum()
488
+
489
+ simple_embeddings = input_embeddings.detach()
490
+ simple_embeddings.requires_grad_(True)
491
+ self.zero_grad(set_to_none=True) # type: ignore
492
+ with torch.enable_grad():
493
+ objective = _objective_from_embeddings(simple_embeddings)
494
+ gradients = torch.autograd.grad(
495
+ outputs=objective,
496
+ inputs=simple_embeddings,
497
+ retain_graph=False,
498
+ create_graph=False,
499
+ )[0]
500
+ token_importance = gradients.norm(p=2, dim=-1)
501
+ saliency_maps = []
502
+ sequence_len = top_sequences.size(1)
503
+ prompt_positions = torch.arange(sequence_len, device=top_sequences.device)
504
+ prompt_mask = (prompt_positions.unsqueeze(0) < prefix_len) & (
505
+ top_sequences != self.tokenizer.pad_token_id # type: ignore
506
+ )
507
+
508
+ for sequence_ids, importance_scores, sentence, mask in zip(
509
+ top_sequences,
510
+ token_importance,
511
+ input_sentences,
512
+ prompt_mask,
513
+ ):
514
+ selected_ids = sequence_ids[mask]
515
+ selected_scores = importance_scores[mask]
516
+
517
+ if selected_scores.numel() == 0:
518
+ saliency_maps.append({
519
+ "input_sentence": sentence,
520
+ "token_ids": [],
521
+ "token_strings": [],
522
+ "saliency_scores": [],
523
+ "saliency_ansi": "",
524
+ "saliency_html": "",
525
+ })
526
+ continue
527
+
528
+ max_score = selected_scores.max().clamp(min=1e-12)
529
+ normalized_scores = (selected_scores / max_score).tolist()
530
+ selected_ids_list = selected_ids.tolist()
531
+ token_strings = [
532
+ self.tokenizer.decode( # type: ignore
533
+ [token_id],
534
+ skip_special_tokens=False,
535
+ clean_up_tokenization_spaces=False,
536
+ )
537
+ for token_id in selected_ids_list
538
+ ]
539
+
540
+ saliency_maps.append({
541
+ "input_sentence": sentence,
542
+ "token_ids": selected_ids_list,
543
+ "token_strings": token_strings,
544
+ "saliency_scores": normalized_scores,
545
+ "saliency_ansi": _build_ansi_saliency_text(
546
+ token_strings,
547
+ normalized_scores,
548
+ ),
549
+ "saliency_html": _build_html_saliency_text(
550
+ token_strings,
551
+ normalized_scores,
552
+ ),
553
+ })
554
+
555
+ return saliency_maps
556
+
557
+ def display_saliency_map(self, saliency_map):
558
+ saliency_html = re.sub(
559
+ r"<span[^>]*>\s*&lt;\|begin_of_text\|&gt;\s*</span>",
560
+ "",
561
+ saliency_map["saliency_html"],
562
+ count=1,
563
+ )
564
+ pred_name = escape(str(saliency_map.get("pred_concept_name", "")))
565
+ pred_code = escape(str(saliency_map.get("pred_concept_code", "")))
566
+ full_html = f"""
567
+ <div style="
568
+ font-family: Times New Roman, Times, serif, monospace;
569
+ font-size: 18px;
570
+ line-height: 1.6;
571
+ white-space: pre-wrap;
572
+ border: 1px solid #ddd;
573
+ border-radius: 8px;
574
+ padding: 12px;
575
+ background: #fafafa;
576
+ ">{saliency_html} → {pred_name} ({pred_code})</div>
577
+ """
578
+ display(HTML(full_html))
579
+
580
+ def predict_batch(
581
+ self,
582
+ all_outputs,
583
+ batch_size,
584
+ input_sentences,
585
+ sem_groups,
586
+ mentions,
587
+ mentions_id,
588
+ doc_ids,
589
+ start_spans,
590
+ end_spans,
591
+ gold_concept_codes,
592
+ gold_concept_names,
593
+ constrained,
594
+ multiple_answers,
595
+ num_beams,
596
+ with_saliency_maps: bool = False,
597
+ **kwargs,
598
+ ):
599
+ input_args = {
600
+ k: v.to(self.device) # type: ignore
601
+ for k, v in self.tokenizer.batch_encode_plus( # type: ignore
602
+ input_sentences, padding="longest", return_tensors="pt"
603
+ ).items()
604
+ }
605
+
606
+ # Constrained decoding
607
+ prefix_allowed_tokens_fn = None
608
+ if constrained:
609
+ if self.candidate_trie is None: # type: ignore
610
+ raise ValueError(
611
+ "candidate_trie is not loaded in the model. Use constrained=False."
612
+ )
613
+ prefix_allowed_tokens_fn = get_prefix_allowed_tokens_fn(
614
+ model=self,
615
+ sources=input_sentences,
616
+ sem_groups=sem_groups,
617
+ multiple_answers=multiple_answers,
618
+ )
619
+ if self.tokenizer.sep_token_id: # type: ignore
620
+ eos_token_id = self.tokenizer.sep_token_id # type: ignore
621
+ else:
622
+ eos_token_id = self.tokenizer.eos_token_id # type: ignore
623
+ outputs = self.generate( # type: ignore
624
+ **input_args,
625
+ max_new_tokens=128,
626
+ num_beams=num_beams,
627
+ num_return_sequences=num_beams,
628
+ output_scores=True,
629
+ return_dict_in_generate=True,
630
+ prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
631
+ eos_token_id=eos_token_id, # type: ignore
632
+ **kwargs,
633
+ )
634
+ decoded_sequences = self.tokenizer.batch_decode( # type: ignore
635
+ outputs.sequences, # type: ignore
636
+ skip_special_tokens=False,
637
+ clean_up_tokenization_spaces=True,
638
+ )
639
+ cleaned_output_sequences = skip_undesired_tokens(
640
+ decoded_sequences,
641
+ self.tokenizer, # type: ignore
642
+ )
643
+
644
+ prefix_len = input_args["input_ids"].size(1)
645
+
646
+ base_sem_groups = sem_groups.copy()
647
+ base_mentions = mentions.copy()
648
+ base_mentions_id = mentions_id.copy()
649
+ base_doc_ids = doc_ids.copy()
650
+ base_start_spans = start_spans.copy()
651
+ base_end_spans = end_spans.copy()
652
+ base_gold_concept_codes = gold_concept_codes.copy()
653
+ base_gold_concept_names = gold_concept_names.copy()
654
+
655
+ # Duplicate sem_groups and mentions for each beam
656
+ sem_groups = [x for x in sem_groups for _ in range(num_beams)]
657
+ mentions = [x for x in mentions for _ in range(num_beams)]
658
+ mentions_id = [x for x in mentions_id for _ in range(num_beams)]
659
+ gold_concept_codes = [x for x in gold_concept_codes for _ in range(num_beams)] # type: ignore
660
+ gold_concept_names = [x for x in gold_concept_names for _ in range(num_beams)] # type: ignore
661
+ start_spans = [x for x in start_spans for _ in range(num_beams)]
662
+ end_spans = [x for x in end_spans for _ in range(num_beams)]
663
+ doc_ids = [x for x in doc_ids for _ in range(num_beams)]
664
+ # Parse predictions
665
+ pred_concept_codes, pred_concept_names = parse_prediction(
666
+ cleaned_output_sequences,
667
+ sem_groups,
668
+ self.text_to_code, # type: ignore
669
+ multiple_answers=multiple_answers,
670
+ )
671
+ scores = compute_score(
672
+ outputs,
673
+ self.tokenizer, # type: ignore
674
+ prefix_len=prefix_len,
675
+ )
676
+ beam_scores = [
677
+ float(torch.exp(s)) if num_beams > 1 else float("nan")
678
+ for s in (
679
+ outputs.sequences_scores # type: ignore
680
+ if num_beams > 1
681
+ else [torch.tensor(float("nan"))] * len(scores)
682
+ )
683
+ ]
684
+ all_outputs.extend([
685
+ {
686
+ "mention": mention,
687
+ "doc_id": doc_id,
688
+ "mention_id": mention_id,
689
+ "start_span": start_span,
690
+ "end_span": end_span,
691
+ "semantic_group": group,
692
+ "gold_concept_code": gold_concept_code,
693
+ "gold_concept_name": gold_concept_name,
694
+ "pred_concept_name": pred_concept_name,
695
+ "pred_concept_code": pred_concept_code,
696
+ "score": score,
697
+ "beam_score": beam_score,
698
+ "rank": rank + 1,
699
+ }
700
+ for score, beam_score, pred_concept_code, pred_concept_name, mention, doc_id, mention_id, start_span, end_span, group, gold_concept_code, gold_concept_name, rank in zip(
701
+ scores,
702
+ beam_scores,
703
+ pred_concept_codes,
704
+ pred_concept_names,
705
+ mentions,
706
+ doc_ids,
707
+ mentions_id,
708
+ start_spans,
709
+ end_spans,
710
+ sem_groups,
711
+ gold_concept_codes,
712
+ gold_concept_names,
713
+ list(range(num_beams)) * batch_size,
714
+ )
715
+ ])
716
+
717
+ saliency_maps = []
718
+ if with_saliency_maps:
719
+ saliency_maps = self._compute_gradient_saliency(
720
+ input_sentences=input_sentences,
721
+ generated_sequences=outputs.sequences, # type: ignore
722
+ num_beams=num_beams,
723
+ prefix_len=prefix_len,
724
+ )
725
+ for idx, saliency_map in enumerate(saliency_maps):
726
+ top_prediction_index = idx * num_beams
727
+ saliency_map.update({
728
+ "mention": base_mentions[idx],
729
+ "doc_id": base_doc_ids[idx],
730
+ "mention_id": base_mentions_id[idx],
731
+ "start_span": base_start_spans[idx],
732
+ "end_span": base_end_spans[idx],
733
+ "semantic_group": base_sem_groups[idx],
734
+ "gold_concept_code": base_gold_concept_codes[idx],
735
+ "gold_concept_name": base_gold_concept_names[idx],
736
+ "pred_concept_name": pred_concept_names[top_prediction_index],
737
+ "pred_concept_code": pred_concept_codes[top_prediction_index],
738
+ "score": scores[top_prediction_index],
739
+ "rank": 1,
740
+ })
741
+
742
+ print(f"Batch completed. {len(all_outputs)} predictions generated so far.")
743
+ return all_outputs, cleaned_output_sequences, saliency_maps
744
+
745
+ def sample(
746
+ self,
747
+ bigbio_pages: list[dict], # type: ignore
748
+ num_beams: int = 5,
749
+ constrained: bool = True,
750
+ with_saliency_maps: bool = False,
751
+ multiple_answers: bool = False,
752
+ batch_size: int = 8,
753
+ start_entity: str = "[",
754
+ end_entity: str = "]",
755
+ start_group: str = "{",
756
+ end_group: str = "}",
757
+ show_progress: bool = True,
758
+ **kwargs,
759
+ ) -> (
760
+ list[dict[str, object]]
761
+ | tuple[list[dict[str, object]], list[dict[str, object]]]
762
+ ):
763
+
764
+ print(
764
+ f"Starting sampling on {len(bigbio_pages)} pages (constrained={constrained}, beams={num_beams}, batch_size={batch_size})"
765
+ )
767
+
768
+ def _progress(
769
+ iterable, desc: str, total: Optional[int] = None, show: bool = True
770
+ ):
771
+ if show:
772
+ return tqdm(iterable, desc=desc, total=total)
773
+ return iterable
774
+
775
+ all_outputs = []
776
+ all_sources = []
777
+ all_targets = []
778
+ all_entities_info = []
779
+ for data in bigbio_pages:
780
+ sources, targets, entities_info = parse_text(
781
+ data=data,
782
+ start_entity=start_entity,
783
+ end_entity=end_entity,
784
+ start_group=start_group,
785
+ end_group=end_group,
786
+ )
787
+ all_sources.append(sources)
788
+ all_targets.append(targets)
789
+ all_entities_info.append(entities_info)
790
+
791
+ def _build_sequential_batches():
792
+ # Keep per-page order while still processing multiple pages per batch.
793
+ page_positions = [0] * len(all_sources)
794
+ next_page_idx = 0
795
+ active_pages = []
796
+ batches = []
797
+
798
+ while active_pages or next_page_idx < len(all_sources):
799
+ while len(active_pages) < batch_size and next_page_idx < len(
800
+ all_sources
801
+ ):
802
+ if len(all_sources[next_page_idx]) > 0:
803
+ active_pages.append(next_page_idx)
804
+ next_page_idx += 1
805
+
806
+ if not active_pages:
807
+ break
808
+
809
+ batch = []
810
+ next_active_pages = []
811
+ for page_idx in active_pages:
812
+ item_idx = page_positions[page_idx]
813
+ batch.append((
814
+ all_sources[page_idx][item_idx],
815
+ all_targets[page_idx][item_idx],
816
+ all_entities_info[page_idx][item_idx],
817
+ ))
818
+ page_positions[page_idx] += 1
819
+ if page_positions[page_idx] < len(all_sources[page_idx]):
820
+ next_active_pages.append(page_idx)
821
+
822
+ batches.append(batch)
823
+ active_pages = next_active_pages
824
+
825
+ return batches
826
+
827
+ all_batches = _build_sequential_batches()
828
+
829
+ print(
830
+ f"Input preparation completed. Running generation on {len(all_batches)} batches."
831
+ )
832
+
833
+ all_outputs = []
834
+ all_saliency_maps = []
835
+ batch_previous_targets = {}
836
+ for batch in _progress(
837
+ all_batches,
838
+ desc="Processing batches",
839
+ total=len(all_batches),
840
+ show=show_progress,
841
+ ):
842
+ input_sentences = []
843
+ sem_groups = []
844
+ mentions = []
845
+ doc_ids = []
846
+ mentions_id = []
847
+ gold_concept_codes = []
848
+ gold_concept_names = []
849
+ start_spans = []
850
+ end_spans = []
851
+ for source, target, entity in batch:
852
+ doc_id = entity["doc_id"]
853
+ if doc_id not in batch_previous_targets:
854
+ batch_previous_targets[doc_id] = ""
855
+ previous_targets = batch_previous_targets.get(doc_id)
856
+
857
+ input_sentences.append(
858
+ add_headers_to_prompt(
859
+ source,
860
+ target,
861
+ previous_targets, # type: ignore
862
+ )
863
+ )
864
+ sem_groups.append(entity["semantic_group"])
865
+ mentions.append(entity["mention"])
866
+ doc_ids.append(doc_id)
867
+ mentions_id.append(entity["mention_id"])
868
+ start_spans.append(entity["start_span"])
869
+ end_spans.append(entity["end_span"])
870
+ gold_concept_codes.append(entity.get("gold_concept_code", None)) # type: ignore
871
+ gold_concept_names.append(entity.get("gold_concept_name", None)) # type: ignore
872
+ all_outputs, cleaned_output_sequences, batch_saliency_maps = (
873
+ self.predict_batch(
874
+ all_outputs=all_outputs,
875
+ batch_size=batch_size,
876
+ input_sentences=input_sentences,
877
+ sem_groups=sem_groups,
878
+ mentions=mentions,
879
+ mentions_id=mentions_id,
880
+ doc_ids=doc_ids,
881
+ start_spans=start_spans,
882
+ end_spans=end_spans,
883
+ gold_concept_codes=gold_concept_codes,
884
+ gold_concept_names=gold_concept_names,
885
+ constrained=constrained,
886
+ multiple_answers=multiple_answers,
887
+ num_beams=num_beams,
888
+ with_saliency_maps=with_saliency_maps,
889
+ **kwargs,
890
+ )
891
+ )
892
+ if with_saliency_maps:
893
+ all_saliency_maps.extend(batch_saliency_maps)
894
+ for i, doc_id in enumerate(doc_ids):
895
+ clean_sentence = cleaned_output_sequences[num_beams * i]
896
+ clean_sentence = start_entity + clean_sentence.split(start_entity)[-1]
897
+ clean_sentence = clean_sentence.rstrip() + "\n"
898
+ batch_previous_targets[doc_id] += clean_sentence
899
+
900
+ if with_saliency_maps:
901
+ return all_outputs, all_saliency_maps # type: ignore
902
+ return all_outputs # type: ignore
903
+
904
+ def encode(self, sentence):
905
+ return self.tokenizer.encode(sentence, return_tensors="pt")[0] # type: ignore
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:77f91483959428848b33fbe9120b87f9dda0df3fe67a7b4707d9be61e2fc8d0b
3
+ size 4976706864
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9e31f26be17dae9d9ed75c56bbc4faddeed5a9a2335c7a0a3e7ff9d5528bd0e3
3
+ size 4999802720
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e5a074ce2475d0554258287e04f46b83ced93208585d76b681864a2916db183
3
+ size 4915916176
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:64354c6a5ab462a92505d56a33c4f0db8aadd8701338ebe94eea5f168eecf12b
3
+ size 1168147000
model.safetensors.index.json ADDED
@@ -0,0 +1,299 @@
1
+ {
2
+ "metadata": {
3
+ "total_parameters": 8030269440,
4
+ "total_size": 16060538880
5
+ },
6
+ "weight_map": {
7
+ "lm_head.weight": "model-00004-of-00004.safetensors",
8
+ "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
9
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
10
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
11
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
12
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
13
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
16
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
17
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
18
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
19
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
20
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
21
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
22
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
23
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
24
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
25
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
26
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
27
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
28
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
29
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
30
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
31
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
32
+ "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
33
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
34
+ "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
35
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
36
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
37
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
38
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
39
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
40
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
41
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
42
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
43
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
44
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
45
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
46
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
47
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
48
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
49
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
50
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
51
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
52
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
53
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
54
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
55
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
56
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
57
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
58
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
+ "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+ "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+ "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.norm.weight": "model-00004-of-00004.safetensors"
+ }
+ }
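Note on the weight map above: most decoder layers live entirely in one shard, but a few have their tensors divided between two consecutive shard files. The following is a minimal sketch, not part of the commit, of how that could be checked. It uses a handful of entries copied from the map rather than the full file, and the function names `shards_per_layer` and `split_layers` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import defaultdict

# A few entries copied from the weight map above; the real index has many more.
weight_map = {
    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors",
}

# Matches tensor names of the form "model.layers.<index>.<rest>".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def shards_per_layer(weight_map):
    """Map each decoder layer index to the set of shard files holding its tensors."""
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        match = LAYER_RE.match(name)
        if match:  # skip non-layer tensors such as model.norm.weight
            layers[int(match.group(1))].add(shard)
    return dict(layers)


def split_layers(weight_map):
    """Return the sorted layer indices whose tensors span more than one shard."""
    per_layer = shards_per_layer(weight_map)
    return sorted(index for index, shards in per_layer.items() if len(shards) > 1)


print(split_layers(weight_map))
```

On the excerpt used here, layers 20 and 31 are reported as split, which matches the entries in the map: each has tensors in two different shard files.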
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a2411727f8cd2e4beefb959bb5842f78c67ceb813b76bba117f0ec85c931da64
+ size 32121333167
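Note on the three-line hunks in this diff: files such as optimizer.pt are stored through Git LFS, so the repository holds only a small pointer with `version`, `oid`, and `size` fields. As a minimal sketch, not part of the commit, such a pointer can be parsed as below. The function name `parse_lfs_pointer` and the returned dictionary keys are assumptions made for illustration.

```python
def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer text for optimizer.pt, copied from the hunk above without the "+" markers.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a2411727f8cd2e4beefb959bb5842f78c67ceb813b76bba117f0ec85c931da64
size 32121333167
"""

info = parse_lfs_pointer(pointer)
print(info["algorithm"], info["size_bytes"])
```

The `size` field is the byte count of the real file, so the optimizer state referenced here is about 32.1 GB.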
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7362b3e89cd9e2d7e215ca780b2d358374363d8d7abbe8f66f448c6909f796a0
+ size 14645
saliency_map.png ADDED
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6570bbca0e1192ffeb73829181133280936e5db8cfa69c3fcaec8c6902b36632
+ size 1465
special_tokens_map.json ADDED
@@ -0,0 +1,60 @@
+ {
+ "additional_special_tokens": [
+ {
+ "content": "[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "{",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "}",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "<+>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ ],
+ "bos_token": {
+ "content": "<|begin_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|eot_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|eot_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
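Note on special_tokens_map.json above: the pad token and the end-of-sequence token share the same content, and five extra strings are registered as additional special tokens. The following is a minimal sketch, not part of the commit, of reading those facts out of the JSON structure. The inline JSON is abridged to the `content` fields, and the helper `token_content` is a hypothetical name introduced for illustration.

```python
import json

# Abridged copy of the file above: only the "content" field of each entry is kept.
special_tokens_map = json.loads(r"""
{
  "additional_special_tokens": [
    {"content": "["},
    {"content": "]"},
    {"content": "{"},
    {"content": "}"},
    {"content": "<+>"}
  ],
  "bos_token": {"content": "<|begin_of_text|>"},
  "eos_token": {"content": "<|eot_id|>"},
  "pad_token": {"content": "<|eot_id|>"}
}
""")


def token_content(entry):
    """Return the token string whether the entry is a plain string or a dict."""
    return entry["content"] if isinstance(entry, dict) else entry


extra = [token_content(e) for e in special_tokens_map["additional_special_tokens"]]
pad = token_content(special_tokens_map["pad_token"])
eos = token_content(special_tokens_map["eos_token"])

print("additional special tokens:", extra)
print("pad token reuses eos token:", pad == eos)
```

When padding reuses the end-of-sequence token like this, training code typically relies on the attention mask, not on the token id, to tell padding apart from a real end of sequence.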
text_to_code.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f006d84b142ba4fbcd5f70f63a975596bcfa7dfe8901277507b8b57acc07d36
+ size 259408285
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:11ac3b66638a75d981484ee3713682e63c142ad255bd7cd96d9635ad5e654cdd
+ size 17210796
tokenizer_config.json ADDED
@@ -0,0 +1,2110 @@
+ {
+ "added_tokens_decoder": {
+ "58": {
+ "content": "[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "60": {
+ "content": "]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "90": {
+ "content": "{",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "92": {
+ "content": "}",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128000": {
+ "content": "<|begin_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128001": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128002": {
+ "content": "<|reserved_special_token_0|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128003": {
+ "content": "<|reserved_special_token_1|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128004": {
+ "content": "<|finetune_right_pad_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128005": {
+ "content": "<|reserved_special_token_2|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128006": {
+ "content": "<|start_header_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128007": {
+ "content": "<|end_header_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128008": {
+ "content": "<|eom_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128009": {
+ "content": "<|eot_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128010": {
+ "content": "<|python_tag|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128011": {
+ "content": "<|reserved_special_token_3|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128012": {
+ "content": "<|reserved_special_token_4|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128013": {
+ "content": "<|reserved_special_token_5|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128014": {
+ "content": "<|reserved_special_token_6|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128015": {
+ "content": "<|reserved_special_token_7|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128016": {
+ "content": "<|reserved_special_token_8|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128017": {
+ "content": "<|reserved_special_token_9|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128018": {
+ "content": "<|reserved_special_token_10|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128019": {
+ "content": "<|reserved_special_token_11|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128020": {
+ "content": "<|reserved_special_token_12|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128021": {
+ "content": "<|reserved_special_token_13|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128022": {
+ "content": "<|reserved_special_token_14|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128023": {
+ "content": "<|reserved_special_token_15|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128024": {
+ "content": "<|reserved_special_token_16|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128025": {
+ "content": "<|reserved_special_token_17|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128026": {
+ "content": "<|reserved_special_token_18|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128027": {
+ "content": "<|reserved_special_token_19|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128028": {
+ "content": "<|reserved_special_token_20|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128029": {
+ "content": "<|reserved_special_token_21|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128030": {
+ "content": "<|reserved_special_token_22|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128031": {
+ "content": "<|reserved_special_token_23|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128032": {
+ "content": "<|reserved_special_token_24|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128033": {
+ "content": "<|reserved_special_token_25|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128034": {
+ "content": "<|reserved_special_token_26|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128035": {
+ "content": "<|reserved_special_token_27|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128036": {
+ "content": "<|reserved_special_token_28|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128037": {
+ "content": "<|reserved_special_token_29|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128038": {
+ "content": "<|reserved_special_token_30|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128039": {
+ "content": "<|reserved_special_token_31|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128040": {
+ "content": "<|reserved_special_token_32|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128041": {
+ "content": "<|reserved_special_token_33|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128042": {
+ "content": "<|reserved_special_token_34|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128043": {
+ "content": "<|reserved_special_token_35|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "128044": {
388
+ "content": "<|reserved_special_token_36|>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "128045": {
396
+ "content": "<|reserved_special_token_37|>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "128046": {
404
+ "content": "<|reserved_special_token_38|>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "128047": {
412
+ "content": "<|reserved_special_token_39|>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "128048": {
420
+ "content": "<|reserved_special_token_40|>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "128049": {
428
+ "content": "<|reserved_special_token_41|>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "128050": {
436
+ "content": "<|reserved_special_token_42|>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "128051": {
444
+ "content": "<|reserved_special_token_43|>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "128052": {
452
+ "content": "<|reserved_special_token_44|>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "128053": {
460
+ "content": "<|reserved_special_token_45|>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "128054": {
468
+ "content": "<|reserved_special_token_46|>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "128055": {
476
+ "content": "<|reserved_special_token_47|>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "128056": {
484
+ "content": "<|reserved_special_token_48|>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "128057": {
492
+ "content": "<|reserved_special_token_49|>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "128058": {
500
+ "content": "<|reserved_special_token_50|>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "128059": {
508
+ "content": "<|reserved_special_token_51|>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "128060": {
516
+ "content": "<|reserved_special_token_52|>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "128061": {
524
+ "content": "<|reserved_special_token_53|>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "128062": {
532
+ "content": "<|reserved_special_token_54|>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "128063": {
540
+ "content": "<|reserved_special_token_55|>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "128064": {
548
+ "content": "<|reserved_special_token_56|>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "128065": {
556
+ "content": "<|reserved_special_token_57|>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "128066": {
564
+ "content": "<|reserved_special_token_58|>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "128067": {
572
+ "content": "<|reserved_special_token_59|>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "128068": {
580
+ "content": "<|reserved_special_token_60|>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "128069": {
588
+ "content": "<|reserved_special_token_61|>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "128070": {
596
+ "content": "<|reserved_special_token_62|>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "128071": {
604
+ "content": "<|reserved_special_token_63|>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "128072": {
612
+ "content": "<|reserved_special_token_64|>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "128073": {
620
+ "content": "<|reserved_special_token_65|>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "128074": {
628
+ "content": "<|reserved_special_token_66|>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "128075": {
636
+ "content": "<|reserved_special_token_67|>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "128076": {
644
+ "content": "<|reserved_special_token_68|>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "128077": {
652
+ "content": "<|reserved_special_token_69|>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": true
658
+ },
659
+ "128078": {
660
+ "content": "<|reserved_special_token_70|>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": true
666
+ },
667
+ "128079": {
668
+ "content": "<|reserved_special_token_71|>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": true
674
+ },
675
+ "128080": {
676
+ "content": "<|reserved_special_token_72|>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": true
682
+ },
683
+ "128081": {
684
+ "content": "<|reserved_special_token_73|>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": true
690
+ },
691
+ "128082": {
692
+ "content": "<|reserved_special_token_74|>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": true
698
+ },
699
+ "128083": {
700
+ "content": "<|reserved_special_token_75|>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": true
706
+ },
707
+ "128084": {
708
+ "content": "<|reserved_special_token_76|>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": true
714
+ },
715
+ "128085": {
716
+ "content": "<|reserved_special_token_77|>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": true
722
+ },
723
+ "128086": {
724
+ "content": "<|reserved_special_token_78|>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": true
730
+ },
731
+ "128087": {
732
+ "content": "<|reserved_special_token_79|>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": true
738
+ },
739
+ "128088": {
740
+ "content": "<|reserved_special_token_80|>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": true
746
+ },
747
+ "128089": {
748
+ "content": "<|reserved_special_token_81|>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": true
754
+ },
755
+ "128090": {
756
+ "content": "<|reserved_special_token_82|>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": true
762
+ },
763
+ "128091": {
764
+ "content": "<|reserved_special_token_83|>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": true
770
+ },
771
+ "128092": {
772
+ "content": "<|reserved_special_token_84|>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": true
778
+ },
779
+ "128093": {
780
+ "content": "<|reserved_special_token_85|>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": true
786
+ },
787
+ "128094": {
788
+ "content": "<|reserved_special_token_86|>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": true
794
+ },
795
+ "128095": {
796
+ "content": "<|reserved_special_token_87|>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": true
802
+ },
803
+ "128096": {
804
+ "content": "<|reserved_special_token_88|>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": true
810
+ },
811
+ "128097": {
812
+ "content": "<|reserved_special_token_89|>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": true
818
+ },
819
+ "128098": {
820
+ "content": "<|reserved_special_token_90|>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": true
826
+ },
827
+ "128099": {
828
+ "content": "<|reserved_special_token_91|>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": true
834
+ },
835
+ "128100": {
836
+ "content": "<|reserved_special_token_92|>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": true
842
+ },
843
+ "128101": {
844
+ "content": "<|reserved_special_token_93|>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": true
850
+ },
851
+ "128102": {
852
+ "content": "<|reserved_special_token_94|>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": true
858
+ },
859
+ "128103": {
860
+ "content": "<|reserved_special_token_95|>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": true
866
+ },
867
+ "128104": {
868
+ "content": "<|reserved_special_token_96|>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": true
874
+ },
875
+ "128105": {
876
+ "content": "<|reserved_special_token_97|>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": true
882
+ },
883
+ "128106": {
884
+ "content": "<|reserved_special_token_98|>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": true
890
+ },
891
+ "128107": {
892
+ "content": "<|reserved_special_token_99|>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": true
898
+ },
899
+ "128108": {
900
+ "content": "<|reserved_special_token_100|>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": true
906
+ },
907
+ "128109": {
908
+ "content": "<|reserved_special_token_101|>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": true
914
+ },
915
+ "128110": {
916
+ "content": "<|reserved_special_token_102|>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": true
922
+ },
923
+ "128111": {
924
+ "content": "<|reserved_special_token_103|>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": true
930
+ },
931
+ "128112": {
932
+ "content": "<|reserved_special_token_104|>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": true
938
+ },
939
+ "128113": {
940
+ "content": "<|reserved_special_token_105|>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": true
946
+ },
947
+ "128114": {
948
+ "content": "<|reserved_special_token_106|>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": true
954
+ },
955
+ "128115": {
956
+ "content": "<|reserved_special_token_107|>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": true
962
+ },
963
+ "128116": {
964
+ "content": "<|reserved_special_token_108|>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": true
970
+ },
971
+ "128117": {
972
+ "content": "<|reserved_special_token_109|>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": true
978
+ },
979
+ "128118": {
980
+ "content": "<|reserved_special_token_110|>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": true
986
+ },
987
+ "128119": {
988
+ "content": "<|reserved_special_token_111|>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": true
994
+ },
995
+ "128120": {
996
+ "content": "<|reserved_special_token_112|>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": true
1002
+ },
1003
+ "128121": {
1004
+ "content": "<|reserved_special_token_113|>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": true
1010
+ },
1011
+ "128122": {
1012
+ "content": "<|reserved_special_token_114|>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": true
1018
+ },
1019
+ "128123": {
1020
+ "content": "<|reserved_special_token_115|>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": true
1026
+ },
1027
+ "128124": {
1028
+ "content": "<|reserved_special_token_116|>",
1029
+ "lstrip": false,
1030
+ "normalized": false,
1031
+ "rstrip": false,
1032
+ "single_word": false,
1033
+ "special": true
1034
+ },
1035
+ "128125": {
1036
+ "content": "<|reserved_special_token_117|>",
1037
+ "lstrip": false,
1038
+ "normalized": false,
1039
+ "rstrip": false,
1040
+ "single_word": false,
1041
+ "special": true
1042
+ },
1043
+ "128126": {
1044
+ "content": "<|reserved_special_token_118|>",
1045
+ "lstrip": false,
1046
+ "normalized": false,
1047
+ "rstrip": false,
1048
+ "single_word": false,
1049
+ "special": true
1050
+ },
1051
+ "128127": {
1052
+ "content": "<|reserved_special_token_119|>",
1053
+ "lstrip": false,
1054
+ "normalized": false,
1055
+ "rstrip": false,
1056
+ "single_word": false,
1057
+ "special": true
1058
+ },
1059
+ "128128": {
1060
+ "content": "<|reserved_special_token_120|>",
1061
+ "lstrip": false,
1062
+ "normalized": false,
1063
+ "rstrip": false,
1064
+ "single_word": false,
1065
+ "special": true
1066
+ },
1067
+ "128129": {
1068
+ "content": "<|reserved_special_token_121|>",
1069
+ "lstrip": false,
1070
+ "normalized": false,
1071
+ "rstrip": false,
1072
+ "single_word": false,
1073
+ "special": true
1074
+ },
1075
+ "128130": {
1076
+ "content": "<|reserved_special_token_122|>",
1077
+ "lstrip": false,
1078
+ "normalized": false,
1079
+ "rstrip": false,
1080
+ "single_word": false,
1081
+ "special": true
1082
+ },
1083
+ "128131": {
1084
+ "content": "<|reserved_special_token_123|>",
1085
+ "lstrip": false,
1086
+ "normalized": false,
1087
+ "rstrip": false,
1088
+ "single_word": false,
1089
+ "special": true
1090
+ },
1091
+ "128132": {
1092
+ "content": "<|reserved_special_token_124|>",
1093
+ "lstrip": false,
1094
+ "normalized": false,
1095
+ "rstrip": false,
1096
+ "single_word": false,
1097
+ "special": true
1098
+ },
1099
+ "128133": {
1100
+ "content": "<|reserved_special_token_125|>",
1101
+ "lstrip": false,
1102
+ "normalized": false,
1103
+ "rstrip": false,
1104
+ "single_word": false,
1105
+ "special": true
1106
+ },
1107
+ "128134": {
1108
+ "content": "<|reserved_special_token_126|>",
1109
+ "lstrip": false,
1110
+ "normalized": false,
1111
+ "rstrip": false,
1112
+ "single_word": false,
1113
+ "special": true
1114
+ },
1115
+ "128135": {
1116
+ "content": "<|reserved_special_token_127|>",
1117
+ "lstrip": false,
1118
+ "normalized": false,
1119
+ "rstrip": false,
1120
+ "single_word": false,
1121
+ "special": true
1122
+ },
1123
+ "128136": {
1124
+ "content": "<|reserved_special_token_128|>",
1125
+ "lstrip": false,
1126
+ "normalized": false,
1127
+ "rstrip": false,
1128
+ "single_word": false,
1129
+ "special": true
1130
+ },
1131
+ "128137": {
1132
+ "content": "<|reserved_special_token_129|>",
1133
+ "lstrip": false,
1134
+ "normalized": false,
1135
+ "rstrip": false,
1136
+ "single_word": false,
1137
+ "special": true
1138
+ },
1139
+ "128138": {
1140
+ "content": "<|reserved_special_token_130|>",
1141
+ "lstrip": false,
1142
+ "normalized": false,
1143
+ "rstrip": false,
1144
+ "single_word": false,
1145
+ "special": true
1146
+ },
1147
+ "128139": {
1148
+ "content": "<|reserved_special_token_131|>",
1149
+ "lstrip": false,
1150
+ "normalized": false,
1151
+ "rstrip": false,
1152
+ "single_word": false,
1153
+ "special": true
1154
+ },
1155
+ "128140": {
1156
+ "content": "<|reserved_special_token_132|>",
1157
+ "lstrip": false,
1158
+ "normalized": false,
1159
+ "rstrip": false,
1160
+ "single_word": false,
1161
+ "special": true
1162
+ },
1163
+ "128141": {
1164
+ "content": "<|reserved_special_token_133|>",
1165
+ "lstrip": false,
1166
+ "normalized": false,
1167
+ "rstrip": false,
1168
+ "single_word": false,
1169
+ "special": true
1170
+ },
1171
+ "128142": {
1172
+ "content": "<|reserved_special_token_134|>",
1173
+ "lstrip": false,
1174
+ "normalized": false,
1175
+ "rstrip": false,
1176
+ "single_word": false,
1177
+ "special": true
1178
+ },
1179
+ "128143": {
1180
+ "content": "<|reserved_special_token_135|>",
1181
+ "lstrip": false,
1182
+ "normalized": false,
1183
+ "rstrip": false,
1184
+ "single_word": false,
1185
+ "special": true
1186
+ },
1187
+ "128144": {
1188
+ "content": "<|reserved_special_token_136|>",
1189
+ "lstrip": false,
1190
+ "normalized": false,
1191
+ "rstrip": false,
1192
+ "single_word": false,
1193
+ "special": true
1194
+ },
1195
+ "128145": {
1196
+ "content": "<|reserved_special_token_137|>",
1197
+ "lstrip": false,
1198
+ "normalized": false,
1199
+ "rstrip": false,
1200
+ "single_word": false,
1201
+ "special": true
1202
+ },
1203
+ "128146": {
1204
+ "content": "<|reserved_special_token_138|>",
1205
+ "lstrip": false,
1206
+ "normalized": false,
1207
+ "rstrip": false,
1208
+ "single_word": false,
1209
+ "special": true
1210
+ },
1211
+ "128147": {
1212
+ "content": "<|reserved_special_token_139|>",
1213
+ "lstrip": false,
1214
+ "normalized": false,
1215
+ "rstrip": false,
1216
+ "single_word": false,
1217
+ "special": true
1218
+ },
1219
+ "128148": {
1220
+ "content": "<|reserved_special_token_140|>",
1221
+ "lstrip": false,
1222
+ "normalized": false,
1223
+ "rstrip": false,
1224
+ "single_word": false,
1225
+ "special": true
1226
+ },
1227
+ "128149": {
1228
+ "content": "<|reserved_special_token_141|>",
1229
+ "lstrip": false,
1230
+ "normalized": false,
1231
+ "rstrip": false,
1232
+ "single_word": false,
1233
+ "special": true
1234
+ },
1235
+ "128150": {
1236
+ "content": "<|reserved_special_token_142|>",
1237
+ "lstrip": false,
1238
+ "normalized": false,
1239
+ "rstrip": false,
1240
+ "single_word": false,
1241
+ "special": true
1242
+ },
1243
+ "128151": {
1244
+ "content": "<|reserved_special_token_143|>",
1245
+ "lstrip": false,
1246
+ "normalized": false,
1247
+ "rstrip": false,
1248
+ "single_word": false,
1249
+ "special": true
1250
+ },
1251
+ "128152": {
1252
+ "content": "<|reserved_special_token_144|>",
1253
+ "lstrip": false,
1254
+ "normalized": false,
1255
+ "rstrip": false,
1256
+ "single_word": false,
1257
+ "special": true
1258
+ },
1259
+ "128153": {
1260
+ "content": "<|reserved_special_token_145|>",
1261
+ "lstrip": false,
1262
+ "normalized": false,
1263
+ "rstrip": false,
1264
+ "single_word": false,
1265
+ "special": true
1266
+ },
1267
+ "128154": {
1268
+ "content": "<|reserved_special_token_146|>",
1269
+ "lstrip": false,
1270
+ "normalized": false,
1271
+ "rstrip": false,
1272
+ "single_word": false,
1273
+ "special": true
1274
+ },
1275
+ "128155": {
1276
+ "content": "<|reserved_special_token_147|>",
1277
+ "lstrip": false,
1278
+ "normalized": false,
1279
+ "rstrip": false,
1280
+ "single_word": false,
1281
+ "special": true
1282
+ },
1283
+ "128156": {
1284
+ "content": "<|reserved_special_token_148|>",
1285
+ "lstrip": false,
1286
+ "normalized": false,
1287
+ "rstrip": false,
1288
+ "single_word": false,
1289
+ "special": true
1290
+ },
1291
+ "128157": {
1292
+ "content": "<|reserved_special_token_149|>",
1293
+ "lstrip": false,
1294
+ "normalized": false,
1295
+ "rstrip": false,
1296
+ "single_word": false,
1297
+ "special": true
1298
+ },
1299
+ "128158": {
1300
+ "content": "<|reserved_special_token_150|>",
1301
+ "lstrip": false,
1302
+ "normalized": false,
1303
+ "rstrip": false,
1304
+ "single_word": false,
1305
+ "special": true
1306
+ },
1307
+ "128159": {
1308
+ "content": "<|reserved_special_token_151|>",
1309
+ "lstrip": false,
1310
+ "normalized": false,
1311
+ "rstrip": false,
1312
+ "single_word": false,
1313
+ "special": true
1314
+ },
1315
+ "128160": {
1316
+ "content": "<|reserved_special_token_152|>",
1317
+ "lstrip": false,
1318
+ "normalized": false,
1319
+ "rstrip": false,
1320
+ "single_word": false,
1321
+ "special": true
1322
+ },
1323
+ "128161": {
1324
+ "content": "<|reserved_special_token_153|>",
1325
+ "lstrip": false,
1326
+ "normalized": false,
1327
+ "rstrip": false,
1328
+ "single_word": false,
1329
+ "special": true
1330
+ },
1331
+ "128162": {
1332
+ "content": "<|reserved_special_token_154|>",
1333
+ "lstrip": false,
1334
+ "normalized": false,
1335
+ "rstrip": false,
1336
+ "single_word": false,
1337
+ "special": true
1338
+ },
1339
+ "128163": {
1340
+ "content": "<|reserved_special_token_155|>",
1341
+ "lstrip": false,
1342
+ "normalized": false,
1343
+ "rstrip": false,
1344
+ "single_word": false,
1345
+ "special": true
1346
+ },
1347
+ "128164": {
1348
+ "content": "<|reserved_special_token_156|>",
1349
+ "lstrip": false,
1350
+ "normalized": false,
1351
+ "rstrip": false,
1352
+ "single_word": false,
1353
+ "special": true
1354
+ },
1355
+ "128165": {
1356
+ "content": "<|reserved_special_token_157|>",
1357
+ "lstrip": false,
1358
+ "normalized": false,
1359
+ "rstrip": false,
1360
+ "single_word": false,
1361
+ "special": true
1362
+ },
1363
+ "128166": {
1364
+ "content": "<|reserved_special_token_158|>",
1365
+ "lstrip": false,
1366
+ "normalized": false,
1367
+ "rstrip": false,
1368
+ "single_word": false,
1369
+ "special": true
1370
+ },
1371
+ "128167": {
1372
+ "content": "<|reserved_special_token_159|>",
1373
+ "lstrip": false,
1374
+ "normalized": false,
1375
+ "rstrip": false,
1376
+ "single_word": false,
1377
+ "special": true
1378
+ },
1379
+ "128168": {
1380
+ "content": "<|reserved_special_token_160|>",
1381
+ "lstrip": false,
1382
+ "normalized": false,
1383
+ "rstrip": false,
1384
+ "single_word": false,
1385
+ "special": true
1386
+ },
1387
+ "128169": {
1388
+ "content": "<|reserved_special_token_161|>",
1389
+ "lstrip": false,
1390
+ "normalized": false,
1391
+ "rstrip": false,
1392
+ "single_word": false,
1393
+ "special": true
1394
+ },
1395
+ "128170": {
1396
+ "content": "<|reserved_special_token_162|>",
1397
+ "lstrip": false,
1398
+ "normalized": false,
1399
+ "rstrip": false,
1400
+ "single_word": false,
1401
+ "special": true
1402
+ },
1403
+ "128171": {
1404
+ "content": "<|reserved_special_token_163|>",
1405
+ "lstrip": false,
1406
+ "normalized": false,
1407
+ "rstrip": false,
1408
+ "single_word": false,
1409
+ "special": true
1410
+ },
1411
+ "128172": {
1412
+ "content": "<|reserved_special_token_164|>",
1413
+ "lstrip": false,
1414
+ "normalized": false,
1415
+ "rstrip": false,
1416
+ "single_word": false,
1417
+ "special": true
1418
+ },
1419
+ "128173": {
1420
+ "content": "<|reserved_special_token_165|>",
1421
+ "lstrip": false,
1422
+ "normalized": false,
1423
+ "rstrip": false,
1424
+ "single_word": false,
1425
+ "special": true
1426
+ },
1427
+ "128174": {
1428
+ "content": "<|reserved_special_token_166|>",
1429
+ "lstrip": false,
1430
+ "normalized": false,
1431
+ "rstrip": false,
1432
+ "single_word": false,
1433
+ "special": true
1434
+ },
1435
+ "128175": {
1436
+ "content": "<|reserved_special_token_167|>",
1437
+ "lstrip": false,
1438
+ "normalized": false,
1439
+ "rstrip": false,
1440
+ "single_word": false,
1441
+ "special": true
1442
+ },
1443
+ "128176": {
1444
+ "content": "<|reserved_special_token_168|>",
1445
+ "lstrip": false,
1446
+ "normalized": false,
1447
+ "rstrip": false,
1448
+ "single_word": false,
1449
+ "special": true
1450
+ },
1451
+ "128177": {
1452
+ "content": "<|reserved_special_token_169|>",
1453
+ "lstrip": false,
1454
+ "normalized": false,
1455
+ "rstrip": false,
1456
+ "single_word": false,
1457
+ "special": true
1458
+ },
1459
+ "128178": {
1460
+ "content": "<|reserved_special_token_170|>",
1461
+ "lstrip": false,
1462
+ "normalized": false,
1463
+ "rstrip": false,
1464
+ "single_word": false,
1465
+ "special": true
1466
+ },
1467
+ "128179": {
1468
+ "content": "<|reserved_special_token_171|>",
1469
+ "lstrip": false,
1470
+ "normalized": false,
1471
+ "rstrip": false,
1472
+ "single_word": false,
1473
+ "special": true
1474
+ },
1475
+ "128180": {
1476
+ "content": "<|reserved_special_token_172|>",
1477
+ "lstrip": false,
1478
+ "normalized": false,
1479
+ "rstrip": false,
1480
+ "single_word": false,
1481
+ "special": true
1482
+ },
1483
+ "128181": {
1484
+ "content": "<|reserved_special_token_173|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128182": {
+ "content": "<|reserved_special_token_174|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128183": {
+ "content": "<|reserved_special_token_175|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128184": {
+ "content": "<|reserved_special_token_176|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128185": {
+ "content": "<|reserved_special_token_177|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128186": {
+ "content": "<|reserved_special_token_178|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128187": {
+ "content": "<|reserved_special_token_179|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128188": {
+ "content": "<|reserved_special_token_180|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128189": {
+ "content": "<|reserved_special_token_181|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128190": {
+ "content": "<|reserved_special_token_182|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128191": {
+ "content": "<|reserved_special_token_183|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128192": {
+ "content": "<|reserved_special_token_184|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128193": {
+ "content": "<|reserved_special_token_185|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128194": {
+ "content": "<|reserved_special_token_186|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128195": {
+ "content": "<|reserved_special_token_187|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128196": {
+ "content": "<|reserved_special_token_188|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128197": {
+ "content": "<|reserved_special_token_189|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128198": {
+ "content": "<|reserved_special_token_190|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128199": {
+ "content": "<|reserved_special_token_191|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128200": {
+ "content": "<|reserved_special_token_192|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128201": {
+ "content": "<|reserved_special_token_193|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128202": {
+ "content": "<|reserved_special_token_194|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128203": {
+ "content": "<|reserved_special_token_195|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128204": {
+ "content": "<|reserved_special_token_196|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128205": {
+ "content": "<|reserved_special_token_197|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128206": {
+ "content": "<|reserved_special_token_198|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128207": {
+ "content": "<|reserved_special_token_199|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128208": {
+ "content": "<|reserved_special_token_200|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128209": {
+ "content": "<|reserved_special_token_201|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128210": {
+ "content": "<|reserved_special_token_202|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128211": {
+ "content": "<|reserved_special_token_203|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128212": {
+ "content": "<|reserved_special_token_204|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128213": {
+ "content": "<|reserved_special_token_205|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128214": {
+ "content": "<|reserved_special_token_206|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128215": {
+ "content": "<|reserved_special_token_207|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128216": {
+ "content": "<|reserved_special_token_208|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128217": {
+ "content": "<|reserved_special_token_209|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128218": {
+ "content": "<|reserved_special_token_210|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128219": {
+ "content": "<|reserved_special_token_211|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128220": {
+ "content": "<|reserved_special_token_212|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128221": {
+ "content": "<|reserved_special_token_213|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128222": {
+ "content": "<|reserved_special_token_214|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128223": {
+ "content": "<|reserved_special_token_215|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128224": {
+ "content": "<|reserved_special_token_216|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128225": {
+ "content": "<|reserved_special_token_217|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128226": {
+ "content": "<|reserved_special_token_218|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128227": {
+ "content": "<|reserved_special_token_219|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128228": {
+ "content": "<|reserved_special_token_220|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128229": {
+ "content": "<|reserved_special_token_221|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128230": {
+ "content": "<|reserved_special_token_222|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128231": {
+ "content": "<|reserved_special_token_223|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128232": {
+ "content": "<|reserved_special_token_224|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128233": {
+ "content": "<|reserved_special_token_225|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128234": {
+ "content": "<|reserved_special_token_226|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128235": {
+ "content": "<|reserved_special_token_227|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128236": {
+ "content": "<|reserved_special_token_228|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128237": {
+ "content": "<|reserved_special_token_229|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128238": {
+ "content": "<|reserved_special_token_230|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128239": {
+ "content": "<|reserved_special_token_231|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128240": {
+ "content": "<|reserved_special_token_232|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128241": {
+ "content": "<|reserved_special_token_233|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128242": {
+ "content": "<|reserved_special_token_234|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128243": {
+ "content": "<|reserved_special_token_235|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128244": {
+ "content": "<|reserved_special_token_236|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128245": {
+ "content": "<|reserved_special_token_237|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128246": {
+ "content": "<|reserved_special_token_238|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128247": {
+ "content": "<|reserved_special_token_239|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128248": {
+ "content": "<|reserved_special_token_240|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128249": {
+ "content": "<|reserved_special_token_241|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128250": {
+ "content": "<|reserved_special_token_242|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128251": {
+ "content": "<|reserved_special_token_243|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128252": {
+ "content": "<|reserved_special_token_244|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128253": {
+ "content": "<|reserved_special_token_245|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128254": {
+ "content": "<|reserved_special_token_246|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128255": {
+ "content": "<|reserved_special_token_247|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128256": {
+ "content": "<+>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "[",
+ "]",
+ "{",
+ "}",
+ "<+>"
+ ],
+ "bos_token": "<|begin_of_text|>",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|eot_id|>",
+ "extra_special_tokens": {},
+ "model_input_names": [
+ "input_ids",
+ "attention_mask"
+ ],
+ "model_max_length": 131072,
+ "pad_token": "<|eot_id|>",
+ "tokenizer_class": "PreTrainedTokenizerFast"
+ }
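The tokenizer entries above follow a regular pattern that can be checked mechanically. The sketch below is not part of the commit: it works on a hand-copied excerpt of three `added_tokens_decoder` entries (the excerpt and the helper names `reserved_offsets` and `custom_tokens` are illustrative assumptions) and confirms that the reserved tokens shown sit at a constant offset from their ids, with `<+>` as the one non-Llama-style token at id 128256.

```python
import json
import re

# Hand-copied excerpt of the config in the diff above (assumption: only
# three added_tokens_decoder entries are reproduced, other keys trimmed).
CONFIG_EXCERPT = json.loads("""
{
  "added_tokens_decoder": {
    "128182": {"content": "<|reserved_special_token_174|>", "special": true},
    "128255": {"content": "<|reserved_special_token_247|>", "special": true},
    "128256": {"content": "<+>", "special": true}
  },
  "additional_special_tokens": ["[", "]", "{", "}", "<+>"],
  "eos_token": "<|eot_id|>",
  "pad_token": "<|eot_id|>"
}
""")

RESERVED = re.compile(r"<\|reserved_special_token_(\d+)\|>")


def reserved_offsets(config):
    """Collect (token id - reserved index) for every reserved token."""
    offsets = set()
    for token_id, entry in config["added_tokens_decoder"].items():
        match = RESERVED.fullmatch(entry["content"])
        if match:
            offsets.add(int(token_id) - int(match.group(1)))
    return offsets


def custom_tokens(config):
    """Return added tokens that do not use the <|...|> naming, keyed by id."""
    return {
        int(token_id): entry["content"]
        for token_id, entry in config["added_tokens_decoder"].items()
        if not entry["content"].startswith("<|")
    }


offsets = reserved_offsets(CONFIG_EXCERPT)
extras = custom_tokens(CONFIG_EXCERPT)
print("reserved-token offsets:", offsets)
print("custom added tokens:", extras)
print("pad == eos:", CONFIG_EXCERPT["pad_token"] == CONFIG_EXCERPT["eos_token"])
```

A single offset value means the reserved tokens are numbered contiguously in the excerpt. Running the same helpers over the full file would need the complete `added_tokens_decoder` map, which this diff only partially shows.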
trainer_state.json ADDED
@@ -0,0 +1,154 @@
+ {
+ "best_global_step": 162645,
+ "best_metric": 0.75,
+ "best_model_checkpoint": "models/NED/MedMentions_human_only_tfidf_hybrid_long_v2_addheaders/Llama-3.1-8B-Instruct/checkpoint-162645",
+ "epoch": 5.0,
+ "eval_steps": 500,
+ "global_step": 162645,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "entropy": 1.3341667487071673,
+ "epoch": 1.0,
+ "grad_norm": 210.0,
+ "learning_rate": 1.99991802270771e-05,
+ "loss": 0.6196,
+ "mean_token_accuracy": 0.8641792057390866,
+ "num_tokens": 128523013.0,
+ "step": 32529
+ },
+ {
+ "epoch": 1.0,
+ "eval_entropy": 1.421211676299572,
+ "eval_loss": 0.41451308131217957,
+ "eval_mean_token_accuracy": 0.898672553896904,
+ "eval_num_gold": 400,
+ "eval_num_guess": 400,
+ "eval_num_tokens": 128523013.0,
+ "eval_recall": 0.7225,
+ "eval_runtime": 12.958,
+ "eval_samples_per_second": 30.869,
+ "eval_steps_per_second": 6.174,
+ "step": 32529
+ },
+ {
+ "entropy": 1.0824028312080687,
+ "epoch": 2.0,
+ "grad_norm": 44.0,
+ "learning_rate": 2.9690750074794507e-05,
+ "loss": 0.2808,
+ "mean_token_accuracy": 0.9327123904494635,
+ "num_tokens": 257046026.0,
+ "step": 65058
+ },
+ {
+ "epoch": 2.0,
+ "eval_entropy": 1.1366001389920712,
+ "eval_loss": 0.5472019910812378,
+ "eval_mean_token_accuracy": 0.9010569922626018,
+ "eval_num_gold": 400,
+ "eval_num_guess": 400,
+ "eval_num_tokens": 257046026.0,
+ "eval_recall": 0.7425,
+ "eval_runtime": 12.8244,
+ "eval_samples_per_second": 31.191,
+ "eval_steps_per_second": 6.238,
+ "step": 65058
+ },
+ {
+ "entropy": 0.8375838961411133,
+ "epoch": 3.0,
+ "grad_norm": 0.24609375,
+ "learning_rate": 2.9072193177726957e-05,
+ "loss": 0.1261,
+ "mean_token_accuracy": 0.9708323454896451,
+ "num_tokens": 128523013.0,
+ "step": 97587
+ },
+ {
+ "epoch": 3.0,
+ "eval_entropy": 0.8833060540258885,
+ "eval_loss": 0.5918775796890259,
+ "eval_mean_token_accuracy": 0.9003275491297245,
+ "eval_num_gold": 400,
+ "eval_num_guess": 400,
+ "eval_num_tokens": 128523013.0,
+ "eval_recall": 0.7375,
+ "eval_runtime": 12.8454,
+ "eval_samples_per_second": 31.14,
+ "eval_steps_per_second": 6.228,
+ "step": 97587
+ },
+ {
+ "entropy": 0.6976695484764193,
+ "epoch": 4.0,
+ "grad_norm": 0.050537109375,
+ "learning_rate": 2.845363628065941e-05,
+ "loss": 0.0463,
+ "mean_token_accuracy": 0.9893607802650367,
+ "num_tokens": 257046026.0,
+ "step": 130116
+ },
+ {
+ "epoch": 4.0,
+ "eval_entropy": 0.7260291546583175,
+ "eval_loss": 0.7683401703834534,
+ "eval_mean_token_accuracy": 0.9001270815730095,
+ "eval_num_gold": 400,
+ "eval_num_guess": 400,
+ "eval_num_tokens": 257046026.0,
+ "eval_recall": 0.7425,
+ "eval_runtime": 12.7469,
+ "eval_samples_per_second": 31.38,
+ "eval_steps_per_second": 6.276,
+ "step": 130116
+ },
+ {
+ "entropy": 0.6267366673389785,
+ "epoch": 5.0,
+ "grad_norm": 0.000751495361328125,
+ "learning_rate": 2.7835079383591863e-05,
+ "loss": 0.014,
+ "mean_token_accuracy": 0.9968704790790366,
+ "num_tokens": 128523013.0,
+ "step": 162645
+ },
+ {
+ "epoch": 5.0,
+ "eval_entropy": 0.6898447863757611,
+ "eval_loss": 0.8766273260116577,
+ "eval_mean_token_accuracy": 0.8994662061333656,
+ "eval_num_gold": 400,
+ "eval_num_guess": 400,
+ "eval_num_tokens": 128523013.0,
+ "eval_recall": 0.75,
+ "eval_runtime": 13.1668,
+ "eval_samples_per_second": 30.38,
+ "eval_steps_per_second": 6.076,
+ "step": 162645
+ }
+ ],
+ "logging_steps": 0,
+ "max_steps": 1626450,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 50,
+ "save_steps": 0,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 2.8936680388239032e+19,
+ "train_batch_size": 5,
+ "trial_name": null,
+ "trial_params": null
+ }
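In the log above, `eval_recall` peaks at the last logged epoch while `eval_loss` is lowest at the first, so the choice of selection metric changes which checkpoint counts as best. The sketch below is not part of the commit: it copies only the `step`, `epoch`, `eval_loss` and `eval_recall` values from `log_history`, and the helper `best_by` is a hypothetical name. The recorded `best_metric` of 0.75 matches `eval_recall` at step 162645, which suggests, but does not prove, that recall was the selection metric; the training arguments themselves are not visible in this diff.

```python
# Evaluation entries copied from log_history above (other keys omitted).
EVAL_LOG = [
    {"step": 32529, "epoch": 1.0, "eval_loss": 0.41451308131217957, "eval_recall": 0.7225},
    {"step": 65058, "epoch": 2.0, "eval_loss": 0.5472019910812378, "eval_recall": 0.7425},
    {"step": 97587, "epoch": 3.0, "eval_loss": 0.5918775796890259, "eval_recall": 0.7375},
    {"step": 130116, "epoch": 4.0, "eval_loss": 0.7683401703834534, "eval_recall": 0.7425},
    {"step": 162645, "epoch": 5.0, "eval_loss": 0.8766273260116577, "eval_recall": 0.75},
]


def best_by(entries, key, higher_is_better):
    """Return the entry with the best value of `key` (hypothetical helper)."""
    pick = max if higher_is_better else min
    return pick(entries, key=lambda entry: entry[key])


best_recall = best_by(EVAL_LOG, "eval_recall", higher_is_better=True)
best_loss = best_by(EVAL_LOG, "eval_loss", higher_is_better=False)

print("best by eval_recall: step", best_recall["step"], "recall", best_recall["eval_recall"])
print("best by eval_loss:   step", best_loss["step"], "loss", best_loss["eval_loss"])
```

With `eval_num_gold` at 400, the gap between 0.7425 and 0.75 corresponds to three examples, so the ranking by recall is sensitive to small changes in the evaluation set.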
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3bd2288704445bf1e107bb6d6ece5c0a854f3a840b54adb53a974dcb70c0329a
+ size 6353