AnonymousARR42 committed on
Commit 772467b · verified · 1 Parent(s): ef640a2

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ text_to_code.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
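This commit adds `text_to_code.json` and `tokenizer.json` to Git LFS tracking. A clone made without Git LFS installed therefore contains small text pointer stubs in place of the real files. The sketch below detects that case. It assumes the standard LFS pointer layout, whose first line is `version https://git-lfs.github.com/spec/v1`. The helper `is_lfs_pointer` and its 1 KiB size cutoff are illustrative assumptions, not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file (spec v1).
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path, max_pointer_bytes: int = 1024) -> bool:
    """Return True if `path` looks like a Git LFS pointer stub rather than real content."""
    p = Path(path)
    # Pointer stubs are tiny; the real tracked files are expected to be much larger.
    if p.stat().st_size > max_pointer_bytes:
        return False
    with p.open("rb") as fh:
        return fh.readline().rstrip(b"\r\n") == LFS_POINTER_HEADER


for name in ("tokenizer.json", "text_to_code.json"):
    if Path(name).exists() and is_lfs_pointer(name):
        print(f"{name} is an LFS pointer; run `git lfs pull` to download the real file.")
```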
.ipynb_checkpoints/README-checkpoint.md ADDED
@@ -0,0 +1,343 @@
+ ---
+ license: llama3.1
+
+ base_model:
+ - meta-llama/Llama-3.1-8B-Instruct
+
+ language:
+ - fr
+
+ tags:
+ - biomedical-entity-linking
+ - entity-linking
+ - entity-disambiguation
+ - named-entity-linking
+ - biomedical
+ - healthcare
+ - umls
+ - quaero
+ - text-generation
+ - constrained-decoding
+ - causal-lm
+ - llm
+
+ library_name: transformers
+ pipeline_tag: text-generation
+
+ datasets:
+ - bigbio/quaero
+
+ finetuning_task:
+ - entity-linking
+
+ metrics:
+ - recall
+
+ model-index:
+ - name: LongBEL-8B-QUAERO-EMEA
+   results:
+   - task:
+       type: entity-linking
+       name: Biomedical Entity Linking
+     dataset:
+       type: bigbio/quaero
+       name: QUAERO-EMEA
+       config: quaero_emea_bigbio_kb
+     metrics:
+     - type: recall
+       name: Recall@1
+       value: 0.754
+ ---
+
+ # LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking
+
+ ## LongBEL
+
+ **LongBEL** is a novel document-level framework for biomedical entity linking (BEL). Instead of normalizing each mention independently, LongBEL conditions each prediction on the document context and on the normalizations already produced for earlier mentions in the same document. This design encourages document-level consistency. Our **robust memory** mechanism strengthens it further by making the model less sensitive to errors in those earlier predictions. The method is introduced in our paper, currently under review.
+
+ ## LongBEL (QUAERO-EMEA Edition)
+
+ This is a **fine-tuned version of Llama-3.1-8B-Instruct** trained on **QUAERO-EMEA** with the LongBEL framework. Its predictions therefore draw on long document context and robust memory.
+
+ | Field | Value |
+ |---|---|
+ | Base model | `meta-llama/Llama-3.1-8B-Instruct` |
+ | Task | Biomedical Entity Linking |
+ | Dataset | QUAERO-EMEA |
+ | Knowledge base | UMLS 2014AA |
+ | Input | BigBio-like documents with mention spans and semantic groups |
+ | Output | Ranked UMLS concept predictions |
+ | Decoding | Semantic-guided constrained decoding |
+ | Main metric | Recall@1 |
+
+ ## Intended Use
+
+ This model is intended for research on biomedical entity linking and document-level consistency.
+
+ It assumes that mention spans and semantic groups are already provided; it does **not** perform named entity recognition (NER). In a full pipeline, an NER model first detects mentions and assigns their semantic groups. LongBEL then normalizes these mentions to UMLS concepts.
+
+ ## Usage
+
+ ### Loading the model
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ # trust_remote_code=True is needed because the LLamaLongBEL architecture is defined
+ # in the repository's own code (see "auto_map" in config.json).
+ # device_map="auto" requires the `accelerate` package.
+ model = AutoModelForCausalLM.from_pretrained(
+     "AnonymousARR42/LongBEL_8B_QUAERO_EMEA",
+     trust_remote_code=True,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ ```
+
+ ### Inference example
+
+ The model expects BigBio-like documents. Each entity must include three fields:
+ - the mention text;
+ - its character offsets in the passage text (start inclusive, end exclusive);
+ - its semantic group, in the `type` field.
+
+ ```python
+ num_beams = 5
+
+ bigbio_pages = [
+     {
+         "id": "001",
+         "document_id": "doc_001",
+         "passages": [
+             {
+                 "id": "0",
+                 "type": "paragraph",
+                 "text": [
+                     "A 29-year-old pregnant woman presented with severe-range hypertension, "
+                     "headache, and epigastric pain. Laboratory testing showed proteinuria "
+                     "and mildly elevated liver enzymes. She was admitted overnight with "
+                     "suspected PET and was started on urgent treatment."
+                 ],
+                 "offsets": [[0, 257]],
+             }
+         ],
+         "entities": [
+             {
+                 "id": "T1",
+                 "type": "Living Beings",
+                 "text": ["pregnant woman"],
+                 "offsets": [[14, 28]],
+             },
+             {
+                 "id": "T2",
+                 "type": "Disorders",
+                 "text": ["severe-range hypertension"],
+                 "offsets": [[44, 69]],
+             },
+             {
+                 "id": "T3",
+                 "type": "Disorders",
+                 "text": ["proteinuria"],
+                 "offsets": [[128, 139]],
+             },
+             {
+                 "id": "T4",
+                 "type": "Disorders",
+                 "text": ["PET"],
+                 "offsets": [[217, 220]],
+             },
+         ],
+         "events": [],
+         "coreferences": [],
+         "relations": [],
+     }
+ ]
+
+ # `sample` comes from the repository's remote code and takes BigBio pages directly.
+ predictions = model.sample(
+     bigbio_pages=bigbio_pages,
+     num_beams=num_beams,
+ )
+
+ # `predictions` holds `num_beams` consecutive entries per mention.
+ for i in range(0, len(predictions), num_beams):
+     mention = predictions[i]["mention"]
+     print(f"## Mention {(i // num_beams) + 1}: {mention}")
+
+     for j in range(num_beams):
+         pred = predictions[i + j]
+         print(
+             f"  - Beam {j + 1}:\n"
+             f"    - Predicted concept name: {pred['pred_concept_name']}\n"
+             f"    - Predicted code: {pred['pred_concept_code']}\n"
+             f"    - Beam score: {pred['beam_score']:.3f}\n"
+         )
+ ```
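Computing entity offsets by hand is error-prone. The sketch below derives them from the passage text. It assumes, as in the example, that offsets are end-exclusive character positions and that entities are listed in reading order. `locate_mentions` and `build_entities` are hypothetical helpers, not part of the LongBEL code. On the example passage they should reproduce the offsets used above.

```python
from typing import Dict, List, Tuple


def locate_mentions(text: str, mentions: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets, end exclusive, for each mention.

    Mentions are expected in reading order: each search resumes after the
    previous match, so a repeated surface form gets its next occurrence.
    """
    spans: List[Tuple[int, int]] = []
    cursor = 0
    for mention in mentions:
        start = text.find(mention, cursor)
        if start == -1:
            raise ValueError(f"{mention!r} not found after position {cursor}")
        end = start + len(mention)
        spans.append((start, end))
        cursor = end
    return spans


def build_entities(text: str, typed_mentions: List[Tuple[str, str]]) -> List[Dict]:
    """Build BigBio-style entity dicts from (mention, semantic group) pairs."""
    spans = locate_mentions(text, [mention for mention, _ in typed_mentions])
    return [
        {"id": f"T{k}", "type": group, "text": [mention], "offsets": [[start, end]]}
        for k, ((mention, group), (start, end)) in enumerate(zip(typed_mentions, spans), start=1)
    ]


passage = (
    "A 29-year-old pregnant woman presented with severe-range hypertension, "
    "headache, and epigastric pain. Laboratory testing showed proteinuria "
    "and mildly elevated liver enzymes. She was admitted overnight with "
    "suspected PET and was started on urgent treatment."
)
entities = build_entities(
    passage,
    [
        ("pregnant woman", "Living Beings"),
        ("severe-range hypertension", "Disorders"),
        ("proteinuria", "Disorders"),
        ("PET", "Disorders"),
    ],
)
for ent in entities:
    start, end = ent["offsets"][0]
    print(ent["id"], ent["offsets"], passage[start:end])
```

Searching from a moving cursor rather than from the start of the text keeps repeated mentions of the same term (common in the documents LongBEL targets) from all collapsing onto their first occurrence.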
+
+ **Example Output:**
+
+ ```text
+ ## Mention 1: pregnant woman
+   - Beam 1:
+     - Predicted concept name: Pregnant Woman
+     - Predicted code: C0033011
+     - Beam score: 1.000
+
+   - Beam 2:
+     - Predicted concept name: Pregnant woman
+     - Predicted code: C0033011
+     - Beam score: 0.003
+
+   - Beam 3:
+     - Predicted concept name: Pregnant woman (person)
+     - Predicted code: C0033011
+     - Beam score: 0.001
+
+   - Beam 4:
+     - Predicted concept name: Pregnancy Partner
+     - Predicted code: C3538996
+     - Beam score: 0.000
+
+   - Beam 5:
+     - Predicted concept name: Pregnant woman (person)
+     - Predicted code: C0033011
+     - Beam score: 0.000
+
+ ## Mention 2: severe-range hypertension
+   - Beam 1:
+     - Predicted concept name: Hypertensive disease
+     - Predicted code: C0020538
+     - Beam score: 0.078
+
+   - Beam 2:
+     - Predicted concept name: Hypertension (in some patients)
+     - Predicted code: C3280936
+     - Beam score: 0.022
+
+   - Beam 3:
+     - Predicted concept name: Hypertensive disease (disorder)
+     - Predicted code: C0020538
+     - Beam score: 0.010
+
+   - Beam 4:
+     - Predicted concept name: Hypertension, severe
+     - Predicted code: C4013784
+     - Beam score: 0.010
+
+   - Beam 5:
+     - Predicted concept name: Hypertension (patient A)
+     - Predicted code: C4313262
+     - Beam score: 0.004
+
+ ## Mention 3: proteinuria
+   - Beam 1:
+     - Predicted concept name: Proteinurias
+     - Predicted code: C0033687
+     - Beam score: 1.000
+
+   - Beam 2:
+     - Predicted concept name: Proteinuric diabetic nephropathy (disorder)
+     - Predicted code: C0403519
+     - Beam score: 0.003
+
+   - Beam 3:
+     - Predicted concept name: Proteinuria
+     - Predicted code: C0033687
+     - Beam score: 0.003
+
+   - Beam 4:
+     - Predicted concept name: Proteinuric diabetic nephropathy
+     - Predicted code: C0403519
+     - Beam score: 0.002
+
+   - Beam 5:
+     - Predicted concept name: Proteinuric hypertension of pregnancy (disorder)
+     - Predicted code: C0032914
+     - Beam score: 0.001
+
+ ## Mention 4: PET
+   - Beam 1:
+     - Predicted concept name: PET - Pre-eclamptic toxemia
+     - Predicted code: C0032914
+     - Beam score: 0.075
+
+   - Beam 2:
+     - Predicted concept name: PET - Pre-eclamptic toxaemia
+     - Predicted code: C0032914
+     - Beam score: 0.039
+
+   - Beam 3:
+     - Predicted concept name: Preeclamptic toxemia
+     - Predicted code: C2931877
+     - Beam score: 0.027
+
+   - Beam 4:
+     - Predicted concept name: Preeclampsia
+     - Predicted code: C0032914
+     - Beam score: 0.023
+
+   - Beam 5:
+     - Predicted concept name: Preeclampsia with Severe Features
+     - Predicted code: C0341950
+     - Beam score: 0.019
+ ```
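The printing loop relies on `predictions` being a flat list with `num_beams` consecutive entries per mention. Several beams often resolve to the same UMLS code; for example, three of the five beams for "PET" return C0032914. The sketch below regroups the flat list per mention and collapses beams into distinct codes ranked by their best beam score. `group_by_mention` and `ranked_codes` are hypothetical helpers. This deduplication is an illustrative post-processing step, not something the card states LongBEL performs.

```python
from typing import Dict, List


def group_by_mention(predictions: List[Dict], num_beams: int) -> List[List[Dict]]:
    """Split a flat prediction list into per-mention chunks of `num_beams` beams."""
    if len(predictions) % num_beams != 0:
        raise ValueError("number of predictions is not a multiple of num_beams")
    return [predictions[i : i + num_beams] for i in range(0, len(predictions), num_beams)]


def ranked_codes(beams: List[Dict]) -> List[str]:
    """Distinct concept codes, sorted by the best beam score each code received."""
    best: Dict[str, float] = {}
    for pred in beams:
        code = pred["pred_concept_code"]
        best[code] = max(best.get(code, float("-inf")), pred["beam_score"])
    return sorted(best, key=lambda code: best[code], reverse=True)


# Codes and scores for the mention "PET", copied from the example output.
pet_beams = [
    {"mention": "PET", "pred_concept_code": "C0032914", "beam_score": 0.075},
    {"mention": "PET", "pred_concept_code": "C0032914", "beam_score": 0.039},
    {"mention": "PET", "pred_concept_code": "C2931877", "beam_score": 0.027},
    {"mention": "PET", "pred_concept_code": "C0032914", "beam_score": 0.023},
    {"mention": "PET", "pred_concept_code": "C0341950", "beam_score": 0.019},
]
for beams in group_by_mention(pet_beams, num_beams=5):
    print(beams[0]["mention"], ranked_codes(beams))
```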
+
+ ## Evaluation
+
+ Entity linking performance is reported using Recall@1 with bootstrap confidence intervals. The best result is shown in **bold**, and the second-best result is <u>underlined</u>.
+
+ | Model | MM-ST21PV<br>(English) | QUAERO-EMEA<br>(French) | SympTEMIST<br>(Spanish) | DisTEMIST<br>(Spanish) | MedProcNER<br>(Spanish) |
+ | :--- | :---: | :---: | :---: | :---: | :---: |
+ | **Context-Free BEL** ||||| |
+ | SciSpacy | 53.8 ± 1.0 | 37.1 ± 4.3 | 9.8 ± 1.3 | 21.1 ± 1.9 | 10.3 ± 1.2 |
+ | SapBERT | 65.6 ± 1.0 | 59.7 ± 3.8 | 34.2 ± 2.0 | 38.6 ± 2.6 | 30.4 ± 2.1 |
+ | CODER-all | 62.9 ± 1.1 | 66.9 ± 4.0 | 42.2 ± 2.2 | 47.0 ± 2.6 | 42.7 ± 2.1 |
+ | SapBERT-all | 64.6 ± 1.1 | 67.9 ± 3.9 | 49.8 ± 2.4 | 49.6 ± 2.6 | 45.1 ± 2.2 |
+ | BERGAMOT | 60.9 ± 1.1 | 63.8 ± 4.9 | 48.0 ± 2.7 | 48.9 ± 2.4 | 42.3 ± 2.2 |
+ | **Local-Context BEL** ||||| |
+ | ArboEL | 76.9 ± 0.9 | 63.0 ± 3.9 | 55.4 ± 2.5 | 54.7 ± 2.6 | 59.7 ± 2.6 |
+ | GENRE / mBART-large | 69.6 ± 1.0 | 69.3 ± 5.4 | 59.8 ± 2.7 | 58.7 ± 2.7 | 66.0 ± 2.3 |
+ | GENRE / Llama-1B | 73.1 ± 1.0 | 75.1 ± 3.6 | 60.5 ± 2.4 | 62.5 ± 2.3 | 67.4 ± 2.1 |
+ | GENRE / Llama-8B | 75.0 ± 0.9 | 73.8 ± 4.0 | 61.7 ± 2.5 | 63.2 ± 2.5 | 68.3 ± 2.2 |
+ | **Global-Context BEL: LongBEL** ||||| |
+ | LongBEL-1B | 77.6 ± 0.9 | 74.5 ± 3.7 | 59.8 ± 2.5 | 61.9 ± 2.4 | 66.6 ± 2.1 |
+ | LongBEL-1B + Ensemble | 78.6 ± 0.8 | <u>77.2 ± 3.0</u> | 61.8 ± 2.5 | <u>64.3 ± 2.2</u> | <u>69.0 ± 2.0</u> |
+ | **LongBEL-8B** | <u>79.3 ± 0.8</u> | 75.4 ± 3.4 | <u>62.0 ± 2.6</u> | 63.6 ± 2.1 | <u>69.0 ± 2.1</u> |
+ | LongBEL-8B + Ensemble | **80.0 ± 0.8** | **77.6 ± 3.0** | **63.3 ± 2.5** | **65.8 ± 2.2** | **71.0 ± 2.0** |
+
+ The score reported for this checkpoint is that of the **single LongBEL-8B model**: Recall@1 = 75.4 on QUAERO-EMEA, stored as 0.754 in the model-index metadata. The ensemble results require fusing several LongBEL input configurations and cannot be obtained from this checkpoint alone.
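The card does not describe the bootstrap procedure in detail. The following is a generic percentile-bootstrap sketch for Recall@1. It assumes resampling at the mention level, 1,000 resamples and a 95% interval. The authors' resampling unit (mentions or documents), their number of resamples and the exact meaning of the ± values may all differ. `recall_at_1` and `bootstrap_ci` are hypothetical names.

```python
import random
from typing import Sequence, Tuple


def recall_at_1(gold: Sequence[str], top1: Sequence[str]) -> float:
    """Fraction of mentions whose top-ranked predicted code equals the gold code."""
    if len(gold) != len(top1) or not gold:
        raise ValueError("gold and top1 must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, top1)) / len(gold)


def bootstrap_ci(
    gold: Sequence[str],
    top1: Sequence[str],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Percentile bootstrap over mentions; returns (recall, lower, upper)."""
    point = recall_at_1(gold, top1)
    hits = [int(g == p) for g, p in zip(gold, top1)]
    n = len(hits)
    rng = random.Random(seed)
    # Resample mentions with replacement and recompute Recall@1 each time.
    stats = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_resamples))
    lower = stats[int(alpha / 2 * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return point, lower, upper


# Toy data: 22 correct top-1 codes out of 26 mentions.
gold = ["A"] * 26
top1 = ["A"] * 22 + ["B"] * 4
recall, lower, upper = bootstrap_ci(gold, top1)
print(f"Recall@1 = {100 * recall:.1f} (95% CI {100 * lower:.1f} to {100 * upper:.1f})")
```

For a document-level method like LongBEL, resampling whole documents instead of individual mentions would respect the dependence between mentions of the same document. The same loop works with per-document hit lists.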
+
+ ## Speed and Memory
+
+ Measured on a single NVIDIA H100 80GB GPU.
+
+ | Model | Model memory | Candidate memory | Speed |
+ | ----------------------- | -----------: | ---------------: | --------------: |
+ | GENRE-Llama-8B baseline | 28.6 GB | 5.4 GB | 38.2 mentions/s |
+ | LongBEL-8B | 28.6 GB | 5.4 GB | 15.2 mentions/s |
+
+ LongBEL has the same model and candidate memory footprint as the sentence-level GENRE-Llama-8B baseline. However, it is about 2.5 times slower (15.2 vs. 38.2 mentions/s) because it processes longer contexts and updates its document-level memory during inference.
+
+ ## Limitations
+
+ This model assumes that mention spans and semantic groups are given. It does not perform mention detection.
+
+ LongBEL is most useful when concepts recur within a document. When most concepts appear only once, the memory mechanism has less information to exploit.
+
+ Because LongBEL uses previous predictions as memory, early mistakes can still influence later predictions. Robust memory training reduces this risk but does not eliminate it.
+
+ This model is intended for research use. It should not be used for clinical decision-making without additional validation and human oversight.
+
+ ## Reproducibility
+
+ Code and evaluation scripts are available in this [anonymized repository](https://anonymous.4open.science/r/LongBEL-31AD).
+
+ Trained model checkpoints and processed datasets are available in the anonymous Hugging Face collection associated with LongBEL.
+
+ <!-- ## Citation
+
+ If you use this model, please cite the LongBEL paper.
+
+ ```bibtex
+ @inproceedings{longbel2026,
+   title = {LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking},
+   author = {Anonymous},
+   booktitle = {Anonymous submission},
+   year = {2026}
+ }
+ ``` -->
.ipynb_checkpoints/config-checkpoint.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "architectures": [
+     "LLamaLongBEL"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 128000,
+   "dtype": "bfloat16",
+   "eos_token_id": 128009,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 131072,
+   "mlp_bias": false,
+   "model_type": "llama_longbel",
+   "auto_map": {
+     "AutoConfig": "longbel.LLamaLongBELConfig",
+     "AutoModelForCausalLM": "longbel.LLamaLongBEL"
+   },
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "pad_token_id": 128009,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": {
+     "factor": 8.0,
+     "high_freq_factor": 4.0,
+     "low_freq_factor": 1.0,
+     "original_max_position_embeddings": 8192,
+     "rope_type": "llama3"
+   },
+   "rope_theta": 500000.0,
+   "tie_word_embeddings": false,
+   "transformers_version": "4.57.1",
+   "use_cache": true,
+   "vocab_size": 128257
+ }
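The attention settings in this config determine how much key/value-cache memory the model needs as the context grows. This matters for LongBEL's long document inputs. Below is a back-of-the-envelope sketch using the values from this file. With grouped-query attention (32 query heads sharing 8 KV heads), each token stores 2 × 32 layers × 8 KV heads × 128 dims × 2 bytes = 128 KiB in bfloat16.

The estimate counts only the KV cache, not weights or activations, and assumes one cache per sequence. The 8,192-token context and the 5-beam line are illustrative assumptions: if beam search keeps a separate cache per beam, multiply by `num_beams`. `kv_cache_bytes` is a hypothetical helper.

```python
# Values copied from the config.json in this commit.
config = {
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "dtype": "bfloat16",
}
BYTES_PER_ELEMENT = {"bfloat16": 2, "float16": 2, "float32": 4}


def kv_cache_bytes(cfg: dict, num_tokens: int, num_sequences: int = 1) -> int:
    """Key/value cache size: one K and one V vector per layer, KV head, and token."""
    per_token = (
        2  # keys and values
        * cfg["num_hidden_layers"]
        * cfg["num_key_value_heads"]
        * cfg["head_dim"]
        * BYTES_PER_ELEMENT[cfg["dtype"]]
    )
    return per_token * num_tokens * num_sequences


# Grouped-query attention: this many query heads share each KV head.
gqa_group = config["num_attention_heads"] // config["num_key_value_heads"]

print(gqa_group, "query heads per KV head")
print(kv_cache_bytes(config, 1) / 1024, "KiB per token")
print(kv_cache_bytes(config, 8192) / 2**30, "GiB for an 8,192-token context")
print(kv_cache_bytes(config, 8192, num_sequences=5) / 2**30, "GiB if 5 beams each keep a cache")
```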
.ipynb_checkpoints/trainer_state-checkpoint.json ADDED
@@ -0,0 +1,1234 @@
+ {
+   "best_global_step": 7359,
+   "best_metric": 0.8462,
+   "best_model_checkpoint": "models/NED/EMEA_human_only_tfidf_hybrid_long_v2_addheaders/Llama-3.1-8B-Instruct/checkpoint-7359",
+   "epoch": 50.0,
+   "eval_steps": 500,
+   "global_step": 122650,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "entropy": 1.1526817805758311,
+       "epoch": 1.0,
+       "grad_norm": 304.0,
+       "learning_rate": 1.9989130434782608e-05,
+       "loss": 0.7669,
+       "mean_token_accuracy": 0.8752253057546777,
+       "num_tokens": 15010779.0,
+       "step": 2453
+     },
+     {
+       "epoch": 1.0,
+       "eval_entropy": 1.2358426589232225,
+       "eval_loss": 0.6339517831802368,
+       "eval_mean_token_accuracy": 0.8988095246828519,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 15010779.0,
+       "eval_recall": 0.7308,
+       "eval_runtime": 3.6399,
+       "eval_samples_per_second": 7.143,
+       "eval_steps_per_second": 3.571,
+       "step": 2453
+     },
+     {
+       "entropy": 1.3605892632720036,
+       "epoch": 2.0,
+       "grad_norm": 12.1875,
+       "learning_rate": 2.9691098596284776e-05,
+       "loss": 0.5437,
+       "mean_token_accuracy": 0.9150349811612466,
+       "num_tokens": 30021558.0,
+       "step": 4906
+     },
+     {
+       "epoch": 2.0,
+       "eval_entropy": 1.1509519540346587,
+       "eval_loss": 0.4853871166706085,
+       "eval_mean_token_accuracy": 0.9201437464127173,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 30021558.0,
+       "eval_recall": 0.7692,
+       "eval_runtime": 3.627,
+       "eval_samples_per_second": 7.168,
+       "eval_steps_per_second": 3.584,
+       "step": 4906
+     },
+     {
+       "entropy": 1.1862413553719222,
+       "epoch": 3.0,
+       "grad_norm": 2.1875,
+       "learning_rate": 2.9072539295620746e-05,
+       "loss": 0.2619,
+       "mean_token_accuracy": 0.9548876376794495,
+       "num_tokens": 45032337.0,
+       "step": 7359
+     },
+     {
+       "epoch": 3.0,
+       "eval_entropy": 1.019592651954064,
+       "eval_loss": 0.5770813822746277,
+       "eval_mean_token_accuracy": 0.9220362993387076,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 45032337.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6363,
+       "eval_samples_per_second": 7.15,
+       "eval_steps_per_second": 3.575,
+       "step": 7359
+     },
+     {
+       "entropy": 0.9634018300311497,
+       "epoch": 4.0,
+       "grad_norm": 0.1240234375,
+       "learning_rate": 2.8453979994956713e-05,
+       "loss": 0.1216,
+       "mean_token_accuracy": 0.9782008502466845,
+       "num_tokens": 60043116.0,
+       "step": 9812
+     },
+     {
+       "epoch": 4.0,
+       "eval_entropy": 0.8699520321992728,
+       "eval_loss": 0.5446107387542725,
+       "eval_mean_token_accuracy": 0.940018314581651,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 60043116.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6143,
+       "eval_samples_per_second": 7.194,
+       "eval_steps_per_second": 3.597,
+       "step": 9812
+     },
+     {
+       "entropy": 0.7849812429144681,
+       "epoch": 5.0,
+       "grad_norm": 0.002227783203125,
+       "learning_rate": 2.783542069429268e-05,
+       "loss": 0.0517,
+       "mean_token_accuracy": 0.9894482943411997,
+       "num_tokens": 75053895.0,
+       "step": 12265
+     },
+     {
+       "epoch": 5.0,
+       "eval_entropy": 0.6801113898937519,
+       "eval_loss": 0.7289856672286987,
+       "eval_mean_token_accuracy": 0.9444444454633273,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 75053895.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6486,
+       "eval_samples_per_second": 7.126,
+       "eval_steps_per_second": 3.563,
+       "step": 12265
+     },
+     {
+       "entropy": 0.6892432886826181,
+       "epoch": 6.0,
+       "grad_norm": 0.0004749298095703125,
+       "learning_rate": 2.721686139362865e-05,
+       "loss": 0.0209,
+       "mean_token_accuracy": 0.9958273216359138,
+       "num_tokens": 90064674.0,
+       "step": 14718
+     },
+     {
+       "epoch": 6.0,
+       "eval_entropy": 0.577189931502709,
+       "eval_loss": 0.7246649265289307,
+       "eval_mean_token_accuracy": 0.9444444454633273,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 90064674.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6456,
+       "eval_samples_per_second": 7.132,
+       "eval_steps_per_second": 3.566,
+       "step": 14718
+     },
+     {
+       "entropy": 0.6557439389371696,
+       "epoch": 7.0,
+       "grad_norm": 0.000888824462890625,
+       "learning_rate": 2.659830209296461e-05,
+       "loss": 0.0078,
+       "mean_token_accuracy": 0.9979321826393635,
+       "num_tokens": 105075453.0,
+       "step": 17171
+     },
+     {
+       "epoch": 7.0,
+       "eval_entropy": 0.5603500146132249,
+       "eval_loss": 0.8045116662979126,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 105075453.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 5.557,
+       "eval_samples_per_second": 4.679,
+       "eval_steps_per_second": 2.339,
+       "step": 17171
+     },
+     {
+       "entropy": 0.6481096161976669,
+       "epoch": 8.0,
+       "grad_norm": 8.96453857421875e-05,
+       "learning_rate": 2.597974279230058e-05,
+       "loss": 0.0028,
+       "mean_token_accuracy": 0.9993061645391568,
+       "num_tokens": 120086232.0,
+       "step": 19624
+     },
+     {
+       "epoch": 8.0,
+       "eval_entropy": 0.5650725089586698,
+       "eval_loss": 0.8335245847702026,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 120086232.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6391,
+       "eval_samples_per_second": 7.145,
+       "eval_steps_per_second": 3.572,
+       "step": 19624
+     },
+     {
+       "entropy": 0.6384989822756452,
+       "epoch": 9.0,
+       "grad_norm": 0.00102996826171875,
+       "learning_rate": 2.5361183491636548e-05,
+       "loss": 0.0011,
+       "mean_token_accuracy": 0.9997574686275129,
+       "num_tokens": 135097011.0,
+       "step": 22077
+     },
+     {
+       "epoch": 9.0,
+       "eval_entropy": 0.5437194108963013,
+       "eval_loss": 0.8720409870147705,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 135097011.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6696,
+       "eval_samples_per_second": 7.085,
+       "eval_steps_per_second": 3.543,
+       "step": 22077
+     },
+     {
+       "entropy": 0.6327040182586792,
+       "epoch": 10.0,
+       "grad_norm": 0.00011968612670898438,
+       "learning_rate": 2.4742624190972517e-05,
+       "loss": 0.0002,
+       "mean_token_accuracy": 0.9999592335818012,
+       "num_tokens": 150107790.0,
+       "step": 24530
+     },
+     {
+       "epoch": 10.0,
+       "eval_entropy": 0.5456434029799241,
+       "eval_loss": 0.8786986470222473,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 150107790.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.7213,
+       "eval_samples_per_second": 6.987,
+       "eval_steps_per_second": 3.493,
+       "step": 24530
+     },
+     {
+       "entropy": 0.6342776355527636,
+       "epoch": 11.0,
+       "grad_norm": 2.9206275939941406e-05,
+       "learning_rate": 2.412406489030848e-05,
+       "loss": 0.0001,
+       "mean_token_accuracy": 0.9999629396397,
+       "num_tokens": 165118569.0,
+       "step": 26983
+     },
+     {
+       "epoch": 11.0,
+       "eval_entropy": 0.5441241906239436,
+       "eval_loss": 0.8776129484176636,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 165118569.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6285,
+       "eval_samples_per_second": 7.165,
+       "eval_steps_per_second": 3.583,
+       "step": 26983
+     },
+     {
+       "entropy": 0.6330991076222742,
+       "epoch": 12.0,
+       "grad_norm": 0.000823974609375,
+       "learning_rate": 2.350550558964445e-05,
+       "loss": 0.0,
+       "mean_token_accuracy": 1.0,
+       "num_tokens": 180129348.0,
+       "step": 29436
+     },
+     {
+       "epoch": 12.0,
+       "eval_entropy": 0.544509245799138,
+       "eval_loss": 0.88084477186203,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 180129348.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6661,
+       "eval_samples_per_second": 7.092,
+       "eval_steps_per_second": 3.546,
+       "step": 29436
+     },
+     {
+       "entropy": 0.6322705759061291,
+       "epoch": 13.0,
+       "grad_norm": 0.010498046875,
+       "learning_rate": 2.2886946288980416e-05,
+       "loss": 0.0,
+       "mean_token_accuracy": 1.0,
+       "num_tokens": 195140127.0,
+       "step": 31889
+     },
+     {
+       "epoch": 13.0,
+       "eval_entropy": 0.5434356606923617,
+       "eval_loss": 0.8842343091964722,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 195140127.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 4.1268,
+       "eval_samples_per_second": 6.3,
+       "eval_steps_per_second": 3.15,
+       "step": 31889
+     },
+     {
+       "entropy": 0.6316640121908612,
+       "epoch": 14.0,
+       "grad_norm": 0.0035552978515625,
+       "learning_rate": 2.2268386988316383e-05,
+       "loss": 0.0,
+       "mean_token_accuracy": 1.0,
+       "num_tokens": 210150906.0,
+       "step": 34342
+     },
+     {
+       "epoch": 14.0,
+       "eval_entropy": 0.543243577847114,
+       "eval_loss": 0.885927140712738,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 210150906.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.7188,
+       "eval_samples_per_second": 6.991,
+       "eval_steps_per_second": 3.496,
+       "step": 34342
+     },
+     {
+       "entropy": 0.6321596540070241,
+       "epoch": 15.0,
+       "grad_norm": 2.4199485778808594e-05,
+       "learning_rate": 2.164982768765235e-05,
+       "loss": 0.0,
+       "mean_token_accuracy": 1.0,
+       "num_tokens": 225161685.0,
+       "step": 36795
+     },
+     {
+       "epoch": 15.0,
+       "eval_entropy": 0.5422769280580374,
+       "eval_loss": 0.8823052644729614,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 225161685.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6723,
+       "eval_samples_per_second": 7.08,
+       "eval_steps_per_second": 3.54,
+       "step": 36795
+     },
+     {
+       "entropy": 0.6315903761194426,
+       "epoch": 16.0,
+       "grad_norm": 0.0291748046875,
+       "learning_rate": 2.1031268386988316e-05,
+       "loss": 0.0,
+       "mean_token_accuracy": 1.0,
+       "num_tokens": 240172464.0,
+       "step": 39248
+     },
+     {
+       "epoch": 16.0,
+       "eval_entropy": 0.5426660546889672,
+       "eval_loss": 0.8869765996932983,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 240172464.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6896,
+       "eval_samples_per_second": 7.047,
+       "eval_steps_per_second": 3.523,
+       "step": 39248
+     },
+     {
+       "entropy": 0.6317922561279472,
+       "epoch": 17.0,
+       "grad_norm": 0.0001850128173828125,
+       "learning_rate": 2.0412709086324285e-05,
+       "loss": 0.0,
+       "mean_token_accuracy": 1.0,
+       "num_tokens": 255183243.0,
+       "step": 41701
+     },
+     {
+       "epoch": 17.0,
+       "eval_entropy": 0.542809899036701,
+       "eval_loss": 0.8864607214927673,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 255183243.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6498,
+       "eval_samples_per_second": 7.124,
+       "eval_steps_per_second": 3.562,
+       "step": 41701
+     },
+     {
+       "entropy": 0.6319634849034763,
+       "epoch": 18.0,
+       "grad_norm": 2.1457672119140625e-05,
+       "learning_rate": 1.979414978566025e-05,
+       "loss": 0.0,
+       "mean_token_accuracy": 1.0,
+       "num_tokens": 270194022.0,
+       "step": 44154
+     },
+     {
+       "epoch": 18.0,
+       "eval_entropy": 0.5426488243616544,
+       "eval_loss": 0.8861849308013916,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 270194022.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.6568,
+       "eval_samples_per_second": 7.11,
+       "eval_steps_per_second": 3.555,
+       "step": 44154
+     },
+     {
+       "entropy": 0.631338802688325,
+       "epoch": 19.0,
+       "grad_norm": 4.076957702636719e-05,
+       "learning_rate": 1.9175590484996218e-05,
+       "loss": 0.0,
+       "mean_token_accuracy": 1.0,
+       "num_tokens": 285204801.0,
+       "step": 46607
+     },
+     {
+       "epoch": 19.0,
+       "eval_entropy": 0.5423762339812058,
+       "eval_loss": 0.885791540145874,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 285204801.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.653,
+       "eval_samples_per_second": 7.118,
+       "eval_steps_per_second": 3.559,
+       "step": 46607
+     },
+     {
+       "entropy": 0.6311312203036976,
+       "epoch": 20.0,
+       "grad_norm": 0.0004634857177734375,
+       "learning_rate": 1.8557031184332184e-05,
+       "loss": 0.0,
+       "mean_token_accuracy": 1.0,
+       "num_tokens": 300215580.0,
+       "step": 49060
+     },
+     {
+       "epoch": 20.0,
+       "eval_entropy": 0.5424229686076825,
+       "eval_loss": 0.8889456987380981,
+       "eval_mean_token_accuracy": 0.9358974374257601,
+       "eval_num_gold": 26,
+       "eval_num_guess": 26,
+       "eval_num_tokens": 300215580.0,
+       "eval_recall": 0.8462,
+       "eval_runtime": 3.651,
+       "eval_samples_per_second": 7.121,
+       "eval_steps_per_second": 3.561,
+       "step": 49060
+     },
492
+ {
493
+ "entropy": 0.631198678741249,
494
+ "epoch": 21.0,
495
+ "grad_norm": 0.00031280517578125,
496
+ "learning_rate": 1.793847188366815e-05,
497
+ "loss": 0.0,
498
+ "mean_token_accuracy": 1.0,
499
+ "num_tokens": 315226359.0,
500
+ "step": 51513
501
+ },
502
+ {
503
+ "epoch": 21.0,
504
+ "eval_entropy": 0.5428222968028142,
505
+ "eval_loss": 0.8843169808387756,
506
+ "eval_mean_token_accuracy": 0.9358974374257601,
507
+ "eval_num_gold": 26,
508
+ "eval_num_guess": 26,
509
+ "eval_num_tokens": 315226359.0,
510
+ "eval_recall": 0.8462,
511
+ "eval_runtime": 3.6619,
512
+ "eval_samples_per_second": 7.1,
513
+ "eval_steps_per_second": 3.55,
514
+ "step": 51513
515
+ },
516
+ {
517
+ "entropy": 0.6313406728478388,
518
+ "epoch": 22.0,
519
+ "grad_norm": 0.000759124755859375,
520
+ "learning_rate": 1.731991258300412e-05,
521
+ "loss": 0.0,
522
+ "mean_token_accuracy": 1.0,
523
+ "num_tokens": 330237138.0,
524
+ "step": 53966
525
+ },
526
+ {
527
+ "epoch": 22.0,
528
+ "eval_entropy": 0.5427144765853882,
529
+ "eval_loss": 0.8861469030380249,
530
+ "eval_mean_token_accuracy": 0.9358974374257601,
531
+ "eval_num_gold": 26,
532
+ "eval_num_guess": 26,
533
+ "eval_num_tokens": 330237138.0,
534
+ "eval_recall": 0.8462,
535
+ "eval_runtime": 3.6544,
536
+ "eval_samples_per_second": 7.115,
537
+ "eval_steps_per_second": 3.557,
538
+ "step": 53966
539
+ },
540
+ {
541
+ "entropy": 0.6313331465647263,
542
+ "epoch": 23.0,
543
+ "grad_norm": 0.00051116943359375,
544
+ "learning_rate": 1.6701353282340083e-05,
545
+ "loss": 0.0,
546
+ "mean_token_accuracy": 1.0,
547
+ "num_tokens": 345247917.0,
548
+ "step": 56419
549
+ },
550
+ {
551
+ "epoch": 23.0,
552
+ "eval_entropy": 0.5423137545585632,
553
+ "eval_loss": 0.8892049193382263,
554
+ "eval_mean_token_accuracy": 0.9358974374257601,
555
+ "eval_num_gold": 26,
556
+ "eval_num_guess": 26,
557
+ "eval_num_tokens": 345247917.0,
558
+ "eval_recall": 0.8462,
559
+ "eval_runtime": 3.6537,
560
+ "eval_samples_per_second": 7.116,
561
+ "eval_steps_per_second": 3.558,
562
+ "step": 56419
563
+ },
564
+ {
565
+ "entropy": 0.6310314053401527,
566
+ "epoch": 24.0,
567
+ "grad_norm": 3.600120544433594e-05,
568
+ "learning_rate": 1.6082793981676053e-05,
569
+ "loss": 0.0,
570
+ "mean_token_accuracy": 1.0,
571
+ "num_tokens": 360258696.0,
572
+ "step": 58872
573
+ },
574
+ {
575
+ "epoch": 24.0,
576
+ "eval_entropy": 0.5423843631377587,
577
+ "eval_loss": 0.8886714577674866,
578
+ "eval_mean_token_accuracy": 0.9358974374257601,
579
+ "eval_num_gold": 26,
580
+ "eval_num_guess": 26,
581
+ "eval_num_tokens": 360258696.0,
582
+ "eval_recall": 0.8462,
583
+ "eval_runtime": 3.6316,
584
+ "eval_samples_per_second": 7.159,
585
+ "eval_steps_per_second": 3.58,
586
+ "step": 58872
587
+ },
588
+ {
+ "entropy": 0.6315073234496484,
+ "epoch": 25.0,
+ "grad_norm": 7.82012939453125e-05,
+ "learning_rate": 1.546423468101202e-05,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 375269475.0,
+ "step": 61325
+ },
+ {
+ "epoch": 25.0,
+ "eval_entropy": 0.5420686419193561,
+ "eval_loss": 0.8865240812301636,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 375269475.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.613,
+ "eval_samples_per_second": 7.196,
+ "eval_steps_per_second": 3.598,
+ "step": 61325
+ },
+ {
+ "entropy": 0.632054461467718,
+ "epoch": 26.0,
+ "grad_norm": 0.00024318695068359375,
+ "learning_rate": 1.4845675380347987e-05,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 15010779.0,
+ "step": 63778
+ },
+ {
+ "epoch": 26.0,
+ "eval_entropy": 0.5426568893285898,
+ "eval_loss": 0.88667893409729,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 15010779.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.647,
+ "eval_samples_per_second": 7.129,
+ "eval_steps_per_second": 3.565,
+ "step": 63778
+ },
+ {
+ "entropy": 0.6314872418356777,
+ "epoch": 27.0,
+ "grad_norm": 0.00011396408081054688,
+ "learning_rate": 1.4227116079683954e-05,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 30021558.0,
+ "step": 66231
+ },
+ {
+ "epoch": 27.0,
+ "eval_entropy": 0.5423887417866633,
+ "eval_loss": 0.8907365798950195,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 30021558.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6242,
+ "eval_samples_per_second": 7.174,
+ "eval_steps_per_second": 3.587,
+ "step": 66231
+ },
+ {
+ "entropy": 0.6317801613055392,
+ "epoch": 28.0,
+ "grad_norm": 8.392333984375e-05,
+ "learning_rate": 1.3608556779019922e-05,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 45032337.0,
+ "step": 68684
+ },
+ {
+ "epoch": 28.0,
+ "eval_entropy": 0.5428364735383254,
+ "eval_loss": 0.885719358921051,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 45032337.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6828,
+ "eval_samples_per_second": 7.06,
+ "eval_steps_per_second": 3.53,
+ "step": 68684
+ },
+ {
+ "entropy": 0.6310389586555389,
+ "epoch": 29.0,
+ "grad_norm": 0.000774383544921875,
+ "learning_rate": 1.2989997478355888e-05,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 60043116.0,
+ "step": 71137
+ },
+ {
+ "epoch": 29.0,
+ "eval_entropy": 0.5424722524789664,
+ "eval_loss": 0.8864960074424744,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 60043116.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6359,
+ "eval_samples_per_second": 7.151,
+ "eval_steps_per_second": 3.576,
+ "step": 71137
+ },
+ {
+ "entropy": 0.6310345640461444,
+ "epoch": 30.0,
+ "grad_norm": 3.5762786865234375e-05,
+ "learning_rate": 1.2371438177691856e-05,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 75053895.0,
+ "step": 73590
+ },
+ {
+ "epoch": 30.0,
+ "eval_entropy": 0.5427528161268967,
+ "eval_loss": 0.8871183395385742,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 75053895.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6648,
+ "eval_samples_per_second": 7.095,
+ "eval_steps_per_second": 3.547,
+ "step": 73590
+ },
+ {
+ "entropy": 0.6307261824680745,
+ "epoch": 31.0,
+ "grad_norm": 0.00015163421630859375,
+ "learning_rate": 1.1752878877027823e-05,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 90064674.0,
+ "step": 76043
+ },
+ {
+ "epoch": 31.0,
+ "eval_entropy": 0.5423439878683823,
+ "eval_loss": 0.890313982963562,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 90064674.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6589,
+ "eval_samples_per_second": 7.106,
+ "eval_steps_per_second": 3.553,
+ "step": 76043
+ },
+ {
+ "entropy": 0.6317850742056279,
+ "epoch": 32.0,
+ "grad_norm": 0.0005035400390625,
+ "learning_rate": 1.113431957636379e-05,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 105075453.0,
+ "step": 78496
+ },
+ {
+ "epoch": 32.0,
+ "eval_entropy": 0.5422184283916767,
+ "eval_loss": 0.8882402181625366,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 105075453.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6075,
+ "eval_samples_per_second": 7.207,
+ "eval_steps_per_second": 3.604,
+ "step": 78496
+ },
+ {
+ "entropy": 0.6315069926961121,
+ "epoch": 33.0,
+ "grad_norm": 0.0079345703125,
+ "learning_rate": 1.0515760275699757e-05,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 120086232.0,
+ "step": 80949
+ },
+ {
+ "epoch": 33.0,
+ "eval_entropy": 0.5428683024186355,
+ "eval_loss": 0.8859032988548279,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 120086232.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6537,
+ "eval_samples_per_second": 7.116,
+ "eval_steps_per_second": 3.558,
+ "step": 80949
+ },
+ {
+ "entropy": 0.6313212784246381,
+ "epoch": 34.0,
+ "grad_norm": 0.000885009765625,
+ "learning_rate": 9.897200975035723e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 135097011.0,
+ "step": 83402
+ },
+ {
+ "epoch": 34.0,
+ "eval_entropy": 0.5425068598527175,
+ "eval_loss": 0.887780487537384,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 135097011.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6448,
+ "eval_samples_per_second": 7.133,
+ "eval_steps_per_second": 3.567,
+ "step": 83402
+ },
+ {
+ "entropy": 0.6308202771352254,
+ "epoch": 35.0,
+ "grad_norm": 0.00032806396484375,
+ "learning_rate": 9.27864167437169e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 150107790.0,
+ "step": 85855
+ },
+ {
+ "epoch": 35.0,
+ "eval_entropy": 0.54246619114509,
+ "eval_loss": 0.8900800347328186,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 150107790.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6253,
+ "eval_samples_per_second": 7.172,
+ "eval_steps_per_second": 3.586,
+ "step": 85855
+ },
+ {
+ "entropy": 0.6310893858737767,
+ "epoch": 36.0,
+ "grad_norm": 0.00543212890625,
+ "learning_rate": 8.660082373707658e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 165118569.0,
+ "step": 88308
+ },
+ {
+ "epoch": 36.0,
+ "eval_entropy": 0.542354785479032,
+ "eval_loss": 0.882867157459259,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 165118569.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6309,
+ "eval_samples_per_second": 7.161,
+ "eval_steps_per_second": 3.58,
+ "step": 88308
+ },
+ {
+ "entropy": 0.6313383878492308,
+ "epoch": 37.0,
+ "grad_norm": 0.0014495849609375,
+ "learning_rate": 8.041523073043624e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 180129348.0,
+ "step": 90761
+ },
+ {
+ "epoch": 37.0,
+ "eval_entropy": 0.5429406670423654,
+ "eval_loss": 0.8894430994987488,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 180129348.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6047,
+ "eval_samples_per_second": 7.213,
+ "eval_steps_per_second": 3.606,
+ "step": 90761
+ },
+ {
+ "entropy": 0.6315074832012738,
+ "epoch": 38.0,
+ "grad_norm": 1.8477439880371094e-05,
+ "learning_rate": 7.422963772379592e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 195140127.0,
+ "step": 93214
+ },
+ {
+ "epoch": 38.0,
+ "eval_entropy": 0.5428708929281968,
+ "eval_loss": 0.8853751420974731,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 195140127.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6095,
+ "eval_samples_per_second": 7.203,
+ "eval_steps_per_second": 3.602,
+ "step": 93214
+ },
+ {
+ "entropy": 0.6316086658156264,
+ "epoch": 39.0,
+ "grad_norm": 0.0019378662109375,
+ "learning_rate": 6.804404471715559e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 210150906.0,
+ "step": 95667
+ },
+ {
+ "epoch": 39.0,
+ "eval_entropy": 0.5423155472828791,
+ "eval_loss": 0.8865050673484802,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 210150906.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6105,
+ "eval_samples_per_second": 7.201,
+ "eval_steps_per_second": 3.601,
+ "step": 95667
+ },
+ {
+ "entropy": 0.6319762418161253,
+ "epoch": 40.0,
+ "grad_norm": 0.0076904296875,
+ "learning_rate": 6.185845171051526e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 225161685.0,
+ "step": 98120
+ },
+ {
+ "epoch": 40.0,
+ "eval_entropy": 0.5423448315033546,
+ "eval_loss": 0.887237012386322,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 225161685.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6062,
+ "eval_samples_per_second": 7.21,
+ "eval_steps_per_second": 3.605,
+ "step": 98120
+ },
+ {
+ "entropy": 0.6316094772090632,
+ "epoch": 41.0,
+ "grad_norm": 0.00040435791015625,
+ "learning_rate": 5.567285870387493e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 240172464.0,
+ "step": 100573
+ },
+ {
+ "epoch": 41.0,
+ "eval_entropy": 0.5424330555475675,
+ "eval_loss": 0.8862788081169128,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 240172464.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6042,
+ "eval_samples_per_second": 7.214,
+ "eval_steps_per_second": 3.607,
+ "step": 100573
+ },
+ {
+ "entropy": 0.6310035889118581,
+ "epoch": 42.0,
+ "grad_norm": 0.0020294189453125,
+ "learning_rate": 4.94872656972346e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 255183243.0,
+ "step": 103026
+ },
+ {
+ "epoch": 42.0,
+ "eval_entropy": 0.5431472292313209,
+ "eval_loss": 0.890018105506897,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 255183243.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6041,
+ "eval_samples_per_second": 7.214,
+ "eval_steps_per_second": 3.607,
+ "step": 103026
+ },
+ {
+ "entropy": 0.6312229550229838,
+ "epoch": 43.0,
+ "grad_norm": 0.0012969970703125,
+ "learning_rate": 4.330167269059427e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 270194022.0,
+ "step": 105479
+ },
+ {
+ "epoch": 43.0,
+ "eval_entropy": 0.5424636235603919,
+ "eval_loss": 0.8868480324745178,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 270194022.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.606,
+ "eval_samples_per_second": 7.21,
+ "eval_steps_per_second": 3.605,
+ "step": 105479
+ },
+ {
+ "entropy": 0.631434175660063,
+ "epoch": 44.0,
+ "grad_norm": 7.390975952148438e-05,
+ "learning_rate": 3.711607968395394e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 285204801.0,
+ "step": 107932
+ },
+ {
+ "epoch": 44.0,
+ "eval_entropy": 0.5421680899766775,
+ "eval_loss": 0.8860384821891785,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 285204801.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6344,
+ "eval_samples_per_second": 7.154,
+ "eval_steps_per_second": 3.577,
+ "step": 107932
+ },
+ {
+ "entropy": 0.6307510763127319,
+ "epoch": 45.0,
+ "grad_norm": 0.00927734375,
+ "learning_rate": 3.0930486677313608e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 300215580.0,
+ "step": 110385
+ },
+ {
+ "epoch": 45.0,
+ "eval_entropy": 0.54229736328125,
+ "eval_loss": 0.8853968977928162,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 300215580.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.61,
+ "eval_samples_per_second": 7.202,
+ "eval_steps_per_second": 3.601,
+ "step": 110385
+ },
+ {
+ "entropy": 0.6315490893937595,
+ "epoch": 46.0,
+ "grad_norm": 0.0001239776611328125,
+ "learning_rate": 2.474489367067328e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 315226359.0,
+ "step": 112838
+ },
+ {
+ "epoch": 46.0,
+ "eval_entropy": 0.5422170620698196,
+ "eval_loss": 0.8882192373275757,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 315226359.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.7084,
+ "eval_samples_per_second": 7.011,
+ "eval_steps_per_second": 3.506,
+ "step": 112838
+ },
+ {
+ "entropy": 0.6317317981380761,
+ "epoch": 47.0,
+ "grad_norm": 3.3855438232421875e-05,
+ "learning_rate": 1.855930066403295e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 330237138.0,
+ "step": 115291
+ },
+ {
+ "epoch": 47.0,
+ "eval_entropy": 0.5427549022894639,
+ "eval_loss": 0.8879793882369995,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 330237138.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6923,
+ "eval_samples_per_second": 7.042,
+ "eval_steps_per_second": 3.521,
+ "step": 115291
+ },
+ {
+ "entropy": 0.6314135375092869,
+ "epoch": 48.0,
+ "grad_norm": 0.0025634765625,
+ "learning_rate": 1.2373707657392621e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 345247917.0,
+ "step": 117744
+ },
+ {
+ "epoch": 48.0,
+ "eval_entropy": 0.5423269546948947,
+ "eval_loss": 0.887828528881073,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 345247917.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.661,
+ "eval_samples_per_second": 7.102,
+ "eval_steps_per_second": 3.551,
+ "step": 117744
+ },
+ {
+ "entropy": 0.6317788491650499,
+ "epoch": 49.0,
+ "grad_norm": 0.0015106201171875,
+ "learning_rate": 6.18811465075229e-07,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 360258696.0,
+ "step": 120197
+ },
+ {
+ "epoch": 49.0,
+ "eval_entropy": 0.5421000031324533,
+ "eval_loss": 0.886226236820221,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 360258696.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.8724,
+ "eval_samples_per_second": 6.714,
+ "eval_steps_per_second": 3.357,
+ "step": 120197
+ },
+ {
+ "entropy": 0.6307675256881722,
+ "epoch": 50.0,
+ "grad_norm": 0.0003414154052734375,
+ "learning_rate": 2.5216441119609984e-10,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 375269475.0,
+ "step": 122650
+ },
+ {
+ "epoch": 50.0,
+ "eval_entropy": 0.5427401478473957,
+ "eval_loss": 0.888108491897583,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 375269475.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.7116,
+ "eval_samples_per_second": 7.005,
+ "eval_steps_per_second": 3.503,
+ "step": 122650
+ }
+ ],
+ "logging_steps": 0,
+ "max_steps": 122650,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 50,
+ "save_steps": 0,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3.3796448253168845e+19,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
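The `log_history` above interleaves training entries (with `loss` and `learning_rate`) and evaluation entries (with `eval_*` keys). Below is a minimal sketch for summarizing it. It assumes a local copy of this `trainer_state.json`, and `summarize_log`, `load_log_history` and `expected_lr` are hypothetical helpers.

With `eval_num_gold` at 26, the constant `eval_recall` of 0.8462 from epoch 20 to 50 is consistent with 22 of 26 validation mentions being linked correctly at every evaluation. The schedule constants are an inference, not settings recorded in this file. The logged rates fall on a straight line that reaches zero at `max_steps`. Assuming a round peak of 3e-5, one reading that fits them is a linear warmup over 3,680 steps (3% of 122,650, rounded up) followed by linear decay, with the value logged at step s matching the schedule evaluated at s - 1.

```python
import json
from pathlib import Path


def load_log_history(path: str) -> list[dict]:
    """Read the `log_history` list from a Trainer state file."""
    return json.loads(Path(path).read_text())["log_history"]


def summarize_log(log_history: list[dict]) -> dict:
    """Split log entries into train/eval and report the best eval recall."""
    train = [e for e in log_history if "loss" in e]        # eval entries use "eval_loss"
    evals = [e for e in log_history if "eval_recall" in e]
    # max() keeps the first entry among ties, i.e. the earliest best epoch.
    best = max(evals, key=lambda e: e["eval_recall"]) if evals else None
    return {
        "train_entries": len(train),
        "eval_entries": len(evals),
        "best_eval_recall": None if best is None else best["eval_recall"],
        "best_epoch": None if best is None else best["epoch"],
        "final_train_loss": train[-1]["loss"] if train else None,
    }


# Inferred from the logged values (assumptions, not recorded settings).
MAX_STEPS = 122650
PEAK_LR = 3e-5          # assumed round peak learning rate
WARMUP_STEPS = 3680     # assumed ceil(0.03 * MAX_STEPS)


def expected_lr(step: int) -> float:
    """Linear warmup then linear decay, evaluated at step - 1 to match the log."""
    s = step - 1
    if s < WARMUP_STEPS:
        return PEAK_LR * s / WARMUP_STEPS
    return PEAK_LR * max(0.0, (MAX_STEPS - s) / (MAX_STEPS - WARMUP_STEPS))
```

For example, `summarize_log(load_log_history("trainer_state.json"))` would report the earliest epoch at which the plateaued recall was first reached.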
LICENSE ADDED
@@ -0,0 +1,114 @@
+ LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
+ Llama 3.1 Version Release Date: July 23, 2024
+
+ “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the
+ Llama Materials set forth herein.
+
+ “Documentation” means the specifications, manuals and documentation accompanying Llama 3.1
+ distributed by Meta at https://llama.meta.com/doc/overview.
+
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into
+ this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or
+ regulations to provide legal consent and that has legal authority to bind your employer or such other
+ person or entity if you are entering in this Agreement on their behalf.
+
+ “Llama 3.1” means the foundational large language models and software and algorithms, including
+ machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
+ fine-tuning enabling code and other elements of the foregoing distributed by Meta at
+ https://llama.meta.com/llama-downloads.
+
+ “Llama Materials” means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any
+ portion thereof) made available under this Agreement.
+
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your
+ principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located
+ outside of the EEA or Switzerland).
+
+ By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
+ you agree to be bound by this Agreement.
+
+ 1. License Rights and Redistribution.
+
+ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free
+ limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama
+ Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the
+ Llama Materials.
+
+ b. Redistribution and Use.
+
+ i. If you distribute or make available the Llama Materials (or any derivative works
+ thereof), or a product or service (including another AI model) that contains any of them, you shall (A)
+ provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with
+ Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use
+ the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or
+ otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at
+ the beginning of any such AI model name.
+
+ ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
+ of an integrated end user product, then Section 2 of this Agreement will not apply to you.
+
+ iii. You must retain in all copies of the Llama Materials that you distribute the following
+ attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is
+ licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights
+ Reserved.”
+
+ iv. Your use of the Llama Materials must comply with applicable laws and regulations
+ (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama
+ Materials (available at https://llama.meta.com/llama3_1/use-policy), which is hereby incorporated by
+ reference into this Agreement.
+
+ 2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users
+ of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700
+ million monthly active users in the preceding calendar month, you must request a license from Meta,
+ which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the
+ rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
+
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY
+ OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF
+ ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED,
+ INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT,
+ MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR
+ DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND
+ ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND
+ RESULTS.
+
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF
+ LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING
+ OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL,
+ INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED
+ OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+ 5. Intellectual Property.
+
+ a. No trademark licenses are granted under this Agreement, and in connection with the Llama
+ Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other
+ or any of its affiliates, except as required for reasonable and customary use in describing and
+ redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to
+ use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will
+ comply with Meta’s brand guidelines (currently accessible at
+ https://about.meta.com/brand/resources/meta/company-brand/ ). All goodwill arising out of your use
+ of the Mark will inure to the benefit of Meta.
+
+ b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with
+ respect to any derivative works and modifications of the Llama Materials that are made by you, as
+ between you and Meta, you are and will be the owner of such derivative works and modifications.
+
+ c. If you institute litigation or other proceedings against Meta or any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or
+ results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other
+ rights owned or licensable by you, then any licenses granted to you under this Agreement shall
+ terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold
+ harmless Meta from and against any claim by any third party arising out of or related to your use or
+ distribution of the Llama Materials.
+
+ 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this
+ Agreement or access to the Llama Materials and will continue in full force and effect until terminated in
+ accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in
+ breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete
+ and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this
+ Agreement.
+
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of
+ the State of California without regard to choice of law principles, and the UN Convention on Contracts
+ for the International Sale of Goods does not apply to this Agreement. The courts of California shall have
+ exclusive jurisdiction of any dispute arising out of this Agreement.
README.md ADDED
@@ -0,0 +1,343 @@
+ ---
+ license: llama3.1
+
+ base_model:
+ - meta-llama/Llama-3.1-8B-Instruct
+
+ language:
+ - fr
+
+ tags:
+ - biomedical-entity-linking
+ - entity-linking
+ - entity-disambiguation
+ - named-entity-linking
+ - biomedical
+ - healthcare
+ - umls
+ - quaero
+ - text-generation
+ - constrained-decoding
+ - causal-lm
+ - llm
+
+ library_name: transformers
+ pipeline_tag: text-generation
+
+ datasets:
+ - bigbio/quaero
+
+ finetuning_task:
+ - entity-linking
+
+ metrics:
+ - recall
+
+ model-index:
+ - name: LongBEL-8B-QUAERO-EMEA
+   results:
+   - task:
+       type: entity-linking
+       name: Biomedical Entity Linking
+     dataset:
+       type: bigbio/quaero
+       name: QUAERO-EMEA
+       config: quaero_emea_bigbio_kb
+     metrics:
+     - type: recall
+       name: Recall@1
+       value: 0.754
+ ---
+
+ # LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking
+
+ ## LongBEL
+
+ **LongBEL** is a novel document-level framework for biomedical entity linking (BEL). Instead of normalizing each mention independently, LongBEL conditions each prediction on the document context and on previous normalizations produced in the same document. This design enforces document-level consistency and is enhanced by our **robust memory** mechanism. The method is introduced in our paper, currently under review.
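To make the contrast with mention-by-mention linking concrete, here is a toy sketch of document-order linking in which each prediction can see the normalizations already produced for the same document. It illustrates the idea under stated assumptions and is not the LongBEL implementation. `link_document`, `toy_linker` and the `C_PLACEHOLDER_*` IDs are hypothetical. The rule "reuse the concept of an earlier identical surface form" only stands in for the language model, which conditions on the document text and earlier predictions rather than on exact string matches.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Mention:
    text: str
    group: str  # semantic group, e.g. "Disorders"


# A linker sees the current mention plus the (mention text, concept) pairs
# already predicted earlier in the same document, and returns a concept ID.
Linker = Callable[[Mention, list[tuple[str, str]]], str]


def link_document(mentions: list[Mention], link_mention: Linker) -> list[str]:
    """Link mentions in document order, exposing earlier decisions to later ones."""
    history: list[tuple[str, str]] = []
    predictions: list[str] = []
    for mention in mentions:
        concept = link_mention(mention, history)
        history.append((mention.text, concept))
        predictions.append(concept)
    return predictions


def toy_linker(mention: Mention, history: list[tuple[str, str]]) -> str:
    """Hypothetical stand-in for the model: reuse an earlier decision for the
    same surface form (case-insensitive), otherwise mint a placeholder ID."""
    for previous_text, previous_concept in history:
        if previous_text.lower() == mention.text.lower():
            return previous_concept
    return f"C_PLACEHOLDER_{len(history)}"
```

Linking "Hypertension", "proteinuria" and "hypertension" in that order yields `C_PLACEHOLDER_0`, `C_PLACEHOLDER_1`, `C_PLACEHOLDER_0`. Without the `history` argument, each call would run in isolation and nothing would tie the second "hypertension" mention to the first.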
+
+ ## LongBEL (QUAERO-EMEA Edition)
+
+ This is a **fine-tuned version of Llama-3.1-8B-Instruct** trained on **QUAERO-EMEA**. It applies the LongBEL framework to enable long-context predictions backed by the robust memory mechanism.
+
+ | Field | Value |
+ |---|---|
+ | Base model | `meta-llama/Llama-3.1-8B-Instruct` |
+ | Task | Biomedical Entity Linking |
+ | Dataset | QUAERO-EMEA |
+ | Knowledge base | UMLS 2014AA |
+ | Input | BigBio-like documents with mention spans and semantic groups |
+ | Output | Ranked UMLS concept predictions |
+ | Decoding | Semantic-guided constrained decoding |
+ | Main metric | Recall@1 |
+
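Recall@1, as commonly reported for entity linking, is the fraction of gold mentions whose top-ranked prediction is a correct concept. The minimal sketch below assumes one set of gold CUIs and one ranked candidate list per mention. This input layout is hypothetical and is not the format returned by the model's `sample` method. A mention counts as a hit if any of its gold CUIs appears among the top `k` candidates.

```python
from typing import Sequence


def recall_at_k(gold: Sequence[set[str]], ranked: Sequence[Sequence[str]], k: int = 1) -> float:
    """Share of mentions with at least one gold CUI among the top-k predictions."""
    if len(gold) != len(ranked):
        raise ValueError("gold and ranked must have one entry per mention")
    if not gold:
        return 0.0
    hits = sum(1 for gold_cuis, preds in zip(gold, ranked) if gold_cuis & set(preds[:k]))
    return hits / len(gold)
```

With placeholder IDs, `recall_at_k([{"C1"}, {"C2"}], [["C1", "C9"], ["C9", "C2"]])` gives 0.5 at `k=1` and 1.0 at `k=2`.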
+
+ ## Intended Use
+
+ This model is intended for research on biomedical entity linking and document-level consistency.
+
+ It assumes that mention spans and semantic groups are already provided. It does **not** perform named entity recognition. In a full pipeline, an NER model should first detect mentions and assign their semantic groups. LongBEL then normalizes these mentions to UMLS concepts.
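Because LongBEL takes mentions and semantic groups as input, the output of an upstream NER step has to be converted into the BigBio-like `entities` list. Below is a minimal sketch. It assumes a hypothetical NER output of `(start, end, semantic_group)` character spans over a single passage that starts at offset 0. `spans_to_entities` is a hypothetical helper, and the `T1`, `T2`, … IDs mirror the inference example.

```python
def spans_to_entities(text: str, spans: list[tuple[int, int, str]]) -> list[dict]:
    """Convert (start, end, semantic_group) spans into BigBio-style entity dicts."""
    entities = []
    # Sort by start offset so entity IDs follow document order.
    for index, (start, end, group) in enumerate(sorted(spans), start=1):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        entities.append({
            "id": f"T{index}",
            "type": group,                  # semantic group, e.g. "Disorders"
            "text": [text[start:end]],      # mention text taken from the passage
            "offsets": [[start, end]],      # end offset is exclusive
        })
    return entities
```

Slicing the mention text from the passage, rather than copying it from the NER output, keeps `text` and `offsets` consistent by construction.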
79
+
80
+ ## Usage
81
+
82
+ ### Loading the model
83
+
84
+ ```python
85
+ import torch
86
+ from transformers import AutoModelForCausalLM
87
+
88
+ model = AutoModelForCausalLM.from_pretrained(
89
+ "AnonymousARR42/LongBEL_8B_QUAERO_EMEA",
90
+ trust_remote_code=True,
91
+ torch_dtype=torch.bfloat16,
92
+ device_map="auto",
93
+ )
94
+ ```
95
+
96
+ ### Inference example
97
+
98
+ The model expects BigBio-like documents. Each entity should include a mention text, character offsets, and a semantic group in the `type` field.
99
+
100
+ ```python
101
+ num_beams = 5
102
+
103
+ bigbio_pages = [
104
+ {
105
+ "id": "001",
106
+ "document_id": "doc_001",
107
+ "passages": [
108
+ {
109
+ "id": "0",
110
+ "type": "paragraph",
111
+ "text": [
112
+ "A 29-year-old pregnant woman presented with severe-range hypertension, "
113
+ "headache, and epigastric pain. Laboratory testing showed proteinuria "
114
+ "and mildly elevated liver enzymes. She was admitted overnight with "
115
+ "suspected PET and was started on urgent treatment."
116
+ ],
117
+ "offsets": [[0, 257]],
118
+ }
119
+ ],
120
+ "entities": [
121
+ {
122
+ "id": "T1",
123
+ "type": "Living Beings",
124
+ "text": ["pregnant woman"],
125
+ "offsets": [[14, 28]],
126
+ },
127
+ {
128
+ "id": "T2",
129
+ "type": "Disorders",
130
+ "text": ["severe-range hypertension"],
131
+ "offsets": [[44, 69]],
132
+ },
133
+ {
134
+ "id": "T3",
135
+ "type": "Disorders",
136
+ "text": ["proteinuria"],
137
+ "offsets": [[128, 139]],
138
+ },
139
+ {
140
+ "id": "T4",
141
+ "type": "Disorders",
142
+ "text": ["PET"],
143
+ "offsets": [[217, 220]],
144
+ },
145
+ ],
146
+ "events": [],
147
+ "coreferences": [],
148
+ "relations": [],
149
+ }
150
+ ]
151
+
152
+ predictions = model.sample(
153
+ bigbio_pages=bigbio_pages,
154
+ num_beams=num_beams,
155
+ )
156
+
157
+ for i in range(0, len(predictions), num_beams):
158
+ mention = predictions[i]["mention"]
159
+ print(f"## Mention {(i // num_beams) + 1}: {mention}")
160
+
161
+ for j in range(num_beams):
162
+ pred = predictions[i + j]
163
+ print(
164
+ f" - Beam {j + 1}:\n"
165
+ f" - Predicted concept name: {pred['pred_concept_name']}\n"
166
+ f" - Predicted code: {pred['pred_concept_code']}\n"
167
+ f" - Beam score: {pred['beam_score']:.3f}\n"
168
+ )
169
+ ```
170
+
171
+
172
+ **Example Output:**
173
+
174
+ ```text
175
+ ## Mention 1: pregnant woman
176
+ - Beam 1:
177
+ - Predicted concept name: Pregnant Woman
178
+ - Predicted code: C0033011
179
+ - Beam score: 1.000
180
+
181
+ - Beam 2:
182
+ - Predicted concept name: Pregnant woman
183
+ - Predicted code: C0033011
184
+ - Beam score: 0.003
185
+
186
+ - Beam 3:
187
+ - Predicted concept name: Pregnant woman (person)
188
+ - Predicted code: C0033011
189
+ - Beam score: 0.001
190
+
191
+ - Beam 4:
192
+ - Predicted concept name: Pregnancy Partner
193
+ - Predicted code: C3538996
194
+ - Beam score: 0.000
195
+
196
+ - Beam 5:
197
+ - Predicted concept name: Pregnant woman (person)
198
+ - Predicted code: C0033011
199
+ - Beam score: 0.000
200
+
201
+ ## Mention 2: severe-range hypertension
202
+ - Beam 1:
203
+ - Predicted concept name: Hypertensive disease
204
+ - Predicted code: C0020538
205
+ - Beam score: 0.078
206
+
207
+ - Beam 2:
208
+ - Predicted concept name: Hypertension (in some patients)
209
+ - Predicted code: C3280936
210
+ - Beam score: 0.022
211
+
212
+ - Beam 3:
213
+ - Predicted concept name: Hypertensive disease (disorder)
214
+ - Predicted code: C0020538
215
+ - Beam score: 0.010
216
+
217
+ - Beam 4:
218
+ - Predicted concept name: Hypertension, severe
219
+ - Predicted code: C4013784
220
+ - Beam score: 0.010
221
+
222
+ - Beam 5:
223
+ - Predicted concept name: Hypertension (patient A)
224
+ - Predicted code: C4313262
225
+ - Beam score: 0.004
226
+
227
+ ## Mention 3: proteinuria
228
+ - Beam 1:
229
+ - Predicted concept name: Proteinurias
230
+ - Predicted code: C0033687
231
+ - Beam score: 1.000
232
+
233
+ - Beam 2:
234
+ - Predicted concept name: Proteinuric diabetic nephropathy (disorder)
235
+ - Predicted code: C0403519
236
+ - Beam score: 0.003
237
+
238
+ - Beam 3:
239
+ - Predicted concept name: Proteinuria
240
+ - Predicted code: C0033687
241
+ - Beam score: 0.003
242
+
243
+ - Beam 4:
244
+ - Predicted concept name: Proteinuric diabetic nephropathy
245
+ - Predicted code: C0403519
246
+ - Beam score: 0.002
247
+
248
+ - Beam 5:
249
+ - Predicted concept name: Proteinuric hypertension of pregnancy (disorder)
250
+ - Predicted code: C0032914
251
+ - Beam score: 0.001
252
+
253
+ ## Mention 4: PET
254
+ - Beam 1:
255
+ - Predicted concept name: PET - Pre-eclamptic toxemia
256
+ - Predicted code: C0032914
257
+ - Beam score: 0.075
258
+
259
+ - Beam 2:
260
+ - Predicted concept name: PET - Pre-eclamptic toxaemia
261
+ - Predicted code: C0032914
262
+ - Beam score: 0.039
263
+
264
+ - Beam 3:
265
+ - Predicted concept name: Preeclamptic toxemia
266
+ - Predicted code: C2931877
267
+ - Beam score: 0.027
268
+
269
+ - Beam 4:
270
+ - Predicted concept name: Preeclampsia
271
+ - Predicted code: C0032914
272
+ - Beam score: 0.023
273
+
274
+ - Beam 5:
275
+ - Predicted concept name: Preeclampsia with Severe Features
276
+ - Predicted code: C0341950
277
+ - Beam score: 0.019
278
+ ```
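The beam scores above appear to be confidences rather than raw beam log-probabilities. Assume the `beam_score` field comes from `compute_score` in `longbel.py`; the `sample` method that fills it is not shown here. Under that assumption, the score works as follows:

- take the log-probability of each generated token, ignoring padding, BOS and EOS;
- average it over the completion length;
- exponentiate the result, which gives the geometric mean of the token probabilities.

The sketch below reproduces that formula on plain per-token probabilities.

```python
import math


def sequence_confidence(token_probs: list[float]) -> float:
    """exp(mean log-prob), i.e. the geometric mean of token probabilities.

    compute_score clamps the length to at least 1, so an empty completion
    scores exp(0) = 1.0; this sketch mirrors that behaviour.
    """
    length = max(1, len(token_probs))
    total_log_prob = sum(math.log(p) for p in token_probs)
    return math.exp(total_log_prob / length)
```

Length normalization keeps long concept names from being penalized just for having more tokens. The made-up input `[0.9, 0.5, 1.0]`, for example, gives the cube root of 0.45.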
279
+
280
+ ## Evaluation
281
+
282
+ Entity linking performance is reported using Recall@1 with bootstrap confidence intervals. The best result is shown in **bold**, and the second-best result is <u>underlined</u>.
283
+
284
+ | Model | MM-ST21PV<br>(English) | QUAERO-EMEA<br>(French) | SympTEMIST<br>(Spanish) | DisTEMIST<br>(Spanish) | MedProcNER<br>(Spanish) |
285
+ | :--- | :---: | :---: | :---: | :---: | :---: |
286
+ | **Context-Free BEL** ||||| |
287
+ | SciSpacy | 53.8 ± 1.0 | 37.1 ± 4.3 | 9.8 ± 1.3 | 21.1 ± 1.9 | 10.3 ± 1.2 |
288
+ | SapBERT | 65.6 ± 1.0 | 59.7 ± 3.8 | 34.2 ± 2.0 | 38.6 ± 2.6 | 30.4 ± 2.1 |
289
+ | CODER-all | 62.9 ± 1.1 | 66.9 ± 4.0 | 42.2 ± 2.2 | 47.0 ± 2.6 | 42.7 ± 2.1 |
290
+ | SapBERT-all | 64.6 ± 1.1 | 67.9 ± 3.9 | 49.8 ± 2.4 | 49.6 ± 2.6 | 45.1 ± 2.2 |
291
+ | BERGAMOT | 60.9 ± 1.1 | 63.8 ± 4.9 | 48.0 ± 2.7 | 48.9 ± 2.4 | 42.3 ± 2.2 |
292
+ | **Local-Context BEL** ||||| |
293
+ | ArboEL | 76.9 ± 0.9 | 63.0 ± 3.9 | 55.4 ± 2.5 | 54.7 ± 2.6 | 59.7 ± 2.6 |
294
+ | GENRE / mBART-large | 69.6 ± 1.0 | 69.3 ± 5.4 | 59.8 ± 2.7 | 58.7 ± 2.7 | 66.0 ± 2.3 |
295
+ | GENRE / Llama-1B | 73.1 ± 1.0 | 75.1 ± 3.6 | 60.5 ± 2.4 | 62.5 ± 2.3 | 67.4 ± 2.1 |
296
+ | GENRE / Llama-8B | 75.0 ± 0.9 | 73.8 ± 4.0 | 61.7 ± 2.5 | 63.2 ± 2.5 | 68.3 ± 2.2 |
297
+ | **Global-Context BEL: LongBEL** ||||| |
298
+ | LongBEL-1B | 77.6 ± 0.9 | 74.5 ± 3.7 | 59.8 ± 2.5 | 61.9 ± 2.4 | 66.6 ± 2.1 |
299
+ | LongBEL-1B + Ensemble | 78.6 ± 0.8 | <u>77.2 ± 3.0</u> | 61.8 ± 2.5 | <u>64.3 ± 2.2</u> | <u>69.0 ± 2.0</u> |
300
+ | **LongBEL-8B** | <u>79.3 ± 0.8</u> | 75.4 ± 3.4 | <u>62.0 ± 2.6</u> | 63.6 ± 2.1 | <u>69.0 ± 2.1</u> |
301
+ | LongBEL-8B + Ensemble | **80.0 ± 0.8** | **77.6 ± 3.0** | **63.3 ± 2.5** | **65.8 ± 2.2** | **71.0 ± 2.0** |
302
+
303
+ The score reported for this checkpoint corresponds to the **single LongBEL-8B model** (75.4 Recall@1 on QUAERO-EMEA). The ensemble result requires combining several LongBEL input configurations and cannot be reproduced with this checkpoint alone.
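For reference, the sketch below computes Recall@1 with a percentile bootstrap confidence interval. It is a generic sketch, not the paper's evaluation script. The following choices are assumptions:

- 1,000 resamples;
- a 95% interval;
- mentions as the resampling unit.

```python
import random


def recall_at_1(top1: list[str], gold: list[str]) -> float:
    """Share of mentions whose top-ranked code equals the gold code."""
    if len(top1) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == g for p, g in zip(top1, gold)) / len(gold)


def bootstrap_ci(
    top1: list[str],
    gold: list[str],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap interval for Recall@1, resampling mentions with replacement."""
    rng = random.Random(seed)
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(recall_at_1([top1[i] for i in idx], [gold[i] for i in idx]))
    scores.sort()
    lower = scores[int(alpha / 2 * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

Resampling whole documents instead of mentions would respect the within-document correlation that LongBEL exploits. Which unit the reported intervals use is not stated here.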
304
+
305
+ ## Speed and Memory
306
+
307
+ Measured on a single NVIDIA H100 80GB GPU.
308
+
309
+ | Model | Model memory | Candidate memory | Speed |
310
+ | ----------------------- | -----------: | ---------------: | --------------: |
311
+ | GENRE-Llama-8B baseline | 28.6 GB | 5.4 GB | 38.2 mentions/s |
312
+ | LongBEL-8B | 28.6 GB | 5.4 GB | 15.2 mentions/s |
313
+
314
+ LongBEL has the same model memory footprint as the sentence-level Llama-8B baseline, but it is slower because it processes longer contexts and updates document-level memory during inference.
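The throughput figures translate directly into wall-clock estimates. The snippet below does that arithmetic for a hypothetical 10,000-mention corpus, using only the numbers in the table above (single H100).

```python
# Throughputs from the table above (single NVIDIA H100 80GB), in mentions/s.
throughput = {"GENRE-Llama-8B baseline": 38.2, "LongBEL-8B": 15.2}
MENTIONS = 10_000  # hypothetical corpus size


def minutes_for(mentions: int, mentions_per_second: float) -> float:
    return mentions / mentions_per_second / 60


for name, speed in throughput.items():
    print(f"{name}: {minutes_for(MENTIONS, speed):.1f} min for {MENTIONS:,} mentions")

slowdown = throughput["GENRE-Llama-8B baseline"] / throughput["LongBEL-8B"]
print(f"Slowdown: {slowdown:.2f}x")
```

This works out to roughly 4.4 versus 11.0 minutes, a slowdown of about 2.5×.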
315
+
316
+ ## Limitations
317
+
318
+ This model assumes that mention spans and semantic groups are given. It does not perform mention detection.
319
+
320
+ LongBEL is most useful when concepts recur within a document. When most concepts appear only once, the memory mechanism has less information to exploit.
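To check whether a corpus is likely to benefit, you can measure how often a mention's gold concept has already appeared earlier in the same document. The helper below is a hypothetical diagnostic and is not part of the LongBEL code. It expects each document's gold codes in reading order.

```python
def recurrence_rate(docs: dict[str, list[str]]) -> float:
    """Fraction of mentions whose gold code already occurred earlier in the same document.

    `docs` maps a document id to its gold concept codes in reading order.
    """
    repeated = total = 0
    for codes in docs.values():
        seen: set[str] = set()
        for code in codes:
            total += 1
            if code in seen:
                repeated += 1
            seen.add(code)
    return repeated / total if total else 0.0
```

A rate close to zero means the memory mechanism has little to exploit on that corpus.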
321
+
322
+ Because LongBEL uses previous predictions as memory, early mistakes can still influence later predictions. Robust memory training reduces this risk but does not remove it completely.
323
+
324
+ This model is intended for research use. It should not be used for clinical decision-making without additional validation and human oversight.
325
+
326
+ ## Reproducibility
327
+
328
+ Code and evaluation scripts are available in this [anonymous repository](https://anonymous.4open.science/r/LongBEL-31AD).
329
+
330
+ Trained model checkpoints and processed datasets are available in the anonymous Hugging Face collection associated with LongBEL.
331
+
332
+ <!-- ## Citation
333
+
334
+ If you use this model, please cite the LongBEL paper.
335
+
336
+ ```bibtex
337
+ @inproceedings{longbel2026,
338
+ title = {LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking},
339
+ author = {Anonymous},
340
+ booktitle = {Anonymous submission},
341
+ year = {2026}
342
+ }
343
+ ``` -->
__init__.py ADDED
@@ -0,0 +1,4 @@
1
+ # __init__.py
2
+ from .longbel import LLamaLongBEL, LLamaLongBELConfig
3
+
4
+ __all__ = ["LLamaLongBEL", "LLamaLongBELConfig"]
candidate_trie.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d060c96d86bbd5f0531a1eca465a4f645c8fd85fc4c47e2fb5197a5795d053b6
3
+ size 298349465
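This Git LFS pointer records the SHA-256 digest (`oid`) and the byte size of `candidate_trie.pkl`. `longbel.py` later reads the file with `pickle.load`, and unpickling untrusted data can execute arbitrary code. It can therefore be worth checking a downloaded copy against the pointer first. The sketch below is minimal; the path to check is whatever `hf_hub_download` returned.

```python
import hashlib
import os

EXPECTED_SHA256 = "d060c96d86bbd5f0531a1eca465a4f645c8fd85fc4c47e2fb5197a5795d053b6"
EXPECTED_SIZE = 298349465


def matches_lfs_pointer(
    path: str,
    expected_sha256: str = EXPECTED_SHA256,
    expected_size: int = EXPECTED_SIZE,
    chunk_size: int = 1 << 20,
) -> bool:
    """True if the file at `path` has the size and SHA-256 recorded in the pointer."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first, before hashing ~300 MB
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```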
chat_template.jinja ADDED
@@ -0,0 +1,5 @@
1
+ {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
2
+
3
+ '+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>
4
+
5
+ ' }}
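The template does three things:

- it wraps each message as `<|start_header_id|>` + role + `<|end_header_id|>` + a blank line + the trimmed content + `<|eot_id|>`;
- it prepends the BOS token to the first message;
- it always ends with an open assistant header.

The plain-Python rendering below follows the same logic and can be used to inspect prompts without Jinja. The default `bos_token` value `<|begin_of_text|>` is the usual Llama 3 BOS string; treat it as an assumption for this tokenizer.

```python
def render_chat(
    messages: list[dict[str, str]],
    bos_token: str = "<|begin_of_text|>",
) -> str:
    """Plain-Python equivalent of chat_template.jinja."""
    parts = []
    for i, message in enumerate(messages):
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()  # Jinja's `trim` filter
            + "<|eot_id|>"
        )
        if i == 0:
            content = bos_token + content
        parts.append(content)
    # The template has no add_generation_prompt switch: it always opens an assistant turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```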
config.json ADDED
@@ -0,0 +1,40 @@
1
+ {
2
+ "architectures": [
3
+ "LLamaLongBEL"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 128000,
8
+ "dtype": "bfloat16",
9
+ "eos_token_id": 128009,
10
+ "head_dim": 128,
11
+ "hidden_act": "silu",
12
+ "hidden_size": 4096,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 14336,
15
+ "max_position_embeddings": 131072,
16
+ "mlp_bias": false,
17
+ "model_type": "llama_longbel",
18
+ "auto_map": {
19
+ "AutoConfig": "longbel.LLamaLongBELConfig",
20
+ "AutoModelForCausalLM": "longbel.LLamaLongBEL"
21
+ },
22
+ "num_attention_heads": 32,
23
+ "num_hidden_layers": 32,
24
+ "num_key_value_heads": 8,
25
+ "pad_token_id": 128009,
26
+ "pretraining_tp": 1,
27
+ "rms_norm_eps": 1e-05,
28
+ "rope_scaling": {
29
+ "factor": 8.0,
30
+ "high_freq_factor": 4.0,
31
+ "low_freq_factor": 1.0,
32
+ "original_max_position_embeddings": 8192,
33
+ "rope_type": "llama3"
34
+ },
35
+ "rope_theta": 500000.0,
36
+ "tie_word_embeddings": false,
37
+ "transformers_version": "4.57.1",
38
+ "use_cache": true,
39
+ "vocab_size": 128257
40
+ }
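The `rope_scaling` block (`rope_type: "llama3"`) selects the Llama 3.1 RoPE frequency rescaling. This rescaling adapts rotary embeddings trained with an 8,192-token context (`original_max_position_embeddings`) to the 131,072 positions in `max_position_embeddings`. The sketch below was written from the published Llama 3.1 reference rule, which is also what `_compute_llama3_parameters` in `transformers` implements. Treat it as an illustration rather than as the library's exact code path.

```python
import math


def llama3_inv_freq(
    head_dim: int = 128,
    rope_theta: float = 500000.0,
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_position_embeddings: int = 8192,
) -> list[float]:
    """Rescaled RoPE inverse frequencies for rope_type="llama3" (defaults from config.json)."""
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor
    scaled = []
    for i in range(0, head_dim, 2):
        freq = 1.0 / (rope_theta ** (i / head_dim))
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # Short wavelengths (local position detail) are kept unchanged.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Long wavelengths are slowed down by `factor`.
            scaled.append(freq / factor)
        else:
            # In between, blend smoothly from freq / factor to freq.
            smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled
```

With `head_dim` = 128 this yields 64 frequencies. The first stays at 1.0, and the last is divided by 8.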
generation_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "bos_token_id": 128000,
3
+ "do_sample": true,
4
+ "eos_token_id": [
5
+ 128009,
6
+ 128001,
7
+ 128008,
8
+ 128009
9
+ ],
10
+ "pad_token_id": 128009,
11
+ "temperature": 0.6,
12
+ "top_p": 0.9,
13
+ "transformers_version": "4.57.1"
14
+ }
longbel.py ADDED
@@ -0,0 +1,981 @@
1
+ """
2
+ Core models for LongBEL
3
+ """
4
+ # Copyright (c) Facebook, Inc. and its affiliates.
5
+ # All rights reserved.
6
+ #
7
+ # This source code is licensed under the license found in the
8
+ # LICENSE file in the root directory of this source tree.
9
+
10
+ import json
11
+ import logging
12
+ import os
13
+ import pickle
14
+ import re
15
+ from html import escape
16
+ from typing import Optional
17
+
18
+ import nltk
19
+ import torch
20
+ import torch.nn.functional as F
21
+ from huggingface_hub import hf_hub_download
22
+ from tqdm.auto import tqdm
23
+ from transformers import (
24
+ AutoTokenizer,
25
+ LlamaForCausalLM,
26
+ PretrainedConfig,
27
+ )
28
+
29
+ logger = logging.getLogger(__name__)
30
+ logging.basicConfig(
31
+ level=logging.INFO, # Display INFO and above
32
+ format="%(levelname)s - %(message)s",
33
+ )
34
+
35
+
36
+ # Define a simple config class that inherits from PretrainedConfig
37
+ class LLamaLongBELConfig(PretrainedConfig):
38
+ model_type = "llama_longbel"
39
+
40
+ def __init__(self, **kwargs):
41
+ # Ensure it has llama as base
42
+ kwargs.setdefault("model_type", "llama")
43
+ super().__init__(**kwargs)
44
+
45
+
46
+ def clean_natural(text):
47
+ return (
48
+ text.replace("\xa0", " ")
49
+ .replace("{", "(")
50
+ .replace("}", ")")
51
+ .replace("[", "(")
52
+ .replace("]", ")")
53
+ .replace("\n", " ")
54
+ )
55
+
56
+
57
+ def parse_text(
58
+ data,
59
+ start_entity,
60
+ end_entity,
61
+ start_group,
62
+ end_group,
63
+ nlp,
64
+ ) -> tuple[list[str], list[str], list[dict[str, str]]]:
65
+ """Build one (source, target, metadata) triple per entity.
66
+
67
+ For each entity in the BigBio page (sorted by span), this returns:
68
+ - source: the full document text with the mention wrapped in entity markers
69
+ - target: start_entity + mention + end_entity + start_group + group + end_group
70
+ - metadata: doc id, span, mention, semantic group and, if present, the gold code
71
+ """
72
+ source_sentences: list[str] = []
73
+ tsv_lines: list[dict[str, str]] = []
74
+ target_texts_dict: dict[tuple[tuple[int, int], ...], str] = {}
75
+ source_texts_dict: dict[tuple[tuple[int, int], ...], str] = {}
76
+ tsv_lines_dict: dict[tuple[tuple[int, int], ...], dict[str, str]] = {}
77
+ all_passages = {}
78
+ for i, passage in enumerate(data.get("passages", [])):
79
+ all_passages[i] = clean_natural(passage["text"][0])
80
+ for passage_id, passage in enumerate(data.get("passages", [])):
81
+ passage_text = passage["text"][0]
82
+ start_offset_passage = passage["offsets"][0][0]
83
+ end_offset_passage = passage["offsets"][0][1]
84
+
85
+ passage_text = clean_natural(passage_text)
86
+
87
+ # Iterate over entities and emit one pair per entity found in this passage
88
+ for entity in data.get("entities", []):
89
+ # Global span of the entity (min start, max end over all fragments), used to filter passages
90
+ global_start = min(off[0] for off in entity["offsets"])
91
+ global_end = max(off[1] for off in entity["offsets"])
92
+ # Keep only entities whose start falls inside this passage
93
+ if not (start_offset_passage <= global_start < end_offset_passage):
94
+ continue
95
+ entity_text = " ".join(entity["text"])
96
+ entity_text = clean_natural(entity_text)
97
+ # Define entity group
98
+ group_annotation = entity.get("type")
99
+ # Keep the fragments inside this passage and convert them to passage-relative offsets
100
+ relative_entity_spans = []
101
+ for off in entity["offsets"]:
102
+ global_start_off, global_end_off = off
103
+ if not (start_offset_passage <= global_start_off < end_offset_passage):
104
+ continue
105
+
106
+ rel_start_off = global_start_off - start_offset_passage
107
+ rel_end_off = global_end_off - start_offset_passage
108
+ relative_entity_spans.append((rel_start_off, rel_end_off))
109
+ relative_entity_spans.sort(key=lambda x: x[0], reverse=True)  # right-to-left so inserted markers do not shift the remaining offsets
110
+
111
+ marked_text = passage_text
112
+ for start_in_sent, end_in_sent in relative_entity_spans:
113
+ marked_text = (
114
+ marked_text[:start_in_sent]
115
+ + start_entity
116
+ + marked_text[start_in_sent:end_in_sent]
117
+ + end_entity
118
+ + marked_text[end_in_sent:]
119
+ )
120
+
121
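+ for other_passage_id, other_passage_text in sorted(all_passages.items(), key=lambda kv: abs(kv[0] - passage_id)):  # nearest first, so prepends/appends keep document order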
122
+ if other_passage_id < passage_id:
123
+ marked_text = other_passage_text + "\n" + marked_text
124
+ elif other_passage_id > passage_id:
125
+ marked_text = marked_text + "\n" + other_passage_text
126
+ # Emit the pair
127
+ doc_id = data.get("id", "")
128
+ tsv_line = {
129
+ "doc_id": doc_id,
130
+ "semantic_group": group_annotation,
131
+ "start_span": global_start,
132
+ "end_span": global_end,
133
+ "mention": entity_text,
134
+ }
135
+ if entity.get("normalized"):
136
+ tsv_line["gold_concept_code"] = entity["normalized"][0]["db_id"]
137
+ tsv_line["gold_concept_name"] = entity["normalized"][0]["db_match"]
138
+
139
+ tsv_lines_dict[(global_start, global_end)] = tsv_line
140
+ source_texts_dict[(global_start, global_end)] = marked_text
141
+ target_entity_text = (
142
+ start_entity
143
+ + entity_text
144
+ + end_entity
145
+ + start_group
146
+ + group_annotation
147
+ + end_group
148
+ )
149
+ target_texts_dict[(global_start, global_end)] = target_entity_text
150
+ # Sort keys to have a deterministic order
151
+ target_texts = []
152
+ sorted_keys = sorted(tsv_lines_dict.keys(), key=lambda x: (x[0], x[1]))
153
+ for entity_id, entity_span in enumerate(sorted_keys):
154
+ tsv_line = tsv_lines_dict[entity_span]
155
+ tsv_line["mention_id"] = f"{data.get('id', '')}.{entity_id + 1}"
156
+ tsv_lines.append(tsv_line)
157
+ source_sentences.append(source_texts_dict[entity_span])
158
+ target_texts.append(target_texts_dict[entity_span])
159
+
160
+ return source_sentences, target_texts, tsv_lines # type: ignore
161
+
162
+
163
+ def get_prefix_allowed_tokens_fn(
164
+ model,
165
+ sources: list[str],
166
+ sem_groups: list[str],
167
+ multiple_answers: bool = False,
168
+ ):
169
+ candidates_trie = model.candidate_trie # type: ignore
170
+ sep_token_id = model.tokenizer.sep_token_id
171
+ eos_token_id = model.tokenizer.eos_token_id
172
+ pad_token_id = model.tokenizer.pad_token_id
173
+ plus_token_id = model.tokenizer.convert_tokens_to_ids("<+>") # type: ignore
174
+ end_group_token_id = model.tokenizer.convert_tokens_to_ids("}") # type: ignore
175
+
176
+ def prefix_allowed_tokens_fn(batch_id, sent):
177
+ sent = sent.tolist()
178
+ if len(sent) > 1 and sent[-1] in [eos_token_id, pad_token_id, sep_token_id]:
179
+ if sep_token_id:
180
+ return [sep_token_id, pad_token_id, eos_token_id]
181
+ else:
182
+ return [pad_token_id, eos_token_id]
183
+
184
+ # Keep the suffix starting at the last "}" (end of the semantic group), where trie paths begin
185
+ index_sep = len(sent) - 1 - sent[::-1].index(end_group_token_id)
186
+ sent = sent[index_sep:]
187
+
188
+ sem_group = sem_groups[batch_id]
189
+ # With multiple answers, restart the trie lookup after the last <+> separator
190
+ if multiple_answers and plus_token_id in sent:
191
+ index_plus = len(sent) - 1 - sent[::-1].index(plus_token_id)
192
+ # Start fresh with decoder start
193
+ if index_plus == len(sent) - 1:
194
+ sent = [end_group_token_id]
195
+ # If there are tokens after the last plus_token_id, keep them
196
+ else:
197
+ sent = [end_group_token_id] + sent[index_plus + 1 :]
198
+ trie_out = list(candidates_trie[
199
+ sem_group # type: ignore
200
+ ].get(sent))  # copy so the += below never mutates the trie
201
+ if eos_token_id in trie_out:
202
+ if sep_token_id:
203
+ trie_out += [sep_token_id]
204
+ if multiple_answers:
205
+ trie_out += [plus_token_id]
206
+ elif not trie_out:
207
+ if sep_token_id:
208
+ return [sep_token_id, pad_token_id, eos_token_id]
209
+ else:
210
+ return [pad_token_id, eos_token_id]
211
+ return trie_out
212
+
213
+ return prefix_allowed_tokens_fn
214
+
215
+
216
+ def add_headers_to_prompt(source: str, target: str, previous_targets: str):
217
+ if not previous_targets:
218
+ previous_targets = "None"
219
+ input_sentence = f"### Context\n{source.rstrip()}\n\n### Previous Normalizations\n{previous_targets.rstrip()}\n\n### Prediction\n{target.rstrip()}"
220
+ return input_sentence
221
+
222
+
223
+ def parse_prediction(
224
+ outputs: list[str],
225
+ sem_groups: list[str],
226
+ text_to_code: Optional[dict[str, dict[str, str]]] = None,
227
+ multiple_answers: bool = False,
228
+ ) -> tuple[list[str], list[str]]:
229
+ codes = []
230
+ predictions = []
231
+ for output, group in zip(outputs, sem_groups):
232
+ splits = output.split("} ") # type: ignore
233
+ if len(splits) > 1 and splits[-1].strip():
234
+ prediction = splits[-1].strip().replace("<SEP>", "")
235
+ if text_to_code:
236
+ if multiple_answers:
237
+ prediction_list = prediction.split("<+>") # type: ignore
238
+ code_list = set()
239
+ for pred in prediction_list:
240
+ code_list.add(text_to_code[group].get(pred.strip(), "NO_CODE"))
241
+ if len(code_list) > 1 and "NO_CODE" in code_list:
242
+ code_list.remove("NO_CODE")
243
+ code = "+".join(sorted(code_list))  # sorted: set order is not stable across runs
244
+ else:
245
+ code = text_to_code[group].get(prediction, "NO_CODE")
246
+ else:
247
+ code = "NO_CODE"
248
+ else:
249
+ logger.warning(
250
+ "Splitting failed or empty prediction; using NO_PREDICTION as the prediction."
251
+ )
252
+ prediction = "NO_PREDICTION"
253
+ code = "NO_CODE"
254
+ codes.append(code)
255
+ predictions.append(prediction)
256
+ return codes, predictions
257
+
258
+
259
+ def compute_score(outputs, tokenizer, prefix_len=0):
260
+ sequences = outputs.sequences # (N, seq_len)
261
+ scores = outputs.scores # list length T = # generated tokens
262
+
263
+ N, total_len = sequences.shape
264
+ T = len(scores)
265
+
266
+ # keep only the generated part (completion)
267
+ sequences = sequences[:, prefix_len : prefix_len + T]
268
+
269
+ # Make sure score is not longer than sequences
270
+ if len(scores) > sequences.size(1):
271
+ scores = scores[: sequences.size(1)]
272
+
273
+ # Compute as usual but now only for completion tokens
274
+ mask = (
275
+ (sequences != tokenizer.pad_token_id)
276
+ & (sequences != tokenizer.eos_token_id)
277
+ & (sequences != tokenizer.bos_token_id)
278
+ )
279
+
280
+ # log-prob for each generated token
281
+ logprob_steps = []
282
+ for t, logits in enumerate(scores):
283
+ log_probs_t = F.log_softmax(logits, dim=-1)
284
+ token_t = sequences[:, t]
285
+ idx = torch.arange(N, device=logits.device)
286
+ logprob_steps.append(log_probs_t[idx, token_t])
287
+
288
+ logprobs = torch.stack(logprob_steps, dim=1)
289
+ logprobs.masked_fill_(~mask, 0)
290
+
291
+ lengths = mask.sum(dim=1).clamp(min=1)
292
+ confidence = torch.exp(logprobs.sum(dim=1) / lengths)
293
+
294
+ return confidence.tolist()
295
+
296
+
297
+ def skip_undesired_tokens(outputs, tokenizer):
298
+ sep_token = "<SEP>"
299
+ plus_token = "<+>"
300
+ # Build the list of special tokens to remove
301
+ tokens_to_remove = tokenizer.all_special_tokens[:2]
302
+
303
+ cleaned_outputs = []
304
+ for sequence in outputs:
305
+ # Remove undesired special tokens
306
+ for token in tokens_to_remove:
307
+ sequence = sequence.replace(token, "")
308
+
309
+ # Remove spaces *immediately* after the sep_token and plus_token (e.g. "<SEP> text" → "<SEP>text")
310
+ sequence = re.sub(rf"({re.escape(plus_token)})\s+", r"\1", sequence)
311
+ sequence = re.sub(rf"({re.escape(sep_token)})\s+", r"\1", sequence)
312
+
313
+ cleaned_outputs.append(sequence.strip())
314
+
315
+ return cleaned_outputs
316
+
317
+
318
+ def _score_to_rgb(score: float) -> tuple[int, int, int]:
319
+ clipped_score = max(0.0, min(1.0, score))
320
+ red = 255
321
+ channel = int(255 * (1.0 - clipped_score))
322
+ return red, channel, channel
323
+
324
+
325
+ def _build_ansi_saliency_text(
326
+ token_texts: list[str], saliency_scores: list[float]
327
+ ) -> str:
328
+ chunks = []
329
+ for token_text, score in zip(token_texts, saliency_scores):
330
+ red, green, blue = _score_to_rgb(score)
331
+ chunks.append(f"\x1b[48;2;{red};{green};{blue}m{token_text}\x1b[0m")
332
+ return "".join(chunks)
333
+
334
+
335
+ def _build_html_saliency_text(
336
+ token_texts: list[str], saliency_scores: list[float]
337
+ ) -> str:
338
+ chunks = []
339
+ for token_text, score in zip(token_texts, saliency_scores):
340
+ red, green, blue = _score_to_rgb(score)
341
+ chunks.append(
342
+ f'<span style="background-color: rgb({red}, {green}, {blue});">{escape(token_text)}</span>'
343
+ )
344
+ return "".join(chunks)
345
+
346
+
347
+ class LLamaLongBEL(LlamaForCausalLM):
348
+ config_class = LLamaLongBELConfig
349
+
350
+ def __init__(self, config, *args, **kwargs):
351
+ # Initialize the parent LlamaForCausalLM
352
+ super().__init__(config, *args, **kwargs)
353
+
354
+ # Store language from config
355
+ self.lang = getattr(config, "lang", "en")
356
+ self.text_to_code = None
357
+ self.candidate_trie = None
358
+ self.tokenizer = None
359
+
360
+ @classmethod
361
+ def from_pretrained(
362
+ cls,
363
+ pretrained_model_name_or_path,
364
+ *args,
365
+ lang=None,
366
+ text_to_code_path=None,
367
+ candidate_trie_path=None,
368
+ **kwargs,
369
+ ):
370
+ # Remove custom kwargs before passing to parent
371
+ custom_kwargs = {
372
+ "lang": lang,
373
+ "text_to_code_path": text_to_code_path,
374
+ "candidate_trie_path": candidate_trie_path,
375
+ }
376
+
377
+ # Call parent's from_pretrained
378
+ model = super().from_pretrained(
379
+ pretrained_model_name_or_path,
380
+ *args,
381
+ **{k: v for k, v in kwargs.items() if k not in custom_kwargs},
382
+ )
383
+
384
+ # Set up tokenizer
385
+ model.tokenizer = AutoTokenizer.from_pretrained(
386
+ pretrained_model_name_or_path, use_fast=True
387
+ )
388
+ model.tokenizer.padding_side = "left"
389
+
390
+ # Set language: explicit override > config > default
391
+ if lang is not None:
392
+ model.lang = lang
393
+ elif hasattr(model.config, "lang"):
394
+ model.lang = model.config.lang
395
+ else:
396
+ model.lang = "en"
397
+
398
+ logger.info(f"Model language set to: {model.lang}")
399
+
400
+ # Load text_to_code
401
+ text_to_code_file_local = (
402
+ text_to_code_path
403
+ if text_to_code_path is not None
404
+ else os.path.join(pretrained_model_name_or_path, "text_to_code.json")
405
+ )
406
+ try:
407
+ if os.path.exists(text_to_code_file_local):
408
+ with open(text_to_code_file_local, encoding="utf-8") as f:
409
+ model.text_to_code = json.load(f)
410
+ logger.info(
411
+ f"Loaded text_to_code.json from local path: {text_to_code_file_local}"
412
+ )
413
+ else:
414
+ text_to_code_path_hf = hf_hub_download(
415
+ repo_id=pretrained_model_name_or_path,
416
+ filename="text_to_code.json",
417
+ )
418
+ with open(text_to_code_path_hf, encoding="utf-8") as f:
419
+ model.text_to_code = json.load(f)
420
+ logger.info(
421
+ f"Loaded text_to_code.json from HF Hub: {text_to_code_path_hf}"
422
+ )
423
+ except Exception:
424
+ logger.warning("text_to_code.json not found (local or HF hub)")
425
+ model.text_to_code = None
426
+
427
+ # Load candidate_trie
428
+ candidate_trie_file_local = (
429
+ candidate_trie_path
430
+ if candidate_trie_path is not None
431
+ else os.path.join(pretrained_model_name_or_path, "candidate_trie.pkl")
432
+ )
433
+ try:
434
+ if os.path.exists(candidate_trie_file_local):
435
+ with open(candidate_trie_file_local, "rb") as f:
436
+ model.candidate_trie = pickle.load(f)
437
+ logger.info(
438
+ f"Loaded candidate_trie.pkl from local path: {candidate_trie_file_local}"
439
+ )
440
+ else:
441
+ candidate_trie_path_hf = hf_hub_download(
442
+ repo_id=pretrained_model_name_or_path,
443
+ filename="candidate_trie.pkl",
444
+ )
445
+ with open(candidate_trie_path_hf, "rb") as f:
446
+ model.candidate_trie = pickle.load(f)
447
+ logger.info(
448
+ f"Loaded candidate_trie.pkl from HF Hub: {candidate_trie_path_hf}"
449
+ )
450
+ except Exception:
451
+ logger.warning("candidate_trie.pkl not found (local or HF hub)")
452
+ model.candidate_trie = None
453
+
454
+ return model
455
+
456
+ def _compute_gradient_saliency(
457
+ self,
458
+ input_sentences: list[str],
459
+ generated_sequences: torch.Tensor,
460
+ num_beams: int,
461
+ prefix_len: int,
462
+ saliency_method: str = "integrated",
463
+ ig_steps: int = 20,
464
+ ig_baseline: str = "pad",
465
+ ) -> list[dict[str, object]]:
466
+ if not input_sentences:
467
+ return []
468
+
469
+ method = saliency_method.strip().lower()
470
+ if method == "integerated":
471
+ method = "integrated"
472
+ if method not in {"simple", "integrated"}:
473
+ raise ValueError("saliency_method must be one of: 'simple', 'integrated'.")
474
+
475
+ top_sequence_indices = (
476
+ torch.arange(
477
+ len(input_sentences),
478
+ device=generated_sequences.device,
479
+ )
480
+ * num_beams
481
+ )
482
+ top_sequences = generated_sequences.index_select(0, top_sequence_indices)
483
+
484
+ attention_mask = (top_sequences != self.tokenizer.pad_token_id).long() # type: ignore
485
+         input_embeddings = self.get_input_embeddings()(top_sequences).detach() # type: ignore
+
+         next_tokens = top_sequences[:, 1:]
+         output_token_mask = torch.zeros_like(next_tokens, dtype=torch.bool)
+         if prefix_len > 0:
+             output_token_mask[:, prefix_len - 1 :] = True
+
+         valid_token_mask = output_token_mask & (
+             (next_tokens != self.tokenizer.pad_token_id) # type: ignore
+             & (next_tokens != self.tokenizer.eos_token_id) # type: ignore
+             & (next_tokens != self.tokenizer.bos_token_id) # type: ignore
+         )
+
+         def _objective_from_embeddings(embeddings: torch.Tensor) -> torch.Tensor:
+             forward_outputs = self( # type: ignore
+                 inputs_embeds=embeddings,
+                 attention_mask=attention_mask,
+                 use_cache=False,
+                 return_dict=True,
+             )
+             logits = forward_outputs.logits[:, :-1, :]
+             log_probs = F.log_softmax(logits, dim=-1)
+             token_log_probs = log_probs.gather(
+                 dim=-1,
+                 index=next_tokens.unsqueeze(-1),
+             ).squeeze(-1)
+             return token_log_probs.masked_select(valid_token_mask).sum()
+
+         if method == "simple":
+             simple_embeddings = input_embeddings.detach()
+             simple_embeddings.requires_grad_(True)
+             self.zero_grad(set_to_none=True) # type: ignore
+             with torch.enable_grad():
+                 objective = _objective_from_embeddings(simple_embeddings)
+                 gradients = torch.autograd.grad(
+                     outputs=objective,
+                     inputs=simple_embeddings,
+                     retain_graph=False,
+                     create_graph=False,
+                 )[0]
+             token_importance = gradients.norm(p=2, dim=-1)
+         else:
+             if ig_baseline == "pad": # type: ignore
+                 baseline_ids = torch.full_like(
+                     top_sequences,
+                     self.tokenizer.pad_token_id, # type: ignore
+                 )
+                 baseline_embeddings = self.get_input_embeddings()(baseline_ids).detach() # type: ignore
+             elif ig_baseline == "zero":
+                 baseline_embeddings = torch.zeros_like(input_embeddings)
+             elif ig_baseline == "random":
+                 baseline_embeddings = torch.randn_like(input_embeddings)
+             elif ig_baseline == "avg":
+                 baseline_embeddings = input_embeddings.mean(
+                     dim=1, keepdim=True
+                 ).expand_as(input_embeddings)
+             else:
+                 raise ValueError(
+                     f"Unsupported baseline type '{ig_baseline}'. Choose from 'pad', 'zero', 'random', 'avg'."
+                 )
+
+             embedding_delta = input_embeddings - baseline_embeddings
+             total_gradients = torch.zeros_like(input_embeddings)
+             steps = max(1, ig_steps)
+             for step in range(1, steps + 1):
+                 alpha = float(step) / float(steps)
+                 interpolated_embeddings = (
+                     baseline_embeddings + alpha * embedding_delta
+                 ).detach()
+                 interpolated_embeddings.requires_grad_(True)
+                 self.zero_grad(set_to_none=True) # type: ignore
+
+                 with torch.enable_grad():
+                     objective = _objective_from_embeddings(interpolated_embeddings)
+
+                 gradients = torch.autograd.grad(
+                     outputs=objective,
+                     inputs=interpolated_embeddings,
+                     retain_graph=False,
+                     create_graph=False,
+                 )[0]
+                 total_gradients += gradients.detach()
+
+             averaged_gradients = total_gradients / float(steps)
+             integrated_gradients = embedding_delta * averaged_gradients
+             token_importance = integrated_gradients.norm(p=2, dim=-1)
+         saliency_maps = []
+         sequence_len = top_sequences.size(1)
+         prompt_positions = torch.arange(sequence_len, device=top_sequences.device)
+         prompt_mask = (prompt_positions.unsqueeze(0) < prefix_len) & (
+             top_sequences != self.tokenizer.pad_token_id # type: ignore
+         )
+
+         for sequence_ids, importance_scores, sentence, mask in zip(
+             top_sequences,
+             token_importance,
+             input_sentences,
+             prompt_mask,
+         ):
+             selected_ids = sequence_ids[mask]
+             selected_scores = importance_scores[mask]
+
+             if selected_scores.numel() == 0:
+                 saliency_maps.append({
+                     "input_sentence": sentence,
+                     "token_ids": [],
+                     "token_strings": [],
+                     "saliency_scores": [],
+                     "saliency_method": method,
+                     "saliency_ansi": "",
+                     "saliency_html": "",
+                 })
+                 continue
+
+             max_score = selected_scores.max().clamp(min=1e-12)
+             normalized_scores = (selected_scores / max_score).tolist()
+             selected_ids_list = selected_ids.tolist()
+             token_strings = [
+                 self.tokenizer.decode( # type: ignore
+                     [token_id],
+                     skip_special_tokens=False,
+                     clean_up_tokenization_spaces=False,
+                 )
+                 for token_id in selected_ids_list
+             ]
+
+             saliency_maps.append({
+                 "input_sentence": sentence,
+                 "token_ids": selected_ids_list,
+                 "token_strings": token_strings,
+                 "saliency_scores": normalized_scores,
+                 "saliency_method": method,
+                 "saliency_ansi": _build_ansi_saliency_text(
+                     token_strings,
+                     normalized_scores,
+                 ),
+                 "saliency_html": _build_html_saliency_text(
+                     token_strings,
+                     normalized_scores,
+                 ),
+             })
+
+         return saliency_maps
+
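The `integrated` branch above approximates integrated gradients with a right Riemann sum. It evaluates the gradient of the summed log-probability of the generated tokens at `baseline + alpha * (input - baseline)` for `alpha = 1/steps, 2/steps, ..., 1`. It then averages those gradients and multiplies the result by `input - baseline`. Each token's attribution vector is reduced to its L2 norm and divided by the largest norm among the selected prompt tokens.

The pure-Python sketch below reproduces that arithmetic on a toy quadratic objective with a known gradient, so it runs without a model. The objective, the weights and the helper names are illustrative assumptions, not part of the model code. Raw attributions should sum to roughly `f(x) - f(baseline)` (the completeness property). A similar check on the pre-norm attributions is one way to judge whether `ig_steps` is large enough.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def integrated_gradients(
    grad_fn: Callable[[Vector], Vector],
    x: Vector,
    baseline: Vector,
    steps: int = 20,
) -> Vector:
    """Right-Riemann IG: (x - baseline) * mean gradient along the straight path."""
    steps = max(1, steps)
    delta = [xi - bi for xi, bi in zip(x, baseline)]
    total = [0.0] * len(x)
    for step in range(1, steps + 1):
        alpha = step / steps  # same schedule as the loop above; alpha = 0 is never evaluated
        point = [bi + alpha * di for bi, di in zip(baseline, delta)]
        total = [t + g for t, g in zip(total, grad_fn(point))]
    return [di * t / steps for di, t in zip(delta, total)]


def normalized_token_scores(per_token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """L2 norm per token divided by the largest norm (clamped like `.clamp(min=1e-12)`)."""
    norms = [math.sqrt(sum(v * v for v in vec)) for vec in per_token_vectors]
    if not norms:
        return []
    max_norm = max(max(norms), 1e-12)
    return [n / max_norm for n in norms]


# Hypothetical toy objective f(x) = sum(w_i * x_i**2), whose gradient is 2 * w_i * x_i.
WEIGHTS = [3.0, 0.5]


def toy_f(x: Vector) -> float:
    return sum(w * xi * xi for w, xi in zip(WEIGHTS, x))


def toy_grad(x: Vector) -> Vector:
    return [2.0 * w * xi for w, xi in zip(WEIGHTS, x)]


x, zero = [1.0, 2.0], [0.0, 0.0]
attributions = integrated_gradients(toy_grad, x, zero, steps=1000)
print(sum(attributions), toy_f(x) - toy_f(zero))  # close but not equal: Riemann-sum error
print(normalized_token_scores([[3.0, 4.0], [0.0, 1.0], [0.0, 0.0]]))  # [1.0, 0.2, 0.0]
```

For this toy objective the right-endpoint schedule overshoots by a factor of `(steps + 1) / steps`. With the method's default `ig_steps=20`, the attributions therefore sum to 5.25 instead of 5.0.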
+     def predict_batch(
+         self,
+         all_outputs,
+         batch_size,
+         input_sentences,
+         sem_groups,
+         mentions,
+         mentions_id,
+         doc_ids,
+         start_spans,
+         end_spans,
+         gold_concept_codes,
+         gold_concept_names,
+         constrained,
+         multiple_answers,
+         num_beams,
+         explicability_mode: str = "",
+         ig_steps: int = 20,
+         ig_baseline: str = "pad",
+         **kwargs,
+     ):
+         input_args = {
+             k: v.to(self.device) # type: ignore
+             for k, v in self.tokenizer.batch_encode_plus( # type: ignore
+                 input_sentences, padding="longest", return_tensors="pt"
+             ).items()
+         }
+
+         # Constrained decoding
+         prefix_allowed_tokens_fn = None
+         if constrained:
+             if self.candidate_trie is None: # type: ignore
+                 raise ValueError(
+                     "candidate_trie is not loaded in the model. Use constrained=False."
+                 )
+             prefix_allowed_tokens_fn = get_prefix_allowed_tokens_fn(
+                 model=self,
+                 sources=input_sentences,
+                 sem_groups=sem_groups,
+                 multiple_answers=multiple_answers,
+             )
+         if self.tokenizer.sep_token_id: # type: ignore
+             eos_token_id = self.tokenizer.sep_token_id # type: ignore
+         else:
+             eos_token_id = self.tokenizer.eos_token_id # type: ignore
+         outputs = self.generate( # type: ignore
+             **input_args,
+             max_new_tokens=128,
+             num_beams=num_beams,
+             num_return_sequences=num_beams,
+             output_scores=True,
+             return_dict_in_generate=True,
+             prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
+             eos_token_id=eos_token_id, # type: ignore
+             **kwargs,
+         )
+         decoded_sequences = self.tokenizer.batch_decode( # type: ignore
+             outputs.sequences, # type: ignore
+             skip_special_tokens=False,
+             clean_up_tokenization_spaces=True,
+         )
+         cleaned_output_sequences = skip_undesired_tokens(
+             decoded_sequences,
+             self.tokenizer, # type: ignore
+         )
+
+         prefix_len = input_args["input_ids"].size(1)
+
+         base_sem_groups = sem_groups.copy()
+         base_mentions = mentions.copy()
+         base_mentions_id = mentions_id.copy()
+         base_doc_ids = doc_ids.copy()
+         base_start_spans = start_spans.copy()
+         base_end_spans = end_spans.copy()
+         base_gold_concept_codes = gold_concept_codes.copy()
+         base_gold_concept_names = gold_concept_names.copy()
+
+         # Duplicate sem_groups and mentions for each beam
+         sem_groups = [x for x in sem_groups for _ in range(num_beams)]
+         mentions = [x for x in mentions for _ in range(num_beams)]
+         mentions_id = [x for x in mentions_id for _ in range(num_beams)]
+         gold_concept_codes = [x for x in gold_concept_codes for _ in range(num_beams)] # type: ignore
+         gold_concept_names = [x for x in gold_concept_names for _ in range(num_beams)] # type: ignore
+         start_spans = [x for x in start_spans for _ in range(num_beams)]
+         end_spans = [x for x in end_spans for _ in range(num_beams)]
+         doc_ids = [x for x in doc_ids for _ in range(num_beams)]
+         # Parse predictions
+         pred_concept_codes, pred_concept_names = parse_prediction(
+             cleaned_output_sequences,
+             sem_groups,
+             self.text_to_code, # type: ignore
+             multiple_answers=multiple_answers,
+         )
+         scores = compute_score(
+             outputs,
+             self.tokenizer, # type: ignore
+             prefix_len=prefix_len,
+         )
+         beam_scores = [
+             float(torch.exp(s)) if num_beams > 1 else float("nan")
+             for s in (
+                 outputs.sequences_scores # type: ignore
+                 if num_beams > 1
+                 else [torch.tensor(float("nan"))] * len(scores)
+             )
+         ]
+         all_outputs.extend([
+             {
+                 "mention": mention,
+                 "doc_id": doc_id,
+                 "mention_id": mention_id,
+                 "start_span": start_span,
+                 "end_span": end_span,
+                 "semantic_group": group,
+                 "gold_concept_code": gold_concept_code,
+                 "gold_concept_name": gold_concept_name,
+                 "pred_concept_name": pred_concept_name,
+                 "pred_concept_code": pred_concept_code,
+                 "score": score,
+                 "beam_score": beam_score,
+                 "rank": rank + 1,
+             }
+             for score, beam_score, pred_concept_code, pred_concept_name, mention, doc_id, mention_id, start_span, end_span, group, gold_concept_code, gold_concept_name, rank in zip(
+                 scores,
+                 beam_scores,
+                 pred_concept_codes,
+                 pred_concept_names,
+                 mentions,
+                 doc_ids,
+                 mentions_id,
+                 start_spans,
+                 end_spans,
+                 sem_groups,
+                 gold_concept_codes,
+                 gold_concept_names,
+                 list(range(num_beams)) * batch_size,
+             )
+         ])
+
+         explicability_mode = explicability_mode.strip().lower()
+         if explicability_mode not in {"", "simple", "integrated"}:
+             raise ValueError(
+                 "explicability must be one of: '', 'simple', 'integrated'."
+             )
+
+         saliency_maps = []
+         if explicability_mode:
+             saliency_maps = self._compute_gradient_saliency(
+                 input_sentences=input_sentences,
+                 generated_sequences=outputs.sequences, # type: ignore
+                 num_beams=num_beams,
+                 prefix_len=prefix_len,
+                 saliency_method=explicability_mode,
+                 ig_steps=ig_steps,
+                 ig_baseline=ig_baseline,
+             )
+             for idx, saliency_map in enumerate(saliency_maps):
+                 top_prediction_index = idx * num_beams
+                 saliency_map.update({
+                     "mention": base_mentions[idx],
+                     "doc_id": base_doc_ids[idx],
+                     "mention_id": base_mentions_id[idx],
+                     "start_span": base_start_spans[idx],
+                     "end_span": base_end_spans[idx],
+                     "semantic_group": base_sem_groups[idx],
+                     "gold_concept_code": base_gold_concept_codes[idx],
+                     "gold_concept_name": base_gold_concept_names[idx],
+                     "pred_concept_name": pred_concept_names[top_prediction_index],
+                     "pred_concept_code": pred_concept_codes[top_prediction_index],
+                     "score": scores[top_prediction_index],
+                     "rank": 1,
+                 })
+
+         print(f"Sampling completed. Generated {len(all_outputs)} predictions.")
+         return all_outputs, cleaned_output_sequences, saliency_maps
+
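`predict_batch` depends on `generate(..., num_return_sequences=num_beams)` returning its sequences grouped by input: all beams for the first sentence, then all beams for the second, and so on. That ordering explains three things in the method:

- Every metadata list is repeated in place with `[x for x in xs for _ in range(num_beams)]`.
- Ranks cycle through `range(num_beams)`, with the first beam of each group treated as rank 1.
- The top beam for sentence `i` is read at index `i * num_beams`.

The sketch below reproduces this bookkeeping on plain lists; the helper names and sample strings are hypothetical. It sizes the rank list from the number of inputs, whereas the method uses `batch_size`. Batches built by `sample` can be smaller than `batch_size`, so the method relies on `zip` stopping at its shortest argument, which yields the same pairs.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def expand_per_beam(items: Sequence[T], num_beams: int) -> List[T]:
    """Repeat each item num_beams times in place: [a, b] -> [a, a, b, b] for 2 beams."""
    return [x for x in items for _ in range(num_beams)]


def rank_predictions(
    mentions: Sequence[str], predictions: Sequence[str], num_beams: int
) -> List[Dict[str, object]]:
    """Pair grouped beam outputs with their mention and a 1-based rank."""
    expanded = expand_per_beam(mentions, num_beams)
    ranks = list(range(num_beams)) * len(mentions)
    return [
        {"mention": m, "pred": p, "rank": r + 1}
        for m, p, r in zip(expanded, predictions, ranks)
    ]


def top_beam_predictions(predictions: Sequence[str], num_beams: int) -> List[str]:
    """The first (best-ranked) beam of input i sits at index i * num_beams."""
    return [predictions[i] for i in range(0, len(predictions), num_beams)]


preds = ["a/best", "a/second", "b/best", "b/second"]  # hypothetical decoded beams
for row in rank_predictions(["a", "b"], preds, num_beams=2):
    print(row)
print(top_beam_predictions(preds, num_beams=2))  # ['a/best', 'b/best']
```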
+     def sample(
+         self,
+         bigbio_pages: list[dict], # type: ignore
+         num_beams: int = 5,
+         constrained: bool = True,
+         explicability_mode: str = "",
+         multiple_answers: bool = False,
+         batch_size: int = 8,
+         start_entity: str = "[",
+         end_entity: str = "]",
+         start_group: str = "{",
+         end_group: str = "}",
+         show_progress: bool = True,
+         **kwargs,
+     ) -> (
+         list[dict[str, object]]
+         | tuple[list[dict[str, object]], list[dict[str, object]]]
+     ):
+         explicability_mode = explicability_mode.strip().lower()
+         if explicability_mode not in {"", "simple", "integrated"}:
+             raise ValueError(
+                 "explicability must be one of: '', 'simple', 'integrated'."
+             )
+
+         # Prepare input batch
+         if self.lang == "fr": # type: ignore
+             nlp = nltk.data.load("tokenizers/punkt/french.pickle")
+         elif self.lang == "en": # type: ignore
+             nlp = nltk.data.load("tokenizers/punkt/english.pickle")
+         elif self.lang == "es": # type: ignore
+             nlp = nltk.data.load("tokenizers/punkt/spanish.pickle")
+         else:
+             raise ValueError(f"Unsupported language: {self.lang}") # type: ignore
+
+         print(
+             f"Starting sampling on {len(bigbio_pages)} pages (lang={getattr(self, 'lang', 'unknown')}, constrained={constrained}, beams={num_beams}, batch_size={batch_size})"
+         )
+
+         def _progress(
+             iterable, desc: str, total: Optional[int] = None, show: bool = True
+         ):
+             if show:
+                 return tqdm(iterable, desc=desc, total=total)
+             return iterable
+
+         all_outputs = []
+         all_sources = []
+         all_targets = []
+         all_entities_info = []
+         for data in bigbio_pages:
+             sources, targets, entities_info = parse_text(
+                 data=data,
+                 start_entity=start_entity,
+                 end_entity=end_entity,
+                 start_group=start_group,
+                 end_group=end_group,
+                 nlp=nlp, # type: ignore
+             )
+             all_sources.append(sources)
+             all_targets.append(targets)
+             all_entities_info.append(entities_info)
+
+         def _build_sequential_batches():
+             # Keep per-page order while still processing multiple pages per batch.
+             page_positions = [0] * len(all_sources)
+             next_page_idx = 0
+             active_pages = []
+             batches = []
+
+             while active_pages or next_page_idx < len(all_sources):
+                 while len(active_pages) < batch_size and next_page_idx < len(
+                     all_sources
+                 ):
+                     if len(all_sources[next_page_idx]) > 0:
+                         active_pages.append(next_page_idx)
+                     next_page_idx += 1
+
+                 if not active_pages:
+                     break
+
+                 batch = []
+                 next_active_pages = []
+                 for page_idx in active_pages:
+                     item_idx = page_positions[page_idx]
+                     batch.append((
+                         all_sources[page_idx][item_idx],
+                         all_targets[page_idx][item_idx],
+                         all_entities_info[page_idx][item_idx],
+                     ))
+                     page_positions[page_idx] += 1
+                     if page_positions[page_idx] < len(all_sources[page_idx]):
+                         next_active_pages.append(page_idx)
+
+                 batches.append(batch)
+                 active_pages = next_active_pages
+
+             return batches
+
+         all_batches = _build_sequential_batches()
+
+         print(
+             f"Input preparation completed. Running generation on {len(all_batches)} batches."
+         )
+
+         all_outputs = []
+         all_saliency_maps = []
+         batch_previous_targets = {}
+         for batch in _progress(
+             all_batches,
+             desc="Processing batches",
+             total=len(all_batches),
+             show=show_progress,
+         ):
+             input_sentences = []
+             sem_groups = []
+             mentions = []
+             doc_ids = []
+             mentions_id = []
+             gold_concept_codes = []
+             gold_concept_names = []
+             start_spans = []
+             end_spans = []
+             for source, target, entity in batch:
+                 doc_id = entity["doc_id"]
+                 if doc_id not in batch_previous_targets:
+                     batch_previous_targets[doc_id] = ""
+                 previous_targets = batch_previous_targets.get(doc_id)
+
+                 input_sentences.append(
+                     add_headers_to_prompt(
+                         source,
+                         target,
+                         previous_targets, # type: ignore
+                     )
+                 )
+                 sem_groups.append(entity["semantic_group"])
+                 mentions.append(entity["mention"])
+                 doc_ids.append(doc_id)
+                 mentions_id.append(entity["mention_id"])
+                 start_spans.append(entity["start_span"])
+                 end_spans.append(entity["end_span"])
+                 gold_concept_codes.append(entity.get("gold_concept_code", None)) # type: ignore
+                 gold_concept_names.append(entity.get("gold_concept_name", None)) # type: ignore
+             all_outputs, cleaned_output_sequences, batch_saliency_maps = (
+                 self.predict_batch(
+                     all_outputs=all_outputs,
+                     batch_size=batch_size,
+                     input_sentences=input_sentences,
+                     sem_groups=sem_groups,
+                     mentions=mentions,
+                     mentions_id=mentions_id,
+                     doc_ids=doc_ids,
+                     start_spans=start_spans,
+                     end_spans=end_spans,
+                     gold_concept_codes=gold_concept_codes,
+                     gold_concept_names=gold_concept_names,
+                     constrained=constrained,
+                     multiple_answers=multiple_answers,
+                     num_beams=num_beams,
+                     explicability_mode=explicability_mode,
+                     **kwargs,
+                 )
+             )
+             if explicability_mode:
+                 all_saliency_maps.extend(batch_saliency_maps)
+             for i, doc_id in enumerate(doc_ids):
+                 clean_sentence = cleaned_output_sequences[num_beams * i]
+                 clean_sentence = start_entity + clean_sentence.split(start_entity)[-1]
+                 clean_sentence = clean_sentence.rstrip() + "\n"
+                 batch_previous_targets[doc_id] += clean_sentence
+
+         if explicability_mode:
+             return all_outputs, all_saliency_maps # type: ignore
+         return all_outputs # type: ignore
+
+     def encode(self, sentence):
+         return self.tokenizer.encode(sentence, return_tensors="pt")[0] # type: ignore
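The `_build_sequential_batches` helper inside `sample` puts at most one sentence per page into each batch. It reads every page's sentences in order and admits new pages as others run out. This appears to be designed around `batch_previous_targets`: a sentence's prompt includes the predictions already produced for earlier sentences of the same document, and those are only appended after a batch finishes. Two sentences from one page therefore should not share a batch.

Below is a standalone sketch of that schedule. The page contents are hypothetical, and the function returns `(page_index, item)` pairs so the order is visible.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_sequential_batches(
    pages: Sequence[Sequence[T]], batch_size: int
) -> List[List[Tuple[int, T]]]:
    """Round-robin over up to `batch_size` active pages, one item per page per batch."""
    positions = [0] * len(pages)  # index of the next unread item of each page
    next_page = 0
    active: List[int] = []
    batches: List[List[Tuple[int, T]]] = []
    while active or next_page < len(pages):
        # Top up the active set with the next non-empty pages.
        while len(active) < batch_size and next_page < len(pages):
            if pages[next_page]:
                active.append(next_page)
            next_page += 1
        if not active:
            break
        batch: List[Tuple[int, T]] = []
        still_active: List[int] = []
        for page in active:
            batch.append((page, pages[page][positions[page]]))
            positions[page] += 1
            if positions[page] < len(pages[page]):
                still_active.append(page)
        batches.append(batch)
        active = still_active
    return batches


# Hypothetical pages of pre-split sentences; page 1 is empty and is skipped.
pages = [["a1", "a2", "a3"], [], ["b1"], ["c1", "c2"]]
for batch in build_sequential_batches(pages, batch_size=2):
    print(batch)
# [(0, 'a1'), (2, 'b1')]
# [(0, 'a2'), (3, 'c1')]
# [(0, 'a3'), (3, 'c2')]
```

Once fewer than `batch_size` pages still have unread sentences, batches shrink. In the worst case, a single long page is processed one sentence per batch.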
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f11122a7fb2e5016088d38ef16605df6e93811b248f39182c1e20b8cff1b7463
+ size 4976706864
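Each `.safetensors` shard in this commit is stored as a Git LFS pointer with three fields: the spec `version`, an `oid` holding the SHA-256 of the real file, and the file's `size` in bytes. Below is a minimal standard-library sketch for checking data against such a pointer. The function names are illustrative. `sha256_of_file` shows the streaming variant that a multi-gigabyte shard would need.

```python
import hashlib
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer_text: str) -> bool:
    """True if `data` has the size and SHA-256 digest recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algorithm, _, expected_hex = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    if len(data) != int(fields["size"]):
        return False
    return hashlib.sha256(data).hexdigest() == expected_hex


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a large file in chunks instead of loading it into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


SHARD_1_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:f11122a7fb2e5016088d38ef16605df6e93811b248f39182c1e20b8cff1b7463
size 4976706864"""

print(parse_lfs_pointer(SHARD_1_POINTER)["size"])  # 4976706864
```

For a downloaded shard, compare `os.path.getsize(path)` with the `size` field and `sha256_of_file(path)` with the hex digest after `sha256:`.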
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:43f1682082f23c098f046c60ff58b5c1eb5dd35eac8306601afb75450842eb69
+ size 4999802720
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f52fa7dab26ec4615ba2d5b2ed4130173450947dd1c000a9d9739b5063ca2f87
+ size 4915916176
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af65da5f6a8ecc159dd34b4d0be4dc8b1d5335432e8b8bea657e70bb6e91b470
+ size 1168147000
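The `model.safetensors.index.json` file that follows records the checkpoint's total parameter count and byte size. It also maps every tensor name to the shard that stores it; `transformers` reads this map when loading a sharded checkpoint.

The sketch below groups tensors by shard and derives bytes per parameter. With the metadata shown (16,060,538,880 bytes for 8,030,269,440 parameters), that ratio is exactly 2. This is consistent with 16-bit (for example bfloat16) weights, though the dtype is an inference, not something the index states. The four shard sizes in the LFS pointers above add up to 16,060,572,760 bytes, 33,880 bytes more than `total_size`. The difference is presumably the per-file safetensors headers.

```python
import json
from collections import defaultdict
from typing import Dict, List


def tensors_by_shard(index: dict) -> Dict[str, List[str]]:
    """Invert `weight_map` (tensor name -> shard file) into shard file -> sorted tensor names."""
    shards: Dict[str, List[str]] = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def bytes_per_parameter(index: dict) -> float:
    meta = index["metadata"]
    return meta["total_size"] / meta["total_parameters"]


# Excerpt of the index below: its metadata plus three weight_map entries.
EXCERPT = json.loads("""
{
  "metadata": {"total_parameters": 8030269440, "total_size": 16060538880},
  "weight_map": {
    "lm_head.weight": "model-00004-of-00004.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors"
  }
}
""")

print(bytes_per_parameter(EXCERPT))  # 2.0
for shard, names in tensors_by_shard(EXCERPT).items():
    print(shard, names)
```

Grouping the full map this way shows, for instance, that layer 20 straddles two shards. Its attention projections and `gate_proj` sit in shard 2, while `up_proj`, `down_proj` and both layer norms sit in shard 3.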
model.safetensors.index.json ADDED
@@ -0,0 +1,299 @@
1
+ {
2
+ "metadata": {
3
+ "total_parameters": 8030269440,
4
+ "total_size": 16060538880
5
+ },
6
+ "weight_map": {
7
+ "lm_head.weight": "model-00004-of-00004.safetensors",
8
+ "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
9
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
10
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
11
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
12
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
13
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
16
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
17
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
18
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
19
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
20
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
21
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
22
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
23
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
24
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
25
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
26
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
27
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
28
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
29
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
30
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
31
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
32
+ "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
33
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
34
+ "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
35
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
36
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
37
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
38
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
39
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
40
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
41
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
42
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
43
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
44
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
45
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
46
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
47
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
48
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
49
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
50
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
51
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
52
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
53
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
54
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
55
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
56
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
57
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
58
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
59
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
60
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
61
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
62
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
63
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
64
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
65
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
66
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
67
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
68
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
69
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
70
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
71
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
72
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
73
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
74
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
75
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
76
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
77
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
78
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
79
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
80
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
81
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
82
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
83
+ "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
84
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
85
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
86
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
87
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
88
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
89
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
90
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
91
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
92
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
93
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
94
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
95
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
96
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
97
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
98
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
99
+ "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
100
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
101
+ "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
102
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
103
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
104
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
105
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
106
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
107
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
108
+ "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
109
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
110
+ "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
111
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
112
+ "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
113
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
114
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
115
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
116
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
117
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
118
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
119
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
120
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
121
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
122
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
123
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
124
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
125
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
126
+ "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
127
+ "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
128
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
129
+ "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
130
+ "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
131
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
132
+ "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
133
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
134
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
135
+ "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
136
+ "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
137
+ "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
138
+ "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
139
+ "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
140
+ "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
141
+ "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
142
+ "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
143
+ "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
144
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
145
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
146
+ "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
147
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
148
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
149
+ "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
150
+ "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
151
+ "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
152
+ "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
153
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
154
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
155
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
156
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
157
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
158
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
159
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
160
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
161
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
162
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
163
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
164
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
165
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
166
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
167
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
168
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
169
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
+ "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+ "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+ "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.norm.weight": "model-00004-of-00004.safetensors"
+ }
+ }
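The entries above appear to be the tail of the `weight_map` in a standard Transformers sharded-checkpoint index (conventionally `model.safetensors.index.json`; the file name itself is not visible in this excerpt). `from_pretrained` resolves this mapping automatically, so the following is only a minimal inspection sketch, with hypothetical helper names, for questions such as which shards a given decoder layer touches. In this commit, for instance, layer 31 is split across shards 3 and 4.

```python
import json
from collections import defaultdict


def load_weight_map(path: str) -> dict:
    """Return the tensor-name -> shard-file mapping from a sharded safetensors index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["weight_map"]


def tensors_by_shard(weight_map: dict) -> dict:
    """Invert the mapping: shard file -> sorted list of tensor names it stores."""
    groups = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        groups[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in groups.items()}


def shards_for_layer(weight_map: dict, layer: int) -> list:
    """List the shard files holding any tensor of decoder layer `layer`."""
    # The trailing dot keeps layer 3 from also matching layers 30 and 31.
    prefix = f"model.layers.{layer}."
    return sorted({shard for name, shard in weight_map.items() if name.startswith(prefix)})


# A few entries copied from the diff above.
excerpt = {
    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors",
}
print(shards_for_layer(excerpt, 31))
# ['model-00003-of-00004.safetensors', 'model-00004-of-00004.safetensors']
```

With the real file, `shards_for_layer(load_weight_map(path), n)` gives the same answer for every layer without loading any tensors.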
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1273e28f2c3d3a2e7df4915698b8ac32334b9b1a7b964a7ee6b0b640313a404f
+ size 32121333167
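Each file above whose body is three `version` / `oid` / `size` lines is a Git LFS pointer rather than the file itself: `oid` is the SHA-256 of the real content and `size` is its length in bytes. A minimal sketch (helper names are hypothetical) for parsing such a pointer and checking a downloaded copy against it:

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into algo, oid and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"oid_algo": algo, "oid": digest, "size": int(fields["size"])}


def verify_blob(path: str, pointer: dict) -> bool:
    """Check a downloaded file against the size and SHA-256 oid in its pointer."""
    # Assumes oid_algo == "sha256", the only algorithm the v1 spec uses.
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read 1 MiB at a time
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer("""\
version https://git-lfs.github.com/spec/v1
oid sha256:1273e28f2c3d3a2e7df4915698b8ac32334b9b1a7b964a7ee6b0b640313a404f
size 32121333167
""")
print(f"{pointer['size'] / 1e9:.1f} GB")  # 32.1 GB
```

The same parser applies to the `rng_state.pth`, `scheduler.pt`, `text_to_code.json` and `tokenizer.json` pointers in this commit.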
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:91a1a9bf984f5d845a3b4eb95d54e9cfc7ed36490e795a1b355975eae9b98700
+ size 14645
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed5c2aba955db99894ccedba5e103eb87b693fa40acb652acf91dd1a19aef81b
+ size 1465
special_tokens_map.json ADDED
@@ -0,0 +1,60 @@
+ {
+ "additional_special_tokens": [
+ {
+ "content": "[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "{",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "}",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "<+>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ ],
+ "bos_token": {
+ "content": "<|begin_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|eot_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|eot_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
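`special_tokens_map.json` registers `[`, `]`, `{`, `}` and `<+>` as additional special tokens and reuses `<|eot_id|>` as both the end-of-sequence and the padding token. A minimal stdlib sketch (helper names are hypothetical) that flattens such a file into role/content pairs and flags the pad/EOS overlap:

```python
import json


def summarize_special_tokens(tokens_map: dict) -> dict:
    """Flatten a special_tokens_map.json dict into role -> token string(s)."""

    def content(entry):
        # Entries may be plain strings or AddedToken-style dicts with a "content" key.
        return entry["content"] if isinstance(entry, dict) else entry

    summary = {
        role: content(tokens_map[role])
        for role in ("bos_token", "eos_token", "pad_token")
        if role in tokens_map
    }
    summary["additional"] = [content(t) for t in tokens_map.get("additional_special_tokens", [])]
    # True only when a pad token exists and is the same string as the EOS token.
    summary["pad_is_eos"] = "pad_token" in summary and summary["pad_token"] == summary.get("eos_token")
    return summary


def load_special_tokens(path: str) -> dict:
    """Read and summarize a special_tokens_map.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return summarize_special_tokens(json.load(f))


# Excerpt of the file above (the lstrip/rstrip/normalized flags are omitted).
excerpt = {
    "additional_special_tokens": [{"content": c} for c in ["[", "]", "{", "}", "<+>"]],
    "bos_token": {"content": "<|begin_of_text|>"},
    "eos_token": {"content": "<|eot_id|>"},
    "pad_token": {"content": "<|eot_id|>"},
}
summary = summarize_special_tokens(excerpt)
print(summary["additional"])  # ['[', ']', '{', '}', '<+>']
print(summary["pad_is_eos"])  # True
```

Because padding and end-of-turn share one token, padding can only be told apart from a genuine `<|eot_id|>` through the attention mask; code that masks labels by comparing token IDs against `pad_token_id` would also mask the real end-of-turn tokens, which is worth checking in any fine-tuning script that reuses this tokenizer.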
text_to_code.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b95ae9165d70692681902ea91875f9120f94415bbba754fabe6047fafb78bae0
+ size 494763280
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:11ac3b66638a75d981484ee3713682e63c142ad255bd7cd96d9635ad5e654cdd
+ size 17210796
tokenizer_config.json ADDED
@@ -0,0 +1,2110 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "58": {
4
+ "content": "[",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "60": {
12
+ "content": "]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "90": {
20
+ "content": "{",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "92": {
28
+ "content": "}",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128000": {
36
+ "content": "<|begin_of_text|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "128001": {
44
+ "content": "<|end_of_text|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "128002": {
52
+ "content": "<|reserved_special_token_0|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "128003": {
60
+ "content": "<|reserved_special_token_1|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "128004": {
68
+ "content": "<|finetune_right_pad_id|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "128005": {
76
+ "content": "<|reserved_special_token_2|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "128006": {
84
+ "content": "<|start_header_id|>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "128007": {
92
+ "content": "<|end_header_id|>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "128008": {
100
+ "content": "<|eom_id|>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "128009": {
108
+ "content": "<|eot_id|>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "128010": {
116
+ "content": "<|python_tag|>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "128011": {
124
+ "content": "<|reserved_special_token_3|>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "128012": {
132
+ "content": "<|reserved_special_token_4|>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "128013": {
140
+ "content": "<|reserved_special_token_5|>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "128014": {
148
+ "content": "<|reserved_special_token_6|>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "128015": {
156
+ "content": "<|reserved_special_token_7|>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "128016": {
164
+ "content": "<|reserved_special_token_8|>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "128017": {
172
+ "content": "<|reserved_special_token_9|>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "128018": {
180
+ "content": "<|reserved_special_token_10|>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "128019": {
188
+ "content": "<|reserved_special_token_11|>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "128020": {
196
+ "content": "<|reserved_special_token_12|>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "128021": {
204
+ "content": "<|reserved_special_token_13|>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "128022": {
212
+ "content": "<|reserved_special_token_14|>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "128023": {
220
+ "content": "<|reserved_special_token_15|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "128024": {
228
+ "content": "<|reserved_special_token_16|>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "128025": {
236
+ "content": "<|reserved_special_token_17|>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "128026": {
244
+ "content": "<|reserved_special_token_18|>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "128027": {
252
+ "content": "<|reserved_special_token_19|>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "128028": {
260
+ "content": "<|reserved_special_token_20|>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "128029": {
268
+ "content": "<|reserved_special_token_21|>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "128030": {
276
+ "content": "<|reserved_special_token_22|>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "128031": {
284
+ "content": "<|reserved_special_token_23|>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "128032": {
292
+ "content": "<|reserved_special_token_24|>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "128033": {
300
+ "content": "<|reserved_special_token_25|>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "128034": {
308
+ "content": "<|reserved_special_token_26|>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "128035": {
316
+ "content": "<|reserved_special_token_27|>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "128036": {
324
+ "content": "<|reserved_special_token_28|>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "128037": {
332
+ "content": "<|reserved_special_token_29|>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "128038": {
340
+ "content": "<|reserved_special_token_30|>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "128039": {
348
+ "content": "<|reserved_special_token_31|>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "128040": {
356
+ "content": "<|reserved_special_token_32|>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "128041": {
364
+ "content": "<|reserved_special_token_33|>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "128042": {
372
+ "content": "<|reserved_special_token_34|>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "128043": {
380
+ "content": "<|reserved_special_token_35|>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "128044": {
388
+ "content": "<|reserved_special_token_36|>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "128045": {
396
+ "content": "<|reserved_special_token_37|>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "128046": {
404
+ "content": "<|reserved_special_token_38|>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "128047": {
412
+ "content": "<|reserved_special_token_39|>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "128048": {
420
+ "content": "<|reserved_special_token_40|>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "128049": {
428
+ "content": "<|reserved_special_token_41|>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "128050": {
436
+ "content": "<|reserved_special_token_42|>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "128051": {
444
+ "content": "<|reserved_special_token_43|>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "128052": {
452
+ "content": "<|reserved_special_token_44|>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "128053": {
460
+ "content": "<|reserved_special_token_45|>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "128054": {
468
+ "content": "<|reserved_special_token_46|>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "128055": {
476
+ "content": "<|reserved_special_token_47|>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "128056": {
484
+ "content": "<|reserved_special_token_48|>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "128057": {
492
+ "content": "<|reserved_special_token_49|>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "128058": {
500
+ "content": "<|reserved_special_token_50|>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "128059": {
508
+ "content": "<|reserved_special_token_51|>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "128060": {
516
+ "content": "<|reserved_special_token_52|>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "128061": {
524
+ "content": "<|reserved_special_token_53|>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "128062": {
532
+ "content": "<|reserved_special_token_54|>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "128063": {
540
+ "content": "<|reserved_special_token_55|>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "128064": {
548
+ "content": "<|reserved_special_token_56|>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "128065": {
556
+ "content": "<|reserved_special_token_57|>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "128066": {
564
+ "content": "<|reserved_special_token_58|>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "128067": {
572
+ "content": "<|reserved_special_token_59|>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "128068": {
580
+ "content": "<|reserved_special_token_60|>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "128069": {
588
+ "content": "<|reserved_special_token_61|>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "128070": {
596
+ "content": "<|reserved_special_token_62|>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "128071": {
604
+ "content": "<|reserved_special_token_63|>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "128072": {
612
+ "content": "<|reserved_special_token_64|>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "128073": {
620
+ "content": "<|reserved_special_token_65|>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "128074": {
628
+ "content": "<|reserved_special_token_66|>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "128075": {
636
+ "content": "<|reserved_special_token_67|>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "128076": {
644
+ "content": "<|reserved_special_token_68|>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "128077": {
652
+ "content": "<|reserved_special_token_69|>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": true
658
+ },
659
+ "128078": {
660
+ "content": "<|reserved_special_token_70|>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": true
666
+ },
667
+ "128079": {
668
+ "content": "<|reserved_special_token_71|>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": true
674
+ },
675
+ "128080": {
676
+ "content": "<|reserved_special_token_72|>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": true
682
+ },
683
+ "128081": {
684
+ "content": "<|reserved_special_token_73|>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": true
690
+ },
691
+ "128082": {
692
+ "content": "<|reserved_special_token_74|>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": true
698
+ },
699
+ "128083": {
700
+ "content": "<|reserved_special_token_75|>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": true
706
+ },
707
+ "128084": {
708
+ "content": "<|reserved_special_token_76|>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": true
714
+ },
715
+ "128085": {
716
+ "content": "<|reserved_special_token_77|>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": true
722
+ },
723
+ "128086": {
724
+ "content": "<|reserved_special_token_78|>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": true
730
+ },
731
+ "128087": {
732
+ "content": "<|reserved_special_token_79|>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": true
738
+ },
739
+ "128088": {
740
+ "content": "<|reserved_special_token_80|>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": true
746
+ },
747
+ "128089": {
748
+ "content": "<|reserved_special_token_81|>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": true
754
+ },
755
+ "128090": {
756
+ "content": "<|reserved_special_token_82|>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": true
762
+ },
763
+ "128091": {
764
+ "content": "<|reserved_special_token_83|>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": true
770
+ },
771
+ "128092": {
772
+ "content": "<|reserved_special_token_84|>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": true
778
+ },
779
+ "128093": {
780
+ "content": "<|reserved_special_token_85|>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": true
786
+ },
787
+ "128094": {
788
+ "content": "<|reserved_special_token_86|>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": true
794
+ },
795
+ "128095": {
796
+ "content": "<|reserved_special_token_87|>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": true
802
+ },
803
+ "128096": {
804
+ "content": "<|reserved_special_token_88|>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": true
810
+ },
811
+ "128097": {
812
+ "content": "<|reserved_special_token_89|>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": true
818
+ },
819
+ "128098": {
820
+ "content": "<|reserved_special_token_90|>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": true
826
+ },
827
+ "128099": {
828
+ "content": "<|reserved_special_token_91|>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": true
834
+ },
835
+ "128100": {
836
+ "content": "<|reserved_special_token_92|>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": true
842
+ },
843
+ "128101": {
844
+ "content": "<|reserved_special_token_93|>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": true
850
+ },
851
+ "128102": {
852
+ "content": "<|reserved_special_token_94|>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": true
858
+ },
859
+ "128103": {
860
+ "content": "<|reserved_special_token_95|>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": true
866
+ },
867
+ "128104": {
868
+ "content": "<|reserved_special_token_96|>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": true
874
+ },
875
+ "128105": {
876
+ "content": "<|reserved_special_token_97|>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": true
882
+ },
883
+ "128106": {
884
+ "content": "<|reserved_special_token_98|>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": true
890
+ },
891
+ "128107": {
892
+ "content": "<|reserved_special_token_99|>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": true
898
+ },
899
+ "128108": {
900
+ "content": "<|reserved_special_token_100|>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": true
906
+ },
907
+ "128109": {
908
+ "content": "<|reserved_special_token_101|>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": true
914
+ },
915
+ "128110": {
916
+ "content": "<|reserved_special_token_102|>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": true
922
+ },
923
+ "128111": {
924
+ "content": "<|reserved_special_token_103|>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": true
930
+ },
931
+ "128112": {
932
+ "content": "<|reserved_special_token_104|>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": true
938
+ },
939
+ "128113": {
940
+ "content": "<|reserved_special_token_105|>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": true
946
+ },
947
+ "128114": {
948
+ "content": "<|reserved_special_token_106|>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": true
954
+ },
955
+ "128115": {
956
+ "content": "<|reserved_special_token_107|>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": true
962
+ },
963
+ "128116": {
964
+ "content": "<|reserved_special_token_108|>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": true
970
+ },
971
+ "128117": {
972
+ "content": "<|reserved_special_token_109|>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": true
978
+ },
979
+ "128118": {
980
+ "content": "<|reserved_special_token_110|>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": true
986
+ },
987
+ "128119": {
988
+ "content": "<|reserved_special_token_111|>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": true
994
+ },
995
+ "128120": {
996
+ "content": "<|reserved_special_token_112|>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": true
1002
+ },
1003
+ "128121": {
1004
+ "content": "<|reserved_special_token_113|>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": true
1010
+ },
1011
+ "128122": {
1012
+ "content": "<|reserved_special_token_114|>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": true
1018
+ },
1019
+ "128123": {
1020
+ "content": "<|reserved_special_token_115|>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": true
1026
+ },
1027
+ "128124": {
1028
+ "content": "<|reserved_special_token_116|>",
1029
+ "lstrip": false,
1030
+ "normalized": false,
1031
+ "rstrip": false,
1032
+ "single_word": false,
1033
+ "special": true
1034
+ },
1035
+ "128125": {
1036
+ "content": "<|reserved_special_token_117|>",
1037
+ "lstrip": false,
1038
+ "normalized": false,
1039
+ "rstrip": false,
1040
+ "single_word": false,
1041
+ "special": true
1042
+ },
1043
+ "128126": {
1044
+ "content": "<|reserved_special_token_118|>",
1045
+ "lstrip": false,
1046
+ "normalized": false,
1047
+ "rstrip": false,
1048
+ "single_word": false,
1049
+ "special": true
1050
+ },
1051
+ "128127": {
1052
+ "content": "<|reserved_special_token_119|>",
1053
+ "lstrip": false,
1054
+ "normalized": false,
1055
+ "rstrip": false,
1056
+ "single_word": false,
1057
+ "special": true
1058
+ },
1059
+ "128128": {
1060
+ "content": "<|reserved_special_token_120|>",
1061
+ "lstrip": false,
1062
+ "normalized": false,
1063
+ "rstrip": false,
1064
+ "single_word": false,
1065
+ "special": true
1066
+ },
1067
+ "128129": {
1068
+ "content": "<|reserved_special_token_121|>",
1069
+ "lstrip": false,
1070
+ "normalized": false,
1071
+ "rstrip": false,
1072
+ "single_word": false,
1073
+ "special": true
1074
+ },
1075
+ "128130": {
1076
+ "content": "<|reserved_special_token_122|>",
1077
+ "lstrip": false,
1078
+ "normalized": false,
1079
+ "rstrip": false,
1080
+ "single_word": false,
1081
+ "special": true
1082
+ },
1083
+ "128131": {
1084
+ "content": "<|reserved_special_token_123|>",
1085
+ "lstrip": false,
1086
+ "normalized": false,
1087
+ "rstrip": false,
1088
+ "single_word": false,
1089
+ "special": true
1090
+ },
1091
+ "128132": {
1092
+ "content": "<|reserved_special_token_124|>",
1093
+ "lstrip": false,
1094
+ "normalized": false,
1095
+ "rstrip": false,
1096
+ "single_word": false,
1097
+ "special": true
1098
+ },
1099
+ "128133": {
1100
+ "content": "<|reserved_special_token_125|>",
1101
+ "lstrip": false,
1102
+ "normalized": false,
1103
+ "rstrip": false,
1104
+ "single_word": false,
1105
+ "special": true
1106
+ },
1107
+ "128134": {
1108
+ "content": "<|reserved_special_token_126|>",
1109
+ "lstrip": false,
1110
+ "normalized": false,
1111
+ "rstrip": false,
1112
+ "single_word": false,
1113
+ "special": true
1114
+ },
1115
+ "128135": {
1116
+ "content": "<|reserved_special_token_127|>",
1117
+ "lstrip": false,
1118
+ "normalized": false,
1119
+ "rstrip": false,
1120
+ "single_word": false,
1121
+ "special": true
1122
+ },
1123
+ "128136": {
1124
+ "content": "<|reserved_special_token_128|>",
1125
+ "lstrip": false,
1126
+ "normalized": false,
1127
+ "rstrip": false,
1128
+ "single_word": false,
1129
+ "special": true
1130
+ },
1131
+ "128137": {
1132
+ "content": "<|reserved_special_token_129|>",
1133
+ "lstrip": false,
1134
+ "normalized": false,
1135
+ "rstrip": false,
1136
+ "single_word": false,
1137
+ "special": true
1138
+ },
1139
+ "128138": {
1140
+ "content": "<|reserved_special_token_130|>",
1141
+ "lstrip": false,
1142
+ "normalized": false,
1143
+ "rstrip": false,
1144
+ "single_word": false,
1145
+ "special": true
1146
+ },
1147
+ "128139": {
1148
+ "content": "<|reserved_special_token_131|>",
1149
+ "lstrip": false,
1150
+ "normalized": false,
1151
+ "rstrip": false,
1152
+ "single_word": false,
1153
+ "special": true
1154
+ },
1155
+ "128140": {
1156
+ "content": "<|reserved_special_token_132|>",
1157
+ "lstrip": false,
1158
+ "normalized": false,
1159
+ "rstrip": false,
1160
+ "single_word": false,
1161
+ "special": true
1162
+ },
1163
+ "128141": {
1164
+ "content": "<|reserved_special_token_133|>",
1165
+ "lstrip": false,
1166
+ "normalized": false,
1167
+ "rstrip": false,
1168
+ "single_word": false,
1169
+ "special": true
1170
+ },
1171
+ "128142": {
1172
+ "content": "<|reserved_special_token_134|>",
1173
+ "lstrip": false,
1174
+ "normalized": false,
1175
+ "rstrip": false,
1176
+ "single_word": false,
1177
+ "special": true
1178
+ },
1179
+ "128143": {
1180
+ "content": "<|reserved_special_token_135|>",
1181
+ "lstrip": false,
1182
+ "normalized": false,
1183
+ "rstrip": false,
1184
+ "single_word": false,
1185
+ "special": true
1186
+ },
1187
+ "128144": {
1188
+ "content": "<|reserved_special_token_136|>",
1189
+ "lstrip": false,
1190
+ "normalized": false,
1191
+ "rstrip": false,
1192
+ "single_word": false,
1193
+ "special": true
1194
+ },
1195
+ "128145": {
1196
+ "content": "<|reserved_special_token_137|>",
1197
+ "lstrip": false,
1198
+ "normalized": false,
1199
+ "rstrip": false,
1200
+ "single_word": false,
1201
+ "special": true
1202
+ },
1203
+ "128146": {
1204
+ "content": "<|reserved_special_token_138|>",
1205
+ "lstrip": false,
1206
+ "normalized": false,
1207
+ "rstrip": false,
1208
+ "single_word": false,
1209
+ "special": true
1210
+ },
1211
+ "128147": {
1212
+ "content": "<|reserved_special_token_139|>",
1213
+ "lstrip": false,
1214
+ "normalized": false,
1215
+ "rstrip": false,
1216
+ "single_word": false,
1217
+ "special": true
1218
+ },
1219
+ "128148": {
1220
+ "content": "<|reserved_special_token_140|>",
1221
+ "lstrip": false,
1222
+ "normalized": false,
1223
+ "rstrip": false,
1224
+ "single_word": false,
1225
+ "special": true
1226
+ },
1227
+ "128149": {
1228
+ "content": "<|reserved_special_token_141|>",
1229
+ "lstrip": false,
1230
+ "normalized": false,
1231
+ "rstrip": false,
1232
+ "single_word": false,
1233
+ "special": true
1234
+ },
1235
+ "128150": {
1236
+ "content": "<|reserved_special_token_142|>",
1237
+ "lstrip": false,
1238
+ "normalized": false,
1239
+ "rstrip": false,
1240
+ "single_word": false,
1241
+ "special": true
1242
+ },
1243
+ "128151": {
1244
+ "content": "<|reserved_special_token_143|>",
1245
+ "lstrip": false,
1246
+ "normalized": false,
1247
+ "rstrip": false,
1248
+ "single_word": false,
1249
+ "special": true
1250
+ },
1251
+ "128152": {
1252
+ "content": "<|reserved_special_token_144|>",
1253
+ "lstrip": false,
1254
+ "normalized": false,
1255
+ "rstrip": false,
1256
+ "single_word": false,
1257
+ "special": true
1258
+ },
1259
+ "128153": {
1260
+ "content": "<|reserved_special_token_145|>",
1261
+ "lstrip": false,
1262
+ "normalized": false,
1263
+ "rstrip": false,
1264
+ "single_word": false,
1265
+ "special": true
1266
+ },
1267
+ "128154": {
1268
+ "content": "<|reserved_special_token_146|>",
1269
+ "lstrip": false,
1270
+ "normalized": false,
1271
+ "rstrip": false,
1272
+ "single_word": false,
1273
+ "special": true
1274
+ },
1275
+ "128155": {
1276
+ "content": "<|reserved_special_token_147|>",
1277
+ "lstrip": false,
1278
+ "normalized": false,
1279
+ "rstrip": false,
1280
+ "single_word": false,
1281
+ "special": true
1282
+ },
1283
+ "128156": {
1284
+ "content": "<|reserved_special_token_148|>",
1285
+ "lstrip": false,
1286
+ "normalized": false,
1287
+ "rstrip": false,
1288
+ "single_word": false,
1289
+ "special": true
1290
+ },
1291
+ "128157": {
1292
+ "content": "<|reserved_special_token_149|>",
1293
+ "lstrip": false,
1294
+ "normalized": false,
1295
+ "rstrip": false,
1296
+ "single_word": false,
1297
+ "special": true
1298
+ },
1299
+ "128158": {
1300
+ "content": "<|reserved_special_token_150|>",
1301
+ "lstrip": false,
1302
+ "normalized": false,
1303
+ "rstrip": false,
1304
+ "single_word": false,
1305
+ "special": true
1306
+ },
1307
+ "128159": {
1308
+ "content": "<|reserved_special_token_151|>",
1309
+ "lstrip": false,
1310
+ "normalized": false,
1311
+ "rstrip": false,
1312
+ "single_word": false,
1313
+ "special": true
1314
+ },
1315
+ "128160": {
1316
+ "content": "<|reserved_special_token_152|>",
1317
+ "lstrip": false,
1318
+ "normalized": false,
1319
+ "rstrip": false,
1320
+ "single_word": false,
1321
+ "special": true
1322
+ },
1323
+ "128161": {
1324
+ "content": "<|reserved_special_token_153|>",
1325
+ "lstrip": false,
1326
+ "normalized": false,
1327
+ "rstrip": false,
1328
+ "single_word": false,
1329
+ "special": true
1330
+ },
1331
+ "128162": {
1332
+ "content": "<|reserved_special_token_154|>",
1333
+ "lstrip": false,
1334
+ "normalized": false,
1335
+ "rstrip": false,
1336
+ "single_word": false,
1337
+ "special": true
1338
+ },
1339
+ "128163": {
1340
+ "content": "<|reserved_special_token_155|>",
1341
+ "lstrip": false,
1342
+ "normalized": false,
1343
+ "rstrip": false,
1344
+ "single_word": false,
1345
+ "special": true
1346
+ },
1347
+ "128164": {
1348
+ "content": "<|reserved_special_token_156|>",
1349
+ "lstrip": false,
1350
+ "normalized": false,
1351
+ "rstrip": false,
1352
+ "single_word": false,
1353
+ "special": true
1354
+ },
1355
+ "128165": {
1356
+ "content": "<|reserved_special_token_157|>",
1357
+ "lstrip": false,
1358
+ "normalized": false,
1359
+ "rstrip": false,
1360
+ "single_word": false,
1361
+ "special": true
1362
+ },
1363
+ "128166": {
1364
+ "content": "<|reserved_special_token_158|>",
1365
+ "lstrip": false,
1366
+ "normalized": false,
1367
+ "rstrip": false,
1368
+ "single_word": false,
1369
+ "special": true
1370
+ },
1371
+ "128167": {
1372
+ "content": "<|reserved_special_token_159|>",
1373
+ "lstrip": false,
1374
+ "normalized": false,
1375
+ "rstrip": false,
1376
+ "single_word": false,
1377
+ "special": true
1378
+ },
1379
+ "128168": {
1380
+ "content": "<|reserved_special_token_160|>",
1381
+ "lstrip": false,
1382
+ "normalized": false,
1383
+ "rstrip": false,
1384
+ "single_word": false,
1385
+ "special": true
1386
+ },
1387
+ "128169": {
1388
+ "content": "<|reserved_special_token_161|>",
1389
+ "lstrip": false,
1390
+ "normalized": false,
1391
+ "rstrip": false,
1392
+ "single_word": false,
1393
+ "special": true
1394
+ },
1395
+ "128170": {
1396
+ "content": "<|reserved_special_token_162|>",
1397
+ "lstrip": false,
1398
+ "normalized": false,
1399
+ "rstrip": false,
1400
+ "single_word": false,
1401
+ "special": true
1402
+ },
1403
+ "128171": {
1404
+ "content": "<|reserved_special_token_163|>",
1405
+ "lstrip": false,
1406
+ "normalized": false,
1407
+ "rstrip": false,
1408
+ "single_word": false,
1409
+ "special": true
1410
+ },
1411
+ "128172": {
1412
+ "content": "<|reserved_special_token_164|>",
1413
+ "lstrip": false,
1414
+ "normalized": false,
1415
+ "rstrip": false,
1416
+ "single_word": false,
1417
+ "special": true
1418
+ },
1419
+ "128173": {
1420
+ "content": "<|reserved_special_token_165|>",
1421
+ "lstrip": false,
1422
+ "normalized": false,
1423
+ "rstrip": false,
1424
+ "single_word": false,
1425
+ "special": true
1426
+ },
1427
+ "128174": {
1428
+ "content": "<|reserved_special_token_166|>",
1429
+ "lstrip": false,
1430
+ "normalized": false,
1431
+ "rstrip": false,
1432
+ "single_word": false,
1433
+ "special": true
1434
+ },
1435
+ "128175": {
1436
+ "content": "<|reserved_special_token_167|>",
1437
+ "lstrip": false,
1438
+ "normalized": false,
1439
+ "rstrip": false,
1440
+ "single_word": false,
1441
+ "special": true
1442
+ },
1443
+ "128176": {
1444
+ "content": "<|reserved_special_token_168|>",
1445
+ "lstrip": false,
1446
+ "normalized": false,
1447
+ "rstrip": false,
1448
+ "single_word": false,
1449
+ "special": true
1450
+ },
1451
+ "128177": {
1452
+ "content": "<|reserved_special_token_169|>",
1453
+ "lstrip": false,
1454
+ "normalized": false,
1455
+ "rstrip": false,
1456
+ "single_word": false,
1457
+ "special": true
1458
+ },
1459
+ "128178": {
1460
+ "content": "<|reserved_special_token_170|>",
1461
+ "lstrip": false,
1462
+ "normalized": false,
1463
+ "rstrip": false,
1464
+ "single_word": false,
1465
+ "special": true
1466
+ },
1467
+ "128179": {
1468
+ "content": "<|reserved_special_token_171|>",
1469
+ "lstrip": false,
1470
+ "normalized": false,
1471
+ "rstrip": false,
1472
+ "single_word": false,
1473
+ "special": true
1474
+ },
1475
+ "128180": {
1476
+ "content": "<|reserved_special_token_172|>",
1477
+ "lstrip": false,
1478
+ "normalized": false,
1479
+ "rstrip": false,
1480
+ "single_word": false,
1481
+ "special": true
1482
+ },
1483
+ "128181": {
1484
+ "content": "<|reserved_special_token_173|>",
1485
+ "lstrip": false,
1486
+ "normalized": false,
1487
+ "rstrip": false,
1488
+ "single_word": false,
1489
+ "special": true
1490
+ },
1491
+ "128182": {
1492
+ "content": "<|reserved_special_token_174|>",
1493
+ "lstrip": false,
1494
+ "normalized": false,
1495
+ "rstrip": false,
1496
+ "single_word": false,
1497
+ "special": true
1498
+ },
1499
+ "128183": {
1500
+ "content": "<|reserved_special_token_175|>",
1501
+ "lstrip": false,
1502
+ "normalized": false,
1503
+ "rstrip": false,
1504
+ "single_word": false,
1505
+ "special": true
1506
+ },
1507
+ "128184": {
1508
+ "content": "<|reserved_special_token_176|>",
1509
+ "lstrip": false,
1510
+ "normalized": false,
1511
+ "rstrip": false,
1512
+ "single_word": false,
1513
+ "special": true
1514
+ },
1515
+ "128185": {
1516
+ "content": "<|reserved_special_token_177|>",
1517
+ "lstrip": false,
1518
+ "normalized": false,
1519
+ "rstrip": false,
1520
+ "single_word": false,
1521
+ "special": true
1522
+ },
1523
+ "128186": {
1524
+ "content": "<|reserved_special_token_178|>",
1525
+ "lstrip": false,
1526
+ "normalized": false,
1527
+ "rstrip": false,
1528
+ "single_word": false,
1529
+ "special": true
1530
+ },
1531
+ "128187": {
1532
+ "content": "<|reserved_special_token_179|>",
1533
+ "lstrip": false,
1534
+ "normalized": false,
1535
+ "rstrip": false,
1536
+ "single_word": false,
1537
+ "special": true
1538
+ },
1539
+ "128188": {
1540
+ "content": "<|reserved_special_token_180|>",
1541
+ "lstrip": false,
1542
+ "normalized": false,
1543
+ "rstrip": false,
1544
+ "single_word": false,
1545
+ "special": true
1546
+ },
1547
+ "128189": {
1548
+ "content": "<|reserved_special_token_181|>",
1549
+ "lstrip": false,
1550
+ "normalized": false,
1551
+ "rstrip": false,
1552
+ "single_word": false,
1553
+ "special": true
1554
+ },
1555
+ "128190": {
1556
+ "content": "<|reserved_special_token_182|>",
1557
+ "lstrip": false,
1558
+ "normalized": false,
1559
+ "rstrip": false,
1560
+ "single_word": false,
1561
+ "special": true
1562
+ },
1563
+ "128191": {
1564
+ "content": "<|reserved_special_token_183|>",
1565
+ "lstrip": false,
1566
+ "normalized": false,
1567
+ "rstrip": false,
1568
+ "single_word": false,
1569
+ "special": true
1570
+ },
1571
+ "128192": {
1572
+ "content": "<|reserved_special_token_184|>",
1573
+ "lstrip": false,
1574
+ "normalized": false,
1575
+ "rstrip": false,
1576
+ "single_word": false,
1577
+ "special": true
1578
+ },
1579
+ "128193": {
1580
+ "content": "<|reserved_special_token_185|>",
1581
+ "lstrip": false,
1582
+ "normalized": false,
1583
+ "rstrip": false,
1584
+ "single_word": false,
1585
+ "special": true
1586
+ },
1587
+ "128194": {
1588
+ "content": "<|reserved_special_token_186|>",
1589
+ "lstrip": false,
1590
+ "normalized": false,
1591
+ "rstrip": false,
1592
+ "single_word": false,
1593
+ "special": true
1594
+ },
1595
+ "128195": {
1596
+ "content": "<|reserved_special_token_187|>",
1597
+ "lstrip": false,
1598
+ "normalized": false,
1599
+ "rstrip": false,
1600
+ "single_word": false,
1601
+ "special": true
1602
+ },
1603
+ "128196": {
1604
+ "content": "<|reserved_special_token_188|>",
1605
+ "lstrip": false,
1606
+ "normalized": false,
1607
+ "rstrip": false,
1608
+ "single_word": false,
1609
+ "special": true
1610
+ },
1611
+ "128197": {
1612
+ "content": "<|reserved_special_token_189|>",
1613
+ "lstrip": false,
1614
+ "normalized": false,
1615
+ "rstrip": false,
1616
+ "single_word": false,
1617
+ "special": true
1618
+ },
1619
+ "128198": {
1620
+ "content": "<|reserved_special_token_190|>",
1621
+ "lstrip": false,
1622
+ "normalized": false,
1623
+ "rstrip": false,
1624
+ "single_word": false,
1625
+ "special": true
1626
+ },
1627
+ "128199": {
1628
+ "content": "<|reserved_special_token_191|>",
1629
+ "lstrip": false,
1630
+ "normalized": false,
1631
+ "rstrip": false,
1632
+ "single_word": false,
1633
+ "special": true
1634
+ },
1635
+ "128200": {
1636
+ "content": "<|reserved_special_token_192|>",
1637
+ "lstrip": false,
1638
+ "normalized": false,
1639
+ "rstrip": false,
1640
+ "single_word": false,
1641
+ "special": true
1642
+ },
1643
+ "128201": {
1644
+ "content": "<|reserved_special_token_193|>",
1645
+ "lstrip": false,
1646
+ "normalized": false,
1647
+ "rstrip": false,
1648
+ "single_word": false,
1649
+ "special": true
1650
+ },
1651
+ "128202": {
1652
+ "content": "<|reserved_special_token_194|>",
1653
+ "lstrip": false,
1654
+ "normalized": false,
1655
+ "rstrip": false,
1656
+ "single_word": false,
1657
+ "special": true
1658
+ },
1659
+ "128203": {
1660
+ "content": "<|reserved_special_token_195|>",
1661
+ "lstrip": false,
1662
+ "normalized": false,
1663
+ "rstrip": false,
1664
+ "single_word": false,
1665
+ "special": true
1666
+ },
1667
+ "128204": {
1668
+ "content": "<|reserved_special_token_196|>",
1669
+ "lstrip": false,
1670
+ "normalized": false,
1671
+ "rstrip": false,
1672
+ "single_word": false,
1673
+ "special": true
1674
+ },
1675
+ "128205": {
1676
+ "content": "<|reserved_special_token_197|>",
1677
+ "lstrip": false,
1678
+ "normalized": false,
1679
+ "rstrip": false,
1680
+ "single_word": false,
1681
+ "special": true
1682
+ },
1683
+ "128206": {
1684
+ "content": "<|reserved_special_token_198|>",
1685
+ "lstrip": false,
1686
+ "normalized": false,
1687
+ "rstrip": false,
1688
+ "single_word": false,
1689
+ "special": true
1690
+ },
1691
+ "128207": {
1692
+ "content": "<|reserved_special_token_199|>",
1693
+ "lstrip": false,
1694
+ "normalized": false,
1695
+ "rstrip": false,
1696
+ "single_word": false,
1697
+ "special": true
1698
+ },
1699
+ "128208": {
1700
+ "content": "<|reserved_special_token_200|>",
1701
+ "lstrip": false,
1702
+ "normalized": false,
1703
+ "rstrip": false,
1704
+ "single_word": false,
1705
+ "special": true
1706
+ },
1707
+ "128209": {
1708
+ "content": "<|reserved_special_token_201|>",
1709
+ "lstrip": false,
1710
+ "normalized": false,
1711
+ "rstrip": false,
1712
+ "single_word": false,
1713
+ "special": true
1714
+ },
1715
+ "128210": {
1716
+ "content": "<|reserved_special_token_202|>",
1717
+ "lstrip": false,
1718
+ "normalized": false,
1719
+ "rstrip": false,
1720
+ "single_word": false,
1721
+ "special": true
1722
+ },
1723
+ "128211": {
1724
+ "content": "<|reserved_special_token_203|>",
1725
+ "lstrip": false,
1726
+ "normalized": false,
1727
+ "rstrip": false,
1728
+ "single_word": false,
1729
+ "special": true
1730
+ },
1731
+ "128212": {
1732
+ "content": "<|reserved_special_token_204|>",
1733
+ "lstrip": false,
1734
+ "normalized": false,
1735
+ "rstrip": false,
1736
+ "single_word": false,
1737
+ "special": true
1738
+ },
1739
+ "128213": {
1740
+ "content": "<|reserved_special_token_205|>",
1741
+ "lstrip": false,
1742
+ "normalized": false,
1743
+ "rstrip": false,
1744
+ "single_word": false,
1745
+ "special": true
1746
+ },
1747
+ "128214": {
1748
+ "content": "<|reserved_special_token_206|>",
1749
+ "lstrip": false,
1750
+ "normalized": false,
1751
+ "rstrip": false,
1752
+ "single_word": false,
1753
+ "special": true
1754
+ },
1755
+ "128215": {
1756
+ "content": "<|reserved_special_token_207|>",
1757
+ "lstrip": false,
1758
+ "normalized": false,
1759
+ "rstrip": false,
1760
+ "single_word": false,
1761
+ "special": true
1762
+ },
1763
+ "128216": {
1764
+ "content": "<|reserved_special_token_208|>",
1765
+ "lstrip": false,
1766
+ "normalized": false,
1767
+ "rstrip": false,
1768
+ "single_word": false,
1769
+ "special": true
1770
+ },
1771
+ "128217": {
1772
+ "content": "<|reserved_special_token_209|>",
1773
+ "lstrip": false,
1774
+ "normalized": false,
1775
+ "rstrip": false,
1776
+ "single_word": false,
1777
+ "special": true
1778
+ },
1779
+ "128218": {
1780
+ "content": "<|reserved_special_token_210|>",
1781
+ "lstrip": false,
1782
+ "normalized": false,
1783
+ "rstrip": false,
1784
+ "single_word": false,
1785
+ "special": true
1786
+ },
1787
+ "128219": {
1788
+ "content": "<|reserved_special_token_211|>",
1789
+ "lstrip": false,
1790
+ "normalized": false,
1791
+ "rstrip": false,
1792
+ "single_word": false,
1793
+ "special": true
1794
+ },
1795
+ "128220": {
1796
+ "content": "<|reserved_special_token_212|>",
1797
+ "lstrip": false,
1798
+ "normalized": false,
1799
+ "rstrip": false,
1800
+ "single_word": false,
1801
+ "special": true
1802
+ },
1803
+ "128221": {
1804
+ "content": "<|reserved_special_token_213|>",
1805
+ "lstrip": false,
1806
+ "normalized": false,
1807
+ "rstrip": false,
1808
+ "single_word": false,
1809
+ "special": true
1810
+ },
1811
+ "128222": {
1812
+ "content": "<|reserved_special_token_214|>",
1813
+ "lstrip": false,
1814
+ "normalized": false,
1815
+ "rstrip": false,
1816
+ "single_word": false,
1817
+ "special": true
1818
+ },
1819
+ "128223": {
1820
+ "content": "<|reserved_special_token_215|>",
1821
+ "lstrip": false,
1822
+ "normalized": false,
1823
+ "rstrip": false,
1824
+ "single_word": false,
1825
+ "special": true
1826
+ },
1827
+ "128224": {
1828
+ "content": "<|reserved_special_token_216|>",
1829
+ "lstrip": false,
1830
+ "normalized": false,
1831
+ "rstrip": false,
1832
+ "single_word": false,
1833
+ "special": true
1834
+ },
1835
+ "128225": {
1836
+ "content": "<|reserved_special_token_217|>",
1837
+ "lstrip": false,
1838
+ "normalized": false,
1839
+ "rstrip": false,
1840
+ "single_word": false,
1841
+ "special": true
1842
+ },
1843
+ "128226": {
1844
+ "content": "<|reserved_special_token_218|>",
1845
+ "lstrip": false,
1846
+ "normalized": false,
1847
+ "rstrip": false,
1848
+ "single_word": false,
1849
+ "special": true
1850
+ },
1851
+ "128227": {
1852
+ "content": "<|reserved_special_token_219|>",
1853
+ "lstrip": false,
1854
+ "normalized": false,
1855
+ "rstrip": false,
1856
+ "single_word": false,
1857
+ "special": true
1858
+ },
1859
+ "128228": {
1860
+ "content": "<|reserved_special_token_220|>",
1861
+ "lstrip": false,
1862
+ "normalized": false,
1863
+ "rstrip": false,
1864
+ "single_word": false,
1865
+ "special": true
1866
+ },
1867
+ "128229": {
1868
+ "content": "<|reserved_special_token_221|>",
1869
+ "lstrip": false,
1870
+ "normalized": false,
1871
+ "rstrip": false,
1872
+ "single_word": false,
1873
+ "special": true
1874
+ },
1875
+ "128230": {
1876
+ "content": "<|reserved_special_token_222|>",
1877
+ "lstrip": false,
1878
+ "normalized": false,
1879
+ "rstrip": false,
1880
+ "single_word": false,
1881
+ "special": true
1882
+ },
1883
+ "128231": {
1884
+ "content": "<|reserved_special_token_223|>",
1885
+ "lstrip": false,
1886
+ "normalized": false,
1887
+ "rstrip": false,
1888
+ "single_word": false,
1889
+ "special": true
1890
+ },
1891
+ "128232": {
1892
+ "content": "<|reserved_special_token_224|>",
1893
+ "lstrip": false,
1894
+ "normalized": false,
1895
+ "rstrip": false,
1896
+ "single_word": false,
1897
+ "special": true
1898
+ },
1899
+ "128233": {
1900
+ "content": "<|reserved_special_token_225|>",
1901
+ "lstrip": false,
1902
+ "normalized": false,
1903
+ "rstrip": false,
1904
+ "single_word": false,
1905
+ "special": true
1906
+ },
1907
+ "128234": {
1908
+ "content": "<|reserved_special_token_226|>",
1909
+ "lstrip": false,
1910
+ "normalized": false,
1911
+ "rstrip": false,
1912
+ "single_word": false,
1913
+ "special": true
1914
+ },
1915
+ "128235": {
1916
+ "content": "<|reserved_special_token_227|>",
1917
+ "lstrip": false,
1918
+ "normalized": false,
1919
+ "rstrip": false,
1920
+ "single_word": false,
1921
+ "special": true
1922
+ },
1923
+ "128236": {
1924
+ "content": "<|reserved_special_token_228|>",
1925
+ "lstrip": false,
1926
+ "normalized": false,
1927
+ "rstrip": false,
1928
+ "single_word": false,
1929
+ "special": true
1930
+ },
1931
+ "128237": {
1932
+ "content": "<|reserved_special_token_229|>",
1933
+ "lstrip": false,
1934
+ "normalized": false,
1935
+ "rstrip": false,
1936
+ "single_word": false,
1937
+ "special": true
1938
+ },
1939
+ "128238": {
1940
+ "content": "<|reserved_special_token_230|>",
1941
+ "lstrip": false,
1942
+ "normalized": false,
1943
+ "rstrip": false,
1944
+ "single_word": false,
1945
+ "special": true
1946
+ },
1947
+ "128239": {
1948
+ "content": "<|reserved_special_token_231|>",
1949
+ "lstrip": false,
1950
+ "normalized": false,
1951
+ "rstrip": false,
1952
+ "single_word": false,
1953
+ "special": true
1954
+ },
1955
+ "128240": {
1956
+ "content": "<|reserved_special_token_232|>",
1957
+ "lstrip": false,
1958
+ "normalized": false,
1959
+ "rstrip": false,
1960
+ "single_word": false,
1961
+ "special": true
1962
+ },
1963
+ "128241": {
1964
+ "content": "<|reserved_special_token_233|>",
1965
+ "lstrip": false,
1966
+ "normalized": false,
1967
+ "rstrip": false,
1968
+ "single_word": false,
1969
+ "special": true
1970
+ },
1971
+ "128242": {
1972
+ "content": "<|reserved_special_token_234|>",
1973
+ "lstrip": false,
1974
+ "normalized": false,
1975
+ "rstrip": false,
1976
+ "single_word": false,
1977
+ "special": true
1978
+ },
1979
+ "128243": {
1980
+ "content": "<|reserved_special_token_235|>",
1981
+ "lstrip": false,
1982
+ "normalized": false,
1983
+ "rstrip": false,
1984
+ "single_word": false,
1985
+ "special": true
1986
+ },
1987
+ "128244": {
1988
+ "content": "<|reserved_special_token_236|>",
1989
+ "lstrip": false,
1990
+ "normalized": false,
1991
+ "rstrip": false,
1992
+ "single_word": false,
1993
+ "special": true
1994
+ },
1995
+ "128245": {
1996
+ "content": "<|reserved_special_token_237|>",
1997
+ "lstrip": false,
1998
+ "normalized": false,
1999
+ "rstrip": false,
2000
+ "single_word": false,
2001
+ "special": true
2002
+ },
2003
+ "128246": {
2004
+ "content": "<|reserved_special_token_238|>",
2005
+ "lstrip": false,
2006
+ "normalized": false,
2007
+ "rstrip": false,
2008
+ "single_word": false,
2009
+ "special": true
2010
+ },
2011
+ "128247": {
2012
+ "content": "<|reserved_special_token_239|>",
2013
+ "lstrip": false,
2014
+ "normalized": false,
2015
+ "rstrip": false,
2016
+ "single_word": false,
2017
+ "special": true
2018
+ },
2019
+ "128248": {
2020
+ "content": "<|reserved_special_token_240|>",
2021
+ "lstrip": false,
2022
+ "normalized": false,
2023
+ "rstrip": false,
2024
+ "single_word": false,
2025
+ "special": true
2026
+ },
2027
+ "128249": {
2028
+ "content": "<|reserved_special_token_241|>",
2029
+ "lstrip": false,
2030
+ "normalized": false,
2031
+ "rstrip": false,
2032
+ "single_word": false,
2033
+ "special": true
2034
+ },
2035
+ "128250": {
2036
+ "content": "<|reserved_special_token_242|>",
2037
+ "lstrip": false,
2038
+ "normalized": false,
2039
+ "rstrip": false,
2040
+ "single_word": false,
2041
+ "special": true
2042
+ },
2043
+ "128251": {
2044
+ "content": "<|reserved_special_token_243|>",
2045
+ "lstrip": false,
2046
+ "normalized": false,
2047
+ "rstrip": false,
2048
+ "single_word": false,
2049
+ "special": true
2050
+ },
2051
+ "128252": {
2052
+ "content": "<|reserved_special_token_244|>",
2053
+ "lstrip": false,
2054
+ "normalized": false,
2055
+ "rstrip": false,
2056
+ "single_word": false,
2057
+ "special": true
2058
+ },
2059
+ "128253": {
2060
+ "content": "<|reserved_special_token_245|>",
2061
+ "lstrip": false,
2062
+ "normalized": false,
2063
+ "rstrip": false,
2064
+ "single_word": false,
2065
+ "special": true
2066
+ },
2067
+ "128254": {
2068
+ "content": "<|reserved_special_token_246|>",
2069
+ "lstrip": false,
2070
+ "normalized": false,
2071
+ "rstrip": false,
2072
+ "single_word": false,
2073
+ "special": true
2074
+ },
2075
+ "128255": {
2076
+ "content": "<|reserved_special_token_247|>",
2077
+ "lstrip": false,
2078
+ "normalized": false,
2079
+ "rstrip": false,
2080
+ "single_word": false,
2081
+ "special": true
2082
+ },
2083
+ "128256": {
2084
+ "content": "<+>",
2085
+ "lstrip": false,
2086
+ "normalized": false,
2087
+ "rstrip": false,
2088
+ "single_word": false,
2089
+ "special": true
2090
+ }
2091
+ },
2092
+ "additional_special_tokens": [
2093
+ "[",
2094
+ "]",
2095
+ "{",
2096
+ "}",
2097
+ "<+>"
2098
+ ],
2099
+ "bos_token": "<|begin_of_text|>",
2100
+ "clean_up_tokenization_spaces": true,
2101
+ "eos_token": "<|eot_id|>",
2102
+ "extra_special_tokens": {},
2103
+ "model_input_names": [
2104
+ "input_ids",
2105
+ "attention_mask"
2106
+ ],
2107
+ "model_max_length": 131072,
2108
+ "pad_token": "<|eot_id|>",
2109
+ "tokenizer_class": "PreTrainedTokenizerFast"
2110
+ }
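The tokenizer configuration above does three things:

- It adds `<+>` at id 128256, directly after the last reserved Llama 3.1 special token (id 128255).
- It lists `[`, `]`, `{`, `}` and `<+>` as additional special tokens.
- It reuses `<|eot_id|>` as both the EOS and the padding token.

The short Python sketch below checks those fields. It uses the standard `json` module to parse a hand-copied excerpt of them. The excerpt, the `CONFIG_EXCERPT` constant and the `summarize_special_tokens` helper are illustrative assumptions and are not part of this repository.

```python
import json

# Hand-copied excerpt of the tokenizer_config.json fields shown above.
# Assumption: entries for the other added-token ids are omitted for brevity.
CONFIG_EXCERPT = """
{
  "added_tokens_decoder": {
    "128255": {"content": "<|reserved_special_token_247|>", "special": true},
    "128256": {"content": "<+>", "special": true}
  },
  "additional_special_tokens": ["[", "]", "{", "}", "<+>"],
  "bos_token": "<|begin_of_text|>",
  "eos_token": "<|eot_id|>",
  "pad_token": "<|eot_id|>",
  "model_max_length": 131072
}
"""


def summarize_special_tokens(config: dict) -> dict:
    """Return a few facts about the special-token setup of a tokenizer config (hypothetical helper)."""
    decoder = config["added_tokens_decoder"]
    # Keys of added_tokens_decoder are token ids serialized as strings.
    content_to_id = {entry["content"]: int(tid) for tid, entry in decoder.items()}
    return {
        "max_added_id": max(int(tid) for tid in decoder),
        "plus_token_id": content_to_id.get("<+>"),
        "num_additional_special": len(config["additional_special_tokens"]),
        # Padding with the EOS token means pad and EOS share one token id.
        "pad_is_eos": config["pad_token"] == config["eos_token"],
    }


summary = summarize_special_tokens(json.loads(CONFIG_EXCERPT))
print(summary)
```

Padding reuses `<|eot_id|>`. Because of that, any data collator that masks labels by comparing against `pad_token_id` would also mask genuine end-of-turn tokens. The `transformers` `DataCollatorForLanguageModeling` with `mlm=False` is one such collator. Whether that behavior applied to this training run is not recorded in this commit.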
trainer_state.json ADDED
@@ -0,0 +1,1234 @@
1
+ {
2
+ "best_global_step": 7359,
3
+ "best_metric": 0.8462,
4
+ "best_model_checkpoint": "models/NED/EMEA_human_only_tfidf_hybrid_long_v2_addheaders/Llama-3.1-8B-Instruct/checkpoint-7359",
5
+ "epoch": 50.0,
6
+ "eval_steps": 500,
7
+ "global_step": 122650,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "entropy": 1.1526817805758311,
14
+ "epoch": 1.0,
15
+ "grad_norm": 304.0,
16
+ "learning_rate": 1.9989130434782608e-05,
17
+ "loss": 0.7669,
18
+ "mean_token_accuracy": 0.8752253057546777,
19
+ "num_tokens": 15010779.0,
20
+ "step": 2453
21
+ },
22
+ {
23
+ "epoch": 1.0,
24
+ "eval_entropy": 1.2358426589232225,
25
+ "eval_loss": 0.6339517831802368,
26
+ "eval_mean_token_accuracy": 0.8988095246828519,
27
+ "eval_num_gold": 26,
28
+ "eval_num_guess": 26,
29
+ "eval_num_tokens": 15010779.0,
30
+ "eval_recall": 0.7308,
31
+ "eval_runtime": 3.6399,
32
+ "eval_samples_per_second": 7.143,
33
+ "eval_steps_per_second": 3.571,
34
+ "step": 2453
35
+ },
36
+ {
37
+ "entropy": 1.3605892632720036,
38
+ "epoch": 2.0,
39
+ "grad_norm": 12.1875,
40
+ "learning_rate": 2.9691098596284776e-05,
41
+ "loss": 0.5437,
42
+ "mean_token_accuracy": 0.9150349811612466,
43
+ "num_tokens": 30021558.0,
44
+ "step": 4906
45
+ },
46
+ {
47
+ "epoch": 2.0,
48
+ "eval_entropy": 1.1509519540346587,
49
+ "eval_loss": 0.4853871166706085,
50
+ "eval_mean_token_accuracy": 0.9201437464127173,
51
+ "eval_num_gold": 26,
52
+ "eval_num_guess": 26,
53
+ "eval_num_tokens": 30021558.0,
54
+ "eval_recall": 0.7692,
55
+ "eval_runtime": 3.627,
56
+ "eval_samples_per_second": 7.168,
57
+ "eval_steps_per_second": 3.584,
58
+ "step": 4906
59
+ },
60
+ {
61
+ "entropy": 1.1862413553719222,
62
+ "epoch": 3.0,
63
+ "grad_norm": 2.1875,
64
+ "learning_rate": 2.9072539295620746e-05,
65
+ "loss": 0.2619,
66
+ "mean_token_accuracy": 0.9548876376794495,
67
+ "num_tokens": 45032337.0,
68
+ "step": 7359
69
+ },
70
+ {
71
+ "epoch": 3.0,
72
+ "eval_entropy": 1.019592651954064,
73
+ "eval_loss": 0.5770813822746277,
74
+ "eval_mean_token_accuracy": 0.9220362993387076,
75
+ "eval_num_gold": 26,
76
+ "eval_num_guess": 26,
77
+ "eval_num_tokens": 45032337.0,
78
+ "eval_recall": 0.8462,
79
+ "eval_runtime": 3.6363,
80
+ "eval_samples_per_second": 7.15,
81
+ "eval_steps_per_second": 3.575,
82
+ "step": 7359
83
+ },
84
+ {
85
+ "entropy": 0.9634018300311497,
86
+ "epoch": 4.0,
87
+ "grad_norm": 0.1240234375,
88
+ "learning_rate": 2.8453979994956713e-05,
89
+ "loss": 0.1216,
90
+ "mean_token_accuracy": 0.9782008502466845,
91
+ "num_tokens": 60043116.0,
92
+ "step": 9812
93
+ },
94
+ {
95
+ "epoch": 4.0,
96
+ "eval_entropy": 0.8699520321992728,
97
+ "eval_loss": 0.5446107387542725,
98
+ "eval_mean_token_accuracy": 0.940018314581651,
99
+ "eval_num_gold": 26,
100
+ "eval_num_guess": 26,
101
+ "eval_num_tokens": 60043116.0,
102
+ "eval_recall": 0.8462,
103
+ "eval_runtime": 3.6143,
104
+ "eval_samples_per_second": 7.194,
105
+ "eval_steps_per_second": 3.597,
106
+ "step": 9812
107
+ },
108
+ {
109
+ "entropy": 0.7849812429144681,
110
+ "epoch": 5.0,
111
+ "grad_norm": 0.002227783203125,
112
+ "learning_rate": 2.783542069429268e-05,
113
+ "loss": 0.0517,
114
+ "mean_token_accuracy": 0.9894482943411997,
115
+ "num_tokens": 75053895.0,
116
+ "step": 12265
117
+ },
118
+ {
119
+ "epoch": 5.0,
120
+ "eval_entropy": 0.6801113898937519,
121
+ "eval_loss": 0.7289856672286987,
122
+ "eval_mean_token_accuracy": 0.9444444454633273,
123
+ "eval_num_gold": 26,
124
+ "eval_num_guess": 26,
125
+ "eval_num_tokens": 75053895.0,
126
+ "eval_recall": 0.8462,
127
+ "eval_runtime": 3.6486,
128
+ "eval_samples_per_second": 7.126,
129
+ "eval_steps_per_second": 3.563,
130
+ "step": 12265
131
+ },
132
+ {
133
+ "entropy": 0.6892432886826181,
134
+ "epoch": 6.0,
135
+ "grad_norm": 0.0004749298095703125,
136
+ "learning_rate": 2.721686139362865e-05,
137
+ "loss": 0.0209,
138
+ "mean_token_accuracy": 0.9958273216359138,
139
+ "num_tokens": 90064674.0,
140
+ "step": 14718
141
+ },
142
+ {
143
+ "epoch": 6.0,
144
+ "eval_entropy": 0.577189931502709,
145
+ "eval_loss": 0.7246649265289307,
146
+ "eval_mean_token_accuracy": 0.9444444454633273,
147
+ "eval_num_gold": 26,
148
+ "eval_num_guess": 26,
149
+ "eval_num_tokens": 90064674.0,
150
+ "eval_recall": 0.8462,
151
+ "eval_runtime": 3.6456,
152
+ "eval_samples_per_second": 7.132,
153
+ "eval_steps_per_second": 3.566,
154
+ "step": 14718
155
+ },
156
+ {
157
+ "entropy": 0.6557439389371696,
158
+ "epoch": 7.0,
159
+ "grad_norm": 0.000888824462890625,
160
+ "learning_rate": 2.659830209296461e-05,
161
+ "loss": 0.0078,
162
+ "mean_token_accuracy": 0.9979321826393635,
163
+ "num_tokens": 105075453.0,
164
+ "step": 17171
165
+ },
166
+ {
167
+ "epoch": 7.0,
168
+ "eval_entropy": 0.5603500146132249,
169
+ "eval_loss": 0.8045116662979126,
170
+ "eval_mean_token_accuracy": 0.9358974374257601,
171
+ "eval_num_gold": 26,
172
+ "eval_num_guess": 26,
173
+ "eval_num_tokens": 105075453.0,
174
+ "eval_recall": 0.8462,
175
+ "eval_runtime": 5.557,
176
+ "eval_samples_per_second": 4.679,
177
+ "eval_steps_per_second": 2.339,
178
+ "step": 17171
179
+ },
180
+ {
181
+ "entropy": 0.6481096161976669,
182
+ "epoch": 8.0,
183
+ "grad_norm": 8.96453857421875e-05,
184
+ "learning_rate": 2.597974279230058e-05,
185
+ "loss": 0.0028,
186
+ "mean_token_accuracy": 0.9993061645391568,
187
+ "num_tokens": 120086232.0,
188
+ "step": 19624
189
+ },
190
+ {
191
+ "epoch": 8.0,
192
+ "eval_entropy": 0.5650725089586698,
193
+ "eval_loss": 0.8335245847702026,
194
+ "eval_mean_token_accuracy": 0.9358974374257601,
195
+ "eval_num_gold": 26,
196
+ "eval_num_guess": 26,
197
+ "eval_num_tokens": 120086232.0,
198
+ "eval_recall": 0.8462,
199
+ "eval_runtime": 3.6391,
200
+ "eval_samples_per_second": 7.145,
201
+ "eval_steps_per_second": 3.572,
202
+ "step": 19624
203
+ },
204
+ {
205
+ "entropy": 0.6384989822756452,
206
+ "epoch": 9.0,
207
+ "grad_norm": 0.00102996826171875,
208
+ "learning_rate": 2.5361183491636548e-05,
209
+ "loss": 0.0011,
210
+ "mean_token_accuracy": 0.9997574686275129,
211
+ "num_tokens": 135097011.0,
212
+ "step": 22077
213
+ },
214
+ {
215
+ "epoch": 9.0,
216
+ "eval_entropy": 0.5437194108963013,
217
+ "eval_loss": 0.8720409870147705,
218
+ "eval_mean_token_accuracy": 0.9358974374257601,
219
+ "eval_num_gold": 26,
220
+ "eval_num_guess": 26,
221
+ "eval_num_tokens": 135097011.0,
222
+ "eval_recall": 0.8462,
223
+ "eval_runtime": 3.6696,
224
+ "eval_samples_per_second": 7.085,
225
+ "eval_steps_per_second": 3.543,
226
+ "step": 22077
227
+ },
228
+ {
229
+ "entropy": 0.6327040182586792,
230
+ "epoch": 10.0,
231
+ "grad_norm": 0.00011968612670898438,
232
+ "learning_rate": 2.4742624190972517e-05,
233
+ "loss": 0.0002,
234
+ "mean_token_accuracy": 0.9999592335818012,
235
+ "num_tokens": 150107790.0,
236
+ "step": 24530
237
+ },
238
+ {
239
+ "epoch": 10.0,
240
+ "eval_entropy": 0.5456434029799241,
241
+ "eval_loss": 0.8786986470222473,
242
+ "eval_mean_token_accuracy": 0.9358974374257601,
243
+ "eval_num_gold": 26,
244
+ "eval_num_guess": 26,
245
+ "eval_num_tokens": 150107790.0,
246
+ "eval_recall": 0.8462,
247
+ "eval_runtime": 3.7213,
248
+ "eval_samples_per_second": 6.987,
249
+ "eval_steps_per_second": 3.493,
250
+ "step": 24530
251
+ },
252
+ {
253
+ "entropy": 0.6342776355527636,
254
+ "epoch": 11.0,
255
+ "grad_norm": 2.9206275939941406e-05,
256
+ "learning_rate": 2.412406489030848e-05,
257
+ "loss": 0.0001,
258
+ "mean_token_accuracy": 0.9999629396397,
259
+ "num_tokens": 165118569.0,
260
+ "step": 26983
261
+ },
262
+ {
263
+ "epoch": 11.0,
264
+ "eval_entropy": 0.5441241906239436,
265
+ "eval_loss": 0.8776129484176636,
266
+ "eval_mean_token_accuracy": 0.9358974374257601,
267
+ "eval_num_gold": 26,
268
+ "eval_num_guess": 26,
269
+ "eval_num_tokens": 165118569.0,
270
+ "eval_recall": 0.8462,
271
+ "eval_runtime": 3.6285,
272
+ "eval_samples_per_second": 7.165,
273
+ "eval_steps_per_second": 3.583,
274
+ "step": 26983
275
+ },
276
+ {
277
+ "entropy": 0.6330991076222742,
278
+ "epoch": 12.0,
279
+ "grad_norm": 0.000823974609375,
280
+ "learning_rate": 2.350550558964445e-05,
281
+ "loss": 0.0,
282
+ "mean_token_accuracy": 1.0,
283
+ "num_tokens": 180129348.0,
284
+ "step": 29436
285
+ },
286
+ {
287
+ "epoch": 12.0,
288
+ "eval_entropy": 0.544509245799138,
289
+ "eval_loss": 0.88084477186203,
290
+ "eval_mean_token_accuracy": 0.9358974374257601,
291
+ "eval_num_gold": 26,
292
+ "eval_num_guess": 26,
293
+ "eval_num_tokens": 180129348.0,
294
+ "eval_recall": 0.8462,
295
+ "eval_runtime": 3.6661,
296
+ "eval_samples_per_second": 7.092,
297
+ "eval_steps_per_second": 3.546,
298
+ "step": 29436
299
+ },
300
+ {
301
+ "entropy": 0.6322705759061291,
302
+ "epoch": 13.0,
303
+ "grad_norm": 0.010498046875,
304
+ "learning_rate": 2.2886946288980416e-05,
305
+ "loss": 0.0,
306
+ "mean_token_accuracy": 1.0,
307
+ "num_tokens": 195140127.0,
308
+ "step": 31889
309
+ },
310
+ {
311
+ "epoch": 13.0,
312
+ "eval_entropy": 0.5434356606923617,
313
+ "eval_loss": 0.8842343091964722,
314
+ "eval_mean_token_accuracy": 0.9358974374257601,
315
+ "eval_num_gold": 26,
316
+ "eval_num_guess": 26,
317
+ "eval_num_tokens": 195140127.0,
318
+ "eval_recall": 0.8462,
319
+ "eval_runtime": 4.1268,
320
+ "eval_samples_per_second": 6.3,
321
+ "eval_steps_per_second": 3.15,
322
+ "step": 31889
323
+ },
324
+ {
325
+ "entropy": 0.6316640121908612,
326
+ "epoch": 14.0,
327
+ "grad_norm": 0.0035552978515625,
328
+ "learning_rate": 2.2268386988316383e-05,
329
+ "loss": 0.0,
330
+ "mean_token_accuracy": 1.0,
331
+ "num_tokens": 210150906.0,
332
+ "step": 34342
333
+ },
334
+ {
335
+ "epoch": 14.0,
336
+ "eval_entropy": 0.543243577847114,
337
+ "eval_loss": 0.885927140712738,
338
+ "eval_mean_token_accuracy": 0.9358974374257601,
339
+ "eval_num_gold": 26,
340
+ "eval_num_guess": 26,
341
+ "eval_num_tokens": 210150906.0,
342
+ "eval_recall": 0.8462,
343
+ "eval_runtime": 3.7188,
344
+ "eval_samples_per_second": 6.991,
345
+ "eval_steps_per_second": 3.496,
346
+ "step": 34342
347
+ },
348
+ {
349
+ "entropy": 0.6321596540070241,
350
+ "epoch": 15.0,
351
+ "grad_norm": 2.4199485778808594e-05,
352
+ "learning_rate": 2.164982768765235e-05,
353
+ "loss": 0.0,
354
+ "mean_token_accuracy": 1.0,
355
+ "num_tokens": 225161685.0,
356
+ "step": 36795
357
+ },
358
+ {
359
+ "epoch": 15.0,
360
+ "eval_entropy": 0.5422769280580374,
361
+ "eval_loss": 0.8823052644729614,
362
+ "eval_mean_token_accuracy": 0.9358974374257601,
363
+ "eval_num_gold": 26,
364
+ "eval_num_guess": 26,
365
+ "eval_num_tokens": 225161685.0,
366
+ "eval_recall": 0.8462,
367
+ "eval_runtime": 3.6723,
368
+ "eval_samples_per_second": 7.08,
369
+ "eval_steps_per_second": 3.54,
370
+ "step": 36795
371
+ },
372
+ {
373
+ "entropy": 0.6315903761194426,
374
+ "epoch": 16.0,
375
+ "grad_norm": 0.0291748046875,
376
+ "learning_rate": 2.1031268386988316e-05,
377
+ "loss": 0.0,
378
+ "mean_token_accuracy": 1.0,
379
+ "num_tokens": 240172464.0,
380
+ "step": 39248
381
+ },
382
+ {
383
+ "epoch": 16.0,
384
+ "eval_entropy": 0.5426660546889672,
385
+ "eval_loss": 0.8869765996932983,
386
+ "eval_mean_token_accuracy": 0.9358974374257601,
387
+ "eval_num_gold": 26,
388
+ "eval_num_guess": 26,
389
+ "eval_num_tokens": 240172464.0,
390
+ "eval_recall": 0.8462,
391
+ "eval_runtime": 3.6896,
392
+ "eval_samples_per_second": 7.047,
393
+ "eval_steps_per_second": 3.523,
394
+ "step": 39248
395
+ },
396
+ {
397
+ "entropy": 0.6317922561279472,
398
+ "epoch": 17.0,
399
+ "grad_norm": 0.0001850128173828125,
400
+ "learning_rate": 2.0412709086324285e-05,
401
+ "loss": 0.0,
402
+ "mean_token_accuracy": 1.0,
403
+ "num_tokens": 255183243.0,
404
+ "step": 41701
405
+ },
406
+ {
407
+ "epoch": 17.0,
408
+ "eval_entropy": 0.542809899036701,
409
+ "eval_loss": 0.8864607214927673,
410
+ "eval_mean_token_accuracy": 0.9358974374257601,
411
+ "eval_num_gold": 26,
412
+ "eval_num_guess": 26,
413
+ "eval_num_tokens": 255183243.0,
414
+ "eval_recall": 0.8462,
415
+ "eval_runtime": 3.6498,
416
+ "eval_samples_per_second": 7.124,
417
+ "eval_steps_per_second": 3.562,
418
+ "step": 41701
419
+ },
420
+ {
421
+ "entropy": 0.6319634849034763,
422
+ "epoch": 18.0,
423
+ "grad_norm": 2.1457672119140625e-05,
424
+ "learning_rate": 1.979414978566025e-05,
425
+ "loss": 0.0,
426
+ "mean_token_accuracy": 1.0,
427
+ "num_tokens": 270194022.0,
428
+ "step": 44154
429
+ },
430
+ {
431
+ "epoch": 18.0,
432
+ "eval_entropy": 0.5426488243616544,
433
+ "eval_loss": 0.8861849308013916,
434
+ "eval_mean_token_accuracy": 0.9358974374257601,
435
+ "eval_num_gold": 26,
436
+ "eval_num_guess": 26,
437
+ "eval_num_tokens": 270194022.0,
438
+ "eval_recall": 0.8462,
439
+ "eval_runtime": 3.6568,
440
+ "eval_samples_per_second": 7.11,
441
+ "eval_steps_per_second": 3.555,
442
+ "step": 44154
443
+ },
444
+ {
445
+ "entropy": 0.631338802688325,
446
+ "epoch": 19.0,
447
+ "grad_norm": 4.076957702636719e-05,
448
+ "learning_rate": 1.9175590484996218e-05,
449
+ "loss": 0.0,
450
+ "mean_token_accuracy": 1.0,
451
+ "num_tokens": 285204801.0,
452
+ "step": 46607
453
+ },
454
+ {
455
+ "epoch": 19.0,
456
+ "eval_entropy": 0.5423762339812058,
457
+ "eval_loss": 0.885791540145874,
458
+ "eval_mean_token_accuracy": 0.9358974374257601,
459
+ "eval_num_gold": 26,
460
+ "eval_num_guess": 26,
461
+ "eval_num_tokens": 285204801.0,
462
+ "eval_recall": 0.8462,
463
+ "eval_runtime": 3.653,
464
+ "eval_samples_per_second": 7.118,
465
+ "eval_steps_per_second": 3.559,
466
+ "step": 46607
467
+ },
468
+ {
469
+ "entropy": 0.6311312203036976,
470
+ "epoch": 20.0,
471
+ "grad_norm": 0.0004634857177734375,
472
+ "learning_rate": 1.8557031184332184e-05,
473
+ "loss": 0.0,
474
+ "mean_token_accuracy": 1.0,
475
+ "num_tokens": 300215580.0,
476
+ "step": 49060
477
+ },
478
+ {
479
+ "epoch": 20.0,
480
+ "eval_entropy": 0.5424229686076825,
481
+ "eval_loss": 0.8889456987380981,
482
+ "eval_mean_token_accuracy": 0.9358974374257601,
483
+ "eval_num_gold": 26,
484
+ "eval_num_guess": 26,
485
+ "eval_num_tokens": 300215580.0,
486
+ "eval_recall": 0.8462,
487
+ "eval_runtime": 3.651,
488
+ "eval_samples_per_second": 7.121,
489
+ "eval_steps_per_second": 3.561,
490
+ "step": 49060
491
+ },
492
+ {
493
+ "entropy": 0.631198678741249,
494
+ "epoch": 21.0,
495
+ "grad_norm": 0.00031280517578125,
496
+ "learning_rate": 1.793847188366815e-05,
497
+ "loss": 0.0,
498
+ "mean_token_accuracy": 1.0,
499
+ "num_tokens": 315226359.0,
500
+ "step": 51513
501
+ },
502
+ {
503
+ "epoch": 21.0,
504
+ "eval_entropy": 0.5428222968028142,
505
+ "eval_loss": 0.8843169808387756,
506
+ "eval_mean_token_accuracy": 0.9358974374257601,
507
+ "eval_num_gold": 26,
508
+ "eval_num_guess": 26,
509
+ "eval_num_tokens": 315226359.0,
510
+ "eval_recall": 0.8462,
511
+ "eval_runtime": 3.6619,
512
+ "eval_samples_per_second": 7.1,
513
+ "eval_steps_per_second": 3.55,
514
+ "step": 51513
515
+ },
516
+ {
517
+ "entropy": 0.6313406728478388,
518
+ "epoch": 22.0,
519
+ "grad_norm": 0.000759124755859375,
520
+ "learning_rate": 1.731991258300412e-05,
521
+ "loss": 0.0,
522
+ "mean_token_accuracy": 1.0,
523
+ "num_tokens": 330237138.0,
524
+ "step": 53966
525
+ },
526
+ {
527
+ "epoch": 22.0,
528
+ "eval_entropy": 0.5427144765853882,
529
+ "eval_loss": 0.8861469030380249,
530
+ "eval_mean_token_accuracy": 0.9358974374257601,
531
+ "eval_num_gold": 26,
532
+ "eval_num_guess": 26,
533
+ "eval_num_tokens": 330237138.0,
534
+ "eval_recall": 0.8462,
535
+ "eval_runtime": 3.6544,
536
+ "eval_samples_per_second": 7.115,
537
+ "eval_steps_per_second": 3.557,
538
+ "step": 53966
539
+ },
540
+ {
541
+ "entropy": 0.6313331465647263,
542
+ "epoch": 23.0,
543
+ "grad_norm": 0.00051116943359375,
544
+ "learning_rate": 1.6701353282340083e-05,
545
+ "loss": 0.0,
546
+ "mean_token_accuracy": 1.0,
547
+ "num_tokens": 345247917.0,
548
+ "step": 56419
549
+ },
550
+ {
551
+ "epoch": 23.0,
552
+ "eval_entropy": 0.5423137545585632,
553
+ "eval_loss": 0.8892049193382263,
554
+ "eval_mean_token_accuracy": 0.9358974374257601,
555
+ "eval_num_gold": 26,
556
+ "eval_num_guess": 26,
557
+ "eval_num_tokens": 345247917.0,
558
+ "eval_recall": 0.8462,
559
+ "eval_runtime": 3.6537,
560
+ "eval_samples_per_second": 7.116,
561
+ "eval_steps_per_second": 3.558,
562
+ "step": 56419
563
+ },
564
+ {
565
+ "entropy": 0.6310314053401527,
566
+ "epoch": 24.0,
567
+ "grad_norm": 3.600120544433594e-05,
568
+ "learning_rate": 1.6082793981676053e-05,
569
+ "loss": 0.0,
570
+ "mean_token_accuracy": 1.0,
571
+ "num_tokens": 360258696.0,
572
+ "step": 58872
573
+ },
574
+ {
575
+ "epoch": 24.0,
576
+ "eval_entropy": 0.5423843631377587,
577
+ "eval_loss": 0.8886714577674866,
578
+ "eval_mean_token_accuracy": 0.9358974374257601,
579
+ "eval_num_gold": 26,
580
+ "eval_num_guess": 26,
581
+ "eval_num_tokens": 360258696.0,
582
+ "eval_recall": 0.8462,
583
+ "eval_runtime": 3.6316,
584
+ "eval_samples_per_second": 7.159,
585
+ "eval_steps_per_second": 3.58,
586
+ "step": 58872
587
+ },
588
+ {
589
+ "entropy": 0.6315073234496484,
590
+ "epoch": 25.0,
591
+ "grad_norm": 7.82012939453125e-05,
592
+ "learning_rate": 1.546423468101202e-05,
593
+ "loss": 0.0,
594
+ "mean_token_accuracy": 1.0,
595
+ "num_tokens": 375269475.0,
596
+ "step": 61325
597
+ },
598
+ {
599
+ "epoch": 25.0,
600
+ "eval_entropy": 0.5420686419193561,
601
+ "eval_loss": 0.8865240812301636,
602
+ "eval_mean_token_accuracy": 0.9358974374257601,
603
+ "eval_num_gold": 26,
604
+ "eval_num_guess": 26,
605
+ "eval_num_tokens": 375269475.0,
606
+ "eval_recall": 0.8462,
607
+ "eval_runtime": 3.613,
608
+ "eval_samples_per_second": 7.196,
609
+ "eval_steps_per_second": 3.598,
610
+ "step": 61325
611
+ },
612
+ {
613
+ "entropy": 0.632054461467718,
614
+ "epoch": 26.0,
615
+ "grad_norm": 0.00024318695068359375,
616
+ "learning_rate": 1.4845675380347987e-05,
617
+ "loss": 0.0,
618
+ "mean_token_accuracy": 1.0,
619
+ "num_tokens": 15010779.0,
620
+ "step": 63778
621
+ },
622
+ {
623
+ "epoch": 26.0,
624
+ "eval_entropy": 0.5426568893285898,
625
+ "eval_loss": 0.88667893409729,
626
+ "eval_mean_token_accuracy": 0.9358974374257601,
627
+ "eval_num_gold": 26,
628
+ "eval_num_guess": 26,
629
+ "eval_num_tokens": 15010779.0,
630
+ "eval_recall": 0.8462,
631
+ "eval_runtime": 3.647,
632
+ "eval_samples_per_second": 7.129,
633
+ "eval_steps_per_second": 3.565,
634
+ "step": 63778
635
+ },
636
+ {
637
+ "entropy": 0.6314872418356777,
638
+ "epoch": 27.0,
639
+ "grad_norm": 0.00011396408081054688,
640
+ "learning_rate": 1.4227116079683954e-05,
641
+ "loss": 0.0,
642
+ "mean_token_accuracy": 1.0,
643
+ "num_tokens": 30021558.0,
644
+ "step": 66231
645
+ },
646
+ {
647
+ "epoch": 27.0,
648
+ "eval_entropy": 0.5423887417866633,
649
+ "eval_loss": 0.8907365798950195,
650
+ "eval_mean_token_accuracy": 0.9358974374257601,
651
+ "eval_num_gold": 26,
652
+ "eval_num_guess": 26,
653
+ "eval_num_tokens": 30021558.0,
654
+ "eval_recall": 0.8462,
655
+ "eval_runtime": 3.6242,
656
+ "eval_samples_per_second": 7.174,
657
+ "eval_steps_per_second": 3.587,
658
+ "step": 66231
659
+ },
660
+ {
661
+ "entropy": 0.6317801613055392,
662
+ "epoch": 28.0,
663
+ "grad_norm": 8.392333984375e-05,
664
+ "learning_rate": 1.3608556779019922e-05,
665
+ "loss": 0.0,
666
+ "mean_token_accuracy": 1.0,
667
+ "num_tokens": 45032337.0,
668
+ "step": 68684
669
+ },
670
+ {
671
+ "epoch": 28.0,
672
+ "eval_entropy": 0.5428364735383254,
673
+ "eval_loss": 0.885719358921051,
674
+ "eval_mean_token_accuracy": 0.9358974374257601,
675
+ "eval_num_gold": 26,
676
+ "eval_num_guess": 26,
677
+ "eval_num_tokens": 45032337.0,
678
+ "eval_recall": 0.8462,
679
+ "eval_runtime": 3.6828,
680
+ "eval_samples_per_second": 7.06,
681
+ "eval_steps_per_second": 3.53,
682
+ "step": 68684
683
+ },
684
+ {
685
+ "entropy": 0.6310389586555389,
686
+ "epoch": 29.0,
687
+ "grad_norm": 0.000774383544921875,
688
+ "learning_rate": 1.2989997478355888e-05,
689
+ "loss": 0.0,
690
+ "mean_token_accuracy": 1.0,
691
+ "num_tokens": 60043116.0,
692
+ "step": 71137
693
+ },
694
+ {
695
+ "epoch": 29.0,
696
+ "eval_entropy": 0.5424722524789664,
697
+ "eval_loss": 0.8864960074424744,
698
+ "eval_mean_token_accuracy": 0.9358974374257601,
699
+ "eval_num_gold": 26,
700
+ "eval_num_guess": 26,
701
+ "eval_num_tokens": 60043116.0,
702
+ "eval_recall": 0.8462,
703
+ "eval_runtime": 3.6359,
704
+ "eval_samples_per_second": 7.151,
705
+ "eval_steps_per_second": 3.576,
706
+ "step": 71137
707
+ },
708
+ {
709
+ "entropy": 0.6310345640461444,
710
+ "epoch": 30.0,
711
+ "grad_norm": 3.5762786865234375e-05,
712
+ "learning_rate": 1.2371438177691856e-05,
713
+ "loss": 0.0,
714
+ "mean_token_accuracy": 1.0,
715
+ "num_tokens": 75053895.0,
716
+ "step": 73590
717
+ },
718
+ {
719
+ "epoch": 30.0,
720
+ "eval_entropy": 0.5427528161268967,
721
+ "eval_loss": 0.8871183395385742,
722
+ "eval_mean_token_accuracy": 0.9358974374257601,
723
+ "eval_num_gold": 26,
724
+ "eval_num_guess": 26,
725
+ "eval_num_tokens": 75053895.0,
726
+ "eval_recall": 0.8462,
727
+ "eval_runtime": 3.6648,
728
+ "eval_samples_per_second": 7.095,
729
+ "eval_steps_per_second": 3.547,
730
+ "step": 73590
731
+ },
732
+ {
733
+ "entropy": 0.6307261824680745,
734
+ "epoch": 31.0,
735
+ "grad_norm": 0.00015163421630859375,
736
+ "learning_rate": 1.1752878877027823e-05,
737
+ "loss": 0.0,
738
+ "mean_token_accuracy": 1.0,
739
+ "num_tokens": 90064674.0,
740
+ "step": 76043
741
+ },
742
+ {
743
+ "epoch": 31.0,
744
+ "eval_entropy": 0.5423439878683823,
745
+ "eval_loss": 0.890313982963562,
746
+ "eval_mean_token_accuracy": 0.9358974374257601,
747
+ "eval_num_gold": 26,
748
+ "eval_num_guess": 26,
749
+ "eval_num_tokens": 90064674.0,
750
+ "eval_recall": 0.8462,
751
+ "eval_runtime": 3.6589,
752
+ "eval_samples_per_second": 7.106,
753
+ "eval_steps_per_second": 3.553,
754
+ "step": 76043
755
+ },
756
+ {
757
+ "entropy": 0.6317850742056279,
758
+ "epoch": 32.0,
759
+ "grad_norm": 0.0005035400390625,
760
+ "learning_rate": 1.113431957636379e-05,
761
+ "loss": 0.0,
762
+ "mean_token_accuracy": 1.0,
763
+ "num_tokens": 105075453.0,
764
+ "step": 78496
765
+ },
766
+ {
767
+ "epoch": 32.0,
768
+ "eval_entropy": 0.5422184283916767,
769
+ "eval_loss": 0.8882402181625366,
770
+ "eval_mean_token_accuracy": 0.9358974374257601,
771
+ "eval_num_gold": 26,
772
+ "eval_num_guess": 26,
773
+ "eval_num_tokens": 105075453.0,
774
+ "eval_recall": 0.8462,
775
+ "eval_runtime": 3.6075,
776
+ "eval_samples_per_second": 7.207,
777
+ "eval_steps_per_second": 3.604,
778
+ "step": 78496
779
+ },
780
+ {
781
+ "entropy": 0.6315069926961121,
782
+ "epoch": 33.0,
783
+ "grad_norm": 0.0079345703125,
784
+ "learning_rate": 1.0515760275699757e-05,
785
+ "loss": 0.0,
786
+ "mean_token_accuracy": 1.0,
787
+ "num_tokens": 120086232.0,
788
+ "step": 80949
789
+ },
790
+ {
791
+ "epoch": 33.0,
792
+ "eval_entropy": 0.5428683024186355,
793
+ "eval_loss": 0.8859032988548279,
794
+ "eval_mean_token_accuracy": 0.9358974374257601,
795
+ "eval_num_gold": 26,
796
+ "eval_num_guess": 26,
797
+ "eval_num_tokens": 120086232.0,
798
+ "eval_recall": 0.8462,
799
+ "eval_runtime": 3.6537,
800
+ "eval_samples_per_second": 7.116,
801
+ "eval_steps_per_second": 3.558,
802
+ "step": 80949
803
+ },
804
+ {
805
+ "entropy": 0.6313212784246381,
806
+ "epoch": 34.0,
807
+ "grad_norm": 0.000885009765625,
808
+ "learning_rate": 9.897200975035723e-06,
809
+ "loss": 0.0,
810
+ "mean_token_accuracy": 1.0,
811
+ "num_tokens": 135097011.0,
812
+ "step": 83402
813
+ },
814
+ {
815
+ "epoch": 34.0,
816
+ "eval_entropy": 0.5425068598527175,
817
+ "eval_loss": 0.887780487537384,
818
+ "eval_mean_token_accuracy": 0.9358974374257601,
819
+ "eval_num_gold": 26,
820
+ "eval_num_guess": 26,
821
+ "eval_num_tokens": 135097011.0,
822
+ "eval_recall": 0.8462,
823
+ "eval_runtime": 3.6448,
824
+ "eval_samples_per_second": 7.133,
825
+ "eval_steps_per_second": 3.567,
826
+ "step": 83402
827
+ },
+ {
+ "entropy": 0.6308202771352254,
+ "epoch": 35.0,
+ "grad_norm": 0.00032806396484375,
+ "learning_rate": 9.27864167437169e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 150107790.0,
+ "step": 85855
+ },
+ {
+ "epoch": 35.0,
+ "eval_entropy": 0.54246619114509,
+ "eval_loss": 0.8900800347328186,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 150107790.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6253,
+ "eval_samples_per_second": 7.172,
+ "eval_steps_per_second": 3.586,
+ "step": 85855
+ },
+ {
+ "entropy": 0.6310893858737767,
+ "epoch": 36.0,
+ "grad_norm": 0.00543212890625,
+ "learning_rate": 8.660082373707658e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 165118569.0,
+ "step": 88308
+ },
+ {
+ "epoch": 36.0,
+ "eval_entropy": 0.542354785479032,
+ "eval_loss": 0.882867157459259,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 165118569.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6309,
+ "eval_samples_per_second": 7.161,
+ "eval_steps_per_second": 3.58,
+ "step": 88308
+ },
+ {
+ "entropy": 0.6313383878492308,
+ "epoch": 37.0,
+ "grad_norm": 0.0014495849609375,
+ "learning_rate": 8.041523073043624e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 180129348.0,
+ "step": 90761
+ },
+ {
+ "epoch": 37.0,
+ "eval_entropy": 0.5429406670423654,
+ "eval_loss": 0.8894430994987488,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 180129348.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6047,
+ "eval_samples_per_second": 7.213,
+ "eval_steps_per_second": 3.606,
+ "step": 90761
+ },
+ {
+ "entropy": 0.6315074832012738,
+ "epoch": 38.0,
+ "grad_norm": 1.8477439880371094e-05,
+ "learning_rate": 7.422963772379592e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 195140127.0,
+ "step": 93214
+ },
+ {
+ "epoch": 38.0,
+ "eval_entropy": 0.5428708929281968,
+ "eval_loss": 0.8853751420974731,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 195140127.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6095,
+ "eval_samples_per_second": 7.203,
+ "eval_steps_per_second": 3.602,
+ "step": 93214
+ },
+ {
+ "entropy": 0.6316086658156264,
+ "epoch": 39.0,
+ "grad_norm": 0.0019378662109375,
+ "learning_rate": 6.804404471715559e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 210150906.0,
+ "step": 95667
+ },
+ {
+ "epoch": 39.0,
+ "eval_entropy": 0.5423155472828791,
+ "eval_loss": 0.8865050673484802,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 210150906.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6105,
+ "eval_samples_per_second": 7.201,
+ "eval_steps_per_second": 3.601,
+ "step": 95667
+ },
+ {
+ "entropy": 0.6319762418161253,
+ "epoch": 40.0,
+ "grad_norm": 0.0076904296875,
+ "learning_rate": 6.185845171051526e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 225161685.0,
+ "step": 98120
+ },
+ {
+ "epoch": 40.0,
+ "eval_entropy": 0.5423448315033546,
+ "eval_loss": 0.887237012386322,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 225161685.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6062,
+ "eval_samples_per_second": 7.21,
+ "eval_steps_per_second": 3.605,
+ "step": 98120
+ },
+ {
+ "entropy": 0.6316094772090632,
+ "epoch": 41.0,
+ "grad_norm": 0.00040435791015625,
+ "learning_rate": 5.567285870387493e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 240172464.0,
+ "step": 100573
+ },
+ {
+ "epoch": 41.0,
+ "eval_entropy": 0.5424330555475675,
+ "eval_loss": 0.8862788081169128,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 240172464.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6042,
+ "eval_samples_per_second": 7.214,
+ "eval_steps_per_second": 3.607,
+ "step": 100573
+ },
+ {
+ "entropy": 0.6310035889118581,
+ "epoch": 42.0,
+ "grad_norm": 0.0020294189453125,
+ "learning_rate": 4.94872656972346e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 255183243.0,
+ "step": 103026
+ },
+ {
+ "epoch": 42.0,
+ "eval_entropy": 0.5431472292313209,
+ "eval_loss": 0.890018105506897,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 255183243.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6041,
+ "eval_samples_per_second": 7.214,
+ "eval_steps_per_second": 3.607,
+ "step": 103026
+ },
+ {
+ "entropy": 0.6312229550229838,
+ "epoch": 43.0,
+ "grad_norm": 0.0012969970703125,
+ "learning_rate": 4.330167269059427e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 270194022.0,
+ "step": 105479
+ },
+ {
+ "epoch": 43.0,
+ "eval_entropy": 0.5424636235603919,
+ "eval_loss": 0.8868480324745178,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 270194022.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.606,
+ "eval_samples_per_second": 7.21,
+ "eval_steps_per_second": 3.605,
+ "step": 105479
+ },
+ {
+ "entropy": 0.631434175660063,
+ "epoch": 44.0,
+ "grad_norm": 7.390975952148438e-05,
+ "learning_rate": 3.711607968395394e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 285204801.0,
+ "step": 107932
+ },
+ {
+ "epoch": 44.0,
+ "eval_entropy": 0.5421680899766775,
+ "eval_loss": 0.8860384821891785,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 285204801.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6344,
+ "eval_samples_per_second": 7.154,
+ "eval_steps_per_second": 3.577,
+ "step": 107932
+ },
+ {
+ "entropy": 0.6307510763127319,
+ "epoch": 45.0,
+ "grad_norm": 0.00927734375,
+ "learning_rate": 3.0930486677313608e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 300215580.0,
+ "step": 110385
+ },
+ {
+ "epoch": 45.0,
+ "eval_entropy": 0.54229736328125,
+ "eval_loss": 0.8853968977928162,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 300215580.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.61,
+ "eval_samples_per_second": 7.202,
+ "eval_steps_per_second": 3.601,
+ "step": 110385
+ },
+ {
+ "entropy": 0.6315490893937595,
+ "epoch": 46.0,
+ "grad_norm": 0.0001239776611328125,
+ "learning_rate": 2.474489367067328e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 315226359.0,
+ "step": 112838
+ },
+ {
+ "epoch": 46.0,
+ "eval_entropy": 0.5422170620698196,
+ "eval_loss": 0.8882192373275757,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 315226359.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.7084,
+ "eval_samples_per_second": 7.011,
+ "eval_steps_per_second": 3.506,
+ "step": 112838
+ },
+ {
+ "entropy": 0.6317317981380761,
+ "epoch": 47.0,
+ "grad_norm": 3.3855438232421875e-05,
+ "learning_rate": 1.855930066403295e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 330237138.0,
+ "step": 115291
+ },
+ {
+ "epoch": 47.0,
+ "eval_entropy": 0.5427549022894639,
+ "eval_loss": 0.8879793882369995,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 330237138.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.6923,
+ "eval_samples_per_second": 7.042,
+ "eval_steps_per_second": 3.521,
+ "step": 115291
+ },
+ {
+ "entropy": 0.6314135375092869,
+ "epoch": 48.0,
+ "grad_norm": 0.0025634765625,
+ "learning_rate": 1.2373707657392621e-06,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 345247917.0,
+ "step": 117744
+ },
+ {
+ "epoch": 48.0,
+ "eval_entropy": 0.5423269546948947,
+ "eval_loss": 0.887828528881073,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 345247917.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.661,
+ "eval_samples_per_second": 7.102,
+ "eval_steps_per_second": 3.551,
+ "step": 117744
+ },
+ {
+ "entropy": 0.6317788491650499,
+ "epoch": 49.0,
+ "grad_norm": 0.0015106201171875,
+ "learning_rate": 6.18811465075229e-07,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 360258696.0,
+ "step": 120197
+ },
+ {
+ "epoch": 49.0,
+ "eval_entropy": 0.5421000031324533,
+ "eval_loss": 0.886226236820221,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 360258696.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.8724,
+ "eval_samples_per_second": 6.714,
+ "eval_steps_per_second": 3.357,
+ "step": 120197
+ },
+ {
+ "entropy": 0.6307675256881722,
+ "epoch": 50.0,
+ "grad_norm": 0.0003414154052734375,
+ "learning_rate": 2.5216441119609984e-10,
+ "loss": 0.0,
+ "mean_token_accuracy": 1.0,
+ "num_tokens": 375269475.0,
+ "step": 122650
+ },
+ {
+ "epoch": 50.0,
+ "eval_entropy": 0.5427401478473957,
+ "eval_loss": 0.888108491897583,
+ "eval_mean_token_accuracy": 0.9358974374257601,
+ "eval_num_gold": 26,
+ "eval_num_guess": 26,
+ "eval_num_tokens": 375269475.0,
+ "eval_recall": 0.8462,
+ "eval_runtime": 3.7116,
+ "eval_samples_per_second": 7.005,
+ "eval_steps_per_second": 3.503,
+ "step": 122650
+ }
+ ],
+ "logging_steps": 0,
+ "max_steps": 122650,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 50,
+ "save_steps": 0,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3.3796448253168845e+19,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
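The log above has the layout of the `TrainerState` JSON that Hugging Face `transformers` writes during training: each epoch contributes one training record (`loss`, `learning_rate`, `grad_norm`) and one evaluation record (`eval_*` keys) at the same `step`. A minimal sketch of pulling the evaluation curve out of such a log, assuming it is read as plain JSON. The embedded excerpt copies a few fields from the epoch 49 and 50 records, and `summarize` is a hypothetical helper name, not part of `transformers`. It also checks that `max_steps / num_train_epochs` (122650 / 50 = 2453) matches the step gap between consecutive evaluations.

```python
import json

# Excerpt of the log above, trimmed to the fields used here.
# For a real checkpoint you would instead json.load() the trainer state
# file from disk (its path is an assumption, not shown in this diff).
STATE_JSON = """
{
  "log_history": [
    {"epoch": 49.0, "loss": 0.0, "learning_rate": 6.18811465075229e-07, "step": 120197},
    {"epoch": 49.0, "eval_loss": 0.886226236820221, "eval_recall": 0.8462, "step": 120197},
    {"epoch": 50.0, "loss": 0.0, "learning_rate": 2.5216441119609984e-10, "step": 122650},
    {"epoch": 50.0, "eval_loss": 0.888108491897583, "eval_recall": 0.8462, "step": 122650}
  ],
  "max_steps": 122650,
  "num_train_epochs": 50
}
"""


def summarize(state: dict) -> list[tuple[float, int, float, float]]:
    """Return (epoch, step, eval_loss, eval_recall) for every evaluation record."""
    rows = []
    for entry in state["log_history"]:
        # Evaluation records carry eval_* keys; training records carry loss/learning_rate.
        if "eval_loss" in entry:
            rows.append((entry["epoch"], entry["step"], entry["eval_loss"], entry["eval_recall"]))
    return rows


state = json.loads(STATE_JSON)
rows = summarize(state)
# Integer steps per epoch implied by the run configuration.
steps_per_epoch = state["max_steps"] // state["num_train_epochs"]

for epoch, step, loss, recall in rows:
    print(f"epoch {epoch:4.1f}  step {step:6d}  eval_loss {loss:.4f}  eval_recall {recall:.4f}")
print("steps per epoch:", steps_per_epoch)
```

Applied to the full log shown here, every evaluation record from epoch 33 to epoch 50 reports `eval_recall` 0.8462, while the training records report `loss` 0.0.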
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18eb0c3939b0bb4035391490d7998e62734714a4aadf05a7ddf2f612a76980ce
+ size 6289
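`training_args.bin` is committed as a Git LFS pointer rather than as the binary itself: `key value` lines giving the pointer spec version, the SHA-256 `oid` of the real file, and its `size` in bytes. A minimal sketch of parsing such a pointer and checking a downloaded blob against it. `parse_lfs_pointer` and `blob_matches` are illustrative names, not functions from Git LFS or `huggingface_hub`.

```python
import hashlib

# Pointer text as committed for training_args.bin.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:18eb0c3939b0bb4035391490d7998e62734714a4aadf05a7ddf2f612a76980ce
size 6289
"""

LFS_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each non-empty 'key value' line of a pointer into a dict entry."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_V1:
        raise ValueError("not a Git LFS v1 pointer")
    return fields


def blob_matches(blob: bytes, pointer: dict[str, str]) -> bool:
    """True if the blob has the size and SHA-256 digest recorded in the pointer."""
    algo, _, digest = pointer["oid"].partition(":")
    return (
        algo == "sha256"
        and len(blob) == int(pointer["size"])
        and hashlib.sha256(blob).hexdigest() == digest
    )


pointer = parse_lfs_pointer(POINTER)
print(pointer["oid"], pointer["size"])
```

To check a local copy, read it as bytes (for example with `open(path, "rb").read()`, where `path` is wherever your download landed) and pass it to `blob_matches` together with the parsed pointer.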