AnonymousARR42 committed on May 6

Commit 772467b (verified) · 1 parent: ef640a2

Upload folder using huggingface_hub

Files changed (26)

.gitattributes +2 -0
.ipynb_checkpoints/README-checkpoint.md +343 -0
.ipynb_checkpoints/config-checkpoint.json +40 -0
.ipynb_checkpoints/trainer_state-checkpoint.json +1234 -0
LICENSE +114 -0
README.md +343 -0
__init__.py +4 -0
candidate_trie.pkl +3 -0
chat_template.jinja +5 -0
config.json +40 -0
generation_config.json +14 -0
longbel.py +981 -0
model-00001-of-00004.safetensors +3 -0
model-00002-of-00004.safetensors +3 -0
model-00003-of-00004.safetensors +3 -0
model-00004-of-00004.safetensors +3 -0
model.safetensors.index.json +299 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +60 -0
text_to_code.json +3 -0
tokenizer.json +3 -0
tokenizer_config.json +2110 -0
trainer_state.json +1234 -0
training_args.bin +3 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+text_to_code.json filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

.ipynb_checkpoints/README-checkpoint.md ADDED

	@@ -0,0 +1,343 @@

+---
+license: llama3.1
+base_model:
+  - meta-llama/Llama-3.1-8B-Instruct
+language:
+  - fr
+tags:
+  - biomedical-entity-linking
+  - entity-linking
+  - entity-disambiguation
+  - named-entity-linking
+  - biomedical
+  - healthcare
+  - umls
+  - quaero
+  - text-generation
+  - constrained-decoding
+  - causal-lm
+  - llm
+library_name: transformers
+pipeline_tag: text-generation
+datasets:
+  - bigbio/quaero
+finetuning_task:
+  - entity-linking
+metrics:
+  - recall
+model-index:
+  - name: LongBEL-8B-QUAERO-EMEA
+    results:
+      - task:
+          type: entity-linking
+          name: Biomedical Entity Linking
+        dataset:
+          type: bigbio/quaero
+          name: QUAERO-EMEA
+          config: quaero_emea_bigbio_kb
+        metrics:
+          - type: recall
+            name: Recall@1
+            value: 0.754
+---
+# LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking
+## LongBEL
+**LongBEL** is a novel document-level framework for biomedical entity linking (BEL). Instead of normalizing each mention independently, LongBEL conditions each prediction on the document context and on previous normalizations produced in the same document. This design enforces document-level consistency and is enhanced by our **robust memory** mechanism. The method is introduced in our paper, currently under review.
+## LongBEL (QUAERO-EMEA Edition)
+This is a **fine-tuned version of Llama-3.1-8B-Instruct** trained on **QUAERO-EMEA**, applying the LongBEL framework to enable long-context and robust-memory predictions.
+| Field | Value |
+|---|---|
+| Base model | `meta-llama/Llama-3.1-8B-Instruct` |
+| Task | Biomedical Entity Linking |
+| Dataset | QUAERO-EMEA |
+| Knowledge base | UMLS 2014AA |
+| Input | BigBio-like documents with mention spans and semantic groups |
+| Output | Ranked UMLS concept predictions |
+| Decoding | Semantic-guided constrained decoding |
+| Main metric | Recall@1 |
+## Intended Use
+This model is intended for research on biomedical entity linking and document-level consistency.
+It assumes that mention spans and semantic groups are already provided. It does **not** perform named entity recognition. In a full pipeline, a NER model should first detect mentions and assign semantic groups, then LongBEL can normalize these mentions to UMLS concepts.
+## Usage
+### Loading the model
+```python
+import torch
+from transformers import AutoModelForCausalLM
+model = AutoModelForCausalLM.from_pretrained(
+    "AnonymousARR42/LongBEL_8B_QUAERO_EMEA",
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+```
+### Inference example
+The model expects BigBio-like documents. Each entity should include a mention text, character offsets, and a semantic group in the `type` field.
+```python
+num_beams = 5
+bigbio_pages = [
+    {
+        "id": "001",
+        "document_id": "doc_001",
+        "passages": [
+            {
+                "id": "0",
+                "type": "paragraph",
+                "text": [
+                    "A 29-year-old pregnant woman presented with severe-range hypertension, "
+                    "headache, and epigastric pain. Laboratory testing showed proteinuria "
+                    "and mildly elevated liver enzymes. She was admitted overnight with "
+                    "suspected PET and was started on urgent treatment."
+                ],
+                "offsets": [[0, 257]],
+            }
+        ],
+        "entities": [
+            {
+                "id": "T1",
+                "type": "Living Beings",
+                "text": ["pregnant woman"],
+                "offsets": [[14, 28]],
+            },
+            {
+                "id": "T2",
+                "type": "Disorders",
+                "text": ["severe-range hypertension"],
+                "offsets": [[44, 69]],
+            },
+            {
+                "id": "T3",
+                "type": "Disorders",
+                "text": ["proteinuria"],
+                "offsets": [[128, 139]],
+            },
+            {
+                "id": "T4",
+                "type": "Disorders",
+                "text": ["PET"],
+                "offsets": [[217, 220]],
+            },
+        ],
+        "events": [],
+        "coreferences": [],
+        "relations": [],
+    }
+]
+predictions = model.sample(
+    bigbio_pages=bigbio_pages,
+    num_beams=num_beams,
+)
+for i in range(0, len(predictions), num_beams):
+    mention = predictions[i]["mention"]
+    print(f"## Mention {(i // num_beams) + 1}: {mention}")
+    for j in range(num_beams):
+        pred = predictions[i + j]
+        print(
+            f"   - Beam {j + 1}:\n"
+            f"     Predicted concept name: {pred['pred_concept_name']}\n"
+            f"     Predicted code: {pred['pred_concept_code']}\n"
+            f"     Beam score: {pred['beam_score']:.3f}\n"
+        )
+```
+**Example Output:**
+```text
+## Mention 1: pregnant woman
+   - Beam 1:
+   - Predicted concept name:Pregnant Woman
+   - Predicted code: C0033011
+   - Beam score: 1.000
+   - Beam 2:
+   - Predicted concept name:Pregnant woman
+   - Predicted code: C0033011
+   - Beam score: 0.003
+   - Beam 3:
+   - Predicted concept name:Pregnant woman (person)
+   - Predicted code: C0033011
+   - Beam score: 0.001
+   - Beam 4:
+   - Predicted concept name:Pregnancy Partner
+   - Predicted code: C3538996
+   - Beam score: 0.000
+   - Beam 5:
+   - Predicted concept name:Pregnant woman (person)
+   - Predicted code: C0033011
+   - Beam score: 0.000
+## Mention 2: severe-range hypertension
+   - Beam 1:
+   - Predicted concept name:Hypertensive disease
+   - Predicted code: C0020538
+   - Beam score: 0.078
+   - Beam 2:
+   - Predicted concept name:Hypertension (in some patients)
+   - Predicted code: C3280936
+   - Beam score: 0.022
+   - Beam 3:
+   - Predicted concept name:Hypertensive disease (disorder)
+   - Predicted code: C0020538
+   - Beam score: 0.010
+   - Beam 4:
+   - Predicted concept name:Hypertension, severe
+   - Predicted code: C4013784
+   - Beam score: 0.010
+   - Beam 5:
+   - Predicted concept name:Hypertension (patient A)
+   - Predicted code: C4313262
+   - Beam score: 0.004
+## Mention 3: proteinuria
+   - Beam 1:
+   - Predicted concept name:Proteinurias
+   - Predicted code: C0033687
+   - Beam score: 1.000
+   - Beam 2:
+   - Predicted concept name:Proteinuric diabetic nephropathy (disorder)
+   - Predicted code: C0403519
+   - Beam score: 0.003
+   - Beam 3:
+   - Predicted concept name:Proteinuria
+   - Predicted code: C0033687
+   - Beam score: 0.003
+   - Beam 4:
+   - Predicted concept name:Proteinuric diabetic nephropathy
+   - Predicted code: C0403519
+   - Beam score: 0.002
+   - Beam 5:
+   - Predicted concept name:Proteinuric hypertension of pregnancy (disorder)
+   - Predicted code: C0032914
+   - Beam score: 0.001
+## Mention 4: PET
+   - Beam 1:
+   - Predicted concept name:PET - Pre-eclamptic toxemia
+   - Predicted code: C0032914
+   - Beam score: 0.075
+   - Beam 2:
+   - Predicted concept name:PET - Pre-eclamptic toxaemia
+   - Predicted code: C0032914
+   - Beam score: 0.039
+   - Beam 3:
+   - Predicted concept name:Preeclamptic toxemia
+   - Predicted code: C2931877
+   - Beam score: 0.027
+   - Beam 4:
+   - Predicted concept name:Preeclampsia
+   - Predicted code: C0032914
+   - Beam score: 0.023
+   - Beam 5:
+   - Predicted concept name:Preeclampsia with Severe Features
+   - Predicted code: C0341950
+   - Beam score: 0.019
+```
+## Evaluation
+Entity linking performance is reported using Recall@1 with bootstrap confidence intervals. The best result is shown in **bold**, and the second-best result is <u>underlined</u>.
+| Model | MM-ST21PV<br>(English) | QUAERO-EMEA<br>(French) | SympTEMIST<br>(Spanish) | DisTEMIST<br>(Spanish) | MedProcNER<br>(Spanish) |
+| :--- | :---: | :---: | :---: | :---: | :---: |
+| **Context-Free BEL** ||||| |
+| SciSpacy | 53.8 ± 1.0 | 37.1 ± 4.3 | 9.8 ± 1.3 | 21.1 ± 1.9 | 10.3 ± 1.2 |
+| SapBERT | 65.6 ± 1.0 | 59.7 ± 3.8 | 34.2 ± 2.0 | 38.6 ± 2.6 | 30.4 ± 2.1 |
+| CODER-all | 62.9 ± 1.1 | 66.9 ± 4.0 | 42.2 ± 2.2 | 47.0 ± 2.6 | 42.7 ± 2.1 |
+| SapBERT-all | 64.6 ± 1.1 | 67.9 ± 3.9 | 49.8 ± 2.4 | 49.6 ± 2.6 | 45.1 ± 2.2 |
+| BERGAMOT | 60.9 ± 1.1 | 63.8 ± 4.9 | 48.0 ± 2.7 | 48.9 ± 2.4 | 42.3 ± 2.2 |
+| **Local-Context BEL** ||||| |
+| ArboEL | 76.9 ± 0.9 | 63.0 ± 3.9 | 55.4 ± 2.5 | 54.7 ± 2.6 | 59.7 ± 2.6 |
+| GENRE / mBART-large | 69.6 ± 1.0 | 69.3 ± 5.4 | 59.8 ± 2.7 | 58.7 ± 2.7 | 66.0 ± 2.3 |
+| GENRE / Llama-1B | 73.1 ± 1.0 | 75.1 ± 3.6 | 60.5 ± 2.4 | 62.5 ± 2.3 | 67.4 ± 2.1 |
+| GENRE / Llama-8B | 75.0 ± 0.9 | 73.8 ± 4.0 | 61.7 ± 2.5 | 63.2 ± 2.5 | 68.3 ± 2.2 |
+| **Global-Context BEL: LongBEL** ||||| |
+| LongBEL-1B | 77.6 ± 0.9 | 74.5 ± 3.7 | 59.8 ± 2.5 | 61.9 ± 2.4 | 66.6 ± 2.1 |
+| LongBEL-1B + Ensemble | 78.6 ± 0.8 | <u>77.2 ± 3.0</u> | 61.8 ± 2.5 | <u>64.3 ± 2.2</u> | <u>69.0 ± 2.0</u> |
+| **LongBEL-8B** | <u>79.3 ± 0.8</u> | 75.4 ± 3.4 | <u>62.0 ± 2.6</u> | 63.6 ± 2.1 | <u>69.0 ± 2.1</u> |
+| LongBEL-8B + Ensemble | **80.0 ± 0.8** | **77.6 ± 3.0** | **63.3 ± 2.5** | **65.8 ± 2.2** | **71.0 ± 2.0** |
+The score reported for this checkpoint is the **single LongBEL-8B model**. The ensemble result requires fusing several LongBEL input configurations and is not produced by this checkpoint alone.
+## Speed and Memory
+Measured on a single NVIDIA H100 80GB GPU.
+| Model                   | Model memory | Candidate memory |           Speed |
+| ----------------------- | -----------: | ---------------: | --------------: |
+| GENRE-Llama-8B baseline |      28.6 GB |           5.4 GB | 38.2 mentions/s |
+| LongBEL-8B              |      28.6 GB |           5.4 GB | 15.2 mentions/s |
+LongBEL has the same model memory footprint as the sentence-level Llama-8B baseline, but it is slower because it processes longer contexts and updates document-level memory during inference.
+## Limitations
+This model assumes that mention spans and semantic groups are given. It does not perform mention detection.
+LongBEL is most useful when concepts recur within a document. When most concepts appear only once, the memory mechanism has less information to exploit.
+Because LongBEL uses previous predictions as memory, early mistakes can still influence later predictions. Robust memory training reduces this risk but does not remove it completely.
+This model is intended for research use. It should not be used for clinical decision-making without additional validation and human oversight.
+## Reproducibility
+Code and evaluation scripts are available in this [GitHub repository](https://anonymous.4open.science/r/LongBEL-31AD).
+Trained model checkpoints and processed datasets are available in the anonymous Hugging Face collection associated with LongBEL.
+<!-- ## Citation
+If you use this model, please cite the LongBEL paper.
+```bibtex
+@inproceedings{longbel2026,
+  title = {LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking},
+  author = {Anonymous},
+  booktitle = {Anonymous submission},
+  year = {2026}
+}
+``` -->
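In the README's example output, several beams for one mention resolve to the same UMLS code (for instance, three surface forms of C0033011). A minimal sketch of collapsing the flat beam list into one ranked list of unique codes per mention follows. It assumes the prediction dictionaries carry the `mention`, `pred_concept_name`, `pred_concept_code`, and `beam_score` keys used in the README's printing loop. The helper `collapse_beams` is hypothetical and is not part of the LongBEL code.

```python
def collapse_beams(predictions, num_beams):
    """Group a flat prediction list into per-mention rankings of unique codes.

    Hypothetical helper: assumes `predictions` holds exactly `num_beams`
    consecutive entries per mention, as in the README's printing loop.
    """
    ranked = []
    for start in range(0, len(predictions), num_beams):
        beams = predictions[start:start + num_beams]
        best = {}  # concept code -> highest-scoring beam for that code
        for pred in beams:
            code = pred["pred_concept_code"]
            if code not in best or pred["beam_score"] > best[code]["beam_score"]:
                best[code] = pred
        # Sort unique codes by their best beam score, highest first.
        unique = sorted(best.values(), key=lambda p: p["beam_score"], reverse=True)
        ranked.append({
            "mention": beams[0]["mention"],
            "candidates": [
                (p["pred_concept_code"], p["pred_concept_name"], p["beam_score"])
                for p in unique
            ],
        })
    return ranked
```

Keeping the maximum score per code is one possible design choice; summing the scores of beams that share a code would be another, and the README does not say which, if either, the authors use.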

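The README states that each entity must carry a mention text and character offsets. Because misaligned offsets are easy to introduce when preparing BigBio-like input, the sketch below checks that every entity span matches the document text. It assumes that passage and entity offsets are document-level character positions, as in the README example. The function names are hypothetical and are not part of the LongBEL code.

```python
def document_text(page):
    """Rebuild the document string by placing each passage text at its offsets."""
    length = max(end for p in page["passages"] for _, end in p["offsets"])
    chars = [" "] * length  # gaps between passages are filled with spaces
    for passage in page["passages"]:
        for text, (start, end) in zip(passage["text"], passage["offsets"]):
            span = end - start
            # Pad or truncate so the buffer length never changes.
            chars[start:end] = text[:span].ljust(span)
    return "".join(chars)


def check_entity_offsets(page):
    """Return (entity_id, expected_text, found_text) for every mismatched span."""
    full_text = document_text(page)
    mismatches = []
    for entity in page["entities"]:
        for text, (start, end) in zip(entity["text"], entity["offsets"]):
            found = full_text[start:end]
            if found != text:
                mismatches.append((entity["id"], text, found))
    return mismatches
```

An empty return value means every span lines up with the text, so a check such as `assert not check_entity_offsets(page)` can run before inference.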
.ipynb_checkpoints/config-checkpoint.json ADDED

	@@ -0,0 +1,40 @@

+{
+  "architectures": [
+    "LLamaLongBEL"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 128000,
+  "dtype": "bfloat16",
+  "eos_token_id": 128009,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 131072,
+  "mlp_bias": false,
+  "model_type": "llama_longbel",
+  "auto_map": {
+    "AutoConfig": "longbel.LLamaLongBELConfig",
+    "AutoModelForCausalLM": "longbel.LLamaLongBEL"
+  },
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "pad_token_id": 128009,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": {
+    "factor": 8.0,
+    "high_freq_factor": 4.0,
+    "low_freq_factor": 1.0,
+    "original_max_position_embeddings": 8192,
+    "rope_type": "llama3"
+  },
+  "rope_theta": 500000.0,
+  "tie_word_embeddings": false,
+  "transformers_version": "4.57.1",
+  "use_cache": true,
+  "vocab_size": 128257
+}

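The attention settings in the config above determine the key/value cache cost per token. The following is a rough worked calculation, assuming the custom `LLamaLongBEL` class keeps the standard Llama grouped-query attention cache layout (one key and one value vector per layer per key/value head), which the diff does not confirm. The dictionary copies only the relevant fields from the config.

```python
# Fields copied from the config above; nothing else is read from the checkpoint.
config = {
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "max_position_embeddings": 131072,
}


def kv_cache_bytes_per_token(cfg, bytes_per_value=2):
    """Bytes of KV cache per token; bytes_per_value=2 corresponds to bfloat16."""
    # Factor 2 accounts for storing both keys and values.
    return (2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"]
            * cfg["head_dim"] * bytes_per_value)


# Number of query heads sharing each key/value head (grouped-query attention).
gqa_group_size = config["num_attention_heads"] // config["num_key_value_heads"]

per_token = kv_cache_bytes_per_token(config)
full_context_gib = per_token * config["max_position_embeddings"] / 2**30

print(f"GQA group size: {gqa_group_size}")
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache at max context, one sequence: {full_context_gib:.1f} GiB")
```

These figures cover a single sequence without beam search; beam decoding may multiply the cache size depending on how the implementation shares prefixes.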
.ipynb_checkpoints/trainer_state-checkpoint.json ADDED

	@@ -0,0 +1,1234 @@

+{
+  "best_global_step": 7359,
+  "best_metric": 0.8462,
+  "best_model_checkpoint": "models/NED/EMEA_human_only_tfidf_hybrid_long_v2_addheaders/Llama-3.1-8B-Instruct/checkpoint-7359",
+  "epoch": 50.0,
+  "eval_steps": 500,
+  "global_step": 122650,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 1.1526817805758311,
+      "epoch": 1.0,
+      "grad_norm": 304.0,
+      "learning_rate": 1.9989130434782608e-05,
+      "loss": 0.7669,
+      "mean_token_accuracy": 0.8752253057546777,
+      "num_tokens": 15010779.0,
+      "step": 2453
+    },
+    {
+      "epoch": 1.0,
+      "eval_entropy": 1.2358426589232225,
+      "eval_loss": 0.6339517831802368,
+      "eval_mean_token_accuracy": 0.8988095246828519,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 15010779.0,
+      "eval_recall": 0.7308,
+      "eval_runtime": 3.6399,
+      "eval_samples_per_second": 7.143,
+      "eval_steps_per_second": 3.571,
+      "step": 2453
+    },
+    {
+      "entropy": 1.3605892632720036,
+      "epoch": 2.0,
+      "grad_norm": 12.1875,
+      "learning_rate": 2.9691098596284776e-05,
+      "loss": 0.5437,
+      "mean_token_accuracy": 0.9150349811612466,
+      "num_tokens": 30021558.0,
+      "step": 4906
+    },
+    {
+      "epoch": 2.0,
+      "eval_entropy": 1.1509519540346587,
+      "eval_loss": 0.4853871166706085,
+      "eval_mean_token_accuracy": 0.9201437464127173,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 30021558.0,
+      "eval_recall": 0.7692,
+      "eval_runtime": 3.627,
+      "eval_samples_per_second": 7.168,
+      "eval_steps_per_second": 3.584,
+      "step": 4906
+    },
+    {
+      "entropy": 1.1862413553719222,
+      "epoch": 3.0,
+      "grad_norm": 2.1875,
+      "learning_rate": 2.9072539295620746e-05,
+      "loss": 0.2619,
+      "mean_token_accuracy": 0.9548876376794495,
+      "num_tokens": 45032337.0,
+      "step": 7359
+    },
+    {
+      "epoch": 3.0,
+      "eval_entropy": 1.019592651954064,
+      "eval_loss": 0.5770813822746277,
+      "eval_mean_token_accuracy": 0.9220362993387076,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 45032337.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6363,
+      "eval_samples_per_second": 7.15,
+      "eval_steps_per_second": 3.575,
+      "step": 7359
+    },
+    {
+      "entropy": 0.9634018300311497,
+      "epoch": 4.0,
+      "grad_norm": 0.1240234375,
+      "learning_rate": 2.8453979994956713e-05,
+      "loss": 0.1216,
+      "mean_token_accuracy": 0.9782008502466845,
+      "num_tokens": 60043116.0,
+      "step": 9812
+    },
+    {
+      "epoch": 4.0,
+      "eval_entropy": 0.8699520321992728,
+      "eval_loss": 0.5446107387542725,
+      "eval_mean_token_accuracy": 0.940018314581651,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 60043116.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6143,
+      "eval_samples_per_second": 7.194,
+      "eval_steps_per_second": 3.597,
+      "step": 9812
+    },
+    {
+      "entropy": 0.7849812429144681,
+      "epoch": 5.0,
+      "grad_norm": 0.002227783203125,
+      "learning_rate": 2.783542069429268e-05,
+      "loss": 0.0517,
+      "mean_token_accuracy": 0.9894482943411997,
+      "num_tokens": 75053895.0,
+      "step": 12265
+    },
+    {
+      "epoch": 5.0,
+      "eval_entropy": 0.6801113898937519,
+      "eval_loss": 0.7289856672286987,
+      "eval_mean_token_accuracy": 0.9444444454633273,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 75053895.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6486,
+      "eval_samples_per_second": 7.126,
+      "eval_steps_per_second": 3.563,
+      "step": 12265
+    },
+    {
+      "entropy": 0.6892432886826181,
+      "epoch": 6.0,
+      "grad_norm": 0.0004749298095703125,
+      "learning_rate": 2.721686139362865e-05,
+      "loss": 0.0209,
+      "mean_token_accuracy": 0.9958273216359138,
+      "num_tokens": 90064674.0,
+      "step": 14718
+    },
+    {
+      "epoch": 6.0,
+      "eval_entropy": 0.577189931502709,
+      "eval_loss": 0.7246649265289307,
+      "eval_mean_token_accuracy": 0.9444444454633273,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 90064674.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6456,
+      "eval_samples_per_second": 7.132,
+      "eval_steps_per_second": 3.566,
+      "step": 14718
+    },
+    {
+      "entropy": 0.6557439389371696,
+      "epoch": 7.0,
+      "grad_norm": 0.000888824462890625,
+      "learning_rate": 2.659830209296461e-05,
+      "loss": 0.0078,
+      "mean_token_accuracy": 0.9979321826393635,
+      "num_tokens": 105075453.0,
+      "step": 17171
+    },
+    {
+      "epoch": 7.0,
+      "eval_entropy": 0.5603500146132249,
+      "eval_loss": 0.8045116662979126,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 105075453.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 5.557,
+      "eval_samples_per_second": 4.679,
+      "eval_steps_per_second": 2.339,
+      "step": 17171
+    },
+    {
+      "entropy": 0.6481096161976669,
+      "epoch": 8.0,
+      "grad_norm": 8.96453857421875e-05,
+      "learning_rate": 2.597974279230058e-05,
+      "loss": 0.0028,
+      "mean_token_accuracy": 0.9993061645391568,
+      "num_tokens": 120086232.0,
+      "step": 19624
+    },
+    {
+      "epoch": 8.0,
+      "eval_entropy": 0.5650725089586698,
+      "eval_loss": 0.8335245847702026,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 120086232.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6391,
+      "eval_samples_per_second": 7.145,
+      "eval_steps_per_second": 3.572,
+      "step": 19624
+    },
+    {
+      "entropy": 0.6384989822756452,
+      "epoch": 9.0,
+      "grad_norm": 0.00102996826171875,
+      "learning_rate": 2.5361183491636548e-05,
+      "loss": 0.0011,
+      "mean_token_accuracy": 0.9997574686275129,
+      "num_tokens": 135097011.0,
+      "step": 22077
+    },
+    {
+      "epoch": 9.0,
+      "eval_entropy": 0.5437194108963013,
+      "eval_loss": 0.8720409870147705,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 135097011.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6696,
+      "eval_samples_per_second": 7.085,
+      "eval_steps_per_second": 3.543,
+      "step": 22077
+    },
+    {
+      "entropy": 0.6327040182586792,
+      "epoch": 10.0,
+      "grad_norm": 0.00011968612670898438,
+      "learning_rate": 2.4742624190972517e-05,
+      "loss": 0.0002,
+      "mean_token_accuracy": 0.9999592335818012,
+      "num_tokens": 150107790.0,
+      "step": 24530
+    },
+    {
+      "epoch": 10.0,
+      "eval_entropy": 0.5456434029799241,
+      "eval_loss": 0.8786986470222473,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 150107790.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.7213,
+      "eval_samples_per_second": 6.987,
+      "eval_steps_per_second": 3.493,
+      "step": 24530
+    },
+    {
+      "entropy": 0.6342776355527636,
+      "epoch": 11.0,
+      "grad_norm": 2.9206275939941406e-05,
+      "learning_rate": 2.412406489030848e-05,
+      "loss": 0.0001,
+      "mean_token_accuracy": 0.9999629396397,
+      "num_tokens": 165118569.0,
+      "step": 26983
+    },
+    {
+      "epoch": 11.0,
+      "eval_entropy": 0.5441241906239436,
+      "eval_loss": 0.8776129484176636,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 165118569.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6285,
+      "eval_samples_per_second": 7.165,
+      "eval_steps_per_second": 3.583,
+      "step": 26983
+    },
+    {
+      "entropy": 0.6330991076222742,
+      "epoch": 12.0,
+      "grad_norm": 0.000823974609375,
+      "learning_rate": 2.350550558964445e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 180129348.0,
+      "step": 29436
+    },
+    {
+      "epoch": 12.0,
+      "eval_entropy": 0.544509245799138,
+      "eval_loss": 0.88084477186203,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 180129348.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6661,
+      "eval_samples_per_second": 7.092,
+      "eval_steps_per_second": 3.546,
+      "step": 29436
+    },
+    {
+      "entropy": 0.6322705759061291,
+      "epoch": 13.0,
+      "grad_norm": 0.010498046875,
+      "learning_rate": 2.2886946288980416e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 195140127.0,
+      "step": 31889
+    },
+    {
+      "epoch": 13.0,
+      "eval_entropy": 0.5434356606923617,
+      "eval_loss": 0.8842343091964722,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 195140127.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 4.1268,
+      "eval_samples_per_second": 6.3,
+      "eval_steps_per_second": 3.15,
+      "step": 31889
+    },
+    {
+      "entropy": 0.6316640121908612,
+      "epoch": 14.0,
+      "grad_norm": 0.0035552978515625,
+      "learning_rate": 2.2268386988316383e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 210150906.0,
+      "step": 34342
+    },
+    {
+      "epoch": 14.0,
+      "eval_entropy": 0.543243577847114,
+      "eval_loss": 0.885927140712738,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 210150906.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.7188,
+      "eval_samples_per_second": 6.991,
+      "eval_steps_per_second": 3.496,
+      "step": 34342
+    },
+    {
+      "entropy": 0.6321596540070241,
+      "epoch": 15.0,
+      "grad_norm": 2.4199485778808594e-05,
+      "learning_rate": 2.164982768765235e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 225161685.0,
+      "step": 36795
+    },
+    {
+      "epoch": 15.0,
+      "eval_entropy": 0.5422769280580374,
+      "eval_loss": 0.8823052644729614,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 225161685.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6723,
+      "eval_samples_per_second": 7.08,
+      "eval_steps_per_second": 3.54,
+      "step": 36795
+    },
+    {
+      "entropy": 0.6315903761194426,
+      "epoch": 16.0,
+      "grad_norm": 0.0291748046875,
+      "learning_rate": 2.1031268386988316e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 240172464.0,
+      "step": 39248
+    },
+    {
+      "epoch": 16.0,
+      "eval_entropy": 0.5426660546889672,
+      "eval_loss": 0.8869765996932983,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 240172464.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6896,
+      "eval_samples_per_second": 7.047,
+      "eval_steps_per_second": 3.523,
+      "step": 39248
+    },
+    {
+      "entropy": 0.6317922561279472,
+      "epoch": 17.0,
+      "grad_norm": 0.0001850128173828125,
+      "learning_rate": 2.0412709086324285e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 255183243.0,
+      "step": 41701
+    },
+    {
+      "epoch": 17.0,
+      "eval_entropy": 0.542809899036701,
+      "eval_loss": 0.8864607214927673,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 255183243.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6498,
+      "eval_samples_per_second": 7.124,
+      "eval_steps_per_second": 3.562,
+      "step": 41701
+    },
+    {
+      "entropy": 0.6319634849034763,
+      "epoch": 18.0,
+      "grad_norm": 2.1457672119140625e-05,
+      "learning_rate": 1.979414978566025e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 270194022.0,
+      "step": 44154
+    },
+    {
+      "epoch": 18.0,
+      "eval_entropy": 0.5426488243616544,
+      "eval_loss": 0.8861849308013916,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 270194022.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6568,
+      "eval_samples_per_second": 7.11,
+      "eval_steps_per_second": 3.555,
+      "step": 44154
+    },
+    {
+      "entropy": 0.631338802688325,
+      "epoch": 19.0,
+      "grad_norm": 4.076957702636719e-05,
+      "learning_rate": 1.9175590484996218e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 285204801.0,
+      "step": 46607
+    },
+    {
+      "epoch": 19.0,
+      "eval_entropy": 0.5423762339812058,
+      "eval_loss": 0.885791540145874,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 285204801.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.653,
+      "eval_samples_per_second": 7.118,
+      "eval_steps_per_second": 3.559,
+      "step": 46607
+    },
+    {
+      "entropy": 0.6311312203036976,
+      "epoch": 20.0,
+      "grad_norm": 0.0004634857177734375,
+      "learning_rate": 1.8557031184332184e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 300215580.0,
+      "step": 49060
+    },
+    {
+      "epoch": 20.0,
+      "eval_entropy": 0.5424229686076825,
+      "eval_loss": 0.8889456987380981,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 300215580.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.651,
+      "eval_samples_per_second": 7.121,
+      "eval_steps_per_second": 3.561,
+      "step": 49060
+    },
+    {
+      "entropy": 0.631198678741249,
+      "epoch": 21.0,
+      "grad_norm": 0.00031280517578125,
+      "learning_rate": 1.793847188366815e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 315226359.0,
+      "step": 51513
+    },
+    {
+      "epoch": 21.0,
+      "eval_entropy": 0.5428222968028142,
+      "eval_loss": 0.8843169808387756,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 315226359.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6619,
+      "eval_samples_per_second": 7.1,
+      "eval_steps_per_second": 3.55,
+      "step": 51513
+    },
+    {
+      "entropy": 0.6313406728478388,
+      "epoch": 22.0,
+      "grad_norm": 0.000759124755859375,
+      "learning_rate": 1.731991258300412e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 330237138.0,
+      "step": 53966
+    },
+    {
+      "epoch": 22.0,
+      "eval_entropy": 0.5427144765853882,
+      "eval_loss": 0.8861469030380249,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 330237138.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6544,
+      "eval_samples_per_second": 7.115,
+      "eval_steps_per_second": 3.557,
+      "step": 53966
+    },
+    {
+      "entropy": 0.6313331465647263,
+      "epoch": 23.0,
+      "grad_norm": 0.00051116943359375,
+      "learning_rate": 1.6701353282340083e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 345247917.0,
+      "step": 56419
+    },
+    {
+      "epoch": 23.0,
+      "eval_entropy": 0.5423137545585632,
+      "eval_loss": 0.8892049193382263,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 345247917.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6537,
+      "eval_samples_per_second": 7.116,
+      "eval_steps_per_second": 3.558,
+      "step": 56419
+    },
+    {
+      "entropy": 0.6310314053401527,
+      "epoch": 24.0,
+      "grad_norm": 3.600120544433594e-05,
+      "learning_rate": 1.6082793981676053e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 360258696.0,
+      "step": 58872
+    },
+    {
+      "epoch": 24.0,
+      "eval_entropy": 0.5423843631377587,
+      "eval_loss": 0.8886714577674866,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 360258696.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6316,
+      "eval_samples_per_second": 7.159,
+      "eval_steps_per_second": 3.58,
+      "step": 58872
+    },
+    {
+      "entropy": 0.6315073234496484,
+      "epoch": 25.0,
+      "grad_norm": 7.82012939453125e-05,
+      "learning_rate": 1.546423468101202e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 375269475.0,
+      "step": 61325
+    },
+    {
+      "epoch": 25.0,
+      "eval_entropy": 0.5420686419193561,
+      "eval_loss": 0.8865240812301636,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 375269475.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.613,
+      "eval_samples_per_second": 7.196,
+      "eval_steps_per_second": 3.598,
+      "step": 61325
+    },
+    {
+      "entropy": 0.632054461467718,
+      "epoch": 26.0,
+      "grad_norm": 0.00024318695068359375,
+      "learning_rate": 1.4845675380347987e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 15010779.0,
+      "step": 63778
+    },
+    {
+      "epoch": 26.0,
+      "eval_entropy": 0.5426568893285898,
+      "eval_loss": 0.88667893409729,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 15010779.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.647,
+      "eval_samples_per_second": 7.129,
+      "eval_steps_per_second": 3.565,
+      "step": 63778
+    },
+    {
+      "entropy": 0.6314872418356777,
+      "epoch": 27.0,
+      "grad_norm": 0.00011396408081054688,
+      "learning_rate": 1.4227116079683954e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 30021558.0,
+      "step": 66231
+    },
+    {
+      "epoch": 27.0,
+      "eval_entropy": 0.5423887417866633,
+      "eval_loss": 0.8907365798950195,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 30021558.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6242,
+      "eval_samples_per_second": 7.174,
+      "eval_steps_per_second": 3.587,
+      "step": 66231
+    },
+    {
+      "entropy": 0.6317801613055392,
+      "epoch": 28.0,
+      "grad_norm": 8.392333984375e-05,
+      "learning_rate": 1.3608556779019922e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 45032337.0,
+      "step": 68684
+    },
+    {
+      "epoch": 28.0,
+      "eval_entropy": 0.5428364735383254,
+      "eval_loss": 0.885719358921051,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 45032337.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6828,
+      "eval_samples_per_second": 7.06,
+      "eval_steps_per_second": 3.53,
+      "step": 68684
+    },
+    {
+      "entropy": 0.6310389586555389,
+      "epoch": 29.0,
+      "grad_norm": 0.000774383544921875,
+      "learning_rate": 1.2989997478355888e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 60043116.0,
+      "step": 71137
+    },
+    {
+      "epoch": 29.0,
+      "eval_entropy": 0.5424722524789664,
+      "eval_loss": 0.8864960074424744,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 60043116.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6359,
+      "eval_samples_per_second": 7.151,
+      "eval_steps_per_second": 3.576,
+      "step": 71137
+    },
+    {
+      "entropy": 0.6310345640461444,
+      "epoch": 30.0,
+      "grad_norm": 3.5762786865234375e-05,
+      "learning_rate": 1.2371438177691856e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 75053895.0,
+      "step": 73590
+    },
+    {
+      "epoch": 30.0,
+      "eval_entropy": 0.5427528161268967,
+      "eval_loss": 0.8871183395385742,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 75053895.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6648,
+      "eval_samples_per_second": 7.095,
+      "eval_steps_per_second": 3.547,
+      "step": 73590
+    },
+    {
+      "entropy": 0.6307261824680745,
+      "epoch": 31.0,
+      "grad_norm": 0.00015163421630859375,
+      "learning_rate": 1.1752878877027823e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 90064674.0,
+      "step": 76043
+    },
+    {
+      "epoch": 31.0,
+      "eval_entropy": 0.5423439878683823,
+      "eval_loss": 0.890313982963562,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 90064674.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6589,
+      "eval_samples_per_second": 7.106,
+      "eval_steps_per_second": 3.553,
+      "step": 76043
+    },
+    {
+      "entropy": 0.6317850742056279,
+      "epoch": 32.0,
+      "grad_norm": 0.0005035400390625,
+      "learning_rate": 1.113431957636379e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 105075453.0,
+      "step": 78496
+    },
+    {
+      "epoch": 32.0,
+      "eval_entropy": 0.5422184283916767,
+      "eval_loss": 0.8882402181625366,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 105075453.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6075,
+      "eval_samples_per_second": 7.207,
+      "eval_steps_per_second": 3.604,
+      "step": 78496
+    },
+    {
+      "entropy": 0.6315069926961121,
+      "epoch": 33.0,
+      "grad_norm": 0.0079345703125,
+      "learning_rate": 1.0515760275699757e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 120086232.0,
+      "step": 80949
+    },
+    {
+      "epoch": 33.0,
+      "eval_entropy": 0.5428683024186355,
+      "eval_loss": 0.8859032988548279,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 120086232.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6537,
+      "eval_samples_per_second": 7.116,
+      "eval_steps_per_second": 3.558,
+      "step": 80949
+    },
+    {
+      "entropy": 0.6313212784246381,
+      "epoch": 34.0,
+      "grad_norm": 0.000885009765625,
+      "learning_rate": 9.897200975035723e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 135097011.0,
+      "step": 83402
+    },
+    {
+      "epoch": 34.0,
+      "eval_entropy": 0.5425068598527175,
+      "eval_loss": 0.887780487537384,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 135097011.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6448,
+      "eval_samples_per_second": 7.133,
+      "eval_steps_per_second": 3.567,
+      "step": 83402
+    },
+    {
+      "entropy": 0.6308202771352254,
+      "epoch": 35.0,
+      "grad_norm": 0.00032806396484375,
+      "learning_rate": 9.27864167437169e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 150107790.0,
+      "step": 85855
+    },
+    {
+      "epoch": 35.0,
+      "eval_entropy": 0.54246619114509,
+      "eval_loss": 0.8900800347328186,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 150107790.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6253,
+      "eval_samples_per_second": 7.172,
+      "eval_steps_per_second": 3.586,
+      "step": 85855
+    },
+    {
+      "entropy": 0.6310893858737767,
+      "epoch": 36.0,
+      "grad_norm": 0.00543212890625,
+      "learning_rate": 8.660082373707658e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 165118569.0,
+      "step": 88308
+    },
+    {
+      "epoch": 36.0,
+      "eval_entropy": 0.542354785479032,
+      "eval_loss": 0.882867157459259,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 165118569.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6309,
+      "eval_samples_per_second": 7.161,
+      "eval_steps_per_second": 3.58,
+      "step": 88308
+    },
+    {
+      "entropy": 0.6313383878492308,
+      "epoch": 37.0,
+      "grad_norm": 0.0014495849609375,
+      "learning_rate": 8.041523073043624e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 180129348.0,
+      "step": 90761
+    },
+    {
+      "epoch": 37.0,
+      "eval_entropy": 0.5429406670423654,
+      "eval_loss": 0.8894430994987488,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 180129348.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6047,
+      "eval_samples_per_second": 7.213,
+      "eval_steps_per_second": 3.606,
+      "step": 90761
+    },
+    {
+      "entropy": 0.6315074832012738,
+      "epoch": 38.0,
+      "grad_norm": 1.8477439880371094e-05,
+      "learning_rate": 7.422963772379592e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 195140127.0,
+      "step": 93214
+    },
+    {
+      "epoch": 38.0,
+      "eval_entropy": 0.5428708929281968,
+      "eval_loss": 0.8853751420974731,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 195140127.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6095,
+      "eval_samples_per_second": 7.203,
+      "eval_steps_per_second": 3.602,
+      "step": 93214
+    },
+    {
+      "entropy": 0.6316086658156264,
+      "epoch": 39.0,
+      "grad_norm": 0.0019378662109375,
+      "learning_rate": 6.804404471715559e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 210150906.0,
+      "step": 95667
+    },
+    {
+      "epoch": 39.0,
+      "eval_entropy": 0.5423155472828791,
+      "eval_loss": 0.8865050673484802,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 210150906.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6105,
+      "eval_samples_per_second": 7.201,
+      "eval_steps_per_second": 3.601,
+      "step": 95667
+    },
+    {
+      "entropy": 0.6319762418161253,
+      "epoch": 40.0,
+      "grad_norm": 0.0076904296875,
+      "learning_rate": 6.185845171051526e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 225161685.0,
+      "step": 98120
+    },
+    {
+      "epoch": 40.0,
+      "eval_entropy": 0.5423448315033546,
+      "eval_loss": 0.887237012386322,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 225161685.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6062,
+      "eval_samples_per_second": 7.21,
+      "eval_steps_per_second": 3.605,
+      "step": 98120
+    },
+    {
+      "entropy": 0.6316094772090632,
+      "epoch": 41.0,
+      "grad_norm": 0.00040435791015625,
+      "learning_rate": 5.567285870387493e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 240172464.0,
+      "step": 100573
+    },
+    {
+      "epoch": 41.0,
+      "eval_entropy": 0.5424330555475675,
+      "eval_loss": 0.8862788081169128,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 240172464.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6042,
+      "eval_samples_per_second": 7.214,
+      "eval_steps_per_second": 3.607,
+      "step": 100573
+    },
+    {
+      "entropy": 0.6310035889118581,
+      "epoch": 42.0,
+      "grad_norm": 0.0020294189453125,
+      "learning_rate": 4.94872656972346e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 255183243.0,
+      "step": 103026
+    },
+    {
+      "epoch": 42.0,
+      "eval_entropy": 0.5431472292313209,
+      "eval_loss": 0.890018105506897,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 255183243.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6041,
+      "eval_samples_per_second": 7.214,
+      "eval_steps_per_second": 3.607,
+      "step": 103026
+    },
+    {
+      "entropy": 0.6312229550229838,
+      "epoch": 43.0,
+      "grad_norm": 0.0012969970703125,
+      "learning_rate": 4.330167269059427e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 270194022.0,
+      "step": 105479
+    },
+    {
+      "epoch": 43.0,
+      "eval_entropy": 0.5424636235603919,
+      "eval_loss": 0.8868480324745178,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 270194022.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.606,
+      "eval_samples_per_second": 7.21,
+      "eval_steps_per_second": 3.605,
+      "step": 105479
+    },
+    {
+      "entropy": 0.631434175660063,
+      "epoch": 44.0,
+      "grad_norm": 7.390975952148438e-05,
+      "learning_rate": 3.711607968395394e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 285204801.0,
+      "step": 107932
+    },
+    {
+      "epoch": 44.0,
+      "eval_entropy": 0.5421680899766775,
+      "eval_loss": 0.8860384821891785,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 285204801.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6344,
+      "eval_samples_per_second": 7.154,
+      "eval_steps_per_second": 3.577,
+      "step": 107932
+    },
+    {
+      "entropy": 0.6307510763127319,
+      "epoch": 45.0,
+      "grad_norm": 0.00927734375,
+      "learning_rate": 3.0930486677313608e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 300215580.0,
+      "step": 110385
+    },
+    {
+      "epoch": 45.0,
+      "eval_entropy": 0.54229736328125,
+      "eval_loss": 0.8853968977928162,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 300215580.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.61,
+      "eval_samples_per_second": 7.202,
+      "eval_steps_per_second": 3.601,
+      "step": 110385
+    },
+    {
+      "entropy": 0.6315490893937595,
+      "epoch": 46.0,
+      "grad_norm": 0.0001239776611328125,
+      "learning_rate": 2.474489367067328e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 315226359.0,
+      "step": 112838
+    },
+    {
+      "epoch": 46.0,
+      "eval_entropy": 0.5422170620698196,
+      "eval_loss": 0.8882192373275757,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 315226359.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.7084,
+      "eval_samples_per_second": 7.011,
+      "eval_steps_per_second": 3.506,
+      "step": 112838
+    },
+    {
+      "entropy": 0.6317317981380761,
+      "epoch": 47.0,
+      "grad_norm": 3.3855438232421875e-05,
+      "learning_rate": 1.855930066403295e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 330237138.0,
+      "step": 115291
+    },
+    {
+      "epoch": 47.0,
+      "eval_entropy": 0.5427549022894639,
+      "eval_loss": 0.8879793882369995,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 330237138.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6923,
+      "eval_samples_per_second": 7.042,
+      "eval_steps_per_second": 3.521,
+      "step": 115291
+    },
+    {
+      "entropy": 0.6314135375092869,
+      "epoch": 48.0,
+      "grad_norm": 0.0025634765625,
+      "learning_rate": 1.2373707657392621e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 345247917.0,
+      "step": 117744
+    },
+    {
+      "epoch": 48.0,
+      "eval_entropy": 0.5423269546948947,
+      "eval_loss": 0.887828528881073,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 345247917.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.661,
+      "eval_samples_per_second": 7.102,
+      "eval_steps_per_second": 3.551,
+      "step": 117744
+    },
+    {
+      "entropy": 0.6317788491650499,
+      "epoch": 49.0,
+      "grad_norm": 0.0015106201171875,
+      "learning_rate": 6.18811465075229e-07,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 360258696.0,
+      "step": 120197
+    },
+    {
+      "epoch": 49.0,
+      "eval_entropy": 0.5421000031324533,
+      "eval_loss": 0.886226236820221,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 360258696.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.8724,
+      "eval_samples_per_second": 6.714,
+      "eval_steps_per_second": 3.357,
+      "step": 120197
+    },
+    {
+      "entropy": 0.6307675256881722,
+      "epoch": 50.0,
+      "grad_norm": 0.0003414154052734375,
+      "learning_rate": 2.5216441119609984e-10,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 375269475.0,
+      "step": 122650
+    },
+    {
+      "epoch": 50.0,
+      "eval_entropy": 0.5427401478473957,
+      "eval_loss": 0.888108491897583,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 375269475.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.7116,
+      "eval_samples_per_second": 7.005,
+      "eval_steps_per_second": 3.503,
+      "step": 122650
+    }
+  ],
+  "logging_steps": 0,
+  "max_steps": 122650,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 50,
+  "save_steps": 0,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.3796448253168845e+19,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

LICENSE ADDED

	@@ -0,0 +1,114 @@

+LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
+Llama 3.1 Version Release Date: July 23, 2024
+“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the
+Llama Materials set forth herein.
+“Documentation” means the specifications, manuals and documentation accompanying Llama 3.1
+distributed by Meta at https://llama.meta.com/doc/overview.
+“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into
+this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or
+regulations to provide legal consent and that has legal authority to bind your employer or such other
+person or entity if you are entering in this Agreement on their behalf.
+“Llama 3.1” means the foundational large language models and software and algorithms, including
+machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
+fine-tuning enabling code and other elements of the foregoing distributed by Meta at
+https://llama.meta.com/llama-downloads.
+“Llama Materials” means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any
+portion thereof) made available under this Agreement.
+“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your
+principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located
+outside of the EEA or Switzerland).
+By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
+you agree to be bound by this Agreement.
+1. License Rights and Redistribution.
+  a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free
+limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama
+Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the
+Llama Materials.
+  b. Redistribution and Use.
+      i. If you distribute or make available the Llama Materials (or any derivative works
+thereof), or a product or service (including another AI model) that contains any of them, you shall (A)
+provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with
+Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use
+the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or
+otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at
+the beginning of any such AI model name.
+      ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
+of an integrated end user product, then Section 2 of this Agreement will not apply to you.
+      iii. You must retain in all copies of the Llama Materials that you distribute the following
+attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is
+licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights
+Reserved.”
+      iv. Your use of the Llama Materials must comply with applicable laws and regulations
+(including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama
+Materials (available at https://llama.meta.com/llama3_1/use-policy), which is hereby incorporated by
+reference into this Agreement.
+2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users
+of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700
+million monthly active users in the preceding calendar month, you must request a license from Meta,
+which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the
+rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
+3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY
+OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF
+ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED,
+INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT,
+MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR
+DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND
+ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND
+RESULTS.
+4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF
+LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING
+OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL,
+INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED
+OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+5. Intellectual Property.
+  a. No trademark licenses are granted under this Agreement, and in connection with the Llama
+Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other
+or any of its affiliates, except as required for reasonable and customary use in describing and
+redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to
+use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will
+comply with Meta’s brand guidelines (currently accessible at
+https://about.meta.com/brand/resources/meta/company-brand/ ). All goodwill arising out of your use
+of the Mark will inure to the benefit of Meta.
+  b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with
+respect to any derivative works and modifications of the Llama Materials that are made by you, as
+between you and Meta, you are and will be the owner of such derivative works and modifications.
+  c. If you institute litigation or other proceedings against Meta or any entity (including a
+cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or
+results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other
+rights owned or licensable by you, then any licenses granted to you under this Agreement shall
+terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold
+harmless Meta from and against any claim by any third party arising out of or related to your use or
+distribution of the Llama Materials.
+6. Term and Termination. The term of this Agreement will commence upon your acceptance of this
+Agreement or access to the Llama Materials and will continue in full force and effect until terminated in
+accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in
+breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete
+and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this
+Agreement.
+7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of
+the State of California without regard to choice of law principles, and the UN Convention on Contracts
+for the International Sale of Goods does not apply to this Agreement. The courts of California shall have
+exclusive jurisdiction of any dispute arising out of this Agreement.

README.md ADDED

	@@ -0,0 +1,343 @@

+---
+license: llama3.1
+base_model:
+  - meta-llama/Llama-3.1-8B-Instruct
+language:
+  - fr
+tags:
+  - biomedical-entity-linking
+  - entity-linking
+  - entity-disambiguation
+  - named-entity-linking
+  - biomedical
+  - healthcare
+  - umls
+  - quaero
+  - text-generation
+  - constrained-decoding
+  - causal-lm
+  - llm
+library_name: transformers
+pipeline_tag: text-generation
+datasets:
+  - bigbio/quaero
+finetuning_task:
+  - entity-linking
+metrics:
+  - recall
+model-index:
+  - name: LongBEL-8B-QUAERO-EMEA
+    results:
+      - task:
+          type: entity-linking
+          name: Biomedical Entity Linking
+        dataset:
+          type: bigbio/quaero
+          name: QUAERO-EMEA
+          config: quaero_emea_bigbio_kb
+        metrics:
+          - type: recall
+            name: Recall@1
+            value: 0.754
+---
+# LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking
+## LongBEL
+**LongBEL** is a novel document-level framework for biomedical entity linking (BEL). Instead of normalizing each mention independently, LongBEL conditions each prediction on the document context and on previous normalizations produced in the same document. This design enforces document-level consistency and is enhanced by our **robust memory** mechanism. The method is introduced in our paper, currently under review.
+## LongBEL (QUAERO-EMEA Edition)
+This is a **fine-tuned version of Llama-3.1-8B-Instruct** trained on **QUAERO-EMEA**. It applies the LongBEL framework to support long-context prediction with robust memory.
+| Field | Value |
+|---|---|
+| Base model | `meta-llama/Llama-3.1-8B-Instruct` |
+| Task | Biomedical Entity Linking |
+| Dataset | QUAERO-EMEA |
+| Knowledge base | UMLS 2014AA |
+| Input | BigBio-like documents with mention spans and semantic groups |
+| Output | Ranked UMLS concept predictions |
+| Decoding | Semantic-guided constrained decoding |
+| Main metric | Recall@1 |
+## Intended Use
+This model is intended for research on biomedical entity linking and document-level consistency.
+It assumes that mention spans and semantic groups are already provided. It does **not** perform named entity recognition. In a full pipeline, a NER model should first detect mentions and assign semantic groups, then LongBEL can normalize these mentions to UMLS concepts.
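The hand-off from a NER step to LongBEL can be sketched as follows. This is a minimal illustration, not part of the released code: the `(start, end, semantic_group)` span format and the helper names `spans_to_entities` and `build_page` are assumptions made for this example, and reusing one identifier for both `id` and `document_id` is a simplification. It only shows how character spans and semantic groups could be packed into a BigBio-like document, with a bounds check on every offset.

```python
from typing import Dict, List, Tuple

# Hypothetical NER output format: (start, end, semantic_group) character spans.
Span = Tuple[int, int, str]


def spans_to_entities(text: str, spans: List[Span]) -> List[Dict]:
    """Turn character spans into BigBio-like entity dicts."""
    entities = []
    for k, (start, end, group) in enumerate(spans, start=1):
        # Reject spans that are empty, reversed, or outside the text.
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span {(start, end)} is outside the text")
        entities.append(
            {
                "id": f"T{k}",
                "type": group,  # semantic group, e.g. "Disorders"
                "text": [text[start:end]],
                "offsets": [[start, end]],
            }
        )
    return entities


def build_page(document_id: str, text: str, spans: List[Span]) -> Dict:
    """Wrap one document and its mentions in a BigBio-like layout."""
    return {
        "id": document_id,
        "document_id": document_id,
        "passages": [
            {"id": "0", "type": "paragraph", "text": [text], "offsets": [[0, len(text)]]}
        ],
        "entities": spans_to_entities(text, spans),
        "events": [],
        "coreferences": [],
        "relations": [],
    }


text = "Laboratory testing showed proteinuria and suspected PET."
page = build_page("doc_001", text, [(26, 37, "Disorders"), (52, 55, "Disorders")])
print([e["text"][0] for e in page["entities"]])
```

Deriving each mention string from its offsets, rather than passing it separately, keeps the two from drifting apart when the document text is edited.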
+## Usage
+### Loading the model
+```python
+import torch
+from transformers import AutoModelForCausalLM
+model = AutoModelForCausalLM.from_pretrained(
+    "AnonymousARR42/LongBEL_8B_QUAERO_EMEA",
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+```
+### Inference example
+The model expects BigBio-like documents. Each entity should include a mention text, character offsets, and a semantic group in the `type` field.
+```python
+num_beams = 5
+bigbio_pages = [
+    {
+        "id": "001",
+        "document_id": "doc_001",
+        "passages": [
+            {
+                "id": "0",
+                "type": "paragraph",
+                "text": [
+                    "A 29-year-old pregnant woman presented with severe-range hypertension, "
+                    "headache, and epigastric pain. Laboratory testing showed proteinuria "
+                    "and mildly elevated liver enzymes. She was admitted overnight with "
+                    "suspected PET and was started on urgent treatment."
+                ],
+                "offsets": [[0, 257]],
+            }
+        ],
+        "entities": [
+            {
+                "id": "T1",
+                "type": "Living Beings",
+                "text": ["pregnant woman"],
+                "offsets": [[14, 28]],
+            },
+            {
+                "id": "T2",
+                "type": "Disorders",
+                "text": ["severe-range hypertension"],
+                "offsets": [[44, 69]],
+            },
+            {
+                "id": "T3",
+                "type": "Disorders",
+                "text": ["proteinuria"],
+                "offsets": [[128, 139]],
+            },
+            {
+                "id": "T4",
+                "type": "Disorders",
+                "text": ["PET"],
+                "offsets": [[217, 220]],
+            },
+        ],
+        "events": [],
+        "coreferences": [],
+        "relations": [],
+    }
+]
+predictions = model.sample(
+    bigbio_pages=bigbio_pages,
+    num_beams=num_beams,
+)
+for i in range(0, len(predictions), num_beams):
+    mention = predictions[i]["mention"]
+    print(f"## Mention {(i // num_beams) + 1}: {mention}")
+    for j in range(num_beams):
+        pred = predictions[i + j]
+        print(
+            f"   - Beam {j + 1}:\n"
+            f"     Predicted concept name: {pred['pred_concept_name']}\n"
+            f"     Predicted code: {pred['pred_concept_code']}\n"
+            f"     Beam score: {pred['beam_score']:.3f}\n"
+        )
+```
+**Example Output:**
+```text
+## Mention 1: pregnant woman
+   - Beam 1:
+   - Predicted concept name:Pregnant Woman
+   - Predicted code: C0033011
+   - Beam score: 1.000
+   - Beam 2:
+   - Predicted concept name:Pregnant woman
+   - Predicted code: C0033011
+   - Beam score: 0.003
+   - Beam 3:
+   - Predicted concept name:Pregnant woman (person)
+   - Predicted code: C0033011
+   - Beam score: 0.001
+   - Beam 4:
+   - Predicted concept name:Pregnancy Partner
+   - Predicted code: C3538996
+   - Beam score: 0.000
+   - Beam 5:
+   - Predicted concept name:Pregnant woman (person)
+   - Predicted code: C0033011
+   - Beam score: 0.000
+## Mention 2: severe-range hypertension
+   - Beam 1:
+   - Predicted concept name:Hypertensive disease
+   - Predicted code: C0020538
+   - Beam score: 0.078
+   - Beam 2:
+   - Predicted concept name:Hypertension (in some patients)
+   - Predicted code: C3280936
+   - Beam score: 0.022
+   - Beam 3:
+   - Predicted concept name:Hypertensive disease (disorder)
+   - Predicted code: C0020538
+   - Beam score: 0.010
+   - Beam 4:
+   - Predicted concept name:Hypertension, severe
+   - Predicted code: C4013784
+   - Beam score: 0.010
+   - Beam 5:
+   - Predicted concept name:Hypertension (patient A)
+   - Predicted code: C4313262
+   - Beam score: 0.004
+## Mention 3: proteinuria
+   - Beam 1:
+   - Predicted concept name:Proteinurias
+   - Predicted code: C0033687
+   - Beam score: 1.000
+   - Beam 2:
+   - Predicted concept name:Proteinuric diabetic nephropathy (disorder)
+   - Predicted code: C0403519
+   - Beam score: 0.003
+   - Beam 3:
+   - Predicted concept name:Proteinuria
+   - Predicted code: C0033687
+   - Beam score: 0.003
+   - Beam 4:
+   - Predicted concept name:Proteinuric diabetic nephropathy
+   - Predicted code: C0403519
+   - Beam score: 0.002
+   - Beam 5:
+   - Predicted concept name:Proteinuric hypertension of pregnancy (disorder)
+   - Predicted code: C0032914
+   - Beam score: 0.001
+## Mention 4: PET
+   - Beam 1:
+   - Predicted concept name:PET - Pre-eclamptic toxemia
+   - Predicted code: C0032914
+   - Beam score: 0.075
+   - Beam 2:
+   - Predicted concept name:PET - Pre-eclamptic toxaemia
+   - Predicted code: C0032914
+   - Beam score: 0.039
+   - Beam 3:
+   - Predicted concept name:Preeclamptic toxemia
+   - Predicted code: C2931877
+   - Beam score: 0.027
+   - Beam 4:
+   - Predicted concept name:Preeclampsia
+   - Predicted code: C0032914
+   - Beam score: 0.023
+   - Beam 5:
+   - Predicted concept name:Preeclampsia with Severe Features
+   - Predicted code: C0341950
+   - Beam score: 0.019
+```
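In the output above, several beams often map to the same concept code because they decode different synonyms of one concept. One possible post-processing step, sketched below, keeps the best-scoring beam per code so that each mention gets a ranked list of distinct concepts. `dedupe_beams` is a hypothetical helper; it only assumes the flat prediction list with the `mention`, `pred_concept_code`, and `beam_score` keys used in the example.

```python
def dedupe_beams(predictions, num_beams):
    """Group beams by mention and keep the best-scoring beam per concept code."""
    ranked = []
    for i in range(0, len(predictions), num_beams):
        beams = predictions[i : i + num_beams]
        best = {}
        for pred in beams:
            code = pred["pred_concept_code"]
            if code not in best or pred["beam_score"] > best[code]["beam_score"]:
                best[code] = pred
        ordered = sorted(best.values(), key=lambda p: p["beam_score"], reverse=True)
        ranked.append({
            "mention": beams[0]["mention"],
            "candidates": [(p["pred_concept_code"], p["beam_score"]) for p in ordered],
        })
    return ranked


# Codes and scores copied from the "PET" mention in the example output.
pet_beams = [
    {"mention": "PET", "pred_concept_code": "C0032914", "beam_score": 0.075},
    {"mention": "PET", "pred_concept_code": "C0032914", "beam_score": 0.039},
    {"mention": "PET", "pred_concept_code": "C2931877", "beam_score": 0.027},
    {"mention": "PET", "pred_concept_code": "C0032914", "beam_score": 0.023},
    {"mention": "PET", "pred_concept_code": "C0341950", "beam_score": 0.019},
]

ranked = dedupe_beams(pet_beams, num_beams=5)
print(ranked[0]["candidates"])
```

Taking the maximum rather than summing scores is a deliberate choice here: the beam scores are not guaranteed to form a probability distribution over beams, so the sketch avoids treating them as one.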
+## Evaluation
+Entity linking performance is reported using Recall@1 with bootstrap confidence intervals. The best result is shown in **bold**, and the second-best result is <u>underlined</u>.
+| Model | MM-ST21PV<br>(English) | QUAERO-EMEA<br>(French) | SympTEMIST<br>(Spanish) | DisTEMIST<br>(Spanish) | MedProcNER<br>(Spanish) |
+| :--- | :---: | :---: | :---: | :---: | :---: |
+| **Context-Free BEL** ||||| |
+| SciSpacy | 53.8 ± 1.0 | 37.1 ± 4.3 | 9.8 ± 1.3 | 21.1 ± 1.9 | 10.3 ± 1.2 |
+| SapBERT | 65.6 ± 1.0 | 59.7 ± 3.8 | 34.2 ± 2.0 | 38.6 ± 2.6 | 30.4 ± 2.1 |
+| CODER-all | 62.9 ± 1.1 | 66.9 ± 4.0 | 42.2 ± 2.2 | 47.0 ± 2.6 | 42.7 ± 2.1 |
+| SapBERT-all | 64.6 ± 1.1 | 67.9 ± 3.9 | 49.8 ± 2.4 | 49.6 ± 2.6 | 45.1 ± 2.2 |
+| BERGAMOT | 60.9 ± 1.1 | 63.8 ± 4.9 | 48.0 ± 2.7 | 48.9 ± 2.4 | 42.3 ± 2.2 |
+| **Local-Context BEL** ||||| |
+| ArboEL | 76.9 ± 0.9 | 63.0 ± 3.9 | 55.4 ± 2.5 | 54.7 ± 2.6 | 59.7 ± 2.6 |
+| GENRE / mBART-large | 69.6 ± 1.0 | 69.3 ± 5.4 | 59.8 ± 2.7 | 58.7 ± 2.7 | 66.0 ± 2.3 |
+| GENRE / Llama-1B | 73.1 ± 1.0 | 75.1 ± 3.6 | 60.5 ± 2.4 | 62.5 ± 2.3 | 67.4 ± 2.1 |
+| GENRE / Llama-8B | 75.0 ± 0.9 | 73.8 ± 4.0 | 61.7 ± 2.5 | 63.2 ± 2.5 | 68.3 ± 2.2 |
+| **Global-Context BEL: LongBEL** ||||| |
+| LongBEL-1B | 77.6 ± 0.9 | 74.5 ± 3.7 | 59.8 ± 2.5 | 61.9 ± 2.4 | 66.6 ± 2.1 |
+| LongBEL-1B + Ensemble | 78.6 ± 0.8 | <u>77.2 ± 3.0</u> | 61.8 ± 2.5 | <u>64.3 ± 2.2</u> | <u>69.0 ± 2.0</u> |
+| **LongBEL-8B** | <u>79.3 ± 0.8</u> | 75.4 ± 3.4 | <u>62.0 ± 2.6</u> | 63.6 ± 2.1 | <u>69.0 ± 2.1</u> |
+| LongBEL-8B + Ensemble | **80.0 ± 0.8** | **77.6 ± 3.0** | **63.3 ± 2.5** | **65.8 ± 2.2** | **71.0 ± 2.0** |
+The score reported for this checkpoint is the **single LongBEL-8B model**. The ensemble result requires fusing several LongBEL input configurations and is not produced by this checkpoint alone.
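For readers who want to compute a comparable metric on their own predictions, the sketch below shows Recall@1 with a percentile bootstrap interval. It is an assumption-laden illustration, not the evaluation script behind the table: it resamples individual mentions, uses 1,000 resamples and a 95% interval, and the function names are hypothetical.

```python
import random


def recall_at_1(gold_codes, top1_codes):
    """Fraction of mentions whose top-ranked code equals the gold code."""
    hits = sum(g == p for g, p in zip(gold_codes, top1_codes))
    return hits / len(gold_codes)


def bootstrap_ci(gold_codes, top1_codes, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Recall@1, resampling mentions with replacement."""
    rng = random.Random(seed)
    n = len(gold_codes)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(sum(gold_codes[i] == top1_codes[i] for i in idx) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Illustrative codes only; three of the four predictions match the gold code.
gold = ["C0033011", "C0020538", "C0033687", "C0032914"]
pred = ["C0033011", "C0020538", "C0033687", "C2931877"]

score = recall_at_1(gold, pred)
low, high = bootstrap_ci(gold, pred)
print(f"Recall@1 = {score:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```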
+## Speed and Memory
+Measured on a single NVIDIA H100 80GB GPU.
+| Model                   | Model memory | Candidate memory |           Speed |
+| ----------------------- | -----------: | ---------------: | --------------: |
+| GENRE-Llama-8B baseline |      28.6 GB |           5.4 GB | 38.2 mentions/s |
+| LongBEL-8B              |      28.6 GB |           5.4 GB | 15.2 mentions/s |
+LongBEL has the same model memory footprint as the sentence-level GENRE-Llama-8B baseline, but it is about 2.5 times slower (15.2 vs. 38.2 mentions/s) because it processes longer contexts and updates document-level memory during inference.
+## Limitations
+This model assumes that mention spans and semantic groups are given. It does not perform mention detection.
+LongBEL is most useful when concepts recur within a document. When most concepts appear only once, the memory mechanism has less information to exploit.
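A quick way to estimate whether a corpus has enough repetition for the memory mechanism to matter is to measure how often gold concepts recur inside a document. The sketch below assumes annotated BigBio-like pages whose entities carry a `normalized` list with a `db_id` field; `recurrence_rate` is a hypothetical helper, and the example annotations are illustrative.

```python
from collections import Counter


def recurrence_rate(page):
    """Fraction of annotated mentions whose concept occurs more than once in the page."""
    codes = [
        entity["normalized"][0]["db_id"]
        for entity in page.get("entities", [])
        if entity.get("normalized")
    ]
    if not codes:
        return 0.0
    counts = Counter(codes)
    return sum(1 for code in codes if counts[code] > 1) / len(codes)


# Illustrative page: one concept is mentioned twice, another only once.
annotated_page = {
    "entities": [
        {"id": "T1", "normalized": [{"db_id": "C0020538"}]},
        {"id": "T2", "normalized": [{"db_id": "C0033687"}]},
        {"id": "T3", "normalized": [{"db_id": "C0020538"}]},
    ]
}

print(f"{recurrence_rate(annotated_page):.2f}")
```

A rate close to zero suggests that most concepts appear only once, which is the regime where document-level memory has little to exploit.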
+Because LongBEL uses previous predictions as memory, early mistakes can still influence later predictions. Robust memory training reduces this risk but does not remove it completely.
+This model is intended for research use. It should not be used for clinical decision-making without additional validation and human oversight.
+## Reproducibility
+Code and evaluation scripts are available in this [anonymized repository](https://anonymous.4open.science/r/LongBEL-31AD).
+Trained model checkpoints and processed datasets are available in the anonymous Hugging Face collection associated with LongBEL.
+<!-- ## Citation
+If you use this model, please cite the LongBEL paper.
+```bibtex
+@inproceedings{longbel2026,
+  title = {LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking},
+  author = {Anonymous},
+  booktitle = {Anonymous submission},
+  year = {2026}
+}
+``` -->

__init__.py ADDED Viewed

	@@ -0,0 +1,4 @@

+# __init__.py
+from .longbel import LLamaLongBEL, LLamaLongBELConfig
+__all__ = ["LLamaLongBEL", "LLamaLongBELConfig"]

candidate_trie.pkl ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d060c96d86bbd5f0531a1eca465a4f645c8fd85fc4c47e2fb5197a5795d053b6
+size 298349465

chat_template.jinja ADDED Viewed

	@@ -0,0 +1,5 @@

+{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
+'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>
+' }}

config.json ADDED Viewed

	@@ -0,0 +1,40 @@

+{
+  "architectures": [
+    "LLamaLongBEL"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 128000,
+  "dtype": "bfloat16",
+  "eos_token_id": 128009,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 131072,
+  "mlp_bias": false,
+  "model_type": "llama_longbel",
+  "auto_map": {
+    "AutoConfig": "longbel.LLamaLongBELConfig",
+    "AutoModelForCausalLM": "longbel.LLamaLongBEL"
+  },
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "pad_token_id": 128009,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": {
+    "factor": 8.0,
+    "high_freq_factor": 4.0,
+    "low_freq_factor": 1.0,
+    "original_max_position_embeddings": 8192,
+    "rope_type": "llama3"
+  },
+  "rope_theta": 500000.0,
+  "tie_word_embeddings": false,
+  "transformers_version": "4.57.1",
+  "use_cache": true,
+  "vocab_size": 128257
+}

generation_config.json ADDED Viewed

	@@ -0,0 +1,14 @@

+{
+  "bos_token_id": 128000,
+  "do_sample": true,
+  "eos_token_id": [
+    128009,
+    128001,
+    128008,
+    128009
+  ],
+  "pad_token_id": 128009,
+  "temperature": 0.6,
+  "top_p": 0.9,
+  "transformers_version": "4.57.1"
+}

longbel.py ADDED Viewed

	@@ -0,0 +1,981 @@

+"""
+Core models for LongBEL
+"""
+# Copyright (c) Facebook, Inc. and its affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+import json
+import logging
+import os
+import pickle
+import re
+from html import escape
+from typing import Optional
+import nltk
+import torch
+import torch.nn.functional as F
+from huggingface_hub import hf_hub_download
+from tqdm.auto import tqdm
+from transformers import (
+    AutoTokenizer,
+    LlamaForCausalLM,
+    PretrainedConfig,
+)
+logger = logging.getLogger(__name__)
+logging.basicConfig(
+    level=logging.INFO,  # Display INFO and above
+    format="%(levelname)s - %(message)s",
+)
+# Define a simple config class that inherits from PretrainedConfig
+class LLamaLongBELConfig(PretrainedConfig):
+    model_type = "llama_longbel"
+    def __init__(self, **kwargs):
+        # Ensure it has llama as base
+        kwargs.setdefault("model_type", "llama")
+        super().__init__(**kwargs)
+def clean_natural(text):
+    return (
+        text.replace("\xa0", " ")
+        .replace("{", "(")
+        .replace("}", ")")
+        .replace("[", "(")
+        .replace("]", ")")
+        .replace("\n", " ")
+    )
+def parse_text(
+    data,
+    start_entity,
+    end_entity,
+    start_group,
+    end_group,
+    nlp,
+) -> tuple[list[str], list[str], list[dict[str, str]]]:
+    """Create simple (source, target) pairs per entity.
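+    For each entity in the BigBio page, returns one pair where:
+      - source: the entity's passage with the mention wrapped in the
+        `start_entity` / `end_entity` markers, surrounded by the other passages
+      - target: the prompt prefix made of the marked mention followed by its
+        semantic group wrapped in `start_group` / `end_group`.
+    Also returns one metadata dict per entity (span, mention, semantic group and,
+    when a `normalized` annotation exists, the gold concept code and name).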
+    """
+    source_sentences: list[str] = []
+    tsv_lines: list[dict[str, str]] = []
+    target_texts_dict: dict[tuple[tuple[int, int], ...], str] = {}
+    source_texts_dict: dict[tuple[tuple[int, int], ...], str] = {}
+    tsv_lines_dict: dict[tuple[tuple[int, int], ...], dict[str, str]] = {}
+    all_passages = {}
+    for i, passage in enumerate(data.get("passages", [])):
+        all_passages[i] = clean_natural(passage["text"][0])
+    for passage_id, passage in enumerate(data.get("passages", [])):
+        passage_text = passage["text"][0]
+        start_offset_passage = passage["offsets"][0][0]
+        end_offset_passage = passage["offsets"][0][1]
+        passage_text = clean_natural(passage_text)
+        # Iterate over entities and emit one pair per entity found in this passage
+        for entity in data.get("entities", []):
+            # min and max of all entity offsets to get the global span of the entity for filtering sentences
+            global_start = min(off[0] for off in entity["offsets"])
+            global_end = max(off[1] for off in entity["offsets"])
+            # Keep only entities whose start falls inside this passage
+            if not (start_offset_passage <= global_start < end_offset_passage):
+                continue
+            entity_text = " ".join(entity["text"])
+            entity_text = clean_natural(entity_text)
+            # Define entity group
+            group_annotation = entity.get("type")
+            # Get all offsets, convert to passage-relative, and keep those starting in this passage
+            relative_entity_spans = []
+            for off in entity["offsets"]:
+                global_start_off, global_end_off = off
+                if not (start_offset_passage <= global_start_off < end_offset_passage):
+                    continue
+                rel_start_off = global_start_off - start_offset_passage
+                rel_end_off = global_end_off - start_offset_passage
+                relative_entity_spans.append((rel_start_off, rel_end_off))
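+            # Insert markers from the last span to the first so that earlier
+            # offsets are not shifted by markers that were already inserted.
+            relative_entity_spans.sort(key=lambda x: x[0], reverse=True)
+            marked_text = passage_text
+            for start_in_sent, end_in_sent in relative_entity_spans: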
+                marked_text = (
+                    marked_text[:start_in_sent]
+                    + start_entity
+                    + marked_text[start_in_sent:end_in_sent]
+                    + end_entity
+                    + marked_text[end_in_sent:]
+                )
+            for other_passage_id, other_passage_text in all_passages.items():
+                if other_passage_id < passage_id:
+                    marked_text = other_passage_text + "\n" + marked_text
+                elif other_passage_id > passage_id:
+                    marked_text = marked_text + "\n" + other_passage_text
+            # Emit the pair
+            doc_id = data.get("id", "")
+            tsv_line = {
+                "doc_id": doc_id,
+                "semantic_group": group_annotation,
+                "start_span": global_start,
+                "end_span": global_end,
+                "mention": entity_text,
+            }
+            if entity.get("normalized"):
+                tsv_line["gold_concept_code"] = entity["normalized"][0]["db_id"]
+                tsv_line["gold_concept_name"] = entity["normalized"][0]["db_match"]
+            tsv_lines_dict[(global_start, global_end)] = tsv_line
+            source_texts_dict[(global_start, global_end)] = marked_text
+            target_entity_text = (
+                start_entity
+                + entity_text
+                + end_entity
+                + start_group
+                + group_annotation
+                + end_group
+            )
+            target_texts_dict[(global_start, global_end)] = target_entity_text
+    # Sort keys to have a deterministic order
+    target_texts = []
+    sorted_keys = sorted(tsv_lines_dict.keys(), key=lambda x: (x[0], x[1]))
+    for entity_id, entity_span in enumerate(sorted_keys):
+        tsv_line = tsv_lines_dict[entity_span]
+        tsv_line["mention_id"] = f"{data.get('id', '')}.{entity_id + 1}"
+        tsv_lines.append(tsv_line)
+        source_sentences.append(source_texts_dict[entity_span])
+        target_texts.append(target_texts_dict[entity_span])
+    return source_sentences, target_texts, tsv_lines  # type: ignore
+def get_prefix_allowed_tokens_fn(
+    model,
+    sources: list[str],
+    sem_groups: list[str],
+    multiple_answers: bool = False,
+):
+    candidates_trie = model.candidate_trie  # type: ignore
+    sep_token_id = model.tokenizer.sep_token_id
+    eos_token_id = model.tokenizer.eos_token_id
+    pad_token_id = model.tokenizer.pad_token_id
+    plus_token_id = model.tokenizer.convert_tokens_to_ids("<+>")  # type: ignore
+    end_group_token_id = model.tokenizer.convert_tokens_to_ids("}")  # type: ignore
+    def prefix_allowed_tokens_fn(batch_id, sent):
+        sent = sent.tolist()
+        if len(sent) > 1 and sent[-1] in [eos_token_id, pad_token_id, sep_token_id]:
+            if sep_token_id:
+                return [sep_token_id, pad_token_id, eos_token_id]
+            else:
+                return [pad_token_id, eos_token_id]
+        # Remove the prefix from the sent
+        index_sep = len(sent) - 1 - sent[::-1].index(end_group_token_id)
+        sent = sent[index_sep:]
+        sem_group = sem_groups[batch_id]
+        # With multiple answers, restart the trie lookup after the last "<+>" separator
+        if multiple_answers and plus_token_id in sent:
+            index_plus = len(sent) - 1 - sent[::-1].index(plus_token_id)
+            # Start fresh with decoder start
+            if index_plus == len(sent) - 1:
+                sent = [end_group_token_id]
+            # If there are tokens after the last plus_token_id, keep them
+            else:
+                sent = [end_group_token_id] + sent[index_plus + 1 :]
+        trie_out = candidates_trie[
+            sem_group  # type: ignore
+        ].get(sent)
+        if eos_token_id in trie_out:
+            if sep_token_id:
+                trie_out += [sep_token_id]
+            if multiple_answers:
+                trie_out += [plus_token_id]
+        elif not trie_out:
+            if sep_token_id:
+                return [sep_token_id, pad_token_id, eos_token_id]
+            else:
+                return [pad_token_id, eos_token_id]
+        return trie_out
+    return prefix_allowed_tokens_fn
+def add_headers_to_prompt(source: str, target: str, previous_targets: str):
+    if not previous_targets:
+        previous_targets = "None"
+    input_sentence = f"### Context\n{source.rstrip()}\n\n### Previous Normalizations\n{previous_targets.rstrip()}\n\n### Prediction\n{target.rstrip()}"
+    return input_sentence
+def parse_prediction(
+    outputs: list[str],
+    sem_groups: list[str],
+    text_to_code: Optional[dict[str, dict[str, str]]] = None,
+    multiple_answers: bool = False,
+) -> tuple[list[str], list[str]]:
+    codes = []
+    predictions = []
+    for output, group in zip(outputs, sem_groups):
+        splits = output.split("} ")  # type: ignore
+        if len(splits) > 1 and splits[-1].strip():
+            prediction = splits[-1].strip().replace("<SEP>", "")
+            if text_to_code:
+                if multiple_answers:
+                    prediction_list = prediction.split("<+>")  # type: ignore
+                    code_list = set()
+                    for pred in prediction_list:
+                        code_list.add(text_to_code[group].get(pred.strip(), "NO_CODE"))
+                    if len(code_list) > 1 and "NO_CODE" in code_list:
+                        code_list.remove("NO_CODE")
+                    code = "+".join(code_list)
+                else:
+                    code = text_to_code[group].get(prediction, "NO_CODE")
+            else:
+                code = "NO_CODE"
+        else:
+            print(
+                "IndexError: splitting failed or empty prediction, adding empty string as prediction."
+            )
+            prediction = "NO_PREDICTION"
+            code = "NO_CODE"
+        codes.append(code)
+        predictions.append(prediction)
+    return codes, predictions
+def compute_score(outputs, tokenizer, prefix_len=0):
+    sequences = outputs.sequences  # (N, seq_len)
+    scores = outputs.scores  # list length T = # generated tokens
+    N, total_len = sequences.shape
+    T = len(scores)
+    # keep only the generated part (completion)
+    sequences = sequences[:, prefix_len : prefix_len + T]
+    # Make sure score is not longer than sequences
+    if len(scores) > sequences.size(1):
+        scores = scores[: sequences.size(1)]
+    # Compute as usual but now only for completion tokens
+    mask = (
+        (sequences != tokenizer.pad_token_id)
+        & (sequences != tokenizer.eos_token_id)
+        & (sequences != tokenizer.bos_token_id)
+    )
+    # log-prob for each generated token
+    logprob_steps = []
+    for t, logits in enumerate(scores):
+        log_probs_t = F.log_softmax(logits, dim=-1)
+        token_t = sequences[:, t]
+        idx = torch.arange(N, device=sequences.device)
+        logprob_steps.append(log_probs_t[idx, token_t])
+    logprobs = torch.stack(logprob_steps, dim=1)
+    logprobs.masked_fill_(~mask, 0)
+    lengths = mask.sum(dim=1).clamp(min=1)
+    confidence = torch.exp(logprobs.sum(dim=1) / lengths)
+    return confidence.tolist()
+def skip_undesired_tokens(outputs, tokenizer):
+    sep_token = "<SEP>"
+    plus_token = "<+>"
+    # Build the list of special tokens to remove
+    tokens_to_remove = tokenizer.all_special_tokens[:2]
+    cleaned_outputs = []
+    for sequence in outputs:
+        # Remove undesired special tokens
+        for token in tokens_to_remove:
+            sequence = sequence.replace(token, "")
+        # Remove spaces *immediately* after the sep_token and plus_token (e.g. "<SEP>  text" → "<SEP>text")
+        sequence = re.sub(rf"({re.escape(plus_token)})\s+", r"\1", sequence)
+        sequence = re.sub(rf"({re.escape(sep_token)})\s+", r"\1", sequence)
+        cleaned_outputs.append(sequence.strip())
+    return cleaned_outputs
+def _score_to_rgb(score: float) -> tuple[int, int, int]:
+    clipped_score = max(0.0, min(1.0, score))
+    red = 255
+    channel = int(255 * (1.0 - clipped_score))
+    return red, channel, channel
+def _build_ansi_saliency_text(
+    token_texts: list[str], saliency_scores: list[float]
+) -> str:
+    chunks = []
+    for token_text, score in zip(token_texts, saliency_scores):
+        red, green, blue = _score_to_rgb(score)
+        chunks.append(f"\x1b[48;2;{red};{green};{blue}m{token_text}\x1b[0m")
+    return "".join(chunks)
+def _build_html_saliency_text(
+    token_texts: list[str], saliency_scores: list[float]
+) -> str:
+    chunks = []
+    for token_text, score in zip(token_texts, saliency_scores):
+        red, green, blue = _score_to_rgb(score)
+        chunks.append(
+            f'<span style="background-color: rgb({red}, {green}, {blue});">{escape(token_text)}</span>'
+        )
+    return "".join(chunks)
+class LLamaLongBEL(LlamaForCausalLM):
+    config_class = LLamaLongBELConfig
+    def __init__(self, config, *args, **kwargs):
+        # Initialize the parent LlamaForCausalLM
+        super().__init__(config, *args, **kwargs)
+        # Store language from config
+        self.lang = getattr(config, "lang", "en")
+        self.text_to_code = None
+        self.candidate_trie = None
+        self.tokenizer = None
+    @classmethod
+    def from_pretrained(
+        cls,
+        pretrained_model_name_or_path,
+        *args,
+        lang=None,
+        text_to_code_path=None,
+        candidate_trie_path=None,
+        **kwargs,
+    ):
+        # Remove custom kwargs before passing to parent
+        custom_kwargs = {
+            "lang": lang,
+            "text_to_code_path": text_to_code_path,
+            "candidate_trie_path": candidate_trie_path,
+        }
+        # Call parent's from_pretrained
+        model = super().from_pretrained(
+            pretrained_model_name_or_path,
+            *args,
+            **{k: v for k, v in kwargs.items() if k not in custom_kwargs},
+        )
+        # Set up tokenizer
+        model.tokenizer = AutoTokenizer.from_pretrained(
+            pretrained_model_name_or_path, use_fast=True
+        )
+        model.tokenizer.padding_side = "left"
+        # Set language: explicit override > config > default
+        if lang is not None:
+            model.lang = lang
+        elif hasattr(model.config, "lang"):
+            model.lang = model.config.lang
+        else:
+            model.lang = "en"
+        logger.info(f"Model language set to: {model.lang}")
+        # Load text_to_code
+        text_to_code_file_local = (
+            text_to_code_path
+            if text_to_code_path is not None
+            else os.path.join(pretrained_model_name_or_path, "text_to_code.json")
+        )
+        try:
+            if os.path.exists(text_to_code_file_local):
+                with open(text_to_code_file_local, encoding="utf-8") as f:
+                    model.text_to_code = json.load(f)
+                logger.info(
+                    f"Loaded text_to_code.json from local path: {text_to_code_file_local}"
+                )
+            else:
+                text_to_code_path_hf = hf_hub_download(
+                    repo_id=pretrained_model_name_or_path,
+                    filename="text_to_code.json",
+                )
+                with open(text_to_code_path_hf, encoding="utf-8") as f:
+                    model.text_to_code = json.load(f)
+                logger.info(
+                    f"Loaded text_to_code.json from HF Hub: {text_to_code_path_hf}"
+                )
+        except Exception:
+            logger.warning("text_to_code.json not found (local or HF hub)")
+            model.text_to_code = None
+        # Load candidate_trie
+        candidate_trie_file_local = (
+            candidate_trie_path
+            if candidate_trie_path is not None
+            else os.path.join(pretrained_model_name_or_path, "candidate_trie.pkl")
+        )
+        try:
+            if os.path.exists(candidate_trie_file_local):
+                with open(candidate_trie_file_local, "rb") as f:
+                    model.candidate_trie = pickle.load(f)
+                logger.info(
+                    f"Loaded candidate_trie.pkl from local path: {candidate_trie_file_local}"
+                )
+            else:
+                candidate_trie_path_hf = hf_hub_download(
+                    repo_id=pretrained_model_name_or_path,
+                    filename="candidate_trie.pkl",
+                )
+                with open(candidate_trie_path_hf, "rb") as f:
+                    model.candidate_trie = pickle.load(f)
+                logger.info(
+                    f"Loaded candidate_trie.pkl from HF Hub: {candidate_trie_path_hf}"
+                )
+        except Exception:
+            logger.warning("candidate_trie.pkl not found (local or HF hub)")
+            model.candidate_trie = None
+        return model
+    def _compute_gradient_saliency(
+        self,
+        input_sentences: list[str],
+        generated_sequences: torch.Tensor,
+        num_beams: int,
+        prefix_len: int,
+        saliency_method: str = "integrated",
+        ig_steps: int = 20,
+        ig_baseline: str = "pad",
+    ) -> list[dict[str, object]]:
+        if not input_sentences:
+            return []
+        method = saliency_method.strip().lower()
+        if method == "integerated":
+            method = "integrated"
+        if method not in {"simple", "integrated"}:
+            raise ValueError("saliency_method must be one of: 'simple', 'integrated'.")
+        top_sequence_indices = (
+            torch.arange(
+                len(input_sentences),
+                device=generated_sequences.device,
+            )
+            * num_beams
+        )
+        top_sequences = generated_sequences.index_select(0, top_sequence_indices)
+        attention_mask = (top_sequences != self.tokenizer.pad_token_id).long()  # type: ignore
+        input_embeddings = self.get_input_embeddings()(top_sequences).detach()  # type: ignore
+        next_tokens = top_sequences[:, 1:]
+        output_token_mask = torch.zeros_like(next_tokens, dtype=torch.bool)
+        if prefix_len > 0:
+            output_token_mask[:, prefix_len - 1 :] = True
+        valid_token_mask = output_token_mask & (
+            (next_tokens != self.tokenizer.pad_token_id)  # type: ignore
+            & (next_tokens != self.tokenizer.eos_token_id)  # type: ignore
+            & (next_tokens != self.tokenizer.bos_token_id)  # type: ignore
+        )
+        def _objective_from_embeddings(embeddings: torch.Tensor) -> torch.Tensor:
+            forward_outputs = self(  # type: ignore
+                inputs_embeds=embeddings,
+                attention_mask=attention_mask,
+                use_cache=False,
+                return_dict=True,
+            )
+            logits = forward_outputs.logits[:, :-1, :]
+            log_probs = F.log_softmax(logits, dim=-1)
+            token_log_probs = log_probs.gather(
+                dim=-1,
+                index=next_tokens.unsqueeze(-1),
+            ).squeeze(-1)
+            return token_log_probs.masked_select(valid_token_mask).sum()
+        if method == "simple":
+            simple_embeddings = input_embeddings.detach()
+            simple_embeddings.requires_grad_(True)
+            self.zero_grad(set_to_none=True)  # type: ignore
+            with torch.enable_grad():
+                objective = _objective_from_embeddings(simple_embeddings)
+            gradients = torch.autograd.grad(
+                outputs=objective,
+                inputs=simple_embeddings,
+                retain_graph=False,
+                create_graph=False,
+            )[0]
+            token_importance = gradients.norm(p=2, dim=-1)
+        else:
+            if ig_baseline == "pad":  # type: ignore
+                baseline_ids = torch.full_like(
+                    top_sequences,
+                    self.tokenizer.pad_token_id,  # type: ignore
+                )
+                baseline_embeddings = self.get_input_embeddings()(baseline_ids).detach()  # type: ignore
+            elif ig_baseline == "zero":
+                baseline_embeddings = torch.zeros_like(input_embeddings)
+            elif ig_baseline == "random":
+                baseline_embeddings = torch.randn_like(input_embeddings)
+            elif ig_baseline == "avg":
+                baseline_embeddings = input_embeddings.mean(
+                    dim=1, keepdim=True
+                ).expand_as(input_embeddings)
+            else:
+                raise ValueError(
+                    f"Unsupported baseline type '{ig_baseline}'. Choose from 'pad', 'zero', 'random', 'avg'."
+                )
+            embedding_delta = input_embeddings - baseline_embeddings
+            total_gradients = torch.zeros_like(input_embeddings)
+            steps = max(1, ig_steps)
+            for step in range(1, steps + 1):
+                alpha = float(step) / float(steps)
+                interpolated_embeddings = (
+                    baseline_embeddings + alpha * embedding_delta
+                ).detach()
+                interpolated_embeddings.requires_grad_(True)
+                self.zero_grad(set_to_none=True)  # type: ignore
+                with torch.enable_grad():
+                    objective = _objective_from_embeddings(interpolated_embeddings)
+                gradients = torch.autograd.grad(
+                    outputs=objective,
+                    inputs=interpolated_embeddings,
+                    retain_graph=False,
+                    create_graph=False,
+                )[0]
+                total_gradients += gradients.detach()
+            averaged_gradients = total_gradients / float(steps)
+            integrated_gradients = embedding_delta * averaged_gradients
+            token_importance = integrated_gradients.norm(p=2, dim=-1)
+        saliency_maps = []
+        sequence_len = top_sequences.size(1)
+        prompt_positions = torch.arange(sequence_len, device=top_sequences.device)
+        prompt_mask = (prompt_positions.unsqueeze(0) < prefix_len) & (
+            top_sequences != self.tokenizer.pad_token_id  # type: ignore
+        )
+        for sequence_ids, importance_scores, sentence, mask in zip(
+            top_sequences,
+            token_importance,
+            input_sentences,
+            prompt_mask,
+        ):
+            selected_ids = sequence_ids[mask]
+            selected_scores = importance_scores[mask]
+            if selected_scores.numel() == 0:
+                saliency_maps.append({
+                    "input_sentence": sentence,
+                    "token_ids": [],
+                    "token_strings": [],
+                    "saliency_scores": [],
+                    "saliency_method": method,
+                    "saliency_ansi": "",
+                    "saliency_html": "",
+                })
+                continue
+            max_score = selected_scores.max().clamp(min=1e-12)
+            normalized_scores = (selected_scores / max_score).tolist()
+            selected_ids_list = selected_ids.tolist()
+            token_strings = [
+                self.tokenizer.decode(  # type: ignore
+                    [token_id],
+                    skip_special_tokens=False,
+                    clean_up_tokenization_spaces=False,
+                )
+                for token_id in selected_ids_list
+            ]
+            saliency_maps.append({
+                "input_sentence": sentence,
+                "token_ids": selected_ids_list,
+                "token_strings": token_strings,
+                "saliency_scores": normalized_scores,
+                "saliency_method": method,
+                "saliency_ansi": _build_ansi_saliency_text(
+                    token_strings,
+                    normalized_scores,
+                ),
+                "saliency_html": _build_html_saliency_text(
+                    token_strings,
+                    normalized_scores,
+                ),
+            })
+        return saliency_maps
+    def predict_batch(
+        self,
+        all_outputs,
+        batch_size,
+        input_sentences,
+        sem_groups,
+        mentions,
+        mentions_id,
+        doc_ids,
+        start_spans,
+        end_spans,
+        gold_concept_codes,
+        gold_concept_names,
+        constrained,
+        multiple_answers,
+        num_beams,
+        explicability_mode: str = "",
+        ig_steps: int = 20,
+        ig_baseline: str = "pad",
+        **kwargs,
+    ):
+        input_args = {
+            k: v.to(self.device)  # type: ignore
+            for k, v in self.tokenizer.batch_encode_plus(  # type: ignore
+                input_sentences, padding="longest", return_tensors="pt"
+            ).items()
+        }
+        # Constrained decoding
+        prefix_allowed_tokens_fn = None
+        if constrained:
+            if self.candidate_trie is None:  # type: ignore
+                raise ValueError(
+                    "candidate_trie is not loaded in the model. Use constrained=False."
+                )
+            prefix_allowed_tokens_fn = get_prefix_allowed_tokens_fn(
+                model=self,
+                sources=input_sentences,
+                sem_groups=sem_groups,
+                multiple_answers=multiple_answers,
+            )
+        if self.tokenizer.sep_token_id is not None:  # type: ignore
+            eos_token_id = self.tokenizer.sep_token_id  # type: ignore
+        else:
+            eos_token_id = self.tokenizer.eos_token_id  # type: ignore
+        outputs = self.generate(  # type: ignore
+            **input_args,
+            max_new_tokens=128,
+            num_beams=num_beams,
+            num_return_sequences=num_beams,
+            output_scores=True,
+            return_dict_in_generate=True,
+            prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
+            eos_token_id=eos_token_id,  # type: ignore
+            **kwargs,
+        )
+        decoded_sequences = self.tokenizer.batch_decode(  # type: ignore
+            outputs.sequences,  # type: ignore
+            skip_special_tokens=False,
+            clean_up_tokenization_spaces=True,
+        )
+        cleaned_output_sequences = skip_undesired_tokens(
+            decoded_sequences,
+            self.tokenizer,  # type: ignore
+        )
+        prefix_len = input_args["input_ids"].size(1)
+        base_sem_groups = sem_groups.copy()
+        base_mentions = mentions.copy()
+        base_mentions_id = mentions_id.copy()
+        base_doc_ids = doc_ids.copy()
+        base_start_spans = start_spans.copy()
+        base_end_spans = end_spans.copy()
+        base_gold_concept_codes = gold_concept_codes.copy()
+        base_gold_concept_names = gold_concept_names.copy()
+        # Duplicate sem_groups and mentions for each beam
+        sem_groups = [x for x in sem_groups for _ in range(num_beams)]
+        mentions = [x for x in mentions for _ in range(num_beams)]
+        mentions_id = [x for x in mentions_id for _ in range(num_beams)]
+        gold_concept_codes = [x for x in gold_concept_codes for _ in range(num_beams)]  # type: ignore
+        gold_concept_names = [x for x in gold_concept_names for _ in range(num_beams)]  # type: ignore
+        start_spans = [x for x in start_spans for _ in range(num_beams)]
+        end_spans = [x for x in end_spans for _ in range(num_beams)]
+        doc_ids = [x for x in doc_ids for _ in range(num_beams)]
+        # Parse predictions
+        pred_concept_codes, pred_concept_names = parse_prediction(
+            cleaned_output_sequences,
+            sem_groups,
+            self.text_to_code,  # type: ignore
+            multiple_answers=multiple_answers,
+        )
+        scores = compute_score(
+            outputs,
+            self.tokenizer,  # type: ignore
+            prefix_len=prefix_len,
+        )
+        if num_beams > 1:
+            beam_scores = [float(torch.exp(s)) for s in outputs.sequences_scores]  # type: ignore
+        else:
+            # sequences_scores is only populated by beam search.
+            beam_scores = [float("nan")] * len(scores)
+        all_outputs.extend([
+            {
+                "mention": mention,
+                "doc_id": doc_id,
+                "mention_id": mention_id,
+                "start_span": start_span,
+                "end_span": end_span,
+                "semantic_group": group,
+                "gold_concept_code": gold_concept_code,
+                "gold_concept_name": gold_concept_name,
+                "pred_concept_name": pred_concept_name,
+                "pred_concept_code": pred_concept_code,
+                "score": score,
+                "beam_score": beam_score,
+                "rank": rank + 1,
+            }
+            for score, beam_score, pred_concept_code, pred_concept_name, mention, doc_id, mention_id, start_span, end_span, group, gold_concept_code, gold_concept_name, rank in zip(
+                scores,
+                beam_scores,
+                pred_concept_codes,
+                pred_concept_names,
+                mentions,
+                doc_ids,
+                mentions_id,
+                start_spans,
+                end_spans,
+                sem_groups,
+                gold_concept_codes,
+                gold_concept_names,
+                list(range(num_beams)) * len(input_sentences),
+            )
+        ])
+        explicability_mode = explicability_mode.strip().lower()
+        if explicability_mode not in {"", "simple", "integrated"}:
+            raise ValueError(
+                "explicability_mode must be one of: '', 'simple', 'integrated'."
+            )
+        saliency_maps = []
+        if explicability_mode:
+            saliency_maps = self._compute_gradient_saliency(
+                input_sentences=input_sentences,
+                generated_sequences=outputs.sequences,  # type: ignore
+                num_beams=num_beams,
+                prefix_len=prefix_len,
+                saliency_method=explicability_mode,
+                ig_steps=ig_steps,
+                ig_baseline=ig_baseline,
+            )
+            for idx, saliency_map in enumerate(saliency_maps):
+                top_prediction_index = idx * num_beams
+                saliency_map.update({
+                    "mention": base_mentions[idx],
+                    "doc_id": base_doc_ids[idx],
+                    "mention_id": base_mentions_id[idx],
+                    "start_span": base_start_spans[idx],
+                    "end_span": base_end_spans[idx],
+                    "semantic_group": base_sem_groups[idx],
+                    "gold_concept_code": base_gold_concept_codes[idx],
+                    "gold_concept_name": base_gold_concept_names[idx],
+                    "pred_concept_name": pred_concept_names[top_prediction_index],
+                    "pred_concept_code": pred_concept_codes[top_prediction_index],
+                    "score": scores[top_prediction_index],
+                    "rank": 1,
+                })
+        print(f"Batch completed. {len(all_outputs)} predictions accumulated so far.")
+        return all_outputs, cleaned_output_sequences, saliency_maps
+    def sample(
+        self,
+        bigbio_pages: list[dict],  # type: ignore
+        num_beams: int = 5,
+        constrained: bool = True,
+        explicability_mode: str = "",
+        multiple_answers: bool = False,
+        batch_size: int = 8,
+        start_entity: str = "[",
+        end_entity: str = "]",
+        start_group: str = "{",
+        end_group: str = "}",
+        show_progress: bool = True,
+        **kwargs,
+    ) -> (
+        list[dict[str, object]]
+        | tuple[list[dict[str, object]], list[dict[str, object]]]
+    ):
+        explicability_mode = explicability_mode.strip().lower()
+        if explicability_mode not in {"", "simple", "integrated"}:
+            raise ValueError(
+                "explicability_mode must be one of: '', 'simple', 'integrated'."
+            )
+        # Prepare input batch
+        if self.lang == "fr":  # type: ignore
+            nlp = nltk.data.load("tokenizers/punkt/french.pickle")
+        elif self.lang == "en":  # type: ignore
+            nlp = nltk.data.load("tokenizers/punkt/english.pickle")
+        elif self.lang == "es":  # type: ignore
+            nlp = nltk.data.load("tokenizers/punkt/spanish.pickle")
+        else:
+            raise ValueError(f"Unsupported language: {self.lang}")  # type: ignore
+        print(
+            f"Starting sampling on {len(bigbio_pages)} pages (lang={getattr(self, 'lang', 'unknown')}, constrained={constrained}, beams={num_beams}, batch_size={batch_size})"
+        )
+        def _progress(
+            iterable, desc: str, total: Optional[int] = None, show: bool = True
+        ):
+            if show:
+                return tqdm(iterable, desc=desc, total=total)
+            return iterable
+        all_outputs = []
+        all_sources = []
+        all_targets = []
+        all_entities_info = []
+        for data in bigbio_pages:
+            sources, targets, entities_info = parse_text(
+                data=data,
+                start_entity=start_entity,
+                end_entity=end_entity,
+                start_group=start_group,
+                end_group=end_group,
+                nlp=nlp,  # type: ignore
+            )
+            all_sources.append(sources)
+            all_targets.append(targets)
+            all_entities_info.append(entities_info)
+        def _build_sequential_batches():
+            # Keep per-page order while still processing multiple pages per batch.
+            page_positions = [0] * len(all_sources)
+            next_page_idx = 0
+            active_pages = []
+            batches = []
+            while active_pages or next_page_idx < len(all_sources):
+                while len(active_pages) < batch_size and next_page_idx < len(
+                    all_sources
+                ):
+                    if len(all_sources[next_page_idx]) > 0:
+                        active_pages.append(next_page_idx)
+                    next_page_idx += 1
+                if not active_pages:
+                    break
+                batch = []
+                next_active_pages = []
+                for page_idx in active_pages:
+                    item_idx = page_positions[page_idx]
+                    batch.append((
+                        all_sources[page_idx][item_idx],
+                        all_targets[page_idx][item_idx],
+                        all_entities_info[page_idx][item_idx],
+                    ))
+                    page_positions[page_idx] += 1
+                    if page_positions[page_idx] < len(all_sources[page_idx]):
+                        next_active_pages.append(page_idx)
+                batches.append(batch)
+                active_pages = next_active_pages
+            return batches
+        all_batches = _build_sequential_batches()
+        print(
+            f"Input preparation completed. Running generation on {len(all_batches)} batches."
+        )
+        all_outputs = []
+        all_saliency_maps = []
+        batch_previous_targets = {}
+        for batch in _progress(
+            all_batches,
+            desc="Processing batches",
+            total=len(all_batches),
+            show=show_progress,
+        ):
+            input_sentences = []
+            sem_groups = []
+            mentions = []
+            doc_ids = []
+            mentions_id = []
+            gold_concept_codes = []
+            gold_concept_names = []
+            start_spans = []
+            end_spans = []
+            for source, target, entity in batch:
+                doc_id = entity["doc_id"]
+                previous_targets = batch_previous_targets.setdefault(doc_id, "")
+                input_sentences.append(
+                    add_headers_to_prompt(
+                        source,
+                        target,
+                        previous_targets,  # type: ignore
+                    )
+                )
+                sem_groups.append(entity["semantic_group"])
+                mentions.append(entity["mention"])
+                doc_ids.append(doc_id)
+                mentions_id.append(entity["mention_id"])
+                start_spans.append(entity["start_span"])
+                end_spans.append(entity["end_span"])
+                gold_concept_codes.append(entity.get("gold_concept_code", None))  # type: ignore
+                gold_concept_names.append(entity.get("gold_concept_name", None))  # type: ignore
+            all_outputs, cleaned_output_sequences, batch_saliency_maps = (
+                self.predict_batch(
+                    all_outputs=all_outputs,
+                    batch_size=batch_size,
+                    input_sentences=input_sentences,
+                    sem_groups=sem_groups,
+                    mentions=mentions,
+                    mentions_id=mentions_id,
+                    doc_ids=doc_ids,
+                    start_spans=start_spans,
+                    end_spans=end_spans,
+                    gold_concept_codes=gold_concept_codes,
+                    gold_concept_names=gold_concept_names,
+                    constrained=constrained,
+                    multiple_answers=multiple_answers,
+                    num_beams=num_beams,
+                    explicability_mode=explicability_mode,
+                    **kwargs,
+                )
+            )
+            if explicability_mode:
+                all_saliency_maps.extend(batch_saliency_maps)
+            for i, doc_id in enumerate(doc_ids):
+                clean_sentence = cleaned_output_sequences[num_beams * i]
+                clean_sentence = start_entity + clean_sentence.split(start_entity)[-1]
+                clean_sentence = clean_sentence.rstrip() + "\n"
+                batch_previous_targets[doc_id] += clean_sentence
+        if explicability_mode:
+            return all_outputs, all_saliency_maps  # type: ignore
+        return all_outputs  # type: ignore
+    def encode(self, sentence):
+        return self.tokenizer.encode(sentence, return_tensors="pt")[0]  # type: ignore
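
The integrated-gradients branch in the diff above averages gradients along a straight path from a baseline to the input and multiplies the result by the input-minus-baseline difference. The sketch below reproduces that accumulation in plain Python on a toy objective with a known gradient. The names `integrated_gradients`, `toy_objective`, and `toy_gradient` are hypothetical and are not part of the repository; the real code works on embedding tensors and additionally takes an L2 norm over the embedding dimension.

```python
from typing import Callable, List


def integrated_gradients(
    grad_fn: Callable[[List[float]], List[float]],
    inputs: List[float],
    baseline: List[float],
    steps: int = 20,
) -> List[float]:
    """Right-endpoint Riemann approximation of integrated gradients."""
    steps = max(1, steps)
    delta = [x - b for x, b in zip(inputs, baseline)]
    total = [0.0] * len(inputs)
    for step in range(1, steps + 1):
        alpha = step / steps
        # Point on the straight line between baseline and input.
        point = [b + alpha * d for b, d in zip(baseline, delta)]
        grads = grad_fn(point)
        total = [t + g for t, g in zip(total, grads)]
    averaged = [t / steps for t in total]
    # Attribution = (input - baseline) * average gradient along the path.
    return [d * g for d, g in zip(delta, averaged)]


def toy_objective(x: List[float]) -> float:
    # Hypothetical objective: f(x) = sum of squares.
    return sum(v * v for v in x)


def toy_gradient(x: List[float]) -> List[float]:
    # Analytic gradient of toy_objective.
    return [2.0 * v for v in x]


inputs = [1.0, -2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_gradient, inputs, baseline, steps=200)

# Completeness check: attributions should roughly sum to f(input) - f(baseline).
completeness_gap = abs(
    sum(attributions) - (toy_objective(inputs) - toy_objective(baseline))
)
```

With a right-endpoint sum the attributions slightly overshoot on this convex toy objective, and the gap shrinks as `steps` grows. This is one reason a larger `ig_steps` value may give more faithful scores at a higher compute cost.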

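
The `_build_sequential_batches` helper in `sample` draws at most one sentence per page into each batch, so the previous predictions for a document are already accumulated before its next sentence is prompted. A standalone sketch of the same scheduling follows. It uses plain strings in place of the `(source, target, entity)` tuples, and the function name and sample data are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def build_sequential_batches(
    pages: Sequence[Sequence[str]], batch_size: int
) -> List[List[Tuple[int, str]]]:
    """Interleave pages so each batch holds at most one item per page."""
    positions = [0] * len(pages)
    next_page = 0
    active: List[int] = []
    batches: List[List[Tuple[int, str]]] = []
    while active or next_page < len(pages):
        # Top up the set of active pages, skipping empty ones.
        while len(active) < batch_size and next_page < len(pages):
            if len(pages[next_page]) > 0:
                active.append(next_page)
            next_page += 1
        if not active:
            break
        batch = []
        still_active = []
        for page_idx in active:
            item_idx = positions[page_idx]
            batch.append((page_idx, pages[page_idx][item_idx]))
            positions[page_idx] += 1
            # A page stays active until all of its items are consumed.
            if positions[page_idx] < len(pages[page_idx]):
                still_active.append(page_idx)
        batches.append(batch)
        active = still_active
    return batches


pages = [["a1", "a2", "a3"], [], ["c1"], ["d1", "d2"]]
batches = build_sequential_batches(pages, batch_size=2)
```

Because a page frees its slot only once it is exhausted, batches can be smaller than `batch_size` near the end of the run; per-page order is preserved throughout.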
model-00001-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f11122a7fb2e5016088d38ef16605df6e93811b248f39182c1e20b8cff1b7463
+size 4976706864

model-00002-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:43f1682082f23c098f046c60ff58b5c1eb5dd35eac8306601afb75450842eb69
+size 4999802720

model-00003-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f52fa7dab26ec4615ba2d5b2ed4130173450947dd1c000a9d9739b5063ca2f87
+size 4915916176

model-00004-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:af65da5f6a8ecc159dd34b4d0be4dc8b1d5335432e8b8bea657e70bb6e91b470
+size 1168147000
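
The four shard entries above are Git LFS pointer files: three `key value` lines giving the spec version, the SHA-256 object id, and the byte size of the real file. As a rough sanity check, the sketch below parses the pointers (with the diff `+` prefix stripped) and sums the shard sizes. The helper name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
POINTERS = [
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f11122a7fb2e5016088d38ef16605df6e93811b248f39182c1e20b8cff1b7463\n"
    "size 4976706864\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:43f1682082f23c098f046c60ff58b5c1eb5dd35eac8306601afb75450842eb69\n"
    "size 4999802720\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f52fa7dab26ec4615ba2d5b2ed4130173450947dd1c000a9d9739b5063ca2f87\n"
    "size 4915916176\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:af65da5f6a8ecc159dd34b4d0be4dc8b1d5335432e8b8bea657e70bb6e91b470\n"
    "size 1168147000\n",
]

shards = [parse_lfs_pointer(p) for p in POINTERS]
total_shard_bytes = sum(s["size"] for s in shards)

# Values reported in model.safetensors.index.json metadata.
index_total_size = 16060538880
index_total_parameters = 8030269440
overhead_bytes = total_shard_bytes - index_total_size
```

The shard files total 16,060,572,760 bytes, which is 33,880 bytes more than the index's `total_size`; that small difference is presumably per-file header metadata. The index's `total_size` is exactly twice `total_parameters`, which is consistent with 16-bit weights.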

model.safetensors.index.json ADDED

	@@ -0,0 +1,299 @@

+{
+  "metadata": {
+    "total_parameters": 8030269440,
+    "total_size": 16060538880
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00004-of-00004.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.norm.weight": "model-00004-of-00004.safetensors"
+  }
+}
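
The index above maps each tensor name to the shard file that stores it, and a decoder layer is not always stored in a single shard (layer 31 has entries in both `model-00003-of-00004.safetensors` and `model-00004-of-00004.safetensors`). A minimal sketch of how such a map could be inspected follows; the `weight_map` excerpt is copied from entries listed above, while the helper names `shards_by_layer` and `split_layers` are hypothetical and not part of any library.

```python
import re
from collections import defaultdict

# Excerpt of the weight map, copied from entries in the index above.
# In practice the full map would be read from the "weight_map" key of the index JSON.
weight_map = {
    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors",
}

# Matches tensor names of the form "model.layers.<N>.<...>".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def shards_by_layer(mapping):
    """Return {layer_index: set of shard files} for per-layer tensors."""
    layers = defaultdict(set)
    for tensor_name, shard in mapping.items():
        match = LAYER_RE.match(tensor_name)
        if match:  # skip non-layer tensors such as model.norm.weight
            layers[int(match.group(1))].add(shard)
    return dict(layers)


def split_layers(mapping):
    """Return the sorted layer indices whose tensors span more than one shard."""
    return sorted(
        layer for layer, shards in shards_by_layer(mapping).items() if len(shards) > 1
    )


print(split_layers(weight_map))
```

Parsing the layer index as an integer avoids the lexicographic ordering seen in the index itself, where layer 3 sorts between layers 29 and 30.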

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1273e28f2c3d3a2e7df4915698b8ac32334b9b1a7b964a7ee6b0b640313a404f
+size 32121333167
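
The three added lines are a Git LFS pointer rather than the file contents: a spec version, a SHA-256 object ID, and the size in bytes of the real object. A minimal sketch of reading those fields is shown below; `parse_lfs_pointer` is a hypothetical helper written for illustration, not a function from Git LFS or any library, and it assumes the simple `key value` line layout seen in this diff.

```python
# Pointer text as it appears in the diff above (without the leading "+").
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1273e28f2c3d3a2e7df4915698b8ac32334b9b1a7b964a7ee6b0b640313a404f
size 32121333167
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of an LFS pointer into a small dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is a key, a single space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(pointer)
# Report the object size in GiB (1 GiB = 2**30 bytes).
print(f"{info['hash_algo']} object, {info['size'] / 2**30:.1f} GiB")
```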

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:91a1a9bf984f5d845a3b4eb95d54e9cfc7ed36490e795a1b355975eae9b98700
+size 14645

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ed5c2aba955db99894ccedba5e103eb87b693fa40acb652acf91dd1a19aef81b
+size 1465

special_tokens_map.json ADDED

	@@ -0,0 +1,60 @@

+{
+  "additional_special_tokens": [
+    {
+      "content": "[",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "{",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "}",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<+>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    }
+  ],
+  "bos_token": {
+    "content": "<|begin_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|eot_id|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|eot_id|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

text_to_code.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b95ae9165d70692681902ea91875f9120f94415bbba754fabe6047fafb78bae0
+size 494763280

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:11ac3b66638a75d981484ee3713682e63c142ad255bd7cd96d9635ad5e654cdd
+size 17210796

tokenizer_config.json ADDED

	@@ -0,0 +1,2110 @@

+{
+  "added_tokens_decoder": {
+    "58": {
+      "content": "[",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "60": {
+      "content": "]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "90": {
+      "content": "{",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "92": {
+      "content": "}",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128000": {
+      "content": "<|begin_of_text|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128001": {
+      "content": "<|end_of_text|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128002": {
+      "content": "<|reserved_special_token_0|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128003": {
+      "content": "<|reserved_special_token_1|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128004": {
+      "content": "<|finetune_right_pad_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128005": {
+      "content": "<|reserved_special_token_2|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128006": {
+      "content": "<|start_header_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128007": {
+      "content": "<|end_header_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128008": {
+      "content": "<|eom_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128009": {
+      "content": "<|eot_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128010": {
+      "content": "<|python_tag|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128011": {
+      "content": "<|reserved_special_token_3|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128012": {
+      "content": "<|reserved_special_token_4|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128013": {
+      "content": "<|reserved_special_token_5|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128014": {
+      "content": "<|reserved_special_token_6|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128015": {
+      "content": "<|reserved_special_token_7|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128016": {
+      "content": "<|reserved_special_token_8|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128017": {
+      "content": "<|reserved_special_token_9|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128018": {
+      "content": "<|reserved_special_token_10|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128019": {
+      "content": "<|reserved_special_token_11|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128020": {
+      "content": "<|reserved_special_token_12|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128021": {
+      "content": "<|reserved_special_token_13|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128022": {
+      "content": "<|reserved_special_token_14|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128023": {
+      "content": "<|reserved_special_token_15|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128024": {
+      "content": "<|reserved_special_token_16|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128025": {
+      "content": "<|reserved_special_token_17|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128026": {
+      "content": "<|reserved_special_token_18|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128027": {
+      "content": "<|reserved_special_token_19|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128028": {
+      "content": "<|reserved_special_token_20|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128029": {
+      "content": "<|reserved_special_token_21|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128030": {
+      "content": "<|reserved_special_token_22|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128031": {
+      "content": "<|reserved_special_token_23|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128032": {
+      "content": "<|reserved_special_token_24|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128033": {
+      "content": "<|reserved_special_token_25|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128034": {
+      "content": "<|reserved_special_token_26|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128035": {
+      "content": "<|reserved_special_token_27|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128036": {
+      "content": "<|reserved_special_token_28|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128037": {
+      "content": "<|reserved_special_token_29|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128038": {
+      "content": "<|reserved_special_token_30|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128039": {
+      "content": "<|reserved_special_token_31|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128040": {
+      "content": "<|reserved_special_token_32|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128041": {
+      "content": "<|reserved_special_token_33|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128042": {
+      "content": "<|reserved_special_token_34|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128043": {
+      "content": "<|reserved_special_token_35|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128044": {
+      "content": "<|reserved_special_token_36|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128045": {
+      "content": "<|reserved_special_token_37|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128046": {
+      "content": "<|reserved_special_token_38|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128047": {
+      "content": "<|reserved_special_token_39|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128048": {
+      "content": "<|reserved_special_token_40|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128049": {
+      "content": "<|reserved_special_token_41|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128050": {
+      "content": "<|reserved_special_token_42|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128051": {
+      "content": "<|reserved_special_token_43|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128052": {
+      "content": "<|reserved_special_token_44|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128053": {
+      "content": "<|reserved_special_token_45|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128054": {
+      "content": "<|reserved_special_token_46|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128055": {
+      "content": "<|reserved_special_token_47|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128056": {
+      "content": "<|reserved_special_token_48|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128057": {
+      "content": "<|reserved_special_token_49|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128058": {
+      "content": "<|reserved_special_token_50|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128059": {
+      "content": "<|reserved_special_token_51|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128060": {
+      "content": "<|reserved_special_token_52|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128061": {
+      "content": "<|reserved_special_token_53|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128062": {
+      "content": "<|reserved_special_token_54|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128063": {
+      "content": "<|reserved_special_token_55|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128064": {
+      "content": "<|reserved_special_token_56|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128065": {
+      "content": "<|reserved_special_token_57|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128066": {
+      "content": "<|reserved_special_token_58|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128067": {
+      "content": "<|reserved_special_token_59|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128068": {
+      "content": "<|reserved_special_token_60|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128069": {
+      "content": "<|reserved_special_token_61|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128070": {
+      "content": "<|reserved_special_token_62|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128071": {
+      "content": "<|reserved_special_token_63|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128072": {
+      "content": "<|reserved_special_token_64|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128073": {
+      "content": "<|reserved_special_token_65|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128074": {
+      "content": "<|reserved_special_token_66|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128075": {
+      "content": "<|reserved_special_token_67|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128076": {
+      "content": "<|reserved_special_token_68|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128077": {
+      "content": "<|reserved_special_token_69|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128078": {
+      "content": "<|reserved_special_token_70|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128079": {
+      "content": "<|reserved_special_token_71|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128080": {
+      "content": "<|reserved_special_token_72|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128081": {
+      "content": "<|reserved_special_token_73|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128082": {
+      "content": "<|reserved_special_token_74|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128083": {
+      "content": "<|reserved_special_token_75|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128084": {
+      "content": "<|reserved_special_token_76|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128085": {
+      "content": "<|reserved_special_token_77|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128086": {
+      "content": "<|reserved_special_token_78|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128087": {
+      "content": "<|reserved_special_token_79|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128088": {
+      "content": "<|reserved_special_token_80|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128089": {
+      "content": "<|reserved_special_token_81|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128090": {
+      "content": "<|reserved_special_token_82|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128091": {
+      "content": "<|reserved_special_token_83|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128092": {
+      "content": "<|reserved_special_token_84|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128093": {
+      "content": "<|reserved_special_token_85|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128094": {
+      "content": "<|reserved_special_token_86|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128095": {
+      "content": "<|reserved_special_token_87|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128096": {
+      "content": "<|reserved_special_token_88|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128097": {
+      "content": "<|reserved_special_token_89|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128098": {
+      "content": "<|reserved_special_token_90|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128099": {
+      "content": "<|reserved_special_token_91|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128100": {
+      "content": "<|reserved_special_token_92|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128101": {
+      "content": "<|reserved_special_token_93|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128102": {
+      "content": "<|reserved_special_token_94|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128103": {
+      "content": "<|reserved_special_token_95|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128104": {
+      "content": "<|reserved_special_token_96|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128105": {
+      "content": "<|reserved_special_token_97|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128106": {
+      "content": "<|reserved_special_token_98|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128107": {
+      "content": "<|reserved_special_token_99|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128108": {
+      "content": "<|reserved_special_token_100|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128109": {
+      "content": "<|reserved_special_token_101|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128110": {
+      "content": "<|reserved_special_token_102|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128111": {
+      "content": "<|reserved_special_token_103|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128112": {
+      "content": "<|reserved_special_token_104|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128113": {
+      "content": "<|reserved_special_token_105|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128114": {
+      "content": "<|reserved_special_token_106|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128115": {
+      "content": "<|reserved_special_token_107|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128116": {
+      "content": "<|reserved_special_token_108|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128117": {
+      "content": "<|reserved_special_token_109|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128118": {
+      "content": "<|reserved_special_token_110|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128119": {
+      "content": "<|reserved_special_token_111|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128120": {
+      "content": "<|reserved_special_token_112|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128121": {
+      "content": "<|reserved_special_token_113|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128122": {
+      "content": "<|reserved_special_token_114|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128123": {
+      "content": "<|reserved_special_token_115|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128124": {
+      "content": "<|reserved_special_token_116|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128125": {
+      "content": "<|reserved_special_token_117|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128126": {
+      "content": "<|reserved_special_token_118|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128127": {
+      "content": "<|reserved_special_token_119|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128128": {
+      "content": "<|reserved_special_token_120|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128129": {
+      "content": "<|reserved_special_token_121|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128130": {
+      "content": "<|reserved_special_token_122|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128131": {
+      "content": "<|reserved_special_token_123|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128132": {
+      "content": "<|reserved_special_token_124|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128133": {
+      "content": "<|reserved_special_token_125|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128134": {
+      "content": "<|reserved_special_token_126|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128135": {
+      "content": "<|reserved_special_token_127|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128136": {
+      "content": "<|reserved_special_token_128|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128137": {
+      "content": "<|reserved_special_token_129|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128138": {
+      "content": "<|reserved_special_token_130|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128139": {
+      "content": "<|reserved_special_token_131|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128140": {
+      "content": "<|reserved_special_token_132|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128141": {
+      "content": "<|reserved_special_token_133|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128142": {
+      "content": "<|reserved_special_token_134|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128143": {
+      "content": "<|reserved_special_token_135|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128144": {
+      "content": "<|reserved_special_token_136|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128145": {
+      "content": "<|reserved_special_token_137|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128146": {
+      "content": "<|reserved_special_token_138|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128147": {
+      "content": "<|reserved_special_token_139|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128148": {
+      "content": "<|reserved_special_token_140|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128149": {
+      "content": "<|reserved_special_token_141|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128150": {
+      "content": "<|reserved_special_token_142|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128151": {
+      "content": "<|reserved_special_token_143|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128152": {
+      "content": "<|reserved_special_token_144|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128153": {
+      "content": "<|reserved_special_token_145|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128154": {
+      "content": "<|reserved_special_token_146|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128155": {
+      "content": "<|reserved_special_token_147|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128156": {
+      "content": "<|reserved_special_token_148|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128157": {
+      "content": "<|reserved_special_token_149|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128158": {
+      "content": "<|reserved_special_token_150|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128159": {
+      "content": "<|reserved_special_token_151|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128160": {
+      "content": "<|reserved_special_token_152|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128161": {
+      "content": "<|reserved_special_token_153|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128162": {
+      "content": "<|reserved_special_token_154|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128163": {
+      "content": "<|reserved_special_token_155|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128164": {
+      "content": "<|reserved_special_token_156|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128165": {
+      "content": "<|reserved_special_token_157|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128166": {
+      "content": "<|reserved_special_token_158|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128167": {
+      "content": "<|reserved_special_token_159|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128168": {
+      "content": "<|reserved_special_token_160|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128169": {
+      "content": "<|reserved_special_token_161|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128170": {
+      "content": "<|reserved_special_token_162|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128171": {
+      "content": "<|reserved_special_token_163|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128172": {
+      "content": "<|reserved_special_token_164|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128173": {
+      "content": "<|reserved_special_token_165|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128174": {
+      "content": "<|reserved_special_token_166|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128175": {
+      "content": "<|reserved_special_token_167|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128176": {
+      "content": "<|reserved_special_token_168|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128177": {
+      "content": "<|reserved_special_token_169|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128178": {
+      "content": "<|reserved_special_token_170|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128179": {
+      "content": "<|reserved_special_token_171|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128180": {
+      "content": "<|reserved_special_token_172|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128181": {
+      "content": "<|reserved_special_token_173|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128182": {
+      "content": "<|reserved_special_token_174|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128183": {
+      "content": "<|reserved_special_token_175|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128184": {
+      "content": "<|reserved_special_token_176|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128185": {
+      "content": "<|reserved_special_token_177|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128186": {
+      "content": "<|reserved_special_token_178|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128187": {
+      "content": "<|reserved_special_token_179|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128188": {
+      "content": "<|reserved_special_token_180|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128189": {
+      "content": "<|reserved_special_token_181|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128190": {
+      "content": "<|reserved_special_token_182|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128191": {
+      "content": "<|reserved_special_token_183|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128192": {
+      "content": "<|reserved_special_token_184|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128193": {
+      "content": "<|reserved_special_token_185|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128194": {
+      "content": "<|reserved_special_token_186|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128195": {
+      "content": "<|reserved_special_token_187|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128196": {
+      "content": "<|reserved_special_token_188|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128197": {
+      "content": "<|reserved_special_token_189|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128198": {
+      "content": "<|reserved_special_token_190|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128199": {
+      "content": "<|reserved_special_token_191|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128200": {
+      "content": "<|reserved_special_token_192|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128201": {
+      "content": "<|reserved_special_token_193|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128202": {
+      "content": "<|reserved_special_token_194|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128203": {
+      "content": "<|reserved_special_token_195|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128204": {
+      "content": "<|reserved_special_token_196|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128205": {
+      "content": "<|reserved_special_token_197|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128206": {
+      "content": "<|reserved_special_token_198|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128207": {
+      "content": "<|reserved_special_token_199|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128208": {
+      "content": "<|reserved_special_token_200|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128209": {
+      "content": "<|reserved_special_token_201|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128210": {
+      "content": "<|reserved_special_token_202|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128211": {
+      "content": "<|reserved_special_token_203|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128212": {
+      "content": "<|reserved_special_token_204|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128213": {
+      "content": "<|reserved_special_token_205|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128214": {
+      "content": "<|reserved_special_token_206|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128215": {
+      "content": "<|reserved_special_token_207|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128216": {
+      "content": "<|reserved_special_token_208|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128217": {
+      "content": "<|reserved_special_token_209|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128218": {
+      "content": "<|reserved_special_token_210|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128219": {
+      "content": "<|reserved_special_token_211|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128220": {
+      "content": "<|reserved_special_token_212|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128221": {
+      "content": "<|reserved_special_token_213|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128222": {
+      "content": "<|reserved_special_token_214|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128223": {
+      "content": "<|reserved_special_token_215|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128224": {
+      "content": "<|reserved_special_token_216|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128225": {
+      "content": "<|reserved_special_token_217|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128226": {
+      "content": "<|reserved_special_token_218|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128227": {
+      "content": "<|reserved_special_token_219|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128228": {
+      "content": "<|reserved_special_token_220|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128229": {
+      "content": "<|reserved_special_token_221|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128230": {
+      "content": "<|reserved_special_token_222|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128231": {
+      "content": "<|reserved_special_token_223|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128232": {
+      "content": "<|reserved_special_token_224|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128233": {
+      "content": "<|reserved_special_token_225|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128234": {
+      "content": "<|reserved_special_token_226|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128235": {
+      "content": "<|reserved_special_token_227|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128236": {
+      "content": "<|reserved_special_token_228|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128237": {
+      "content": "<|reserved_special_token_229|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128238": {
+      "content": "<|reserved_special_token_230|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128239": {
+      "content": "<|reserved_special_token_231|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128240": {
+      "content": "<|reserved_special_token_232|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128241": {
+      "content": "<|reserved_special_token_233|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128242": {
+      "content": "<|reserved_special_token_234|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128243": {
+      "content": "<|reserved_special_token_235|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128244": {
+      "content": "<|reserved_special_token_236|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128245": {
+      "content": "<|reserved_special_token_237|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128246": {
+      "content": "<|reserved_special_token_238|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128247": {
+      "content": "<|reserved_special_token_239|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128248": {
+      "content": "<|reserved_special_token_240|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128249": {
+      "content": "<|reserved_special_token_241|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128250": {
+      "content": "<|reserved_special_token_242|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128251": {
+      "content": "<|reserved_special_token_243|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128252": {
+      "content": "<|reserved_special_token_244|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128253": {
+      "content": "<|reserved_special_token_245|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128254": {
+      "content": "<|reserved_special_token_246|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128255": {
+      "content": "<|reserved_special_token_247|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128256": {
+      "content": "<+>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "[",
+    "]",
+    "{",
+    "}",
+    "<+>"
+  ],
+  "bos_token": "<|begin_of_text|>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|eot_id|>",
+  "extra_special_tokens": {},
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 131072,
+  "pad_token": "<|eot_id|>",
+  "tokenizer_class": "PreTrainedTokenizerFast"
+}
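A minimal sketch of how the tokenizer settings above could be sanity-checked before batching. It assumes a small hand-copied excerpt of the config rather than the full file; the `CONFIG_EXCERPT` string and the `summarize_tokenizer_config` helper are hypothetical names introduced only for illustration.

```python
import json

# Hypothetical excerpt mirroring fields shown in the config above; a real
# check would json.load() the complete tokenizer_config.json instead.
CONFIG_EXCERPT = """
{
  "added_tokens_decoder": {
    "128255": {"content": "<|reserved_special_token_247|>", "special": true},
    "128256": {"content": "<+>", "special": true}
  },
  "bos_token": "<|begin_of_text|>",
  "eos_token": "<|eot_id|>",
  "pad_token": "<|eot_id|>",
  "model_max_length": 131072
}
"""


def summarize_tokenizer_config(config: dict) -> dict:
    """Collect a few facts that matter for padding and embedding size."""
    decoder = config.get("added_tokens_decoder", {})
    # Map token text -> integer id (the JSON keys are ids stored as strings).
    ids = {entry["content"]: int(token_id) for token_id, entry in decoder.items()}
    return {
        # True when padding reuses the end-of-turn token.
        "pad_equals_eos": config.get("pad_token") == config.get("eos_token"),
        # Highest id among the listed added tokens, or None if there are none.
        "max_added_token_id": max(ids.values()) if ids else None,
    }


summary = summarize_tokenizer_config(json.loads(CONFIG_EXCERPT))
print(summary)
```

Because `pad_token` and `eos_token` are the same string here, padded positions cannot be told apart from end-of-turn tokens by id alone, so downstream code would need to rely on `attention_mask`. The highest id in the excerpt, 128256, belongs to the added `<+>` token.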

trainer_state.json ADDED

	@@ -0,0 +1,1234 @@

+{
+  "best_global_step": 7359,
+  "best_metric": 0.8462,
+  "best_model_checkpoint": "models/NED/EMEA_human_only_tfidf_hybrid_long_v2_addheaders/Llama-3.1-8B-Instruct/checkpoint-7359",
+  "epoch": 50.0,
+  "eval_steps": 500,
+  "global_step": 122650,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 1.1526817805758311,
+      "epoch": 1.0,
+      "grad_norm": 304.0,
+      "learning_rate": 1.9989130434782608e-05,
+      "loss": 0.7669,
+      "mean_token_accuracy": 0.8752253057546777,
+      "num_tokens": 15010779.0,
+      "step": 2453
+    },
+    {
+      "epoch": 1.0,
+      "eval_entropy": 1.2358426589232225,
+      "eval_loss": 0.6339517831802368,
+      "eval_mean_token_accuracy": 0.8988095246828519,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 15010779.0,
+      "eval_recall": 0.7308,
+      "eval_runtime": 3.6399,
+      "eval_samples_per_second": 7.143,
+      "eval_steps_per_second": 3.571,
+      "step": 2453
+    },
+    {
+      "entropy": 1.3605892632720036,
+      "epoch": 2.0,
+      "grad_norm": 12.1875,
+      "learning_rate": 2.9691098596284776e-05,
+      "loss": 0.5437,
+      "mean_token_accuracy": 0.9150349811612466,
+      "num_tokens": 30021558.0,
+      "step": 4906
+    },
+    {
+      "epoch": 2.0,
+      "eval_entropy": 1.1509519540346587,
+      "eval_loss": 0.4853871166706085,
+      "eval_mean_token_accuracy": 0.9201437464127173,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 30021558.0,
+      "eval_recall": 0.7692,
+      "eval_runtime": 3.627,
+      "eval_samples_per_second": 7.168,
+      "eval_steps_per_second": 3.584,
+      "step": 4906
+    },
+    {
+      "entropy": 1.1862413553719222,
+      "epoch": 3.0,
+      "grad_norm": 2.1875,
+      "learning_rate": 2.9072539295620746e-05,
+      "loss": 0.2619,
+      "mean_token_accuracy": 0.9548876376794495,
+      "num_tokens": 45032337.0,
+      "step": 7359
+    },
+    {
+      "epoch": 3.0,
+      "eval_entropy": 1.019592651954064,
+      "eval_loss": 0.5770813822746277,
+      "eval_mean_token_accuracy": 0.9220362993387076,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 45032337.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6363,
+      "eval_samples_per_second": 7.15,
+      "eval_steps_per_second": 3.575,
+      "step": 7359
+    },
+    {
+      "entropy": 0.9634018300311497,
+      "epoch": 4.0,
+      "grad_norm": 0.1240234375,
+      "learning_rate": 2.8453979994956713e-05,
+      "loss": 0.1216,
+      "mean_token_accuracy": 0.9782008502466845,
+      "num_tokens": 60043116.0,
+      "step": 9812
+    },
+    {
+      "epoch": 4.0,
+      "eval_entropy": 0.8699520321992728,
+      "eval_loss": 0.5446107387542725,
+      "eval_mean_token_accuracy": 0.940018314581651,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 60043116.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6143,
+      "eval_samples_per_second": 7.194,
+      "eval_steps_per_second": 3.597,
+      "step": 9812
+    },
+    {
+      "entropy": 0.7849812429144681,
+      "epoch": 5.0,
+      "grad_norm": 0.002227783203125,
+      "learning_rate": 2.783542069429268e-05,
+      "loss": 0.0517,
+      "mean_token_accuracy": 0.9894482943411997,
+      "num_tokens": 75053895.0,
+      "step": 12265
+    },
+    {
+      "epoch": 5.0,
+      "eval_entropy": 0.6801113898937519,
+      "eval_loss": 0.7289856672286987,
+      "eval_mean_token_accuracy": 0.9444444454633273,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 75053895.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6486,
+      "eval_samples_per_second": 7.126,
+      "eval_steps_per_second": 3.563,
+      "step": 12265
+    },
+    {
+      "entropy": 0.6892432886826181,
+      "epoch": 6.0,
+      "grad_norm": 0.0004749298095703125,
+      "learning_rate": 2.721686139362865e-05,
+      "loss": 0.0209,
+      "mean_token_accuracy": 0.9958273216359138,
+      "num_tokens": 90064674.0,
+      "step": 14718
+    },
+    {
+      "epoch": 6.0,
+      "eval_entropy": 0.577189931502709,
+      "eval_loss": 0.7246649265289307,
+      "eval_mean_token_accuracy": 0.9444444454633273,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 90064674.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6456,
+      "eval_samples_per_second": 7.132,
+      "eval_steps_per_second": 3.566,
+      "step": 14718
+    },
+    {
+      "entropy": 0.6557439389371696,
+      "epoch": 7.0,
+      "grad_norm": 0.000888824462890625,
+      "learning_rate": 2.659830209296461e-05,
+      "loss": 0.0078,
+      "mean_token_accuracy": 0.9979321826393635,
+      "num_tokens": 105075453.0,
+      "step": 17171
+    },
+    {
+      "epoch": 7.0,
+      "eval_entropy": 0.5603500146132249,
+      "eval_loss": 0.8045116662979126,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 105075453.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 5.557,
+      "eval_samples_per_second": 4.679,
+      "eval_steps_per_second": 2.339,
+      "step": 17171
+    },
+    {
+      "entropy": 0.6481096161976669,
+      "epoch": 8.0,
+      "grad_norm": 8.96453857421875e-05,
+      "learning_rate": 2.597974279230058e-05,
+      "loss": 0.0028,
+      "mean_token_accuracy": 0.9993061645391568,
+      "num_tokens": 120086232.0,
+      "step": 19624
+    },
+    {
+      "epoch": 8.0,
+      "eval_entropy": 0.5650725089586698,
+      "eval_loss": 0.8335245847702026,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 120086232.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6391,
+      "eval_samples_per_second": 7.145,
+      "eval_steps_per_second": 3.572,
+      "step": 19624
+    },
+    {
+      "entropy": 0.6384989822756452,
+      "epoch": 9.0,
+      "grad_norm": 0.00102996826171875,
+      "learning_rate": 2.5361183491636548e-05,
+      "loss": 0.0011,
+      "mean_token_accuracy": 0.9997574686275129,
+      "num_tokens": 135097011.0,
+      "step": 22077
+    },
+    {
+      "epoch": 9.0,
+      "eval_entropy": 0.5437194108963013,
+      "eval_loss": 0.8720409870147705,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 135097011.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6696,
+      "eval_samples_per_second": 7.085,
+      "eval_steps_per_second": 3.543,
+      "step": 22077
+    },
+    {
+      "entropy": 0.6327040182586792,
+      "epoch": 10.0,
+      "grad_norm": 0.00011968612670898438,
+      "learning_rate": 2.4742624190972517e-05,
+      "loss": 0.0002,
+      "mean_token_accuracy": 0.9999592335818012,
+      "num_tokens": 150107790.0,
+      "step": 24530
+    },
+    {
+      "epoch": 10.0,
+      "eval_entropy": 0.5456434029799241,
+      "eval_loss": 0.8786986470222473,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 150107790.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.7213,
+      "eval_samples_per_second": 6.987,
+      "eval_steps_per_second": 3.493,
+      "step": 24530
+    },
+    {
+      "entropy": 0.6342776355527636,
+      "epoch": 11.0,
+      "grad_norm": 2.9206275939941406e-05,
+      "learning_rate": 2.412406489030848e-05,
+      "loss": 0.0001,
+      "mean_token_accuracy": 0.9999629396397,
+      "num_tokens": 165118569.0,
+      "step": 26983
+    },
+    {
+      "epoch": 11.0,
+      "eval_entropy": 0.5441241906239436,
+      "eval_loss": 0.8776129484176636,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 165118569.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6285,
+      "eval_samples_per_second": 7.165,
+      "eval_steps_per_second": 3.583,
+      "step": 26983
+    },
+    {
+      "entropy": 0.6330991076222742,
+      "epoch": 12.0,
+      "grad_norm": 0.000823974609375,
+      "learning_rate": 2.350550558964445e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 180129348.0,
+      "step": 29436
+    },
+    {
+      "epoch": 12.0,
+      "eval_entropy": 0.544509245799138,
+      "eval_loss": 0.88084477186203,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 180129348.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6661,
+      "eval_samples_per_second": 7.092,
+      "eval_steps_per_second": 3.546,
+      "step": 29436
+    },
+    {
+      "entropy": 0.6322705759061291,
+      "epoch": 13.0,
+      "grad_norm": 0.010498046875,
+      "learning_rate": 2.2886946288980416e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 195140127.0,
+      "step": 31889
+    },
+    {
+      "epoch": 13.0,
+      "eval_entropy": 0.5434356606923617,
+      "eval_loss": 0.8842343091964722,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 195140127.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 4.1268,
+      "eval_samples_per_second": 6.3,
+      "eval_steps_per_second": 3.15,
+      "step": 31889
+    },
+    {
+      "entropy": 0.6316640121908612,
+      "epoch": 14.0,
+      "grad_norm": 0.0035552978515625,
+      "learning_rate": 2.2268386988316383e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 210150906.0,
+      "step": 34342
+    },
+    {
+      "epoch": 14.0,
+      "eval_entropy": 0.543243577847114,
+      "eval_loss": 0.885927140712738,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 210150906.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.7188,
+      "eval_samples_per_second": 6.991,
+      "eval_steps_per_second": 3.496,
+      "step": 34342
+    },
+    {
+      "entropy": 0.6321596540070241,
+      "epoch": 15.0,
+      "grad_norm": 2.4199485778808594e-05,
+      "learning_rate": 2.164982768765235e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 225161685.0,
+      "step": 36795
+    },
+    {
+      "epoch": 15.0,
+      "eval_entropy": 0.5422769280580374,
+      "eval_loss": 0.8823052644729614,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 225161685.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6723,
+      "eval_samples_per_second": 7.08,
+      "eval_steps_per_second": 3.54,
+      "step": 36795
+    },
+    {
+      "entropy": 0.6315903761194426,
+      "epoch": 16.0,
+      "grad_norm": 0.0291748046875,
+      "learning_rate": 2.1031268386988316e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 240172464.0,
+      "step": 39248
+    },
+    {
+      "epoch": 16.0,
+      "eval_entropy": 0.5426660546889672,
+      "eval_loss": 0.8869765996932983,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 240172464.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6896,
+      "eval_samples_per_second": 7.047,
+      "eval_steps_per_second": 3.523,
+      "step": 39248
+    },
+    {
+      "entropy": 0.6317922561279472,
+      "epoch": 17.0,
+      "grad_norm": 0.0001850128173828125,
+      "learning_rate": 2.0412709086324285e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 255183243.0,
+      "step": 41701
+    },
+    {
+      "epoch": 17.0,
+      "eval_entropy": 0.542809899036701,
+      "eval_loss": 0.8864607214927673,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 255183243.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6498,
+      "eval_samples_per_second": 7.124,
+      "eval_steps_per_second": 3.562,
+      "step": 41701
+    },
+    {
+      "entropy": 0.6319634849034763,
+      "epoch": 18.0,
+      "grad_norm": 2.1457672119140625e-05,
+      "learning_rate": 1.979414978566025e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 270194022.0,
+      "step": 44154
+    },
+    {
+      "epoch": 18.0,
+      "eval_entropy": 0.5426488243616544,
+      "eval_loss": 0.8861849308013916,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 270194022.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6568,
+      "eval_samples_per_second": 7.11,
+      "eval_steps_per_second": 3.555,
+      "step": 44154
+    },
+    {
+      "entropy": 0.631338802688325,
+      "epoch": 19.0,
+      "grad_norm": 4.076957702636719e-05,
+      "learning_rate": 1.9175590484996218e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 285204801.0,
+      "step": 46607
+    },
+    {
+      "epoch": 19.0,
+      "eval_entropy": 0.5423762339812058,
+      "eval_loss": 0.885791540145874,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 285204801.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.653,
+      "eval_samples_per_second": 7.118,
+      "eval_steps_per_second": 3.559,
+      "step": 46607
+    },
+    {
+      "entropy": 0.6311312203036976,
+      "epoch": 20.0,
+      "grad_norm": 0.0004634857177734375,
+      "learning_rate": 1.8557031184332184e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 300215580.0,
+      "step": 49060
+    },
+    {
+      "epoch": 20.0,
+      "eval_entropy": 0.5424229686076825,
+      "eval_loss": 0.8889456987380981,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 300215580.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.651,
+      "eval_samples_per_second": 7.121,
+      "eval_steps_per_second": 3.561,
+      "step": 49060
+    },
+    {
+      "entropy": 0.631198678741249,
+      "epoch": 21.0,
+      "grad_norm": 0.00031280517578125,
+      "learning_rate": 1.793847188366815e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 315226359.0,
+      "step": 51513
+    },
+    {
+      "epoch": 21.0,
+      "eval_entropy": 0.5428222968028142,
+      "eval_loss": 0.8843169808387756,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 315226359.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6619,
+      "eval_samples_per_second": 7.1,
+      "eval_steps_per_second": 3.55,
+      "step": 51513
+    },
+    {
+      "entropy": 0.6313406728478388,
+      "epoch": 22.0,
+      "grad_norm": 0.000759124755859375,
+      "learning_rate": 1.731991258300412e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 330237138.0,
+      "step": 53966
+    },
+    {
+      "epoch": 22.0,
+      "eval_entropy": 0.5427144765853882,
+      "eval_loss": 0.8861469030380249,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 330237138.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6544,
+      "eval_samples_per_second": 7.115,
+      "eval_steps_per_second": 3.557,
+      "step": 53966
+    },
+    {
+      "entropy": 0.6313331465647263,
+      "epoch": 23.0,
+      "grad_norm": 0.00051116943359375,
+      "learning_rate": 1.6701353282340083e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 345247917.0,
+      "step": 56419
+    },
+    {
+      "epoch": 23.0,
+      "eval_entropy": 0.5423137545585632,
+      "eval_loss": 0.8892049193382263,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 345247917.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6537,
+      "eval_samples_per_second": 7.116,
+      "eval_steps_per_second": 3.558,
+      "step": 56419
+    },
+    {
+      "entropy": 0.6310314053401527,
+      "epoch": 24.0,
+      "grad_norm": 3.600120544433594e-05,
+      "learning_rate": 1.6082793981676053e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 360258696.0,
+      "step": 58872
+    },
+    {
+      "epoch": 24.0,
+      "eval_entropy": 0.5423843631377587,
+      "eval_loss": 0.8886714577674866,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 360258696.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6316,
+      "eval_samples_per_second": 7.159,
+      "eval_steps_per_second": 3.58,
+      "step": 58872
+    },
+    {
+      "entropy": 0.6315073234496484,
+      "epoch": 25.0,
+      "grad_norm": 7.82012939453125e-05,
+      "learning_rate": 1.546423468101202e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 375269475.0,
+      "step": 61325
+    },
+    {
+      "epoch": 25.0,
+      "eval_entropy": 0.5420686419193561,
+      "eval_loss": 0.8865240812301636,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 375269475.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.613,
+      "eval_samples_per_second": 7.196,
+      "eval_steps_per_second": 3.598,
+      "step": 61325
+    },
+    {
+      "entropy": 0.632054461467718,
+      "epoch": 26.0,
+      "grad_norm": 0.00024318695068359375,
+      "learning_rate": 1.4845675380347987e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 15010779.0,
+      "step": 63778
+    },
+    {
+      "epoch": 26.0,
+      "eval_entropy": 0.5426568893285898,
+      "eval_loss": 0.88667893409729,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 15010779.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.647,
+      "eval_samples_per_second": 7.129,
+      "eval_steps_per_second": 3.565,
+      "step": 63778
+    },
+    {
+      "entropy": 0.6314872418356777,
+      "epoch": 27.0,
+      "grad_norm": 0.00011396408081054688,
+      "learning_rate": 1.4227116079683954e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 30021558.0,
+      "step": 66231
+    },
+    {
+      "epoch": 27.0,
+      "eval_entropy": 0.5423887417866633,
+      "eval_loss": 0.8907365798950195,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 30021558.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6242,
+      "eval_samples_per_second": 7.174,
+      "eval_steps_per_second": 3.587,
+      "step": 66231
+    },
+    {
+      "entropy": 0.6317801613055392,
+      "epoch": 28.0,
+      "grad_norm": 8.392333984375e-05,
+      "learning_rate": 1.3608556779019922e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 45032337.0,
+      "step": 68684
+    },
+    {
+      "epoch": 28.0,
+      "eval_entropy": 0.5428364735383254,
+      "eval_loss": 0.885719358921051,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 45032337.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6828,
+      "eval_samples_per_second": 7.06,
+      "eval_steps_per_second": 3.53,
+      "step": 68684
+    },
+    {
+      "entropy": 0.6310389586555389,
+      "epoch": 29.0,
+      "grad_norm": 0.000774383544921875,
+      "learning_rate": 1.2989997478355888e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 60043116.0,
+      "step": 71137
+    },
+    {
+      "epoch": 29.0,
+      "eval_entropy": 0.5424722524789664,
+      "eval_loss": 0.8864960074424744,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 60043116.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6359,
+      "eval_samples_per_second": 7.151,
+      "eval_steps_per_second": 3.576,
+      "step": 71137
+    },
+    {
+      "entropy": 0.6310345640461444,
+      "epoch": 30.0,
+      "grad_norm": 3.5762786865234375e-05,
+      "learning_rate": 1.2371438177691856e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 75053895.0,
+      "step": 73590
+    },
+    {
+      "epoch": 30.0,
+      "eval_entropy": 0.5427528161268967,
+      "eval_loss": 0.8871183395385742,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 75053895.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6648,
+      "eval_samples_per_second": 7.095,
+      "eval_steps_per_second": 3.547,
+      "step": 73590
+    },
+    {
+      "entropy": 0.6307261824680745,
+      "epoch": 31.0,
+      "grad_norm": 0.00015163421630859375,
+      "learning_rate": 1.1752878877027823e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 90064674.0,
+      "step": 76043
+    },
+    {
+      "epoch": 31.0,
+      "eval_entropy": 0.5423439878683823,
+      "eval_loss": 0.890313982963562,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 90064674.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6589,
+      "eval_samples_per_second": 7.106,
+      "eval_steps_per_second": 3.553,
+      "step": 76043
+    },
+    {
+      "entropy": 0.6317850742056279,
+      "epoch": 32.0,
+      "grad_norm": 0.0005035400390625,
+      "learning_rate": 1.113431957636379e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 105075453.0,
+      "step": 78496
+    },
+    {
+      "epoch": 32.0,
+      "eval_entropy": 0.5422184283916767,
+      "eval_loss": 0.8882402181625366,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 105075453.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6075,
+      "eval_samples_per_second": 7.207,
+      "eval_steps_per_second": 3.604,
+      "step": 78496
+    },
+    {
+      "entropy": 0.6315069926961121,
+      "epoch": 33.0,
+      "grad_norm": 0.0079345703125,
+      "learning_rate": 1.0515760275699757e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 120086232.0,
+      "step": 80949
+    },
+    {
+      "epoch": 33.0,
+      "eval_entropy": 0.5428683024186355,
+      "eval_loss": 0.8859032988548279,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 120086232.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6537,
+      "eval_samples_per_second": 7.116,
+      "eval_steps_per_second": 3.558,
+      "step": 80949
+    },
+    {
+      "entropy": 0.6313212784246381,
+      "epoch": 34.0,
+      "grad_norm": 0.000885009765625,
+      "learning_rate": 9.897200975035723e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 135097011.0,
+      "step": 83402
+    },
+    {
+      "epoch": 34.0,
+      "eval_entropy": 0.5425068598527175,
+      "eval_loss": 0.887780487537384,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 135097011.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6448,
+      "eval_samples_per_second": 7.133,
+      "eval_steps_per_second": 3.567,
+      "step": 83402
+    },
+    {
+      "entropy": 0.6308202771352254,
+      "epoch": 35.0,
+      "grad_norm": 0.00032806396484375,
+      "learning_rate": 9.27864167437169e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 150107790.0,
+      "step": 85855
+    },
+    {
+      "epoch": 35.0,
+      "eval_entropy": 0.54246619114509,
+      "eval_loss": 0.8900800347328186,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 150107790.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6253,
+      "eval_samples_per_second": 7.172,
+      "eval_steps_per_second": 3.586,
+      "step": 85855
+    },
+    {
+      "entropy": 0.6310893858737767,
+      "epoch": 36.0,
+      "grad_norm": 0.00543212890625,
+      "learning_rate": 8.660082373707658e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 165118569.0,
+      "step": 88308
+    },
+    {
+      "epoch": 36.0,
+      "eval_entropy": 0.542354785479032,
+      "eval_loss": 0.882867157459259,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 165118569.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6309,
+      "eval_samples_per_second": 7.161,
+      "eval_steps_per_second": 3.58,
+      "step": 88308
+    },
+    {
+      "entropy": 0.6313383878492308,
+      "epoch": 37.0,
+      "grad_norm": 0.0014495849609375,
+      "learning_rate": 8.041523073043624e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 180129348.0,
+      "step": 90761
+    },
+    {
+      "epoch": 37.0,
+      "eval_entropy": 0.5429406670423654,
+      "eval_loss": 0.8894430994987488,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 180129348.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6047,
+      "eval_samples_per_second": 7.213,
+      "eval_steps_per_second": 3.606,
+      "step": 90761
+    },
+    {
+      "entropy": 0.6315074832012738,
+      "epoch": 38.0,
+      "grad_norm": 1.8477439880371094e-05,
+      "learning_rate": 7.422963772379592e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 195140127.0,
+      "step": 93214
+    },
+    {
+      "epoch": 38.0,
+      "eval_entropy": 0.5428708929281968,
+      "eval_loss": 0.8853751420974731,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 195140127.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6095,
+      "eval_samples_per_second": 7.203,
+      "eval_steps_per_second": 3.602,
+      "step": 93214
+    },
+    {
+      "entropy": 0.6316086658156264,
+      "epoch": 39.0,
+      "grad_norm": 0.0019378662109375,
+      "learning_rate": 6.804404471715559e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 210150906.0,
+      "step": 95667
+    },
+    {
+      "epoch": 39.0,
+      "eval_entropy": 0.5423155472828791,
+      "eval_loss": 0.8865050673484802,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 210150906.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6105,
+      "eval_samples_per_second": 7.201,
+      "eval_steps_per_second": 3.601,
+      "step": 95667
+    },
+    {
+      "entropy": 0.6319762418161253,
+      "epoch": 40.0,
+      "grad_norm": 0.0076904296875,
+      "learning_rate": 6.185845171051526e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 225161685.0,
+      "step": 98120
+    },
+    {
+      "epoch": 40.0,
+      "eval_entropy": 0.5423448315033546,
+      "eval_loss": 0.887237012386322,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 225161685.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6062,
+      "eval_samples_per_second": 7.21,
+      "eval_steps_per_second": 3.605,
+      "step": 98120
+    },
+    {
+      "entropy": 0.6316094772090632,
+      "epoch": 41.0,
+      "grad_norm": 0.00040435791015625,
+      "learning_rate": 5.567285870387493e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 240172464.0,
+      "step": 100573
+    },
+    {
+      "epoch": 41.0,
+      "eval_entropy": 0.5424330555475675,
+      "eval_loss": 0.8862788081169128,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 240172464.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6042,
+      "eval_samples_per_second": 7.214,
+      "eval_steps_per_second": 3.607,
+      "step": 100573
+    },
+    {
+      "entropy": 0.6310035889118581,
+      "epoch": 42.0,
+      "grad_norm": 0.0020294189453125,
+      "learning_rate": 4.94872656972346e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 255183243.0,
+      "step": 103026
+    },
+    {
+      "epoch": 42.0,
+      "eval_entropy": 0.5431472292313209,
+      "eval_loss": 0.890018105506897,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 255183243.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6041,
+      "eval_samples_per_second": 7.214,
+      "eval_steps_per_second": 3.607,
+      "step": 103026
+    },
+    {
+      "entropy": 0.6312229550229838,
+      "epoch": 43.0,
+      "grad_norm": 0.0012969970703125,
+      "learning_rate": 4.330167269059427e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 270194022.0,
+      "step": 105479
+    },
+    {
+      "epoch": 43.0,
+      "eval_entropy": 0.5424636235603919,
+      "eval_loss": 0.8868480324745178,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 270194022.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.606,
+      "eval_samples_per_second": 7.21,
+      "eval_steps_per_second": 3.605,
+      "step": 105479
+    },
+    {
+      "entropy": 0.631434175660063,
+      "epoch": 44.0,
+      "grad_norm": 7.390975952148438e-05,
+      "learning_rate": 3.711607968395394e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 285204801.0,
+      "step": 107932
+    },
+    {
+      "epoch": 44.0,
+      "eval_entropy": 0.5421680899766775,
+      "eval_loss": 0.8860384821891785,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 285204801.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6344,
+      "eval_samples_per_second": 7.154,
+      "eval_steps_per_second": 3.577,
+      "step": 107932
+    },
+    {
+      "entropy": 0.6307510763127319,
+      "epoch": 45.0,
+      "grad_norm": 0.00927734375,
+      "learning_rate": 3.0930486677313608e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 300215580.0,
+      "step": 110385
+    },
+    {
+      "epoch": 45.0,
+      "eval_entropy": 0.54229736328125,
+      "eval_loss": 0.8853968977928162,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 300215580.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.61,
+      "eval_samples_per_second": 7.202,
+      "eval_steps_per_second": 3.601,
+      "step": 110385
+    },
+    {
+      "entropy": 0.6315490893937595,
+      "epoch": 46.0,
+      "grad_norm": 0.0001239776611328125,
+      "learning_rate": 2.474489367067328e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 315226359.0,
+      "step": 112838
+    },
+    {
+      "epoch": 46.0,
+      "eval_entropy": 0.5422170620698196,
+      "eval_loss": 0.8882192373275757,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 315226359.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.7084,
+      "eval_samples_per_second": 7.011,
+      "eval_steps_per_second": 3.506,
+      "step": 112838
+    },
+    {
+      "entropy": 0.6317317981380761,
+      "epoch": 47.0,
+      "grad_norm": 3.3855438232421875e-05,
+      "learning_rate": 1.855930066403295e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 330237138.0,
+      "step": 115291
+    },
+    {
+      "epoch": 47.0,
+      "eval_entropy": 0.5427549022894639,
+      "eval_loss": 0.8879793882369995,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 330237138.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.6923,
+      "eval_samples_per_second": 7.042,
+      "eval_steps_per_second": 3.521,
+      "step": 115291
+    },
+    {
+      "entropy": 0.6314135375092869,
+      "epoch": 48.0,
+      "grad_norm": 0.0025634765625,
+      "learning_rate": 1.2373707657392621e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 345247917.0,
+      "step": 117744
+    },
+    {
+      "epoch": 48.0,
+      "eval_entropy": 0.5423269546948947,
+      "eval_loss": 0.887828528881073,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 345247917.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.661,
+      "eval_samples_per_second": 7.102,
+      "eval_steps_per_second": 3.551,
+      "step": 117744
+    },
+    {
+      "entropy": 0.6317788491650499,
+      "epoch": 49.0,
+      "grad_norm": 0.0015106201171875,
+      "learning_rate": 6.18811465075229e-07,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 360258696.0,
+      "step": 120197
+    },
+    {
+      "epoch": 49.0,
+      "eval_entropy": 0.5421000031324533,
+      "eval_loss": 0.886226236820221,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 360258696.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.8724,
+      "eval_samples_per_second": 6.714,
+      "eval_steps_per_second": 3.357,
+      "step": 120197
+    },
+    {
+      "entropy": 0.6307675256881722,
+      "epoch": 50.0,
+      "grad_norm": 0.0003414154052734375,
+      "learning_rate": 2.5216441119609984e-10,
+      "loss": 0.0,
+      "mean_token_accuracy": 1.0,
+      "num_tokens": 375269475.0,
+      "step": 122650
+    },
+    {
+      "epoch": 50.0,
+      "eval_entropy": 0.5427401478473957,
+      "eval_loss": 0.888108491897583,
+      "eval_mean_token_accuracy": 0.9358974374257601,
+      "eval_num_gold": 26,
+      "eval_num_guess": 26,
+      "eval_num_tokens": 375269475.0,
+      "eval_recall": 0.8462,
+      "eval_runtime": 3.7116,
+      "eval_samples_per_second": 7.005,
+      "eval_steps_per_second": 3.503,
+      "step": 122650
+    }
+  ],
+  "logging_steps": 0,
+  "max_steps": 122650,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 50,
+  "save_steps": 0,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.3796448253168845e+19,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:18eb0c3939b0bb4035391490d7998e62734714a4aadf05a7ddf2f612a76980ce
+size 6289