Image-Text-to-Text
Transformers
Safetensors
German
ocr
vision-language
lightonocr
document-understanding
german
shorthand
manuscript
medieval
conversational
Instructions to use wjbmattingly/LightOnOCR-2-1B-german-shorthand-line with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use wjbmattingly/LightOnOCR-2-1B-german-shorthand-line with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model="wjbmattingly/LightOnOCR-2-1B-german-shorthand-line")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    pipe(text=messages)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("wjbmattingly/LightOnOCR-2-1B-german-shorthand-line", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use wjbmattingly/LightOnOCR-2-1B-german-shorthand-line with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        { "type": "text", "text": "Describe this image in one sentence." },
                        { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                    ]
                }
            ]
        }'
Use Docker
docker model run hf.co/wjbmattingly/LightOnOCR-2-1B-german-shorthand-line
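The curl call above passes the image by URL. For line images stored locally, a common alternative is to embed the image as a base64 data URL in the same OpenAI-compatible request. The following is a minimal sketch under stated assumptions, not the author's client: the function names `build_ocr_payload` and `transcribe_line`, the optional `prompt` argument, and the `max_tokens` value are illustrative, and the default port 8000 is taken from the vLLM example (the SGLang examples use 30000).

```python
import base64
import json
import urllib.request

MODEL_ID = "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line"


def build_ocr_payload(image_bytes, mime_type="image/png", prompt=None, max_tokens=256):
    """Build a chat-completions request body for a single line image.

    The image is embedded as a base64 data URL. `prompt` and `max_tokens`
    are illustrative assumptions, not values documented for this model.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    content = [
        {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}
    ]
    if prompt:
        content.append({"type": "text", "text": prompt})
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


def transcribe_line(image_path, base_url="http://localhost:8000"):
    """POST one image file to a running server and return the generated text."""
    with open(image_path, "rb") as handle:
        payload = build_ocr_payload(handle.read())
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server already running, `transcribe_line("line_0001.png")` would return the model's output for that image; the file name is a placeholder.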
- SGLang
How to use wjbmattingly/LightOnOCR-2-1B-german-shorthand-line with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        { "type": "text", "text": "Describe this image in one sentence." },
                        { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                    ]
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        { "type": "text", "text": "Describe this image in one sentence." },
                        { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                    ]
                }
            ]
        }'
- Docker Model Runner
How to use wjbmattingly/LightOnOCR-2-1B-german-shorthand-line with Docker Model Runner:
docker model run hf.co/wjbmattingly/LightOnOCR-2-1B-german-shorthand-line
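The training log that follows is a Trainer state in JSON form: each `log_history` entry is either a training record (with `loss`, `grad_norm`, `learning_rate`) or an evaluation record (with `eval_loss` and timing fields), logged every 50 steps. A small sketch of how such a log could be split and searched for its lowest evaluation loss is shown here. The helper names are hypothetical, the file name `trainer_state.json` is an assumption about how the state is stored, and the sample values are the first two step pairs copied from the log.

```python
import json


def load_state(path="trainer_state.json"):
    """Load a Trainer state file. The default file name is an assumption."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def split_log_history(state):
    """Separate training records (key "loss") from evaluation records (key "eval_loss")."""
    train = [entry for entry in state["log_history"] if "loss" in entry]
    evals = [entry for entry in state["log_history"] if "eval_loss" in entry]
    return train, evals


def best_eval(state):
    """Return (step, eval_loss) for the lowest evaluation loss in the log."""
    _, evals = split_log_history(state)
    best = min(evals, key=lambda entry: entry["eval_loss"])
    return best["step"], best["eval_loss"]


# First two training/evaluation pairs, copied from the log
sample_state = {
    "log_history": [
        {"epoch": 0.041911148365465216, "loss": 4.5902841186523435, "step": 50},
        {"epoch": 0.041911148365465216, "eval_loss": 4.062541961669922, "step": 50},
        {"epoch": 0.08382229673093043, "loss": 3.8132562255859375, "step": 100},
        {"epoch": 0.08382229673093043, "eval_loss": 3.7036218643188477, "step": 100},
    ]
}

train_records, eval_records = split_log_history(sample_state)
print(len(train_records), len(eval_records))
print(best_eval(sample_state))
```

Run against the full state file, `best_eval(load_state())` can be compared with the `best_metric` and `best_global_step` fields in the header, assuming the tracked metric is the evaluation loss.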
| { | |
| "best_global_step": 9200, | |
| "best_metric": 1.370642900466919, | |
| "best_model_checkpoint": "output/LightOnOCR-ft-german-shorthand/checkpoint-4500", | |
| "epoch": 10.0, | |
| "eval_steps": 50, | |
| "global_step": 11930, | |
| "is_hyper_param_search": false, | |
| "is_local_process_zero": true, | |
| "is_world_process_zero": true, | |
| "log_history": [ | |
| { | |
| "epoch": 0.041911148365465216, | |
| "grad_norm": 1.0981535911560059, | |
| "learning_rate": 5.980369127516779e-05, | |
| "loss": 4.5902841186523435, | |
| "step": 50 | |
| }, | |
| { | |
| "epoch": 0.041911148365465216, | |
| "eval_loss": 4.062541961669922, | |
| "eval_runtime": 1.3804, | |
| "eval_samples_per_second": 72.441, | |
| "eval_steps_per_second": 12.315, | |
| "step": 50 | |
| }, | |
| { | |
| "epoch": 0.08382229673093043, | |
| "grad_norm": 1.5730311870574951, | |
| "learning_rate": 5.9552013422818794e-05, | |
| "loss": 3.8132562255859375, | |
| "step": 100 | |
| }, | |
| { | |
| "epoch": 0.08382229673093043, | |
| "eval_loss": 3.7036218643188477, | |
| "eval_runtime": 1.3274, | |
| "eval_samples_per_second": 75.333, | |
| "eval_steps_per_second": 12.807, | |
| "step": 100 | |
| }, | |
| { | |
| "epoch": 0.12573344509639564, | |
| "grad_norm": 1.8268605470657349, | |
| "learning_rate": 5.93003355704698e-05, | |
| "loss": 3.486875305175781, | |
| "step": 150 | |
| }, | |
| { | |
| "epoch": 0.12573344509639564, | |
| "eval_loss": 3.5074617862701416, | |
| "eval_runtime": 1.3313, | |
| "eval_samples_per_second": 75.117, | |
| "eval_steps_per_second": 12.77, | |
| "step": 150 | |
| }, | |
| { | |
| "epoch": 0.16764459346186086, | |
| "grad_norm": 2.342888116836548, | |
| "learning_rate": 5.9048657718120806e-05, | |
| "loss": 3.3502740478515625, | |
| "step": 200 | |
| }, | |
| { | |
| "epoch": 0.16764459346186086, | |
| "eval_loss": 3.3255791664123535, | |
| "eval_runtime": 1.3281, | |
| "eval_samples_per_second": 75.295, | |
| "eval_steps_per_second": 12.8, | |
| "step": 200 | |
| }, | |
| { | |
| "epoch": 0.20955574182732606, | |
| "grad_norm": 2.9285762310028076, | |
| "learning_rate": 5.879697986577182e-05, | |
| "loss": 3.1367117309570314, | |
| "step": 250 | |
| }, | |
| { | |
| "epoch": 0.20955574182732606, | |
| "eval_loss": 3.119488477706909, | |
| "eval_runtime": 1.3486, | |
| "eval_samples_per_second": 74.152, | |
| "eval_steps_per_second": 12.606, | |
| "step": 250 | |
| }, | |
| { | |
| "epoch": 0.2514668901927913, | |
| "grad_norm": 3.7957241535186768, | |
| "learning_rate": 5.854530201342282e-05, | |
| "loss": 2.935438232421875, | |
| "step": 300 | |
| }, | |
| { | |
| "epoch": 0.2514668901927913, | |
| "eval_loss": 2.941737174987793, | |
| "eval_runtime": 1.3173, | |
| "eval_samples_per_second": 75.91, | |
| "eval_steps_per_second": 12.905, | |
| "step": 300 | |
| }, | |
| { | |
| "epoch": 0.2933780385582565, | |
| "grad_norm": 4.438571929931641, | |
| "learning_rate": 5.829362416107383e-05, | |
| "loss": 2.775052185058594, | |
| "step": 350 | |
| }, | |
| { | |
| "epoch": 0.2933780385582565, | |
| "eval_loss": 2.7863004207611084, | |
| "eval_runtime": 1.318, | |
| "eval_samples_per_second": 75.873, | |
| "eval_steps_per_second": 12.898, | |
| "step": 350 | |
| }, | |
| { | |
| "epoch": 0.3352891869237217, | |
| "grad_norm": 4.838512897491455, | |
| "learning_rate": 5.804194630872483e-05, | |
| "loss": 2.6266168212890624, | |
| "step": 400 | |
| }, | |
| { | |
| "epoch": 0.3352891869237217, | |
| "eval_loss": 2.6623446941375732, | |
| "eval_runtime": 1.3196, | |
| "eval_samples_per_second": 75.781, | |
| "eval_steps_per_second": 12.883, | |
| "step": 400 | |
| }, | |
| { | |
| "epoch": 0.37720033528918695, | |
| "grad_norm": 5.87675666809082, | |
| "learning_rate": 5.779026845637584e-05, | |
| "loss": 2.500618896484375, | |
| "step": 450 | |
| }, | |
| { | |
| "epoch": 0.37720033528918695, | |
| "eval_loss": 2.5778284072875977, | |
| "eval_runtime": 1.3249, | |
| "eval_samples_per_second": 75.479, | |
| "eval_steps_per_second": 12.832, | |
| "step": 450 | |
| }, | |
| { | |
| "epoch": 0.4191114836546521, | |
| "grad_norm": 5.890925884246826, | |
| "learning_rate": 5.753859060402684e-05, | |
| "loss": 2.4262014770507814, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 0.4191114836546521, | |
| "eval_loss": 2.4782729148864746, | |
| "eval_runtime": 1.3205, | |
| "eval_samples_per_second": 75.727, | |
| "eval_steps_per_second": 12.874, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 0.46102263202011734, | |
| "grad_norm": 7.327637195587158, | |
| "learning_rate": 5.7286912751677856e-05, | |
| "loss": 2.34708740234375, | |
| "step": 550 | |
| }, | |
| { | |
| "epoch": 0.46102263202011734, | |
| "eval_loss": 2.417339563369751, | |
| "eval_runtime": 1.3195, | |
| "eval_samples_per_second": 75.788, | |
| "eval_steps_per_second": 12.884, | |
| "step": 550 | |
| }, | |
| { | |
| "epoch": 0.5029337803855826, | |
| "grad_norm": 6.353279113769531, | |
| "learning_rate": 5.703523489932886e-05, | |
| "loss": 2.266437530517578, | |
| "step": 600 | |
| }, | |
| { | |
| "epoch": 0.5029337803855826, | |
| "eval_loss": 2.375779390335083, | |
| "eval_runtime": 1.3204, | |
| "eval_samples_per_second": 75.735, | |
| "eval_steps_per_second": 12.875, | |
| "step": 600 | |
| }, | |
| { | |
| "epoch": 0.5448449287510477, | |
| "grad_norm": 5.457998752593994, | |
| "learning_rate": 5.678355704697987e-05, | |
| "loss": 2.2782778930664063, | |
| "step": 650 | |
| }, | |
| { | |
| "epoch": 0.5448449287510477, | |
| "eval_loss": 2.344698905944824, | |
| "eval_runtime": 1.3229, | |
| "eval_samples_per_second": 75.589, | |
| "eval_steps_per_second": 12.85, | |
| "step": 650 | |
| }, | |
| { | |
| "epoch": 0.586756077116513, | |
| "grad_norm": 8.852900505065918, | |
| "learning_rate": 5.6531879194630874e-05, | |
| "loss": 2.1273974609375, | |
| "step": 700 | |
| }, | |
| { | |
| "epoch": 0.586756077116513, | |
| "eval_loss": 2.2804863452911377, | |
| "eval_runtime": 1.321, | |
| "eval_samples_per_second": 75.699, | |
| "eval_steps_per_second": 12.869, | |
| "step": 700 | |
| }, | |
| { | |
| "epoch": 0.6286672254819782, | |
| "grad_norm": 6.826542854309082, | |
| "learning_rate": 5.628020134228188e-05, | |
| "loss": 2.181948699951172, | |
| "step": 750 | |
| }, | |
| { | |
| "epoch": 0.6286672254819782, | |
| "eval_loss": 2.224287509918213, | |
| "eval_runtime": 1.319, | |
| "eval_samples_per_second": 75.814, | |
| "eval_steps_per_second": 12.888, | |
| "step": 750 | |
| }, | |
| { | |
| "epoch": 0.6705783738474435, | |
| "grad_norm": 8.29145336151123, | |
| "learning_rate": 5.6028523489932886e-05, | |
| "loss": 2.101322021484375, | |
| "step": 800 | |
| }, | |
| { | |
| "epoch": 0.6705783738474435, | |
| "eval_loss": 2.193394899368286, | |
| "eval_runtime": 1.3212, | |
| "eval_samples_per_second": 75.687, | |
| "eval_steps_per_second": 12.867, | |
| "step": 800 | |
| }, | |
| { | |
| "epoch": 0.7124895222129086, | |
| "grad_norm": 7.018554210662842, | |
| "learning_rate": 5.577684563758389e-05, | |
| "loss": 2.078760681152344, | |
| "step": 850 | |
| }, | |
| { | |
| "epoch": 0.7124895222129086, | |
| "eval_loss": 2.1545662879943848, | |
| "eval_runtime": 1.3171, | |
| "eval_samples_per_second": 75.923, | |
| "eval_steps_per_second": 12.907, | |
| "step": 850 | |
| }, | |
| { | |
| "epoch": 0.7544006705783739, | |
| "grad_norm": 8.150788307189941, | |
| "learning_rate": 5.55251677852349e-05, | |
| "loss": 2.0556121826171876, | |
| "step": 900 | |
| }, | |
| { | |
| "epoch": 0.7544006705783739, | |
| "eval_loss": 2.1176488399505615, | |
| "eval_runtime": 1.3202, | |
| "eval_samples_per_second": 75.748, | |
| "eval_steps_per_second": 12.877, | |
| "step": 900 | |
| }, | |
| { | |
| "epoch": 0.7963118189438391, | |
| "grad_norm": 6.901854038238525, | |
| "learning_rate": 5.527348993288591e-05, | |
| "loss": 1.9831507873535157, | |
| "step": 950 | |
| }, | |
| { | |
| "epoch": 0.7963118189438391, | |
| "eval_loss": 2.099759340286255, | |
| "eval_runtime": 1.3138, | |
| "eval_samples_per_second": 76.114, | |
| "eval_steps_per_second": 12.939, | |
| "step": 950 | |
| }, | |
| { | |
| "epoch": 0.8382229673093042, | |
| "grad_norm": 9.57073974609375, | |
| "learning_rate": 5.502181208053691e-05, | |
| "loss": 1.9548426818847657, | |
| "step": 1000 | |
| }, | |
| { | |
| "epoch": 0.8382229673093042, | |
| "eval_loss": 2.063068389892578, | |
| "eval_runtime": 1.3164, | |
| "eval_samples_per_second": 75.964, | |
| "eval_steps_per_second": 12.914, | |
| "step": 1000 | |
| }, | |
| { | |
| "epoch": 0.8801341156747695, | |
| "grad_norm": 9.207406997680664, | |
| "learning_rate": 5.4770134228187924e-05, | |
| "loss": 1.934165496826172, | |
| "step": 1050 | |
| }, | |
| { | |
| "epoch": 0.8801341156747695, | |
| "eval_loss": 2.0627496242523193, | |
| "eval_runtime": 1.3096, | |
| "eval_samples_per_second": 76.361, | |
| "eval_steps_per_second": 12.981, | |
| "step": 1050 | |
| }, | |
| { | |
| "epoch": 0.9220452640402347, | |
| "grad_norm": 6.910806179046631, | |
| "learning_rate": 5.451845637583892e-05, | |
| "loss": 1.8687205505371094, | |
| "step": 1100 | |
| }, | |
| { | |
| "epoch": 0.9220452640402347, | |
| "eval_loss": 2.0254294872283936, | |
| "eval_runtime": 1.3089, | |
| "eval_samples_per_second": 76.402, | |
| "eval_steps_per_second": 12.988, | |
| "step": 1100 | |
| }, | |
| { | |
| "epoch": 0.9639564124057, | |
| "grad_norm": 6.514822483062744, | |
| "learning_rate": 5.4266778523489936e-05, | |
| "loss": 1.8899638366699218, | |
| "step": 1150 | |
| }, | |
| { | |
| "epoch": 0.9639564124057, | |
| "eval_loss": 2.0025417804718018, | |
| "eval_runtime": 1.3123, | |
| "eval_samples_per_second": 76.201, | |
| "eval_steps_per_second": 12.954, | |
| "step": 1150 | |
| }, | |
| { | |
| "epoch": 1.0058675607711651, | |
| "grad_norm": 7.338857173919678, | |
| "learning_rate": 5.4015100671140935e-05, | |
| "loss": 1.8701951599121094, | |
| "step": 1200 | |
| }, | |
| { | |
| "epoch": 1.0058675607711651, | |
| "eval_loss": 1.974115014076233, | |
| "eval_runtime": 1.3106, | |
| "eval_samples_per_second": 76.3, | |
| "eval_steps_per_second": 12.971, | |
| "step": 1200 | |
| }, | |
| { | |
| "epoch": 1.0477787091366304, | |
| "grad_norm": 6.799989700317383, | |
| "learning_rate": 5.376342281879195e-05, | |
| "loss": 1.7634503173828124, | |
| "step": 1250 | |
| }, | |
| { | |
| "epoch": 1.0477787091366304, | |
| "eval_loss": 1.947939395904541, | |
| "eval_runtime": 1.3106, | |
| "eval_samples_per_second": 76.301, | |
| "eval_steps_per_second": 12.971, | |
| "step": 1250 | |
| }, | |
| { | |
| "epoch": 1.0896898575020955, | |
| "grad_norm": 7.16735315322876, | |
| "learning_rate": 5.3511744966442955e-05, | |
| "loss": 1.7611782836914063, | |
| "step": 1300 | |
| }, | |
| { | |
| "epoch": 1.0896898575020955, | |
| "eval_loss": 1.9540085792541504, | |
| "eval_runtime": 1.3072, | |
| "eval_samples_per_second": 76.497, | |
| "eval_steps_per_second": 13.004, | |
| "step": 1300 | |
| }, | |
| { | |
| "epoch": 1.1316010058675607, | |
| "grad_norm": 7.926932334899902, | |
| "learning_rate": 5.326006711409396e-05, | |
| "loss": 1.7159942626953124, | |
| "step": 1350 | |
| }, | |
| { | |
| "epoch": 1.1316010058675607, | |
| "eval_loss": 1.9392883777618408, | |
| "eval_runtime": 1.3204, | |
| "eval_samples_per_second": 75.734, | |
| "eval_steps_per_second": 12.875, | |
| "step": 1350 | |
| }, | |
| { | |
| "epoch": 1.173512154233026, | |
| "grad_norm": 8.545011520385742, | |
| "learning_rate": 5.300838926174497e-05, | |
| "loss": 1.7063587951660155, | |
| "step": 1400 | |
| }, | |
| { | |
| "epoch": 1.173512154233026, | |
| "eval_loss": 1.9075002670288086, | |
| "eval_runtime": 1.3071, | |
| "eval_samples_per_second": 76.504, | |
| "eval_steps_per_second": 13.006, | |
| "step": 1400 | |
| }, | |
| { | |
| "epoch": 1.2154233025984913, | |
| "grad_norm": 8.624767303466797, | |
| "learning_rate": 5.275671140939597e-05, | |
| "loss": 1.7171022033691405, | |
| "step": 1450 | |
| }, | |
| { | |
| "epoch": 1.2154233025984913, | |
| "eval_loss": 1.9038127660751343, | |
| "eval_runtime": 1.3097, | |
| "eval_samples_per_second": 76.351, | |
| "eval_steps_per_second": 12.98, | |
| "step": 1450 | |
| }, | |
| { | |
| "epoch": 1.2573344509639564, | |
| "grad_norm": 7.939345359802246, | |
| "learning_rate": 5.250503355704698e-05, | |
| "loss": 1.743599395751953, | |
| "step": 1500 | |
| }, | |
| { | |
| "epoch": 1.2573344509639564, | |
| "eval_loss": 1.8788491487503052, | |
| "eval_runtime": 1.3136, | |
| "eval_samples_per_second": 76.126, | |
| "eval_steps_per_second": 12.941, | |
| "step": 1500 | |
| }, | |
| { | |
| "epoch": 1.2992455993294216, | |
| "grad_norm": 8.108811378479004, | |
| "learning_rate": 5.2253355704697985e-05, | |
| "loss": 1.6337568664550781, | |
| "step": 1550 | |
| }, | |
| { | |
| "epoch": 1.2992455993294216, | |
| "eval_loss": 1.8694273233413696, | |
| "eval_runtime": 1.3083, | |
| "eval_samples_per_second": 76.436, | |
| "eval_steps_per_second": 12.994, | |
| "step": 1550 | |
| }, | |
| { | |
| "epoch": 1.341156747694887, | |
| "grad_norm": 7.847909927368164, | |
| "learning_rate": 5.200167785234899e-05, | |
| "loss": 1.6766596984863282, | |
| "step": 1600 | |
| }, | |
| { | |
| "epoch": 1.341156747694887, | |
| "eval_loss": 1.8617148399353027, | |
| "eval_runtime": 1.3154, | |
| "eval_samples_per_second": 76.022, | |
| "eval_steps_per_second": 12.924, | |
| "step": 1600 | |
| }, | |
| { | |
| "epoch": 1.3830678960603522, | |
| "grad_norm": 7.955574989318848, | |
| "learning_rate": 5.1750000000000004e-05, | |
| "loss": 1.6566036987304686, | |
| "step": 1650 | |
| }, | |
| { | |
| "epoch": 1.3830678960603522, | |
| "eval_loss": 1.8520632982254028, | |
| "eval_runtime": 1.3116, | |
| "eval_samples_per_second": 76.241, | |
| "eval_steps_per_second": 12.961, | |
| "step": 1650 | |
| }, | |
| { | |
| "epoch": 1.4249790444258172, | |
| "grad_norm": 8.108285903930664, | |
| "learning_rate": 5.1498322147651004e-05, | |
| "loss": 1.6372518920898438, | |
| "step": 1700 | |
| }, | |
| { | |
| "epoch": 1.4249790444258172, | |
| "eval_loss": 1.8362972736358643, | |
| "eval_runtime": 1.3059, | |
| "eval_samples_per_second": 76.573, | |
| "eval_steps_per_second": 13.017, | |
| "step": 1700 | |
| }, | |
| { | |
| "epoch": 1.4668901927912825, | |
| "grad_norm": 7.391090393066406, | |
| "learning_rate": 5.1246644295302017e-05, | |
| "loss": 1.6237742614746093, | |
| "step": 1750 | |
| }, | |
| { | |
| "epoch": 1.4668901927912825, | |
| "eval_loss": 1.8265368938446045, | |
| "eval_runtime": 1.3072, | |
| "eval_samples_per_second": 76.499, | |
| "eval_steps_per_second": 13.005, | |
| "step": 1750 | |
| }, | |
| { | |
| "epoch": 1.5088013411567478, | |
| "grad_norm": 7.923129558563232, | |
| "learning_rate": 5.099496644295302e-05, | |
| "loss": 1.6319300842285156, | |
| "step": 1800 | |
| }, | |
| { | |
| "epoch": 1.5088013411567478, | |
| "eval_loss": 1.8251959085464478, | |
| "eval_runtime": 1.3119, | |
| "eval_samples_per_second": 76.223, | |
| "eval_steps_per_second": 12.958, | |
| "step": 1800 | |
| }, | |
| { | |
| "epoch": 1.5507124895222129, | |
| "grad_norm": 9.290380477905273, | |
| "learning_rate": 5.074328859060403e-05, | |
| "loss": 1.6264842224121094, | |
| "step": 1850 | |
| }, | |
| { | |
| "epoch": 1.5507124895222129, | |
| "eval_loss": 1.804435133934021, | |
| "eval_runtime": 1.3094, | |
| "eval_samples_per_second": 76.373, | |
| "eval_steps_per_second": 12.983, | |
| "step": 1850 | |
| }, | |
| { | |
| "epoch": 1.5926236378876781, | |
| "grad_norm": 8.859464645385742, | |
| "learning_rate": 5.0491610738255035e-05, | |
| "loss": 1.6374754333496093, | |
| "step": 1900 | |
| }, | |
| { | |
| "epoch": 1.5926236378876781, | |
| "eval_loss": 1.7807632684707642, | |
| "eval_runtime": 1.3136, | |
| "eval_samples_per_second": 76.129, | |
| "eval_steps_per_second": 12.942, | |
| "step": 1900 | |
| }, | |
| { | |
| "epoch": 1.6345347862531434, | |
| "grad_norm": 7.5553107261657715, | |
| "learning_rate": 5.023993288590604e-05, | |
| "loss": 1.5756221008300781, | |
| "step": 1950 | |
| }, | |
| { | |
| "epoch": 1.6345347862531434, | |
| "eval_loss": 1.7838184833526611, | |
| "eval_runtime": 1.3104, | |
| "eval_samples_per_second": 76.314, | |
| "eval_steps_per_second": 12.973, | |
| "step": 1950 | |
| }, | |
| { | |
| "epoch": 1.6764459346186085, | |
| "grad_norm": 7.127007961273193, | |
| "learning_rate": 4.9988255033557054e-05, | |
| "loss": 1.6108526611328124, | |
| "step": 2000 | |
| }, | |
| { | |
| "epoch": 1.6764459346186085, | |
| "eval_loss": 1.7690205574035645, | |
| "eval_runtime": 1.3123, | |
| "eval_samples_per_second": 76.2, | |
| "eval_steps_per_second": 12.954, | |
| "step": 2000 | |
| }, | |
| { | |
| "epoch": 1.7183570829840737, | |
| "grad_norm": 6.814781188964844, | |
| "learning_rate": 4.9736577181208053e-05, | |
| "loss": 1.5662849426269532, | |
| "step": 2050 | |
| }, | |
| { | |
| "epoch": 1.7183570829840737, | |
| "eval_loss": 1.7615545988082886, | |
| "eval_runtime": 1.3075, | |
| "eval_samples_per_second": 76.482, | |
| "eval_steps_per_second": 13.002, | |
| "step": 2050 | |
| }, | |
| { | |
| "epoch": 1.760268231349539, | |
| "grad_norm": 7.434536933898926, | |
| "learning_rate": 4.9484899328859066e-05, | |
| "loss": 1.6154570007324218, | |
| "step": 2100 | |
| }, | |
| { | |
| "epoch": 1.760268231349539, | |
| "eval_loss": 1.753432035446167, | |
| "eval_runtime": 1.3177, | |
| "eval_samples_per_second": 75.889, | |
| "eval_steps_per_second": 12.901, | |
| "step": 2100 | |
| }, | |
| { | |
| "epoch": 1.802179379715004, | |
| "grad_norm": 7.67100191116333, | |
| "learning_rate": 4.9233221476510066e-05, | |
| "loss": 1.5270895385742187, | |
| "step": 2150 | |
| }, | |
| { | |
| "epoch": 1.802179379715004, | |
| "eval_loss": 1.737377643585205, | |
| "eval_runtime": 1.3105, | |
| "eval_samples_per_second": 76.305, | |
| "eval_steps_per_second": 12.972, | |
| "step": 2150 | |
| }, | |
| { | |
| "epoch": 1.8440905280804694, | |
| "grad_norm": 8.817289352416992, | |
| "learning_rate": 4.898154362416108e-05, | |
| "loss": 1.5799215698242188, | |
| "step": 2200 | |
| }, | |
| { | |
| "epoch": 1.8440905280804694, | |
| "eval_loss": 1.738153338432312, | |
| "eval_runtime": 1.3164, | |
| "eval_samples_per_second": 75.967, | |
| "eval_steps_per_second": 12.914, | |
| "step": 2200 | |
| }, | |
| { | |
| "epoch": 1.8860016764459346, | |
| "grad_norm": 7.042121410369873, | |
| "learning_rate": 4.872986577181208e-05, | |
| "loss": 1.5948490905761719, | |
| "step": 2250 | |
| }, | |
| { | |
| "epoch": 1.8860016764459346, | |
| "eval_loss": 1.7221215963363647, | |
| "eval_runtime": 1.328, | |
| "eval_samples_per_second": 75.301, | |
| "eval_steps_per_second": 12.801, | |
| "step": 2250 | |
| }, | |
| { | |
| "epoch": 1.9279128248113997, | |
| "grad_norm": 7.423919677734375, | |
| "learning_rate": 4.847818791946309e-05, | |
| "loss": 1.5773471069335938, | |
| "step": 2300 | |
| }, | |
| { | |
| "epoch": 1.9279128248113997, | |
| "eval_loss": 1.7105484008789062, | |
| "eval_runtime": 1.321, | |
| "eval_samples_per_second": 75.701, | |
| "eval_steps_per_second": 12.869, | |
| "step": 2300 | |
| }, | |
| { | |
| "epoch": 1.9698239731768652, | |
| "grad_norm": 7.331556797027588, | |
| "learning_rate": 4.82265100671141e-05, | |
| "loss": 1.589197235107422, | |
| "step": 2350 | |
| }, | |
| { | |
| "epoch": 1.9698239731768652, | |
| "eval_loss": 1.6980195045471191, | |
| "eval_runtime": 1.3234, | |
| "eval_samples_per_second": 75.566, | |
| "eval_steps_per_second": 12.846, | |
| "step": 2350 | |
| }, | |
| { | |
| "epoch": 2.0117351215423303, | |
| "grad_norm": 8.534239768981934, | |
| "learning_rate": 4.79748322147651e-05, | |
| "loss": 1.49150390625, | |
| "step": 2400 | |
| }, | |
| { | |
| "epoch": 2.0117351215423303, | |
| "eval_loss": 1.6935464143753052, | |
| "eval_runtime": 1.3234, | |
| "eval_samples_per_second": 75.561, | |
| "eval_steps_per_second": 12.845, | |
| "step": 2400 | |
| }, | |
| { | |
| "epoch": 2.0536462699077953, | |
| "grad_norm": 8.918983459472656, | |
| "learning_rate": 4.772315436241611e-05, | |
| "loss": 1.4109703063964845, | |
| "step": 2450 | |
| }, | |
| { | |
| "epoch": 2.0536462699077953, | |
| "eval_loss": 1.682075023651123, | |
| "eval_runtime": 1.3195, | |
| "eval_samples_per_second": 75.784, | |
| "eval_steps_per_second": 12.883, | |
| "step": 2450 | |
| }, | |
| { | |
| "epoch": 2.095557418273261, | |
| "grad_norm": 8.450674057006836, | |
| "learning_rate": 4.7471476510067115e-05, | |
| "loss": 1.4147891235351562, | |
| "step": 2500 | |
| }, | |
| { | |
| "epoch": 2.095557418273261, | |
| "eval_loss": 1.6843719482421875, | |
| "eval_runtime": 1.3207, | |
| "eval_samples_per_second": 75.719, | |
| "eval_steps_per_second": 12.872, | |
| "step": 2500 | |
| }, | |
| { | |
| "epoch": 2.137468566638726, | |
| "grad_norm": 8.314532279968262, | |
| "learning_rate": 4.721979865771812e-05, | |
| "loss": 1.3551103210449218, | |
| "step": 2550 | |
| }, | |
| { | |
| "epoch": 2.137468566638726, | |
| "eval_loss": 1.6835081577301025, | |
| "eval_runtime": 1.3197, | |
| "eval_samples_per_second": 75.776, | |
| "eval_steps_per_second": 12.882, | |
| "step": 2550 | |
| }, | |
| { | |
| "epoch": 2.179379715004191, | |
| "grad_norm": 8.086017608642578, | |
| "learning_rate": 4.696812080536913e-05, | |
| "loss": 1.3754513549804688, | |
| "step": 2600 | |
| }, | |
| { | |
| "epoch": 2.179379715004191, | |
| "eval_loss": 1.6632437705993652, | |
| "eval_runtime": 1.3259, | |
| "eval_samples_per_second": 75.419, | |
| "eval_steps_per_second": 12.821, | |
| "step": 2600 | |
| }, | |
| { | |
| "epoch": 2.2212908633696564, | |
| "grad_norm": 7.914801120758057, | |
| "learning_rate": 4.6716442953020134e-05, | |
| "loss": 1.4001995849609374, | |
| "step": 2650 | |
| }, | |
| { | |
| "epoch": 2.2212908633696564, | |
| "eval_loss": 1.6767840385437012, | |
| "eval_runtime": 1.3211, | |
| "eval_samples_per_second": 75.693, | |
| "eval_steps_per_second": 12.868, | |
| "step": 2650 | |
| }, | |
| { | |
| "epoch": 2.2632020117351215, | |
| "grad_norm": 8.075952529907227, | |
| "learning_rate": 4.646476510067115e-05, | |
| "loss": 1.397560577392578, | |
| "step": 2700 | |
| }, | |
| { | |
| "epoch": 2.2632020117351215, | |
| "eval_loss": 1.6516294479370117, | |
| "eval_runtime": 1.3202, | |
| "eval_samples_per_second": 75.747, | |
| "eval_steps_per_second": 12.877, | |
| "step": 2700 | |
| }, | |
| { | |
| "epoch": 2.3051131601005865, | |
| "grad_norm": 7.579532623291016, | |
| "learning_rate": 4.6213087248322146e-05, | |
| "loss": 1.4046427917480468, | |
| "step": 2750 | |
| }, | |
| { | |
| "epoch": 2.3051131601005865, | |
| "eval_loss": 1.6551226377487183, | |
| "eval_runtime": 1.323, | |
| "eval_samples_per_second": 75.587, | |
| "eval_steps_per_second": 12.85, | |
| "step": 2750 | |
| }, | |
| { | |
| "epoch": 2.347024308466052, | |
| "grad_norm": 8.964472770690918, | |
| "learning_rate": 4.596140939597316e-05, | |
| "loss": 1.3893455505371093, | |
| "step": 2800 | |
| }, | |
| { | |
| "epoch": 2.347024308466052, | |
| "eval_loss": 1.6423428058624268, | |
| "eval_runtime": 1.32, | |
| "eval_samples_per_second": 75.757, | |
| "eval_steps_per_second": 12.879, | |
| "step": 2800 | |
| }, | |
| { | |
| "epoch": 2.388935456831517, | |
| "grad_norm": 10.200695991516113, | |
| "learning_rate": 4.570973154362416e-05, | |
| "loss": 1.4229560852050782, | |
| "step": 2850 | |
| }, | |
| { | |
| "epoch": 2.388935456831517, | |
| "eval_loss": 1.6394565105438232, | |
| "eval_runtime": 1.334, | |
| "eval_samples_per_second": 74.964, | |
| "eval_steps_per_second": 12.744, | |
| "step": 2850 | |
| }, | |
| { | |
| "epoch": 2.4308466051969826, | |
| "grad_norm": 8.970279693603516, | |
| "learning_rate": 4.545805369127517e-05, | |
| "loss": 1.3607884216308594, | |
| "step": 2900 | |
| }, | |
| { | |
| "epoch": 2.4308466051969826, | |
| "eval_loss": 1.6436243057250977, | |
| "eval_runtime": 1.3264, | |
| "eval_samples_per_second": 75.39, | |
| "eval_steps_per_second": 12.816, | |
| "step": 2900 | |
| }, | |
| { | |
| "epoch": 2.4727577535624476, | |
| "grad_norm": 7.988678455352783, | |
| "learning_rate": 4.520637583892617e-05, | |
| "loss": 1.401595458984375, | |
| "step": 2950 | |
| }, | |
| { | |
| "epoch": 2.4727577535624476, | |
| "eval_loss": 1.6337294578552246, | |
| "eval_runtime": 1.3203, | |
| "eval_samples_per_second": 75.738, | |
| "eval_steps_per_second": 12.875, | |
| "step": 2950 | |
| }, | |
| { | |
| "epoch": 2.5146689019279127, | |
| "grad_norm": 6.7093119621276855, | |
| "learning_rate": 4.4954697986577184e-05, | |
| "loss": 1.3668931579589845, | |
| "step": 3000 | |
| }, | |
| { | |
| "epoch": 2.5146689019279127, | |
| "eval_loss": 1.6250532865524292, | |
| "eval_runtime": 1.3184, | |
| "eval_samples_per_second": 75.85, | |
| "eval_steps_per_second": 12.895, | |
| "step": 3000 | |
| }, | |
| { | |
| "epoch": 2.556580050293378, | |
| "grad_norm": 10.19338607788086, | |
| "learning_rate": 4.470302013422819e-05, | |
| "loss": 1.3743846130371093, | |
| "step": 3050 | |
| }, | |
| { | |
| "epoch": 2.556580050293378, | |
| "eval_loss": 1.614687204360962, | |
| "eval_runtime": 1.3266, | |
| "eval_samples_per_second": 75.378, | |
| "eval_steps_per_second": 12.814, | |
| "step": 3050 | |
| }, | |
| { | |
| "epoch": 2.5984911986588433, | |
| "grad_norm": 8.771378517150879, | |
| "learning_rate": 4.4451342281879196e-05, | |
| "loss": 1.3890554809570312, | |
| "step": 3100 | |
| }, | |
| { | |
| "epoch": 2.5984911986588433, | |
| "eval_loss": 1.6118069887161255, | |
| "eval_runtime": 1.3238, | |
| "eval_samples_per_second": 75.541, | |
| "eval_steps_per_second": 12.842, | |
| "step": 3100 | |
| }, | |
| { | |
| "epoch": 2.6404023470243083, | |
| "grad_norm": 7.261502742767334, | |
| "learning_rate": 4.41996644295302e-05, | |
| "loss": 1.366646728515625, | |
| "step": 3150 | |
| }, | |
| { | |
| "epoch": 2.6404023470243083, | |
| "eval_loss": 1.6043864488601685, | |
| "eval_runtime": 1.323, | |
| "eval_samples_per_second": 75.587, | |
| "eval_steps_per_second": 12.85, | |
| "step": 3150 | |
| }, | |
| { | |
| "epoch": 2.682313495389774, | |
| "grad_norm": 7.476693153381348, | |
| "learning_rate": 4.394798657718121e-05, | |
| "loss": 1.3473231506347656, | |
| "step": 3200 | |
| }, | |
| { | |
| "epoch": 2.682313495389774, | |
| "eval_loss": 1.5811587572097778, | |
| "eval_runtime": 1.3063, | |
| "eval_samples_per_second": 76.553, | |
| "eval_steps_per_second": 13.014, | |
| "step": 3200 | |
| }, | |
| { | |
| "epoch": 2.724224643755239, | |
| "grad_norm": 8.328020095825195, | |
| "learning_rate": 4.3696308724832214e-05, | |
| "loss": 1.354296417236328, | |
| "step": 3250 | |
| }, | |
| { | |
| "epoch": 2.724224643755239, | |
| "eval_loss": 1.60099458694458, | |
| "eval_runtime": 1.3194, | |
| "eval_samples_per_second": 75.791, | |
| "eval_steps_per_second": 12.884, | |
| "step": 3250 | |
| }, | |
| { | |
| "epoch": 2.7661357921207044, | |
| "grad_norm": 7.663002967834473, | |
| "learning_rate": 4.344463087248322e-05, | |
| "loss": 1.3879621887207032, | |
| "step": 3300 | |
| }, | |
| { | |
| "epoch": 2.7661357921207044, | |
| "eval_loss": 1.5888242721557617, | |
| "eval_runtime": 1.3364, | |
| "eval_samples_per_second": 74.826, | |
| "eval_steps_per_second": 12.72, | |
| "step": 3300 | |
| }, | |
| { | |
| "epoch": 2.8080469404861694, | |
| "grad_norm": 9.024152755737305, | |
| "learning_rate": 4.319295302013423e-05, | |
| "loss": 1.368860321044922, | |
| "step": 3350 | |
| }, | |
| { | |
| "epoch": 2.8080469404861694, | |
| "eval_loss": 1.5736557245254517, | |
| "eval_runtime": 1.3912, | |
| "eval_samples_per_second": 71.88, | |
| "eval_steps_per_second": 12.22, | |
| "step": 3350 | |
| }, | |
| { | |
| "epoch": 2.8499580888516345, | |
| "grad_norm": 8.656745910644531, | |
| "learning_rate": 4.294127516778524e-05, | |
| "loss": 1.366145477294922, | |
| "step": 3400 | |
| }, | |
| { | |
| "epoch": 2.8499580888516345, | |
| "eval_loss": 1.5826553106307983, | |
| "eval_runtime": 1.3081, | |
| "eval_samples_per_second": 76.444, | |
| "eval_steps_per_second": 12.996, | |
| "step": 3400 | |
| }, | |
| { | |
| "epoch": 2.8918692372171, | |
| "grad_norm": 8.006067276000977, | |
| "learning_rate": 4.268959731543624e-05, | |
| "loss": 1.323511505126953, | |
| "step": 3450 | |
| }, | |
| { | |
| "epoch": 2.8918692372171, | |
| "eval_loss": 1.5641191005706787, | |
| "eval_runtime": 1.3071, | |
| "eval_samples_per_second": 76.505, | |
| "eval_steps_per_second": 13.006, | |
| "step": 3450 | |
| }, | |
| { | |
| "epoch": 2.933780385582565, | |
| "grad_norm": 8.63364028930664, | |
| "learning_rate": 4.243791946308725e-05, | |
| "loss": 1.34541015625, | |
| "step": 3500 | |
| }, | |
| { | |
| "epoch": 2.933780385582565, | |
| "eval_loss": 1.5582342147827148, | |
| "eval_runtime": 1.3089, | |
| "eval_samples_per_second": 76.401, | |
| "eval_steps_per_second": 12.988, | |
| "step": 3500 | |
| }, | |
| { | |
| "epoch": 2.97569153394803, | |
| "grad_norm": 8.497447967529297, | |
| "learning_rate": 4.218624161073825e-05, | |
| "loss": 1.3619944763183593, | |
| "step": 3550 | |
| }, | |
| { | |
| "epoch": 2.97569153394803, | |
| "eval_loss": 1.5561535358428955, | |
| "eval_runtime": 1.3139, | |
| "eval_samples_per_second": 76.11, | |
| "eval_steps_per_second": 12.939, | |
| "step": 3550 | |
| }, | |
| { | |
| "epoch": 3.0176026823134956, | |
| "grad_norm": 8.443196296691895, | |
| "learning_rate": 4.1934563758389264e-05, | |
| "loss": 1.2809111022949218, | |
| "step": 3600 | |
| }, | |
| { | |
| "epoch": 3.0176026823134956, | |
| "eval_loss": 1.5703850984573364, | |
| "eval_runtime": 1.3113, | |
| "eval_samples_per_second": 76.262, | |
| "eval_steps_per_second": 12.965, | |
| "step": 3600 | |
| }, | |
| { | |
| "epoch": 3.0595138306789607, | |
| "grad_norm": 8.75338363647461, | |
| "learning_rate": 4.1682885906040264e-05, | |
| "loss": 1.2329022216796874, | |
| "step": 3650 | |
| }, | |
| { | |
| "epoch": 3.0595138306789607, | |
| "eval_loss": 1.5693306922912598, | |
| "eval_runtime": 1.3103, | |
| "eval_samples_per_second": 76.319, | |
| "eval_steps_per_second": 12.974, | |
| "step": 3650 | |
| }, | |
| { | |
| "epoch": 3.1014249790444257, | |
| "grad_norm": 8.877315521240234, | |
| "learning_rate": 4.1431208053691276e-05, | |
| "loss": 1.2346041107177734, | |
| "step": 3700 | |
| }, | |
| { | |
| "epoch": 3.1014249790444257, | |
| "eval_loss": 1.5618982315063477, | |
| "eval_runtime": 1.3155, | |
| "eval_samples_per_second": 76.019, | |
| "eval_steps_per_second": 12.923, | |
| "step": 3700 | |
| }, | |
| { | |
| "epoch": 3.143336127409891, | |
| "grad_norm": 7.173608303070068, | |
| "learning_rate": 4.117953020134228e-05, | |
| "loss": 1.2359423828125, | |
| "step": 3750 | |
| }, | |
| { | |
| "epoch": 3.143336127409891, | |
| "eval_loss": 1.5642772912979126, | |
| "eval_runtime": 1.3133, | |
| "eval_samples_per_second": 76.145, | |
| "eval_steps_per_second": 12.945, | |
| "step": 3750 | |
| }, | |
| { | |
| "epoch": 3.1852472757753563, | |
| "grad_norm": 9.734856605529785, | |
| "learning_rate": 4.092785234899329e-05, | |
| "loss": 1.2488408660888672, | |
| "step": 3800 | |
| }, | |
| { | |
| "epoch": 3.1852472757753563, | |
| "eval_loss": 1.5560795068740845, | |
| "eval_runtime": 1.3079, | |
| "eval_samples_per_second": 76.456, | |
| "eval_steps_per_second": 12.997, | |
| "step": 3800 | |
| }, | |
| { | |
| "epoch": 3.2271584241408213, | |
| "grad_norm": 9.255692481994629, | |
| "learning_rate": 4.06761744966443e-05, | |
| "loss": 1.227440643310547, | |
| "step": 3850 | |
| }, | |
| { | |
| "epoch": 3.2271584241408213, | |
| "eval_loss": 1.5615720748901367, | |
| "eval_runtime": 1.3165, | |
| "eval_samples_per_second": 75.959, | |
| "eval_steps_per_second": 12.913, | |
| "step": 3850 | |
| }, | |
| { | |
| "epoch": 3.269069572506287, | |
| "grad_norm": 8.585817337036133, | |
| "learning_rate": 4.04244966442953e-05, | |
| "loss": 1.201407241821289, | |
| "step": 3900 | |
| }, | |
| { | |
| "epoch": 3.269069572506287, | |
| "eval_loss": 1.5547152757644653, | |
| "eval_runtime": 1.3129, | |
| "eval_samples_per_second": 76.165, | |
| "eval_steps_per_second": 12.948, | |
| "step": 3900 | |
| }, | |
| { | |
| "epoch": 3.310980720871752, | |
| "grad_norm": 8.844286918640137, | |
| "learning_rate": 4.0172818791946314e-05, | |
| "loss": 1.2329248046875, | |
| "step": 3950 | |
| }, | |
| { | |
| "epoch": 3.310980720871752, | |
| "eval_loss": 1.5513211488723755, | |
| "eval_runtime": 1.3181, | |
| "eval_samples_per_second": 75.869, | |
| "eval_steps_per_second": 12.898, | |
| "step": 3950 | |
| }, | |
| { | |
| "epoch": 3.352891869237217, | |
| "grad_norm": 7.8395676612854, | |
| "learning_rate": 3.992114093959731e-05, | |
| "loss": 1.2457215118408203, | |
| "step": 4000 | |
| }, | |
| { | |
| "epoch": 3.352891869237217, | |
| "eval_loss": 1.5339856147766113, | |
| "eval_runtime": 1.3069, | |
| "eval_samples_per_second": 76.518, | |
| "eval_steps_per_second": 13.008, | |
| "step": 4000 | |
| }, | |
| { | |
| "epoch": 3.3948030176026824, | |
| "grad_norm": 7.250827312469482, | |
| "learning_rate": 3.9669463087248326e-05, | |
| "loss": 1.2182432556152343, | |
| "step": 4050 | |
| }, | |
| { | |
| "epoch": 3.3948030176026824, | |
| "eval_loss": 1.528326153755188, | |
| "eval_runtime": 1.3064, | |
| "eval_samples_per_second": 76.547, | |
| "eval_steps_per_second": 13.013, | |
| "step": 4050 | |
| }, | |
| { | |
| "epoch": 3.4367141659681475, | |
| "grad_norm": 8.507390975952148, | |
| "learning_rate": 3.941778523489933e-05, | |
| "loss": 1.1763590240478516, | |
| "step": 4100 | |
| }, | |
| { | |
| "epoch": 3.4367141659681475, | |
| "eval_loss": 1.537937879562378, | |
| "eval_runtime": 1.3058, | |
| "eval_samples_per_second": 76.584, | |
| "eval_steps_per_second": 13.019, | |
| "step": 4100 | |
| }, | |
| { | |
| "epoch": 3.4786253143336126, | |
| "grad_norm": 8.171996116638184, | |
| "learning_rate": 3.916610738255034e-05, | |
| "loss": 1.2010728454589843, | |
| "step": 4150 | |
| }, | |
| { | |
| "epoch": 3.4786253143336126, | |
| "eval_loss": 1.530064344406128, | |
| "eval_runtime": 1.3085, | |
| "eval_samples_per_second": 76.425, | |
| "eval_steps_per_second": 12.992, | |
| "step": 4150 | |
| }, | |
| { | |
| "epoch": 3.520536462699078, | |
| "grad_norm": 7.582270622253418, | |
| "learning_rate": 3.8914429530201345e-05, | |
| "loss": 1.1953775787353516, | |
| "step": 4200 | |
| }, | |
| { | |
| "epoch": 3.520536462699078, | |
| "eval_loss": 1.5104107856750488, | |
| "eval_runtime": 1.308, | |
| "eval_samples_per_second": 76.453, | |
| "eval_steps_per_second": 12.997, | |
| "step": 4200 | |
| }, | |
| { | |
| "epoch": 3.562447611064543, | |
| "grad_norm": 6.7094807624816895, | |
| "learning_rate": 3.866275167785235e-05, | |
| "loss": 1.1888941192626954, | |
| "step": 4250 | |
| }, | |
| { | |
| "epoch": 3.562447611064543, | |
| "eval_loss": 1.51458740234375, | |
| "eval_runtime": 1.3142, | |
| "eval_samples_per_second": 76.095, | |
| "eval_steps_per_second": 12.936, | |
| "step": 4250 | |
| }, | |
| { | |
| "epoch": 3.604358759430008, | |
| "grad_norm": 8.12093448638916, | |
| "learning_rate": 3.841107382550336e-05, | |
| "loss": 1.2248832702636718, | |
| "step": 4300 | |
| }, | |
| { | |
| "epoch": 3.604358759430008, | |
| "eval_loss": 1.511207938194275, | |
| "eval_runtime": 1.3073, | |
| "eval_samples_per_second": 76.496, | |
| "eval_steps_per_second": 13.004, | |
| "step": 4300 | |
| }, | |
| { | |
| "epoch": 3.6462699077954737, | |
| "grad_norm": 8.503448486328125, | |
| "learning_rate": 3.815939597315436e-05, | |
| "loss": 1.2137846374511718, | |
| "step": 4350 | |
| }, | |
| { | |
| "epoch": 3.6462699077954737, | |
| "eval_loss": 1.495904803276062, | |
| "eval_runtime": 1.3129, | |
| "eval_samples_per_second": 76.165, | |
| "eval_steps_per_second": 12.948, | |
| "step": 4350 | |
| }, | |
| { | |
| "epoch": 3.6881810561609387, | |
| "grad_norm": 8.030956268310547, | |
| "learning_rate": 3.790771812080537e-05, | |
| "loss": 1.2259557342529297, | |
| "step": 4400 | |
| }, | |
| { | |
| "epoch": 3.6881810561609387, | |
| "eval_loss": 1.4991346597671509, | |
| "eval_runtime": 1.3147, | |
| "eval_samples_per_second": 76.061, | |
| "eval_steps_per_second": 12.93, | |
| "step": 4400 | |
| }, | |
| { | |
| "epoch": 3.7300922045264038, | |
| "grad_norm": 8.265947341918945, | |
| "learning_rate": 3.765604026845638e-05, | |
| "loss": 1.2022461700439453, | |
| "step": 4450 | |
| }, | |
| { | |
| "epoch": 3.7300922045264038, | |
| "eval_loss": 1.5012933015823364, | |
| "eval_runtime": 1.307, | |
| "eval_samples_per_second": 76.509, | |
| "eval_steps_per_second": 13.007, | |
| "step": 4450 | |
| }, | |
| { | |
| "epoch": 3.7720033528918693, | |
| "grad_norm": 9.090805053710938, | |
| "learning_rate": 3.740436241610738e-05, | |
| "loss": 1.200530014038086, | |
| "step": 4500 | |
| }, | |
| { | |
| "epoch": 3.7720033528918693, | |
| "eval_loss": 1.4842772483825684, | |
| "eval_runtime": 1.317, | |
| "eval_samples_per_second": 75.932, | |
| "eval_steps_per_second": 12.909, | |
| "step": 4500 | |
| }, | |
| { | |
| "epoch": 3.8139145012573343, | |
| "grad_norm": 7.245903491973877, | |
| "learning_rate": 3.7152684563758394e-05, | |
| "loss": 1.1915716552734374, | |
| "step": 4550 | |
| }, | |
| { | |
| "epoch": 3.8139145012573343, | |
| "eval_loss": 1.4761041402816772, | |
| "eval_runtime": 1.3108, | |
| "eval_samples_per_second": 76.287, | |
| "eval_steps_per_second": 12.969, | |
| "step": 4550 | |
| }, | |
| { | |
| "epoch": 3.8558256496227994, | |
| "grad_norm": 8.151508331298828, | |
| "learning_rate": 3.6901006711409394e-05, | |
| "loss": 1.1981431579589843, | |
| "step": 4600 | |
| }, | |
| { | |
| "epoch": 3.8558256496227994, | |
| "eval_loss": 1.4900405406951904, | |
| "eval_runtime": 1.3069, | |
| "eval_samples_per_second": 76.52, | |
| "eval_steps_per_second": 13.008, | |
| "step": 4600 | |
| }, | |
| { | |
| "epoch": 3.897736797988265, | |
| "grad_norm": 7.736331462860107, | |
| "learning_rate": 3.664932885906041e-05, | |
| "loss": 1.2032262420654296, | |
| "step": 4650 | |
| }, | |
| { | |
| "epoch": 3.897736797988265, | |
| "eval_loss": 1.4922600984573364, | |
| "eval_runtime": 1.3152, | |
| "eval_samples_per_second": 76.036, | |
| "eval_steps_per_second": 12.926, | |
| "step": 4650 | |
| }, | |
| { | |
| "epoch": 3.93964794635373, | |
| "grad_norm": 7.156012058258057, | |
| "learning_rate": 3.6397651006711406e-05, | |
| "loss": 1.1885392761230469, | |
| "step": 4700 | |
| }, | |
| { | |
| "epoch": 3.93964794635373, | |
| "eval_loss": 1.4893136024475098, | |
| "eval_runtime": 1.3081, | |
| "eval_samples_per_second": 76.445, | |
| "eval_steps_per_second": 12.996, | |
| "step": 4700 | |
| }, | |
| { | |
| "epoch": 3.9815590947191954, | |
| "grad_norm": 7.402718544006348, | |
| "learning_rate": 3.614597315436242e-05, | |
| "loss": 1.167281494140625, | |
| "step": 4750 | |
| }, | |
| { | |
| "epoch": 3.9815590947191954, | |
| "eval_loss": 1.4926365613937378, | |
| "eval_runtime": 1.3049, | |
| "eval_samples_per_second": 76.631, | |
| "eval_steps_per_second": 13.027, | |
| "step": 4750 | |
| }, | |
| { | |
| "epoch": 4.0234702430846605, | |
| "grad_norm": 6.653599262237549, | |
| "learning_rate": 3.5894295302013425e-05, | |
| "loss": 1.1446471405029297, | |
| "step": 4800 | |
| }, | |
| { | |
| "epoch": 4.0234702430846605, | |
| "eval_loss": 1.4886661767959595, | |
| "eval_runtime": 1.3155, | |
| "eval_samples_per_second": 76.015, | |
| "eval_steps_per_second": 12.923, | |
| "step": 4800 | |
| }, | |
| { | |
| "epoch": 4.065381391450126, | |
| "grad_norm": 8.677276611328125, | |
| "learning_rate": 3.564261744966443e-05, | |
| "loss": 1.0744084930419922, | |
| "step": 4850 | |
| }, | |
| { | |
| "epoch": 4.065381391450126, | |
| "eval_loss": 1.5003914833068848, | |
| "eval_runtime": 1.3162, | |
| "eval_samples_per_second": 75.976, | |
| "eval_steps_per_second": 12.916, | |
| "step": 4850 | |
| }, | |
| { | |
| "epoch": 4.107292539815591, | |
| "grad_norm": 7.787823677062988, | |
| "learning_rate": 3.539093959731544e-05, | |
| "loss": 1.093729248046875, | |
| "step": 4900 | |
| }, | |
| { | |
| "epoch": 4.107292539815591, | |
| "eval_loss": 1.4922502040863037, | |
| "eval_runtime": 1.3085, | |
| "eval_samples_per_second": 76.422, | |
| "eval_steps_per_second": 12.992, | |
| "step": 4900 | |
| }, | |
| { | |
| "epoch": 4.149203688181056, | |
| "grad_norm": 7.291721343994141, | |
| "learning_rate": 3.5139261744966444e-05, | |
| "loss": 1.0987574768066406, | |
| "step": 4950 | |
| }, | |
| { | |
| "epoch": 4.149203688181056, | |
| "eval_loss": 1.4958311319351196, | |
| "eval_runtime": 1.3227, | |
| "eval_samples_per_second": 75.603, | |
| "eval_steps_per_second": 12.853, | |
| "step": 4950 | |
| }, | |
| { | |
| "epoch": 4.191114836546522, | |
| "grad_norm": 7.971698760986328, | |
| "learning_rate": 3.488758389261745e-05, | |
| "loss": 1.1118472290039063, | |
| "step": 5000 | |
| }, | |
| { | |
| "epoch": 4.191114836546522, | |
| "eval_loss": 1.4774560928344727, | |
| "eval_runtime": 1.3119, | |
| "eval_samples_per_second": 76.228, | |
| "eval_steps_per_second": 12.959, | |
| "step": 5000 | |
| }, | |
| { | |
| "epoch": 4.233025984911986, | |
| "grad_norm": 8.064397811889648, | |
| "learning_rate": 3.4635906040268456e-05, | |
| "loss": 1.0769783782958984, | |
| "step": 5050 | |
| }, | |
| { | |
| "epoch": 4.233025984911986, | |
| "eval_loss": 1.482987642288208, | |
| "eval_runtime": 1.3057, | |
| "eval_samples_per_second": 76.586, | |
| "eval_steps_per_second": 13.02, | |
| "step": 5050 | |
| }, | |
| { | |
| "epoch": 4.274937133277452, | |
| "grad_norm": 8.052140235900879, | |
| "learning_rate": 3.438422818791946e-05, | |
| "loss": 1.0794688415527345, | |
| "step": 5100 | |
| }, | |
| { | |
| "epoch": 4.274937133277452, | |
| "eval_loss": 1.491633653640747, | |
| "eval_runtime": 1.3097, | |
| "eval_samples_per_second": 76.352, | |
| "eval_steps_per_second": 12.98, | |
| "step": 5100 | |
| }, | |
| { | |
| "epoch": 4.316848281642917, | |
| "grad_norm": 7.1745452880859375, | |
| "learning_rate": 3.4132550335570475e-05, | |
| "loss": 1.125449447631836, | |
| "step": 5150 | |
| }, | |
| { | |
| "epoch": 4.316848281642917, | |
| "eval_loss": 1.4796397686004639, | |
| "eval_runtime": 1.311, | |
| "eval_samples_per_second": 76.275, | |
| "eval_steps_per_second": 12.967, | |
| "step": 5150 | |
| }, | |
| { | |
| "epoch": 4.358759430008382, | |
| "grad_norm": 9.49004077911377, | |
| "learning_rate": 3.3880872483221474e-05, | |
| "loss": 1.1010012817382813, | |
| "step": 5200 | |
| }, | |
| { | |
| "epoch": 4.358759430008382, | |
| "eval_loss": 1.4847711324691772, | |
| "eval_runtime": 1.3099, | |
| "eval_samples_per_second": 76.342, | |
| "eval_steps_per_second": 12.978, | |
| "step": 5200 | |
| }, | |
| { | |
| "epoch": 4.400670578373847, | |
| "grad_norm": 6.858426094055176, | |
| "learning_rate": 3.362919463087249e-05, | |
| "loss": 1.0861676025390625, | |
| "step": 5250 | |
| }, | |
| { | |
| "epoch": 4.400670578373847, | |
| "eval_loss": 1.4817886352539062, | |
| "eval_runtime": 1.3093, | |
| "eval_samples_per_second": 76.378, | |
| "eval_steps_per_second": 12.984, | |
| "step": 5250 | |
| }, | |
| { | |
| "epoch": 4.442581726739313, | |
| "grad_norm": 8.979911804199219, | |
| "learning_rate": 3.3377516778523487e-05, | |
| "loss": 1.0603124237060546, | |
| "step": 5300 | |
| }, | |
| { | |
| "epoch": 4.442581726739313, | |
| "eval_loss": 1.4553375244140625, | |
| "eval_runtime": 1.3067, | |
| "eval_samples_per_second": 76.527, | |
| "eval_steps_per_second": 13.01, | |
| "step": 5300 | |
| }, | |
| { | |
| "epoch": 4.4844928751047775, | |
| "grad_norm": 8.207121849060059, | |
| "learning_rate": 3.31258389261745e-05, | |
| "loss": 1.1343705749511719, | |
| "step": 5350 | |
| }, | |
| { | |
| "epoch": 4.4844928751047775, | |
| "eval_loss": 1.4546470642089844, | |
| "eval_runtime": 1.3114, | |
| "eval_samples_per_second": 76.255, | |
| "eval_steps_per_second": 12.963, | |
| "step": 5350 | |
| }, | |
| { | |
| "epoch": 4.526404023470243, | |
| "grad_norm": 8.417594909667969, | |
| "learning_rate": 3.28741610738255e-05, | |
| "loss": 1.087770004272461, | |
| "step": 5400 | |
| }, | |
| { | |
| "epoch": 4.526404023470243, | |
| "eval_loss": 1.4662039279937744, | |
| "eval_runtime": 1.3103, | |
| "eval_samples_per_second": 76.318, | |
| "eval_steps_per_second": 12.974, | |
| "step": 5400 | |
| }, | |
| { | |
| "epoch": 4.5683151718357085, | |
| "grad_norm": 7.539185523986816, | |
| "learning_rate": 3.262248322147651e-05, | |
| "loss": 1.0903499603271485, | |
| "step": 5450 | |
| }, | |
| { | |
| "epoch": 4.5683151718357085, | |
| "eval_loss": 1.456904649734497, | |
| "eval_runtime": 1.3096, | |
| "eval_samples_per_second": 76.357, | |
| "eval_steps_per_second": 12.981, | |
| "step": 5450 | |
| }, | |
| { | |
| "epoch": 4.610226320201173, | |
| "grad_norm": 8.131542205810547, | |
| "learning_rate": 3.237080536912752e-05, | |
| "loss": 1.0831554412841797, | |
| "step": 5500 | |
| }, | |
| { | |
| "epoch": 4.610226320201173, | |
| "eval_loss": 1.4568111896514893, | |
| "eval_runtime": 1.3131, | |
| "eval_samples_per_second": 76.155, | |
| "eval_steps_per_second": 12.946, | |
| "step": 5500 | |
| }, | |
| { | |
| "epoch": 4.652137468566639, | |
| "grad_norm": 8.471091270446777, | |
| "learning_rate": 3.2119127516778524e-05, | |
| "loss": 1.113310775756836, | |
| "step": 5550 | |
| }, | |
| { | |
| "epoch": 4.652137468566639, | |
| "eval_loss": 1.4639575481414795, | |
| "eval_runtime": 1.3101, | |
| "eval_samples_per_second": 76.328, | |
| "eval_steps_per_second": 12.976, | |
| "step": 5550 | |
| }, | |
| { | |
| "epoch": 4.694048616932104, | |
| "grad_norm": 9.265482902526855, | |
| "learning_rate": 3.186744966442953e-05, | |
| "loss": 1.0913118743896484, | |
| "step": 5600 | |
| }, | |
| { | |
| "epoch": 4.694048616932104, | |
| "eval_loss": 1.4528746604919434, | |
| "eval_runtime": 1.3084, | |
| "eval_samples_per_second": 76.427, | |
| "eval_steps_per_second": 12.993, | |
| "step": 5600 | |
| }, | |
| { | |
| "epoch": 4.735959765297569, | |
| "grad_norm": 7.978163719177246, | |
| "learning_rate": 3.1615771812080536e-05, | |
| "loss": 1.1045352172851564, | |
| "step": 5650 | |
| }, | |
| { | |
| "epoch": 4.735959765297569, | |
| "eval_loss": 1.461908221244812, | |
| "eval_runtime": 1.3082, | |
| "eval_samples_per_second": 76.442, | |
| "eval_steps_per_second": 12.995, | |
| "step": 5650 | |
| }, | |
| { | |
| "epoch": 4.777870913663034, | |
| "grad_norm": 7.514118194580078, | |
| "learning_rate": 3.136409395973154e-05, | |
| "loss": 1.1434143829345702, | |
| "step": 5700 | |
| }, | |
| { | |
| "epoch": 4.777870913663034, | |
| "eval_loss": 1.4462722539901733, | |
| "eval_runtime": 1.3084, | |
| "eval_samples_per_second": 76.431, | |
| "eval_steps_per_second": 12.993, | |
| "step": 5700 | |
| }, | |
| { | |
| "epoch": 4.8197820620285, | |
| "grad_norm": 7.936262130737305, | |
| "learning_rate": 3.111241610738255e-05, | |
| "loss": 1.0651226043701172, | |
| "step": 5750 | |
| }, | |
| { | |
| "epoch": 4.8197820620285, | |
| "eval_loss": 1.429209589958191, | |
| "eval_runtime": 1.3077, | |
| "eval_samples_per_second": 76.468, | |
| "eval_steps_per_second": 13.0, | |
| "step": 5750 | |
| }, | |
| { | |
| "epoch": 4.861693210393965, | |
| "grad_norm": 8.838064193725586, | |
| "learning_rate": 3.086073825503356e-05, | |
| "loss": 1.0680814361572266, | |
| "step": 5800 | |
| }, | |
| { | |
| "epoch": 4.861693210393965, | |
| "eval_loss": 1.4328811168670654, | |
| "eval_runtime": 1.3089, | |
| "eval_samples_per_second": 76.399, | |
| "eval_steps_per_second": 12.988, | |
| "step": 5800 | |
| }, | |
| { | |
| "epoch": 4.90360435875943, | |
| "grad_norm": 8.240621566772461, | |
| "learning_rate": 3.060906040268457e-05, | |
| "loss": 1.0674250030517578, | |
| "step": 5850 | |
| }, | |
| { | |
| "epoch": 4.90360435875943, | |
| "eval_loss": 1.4490281343460083, | |
| "eval_runtime": 1.3067, | |
| "eval_samples_per_second": 76.527, | |
| "eval_steps_per_second": 13.01, | |
| "step": 5850 | |
| }, | |
| { | |
| "epoch": 4.945515507124895, | |
| "grad_norm": 6.849717617034912, | |
| "learning_rate": 3.035738255033557e-05, | |
| "loss": 1.127254180908203, | |
| "step": 5900 | |
| }, | |
| { | |
| "epoch": 4.945515507124895, | |
| "eval_loss": 1.4320799112319946, | |
| "eval_runtime": 1.3055, | |
| "eval_samples_per_second": 76.597, | |
| "eval_steps_per_second": 13.021, | |
| "step": 5900 | |
| }, | |
| { | |
| "epoch": 4.987426655490361, | |
| "grad_norm": 8.04859733581543, | |
| "learning_rate": 3.010570469798658e-05, | |
| "loss": 1.0843943786621093, | |
| "step": 5950 | |
| }, | |
| { | |
| "epoch": 4.987426655490361, | |
| "eval_loss": 1.4507101774215698, | |
| "eval_runtime": 1.3159, | |
| "eval_samples_per_second": 75.991, | |
| "eval_steps_per_second": 12.918, | |
| "step": 5950 | |
| }, | |
| { | |
| "epoch": 5.029337803855825, | |
| "grad_norm": 8.27238941192627, | |
| "learning_rate": 2.9854026845637583e-05, | |
| "loss": 0.9925987243652343, | |
| "step": 6000 | |
| }, | |
| { | |
| "epoch": 5.029337803855825, | |
| "eval_loss": 1.4500988721847534, | |
| "eval_runtime": 1.3154, | |
| "eval_samples_per_second": 76.022, | |
| "eval_steps_per_second": 12.924, | |
| "step": 6000 | |
| }, | |
| { | |
| "epoch": 5.071248952221291, | |
| "grad_norm": 7.7484588623046875, | |
| "learning_rate": 2.9602348993288592e-05, | |
| "loss": 0.967954330444336, | |
| "step": 6050 | |
| }, | |
| { | |
| "epoch": 5.071248952221291, | |
| "eval_loss": 1.4389163255691528, | |
| "eval_runtime": 1.3088, | |
| "eval_samples_per_second": 76.407, | |
| "eval_steps_per_second": 12.989, | |
| "step": 6050 | |
| }, | |
| { | |
| "epoch": 5.113160100586756, | |
| "grad_norm": 8.324884414672852, | |
| "learning_rate": 2.93506711409396e-05, | |
| "loss": 0.9880841064453125, | |
| "step": 6100 | |
| }, | |
| { | |
| "epoch": 5.113160100586756, | |
| "eval_loss": 1.4397937059402466, | |
| "eval_runtime": 1.3132, | |
| "eval_samples_per_second": 76.147, | |
| "eval_steps_per_second": 12.945, | |
| "step": 6100 | |
| }, | |
| { | |
| "epoch": 5.155071248952221, | |
| "grad_norm": 8.043445587158203, | |
| "learning_rate": 2.9098993288590605e-05, | |
| "loss": 0.9956832885742187, | |
| "step": 6150 | |
| }, | |
| { | |
| "epoch": 5.155071248952221, | |
| "eval_loss": 1.453405737876892, | |
| "eval_runtime": 1.3094, | |
| "eval_samples_per_second": 76.372, | |
| "eval_steps_per_second": 12.983, | |
| "step": 6150 | |
| }, | |
| { | |
| "epoch": 5.1969823973176865, | |
| "grad_norm": 8.966429710388184, | |
| "learning_rate": 2.884731543624161e-05, | |
| "loss": 0.9851046752929687, | |
| "step": 6200 | |
| }, | |
| { | |
| "epoch": 5.1969823973176865, | |
| "eval_loss": 1.450720191001892, | |
| "eval_runtime": 1.3106, | |
| "eval_samples_per_second": 76.301, | |
| "eval_steps_per_second": 12.971, | |
| "step": 6200 | |
| }, | |
| { | |
| "epoch": 5.238893545683152, | |
| "grad_norm": 8.247909545898438, | |
| "learning_rate": 2.8595637583892617e-05, | |
| "loss": 0.9631963348388672, | |
| "step": 6250 | |
| }, | |
| { | |
| "epoch": 5.238893545683152, | |
| "eval_loss": 1.4458355903625488, | |
| "eval_runtime": 1.3493, | |
| "eval_samples_per_second": 74.112, | |
| "eval_steps_per_second": 12.599, | |
| "step": 6250 | |
| }, | |
| { | |
| "epoch": 5.280804694048617, | |
| "grad_norm": 7.969563961029053, | |
| "learning_rate": 2.8343959731543623e-05, | |
| "loss": 0.9722949981689453, | |
| "step": 6300 | |
| }, | |
| { | |
| "epoch": 5.280804694048617, | |
| "eval_loss": 1.4396941661834717, | |
| "eval_runtime": 1.3152, | |
| "eval_samples_per_second": 76.033, | |
| "eval_steps_per_second": 12.926, | |
| "step": 6300 | |
| }, | |
| { | |
| "epoch": 5.322715842414082, | |
| "grad_norm": 10.03349494934082, | |
| "learning_rate": 2.809228187919463e-05, | |
| "loss": 1.012443084716797, | |
| "step": 6350 | |
| }, | |
| { | |
| "epoch": 5.322715842414082, | |
| "eval_loss": 1.4293105602264404, | |
| "eval_runtime": 1.3105, | |
| "eval_samples_per_second": 76.305, | |
| "eval_steps_per_second": 12.972, | |
| "step": 6350 | |
| }, | |
| { | |
| "epoch": 5.364626990779548, | |
| "grad_norm": 7.895944595336914, | |
| "learning_rate": 2.784060402684564e-05, | |
| "loss": 0.993044662475586, | |
| "step": 6400 | |
| }, | |
| { | |
| "epoch": 5.364626990779548, | |
| "eval_loss": 1.4404101371765137, | |
| "eval_runtime": 1.3093, | |
| "eval_samples_per_second": 76.379, | |
| "eval_steps_per_second": 12.984, | |
| "step": 6400 | |
| }, | |
| { | |
| "epoch": 5.406538139145012, | |
| "grad_norm": 7.853381156921387, | |
| "learning_rate": 2.7588926174496645e-05, | |
| "loss": 1.0133386993408202, | |
| "step": 6450 | |
| }, | |
| { | |
| "epoch": 5.406538139145012, | |
| "eval_loss": 1.4487998485565186, | |
| "eval_runtime": 1.3109, | |
| "eval_samples_per_second": 76.284, | |
| "eval_steps_per_second": 12.968, | |
| "step": 6450 | |
| }, | |
| { | |
| "epoch": 5.448449287510478, | |
| "grad_norm": 7.938576698303223, | |
| "learning_rate": 2.733724832214765e-05, | |
| "loss": 0.9752484893798828, | |
| "step": 6500 | |
| }, | |
| { | |
| "epoch": 5.448449287510478, | |
| "eval_loss": 1.439013957977295, | |
| "eval_runtime": 1.3237, | |
| "eval_samples_per_second": 75.545, | |
| "eval_steps_per_second": 12.843, | |
| "step": 6500 | |
| }, | |
| { | |
| "epoch": 5.490360435875943, | |
| "grad_norm": 8.683091163635254, | |
| "learning_rate": 2.7085570469798657e-05, | |
| "loss": 0.9687157440185546, | |
| "step": 6550 | |
| }, | |
| { | |
| "epoch": 5.490360435875943, | |
| "eval_loss": 1.4339245557785034, | |
| "eval_runtime": 1.3171, | |
| "eval_samples_per_second": 75.924, | |
| "eval_steps_per_second": 12.907, | |
| "step": 6550 | |
| }, | |
| { | |
| "epoch": 5.532271584241408, | |
| "grad_norm": 7.201942443847656, | |
| "learning_rate": 2.6833892617449663e-05, | |
| "loss": 1.003968505859375, | |
| "step": 6600 | |
| }, | |
| { | |
| "epoch": 5.532271584241408, | |
| "eval_loss": 1.4294220209121704, | |
| "eval_runtime": 1.3257, | |
| "eval_samples_per_second": 75.432, | |
| "eval_steps_per_second": 12.823, | |
| "step": 6600 | |
| }, | |
| { | |
| "epoch": 5.574182732606873, | |
| "grad_norm": 7.990872859954834, | |
| "learning_rate": 2.658221476510067e-05, | |
| "loss": 1.0251528167724608, | |
| "step": 6650 | |
| }, | |
| { | |
| "epoch": 5.574182732606873, | |
| "eval_loss": 1.4342848062515259, | |
| "eval_runtime": 1.3203, | |
| "eval_samples_per_second": 75.742, | |
| "eval_steps_per_second": 12.876, | |
| "step": 6650 | |
| }, | |
| { | |
| "epoch": 5.616093880972339, | |
| "grad_norm": 8.979780197143555, | |
| "learning_rate": 2.6330536912751675e-05, | |
| "loss": 1.0025927734375, | |
| "step": 6700 | |
| }, | |
| { | |
| "epoch": 5.616093880972339, | |
| "eval_loss": 1.427762508392334, | |
| "eval_runtime": 1.3273, | |
| "eval_samples_per_second": 75.343, | |
| "eval_steps_per_second": 12.808, | |
| "step": 6700 | |
| }, | |
| { | |
| "epoch": 5.6580050293378035, | |
| "grad_norm": 8.129778861999512, | |
| "learning_rate": 2.6078859060402685e-05, | |
| "loss": 0.9966341400146485, | |
| "step": 6750 | |
| }, | |
| { | |
| "epoch": 5.6580050293378035, | |
| "eval_loss": 1.4209414720535278, | |
| "eval_runtime": 1.3215, | |
| "eval_samples_per_second": 75.67, | |
| "eval_steps_per_second": 12.864, | |
| "step": 6750 | |
| }, | |
| { | |
| "epoch": 5.699916177703269, | |
| "grad_norm": 8.076849937438965, | |
| "learning_rate": 2.5827181208053695e-05, | |
| "loss": 1.0694184112548828, | |
| "step": 6800 | |
| }, | |
| { | |
| "epoch": 5.699916177703269, | |
| "eval_loss": 1.4195367097854614, | |
| "eval_runtime": 1.3194, | |
| "eval_samples_per_second": 75.791, | |
| "eval_steps_per_second": 12.884, | |
| "step": 6800 | |
| }, | |
| { | |
| "epoch": 5.7418273260687345, | |
| "grad_norm": 9.296486854553223, | |
| "learning_rate": 2.55755033557047e-05, | |
| "loss": 0.9731069946289063, | |
| "step": 6850 | |
| }, | |
| { | |
| "epoch": 5.7418273260687345, | |
| "eval_loss": 1.4187347888946533, | |
| "eval_runtime": 1.3189, | |
| "eval_samples_per_second": 75.819, | |
| "eval_steps_per_second": 12.889, | |
| "step": 6850 | |
| }, | |
| { | |
| "epoch": 5.7837384744342, | |
| "grad_norm": 8.775923728942871, | |
| "learning_rate": 2.5323825503355707e-05, | |
| "loss": 1.0156724548339844, | |
| "step": 6900 | |
| }, | |
| { | |
| "epoch": 5.7837384744342, | |
| "eval_loss": 1.4232662916183472, | |
| "eval_runtime": 1.3211, | |
| "eval_samples_per_second": 75.692, | |
| "eval_steps_per_second": 12.868, | |
| "step": 6900 | |
| }, | |
| { | |
| "epoch": 5.825649622799665, | |
| "grad_norm": 9.39543342590332, | |
| "learning_rate": 2.5072147651006713e-05, | |
| "loss": 1.0193019104003906, | |
| "step": 6950 | |
| }, | |
| { | |
| "epoch": 5.825649622799665, | |
| "eval_loss": 1.4033825397491455, | |
| "eval_runtime": 1.3262, | |
| "eval_samples_per_second": 75.401, | |
| "eval_steps_per_second": 12.818, | |
| "step": 6950 | |
| }, | |
| { | |
| "epoch": 5.86756077116513, | |
| "grad_norm": 8.898547172546387, | |
| "learning_rate": 2.482046979865772e-05, | |
| "loss": 0.9925969696044922, | |
| "step": 7000 | |
| }, | |
| { | |
| "epoch": 5.86756077116513, | |
| "eval_loss": 1.403509497642517, | |
| "eval_runtime": 1.324, | |
| "eval_samples_per_second": 75.53, | |
| "eval_steps_per_second": 12.84, | |
| "step": 7000 | |
| }, | |
| { | |
| "epoch": 5.909471919530596, | |
| "grad_norm": 7.939687728881836, | |
| "learning_rate": 2.4568791946308725e-05, | |
| "loss": 0.9992769622802734, | |
| "step": 7050 | |
| }, | |
| { | |
| "epoch": 5.909471919530596, | |
| "eval_loss": 1.4148446321487427, | |
| "eval_runtime": 1.3217, | |
| "eval_samples_per_second": 75.663, | |
| "eval_steps_per_second": 12.863, | |
| "step": 7050 | |
| }, | |
| { | |
| "epoch": 5.95138306789606, | |
| "grad_norm": 7.085293769836426, | |
| "learning_rate": 2.4317114093959735e-05, | |
| "loss": 1.0247660064697266, | |
| "step": 7100 | |
| }, | |
| { | |
| "epoch": 5.95138306789606, | |
| "eval_loss": 1.4027897119522095, | |
| "eval_runtime": 1.3239, | |
| "eval_samples_per_second": 75.537, | |
| "eval_steps_per_second": 12.841, | |
| "step": 7100 | |
| }, | |
| { | |
| "epoch": 5.993294216261526, | |
| "grad_norm": 7.334988594055176, | |
| "learning_rate": 2.406543624161074e-05, | |
| "loss": 0.9960948181152344, | |
| "step": 7150 | |
| }, | |
| { | |
| "epoch": 5.993294216261526, | |
| "eval_loss": 1.3956409692764282, | |
| "eval_runtime": 1.3259, | |
| "eval_samples_per_second": 75.419, | |
| "eval_steps_per_second": 12.821, | |
| "step": 7150 | |
| }, | |
| { | |
| "epoch": 6.035205364626991, | |
| "grad_norm": 7.5491042137146, | |
| "learning_rate": 2.3813758389261747e-05, | |
| "loss": 0.9590288543701172, | |
| "step": 7200 | |
| }, | |
| { | |
| "epoch": 6.035205364626991, | |
| "eval_loss": 1.421021580696106, | |
| "eval_runtime": 1.3276, | |
| "eval_samples_per_second": 75.321, | |
| "eval_steps_per_second": 12.805, | |
| "step": 7200 | |
| }, | |
| { | |
| "epoch": 6.077116512992456, | |
| "grad_norm": 8.102331161499023, | |
| "learning_rate": 2.3562080536912753e-05, | |
| "loss": 0.8921336364746094, | |
| "step": 7250 | |
| }, | |
| { | |
| "epoch": 6.077116512992456, | |
| "eval_loss": 1.4105138778686523, | |
| "eval_runtime": 1.3172, | |
| "eval_samples_per_second": 75.918, | |
| "eval_steps_per_second": 12.906, | |
| "step": 7250 | |
| }, | |
| { | |
| "epoch": 6.119027661357921, | |
| "grad_norm": 8.010552406311035, | |
| "learning_rate": 2.331040268456376e-05, | |
| "loss": 0.8992021179199219, | |
| "step": 7300 | |
| }, | |
| { | |
| "epoch": 6.119027661357921, | |
| "eval_loss": 1.4192215204238892, | |
| "eval_runtime": 1.3213, | |
| "eval_samples_per_second": 75.683, | |
| "eval_steps_per_second": 12.866, | |
| "step": 7300 | |
| }, | |
| { | |
| "epoch": 6.160938809723387, | |
| "grad_norm": 7.069246768951416, | |
| "learning_rate": 2.3058724832214765e-05, | |
| "loss": 0.910236587524414, | |
| "step": 7350 | |
| }, | |
| { | |
| "epoch": 6.160938809723387, | |
| "eval_loss": 1.4088717699050903, | |
| "eval_runtime": 1.3221, | |
| "eval_samples_per_second": 75.639, | |
| "eval_steps_per_second": 12.859, | |
| "step": 7350 | |
| }, | |
| { | |
| "epoch": 6.202849958088851, | |
| "grad_norm": 7.486292839050293, | |
| "learning_rate": 2.280704697986577e-05, | |
| "loss": 0.866146469116211, | |
| "step": 7400 | |
| }, | |
| { | |
| "epoch": 6.202849958088851, | |
| "eval_loss": 1.4406752586364746, | |
| "eval_runtime": 1.3265, | |
| "eval_samples_per_second": 75.385, | |
| "eval_steps_per_second": 12.815, | |
| "step": 7400 | |
| }, | |
| { | |
| "epoch": 6.244761106454317, | |
| "grad_norm": 8.706947326660156, | |
| "learning_rate": 2.255536912751678e-05, | |
| "loss": 0.9541705322265625, | |
| "step": 7450 | |
| }, | |
| { | |
| "epoch": 6.244761106454317, | |
| "eval_loss": 1.4225102663040161, | |
| "eval_runtime": 1.3203, | |
| "eval_samples_per_second": 75.738, | |
| "eval_steps_per_second": 12.875, | |
| "step": 7450 | |
| }, | |
| { | |
| "epoch": 6.286672254819782, | |
| "grad_norm": 8.882471084594727, | |
| "learning_rate": 2.2303691275167787e-05, | |
| "loss": 0.9190203857421875, | |
| "step": 7500 | |
| }, | |
| { | |
| "epoch": 6.286672254819782, | |
| "eval_loss": 1.414080023765564, | |
| "eval_runtime": 1.3229, | |
| "eval_samples_per_second": 75.592, | |
| "eval_steps_per_second": 12.851, | |
| "step": 7500 | |
| }, | |
| { | |
| "epoch": 6.328583403185247, | |
| "grad_norm": 8.999835014343262, | |
| "learning_rate": 2.2052013422818793e-05, | |
| "loss": 0.9251675415039062, | |
| "step": 7550 | |
| }, | |
| { | |
| "epoch": 6.328583403185247, | |
| "eval_loss": 1.4201768636703491, | |
| "eval_runtime": 1.3234, | |
| "eval_samples_per_second": 75.561, | |
| "eval_steps_per_second": 12.845, | |
| "step": 7550 | |
| }, | |
| { | |
| "epoch": 6.3704945515507125, | |
| "grad_norm": 19.048643112182617, | |
| "learning_rate": 2.18003355704698e-05, | |
| "loss": 0.9465265655517578, | |
| "step": 7600 | |
| }, | |
| { | |
| "epoch": 6.3704945515507125, | |
| "eval_loss": 1.4244815111160278, | |
| "eval_runtime": 1.3165, | |
| "eval_samples_per_second": 75.959, | |
| "eval_steps_per_second": 12.913, | |
| "step": 7600 | |
| }, | |
| { | |
| "epoch": 6.412405699916178, | |
| "grad_norm": 7.876664638519287, | |
| "learning_rate": 2.1548657718120806e-05, | |
| "loss": 0.9214114379882813, | |
| "step": 7650 | |
| }, | |
| { | |
| "epoch": 6.412405699916178, | |
| "eval_loss": 1.4028754234313965, | |
| "eval_runtime": 1.36, | |
| "eval_samples_per_second": 73.529, | |
| "eval_steps_per_second": 12.5, | |
| "step": 7650 | |
| }, | |
| { | |
| "epoch": 6.454316848281643, | |
| "grad_norm": 7.950414657592773, | |
| "learning_rate": 2.1296979865771812e-05, | |
| "loss": 0.9261102294921875, | |
| "step": 7700 | |
| }, | |
| { | |
| "epoch": 6.454316848281643, | |
| "eval_loss": 1.4066473245620728, | |
| "eval_runtime": 1.3251, | |
| "eval_samples_per_second": 75.467, | |
| "eval_steps_per_second": 12.829, | |
| "step": 7700 | |
| }, | |
| { | |
| "epoch": 6.496227996647108, | |
| "grad_norm": 10.028432846069336, | |
| "learning_rate": 2.1045302013422818e-05, | |
| "loss": 0.9592378997802734, | |
| "step": 7750 | |
| }, | |
| { | |
| "epoch": 6.496227996647108, | |
| "eval_loss": 1.3987481594085693, | |
| "eval_runtime": 1.3299, | |
| "eval_samples_per_second": 75.196, | |
| "eval_steps_per_second": 12.783, | |
| "step": 7750 | |
| }, | |
| { | |
| "epoch": 6.538139145012574, | |
| "grad_norm": 7.391389846801758, | |
| "learning_rate": 2.0793624161073828e-05, | |
| "loss": 0.9425662994384766, | |
| "step": 7800 | |
| }, | |
| { | |
| "epoch": 6.538139145012574, | |
| "eval_loss": 1.3959389925003052, | |
| "eval_runtime": 1.3529, | |
| "eval_samples_per_second": 73.918, | |
| "eval_steps_per_second": 12.566, | |
| "step": 7800 | |
| }, | |
| { | |
| "epoch": 6.580050293378038, | |
| "grad_norm": 10.522768020629883, | |
| "learning_rate": 2.0541946308724834e-05, | |
| "loss": 0.8912117767333985, | |
| "step": 7850 | |
| }, | |
| { | |
| "epoch": 6.580050293378038, | |
| "eval_loss": 1.3973815441131592, | |
| "eval_runtime": 1.3092, | |
| "eval_samples_per_second": 76.382, | |
| "eval_steps_per_second": 12.985, | |
| "step": 7850 | |
| }, | |
| { | |
| "epoch": 6.621961441743504, | |
| "grad_norm": 9.219219207763672, | |
| "learning_rate": 2.029026845637584e-05, | |
| "loss": 0.9257565307617187, | |
| "step": 7900 | |
| }, | |
| { | |
| "epoch": 6.621961441743504, | |
| "eval_loss": 1.386621117591858, | |
| "eval_runtime": 1.3124, | |
| "eval_samples_per_second": 76.198, | |
| "eval_steps_per_second": 12.954, | |
| "step": 7900 | |
| }, | |
| { | |
| "epoch": 6.663872590108969, | |
| "grad_norm": 7.803803443908691, | |
| "learning_rate": 2.0038590604026846e-05, | |
| "loss": 0.9501576232910156, | |
| "step": 7950 | |
| }, | |
| { | |
| "epoch": 6.663872590108969, | |
| "eval_loss": 1.387654423713684, | |
| "eval_runtime": 1.3237, | |
| "eval_samples_per_second": 75.546, | |
| "eval_steps_per_second": 12.843, | |
| "step": 7950 | |
| }, | |
| { | |
| "epoch": 6.705783738474434, | |
| "grad_norm": 8.187541961669922, | |
| "learning_rate": 1.9786912751677852e-05, | |
| "loss": 0.94219482421875, | |
| "step": 8000 | |
| }, | |
| { | |
| "epoch": 6.705783738474434, | |
| "eval_loss": 1.396321415901184, | |
| "eval_runtime": 1.3092, | |
| "eval_samples_per_second": 76.382, | |
| "eval_steps_per_second": 12.985, | |
| "step": 8000 | |
| }, | |
| { | |
| "epoch": 6.747694886839899, | |
| "grad_norm": 7.921971797943115, | |
| "learning_rate": 1.9535234899328858e-05, | |
| "loss": 0.9036112213134766, | |
| "step": 8050 | |
| }, | |
| { | |
| "epoch": 6.747694886839899, | |
| "eval_loss": 1.392683744430542, | |
| "eval_runtime": 1.3145, | |
| "eval_samples_per_second": 76.075, | |
| "eval_steps_per_second": 12.933, | |
| "step": 8050 | |
| }, | |
| { | |
| "epoch": 6.789606035205365, | |
| "grad_norm": 7.816883563995361, | |
| "learning_rate": 1.9283557046979864e-05, | |
| "loss": 0.9160438537597656, | |
| "step": 8100 | |
| }, | |
| { | |
| "epoch": 6.789606035205365, | |
| "eval_loss": 1.3972537517547607, | |
| "eval_runtime": 1.3067, | |
| "eval_samples_per_second": 76.531, | |
| "eval_steps_per_second": 13.01, | |
| "step": 8100 | |
| }, | |
| { | |
| "epoch": 6.8315171835708295, | |
| "grad_norm": 7.804800987243652, | |
| "learning_rate": 1.9031879194630874e-05, | |
| "loss": 0.9270132446289062, | |
| "step": 8150 | |
| }, | |
| { | |
| "epoch": 6.8315171835708295, | |
| "eval_loss": 1.3819900751113892, | |
| "eval_runtime": 1.3094, | |
| "eval_samples_per_second": 76.374, | |
| "eval_steps_per_second": 12.984, | |
| "step": 8150 | |
| }, | |
| { | |
| "epoch": 6.873428331936295, | |
| "grad_norm": 8.289756774902344, | |
| "learning_rate": 1.878020134228188e-05, | |
| "loss": 0.9180772399902344, | |
| "step": 8200 | |
| }, | |
| { | |
| "epoch": 6.873428331936295, | |
| "eval_loss": 1.3977206945419312, | |
| "eval_runtime": 1.3094, | |
| "eval_samples_per_second": 76.369, | |
| "eval_steps_per_second": 12.983, | |
| "step": 8200 | |
| }, | |
| { | |
| "epoch": 6.9153394803017605, | |
| "grad_norm": 9.39144515991211, | |
| "learning_rate": 1.8528523489932886e-05, | |
| "loss": 0.9062419891357422, | |
| "step": 8250 | |
| }, | |
| { | |
| "epoch": 6.9153394803017605, | |
| "eval_loss": 1.375473976135254, | |
| "eval_runtime": 1.3089, | |
| "eval_samples_per_second": 76.398, | |
| "eval_steps_per_second": 12.988, | |
| "step": 8250 | |
| }, | |
| { | |
| "epoch": 6.957250628667225, | |
| "grad_norm": 8.791142463684082, | |
| "learning_rate": 1.8276845637583892e-05, | |
| "loss": 0.952476577758789, | |
| "step": 8300 | |
| }, | |
| { | |
| "epoch": 6.957250628667225, | |
| "eval_loss": 1.389543056488037, | |
| "eval_runtime": 1.3217, | |
| "eval_samples_per_second": 75.66, | |
| "eval_steps_per_second": 12.862, | |
| "step": 8300 | |
| }, | |
| { | |
| "epoch": 6.999161777032691, | |
| "grad_norm": 8.911063194274902, | |
| "learning_rate": 1.80251677852349e-05, | |
| "loss": 0.9405780029296875, | |
| "step": 8350 | |
| }, | |
| { | |
| "epoch": 6.999161777032691, | |
| "eval_loss": 1.381428837776184, | |
| "eval_runtime": 1.3144, | |
| "eval_samples_per_second": 76.083, | |
| "eval_steps_per_second": 12.934, | |
| "step": 8350 | |
| }, | |
| { | |
| "epoch": 7.041072925398156, | |
| "grad_norm": 8.737972259521484, | |
| "learning_rate": 1.7773489932885905e-05, | |
| "loss": 0.8461763000488282, | |
| "step": 8400 | |
| }, | |
| { | |
| "epoch": 7.041072925398156, | |
| "eval_loss": 1.404624581336975, | |
| "eval_runtime": 1.3101, | |
| "eval_samples_per_second": 76.328, | |
| "eval_steps_per_second": 12.976, | |
| "step": 8400 | |
| }, | |
| { | |
| "epoch": 7.082984073763621, | |
| "grad_norm": 7.6500630378723145, | |
| "learning_rate": 1.752181208053691e-05, | |
| "loss": 0.8605227661132813, | |
| "step": 8450 | |
| }, | |
| { | |
| "epoch": 7.082984073763621, | |
| "eval_loss": 1.4016004800796509, | |
| "eval_runtime": 1.3339, | |
| "eval_samples_per_second": 74.968, | |
| "eval_steps_per_second": 12.745, | |
| "step": 8450 | |
| }, | |
| { | |
| "epoch": 7.124895222129086, | |
| "grad_norm": 8.47529125213623, | |
| "learning_rate": 1.727013422818792e-05, | |
| "loss": 0.8707926177978516, | |
| "step": 8500 | |
| }, | |
| { | |
| "epoch": 7.124895222129086, | |
| "eval_loss": 1.4020596742630005, | |
| "eval_runtime": 1.3125, | |
| "eval_samples_per_second": 76.192, | |
| "eval_steps_per_second": 12.953, | |
| "step": 8500 | |
| }, | |
| { | |
| "epoch": 7.166806370494552, | |
| "grad_norm": 8.03702163696289, | |
| "learning_rate": 1.7018456375838926e-05, | |
| "loss": 0.8673520660400391, | |
| "step": 8550 | |
| }, | |
| { | |
| "epoch": 7.166806370494552, | |
| "eval_loss": 1.409217357635498, | |
| "eval_runtime": 1.3112, | |
| "eval_samples_per_second": 76.268, | |
| "eval_steps_per_second": 12.966, | |
| "step": 8550 | |
| }, | |
| { | |
| "epoch": 7.208717518860016, | |
| "grad_norm": 9.523116111755371, | |
| "learning_rate": 1.6766778523489933e-05, | |
| "loss": 0.8610651397705078, | |
| "step": 8600 | |
| }, | |
| { | |
| "epoch": 7.208717518860016, | |
| "eval_loss": 1.413827657699585, | |
| "eval_runtime": 1.3076, | |
| "eval_samples_per_second": 76.476, | |
| "eval_steps_per_second": 13.001, | |
| "step": 8600 | |
| }, | |
| { | |
| "epoch": 7.250628667225482, | |
| "grad_norm": 8.69580078125, | |
| "learning_rate": 1.651510067114094e-05, | |
| "loss": 0.8646464538574219, | |
| "step": 8650 | |
| }, | |
| { | |
| "epoch": 7.250628667225482, | |
| "eval_loss": 1.4039324522018433, | |
| "eval_runtime": 1.3085, | |
| "eval_samples_per_second": 76.424, | |
| "eval_steps_per_second": 12.992, | |
| "step": 8650 | |
| }, | |
| { | |
| "epoch": 7.292539815590947, | |
| "grad_norm": 9.004595756530762, | |
| "learning_rate": 1.6263422818791945e-05, | |
| "loss": 0.8606627655029296, | |
| "step": 8700 | |
| }, | |
| { | |
| "epoch": 7.292539815590947, | |
| "eval_loss": 1.3969203233718872, | |
| "eval_runtime": 1.3144, | |
| "eval_samples_per_second": 76.083, | |
| "eval_steps_per_second": 12.934, | |
| "step": 8700 | |
| }, | |
| { | |
| "epoch": 7.334450963956412, | |
| "grad_norm": 10.599759101867676, | |
| "learning_rate": 1.6011744966442954e-05, | |
| "loss": 0.8652530670166015, | |
| "step": 8750 | |
| }, | |
| { | |
| "epoch": 7.334450963956412, | |
| "eval_loss": 1.3936374187469482, | |
| "eval_runtime": 1.3142, | |
| "eval_samples_per_second": 76.09, | |
| "eval_steps_per_second": 12.935, | |
| "step": 8750 | |
| }, | |
| { | |
| "epoch": 7.376362112321877, | |
| "grad_norm": 9.753110885620117, | |
| "learning_rate": 1.576006711409396e-05, | |
| "loss": 0.8457546997070312, | |
| "step": 8800 | |
| }, | |
| { | |
| "epoch": 7.376362112321877, | |
| "eval_loss": 1.3915700912475586, | |
| "eval_runtime": 1.3129, | |
| "eval_samples_per_second": 76.17, | |
| "eval_steps_per_second": 12.949, | |
| "step": 8800 | |
| }, | |
| { | |
| "epoch": 7.418273260687343, | |
| "grad_norm": 9.131157875061035, | |
| "learning_rate": 1.5508389261744967e-05, | |
| "loss": 0.849459457397461, | |
| "step": 8850 | |
| }, | |
| { | |
| "epoch": 7.418273260687343, | |
| "eval_loss": 1.3911302089691162, | |
| "eval_runtime": 1.3158, | |
| "eval_samples_per_second": 76.001, | |
| "eval_steps_per_second": 12.92, | |
| "step": 8850 | |
| }, | |
| { | |
| "epoch": 7.460184409052808, | |
| "grad_norm": 9.842427253723145, | |
| "learning_rate": 1.5256711409395975e-05, | |
| "loss": 0.9080851745605468, | |
| "step": 8900 | |
| }, | |
| { | |
| "epoch": 7.460184409052808, | |
| "eval_loss": 1.3832069635391235, | |
| "eval_runtime": 1.3075, | |
| "eval_samples_per_second": 76.48, | |
| "eval_steps_per_second": 13.002, | |
| "step": 8900 | |
| }, | |
| { | |
| "epoch": 7.502095557418273, | |
| "grad_norm": 8.517088890075684, | |
| "learning_rate": 1.500503355704698e-05, | |
| "loss": 0.8484629821777344, | |
| "step": 8950 | |
| }, | |
| { | |
| "epoch": 7.502095557418273, | |
| "eval_loss": 1.387845516204834, | |
| "eval_runtime": 1.3107, | |
| "eval_samples_per_second": 76.298, | |
| "eval_steps_per_second": 12.971, | |
| "step": 8950 | |
| }, | |
| { | |
| "epoch": 7.5440067057837386, | |
| "grad_norm": 8.61888313293457, | |
| "learning_rate": 1.4753355704697987e-05, | |
| "loss": 0.859649658203125, | |
| "step": 9000 | |
| }, | |
| { | |
| "epoch": 7.5440067057837386, | |
| "eval_loss": 1.3796684741973877, | |
| "eval_runtime": 1.3117, | |
| "eval_samples_per_second": 76.24, | |
| "eval_steps_per_second": 12.961, | |
| "step": 9000 | |
| }, | |
| { | |
| "epoch": 7.585917854149204, | |
| "grad_norm": 9.451814651489258, | |
| "learning_rate": 1.4501677852348993e-05, | |
| "loss": 0.8550418090820312, | |
| "step": 9050 | |
| }, | |
| { | |
| "epoch": 7.585917854149204, | |
| "eval_loss": 1.3890985250473022, | |
| "eval_runtime": 1.3091, | |
| "eval_samples_per_second": 76.39, | |
| "eval_steps_per_second": 12.986, | |
| "step": 9050 | |
| }, | |
| { | |
| "epoch": 7.627829002514669, | |
| "grad_norm": 7.689682960510254, | |
| "learning_rate": 1.4249999999999999e-05, | |
| "loss": 0.8378238677978516, | |
| "step": 9100 | |
| }, | |
| { | |
| "epoch": 7.627829002514669, | |
| "eval_loss": 1.382077693939209, | |
| "eval_runtime": 1.3129, | |
| "eval_samples_per_second": 76.165, | |
| "eval_steps_per_second": 12.948, | |
| "step": 9100 | |
| }, | |
| { | |
| "epoch": 7.669740150880134, | |
| "grad_norm": 8.914044380187988, | |
| "learning_rate": 1.3998322147651007e-05, | |
| "loss": 0.8677043151855469, | |
| "step": 9150 | |
| }, | |
| { | |
| "epoch": 7.669740150880134, | |
| "eval_loss": 1.3811883926391602, | |
| "eval_runtime": 1.3094, | |
| "eval_samples_per_second": 76.371, | |
| "eval_steps_per_second": 12.983, | |
| "step": 9150 | |
| }, | |
| { | |
| "epoch": 7.7116512992456, | |
| "grad_norm": 8.388723373413086, | |
| "learning_rate": 1.3746644295302013e-05, | |
| "loss": 0.8768821716308594, | |
| "step": 9200 | |
| }, | |
| { | |
| "epoch": 7.7116512992456, | |
| "eval_loss": 1.370642900466919, | |
| "eval_runtime": 1.3151, | |
| "eval_samples_per_second": 76.041, | |
| "eval_steps_per_second": 12.927, | |
| "step": 9200 | |
| }, | |
| { | |
| "epoch": 7.753562447611064, | |
| "grad_norm": 7.788331985473633, | |
| "learning_rate": 1.3494966442953021e-05, | |
| "loss": 0.9003873443603516, | |
| "step": 9250 | |
| }, | |
| { | |
| "epoch": 7.753562447611064, | |
| "eval_loss": 1.3828204870224, | |
| "eval_runtime": 1.3115, | |
| "eval_samples_per_second": 76.25, | |
| "eval_steps_per_second": 12.962, | |
| "step": 9250 | |
| }, | |
| { | |
| "epoch": 7.79547359597653, | |
| "grad_norm": 10.433382987976074, | |
| "learning_rate": 1.3243288590604029e-05, | |
| "loss": 0.868186264038086, | |
| "step": 9300 | |
| }, | |
| { | |
| "epoch": 7.79547359597653, | |
| "eval_loss": 1.380051612854004, | |
| "eval_runtime": 1.3104, | |
| "eval_samples_per_second": 76.31, | |
| "eval_steps_per_second": 12.973, | |
| "step": 9300 | |
| }, | |
| { | |
| "epoch": 7.837384744341995, | |
| "grad_norm": 8.336097717285156, | |
| "learning_rate": 1.2991610738255035e-05, | |
| "loss": 0.8668665313720703, | |
| "step": 9350 | |
| }, | |
| { | |
| "epoch": 7.837384744341995, | |
| "eval_loss": 1.3708611726760864, | |
| "eval_runtime": 1.3089, | |
| "eval_samples_per_second": 76.401, | |
| "eval_steps_per_second": 12.988, | |
| "step": 9350 | |
| }, | |
| { | |
| "epoch": 7.87929589270746, | |
| "grad_norm": 8.220783233642578, | |
| "learning_rate": 1.2739932885906041e-05, | |
| "loss": 0.8469175720214843, | |
| "step": 9400 | |
| }, | |
| { | |
| "epoch": 7.87929589270746, | |
| "eval_loss": 1.3757524490356445, | |
| "eval_runtime": 1.3076, | |
| "eval_samples_per_second": 76.476, | |
| "eval_steps_per_second": 13.001, | |
| "step": 9400 | |
| }, | |
| { | |
| "epoch": 7.921207041072925, | |
| "grad_norm": 7.74770975112915, | |
| "learning_rate": 1.2488255033557047e-05, | |
| "loss": 0.8731184387207032, | |
| "step": 9450 | |
| }, | |
| { | |
| "epoch": 7.921207041072925, | |
| "eval_loss": 1.374078392982483, | |
| "eval_runtime": 1.3072, | |
| "eval_samples_per_second": 76.502, | |
| "eval_steps_per_second": 13.005, | |
| "step": 9450 | |
| }, | |
| { | |
| "epoch": 7.963118189438391, | |
| "grad_norm": 13.994065284729004, | |
| "learning_rate": 1.2236577181208055e-05, | |
| "loss": 0.8569823455810547, | |
| "step": 9500 | |
| }, | |
| { | |
| "epoch": 7.963118189438391, | |
| "eval_loss": 1.3806568384170532, | |
| "eval_runtime": 1.3103, | |
| "eval_samples_per_second": 76.32, | |
| "eval_steps_per_second": 12.974, | |
| "step": 9500 | |
| }, | |
| { | |
| "epoch": 8.005029337803856, | |
| "grad_norm": 8.4920072555542, | |
| "learning_rate": 1.1984899328859061e-05, | |
| "loss": 0.8283097839355469, | |
| "step": 9550 | |
| }, | |
| { | |
| "epoch": 8.005029337803856, | |
| "eval_loss": 1.3723480701446533, | |
| "eval_runtime": 1.3124, | |
| "eval_samples_per_second": 76.198, | |
| "eval_steps_per_second": 12.954, | |
| "step": 9550 | |
| }, | |
| { | |
| "epoch": 8.046940486169321, | |
| "grad_norm": 8.069540977478027, | |
| "learning_rate": 1.1733221476510067e-05, | |
| "loss": 0.821678466796875, | |
| "step": 9600 | |
| }, | |
| { | |
| "epoch": 8.046940486169321, | |
| "eval_loss": 1.3937333822250366, | |
| "eval_runtime": 1.3136, | |
| "eval_samples_per_second": 76.127, | |
| "eval_steps_per_second": 12.942, | |
| "step": 9600 | |
| }, | |
| { | |
| "epoch": 8.088851634534786, | |
| "grad_norm": 9.921940803527832, | |
| "learning_rate": 1.1481543624161075e-05, | |
| "loss": 0.8030374145507813, | |
| "step": 9650 | |
| }, | |
| { | |
| "epoch": 8.088851634534786, | |
| "eval_loss": 1.381339430809021, | |
| "eval_runtime": 1.3144, | |
| "eval_samples_per_second": 76.082, | |
| "eval_steps_per_second": 12.934, | |
| "step": 9650 | |
| }, | |
| { | |
| "epoch": 8.130762782900252, | |
| "grad_norm": 9.074210166931152, | |
| "learning_rate": 1.1229865771812081e-05, | |
| "loss": 0.7958895111083985, | |
| "step": 9700 | |
| }, | |
| { | |
| "epoch": 8.130762782900252, | |
| "eval_loss": 1.3928741216659546, | |
| "eval_runtime": 1.3161, | |
| "eval_samples_per_second": 75.982, | |
| "eval_steps_per_second": 12.917, | |
| "step": 9700 | |
| }, | |
| { | |
| "epoch": 8.172673931265717, | |
| "grad_norm": 8.97559642791748, | |
| "learning_rate": 1.0978187919463087e-05, | |
| "loss": 0.783568115234375, | |
| "step": 9750 | |
| }, | |
| { | |
| "epoch": 8.172673931265717, | |
| "eval_loss": 1.4037460088729858, | |
| "eval_runtime": 1.311, | |
| "eval_samples_per_second": 76.278, | |
| "eval_steps_per_second": 12.967, | |
| "step": 9750 | |
| }, | |
| { | |
| "epoch": 8.214585079631181, | |
| "grad_norm": 7.9130330085754395, | |
| "learning_rate": 1.0726510067114094e-05, | |
| "loss": 0.9023107147216797, | |
| "step": 9800 | |
| }, | |
| { | |
| "epoch": 8.214585079631181, | |
| "eval_loss": 1.3975722789764404, | |
| "eval_runtime": 1.3099, | |
| "eval_samples_per_second": 76.339, | |
| "eval_steps_per_second": 12.978, | |
| "step": 9800 | |
| }, | |
| { | |
| "epoch": 8.256496227996648, | |
| "grad_norm": 10.273441314697266, | |
| "learning_rate": 1.0474832214765101e-05, | |
| "loss": 0.8038414001464844, | |
| "step": 9850 | |
| }, | |
| { | |
| "epoch": 8.256496227996648, | |
| "eval_loss": 1.396716594696045, | |
| "eval_runtime": 1.3054, | |
| "eval_samples_per_second": 76.602, | |
| "eval_steps_per_second": 13.022, | |
| "step": 9850 | |
| }, | |
| { | |
| "epoch": 8.298407376362112, | |
| "grad_norm": 7.952096939086914, | |
| "learning_rate": 1.0223154362416108e-05, | |
| "loss": 0.7995934295654297, | |
| "step": 9900 | |
| }, | |
| { | |
| "epoch": 8.298407376362112, | |
| "eval_loss": 1.3998767137527466, | |
| "eval_runtime": 1.3124, | |
| "eval_samples_per_second": 76.195, | |
| "eval_steps_per_second": 12.953, | |
| "step": 9900 | |
| }, | |
| { | |
| "epoch": 8.340318524727577, | |
| "grad_norm": 9.221684455871582, | |
| "learning_rate": 9.971476510067114e-06, | |
| "loss": 0.8333676147460938, | |
| "step": 9950 | |
| }, | |
| { | |
| "epoch": 8.340318524727577, | |
| "eval_loss": 1.4037277698516846, | |
| "eval_runtime": 1.3082, | |
| "eval_samples_per_second": 76.441, | |
| "eval_steps_per_second": 12.995, | |
| "step": 9950 | |
| }, | |
| { | |
| "epoch": 8.382229673093043, | |
| "grad_norm": 11.292074203491211, | |
| "learning_rate": 9.71979865771812e-06, | |
| "loss": 0.7862572479248047, | |
| "step": 10000 | |
| }, | |
| { | |
| "epoch": 8.382229673093043, | |
| "eval_loss": 1.397322416305542, | |
| "eval_runtime": 1.3084, | |
| "eval_samples_per_second": 76.432, | |
| "eval_steps_per_second": 12.993, | |
| "step": 10000 | |
| }, | |
| { | |
| "epoch": 8.424140821458508, | |
| "grad_norm": 8.441837310791016, | |
| "learning_rate": 9.468120805369128e-06, | |
| "loss": 0.8175559997558594, | |
| "step": 10050 | |
| }, | |
| { | |
| "epoch": 8.424140821458508, | |
| "eval_loss": 1.3924691677093506, | |
| "eval_runtime": 1.3062, | |
| "eval_samples_per_second": 76.56, | |
| "eval_steps_per_second": 13.015, | |
| "step": 10050 | |
| }, | |
| { | |
| "epoch": 8.466051969823972, | |
| "grad_norm": 8.004411697387695, | |
| "learning_rate": 9.216442953020134e-06, | |
| "loss": 0.8158341979980469, | |
| "step": 10100 | |
| }, | |
| { | |
| "epoch": 8.466051969823972, | |
| "eval_loss": 1.3930532932281494, | |
| "eval_runtime": 1.3076, | |
| "eval_samples_per_second": 76.474, | |
| "eval_steps_per_second": 13.001, | |
| "step": 10100 | |
| }, | |
| { | |
| "epoch": 8.507963118189439, | |
| "grad_norm": 11.439962387084961, | |
| "learning_rate": 8.96476510067114e-06, | |
| "loss": 0.7973479461669922, | |
| "step": 10150 | |
| }, | |
| { | |
| "epoch": 8.507963118189439, | |
| "eval_loss": 1.3848984241485596, | |
| "eval_runtime": 1.3077, | |
| "eval_samples_per_second": 76.467, | |
| "eval_steps_per_second": 12.999, | |
| "step": 10150 | |
| }, | |
| { | |
| "epoch": 8.549874266554903, | |
| "grad_norm": 8.261672019958496, | |
| "learning_rate": 8.713087248322148e-06, | |
| "loss": 0.8254232788085938, | |
| "step": 10200 | |
| }, | |
| { | |
| "epoch": 8.549874266554903, | |
| "eval_loss": 1.3915879726409912, | |
| "eval_runtime": 1.3064, | |
| "eval_samples_per_second": 76.545, | |
| "eval_steps_per_second": 13.013, | |
| "step": 10200 | |
| }, | |
| { | |
| "epoch": 8.591785414920368, | |
| "grad_norm": 7.573044776916504, | |
| "learning_rate": 8.461409395973156e-06, | |
| "loss": 0.8546940612792969, | |
| "step": 10250 | |
| }, | |
| { | |
| "epoch": 8.591785414920368, | |
| "eval_loss": 1.3879568576812744, | |
| "eval_runtime": 1.3084, | |
| "eval_samples_per_second": 76.432, | |
| "eval_steps_per_second": 12.993, | |
| "step": 10250 | |
| }, | |
| { | |
| "epoch": 8.633696563285834, | |
| "grad_norm": 7.979793071746826, | |
| "learning_rate": 8.209731543624162e-06, | |
| "loss": 0.7714322662353515, | |
| "step": 10300 | |
| }, | |
| { | |
| "epoch": 8.633696563285834, | |
| "eval_loss": 1.382614254951477, | |
| "eval_runtime": 1.3092, | |
| "eval_samples_per_second": 76.383, | |
| "eval_steps_per_second": 12.985, | |
| "step": 10300 | |
| }, | |
| { | |
| "epoch": 8.675607711651299, | |
| "grad_norm": 7.7293267250061035, | |
| "learning_rate": 7.958053691275168e-06, | |
| "loss": 0.8123702239990235, | |
| "step": 10350 | |
| }, | |
| { | |
| "epoch": 8.675607711651299, | |
| "eval_loss": 1.3874996900558472, | |
| "eval_runtime": 1.3111, | |
| "eval_samples_per_second": 76.27, | |
| "eval_steps_per_second": 12.966, | |
| "step": 10350 | |
| }, | |
| { | |
| "epoch": 8.717518860016764, | |
| "grad_norm": 8.47427749633789, | |
| "learning_rate": 7.706375838926176e-06, | |
| "loss": 0.8496389007568359, | |
| "step": 10400 | |
| }, | |
| { | |
| "epoch": 8.717518860016764, | |
| "eval_loss": 1.3822097778320312, | |
| "eval_runtime": 1.3104, | |
| "eval_samples_per_second": 76.311, | |
| "eval_steps_per_second": 12.973, | |
| "step": 10400 | |
| }, | |
| { | |
| "epoch": 8.75943000838223, | |
| "grad_norm": 9.511306762695312, | |
| "learning_rate": 7.454697986577181e-06, | |
| "loss": 0.8049738311767578, | |
| "step": 10450 | |
| }, | |
| { | |
| "epoch": 8.75943000838223, | |
| "eval_loss": 1.3828835487365723, | |
| "eval_runtime": 1.3084, | |
| "eval_samples_per_second": 76.431, | |
| "eval_steps_per_second": 12.993, | |
| "step": 10450 | |
| }, | |
| { | |
| "epoch": 8.801341156747695, | |
| "grad_norm": 8.368938446044922, | |
| "learning_rate": 7.203020134228189e-06, | |
| "loss": 0.8114649200439453, | |
| "step": 10500 | |
| }, | |
| { | |
| "epoch": 8.801341156747695, | |
| "eval_loss": 1.3777148723602295, | |
| "eval_runtime": 1.3099, | |
| "eval_samples_per_second": 76.343, | |
| "eval_steps_per_second": 12.978, | |
| "step": 10500 | |
| }, | |
| { | |
| "epoch": 8.84325230511316, | |
| "grad_norm": 9.411191940307617, | |
| "learning_rate": 6.951342281879195e-06, | |
| "loss": 0.8087822723388672, | |
| "step": 10550 | |
| }, | |
| { | |
| "epoch": 8.84325230511316, | |
| "eval_loss": 1.3824328184127808, | |
| "eval_runtime": 1.3148, | |
| "eval_samples_per_second": 76.056, | |
| "eval_steps_per_second": 12.93, | |
| "step": 10550 | |
| }, | |
| { | |
| "epoch": 8.885163453478626, | |
| "grad_norm": 8.663768768310547, | |
| "learning_rate": 6.699664429530202e-06, | |
| "loss": 0.8029358673095703, | |
| "step": 10600 | |
| }, | |
| { | |
| "epoch": 8.885163453478626, | |
| "eval_loss": 1.3831167221069336, | |
| "eval_runtime": 1.3228, | |
| "eval_samples_per_second": 75.6, | |
| "eval_steps_per_second": 12.852, | |
| "step": 10600 | |
| }, | |
| { | |
| "epoch": 8.92707460184409, | |
| "grad_norm": 15.937172889709473, | |
| "learning_rate": 6.447986577181208e-06, | |
| "loss": 0.8014051055908203, | |
| "step": 10650 | |
| }, | |
| { | |
| "epoch": 8.92707460184409, | |
| "eval_loss": 1.3844794034957886, | |
| "eval_runtime": 1.324, | |
| "eval_samples_per_second": 75.531, | |
| "eval_steps_per_second": 12.84, | |
| "step": 10650 | |
| }, | |
| { | |
| "epoch": 8.968985750209555, | |
| "grad_norm": 8.052315711975098, | |
| "learning_rate": 6.196308724832215e-06, | |
| "loss": 0.8436643218994141, | |
| "step": 10700 | |
| }, | |
| { | |
| "epoch": 8.968985750209555, | |
| "eval_loss": 1.384423851966858, | |
| "eval_runtime": 1.3196, | |
| "eval_samples_per_second": 75.781, | |
| "eval_steps_per_second": 12.883, | |
| "step": 10700 | |
| }, | |
| { | |
| "epoch": 9.010896898575021, | |
| "grad_norm": 11.16855239868164, | |
| "learning_rate": 5.944630872483221e-06, | |
| "loss": 0.7894086456298828, | |
| "step": 10750 | |
| }, | |
| { | |
| "epoch": 9.010896898575021, | |
| "eval_loss": 1.3818109035491943, | |
| "eval_runtime": 1.325, | |
| "eval_samples_per_second": 75.472, | |
| "eval_steps_per_second": 12.83, | |
| "step": 10750 | |
| }, | |
| { | |
| "epoch": 9.052808046940486, | |
| "grad_norm": 7.098580360412598, | |
| "learning_rate": 5.692953020134228e-06, | |
| "loss": 0.797387466430664, | |
| "step": 10800 | |
| }, | |
| { | |
| "epoch": 9.052808046940486, | |
| "eval_loss": 1.3898545503616333, | |
| "eval_runtime": 1.3197, | |
| "eval_samples_per_second": 75.773, | |
| "eval_steps_per_second": 12.881, | |
| "step": 10800 | |
| }, | |
| { | |
| "epoch": 9.09471919530595, | |
| "grad_norm": 9.469057083129883, | |
| "learning_rate": 5.441275167785235e-06, | |
| "loss": 0.7750323486328125, | |
| "step": 10850 | |
| }, | |
| { | |
| "epoch": 9.09471919530595, | |
| "eval_loss": 1.3957501649856567, | |
| "eval_runtime": 1.326, | |
| "eval_samples_per_second": 75.417, | |
| "eval_steps_per_second": 12.821, | |
| "step": 10850 | |
| }, | |
| { | |
| "epoch": 9.136630343671417, | |
| "grad_norm": 9.597176551818848, | |
| "learning_rate": 5.189597315436241e-06, | |
| "loss": 0.7900724792480469, | |
| "step": 10900 | |
| }, | |
| { | |
| "epoch": 9.136630343671417, | |
| "eval_loss": 1.396691918373108, | |
| "eval_runtime": 1.3207, | |
| "eval_samples_per_second": 75.715, | |
| "eval_steps_per_second": 12.872, | |
| "step": 10900 | |
| }, | |
| { | |
| "epoch": 9.178541492036882, | |
| "grad_norm": 9.788046836853027, | |
| "learning_rate": 4.937919463087248e-06, | |
| "loss": 0.7536945343017578, | |
| "step": 10950 | |
| }, | |
| { | |
| "epoch": 9.178541492036882, | |
| "eval_loss": 1.4024462699890137, | |
| "eval_runtime": 1.3239, | |
| "eval_samples_per_second": 75.532, | |
| "eval_steps_per_second": 12.84, | |
| "step": 10950 | |
| }, | |
| { | |
| "epoch": 9.220452640402346, | |
| "grad_norm": 7.397531509399414, | |
| "learning_rate": 4.686241610738255e-06, | |
| "loss": 0.7831705474853515, | |
| "step": 11000 | |
| }, | |
| { | |
| "epoch": 9.220452640402346, | |
| "eval_loss": 1.3908883333206177, | |
| "eval_runtime": 1.3274, | |
| "eval_samples_per_second": 75.335, | |
| "eval_steps_per_second": 12.807, | |
| "step": 11000 | |
| }, | |
| { | |
| "epoch": 9.262363788767813, | |
| "grad_norm": 16.95655059814453, | |
| "learning_rate": 4.434563758389262e-06, | |
| "loss": 0.7408596038818359, | |
| "step": 11050 | |
| }, | |
| { | |
| "epoch": 9.262363788767813, | |
| "eval_loss": 1.3987650871276855, | |
| "eval_runtime": 1.3224, | |
| "eval_samples_per_second": 75.619, | |
| "eval_steps_per_second": 12.855, | |
| "step": 11050 | |
| }, | |
| { | |
| "epoch": 9.304274937133277, | |
| "grad_norm": 9.14644718170166, | |
| "learning_rate": 4.1828859060402685e-06, | |
| "loss": 0.7788568115234376, | |
| "step": 11100 | |
| }, | |
| { | |
| "epoch": 9.304274937133277, | |
| "eval_loss": 1.3970659971237183, | |
| "eval_runtime": 1.3223, | |
| "eval_samples_per_second": 75.624, | |
| "eval_steps_per_second": 12.856, | |
| "step": 11100 | |
| }, | |
| { | |
| "epoch": 9.346186085498744, | |
| "grad_norm": 7.950272560119629, | |
| "learning_rate": 3.9312080536912755e-06, | |
| "loss": 0.7493972778320312, | |
| "step": 11150 | |
| }, | |
| { | |
| "epoch": 9.346186085498744, | |
| "eval_loss": 1.3924659490585327, | |
| "eval_runtime": 1.3168, | |
| "eval_samples_per_second": 75.94, | |
| "eval_steps_per_second": 12.91, | |
| "step": 11150 | |
| }, | |
| { | |
| "epoch": 9.388097233864208, | |
| "grad_norm": 9.526105880737305, | |
| "learning_rate": 3.679530201342282e-06, | |
| "loss": 0.7985635375976563, | |
| "step": 11200 | |
| }, | |
| { | |
| "epoch": 9.388097233864208, | |
| "eval_loss": 1.3947174549102783, | |
| "eval_runtime": 1.3216, | |
| "eval_samples_per_second": 75.664, | |
| "eval_steps_per_second": 12.863, | |
| "step": 11200 | |
| }, | |
| { | |
| "epoch": 9.430008382229673, | |
| "grad_norm": 8.139911651611328, | |
| "learning_rate": 3.4278523489932886e-06, | |
| "loss": 0.7662862396240234, | |
| "step": 11250 | |
| }, | |
| { | |
| "epoch": 9.430008382229673, | |
| "eval_loss": 1.3849976062774658, | |
| "eval_runtime": 1.3199, | |
| "eval_samples_per_second": 75.761, | |
| "eval_steps_per_second": 12.879, | |
| "step": 11250 | |
| }, | |
| { | |
| "epoch": 9.47191953059514, | |
| "grad_norm": 8.182343482971191, | |
| "learning_rate": 3.176174496644295e-06, | |
| "loss": 0.7885094451904296, | |
| "step": 11300 | |
| }, | |
| { | |
| "epoch": 9.47191953059514, | |
| "eval_loss": 1.3892548084259033, | |
| "eval_runtime": 1.3202, | |
| "eval_samples_per_second": 75.745, | |
| "eval_steps_per_second": 12.877, | |
| "step": 11300 | |
| }, | |
| { | |
| "epoch": 9.513830678960604, | |
| "grad_norm": 8.144624710083008, | |
| "learning_rate": 2.9244966442953017e-06, | |
| "loss": 0.7756035614013672, | |
| "step": 11350 | |
| }, | |
| { | |
| "epoch": 9.513830678960604, | |
| "eval_loss": 1.3939690589904785, | |
| "eval_runtime": 1.3225, | |
| "eval_samples_per_second": 75.614, | |
| "eval_steps_per_second": 12.854, | |
| "step": 11350 | |
| }, | |
| { | |
| "epoch": 9.555741827326068, | |
| "grad_norm": 7.10036563873291, | |
| "learning_rate": 2.6728187919463087e-06, | |
| "loss": 0.7817750549316407, | |
| "step": 11400 | |
| }, | |
| { | |
| "epoch": 9.555741827326068, | |
| "eval_loss": 1.3921759128570557, | |
| "eval_runtime": 1.3226, | |
| "eval_samples_per_second": 75.609, | |
| "eval_steps_per_second": 12.854, | |
| "step": 11400 | |
| }, | |
| { | |
| "epoch": 9.597652975691535, | |
| "grad_norm": 9.49985122680664, | |
| "learning_rate": 2.4211409395973157e-06, | |
| "loss": 0.7772653198242188, | |
| "step": 11450 | |
| }, | |
| { | |
| "epoch": 9.597652975691535, | |
| "eval_loss": 1.3944430351257324, | |
| "eval_runtime": 1.33, | |
| "eval_samples_per_second": 75.19, | |
| "eval_steps_per_second": 12.782, | |
| "step": 11450 | |
| }, | |
| { | |
| "epoch": 9.639564124057, | |
| "grad_norm": 7.12808895111084, | |
| "learning_rate": 2.1694630872483223e-06, | |
| "loss": 0.7531359100341797, | |
| "step": 11500 | |
| }, | |
| { | |
| "epoch": 9.639564124057, | |
| "eval_loss": 1.3952937126159668, | |
| "eval_runtime": 1.326, | |
| "eval_samples_per_second": 75.415, | |
| "eval_steps_per_second": 12.821, | |
| "step": 11500 | |
| }, | |
| { | |
| "epoch": 9.681475272422464, | |
| "grad_norm": 10.803433418273926, | |
| "learning_rate": 1.917785234899329e-06, | |
| "loss": 0.7737187957763672, | |
| "step": 11550 | |
| }, | |
| { | |
| "epoch": 9.681475272422464, | |
| "eval_loss": 1.3902716636657715, | |
| "eval_runtime": 1.3189, | |
| "eval_samples_per_second": 75.822, | |
| "eval_steps_per_second": 12.89, | |
| "step": 11550 | |
| }, | |
| { | |
| "epoch": 9.72338642078793, | |
| "grad_norm": 8.085272789001465, | |
| "learning_rate": 1.6661073825503356e-06, | |
| "loss": 0.7857860565185547, | |
| "step": 11600 | |
| }, | |
| { | |
| "epoch": 9.72338642078793, | |
| "eval_loss": 1.3925994634628296, | |
| "eval_runtime": 1.3286, | |
| "eval_samples_per_second": 75.266, | |
| "eval_steps_per_second": 12.795, | |
| "step": 11600 | |
| }, | |
| { | |
| "epoch": 9.765297569153395, | |
| "grad_norm": 9.875799179077148, | |
| "learning_rate": 1.4144295302013422e-06, | |
| "loss": 0.7734789276123046, | |
| "step": 11650 | |
| }, | |
| { | |
| "epoch": 9.765297569153395, | |
| "eval_loss": 1.3881347179412842, | |
| "eval_runtime": 1.3227, | |
| "eval_samples_per_second": 75.601, | |
| "eval_steps_per_second": 12.852, | |
| "step": 11650 | |
| }, | |
| { | |
| "epoch": 9.80720871751886, | |
| "grad_norm": 8.159299850463867, | |
| "learning_rate": 1.162751677852349e-06, | |
| "loss": 0.7852881622314453, | |
| "step": 11700 | |
| }, | |
| { | |
| "epoch": 9.80720871751886, | |
| "eval_loss": 1.3894987106323242, | |
| "eval_runtime": 1.3191, | |
| "eval_samples_per_second": 75.811, | |
| "eval_steps_per_second": 12.888, | |
| "step": 11700 | |
| }, | |
| { | |
| "epoch": 9.849119865884326, | |
| "grad_norm": 8.143811225891113, | |
| "learning_rate": 9.110738255033557e-07, | |
| "loss": 0.7636336517333985, | |
| "step": 11750 | |
| }, | |
| { | |
| "epoch": 9.849119865884326, | |
| "eval_loss": 1.3910897970199585, | |
| "eval_runtime": 1.3151, | |
| "eval_samples_per_second": 76.039, | |
| "eval_steps_per_second": 12.927, | |
| "step": 11750 | |
| }, | |
| { | |
| "epoch": 9.89103101424979, | |
| "grad_norm": 8.091937065124512, | |
| "learning_rate": 6.593959731543624e-07, | |
| "loss": 0.7982243347167969, | |
| "step": 11800 | |
| }, | |
| { | |
| "epoch": 9.89103101424979, | |
| "eval_loss": 1.3893768787384033, | |
| "eval_runtime": 1.3109, | |
| "eval_samples_per_second": 76.281, | |
| "eval_steps_per_second": 12.968, | |
| "step": 11800 | |
| }, | |
| { | |
| "epoch": 9.932942162615255, | |
| "grad_norm": 8.95569133758545, | |
| "learning_rate": 4.0771812080536915e-07, | |
| "loss": 0.7712848663330079, | |
| "step": 11850 | |
| }, | |
| { | |
| "epoch": 9.932942162615255, | |
| "eval_loss": 1.3869608640670776, | |
| "eval_runtime": 1.3678, | |
| "eval_samples_per_second": 73.111, | |
| "eval_steps_per_second": 12.429, | |
| "step": 11850 | |
| }, | |
| { | |
| "epoch": 9.974853310980722, | |
| "grad_norm": 8.08212947845459, | |
| "learning_rate": 1.5604026845637585e-07, | |
| "loss": 0.7806188201904297, | |
| "step": 11900 | |
| }, | |
| { | |
| "epoch": 9.974853310980722, | |
| "eval_loss": 1.3863285779953003, | |
| "eval_runtime": 1.3533, | |
| "eval_samples_per_second": 73.896, | |
| "eval_steps_per_second": 12.562, | |
| "step": 11900 | |
| } | |
| ], | |
| "logging_steps": 50, | |
| "max_steps": 11930, | |
| "num_input_tokens_seen": 0, | |
| "num_train_epochs": 10, | |
| "save_steps": 500, | |
| "stateful_callbacks": { | |
| "TrainerControl": { | |
| "args": { | |
| "should_epoch_stop": false, | |
| "should_evaluate": false, | |
| "should_log": false, | |
| "should_save": true, | |
| "should_training_stop": true | |
| }, | |
| "attributes": {} | |
| } | |
| }, | |
| "total_flos": 1.0860409316671488e+17, | |
| "train_batch_size": 4, | |
| "trial_name": null, | |
| "trial_params": null | |
| } | |