Text Generation
Transformers
TensorBoard
Safetensors
gemma3_text
Generated from Trainer
conversational
text-generation-inference
Instructions to use Scale-or-Reason/gemma3-1B_0_split with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Scale-or-Reason/gemma3-1B_0_split with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="Scale-or-Reason/gemma3-1B_0_split")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("Scale-or-Reason/gemma3-1B_0_split")
    model = AutoModelForCausalLM.from_pretrained("Scale-or-Reason/gemma3-1B_0_split")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Scale-or-Reason/gemma3-1B_0_split with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "Scale-or-Reason/gemma3-1B_0_split"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Scale-or-Reason/gemma3-1B_0_split",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/Scale-or-Reason/gemma3-1B_0_split
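The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the response layout (`choices[0].message.content`) is assumed to follow the usual OpenAI chat-completions format that the server advertises compatibility with.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="Scale-or-Reason/gemma3-1B_0_split"):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumed response shape: OpenAI-style chat completion.
    return body["choices"][0]["message"]["content"]


# Example call, left commented out because it needs a live server:
# print(ask("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should point the same helper at a server listening on port 30000 instead of 8000.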
- SGLang
How to use Scale-or-Reason/gemma3-1B_0_split with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "Scale-or-Reason/gemma3-1B_0_split" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Scale-or-Reason/gemma3-1B_0_split",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "Scale-or-Reason/gemma3-1B_0_split" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Scale-or-Reason/gemma3-1B_0_split",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use Scale-or-Reason/gemma3-1B_0_split with Docker Model Runner:
docker model run hf.co/Scale-or-Reason/gemma3-1B_0_split
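The rest of this page is raw training output in which metric rows are printed as Python dict literals (for example `{'loss': 0.9902, ..., 'epoch': 0.02}`), prefixed by a rank number and interleaved with progress-bar fragments. A minimal sketch for pulling those rows out of the log follows; the function name `parse_metric_rows` is hypothetical, and the parser assumes each metric row sits on a single line.

```python
import ast
import re

# Optional "<rank>: " prefix followed by a dict literal on the same line.
_METRIC_ROW = re.compile(r"(?:(\d+):\s*)?(\{.*\})")


def parse_metric_rows(lines):
    """Return (rank, metrics) pairs for every line holding a metrics dict."""
    rows = []
    for line in lines:
        match = _METRIC_ROW.search(line)
        if not match:
            continue  # progress-bar fragment or warning line
        try:
            metrics = ast.literal_eval(match.group(2))
        except (ValueError, SyntaxError):
            continue  # truncated or malformed row
        if isinstance(metrics, dict) and "loss" in metrics:
            rank = int(match.group(1)) if match.group(1) else None
            rows.append((rank, metrics))
    return rows


log_lines = [
    "| 0: {'loss': 0.9902, 'learning_rate': 5.420000000000001e-06, 'epoch': 0.02} | |",
    "| 0: 3%| | 20/711 | |",
    "| 0: {'loss': 0.7515, 'learning_rate': 2e-05, 'epoch': 0.14} | |",
]
for rank, metrics in parse_metric_rows(log_lines):
    print(rank, metrics["epoch"], metrics["loss"])
```

`ast.literal_eval` is used instead of `eval` so that only plain literals are accepted from the log text.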
| 3: W1124 00:03:06.850000 675180 torch/distributed/run.py:792] | |
| 3: W1124 00:03:06.850000 675180 torch/distributed/run.py:792] ***************************************** | |
| 3: W1124 00:03:06.850000 675180 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
| 3: W1124 00:03:06.850000 675180 torch/distributed/run.py:792] ***************************************** | |
| 0: W1124 00:03:06.866000 4127050 torch/distributed/run.py:792] | |
| 0: W1124 00:03:06.866000 4127050 torch/distributed/run.py:792] ***************************************** | |
| 0: W1124 00:03:06.866000 4127050 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
| 0: W1124 00:03:06.866000 4127050 torch/distributed/run.py:792] ***************************************** | |
| 2: W1124 00:03:06.882000 628563 torch/distributed/run.py:792] | |
| 2: W1124 00:03:06.882000 628563 torch/distributed/run.py:792] ***************************************** | |
| 2: W1124 00:03:06.882000 628563 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
| 2: W1124 00:03:06.882000 628563 torch/distributed/run.py:792] ***************************************** | |
| 1: W1124 00:03:06.884000 2620875 torch/distributed/run.py:792] | |
| 1: W1124 00:03:06.884000 2620875 torch/distributed/run.py:792] ***************************************** | |
| 1: W1124 00:03:06.884000 2620875 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
| 1: W1124 00:03:06.884000 2620875 torch/distributed/run.py:792] ***************************************** | |
| 0: | |
| Dropping Long Sequences (>16384) (num_proc=192): 557277/557277 (100%) | |
| 1: Drop Samples with Zero Trainable Tokens (num_proc=192): 556595/556595 (100%) | |
| 1: Add position_id column (Sample Packing) (num_proc=192): 556595/556595 (100%) | |
| 1: Saving the dataset (192/192 shards): 556595/556595 (100%) | |
| 0: | |
| Training progress: 18/711 (3%) | |
| 0: {'loss': 0.9902, 'grad_norm': 1.160769879445861, 'learning_rate': 5.420000000000001e-06, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.02} | |
| 0: {'loss': 0.9244, 'grad_norm': 1.058149478414618, 'learning_rate': 7.22e-06, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.03} | |
| 0: Training progress: 34/711 (5%) | |
| 0: {'loss': 0.8932, 'grad_norm': 0.9348341250213281, 'learning_rate': 9.020000000000002e-06, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.03} | |
| 0: {'loss': 0.8577, 'grad_norm': 0.794975085250345, 'learning_rate': 1.0820000000000001e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.04} | |
| 0: Training progress: 50/711 (7%) | |
| 0: {'loss': 0.8286, 'grad_norm': 0.8899530497730146, 'learning_rate': 1.2620000000000001e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.05} | |
| 0: Training progress: 69/711 (10%) | |
| 0: {'loss': 0.8345, 'grad_norm': 0.9574271371234939, 'learning_rate': 1.4420000000000001e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.06} | |
| 0: {'loss': 0.8208, 'grad_norm': 0.9418691707757363, 'learning_rate': 1.6220000000000004e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.07} | |
| 0: Training progress: 84/711 (12%) | |
| 0: {'loss': 0.7899, 'grad_norm': 0.9662815954776516, 'learning_rate': 1.802e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.08} | |
| 0: {'loss': 0.7661, 'grad_norm': 1.2089037525815847, 'learning_rate': 1.982e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.08} | |
| 0: Training progress: 163/711 (23%) | |
| 0: {'loss': 0.7515, 'grad_norm': 0.9756062058818691, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.14} | |
| 0: Training progress: 194/711 (27%) | |
| 0: {'loss': 0.7419, 'grad_norm': 0.9093993578822128, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.17} | |
| 0: Training progress: 210/711 (30%) | |
| 0: {'loss': 0.7289, 'grad_norm': 0.9027102829983507, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.18} | |
| 0: {'loss': 0.7468, 'grad_norm': 0.8749606638751157, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.19} | |
| 0: Training progress: 240/711 (34%) | |
| 0: {'loss': 0.7361, 'grad_norm': 0.8938800949879834, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.2} | |
| 0: {'loss': 0.7165, 'grad_norm': 0.8865745843602661, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.21} | |
| 0: Training progress: 270/711 (38%) | |
| 0: {'loss': 0.7113, 'grad_norm': 0.9763568156539483, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.23} | |
| 0: {'loss': 0.7247, 'grad_norm': 0.9774181687381295, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.24} | |
| 0: Training progress: 284/711 (40%) | |
| 0: {'loss': 0.6936, 'grad_norm': 0.8964349971328741, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.24} | |
| 0: {'loss': 0.7113, 'grad_norm': 0.8590726758549844, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.27} | |
| 0: {'loss': 0.7097, 'grad_norm': 0.9011059611829694, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.28} | |
| 0: {'loss': 0.695, 'grad_norm': 0.8452924322256501, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.29} | |
| 0: {'loss': 0.7127, 'grad_norm': 0.8028586105919348, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.3} | |
| 0: {'loss': 0.7049, 'grad_norm': 0.8418301197643927, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.31} | |
| 0: {'loss': 0.7056, 'grad_norm': 0.8018926220188637, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.32} | |
| 0: {'loss': 0.6793, 'grad_norm': 0.9500235419290328, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.34} | |
| 0: {'loss': 0.6814, 'grad_norm': 0.8451661040419431, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.35} | |
| 0: {'loss': 0.6906, 'grad_norm': 0.8679849121193738, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.35} | |
| 0: {'loss': 0.6849, 'grad_norm': 0.8279829459119256, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.36} | |
| 0: {'loss': 0.6902, 'grad_norm': 0.870874120336189, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.37} | |
| 0: {'loss': 0.6751, 'grad_norm': 0.9324021579543116, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.39} | |
| 0: {'loss': 0.681, 'grad_norm': 1.1760742860535562, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.4} | |
| 0: {'loss': 0.6857, 'grad_norm': 0.7580929709516384, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.4} | |
| 0: {'loss': 0.6551, 'grad_norm': 0.7688484391693519, 'learning_rate': 2e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.41} | |
| 0: {'loss': 0.6854, 'grad_norm': 0.8052276313272076, 'learning_rate': 1.9143443472194178e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.46} | |
| 0: {'loss': 0.6813, 'grad_norm': 0.8340810817094527, 'learning_rate': 1.8443725168471054e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.46} | |
| 0: {'loss': 0.6641, 'grad_norm': 0.85639870414892, 'learning_rate': 1.7560717646792704e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.47} | |
| 0: {'loss': 0.6813, 'grad_norm': 0.7779683047257264, 'learning_rate': 1.651616348287679e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.48} | |
| 0: {'loss': 0.6657, 'grad_norm': 0.8763000605487214, 'learning_rate': 1.5335783066915437e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.49} | |
| 0: {'loss': 0.6539, 'grad_norm': 0.7886914559405968, 'learning_rate': 1.4048641282207624e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.5} | |
| 0: {'loss': 0.6791, 'grad_norm': 0.7967066194263275, 'learning_rate': 1.2686431831271523e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.58, 'epoch': 0.51} | |
| 0: {'loss': 0.6489, 'grad_norm': 0.8076470421353021, 'learning_rate': 1.1282696831703156e-05, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.79, 'epoch': 0.51} | |
| 0: {'loss': 0.6488, 'grad_norm': 0.8168905882802561, 'learning_rate': 9.872000897921262e-06, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.79, 'epoch': 0.52} | |
| 0: {'loss': 0.6595, 'grad_norm': 0.7139199193006327, 'learning_rate': 4.839076046641802e-06, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.79, 'epoch': 0.56} | |
| 0: {'loss': 0.6349, 'grad_norm': 0.6774414042634137, 'learning_rate': 3.888604888618787e-06, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.79, 'epoch': 0.56} | |
| 0: {'loss': 0.6585, 'grad_norm': 0.6957704121795029, 'learning_rate': 3.11323987960523e-06, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.79, 'epoch': 0.57} | |
| 0: {'loss': 0.6588, 'grad_norm': 0.7024004831662435, 'learning_rate': 2.532073079411971e-06, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.79, 'epoch': 0.58} | |
| 0: {'loss': 0.6492, 'grad_norm': 1.1172218022043618, 'learning_rate': 2.159414743441803e-06, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.79, 'epoch': 0.59} | |
| 0: {'loss': 0.6439, 'grad_norm': 0.7226102601600088, 'learning_rate': 2.0044409567084157e-06, 'memory/max_mem_active(gib)': 52.06, 'memory/max_mem_allocated(gib)': 52.06, 'memory/device_mem_reserved(gib)': 60.79, 'epoch': 0.6} | |
| 0: 100% 711/711 | |