[WARNING|2026-06-09 01:11:38] llamafactory.extras.misc:168 >> Version checking has been disabled, may lead to unexpected behaviors. /usr/local/lib/python3.12/dist-packages/jieba/__init__.py:44: SyntaxWarning: invalid escape sequence '\.' re_han_default = re.compile("([\u4E00-\u9FD5a-zA-Z0-9+#&\._%\-]+)", re.U) /usr/local/lib/python3.12/dist-packages/jieba/__init__.py:46: SyntaxWarning: invalid escape sequence '\s' re_skip_default = re.compile("(\r\n|\s)", re.U) /usr/local/lib/python3.12/dist-packages/jieba/finalseg/__init__.py:78: SyntaxWarning: invalid escape sequence '\.' re_skip = re.compile("([a-zA-Z0-9]+(?:\.\d+)?%?)") [INFO] Running in WANDB offline mode [pa_sft_train_si] Injected GENERAL_QUESTION_PROMPT into 2136 samples [pa_sft_train_si] Injected dataset JSON: /content/drive/MyDrive/_prompt_injected_merged_reordered_si.json [pa_sft_train_si] Rewired --dataset_dir to: /content/drive/MyDrive/IAD-R1-main/data/_prompt_injected [INFO|2026-06-09 01:11:53] llamafactory.hparams.parser:383 >> Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.bfloat16 [INFO|configuration_utils.py:783] 2026-06-09 01:11:53,911 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/config.json [INFO|configuration_utils.py:859] 2026-06-09 01:11:53,930 >> Model config Qwen3_5Config { "architectures": [ "Qwen3_5ForConditionalGeneration" ], "image_token_id": 248056, "model_type": "qwen3_5", "mtp_num_hidden_layers": 1, "text_config": { "attention_bias": false, "attention_dropout": 0.0, "attn_output_gate": true, "bos_token_id": null, "dtype": "bfloat16", "eos_token_id": 248044, "full_attention_interval": 4, "head_dim": 256, "hidden_act": "silu", "hidden_size": 2560, "initializer_range": 0.02, "intermediate_size": 9216, "layer_types": [ "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", 
"linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention" ], "linear_conv_kernel_dim": 4, "linear_key_head_dim": 128, "linear_num_key_heads": 16, "linear_num_value_heads": 32, "linear_value_head_dim": 128, "mamba_ssm_dtype": "float32", "max_position_embeddings": 262144, "mlp_only_layers": [], "model_type": "qwen3_5_text", "mtp_num_hidden_layers": 1, "mtp_use_dedicated_embeddings": false, "num_attention_heads": 16, "num_hidden_layers": 32, "num_key_value_heads": 4, "pad_token_id": null, "partial_rotary_factor": 0.25, "rms_norm_eps": 1e-06, "rope_parameters": { "mrope_interleaved": true, "mrope_section": [ 11, 11, 10 ], "partial_rotary_factor": 0.25, "rope_theta": 10000000, "rope_type": "default" }, "tie_word_embeddings": true, "use_cache": true, "vocab_size": 248320 }, "tie_word_embeddings": true, "transformers_version": "5.10.2", "unsloth_fixed_mtp": true, "video_token_id": 248057, "vision_config": { "deepstack_visual_indexes": [], "depth": 24, "hidden_act": "gelu_pytorch_tanh", "hidden_size": 1024, "in_channels": 3, "initializer_range": 0.02, "intermediate_size": 4096, "model_type": "qwen3_5_vision", "num_heads": 16, "num_position_embeddings": 2304, "out_hidden_size": 2560, "patch_size": 16, "spatial_merge_size": 2, "temporal_patch_size": 2 }, "vision_end_token_id": 248054, "vision_start_token_id": 248053 } [INFO|configuration_utils.py:783] 2026-06-09 01:11:54,013 >> loading configuration file config.json from cache at 
/root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/config.json [INFO|configuration_utils.py:859] 2026-06-09 01:11:54,024 >> Model config Qwen3_5Config { "architectures": [ "Qwen3_5ForConditionalGeneration" ], "image_token_id": 248056, "model_type": "qwen3_5", "mtp_num_hidden_layers": 1, "text_config": { "attention_bias": false, "attention_dropout": 0.0, "attn_output_gate": true, "bos_token_id": null, "dtype": "bfloat16", "eos_token_id": 248044, "full_attention_interval": 4, "head_dim": 256, "hidden_act": "silu", "hidden_size": 2560, "initializer_range": 0.02, "intermediate_size": 9216, "layer_types": [ "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention" ], "linear_conv_kernel_dim": 4, "linear_key_head_dim": 128, "linear_num_key_heads": 16, "linear_num_value_heads": 32, "linear_value_head_dim": 128, "mamba_ssm_dtype": "float32", "max_position_embeddings": 262144, "mlp_only_layers": [], "model_type": "qwen3_5_text", "mtp_num_hidden_layers": 1, "mtp_use_dedicated_embeddings": false, "num_attention_heads": 16, "num_hidden_layers": 32, "num_key_value_heads": 4, "pad_token_id": null, "partial_rotary_factor": 0.25, "rms_norm_eps": 1e-06, "rope_parameters": { "mrope_interleaved": true, "mrope_section": [ 11, 11, 10 ], "partial_rotary_factor": 0.25, "rope_theta": 10000000, "rope_type": "default" }, "tie_word_embeddings": true, "use_cache": true, 
"vocab_size": 248320 }, "tie_word_embeddings": true, "transformers_version": "5.10.2", "unsloth_fixed_mtp": true, "video_token_id": 248057, "vision_config": { "deepstack_visual_indexes": [], "depth": 24, "hidden_act": "gelu_pytorch_tanh", "hidden_size": 1024, "in_channels": 3, "initializer_range": 0.02, "intermediate_size": 4096, "model_type": "qwen3_5_vision", "num_heads": 16, "num_position_embeddings": 2304, "out_hidden_size": 2560, "patch_size": 16, "spatial_merge_size": 2, "temporal_patch_size": 2 }, "vision_end_token_id": 248054, "vision_start_token_id": 248053 } [INFO|processing_utils.py:1387] 2026-06-09 01:11:58,324 >> loading configuration file processor_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/processor_config.json [INFO|processing_utils.py:1387] 2026-06-09 01:11:58,870 >> loading configuration file processor_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/processor_config.json [INFO|image_processing_base.py:344] 2026-06-09 01:11:59,094 >> loading configuration file preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/preprocessor_config.json [INFO|image_processing_base.py:344] 2026-06-09 01:11:59,255 >> loading configuration file preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/preprocessor_config.json [INFO|image_processing_base.py:385] 2026-06-09 01:11:59,255 >> Image processor Qwen2VLImageProcessor { "data_format": "channels_first", "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.5, 0.5, 0.5 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.5, 0.5, 0.5 ], "merge_size": 2, "patch_size": 16, "resample": 3, 
"rescale_factor": 0.00392156862745098, "size": { "longest_edge": 16777216, "shortest_edge": 65536 }, "temporal_patch_size": 2 } [INFO|configuration_utils.py:783] 2026-06-09 01:11:59,324 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/config.json [INFO|configuration_utils.py:859] 2026-06-09 01:11:59,334 >> Model config Qwen3_5Config { "architectures": [ "Qwen3_5ForConditionalGeneration" ], "image_token_id": 248056, "model_type": "qwen3_5", "mtp_num_hidden_layers": 1, "text_config": { "attention_bias": false, "attention_dropout": 0.0, "attn_output_gate": true, "bos_token_id": null, "dtype": "bfloat16", "eos_token_id": 248044, "full_attention_interval": 4, "head_dim": 256, "hidden_act": "silu", "hidden_size": 2560, "initializer_range": 0.02, "intermediate_size": 9216, "layer_types": [ "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention" ], "linear_conv_kernel_dim": 4, "linear_key_head_dim": 128, "linear_num_key_heads": 16, "linear_num_value_heads": 32, "linear_value_head_dim": 128, "mamba_ssm_dtype": "float32", "max_position_embeddings": 262144, "mlp_only_layers": [], "model_type": "qwen3_5_text", "mtp_num_hidden_layers": 1, "mtp_use_dedicated_embeddings": false, "num_attention_heads": 16, "num_hidden_layers": 32, "num_key_value_heads": 4, "pad_token_id": null, "partial_rotary_factor": 0.25, 
"rms_norm_eps": 1e-06, "rope_parameters": { "mrope_interleaved": true, "mrope_section": [ 11, 11, 10 ], "partial_rotary_factor": 0.25, "rope_theta": 10000000, "rope_type": "default" }, "tie_word_embeddings": true, "use_cache": true, "vocab_size": 248320 }, "tie_word_embeddings": true, "transformers_version": "5.10.2", "unsloth_fixed_mtp": true, "video_token_id": 248057, "vision_config": { "deepstack_visual_indexes": [], "depth": 24, "hidden_act": "gelu_pytorch_tanh", "hidden_size": 1024, "in_channels": 3, "initializer_range": 0.02, "intermediate_size": 4096, "model_type": "qwen3_5_vision", "num_heads": 16, "num_position_embeddings": 2304, "out_hidden_size": 2560, "patch_size": 16, "spatial_merge_size": 2, "temporal_patch_size": 2 }, "vision_end_token_id": 248054, "vision_start_token_id": 248053 } [INFO|video_processing_utils.py:686] 2026-06-09 01:12:01,220 >> loading configuration file video_preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/video_preprocessor_config.json [INFO|video_processing_utils.py:686] 2026-06-09 01:12:01,439 >> loading configuration file video_preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/video_preprocessor_config.json [INFO|video_processing_utils.py:727] 2026-06-09 01:12:01,439 >> Video processor Qwen3VLVideoProcessor { "data_format": "channels_first", "default_to_square": true, "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "do_sample_frames": true, "fps": 2, "image_mean": [ 0.5, 0.5, 0.5 ], "image_std": [ 0.5, 0.5, 0.5 ], "max_frames": 768, "merge_size": 2, "min_frames": 4, "patch_size": 16, "resample": 3, "rescale_factor": 0.00392156862745098, "return_metadata": false, "size": { "longest_edge": 25165824, "shortest_edge": 4096 }, "temporal_patch_size": 2, "video_processor_type": "Qwen3VLVideoProcessor" } 
[INFO|processing_utils.py:1462] 2026-06-09 01:12:01,440 >> Processor Qwen3VLProcessor: - image_processor: Qwen2VLImageProcessor { "data_format": "channels_first", "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.5, 0.5, 0.5 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.5, 0.5, 0.5 ], "merge_size": 2, "patch_size": 16, "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "longest_edge": 16777216, "shortest_edge": 65536 }, "temporal_patch_size": 2 } - tokenizer: Qwen2Tokenizer(name_or_path='unsloth/Qwen3.5-4B', vocab_size=248044, model_max_length=262144, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|vision_pad|>', 'audio_bos_token': '<|audio_start|>', 'audio_eos_token': '<|audio_end|>', 'audio_token': '<|audio_pad|>', 'image_token': '<|image_pad|>', 'video_token': '<|video_pad|>', 'vision_bos_token': '<|vision_start|>', 'vision_eos_token': '<|vision_end|>'}, added_tokens_decoder={ 248044: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248045: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248046: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248047: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248048: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248049: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248050: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248051: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248052: 
AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248053: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248054: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248055: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248056: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248057: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248058: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248059: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248060: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248061: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248062: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248063: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248064: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248065: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248066: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248067: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248068: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 248069: AddedToken("", rstrip=False, lstrip=False, single_word=False, 
normalized=False, special=False), 248070: AddedToken("<|audio_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248071: AddedToken("<|audio_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248072: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248073: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248074: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248075: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 248076: AddedToken("<|audio_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), }) - video_processor: Qwen3VLVideoProcessor { "data_format": "channels_first", "default_to_square": true, "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "do_sample_frames": true, "fps": 2, "image_mean": [ 0.5, 0.5, 0.5 ], "image_std": [ 0.5, 0.5, 0.5 ], "max_frames": 768, "merge_size": 2, "min_frames": 4, "patch_size": 16, "resample": 3, "rescale_factor": 0.00392156862745098, "return_metadata": false, "size": { "longest_edge": 25165824, "shortest_edge": 4096 }, "temporal_patch_size": 2, "video_processor_type": "Qwen3VLVideoProcessor" } { "image_processor": { "data_format": "channels_first", "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.5, 0.5, 0.5 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.5, 0.5, 0.5 ], "merge_size": 2, "patch_size": 16, "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "longest_edge": 16777216, "shortest_edge": 65536 }, "temporal_patch_size": 2 }, "processor_class": "Qwen3VLProcessor", "video_processor": { "data_format": "channels_first", "default_to_square": true, "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, 
"do_resize": true, "do_sample_frames": true, "fps": 2, "image_mean": [ 0.5, 0.5, 0.5 ], "image_std": [ 0.5, 0.5, 0.5 ], "max_frames": 768, "merge_size": 2, "min_frames": 4, "patch_size": 16, "resample": 3, "rescale_factor": 0.00392156862745098, "return_metadata": false, "size": { "longest_edge": 25165824, "shortest_edge": 4096 }, "temporal_patch_size": 2, "video_processor_type": "Qwen3VLVideoProcessor" } }
[INFO|2026-06-09 01:12:01] llamafactory.data.template:157 >> Add <|im_end|> to stop words. [INFO|2026-06-09 01:12:01] llamafactory.data.loader:157 >> Loading dataset /content/drive/MyDrive/_prompt_injected_merged_reordered_si.json... Generating train split: 0 examples [00:00, ? examples/s] Generating train split: 2136 examples [00:00, 47011.03 examples/s] Converting format of dataset: 0%| | 0/2136 [00:00system You are a helpful assistant.<|im_end|> <|im_start|>user You are an expert in detecting industrial anomalies in images. You will be provided with a query image (query_img) for inspection. Your reasoning and response must strictly follow these constraints based on the specific tags. Inspect the query_img for any anomalies If you find anomalies in the query image, respond with Yes...... 
If no anomalies are detected in the query image, respond with No <|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_
pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|> Are there any defects in the query image?<|im_end|> <|im_start|>assistant No<|im_end|> label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 27, 8944, 39798, 510, 8944, 29, 248046, 198][INFO|configuration_utils.py:783] 2026-06-09 01:24:17,021 >> loading configuration file config.json from cache at 
/root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/config.json [INFO|configuration_utils.py:859] 2026-06-09 01:24:17,032 >> Model config Qwen3_5Config { "architectures": [ "Qwen3_5ForConditionalGeneration" ], "image_token_id": 248056, "model_type": "qwen3_5", "mtp_num_hidden_layers": 1, "text_config": { "attention_bias": false, "attention_dropout": 0.0, "attn_output_gate": true, "bos_token_id": null, "dtype": "bfloat16", "eos_token_id": 248044, "full_attention_interval": 4, "head_dim": 256, "hidden_act": "silu", "hidden_size": 2560, "initializer_range": 0.02, "intermediate_size": 9216, "layer_types": [ "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention", "linear_attention", "linear_attention", "linear_attention", "full_attention" ], "linear_conv_kernel_dim": 4, "linear_key_head_dim": 128, "linear_num_key_heads": 16, "linear_num_value_heads": 32, "linear_value_head_dim": 128, "mamba_ssm_dtype": "float32", "max_position_embeddings": 262144, "mlp_only_layers": [], "model_type": "qwen3_5_text", "mtp_num_hidden_layers": 1, "mtp_use_dedicated_embeddings": false, "num_attention_heads": 16, "num_hidden_layers": 32, "num_key_value_heads": 4, "pad_token_id": null, "partial_rotary_factor": 0.25, "rms_norm_eps": 1e-06, "rope_parameters": { "mrope_interleaved": true, "mrope_section": [ 11, 11, 10 ], "partial_rotary_factor": 0.25, "rope_theta": 10000000, "rope_type": "default" }, "tie_word_embeddings": true, "use_cache": true, 
"vocab_size": 248320 }, "tie_word_embeddings": true, "transformers_version": "5.10.2", "unsloth_fixed_mtp": true, "video_token_id": 248057, "vision_config": { "deepstack_visual_indexes": [], "depth": 24, "hidden_act": "gelu_pytorch_tanh", "hidden_size": 1024, "in_channels": 3, "initializer_range": 0.02, "intermediate_size": 4096, "model_type": "qwen3_5_vision", "num_heads": 16, "num_position_embeddings": 2304, "out_hidden_size": 2560, "patch_size": 16, "spatial_merge_size": 2, "temporal_patch_size": 2 }, "vision_end_token_id": 248054, "vision_start_token_id": 248053 } [INFO|modeling_utils.py:769] 2026-06-09 01:24:18,747 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/model.safetensors.index.json labels: No<|im_end|> Fetching 2 files: 0%| | 0/2 [00:00> Generate config GenerationConfig { "output_attentions": false, "output_hidden_states": false } [WARNING|logging.py:340] 2026-06-09 01:24:41,902 >> The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d Loading weights: 0%| | 0/723 [00:00> Generation config file not found, using a generation config created from the model config. [INFO|configuration_utils.py:1038] 2026-06-09 01:24:44,870 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--Qwen3.5-4B/snapshots/3764fa359b9082ea5a1e4a5e3ac3aaf6e9671636/config.json [INFO|configuration_utils.py:1085] 2026-06-09 01:24:44,870 >> Generate config GenerationConfig {} [INFO|2026-06-09 01:24:44] llamafactory.model.model_utils.checkpointing:157 >> Gradient checkpointing enabled. [INFO|2026-06-09 01:24:44] llamafactory.model.model_utils.attention:157 >> Using torch SDPA for faster training and inference. 
[INFO|2026-06-09 01:24:44] llamafactory.model.adapter:157 >> Pure bf16 / BAdam detected, remaining trainable params in half precision. [INFO|2026-06-09 01:24:44] llamafactory.model.adapter:157 >> Fine-tuning method: Full [INFO|2026-06-09 01:24:44] llamafactory.model.model_utils.visual:157 >> Set vision model not trainable: ['model.visual.patch_embed', 'model.visual.pos_embed', 'model.visual.rotary_pos_emb', 'model.visual.blocks']. [INFO|2026-06-09 01:24:44] llamafactory.model.model_utils.visual:157 >> Set multi model projector not trainable: model.visual.merger. [INFO|2026-06-09 01:24:44] llamafactory.model.loader:157 >> trainable params: 4,205,751,296 || all params: 4,539,265,536 || trainable%: 92.6527 [WARNING|trainer_utils.py:1240] 2026-06-09 01:24:46,864 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248055}. [INFO|trainer.py:1218] 2026-06-09 01:24:51,943 >> skipped Embedding(2304, 1024): 2.25M params [INFO|trainer.py:1218] 2026-06-09 01:24:51,943 >> skipped Embedding(248320, 2560): 608.5M params [INFO|trainer.py:1221] 2026-06-09 01:24:51,944 >> skipped: 608.5M params [INFO|trainer.py:1475] 2026-06-09 01:24:51,958 >> ***** Running training ***** [INFO|trainer.py:1476] 2026-06-09 01:24:51,958 >> Num examples = 2,136 [INFO|trainer.py:1477] 2026-06-09 01:24:51,958 >> Num Epochs = 1 [INFO|trainer.py:1478] 2026-06-09 01:24:51,958 >> Num update steps per epoch = 1,068 [INFO|trainer.py:1479] 2026-06-09 01:24:51,958 >> Instantaneous batch size per device = 1 [INFO|trainer.py:1482] 2026-06-09 01:24:51,958 >> Total train batch size (w. 
parallel, distributed & accumulation) = 2 [INFO|trainer.py:1483] 2026-06-09 01:24:51,958 >> Gradient Accumulation steps = 2 [INFO|trainer.py:1484] 2026-06-09 01:24:51,959 >> Total optimization steps = 1,068 [INFO|trainer.py:1485] 2026-06-09 01:24:51,960 >> Number of trainable parameters = 4,205,751,296 0%| | 0/1068 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`. 0%| | 1/1068 [00:06<1:55:42, 6.51s/it] 0%| | 2/1068 [00:10<1:25:00, 4.78s/it] 0%| | 3/1068 [00:13<1:15:24, 4.25s/it] 0%| | 4/1068 [00:17<1:10:59, 4.00s/it] 0%| | 5/1068 [00:20<1:08:20, 3.86s/it] 0%| | 5/1068 [00:20<1:08:20, 3.86s/it] 1%| | 6/1068 [00:24<1:06:44, 3.77s/it] 1%| | 7/1068 [00:28<1:05:16, 3.69s/it] 1%| | 8/1068 [00:31<1:04:29, 3.65s/it] 1%| | 9/1068 [00:35<1:03:55, 3.62s/it] 1%| | 10/1068 [00:38<1:03:19, 3.59s/it] 1%| | 10/1068 [00:38<1:03:19, 3.59s/it] 1%| | 11/1068 [00:42<1:02:47, 3.56s/it] 1%| | 12/1068 [00:45<1:02:34, 3.56s/it] 1%| | 13/1068 [00:49<1:02:18, 3.54s/it] 1%|▏ | 14/1068 [00:52<1:02:01, 3.53s/it] 1%|▏ | 15/1068 [00:56<1:01:57, 3.53s/it] 1%|▏ | 15/1068 [00:56<1:01:57, 3.53s/it] 1%|▏ | 16/1068 [00:59<1:02:01, 3.54s/it] 2%|▏ | 17/1068 [01:03<1:01:52, 3.53s/it] 2%|▏ | 18/1068 [01:06<1:01:38, 3.52s/it] 2%|▏ | 19/1068 [01:10<1:01:29, 3.52s/it] 2%|▏ | 20/1068 [01:13<1:01:29, 3.52s/it] 2%|▏ | 20/1068 [01:13<1:01:29, 3.52s/it] 2%|▏ | 21/1068 [01:17<1:01:18, 3.51s/it] 2%|▏ | 22/1068 [01:20<1:01:15, 3.51s/it] 2%|▏ | 23/1068 [01:24<1:01:14, 3.52s/it] 2%|▏ | 24/1068 [01:27<1:01:19, 3.52s/it] 2%|▏ | 25/1068 [01:31<1:01:12, 3.52s/it] 2%|▏ | 25/1068 [01:31<1:01:12, 3.52s/it] 2%|▏ | 26/1068 [01:34<1:01:06, 3.52s/it] 3%|▎ | 27/1068 [01:38<1:01:20, 3.54s/it] 3%|▎ | 28/1068 [01:42<1:01:16, 3.53s/it] 3%|▎ | 29/1068 [01:45<1:01:06, 3.53s/it] 3%|▎ | 30/1068 [01:49<1:01:02, 3.53s/it] 3%|▎ | 30/1068 [01:49<1:01:02, 3.53s/it] 3%|▎ | 31/1068 [01:52<1:01:08, 3.54s/it] 3%|▎ | 32/1068 [01:56<1:00:56, 3.53s/it] 3%|▎ | 33/1068 [01:59<1:00:45, 3.52s/it] 3%|▎ | 
34/1068 [02:03<1:00:53, 3.53s/it] ... 40%|███▉ | 426/1068 [25:19<37:39, 
3.52s/it] 40%|███▉ | 427/1068 [25:22<37:35, 3.52s/it] 40%|████ | 428/1068 [25:26<37:46, 3.54s/it] 40%|████ | 429/1068 [25:29<37:38, 3.53s/it] 40%|████ | 430/1068 [25:33<37:35, 3.54s/it] 40%|████ | 430/1068 [25:33<37:35, 3.54s/it] 40%|████ | 431/1068 [25:36<37:48, 3.56s/it] 40%|████ | 432/1068 [25:40<37:52, 3.57s/it] 41%|████ | 433/1068 [25:44<37:43, 3.56s/it] 41%|████ | 434/1068 [25:47<37:38, 3.56s/it] 41%|████ | 435/1068 [25:51<37:40, 3.57s/it] 41%|████ | 435/1068 [25:51<37:40, 3.57s/it] 41%|████ | 436/1068 [25:54<37:30, 3.56s/it] 41%|████ | 437/1068 [25:58<37:22, 3.55s/it] 41%|████ | 438/1068 [26:01<37:19, 3.55s/it] 41%|████ | 439/1068 [26:05<37:10, 3.55s/it] 41%|████ | 440/1068 [26:08<37:00, 3.54s/it] 41%|████ | 440/1068 [26:08<37:00, 3.54s/it] 41%|████▏ | 441/1068 [26:12<36:55, 3.53s/it] 41%|████▏ | 442/1068 [26:15<36:53, 3.54s/it] 41%|████▏ | 443/1068 [26:19<36:41, 3.52s/it] 42%|████▏ | 444/1068 [26:22<36:33, 3.52s/it] 42%|████▏ | 445/1068 [26:26<36:46, 3.54s/it] 42%|████▏ | 445/1068 [26:26<36:46, 3.54s/it] 42%|████▏ | 446/1068 [26:30<36:43, 3.54s/it] 42%|████▏ | 447/1068 [26:33<36:35, 3.54s/it] 42%|████▏ | 448/1068 [26:37<36:26, 3.53s/it] 42%|████▏ | 449/1068 [26:40<36:37, 3.55s/it] 42%|████▏ | 450/1068 [26:44<36:28, 3.54s/it] 42%|████▏ | 450/1068 [26:44<36:28, 3.54s/it] 42%|████▏ | 451/1068 [26:47<36:17, 3.53s/it] 42%|████▏ | 452/1068 [26:51<36:12, 3.53s/it] 42%|████▏ | 453/1068 [26:54<36:13, 3.53s/it] 43%|████▎ | 454/1068 [26:58<36:10, 3.53s/it] 43%|████▎ | 455/1068 [27:01<36:04, 3.53s/it] 43%|████▎ | 455/1068 [27:01<36:04, 3.53s/it] 43%|████▎ | 456/1068 [27:05<36:14, 3.55s/it] 43%|████▎ | 457/1068 [27:08<36:02, 3.54s/it] 43%|████▎ | 458/1068 [27:12<35:57, 3.54s/it] 43%|████▎ | 459/1068 [27:16<35:58, 3.54s/it] 43%|████▎ | 460/1068 [27:19<35:57, 3.55s/it] {'loss': '2.655', 'grad_norm': '296', 'learning_rate': '4e-07', 'epoch': '0.004682'} {'loss': '2.346', 'grad_norm': '324', 'learning_rate': '9e-07', 'epoch': '0.009363'} {'loss': '1.099', 'grad_norm': 
'302', 'learning_rate': '1.4e-06', 'epoch': '0.01404'} {'loss': '0.1942', 'grad_norm': '52.75', 'learning_rate': '1.9e-06', 'epoch': '0.01873'} {'loss': '0.2491', 'grad_norm': '150', 'learning_rate': '2.4e-06', 'epoch': '0.02341'} {'loss': '0.2163', 'grad_norm': '57.25', 'learning_rate': '2.9e-06', 'epoch': '0.02809'} {'loss': '0.2401', 'grad_norm': '55', 'learning_rate': '3.4e-06', 'epoch': '0.03277'} {'loss': '0.1531', 'grad_norm': '61.5', 'learning_rate': '3.9e-06', 'epoch': '0.03745'} {'loss': '0.2452', 'grad_norm': '42.25', 'learning_rate': '4.4e-06', 'epoch': '0.04213'} {'loss': '0.162', 'grad_norm': '10.19', 'learning_rate': '4.9e-06', 'epoch': '0.04682'} {'loss': '0.0961', 'grad_norm': '18.25', 'learning_rate': '5.4e-06', 'epoch': '0.0515'} {'loss': '0.2541', 'grad_norm': '19.12', 'learning_rate': '5.9e-06', 'epoch': '0.05618'} {'loss': '0.2135', 'grad_norm': '35.5', 'learning_rate': '6.4e-06', 'epoch': '0.06086'} {'loss': '0.1375', 'grad_norm': '48', 'learning_rate': '6.9e-06', 'epoch': '0.06554'} {'loss': '0.1714', 'grad_norm': '46.5', 'learning_rate': '7.4e-06', 'epoch': '0.07022'} {'loss': '0.1436', 'grad_norm': '4.688', 'learning_rate': '7.9e-06', 'epoch': '0.07491'} {'loss': '0.2568', 'grad_norm': '38.75', 'learning_rate': '8.4e-06', 'epoch': '0.07959'} {'loss': '0.3186', 'grad_norm': '126.5', 'learning_rate': '8.9e-06', 'epoch': '0.08427'} {'loss': '0.1503', 'grad_norm': '21', 'learning_rate': '9.4e-06', 'epoch': '0.08895'} {'loss': '0.168', 'grad_norm': '8.688', 'learning_rate': '9.9e-06', 'epoch': '0.09363'} {'loss': '0.1444', 'grad_norm': '35', 'learning_rate': '1e-05', 'epoch': '0.09831'} {'loss': '0.1706', 'grad_norm': '3.609', 'learning_rate': '9.998e-06', 'epoch': '0.103'} {'loss': '0.2116', 'grad_norm': '27.88', 'learning_rate': '9.995e-06', 'epoch': '0.1077'} {'loss': '0.1151', 'grad_norm': '18.38', 'learning_rate': '9.99e-06', 'epoch': '0.1124'} {'loss': '0.1667', 'grad_norm': '18.62', 'learning_rate': '9.985e-06', 'epoch': '0.117'} 
{'loss': '0.1669', 'grad_norm': '26.75', 'learning_rate': '9.978e-06', 'epoch': '0.1217'} {'loss': '0.1685', 'grad_norm': '13.56', 'learning_rate': '9.97e-06', 'epoch': '0.1264'} {'loss': '0.1635', 'grad_norm': '15.5', 'learning_rate': '9.96e-06', 'epoch': '0.1311'} {'loss': '0.1063', 'grad_norm': '8.875', 'learning_rate': '9.949e-06', 'epoch': '0.1358'} {'loss': '0.112', 'grad_norm': '19.12', 'learning_rate': '9.937e-06', 'epoch': '0.1404'} {'loss': '0.09351', 'grad_norm': '7', 'learning_rate': '9.923e-06', 'epoch': '0.1451'} {'loss': '0.1277', 'grad_norm': '27', 'learning_rate': '9.909e-06', 'epoch': '0.1498'} {'loss': '0.2295', 'grad_norm': '60', 'learning_rate': '9.893e-06', 'epoch': '0.1545'} {'loss': '0.1029', 'grad_norm': '7.812', 'learning_rate': '9.875e-06', 'epoch': '0.1592'} {'loss': '0.101', 'grad_norm': '24.88', 'learning_rate': '9.856e-06', 'epoch': '0.1639'} {'loss': '0.1634', 'grad_norm': '29.62', 'learning_rate': '9.837e-06', 'epoch': '0.1685'} {'loss': '0.1518', 'grad_norm': '6.531', 'learning_rate': '9.815e-06', 'epoch': '0.1732'} {'loss': '0.1536', 'grad_norm': '24.62', 'learning_rate': '9.793e-06', 'epoch': '0.1779'} {'loss': '0.1907', 'grad_norm': '13.62', 'learning_rate': '9.769e-06', 'epoch': '0.1826'} {'loss': '0.106', 'grad_norm': '15.5', 'learning_rate': '9.744e-06', 'epoch': '0.1873'} {'loss': '0.1073', 'grad_norm': '1.57', 'learning_rate': '9.718e-06', 'epoch': '0.1919'} {'loss': '0.1338', 'grad_norm': '15.62', 'learning_rate': '9.69e-06', 'epoch': '0.1966'} {'loss': '0.1441', 'grad_norm': '14.25', 'learning_rate': '9.662e-06', 'epoch': '0.2013'} {'loss': '0.1082', 'grad_norm': '10.44', 'learning_rate': '9.632e-06', 'epoch': '0.206'} {'loss': '0.08477', 'grad_norm': '10.75', 'learning_rate': '9.601e-06', 'epoch': '0.2107'} {'loss': '0.1129', 'grad_norm': '10.31', 'learning_rate': '9.568e-06', 'epoch': '0.2154'} {'loss': '0.1042', 'grad_norm': '9.438', 'learning_rate': '9.535e-06', 'epoch': '0.22'} {'loss': '0.1184', 'grad_norm': 
'14.69', 'learning_rate': '9.5e-06', 'epoch': '0.2247'} {'loss': '0.2219', 'grad_norm': '16.75', 'learning_rate': '9.464e-06', 'epoch': '0.2294'} {'loss': '0.1003', 'grad_norm': '9.562', 'learning_rate': '9.427e-06', 'epoch': '0.2341'} {'loss': '0.1331', 'grad_norm': '27.5', 'learning_rate': '9.388e-06', 'epoch': '0.2388'} {'loss': '0.1224', 'grad_norm': '8.062', 'learning_rate': '9.349e-06', 'epoch': '0.2434'} {'loss': '0.135', 'grad_norm': '12.12', 'learning_rate': '9.308e-06', 'epoch': '0.2481'} {'loss': '0.08243', 'grad_norm': '3.391', 'learning_rate': '9.267e-06', 'epoch': '0.2528'} {'loss': '0.0796', 'grad_norm': '1.047', 'learning_rate': '9.224e-06', 'epoch': '0.2575'} {'loss': '0.08371', 'grad_norm': '8.25', 'learning_rate': '9.18e-06', 'epoch': '0.2622'} {'loss': '0.1502', 'grad_norm': '10.25', 'learning_rate': '9.135e-06', 'epoch': '0.2669'} {'loss': '0.1022', 'grad_norm': '5.875', 'learning_rate': '9.089e-06', 'epoch': '0.2715'} {'loss': '0.1205', 'grad_norm': '16.12', 'learning_rate': '9.041e-06', 'epoch': '0.2762'} {'loss': '0.0316', 'grad_norm': '0.5547', 'learning_rate': '8.993e-06', 'epoch': '0.2809'} {'loss': '0.0497', 'grad_norm': '9.438', 'learning_rate': '8.944e-06', 'epoch': '0.2856'} {'loss': '0.101', 'grad_norm': '2.906', 'learning_rate': '8.893e-06', 'epoch': '0.2903'} {'loss': '0.1765', 'grad_norm': '0.05054', 'learning_rate': '8.842e-06', 'epoch': '0.2949'} {'loss': '0.08483', 'grad_norm': '0.0835', 'learning_rate': '8.789e-06', 'epoch': '0.2996'} {'loss': '0.21', 'grad_norm': '11.44', 'learning_rate': '8.736e-06', 'epoch': '0.3043'} {'loss': '0.1382', 'grad_norm': '2.391', 'learning_rate': '8.682e-06', 'epoch': '0.309'} {'loss': '0.0964', 'grad_norm': '6.406', 'learning_rate': '8.626e-06', 'epoch': '0.3137'} {'loss': '0.08136', 'grad_norm': '2.312', 'learning_rate': '8.57e-06', 'epoch': '0.3184'} {'loss': '0.07721', 'grad_norm': '9.25', 'learning_rate': '8.513e-06', 'epoch': '0.323'} {'loss': '0.03132', 'grad_norm': '1.414', 
'learning_rate': '8.454e-06', 'epoch': '0.3277'} {'loss': '0.2177', 'grad_norm': '8.375', 'learning_rate': '8.395e-06', 'epoch': '0.3324'} {'loss': '0.1092', 'grad_norm': '35', 'learning_rate': '8.335e-06', 'epoch': '0.3371'} {'loss': '0.03281', 'grad_norm': '2.984', 'learning_rate': '8.274e-06', 'epoch': '0.3418'} {'loss': '0.07253', 'grad_norm': '0.2559', 'learning_rate': '8.213e-06', 'epoch': '0.3464'} {'loss': '0.01816', 'grad_norm': '0.6953', 'learning_rate': '8.15e-06', 'epoch': '0.3511'} {'loss': '0.02961', 'grad_norm': '6.25', 'learning_rate': '8.087e-06', 'epoch': '0.3558'} {'loss': '0.1412', 'grad_norm': '0.1416', 'learning_rate': '8.022e-06', 'epoch': '0.3605'} {'loss': '0.05659', 'grad_norm': '14.06', 'learning_rate': '7.957e-06', 'epoch': '0.3652'} {'loss': '0.1281', 'grad_norm': '8.438', 'learning_rate': '7.891e-06', 'epoch': '0.3699'} {'loss': '0.05523', 'grad_norm': '3.75', 'learning_rate': '7.825e-06', 'epoch': '0.3745'} {'loss': '0.08489', 'grad_norm': '9.375', 'learning_rate': '7.758e-06', 'epoch': '0.3792'} {'loss': '0.09931', 'grad_norm': '30.38', 'learning_rate': '7.69e-06', 'epoch': '0.3839'} {'loss': '0.09664', 'grad_norm': '25.25', 'learning_rate': '7.621e-06', 'epoch': '0.3886'} {'loss': '0.1445', 'grad_norm': '10', 'learning_rate': '7.551e-06', 'epoch': '0.3933'} {'loss': '0.04019', 'grad_norm': '5.219', 'learning_rate': '7.481e-06', 'epoch': '0.3979'} {'loss': '0.1422', 'grad_norm': '13.44', 'learning_rate': '7.41e-06', 'epoch': '0.4026'} {'loss': '0.1425', 'grad_norm': '1.25', 'learning_rate': '7.339e-06', 'epoch': '0.4073'} {'loss': '0.09832', 'grad_norm': '12.12', 'learning_rate': '7.267e-06', 'epoch': '0.412'} {'loss': '0.02307', 'grad_norm': '9.438', 'learning_rate': '7.194e-06', 'epoch': '0.4167'} {'loss': '0.1185', 'grad_norm': '14.5', 'learning_rate': '7.121e-06', 'epoch': '0.4213'} {'loss': '0.08526', 'grad_norm': '0.3457', 'learning_rate': '7.048e-06', 'epoch': '0.426'} {'loss': '0.1406', 'grad_norm': '48.25', 'learning_rate': 
'6.973e-06', 'epoch': '0.4307'}
43%|████▎ | 460/1068 [27:19<35:57, 3.55s/it]
47%|████▋ | 500/1068 [29:41<33:23, 3.53s/it]
[INFO|trainer.py:3838] 2026-06-09 01:54:33,314 >> Saving model checkpoint to /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-500
[INFO|configuration_utils.py:545] 2026-06-09 01:54:33,327 >> Configuration saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-500/config.json
[INFO|configuration_utils.py:874] 2026-06-09 01:54:33,327 >> Configuration saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-500/generation_config.json
{'loss': '0.1615', 'grad_norm': '13.25', 'learning_rate': '6.898e-06', 'epoch': '0.4354'}
{'loss': '0.03648', 'grad_norm': '1.766', 'learning_rate': '6.823e-06', 'epoch': '0.4401'}
{'loss': '0.05077', 'grad_norm': '4.562', 'learning_rate': '6.747e-06', 'epoch': '0.4448'}
{'loss': '0.09761', 'grad_norm': '9.875', 'learning_rate': '6.671e-06', 'epoch': '0.4494'}
{'loss': '0.04425', 'grad_norm': '9.375', 'learning_rate': '6.594e-06', 'epoch': '0.4541'}
{'loss': '0.08306', 'grad_norm': '2.391', 'learning_rate': '6.517e-06', 'epoch': '0.4588'}
{'loss': '0.03448', 'grad_norm': '23', 'learning_rate': '6.44e-06', 'epoch': '0.4635'}
{'loss': '0.113', 'grad_norm': '3.203', 'learning_rate': '6.362e-06', 'epoch': '0.4682'}
Writing model shards: 0%| | 0/1 [00:00> Model weights saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-500/model.safetensors
[INFO|tokenization_utils_base.py:3302] 2026-06-09 01:55:05,638 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-500/chat_template.jinja
[INFO|tokenization_utils_base.py:2115] 2026-06-09 01:55:05,638 >> tokenizer config file saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:3302] 2026-06-09 01:55:39,101 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-500/chat_template.jinja
[INFO|tokenization_utils_base.py:2115] 2026-06-09 01:55:39,101 >> tokenizer config file saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-500/tokenizer_config.json
[INFO|processing_utils.py:1141] 2026-06-09 01:55:39,296 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-500/chat_template.jinja
[INFO|processing_utils.py:1162] 2026-06-09 01:55:39,297 >> processor saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-500/processor_config.json
/usr/local/lib/python3.12/dist-packages/torch/utils/checkpoint.py:232: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  check_backward_validity(args)
47%|████▋ | 501/1068 [30:50<3:40:45, 23.36s/it]
56%|█████▌ | 600/1068 [36:40<27:47, 3.56s/it]
66%|██████▌ | 700/1068 [42:33<21:39, 3.53s/it]
75%|███████▍ | 800/1068 [48:26<15:52, 3.55s/it]
84%|████████▍ | 900/1068 [54:22<10:08, 3.62s/it]
89%|████████▉ | 955/1068 [57:41<06:49, 3.62s/it]
{'loss': '0.09019', 'grad_norm': '4.25', 'learning_rate': '6.284e-06', 'epoch': '0.4728'}
{'loss': '0.08499', 'grad_norm': '8.062', 'learning_rate': '6.205e-06', 'epoch': '0.4775'}
{'loss': '0.04704', 'grad_norm': '4.844', 'learning_rate': '6.126e-06', 'epoch': '0.4822'}
{'loss': '0.05289', 'grad_norm': '0.7578', 'learning_rate': '6.047e-06', 'epoch': '0.4869'}
{'loss': '0.1428', 'grad_norm': '15.19', 'learning_rate': '5.967e-06', 'epoch': '0.4916'}
{'loss': '0.116', 'grad_norm': '9.812', 'learning_rate': '5.888e-06', 'epoch': '0.4963'}
{'loss': '0.1481', 'grad_norm': '14.19', 'learning_rate': '5.808e-06', 'epoch': '0.5009'}
{'loss': '0.1075', 'grad_norm': '1.023', 'learning_rate': '5.728e-06', 'epoch': '0.5056'}
{'loss': '0.08008', 'grad_norm': '12.19', 'learning_rate': '5.647e-06', 'epoch': '0.5103'}
{'loss': '0.0823', 'grad_norm': '2.938', 'learning_rate': '5.567e-06', 'epoch': '0.515'}
{'loss': '0.02545', 'grad_norm': '8.938', 'learning_rate': '5.486e-06', 'epoch': '0.5197'}
{'loss': '0.1552', 'grad_norm': '12.62', 'learning_rate': '5.405e-06', 'epoch': '0.5243'}
{'loss': '0.005913', 'grad_norm': '3.625', 'learning_rate': '5.324e-06', 'epoch': '0.529'}
{'loss': '0.005589', 'grad_norm': '1.055', 'learning_rate': '5.243e-06', 'epoch': '0.5337'}
{'loss': '0.02253', 'grad_norm': '14.62', 'learning_rate': '5.162e-06', 'epoch': '0.5384'}
{'loss': '0.02249', 'grad_norm': '1.602', 'learning_rate': '5.081e-06', 'epoch': '0.5431'}
{'loss': '0.04539', 
'grad_norm': '6.375', 'learning_rate': '5e-06', 'epoch': '0.5478'} {'loss': '0.09941', 'grad_norm': '22.62', 'learning_rate': '4.919e-06', 'epoch': '0.5524'} {'loss': '0.04504', 'grad_norm': '0.06641', 'learning_rate': '4.838e-06', 'epoch': '0.5571'} {'loss': '0.09638', 'grad_norm': '0.1543', 'learning_rate': '4.757e-06', 'epoch': '0.5618'} {'loss': '0.08984', 'grad_norm': '29', 'learning_rate': '4.676e-06', 'epoch': '0.5665'} {'loss': '0.04058', 'grad_norm': '9.375', 'learning_rate': '4.595e-06', 'epoch': '0.5712'} {'loss': '0.1009', 'grad_norm': '13.5', 'learning_rate': '4.514e-06', 'epoch': '0.5758'} {'loss': '0.02461', 'grad_norm': '10.25', 'learning_rate': '4.433e-06', 'epoch': '0.5805'} {'loss': '0.009086', 'grad_norm': '0.2559', 'learning_rate': '4.353e-06', 'epoch': '0.5852'} {'loss': '0.08497', 'grad_norm': '6.062', 'learning_rate': '4.272e-06', 'epoch': '0.5899'} {'loss': '0.04257', 'grad_norm': '23.88', 'learning_rate': '4.192e-06', 'epoch': '0.5946'} {'loss': '0.104', 'grad_norm': '26', 'learning_rate': '4.112e-06', 'epoch': '0.5993'} {'loss': '0.02363', 'grad_norm': '1.414', 'learning_rate': '4.033e-06', 'epoch': '0.6039'} {'loss': '0.09691', 'grad_norm': '4.281', 'learning_rate': '3.953e-06', 'epoch': '0.6086'} {'loss': '0.07459', 'grad_norm': '18.25', 'learning_rate': '3.874e-06', 'epoch': '0.6133'} {'loss': '0.129', 'grad_norm': '16.5', 'learning_rate': '3.795e-06', 'epoch': '0.618'} {'loss': '0.01345', 'grad_norm': '13.12', 'learning_rate': '3.716e-06', 'epoch': '0.6227'} {'loss': '0.1152', 'grad_norm': '13.88', 'learning_rate': '3.638e-06', 'epoch': '0.6273'} {'loss': '0.06459', 'grad_norm': '40.5', 'learning_rate': '3.56e-06', 'epoch': '0.632'} {'loss': '0.1248', 'grad_norm': '21.88', 'learning_rate': '3.483e-06', 'epoch': '0.6367'} {'loss': '0.09551', 'grad_norm': '10.88', 'learning_rate': '3.406e-06', 'epoch': '0.6414'} {'loss': '0.06345', 'grad_norm': '5.812', 'learning_rate': '3.329e-06', 'epoch': '0.6461'} {'loss': '0.01515', 'grad_norm': 
'0.2812', 'learning_rate': '3.253e-06', 'epoch': '0.6507'} {'loss': '0.1331', 'grad_norm': '14.62', 'learning_rate': '3.177e-06', 'epoch': '0.6554'} {'loss': '0.03629', 'grad_norm': '0.3281', 'learning_rate': '3.102e-06', 'epoch': '0.6601'} {'loss': '0.003291', 'grad_norm': '2.031', 'learning_rate': '3.027e-06', 'epoch': '0.6648'} {'loss': '0.05277', 'grad_norm': '9.938', 'learning_rate': '2.952e-06', 'epoch': '0.6695'} {'loss': '0.05755', 'grad_norm': '0.7969', 'learning_rate': '2.879e-06', 'epoch': '0.6742'} {'loss': '0.04519', 'grad_norm': '12.69', 'learning_rate': '2.806e-06', 'epoch': '0.6788'} {'loss': '0.1142', 'grad_norm': '0.05859', 'learning_rate': '2.733e-06', 'epoch': '0.6835'} {'loss': '0.06839', 'grad_norm': '14.56', 'learning_rate': '2.661e-06', 'epoch': '0.6882'} {'loss': '0.008566', 'grad_norm': '0.1885', 'learning_rate': '2.59e-06', 'epoch': '0.6929'} {'loss': '0.02529', 'grad_norm': '9.562', 'learning_rate': '2.519e-06', 'epoch': '0.6976'} {'loss': '0.1139', 'grad_norm': '19', 'learning_rate': '2.449e-06', 'epoch': '0.7022'} {'loss': '0.02908', 'grad_norm': '2.344', 'learning_rate': '2.379e-06', 'epoch': '0.7069'} {'loss': '0.1418', 'grad_norm': '0.7891', 'learning_rate': '2.31e-06', 'epoch': '0.7116'} {'loss': '0.01457', 'grad_norm': '17.5', 'learning_rate': '2.242e-06', 'epoch': '0.7163'} {'loss': '0.103', 'grad_norm': '0.1768', 'learning_rate': '2.175e-06', 'epoch': '0.721'} {'loss': '0.05994', 'grad_norm': '14.62', 'learning_rate': '2.109e-06', 'epoch': '0.7257'} {'loss': '0.1303', 'grad_norm': '11.62', 'learning_rate': '2.043e-06', 'epoch': '0.7303'} {'loss': '0.07348', 'grad_norm': '4.031', 'learning_rate': '1.978e-06', 'epoch': '0.735'} {'loss': '0.1164', 'grad_norm': '38.25', 'learning_rate': '1.913e-06', 'epoch': '0.7397'} {'loss': '0.1103', 'grad_norm': '36.5', 'learning_rate': '1.85e-06', 'epoch': '0.7444'} {'loss': '0.09385', 'grad_norm': '2.484', 'learning_rate': '1.787e-06', 'epoch': '0.7491'} {'loss': '0.0605', 'grad_norm': 
'9.625', 'learning_rate': '1.726e-06', 'epoch': '0.7537'} {'loss': '0.03029', 'grad_norm': '0.6328', 'learning_rate': '1.665e-06', 'epoch': '0.7584'} {'loss': '0.04979', 'grad_norm': '4.031', 'learning_rate': '1.605e-06', 'epoch': '0.7631'} {'loss': '0.09577', 'grad_norm': '19.62', 'learning_rate': '1.546e-06', 'epoch': '0.7678'} {'loss': '0.1388', 'grad_norm': '5.656', 'learning_rate': '1.487e-06', 'epoch': '0.7725'} {'loss': '0.07737', 'grad_norm': '1.055', 'learning_rate': '1.43e-06', 'epoch': '0.7772'} {'loss': '0.06386', 'grad_norm': '14.94', 'learning_rate': '1.374e-06', 'epoch': '0.7818'} {'loss': '0.005019', 'grad_norm': '0.2832', 'learning_rate': '1.318e-06', 'epoch': '0.7865'} {'loss': '0.09718', 'grad_norm': '22.38', 'learning_rate': '1.264e-06', 'epoch': '0.7912'} {'loss': '0.03934', 'grad_norm': '10.31', 'learning_rate': '1.211e-06', 'epoch': '0.7959'} {'loss': '0.05719', 'grad_norm': '4.031', 'learning_rate': '1.158e-06', 'epoch': '0.8006'} {'loss': '0.03712', 'grad_norm': '4.875', 'learning_rate': '1.107e-06', 'epoch': '0.8052'} {'loss': '0.07051', 'grad_norm': '11', 'learning_rate': '1.056e-06', 'epoch': '0.8099'} {'loss': '0.1173', 'grad_norm': '21.38', 'learning_rate': '1.007e-06', 'epoch': '0.8146'} {'loss': '0.05019', 'grad_norm': '20.88', 'learning_rate': '9.587e-07', 'epoch': '0.8193'} {'loss': '0.03209', 'grad_norm': '9.25', 'learning_rate': '9.115e-07', 'epoch': '0.824'} {'loss': '0.06944', 'grad_norm': '1.539', 'learning_rate': '8.653e-07', 'epoch': '0.8287'} {'loss': '0.03107', 'grad_norm': '0.8672', 'learning_rate': '8.203e-07', 'epoch': '0.8333'} {'loss': '0.09885', 'grad_norm': '1.734', 'learning_rate': '7.763e-07', 'epoch': '0.838'} {'loss': '0.03675', 'grad_norm': '0.2676', 'learning_rate': '7.334e-07', 'epoch': '0.8427'} {'loss': '0.03701', 'grad_norm': '0.5391', 'learning_rate': '6.917e-07', 'epoch': '0.8474'} {'loss': '0.1006', 'grad_norm': '1.102', 'learning_rate': '6.511e-07', 'epoch': '0.8521'} {'loss': '0.07337', 'grad_norm': 
'20.12', 'learning_rate': '6.116e-07', 'epoch': '0.8567'} {'loss': '0.02071', 'grad_norm': '23.25', 'learning_rate': '5.733e-07', 'epoch': '0.8614'} {'loss': '0.03934', 'grad_norm': '8', 'learning_rate': '5.362e-07', 'epoch': '0.8661'} {'loss': '0.06495', 'grad_norm': '1.406', 'learning_rate': '5.002e-07', 'epoch': '0.8708'} {'loss': '0.0436', 'grad_norm': '0.3027', 'learning_rate': '4.654e-07', 'epoch': '0.8755'} {'loss': '0.06408', 'grad_norm': '17.5', 'learning_rate': '4.318e-07', 'epoch': '0.8801'} {'loss': '0.1168', 'grad_norm': '10.69', 'learning_rate': '3.995e-07', 'epoch': '0.8848'} {'loss': '0.02691', 'grad_norm': '1.062', 'learning_rate': '3.683e-07', 'epoch': '0.8895'} {'loss': '0.08515', 'grad_norm': '28.75', 'learning_rate': '3.383e-07', 'epoch': '0.8942'} 89%|████████▉ | 955/1068 [57:41<06:49, 3.62s/it] 90%|████████▉ | 956/1068 [57:45<06:48, 3.65s/it] 90%|████████▉ | 957/1068 [57:48<06:43, 3.63s/it] 90%|████████▉ | 958/1068 [57:52<06:39, 3.63s/it] 90%|████████▉ | 959/1068 [57:55<06:36, 3.64s/it] 90%|████████▉ | 960/1068 [57:59<06:30, 3.61s/it] 90%|████████▉ | 960/1068 [57:59<06:30, 3.61s/it] 90%|████████▉ | 961/1068 [58:03<06:24, 3.59s/it] 90%|█████████ | 962/1068 [58:06<06:20, 3.59s/it] 90%|█████████ | 963/1068 [58:10<06:16, 3.59s/it] 90%|█████████ | 964/1068 [58:13<06:11, 3.57s/it] 90%|█████████ | 965/1068 [58:17<06:06, 3.56s/it] 90%|█████████ | 965/1068 [58:17<06:06, 3.56s/it] 90%|█████████ | 966/1068 [58:20<06:04, 3.57s/it] 91%|█████████ | 967/1068 [58:24<06:01, 3.57s/it] 91%|█████████ | 968/1068 [58:28<05:56, 3.57s/it] 91%|█████████ | 969/1068 [58:31<05:52, 3.56s/it] 91%|█████████ | 970/1068 [58:35<05:49, 3.57s/it] 91%|█████████ | 970/1068 [58:35<05:49, 3.57s/it] 91%|█████████ | 971/1068 [58:38<05:45, 3.56s/it] 91%|█████████ | 972/1068 [58:42<05:41, 3.56s/it] 91%|█████████ | 973/1068 [58:45<05:37, 3.55s/it] 91%|█████████ | 974/1068 [58:49<05:33, 3.55s/it] 91%|█████████▏| 975/1068 [58:52<05:31, 3.57s/it] 91%|█████████▏| 975/1068 [58:52<05:31, 
3.57s/it] 91%|█████████▏| 976/1068 [58:56<05:27, 3.56s/it] 91%|█████████▏| 977/1068 [59:00<05:24, 3.56s/it] 92%|█████████▏| 978/1068 [59:03<05:19, 3.55s/it] 92%|█████████▏| 979/1068 [59:07<05:15, 3.54s/it] 92%|█████████▏| 980/1068 [59:10<05:12, 3.55s/it] 92%|█████████▏| 980/1068 [59:10<05:12, 3.55s/it] 92%|█████████▏| 981/1068 [59:14<05:09, 3.56s/it] 92%|█████████▏| 982/1068 [59:17<05:05, 3.55s/it] 92%|█████████▏| 983/1068 [59:21<05:02, 3.56s/it] 92%|█████████▏| 984/1068 [59:25<05:01, 3.59s/it] 92%|█████████▏| 985/1068 [59:28<04:57, 3.58s/it] 92%|█████████▏| 985/1068 [59:28<04:57, 3.58s/it] 92%|█████████▏| 986/1068 [59:32<04:53, 3.58s/it] 92%|█████████▏| 987/1068 [59:35<04:50, 3.59s/it] 93%|█████████▎| 988/1068 [59:39<04:47, 3.59s/it] 93%|█████████▎| 989/1068 [59:42<04:42, 3.58s/it] 93%|█████████▎| 990/1068 [59:46<04:38, 3.57s/it] 93%|█████████▎| 990/1068 [59:46<04:38, 3.57s/it] 93%|█████████▎| 991/1068 [59:50<04:36, 3.59s/it] 93%|█████████▎| 992/1068 [59:53<04:32, 3.58s/it] 93%|█████████▎| 993/1068 [59:57<04:27, 3.57s/it] 93%|█████████▎| 994/1068 [1:00:00<04:24, 3.58s/it] 93%|█████████▎| 995/1068 [1:00:04<04:20, 3.57s/it] 93%|█████████▎| 995/1068 [1:00:04<04:20, 3.57s/it] 93%|█████████▎| 996/1068 [1:00:07<04:15, 3.55s/it] 93%|█████████▎| 997/1068 [1:00:11<04:11, 3.54s/it] 93%|█████████▎| 998/1068 [1:00:14<04:07, 3.54s/it] 94%|█████████▎| 999/1068 [1:00:18<04:04, 3.54s/it] 94%|█████████▎| 1000/1068 [1:00:21<04:00, 3.54s/it] 94%|█████████▎| 1000/1068 [1:00:21<04:00, 3.54s/it][INFO|trainer.py:3838] 2026-06-09 02:25:13,926 >> Saving model checkpoint to /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1000 [INFO|configuration_utils.py:545] 2026-06-09 02:25:13,940 >> Configuration saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1000/config.json [INFO|configuration_utils.py:874] 2026-06-09 02:25:13,940 >> Configuration saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1000/generation_config.json {'loss': '0.05582', 'grad_norm': '8.375', 'learning_rate': 
'3.096e-07', 'epoch': '0.8989'} {'loss': '0.006413', 'grad_norm': '5.688', 'learning_rate': '2.821e-07', 'epoch': '0.9036'} {'loss': '0.02513', 'grad_norm': '9.125', 'learning_rate': '2.559e-07', 'epoch': '0.9082'} {'loss': '0.08036', 'grad_norm': '0.1348', 'learning_rate': '2.309e-07', 'epoch': '0.9129'} {'loss': '0.05922', 'grad_norm': '36.5', 'learning_rate': '2.071e-07', 'epoch': '0.9176'} {'loss': '0.06302', 'grad_norm': '22.62', 'learning_rate': '1.847e-07', 'epoch': '0.9223'} {'loss': '0.08195', 'grad_norm': '5.281', 'learning_rate': '1.634e-07', 'epoch': '0.927'} {'loss': '0.05552', 'grad_norm': '8.562', 'learning_rate': '1.435e-07', 'epoch': '0.9316'} {'loss': '0.0576', 'grad_norm': '11.5', 'learning_rate': '1.248e-07', 'epoch': '0.9363'} Writing model shards: 0%| | 0/1 [00:00> Model weights saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1000/model.safetensors [INFO|tokenization_utils_base.py:3302] 2026-06-09 02:25:45,867 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1000/chat_template.jinja [INFO|tokenization_utils_base.py:2115] 2026-06-09 02:25:45,867 >> tokenizer config file saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1000/tokenizer_config.json [INFO|tokenization_utils_base.py:3302] 2026-06-09 02:26:20,189 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1000/chat_template.jinja [INFO|tokenization_utils_base.py:2115] 2026-06-09 02:26:20,190 >> tokenizer config file saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1000/tokenizer_config.json [INFO|processing_utils.py:1141] 2026-06-09 02:26:20,386 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1000/chat_template.jinja [INFO|processing_utils.py:1162] 2026-06-09 02:26:20,387 >> processor saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1000/processor_config.json /usr/local/lib/python3.12/dist-packages/torch/utils/checkpoint.py:232: UserWarning: None of the inputs have requires_grad=True. 
Gradients will be None check_backward_validity(args) 94%|█████████▎| 1001/1068 [1:01:32<26:14, 23.50s/it] 94%|█████████▍| 1002/1068 [1:01:35<19:15, 17.50s/it] 94%|█████████▍| 1003/1068 [1:01:39<14:24, 13.31s/it] 94%|█████████▍| 1004/1068 [1:01:42<11:04, 10.38s/it] 94%|█████████▍| 1005/1068 [1:01:46<08:44, 8.32s/it] 94%|█████████▍| 1005/1068 [1:01:46<08:44, 8.32s/it] 94%|█████████▍| 1006/1068 [1:01:49<07:06, 6.88s/it] 94%|█████████▍| 1007/1068 [1:01:53<05:58, 5.88s/it] 94%|█████████▍| 1008/1068 [1:01:56<05:10, 5.18s/it] 94%|█████████▍| 1009/1068 [1:02:00<04:36, 4.69s/it] 95%|█████████▍| 1010/1068 [1:02:03<04:11, 4.33s/it] 95%|█████████▍| 1010/1068 [1:02:03<04:11, 4.33s/it] 95%|█████████▍| 1011/1068 [1:02:07<03:53, 4.09s/it] 95%|█████████▍| 1012/1068 [1:02:10<03:39, 3.92s/it] 95%|█████████▍| 1013/1068 [1:02:14<03:29, 3.80s/it] 95%|█████████▍| 1014/1068 [1:02:17<03:20, 3.72s/it] 95%|█████████▌| 1015/1068 [1:02:21<03:14, 3.67s/it] 95%|█████████▌| 1015/1068 [1:02:21<03:14, 3.67s/it] 95%|█████████▌| 1016/1068 [1:02:24<03:08, 3.63s/it] 95%|█████████▌| 1017/1068 [1:02:28<03:03, 3.61s/it] 95%|█████████▌| 1018/1068 [1:02:32<02:59, 3.59s/it] 95%|█████████▌| 1019/1068 [1:02:35<02:55, 3.58s/it] 96%|█████████▌| 1020/1068 [1:02:39<02:50, 3.56s/it] 96%|█████████▌| 1020/1068 [1:02:39<02:50, 3.56s/it] 96%|█████████▌| 1021/1068 [1:02:42<02:47, 3.55s/it] 96%|█████████▌| 1022/1068 [1:02:46<02:43, 3.55s/it] 96%|█████████▌| 1023/1068 [1:02:49<02:39, 3.54s/it] 96%|█████████▌| 1024/1068 [1:02:53<02:35, 3.53s/it] 96%|█████████▌| 1025/1068 [1:02:56<02:32, 3.54s/it] 96%|█████████▌| 1025/1068 [1:02:56<02:32, 3.54s/it] 96%|█████████▌| 1026/1068 [1:03:00<02:28, 3.54s/it] 96%|█████████▌| 1027/1068 [1:03:03<02:24, 3.53s/it] 96%|█████████▋| 1028/1068 [1:03:07<02:21, 3.53s/it] 96%|█████████▋| 1029/1068 [1:03:10<02:17, 3.54s/it] 96%|█████████▋| 1030/1068 [1:03:14<02:14, 3.53s/it] 96%|█████████▋| 1030/1068 [1:03:14<02:14, 3.53s/it] 97%|█████████▋| 1031/1068 [1:03:17<02:10, 3.53s/it] 97%|█████████▋| 
1032/1068 [1:03:21<02:07, 3.53s/it] 97%|█████████▋| 1033/1068 [1:03:25<02:03, 3.53s/it] 97%|█████████▋| 1034/1068 [1:03:28<01:59, 3.53s/it] 97%|█████████▋| 1035/1068 [1:03:32<01:56, 3.53s/it] 97%|█████████▋| 1035/1068 [1:03:32<01:56, 3.53s/it] 97%|█████████▋| 1036/1068 [1:03:35<01:53, 3.53s/it] 97%|█████████▋| 1037/1068 [1:03:39<01:49, 3.52s/it] 97%|█████████▋| 1038/1068 [1:03:42<01:45, 3.52s/it] 97%|█████████▋| 1039/1068 [1:03:46<01:42, 3.52s/it] 97%|█████████▋| 1040/1068 [1:03:49<01:38, 3.53s/it] 97%|█████████▋| 1040/1068 [1:03:49<01:38, 3.53s/it] 97%|█████████▋| 1041/1068 [1:03:53<01:35, 3.53s/it] 98%|█████████▊| 1042/1068 [1:03:56<01:31, 3.52s/it] 98%|█████████▊| 1043/1068 [1:04:00<01:28, 3.53s/it] 98%|█████████▊| 1044/1068 [1:04:03<01:24, 3.52s/it] 98%|█████████▊| 1045/1068 [1:04:07<01:20, 3.52s/it] 98%|█████████▊| 1045/1068 [1:04:07<01:20, 3.52s/it] 98%|█████████▊| 1046/1068 [1:04:10<01:17, 3.52s/it] 98%|█████████▊| 1047/1068 [1:04:14<01:14, 3.53s/it] 98%|█████████▊| 1048/1068 [1:04:17<01:10, 3.53s/it] 98%|█████████▊| 1049/1068 [1:04:21<01:07, 3.54s/it] 98%|█████████▊| 1050/1068 [1:04:25<01:03, 3.54s/it] 98%|█████████▊| 1050/1068 [1:04:25<01:03, 3.54s/it] 98%|█████████▊| 1051/1068 [1:04:28<01:00, 3.53s/it] 99%|█████████▊| 1052/1068 [1:04:32<00:56, 3.53s/it] 99%|█████████▊| 1053/1068 [1:04:35<00:52, 3.52s/it] 99%|█████████▊| 1054/1068 [1:04:39<00:49, 3.52s/it] 99%|█████████▉| 1055/1068 [1:04:42<00:45, 3.52s/it] 99%|█████████▉| 1055/1068 [1:04:42<00:45, 3.52s/it] 99%|█████████▉| 1056/1068 [1:04:46<00:42, 3.52s/it] 99%|█████████▉| 1057/1068 [1:04:49<00:38, 3.54s/it] 99%|█████████▉| 1058/1068 [1:04:53<00:35, 3.56s/it] 99%|█████████▉| 1059/1068 [1:04:56<00:31, 3.55s/it] 99%|█████████▉| 1060/1068 [1:05:00<00:28, 3.54s/it] 99%|█████████▉| 1060/1068 [1:05:00<00:28, 3.54s/it] 99%|█████████▉| 1061/1068 [1:05:03<00:24, 3.56s/it] 99%|█████████▉| 1062/1068 [1:05:07<00:21, 3.54s/it] 100%|█████████▉| 1063/1068 [1:05:10<00:17, 3.54s/it] 100%|█████████▉| 1064/1068 
[1:05:14<00:14, 3.55s/it] 100%|█████████▉| 1065/1068 [1:05:18<00:10, 3.55s/it] 100%|█████████▉| 1065/1068 [1:05:18<00:10, 3.55s/it] 100%|█████████▉| 1066/1068 [1:05:21<00:07, 3.55s/it] 100%|█████████▉| 1067/1068 [1:05:25<00:03, 3.54s/it] 100%|██████████| 1068/1068 [1:05:28<00:00, 3.56s/it][INFO|trainer.py:3838] 2026-06-09 02:30:20,770 >> Saving model checkpoint to /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1068 [INFO|configuration_utils.py:545] 2026-06-09 02:30:20,783 >> Configuration saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1068/config.json [INFO|configuration_utils.py:874] 2026-06-09 02:30:20,783 >> Configuration saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1068/generation_config.json {'loss': '0.008602', 'grad_norm': '0.4805', 'learning_rate': '1.075e-07', 'epoch': '0.941'} {'loss': '0.03031', 'grad_norm': '11.5', 'learning_rate': '9.138e-08', 'epoch': '0.9457'} {'loss': '0.04413', 'grad_norm': '6', 'learning_rate': '7.659e-08', 'epoch': '0.9504'} {'loss': '0.04109', 'grad_norm': '0.3848', 'learning_rate': '6.309e-08', 'epoch': '0.9551'} {'loss': '0.04798', 'grad_norm': '12.5', 'learning_rate': '5.089e-08', 'epoch': '0.9597'} {'loss': '0.03563', 'grad_norm': '0.4102', 'learning_rate': '4e-08', 'epoch': '0.9644'} {'loss': '0.1642', 'grad_norm': '11.19', 'learning_rate': '3.041e-08', 'epoch': '0.9691'} {'loss': '0.09519', 'grad_norm': '14.19', 'learning_rate': '2.213e-08', 'epoch': '0.9738'} {'loss': '0.1083', 'grad_norm': '0.1064', 'learning_rate': '1.516e-08', 'epoch': '0.9785'} {'loss': '0.02581', 'grad_norm': '4.844', 'learning_rate': '9.503e-09', 'epoch': '0.9831'} {'loss': '0.032', 'grad_norm': '36.5', 'learning_rate': '5.16e-09', 'epoch': '0.9878'} {'loss': '0.06172', 'grad_norm': '8.562', 'learning_rate': '2.133e-09', 'epoch': '0.9925'} {'loss': '0.00999', 'grad_norm': '1.289', 'learning_rate': '4.213e-10', 'epoch': '0.9972'} Writing model shards: 0%| | 0/1 [00:00> Model weights saved in 
/content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1068/model.safetensors [INFO|tokenization_utils_base.py:3302] 2026-06-09 02:30:52,512 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1068/chat_template.jinja [INFO|tokenization_utils_base.py:2115] 2026-06-09 02:30:52,513 >> tokenizer config file saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1068/tokenizer_config.json [INFO|tokenization_utils_base.py:3302] 2026-06-09 02:31:30,651 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1068/chat_template.jinja [INFO|tokenization_utils_base.py:2115] 2026-06-09 02:31:30,651 >> tokenizer config file saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1068/tokenizer_config.json [INFO|processing_utils.py:1141] 2026-06-09 02:31:30,854 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1068/chat_template.jinja [INFO|processing_utils.py:1162] 2026-06-09 02:31:30,855 >> processor saved in /content/PA-SFT_output/Qwen3_5_4B_si/checkpoint-1068/processor_config.json [INFO|trainer.py:1829] 2026-06-09 02:31:30,855 >> Training completed. 
Do not forget to share your model on huggingface.co/models =) 100%|██████████| 1068/1068 [1:06:38<00:00, 3.56s/it] 100%|██████████| 1068/1068 [1:06:38<00:00, 3.74s/it] [INFO|tokenization_utils_base.py:3302] 2026-06-09 02:31:30,864 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/chat_template.jinja [INFO|tokenization_utils_base.py:2115] 2026-06-09 02:31:30,865 >> tokenizer config file saved in /content/PA-SFT_output/Qwen3_5_4B_si/tokenizer_config.json [INFO|processing_utils.py:1141] 2026-06-09 02:31:31,061 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/chat_template.jinja [INFO|processing_utils.py:1162] 2026-06-09 02:31:31,062 >> processor saved in /content/PA-SFT_output/Qwen3_5_4B_si/processor_config.json [INFO|trainer.py:3838] 2026-06-09 02:31:31,062 >> Saving model checkpoint to /content/PA-SFT_output/Qwen3_5_4B_si [INFO|configuration_utils.py:545] 2026-06-09 02:31:31,076 >> Configuration saved in /content/PA-SFT_output/Qwen3_5_4B_si/config.json [INFO|configuration_utils.py:874] 2026-06-09 02:31:31,076 >> Configuration saved in /content/PA-SFT_output/Qwen3_5_4B_si/generation_config.json {'train_runtime': '3999', 'train_samples_per_second': '0.534', 'train_steps_per_second': '0.267', 'train_loss': '0.1214', 'epoch': '1'} Writing model shards: 0%| | 0/1 [00:00> Model weights saved in /content/PA-SFT_output/Qwen3_5_4B_si/model.safetensors [INFO|tokenization_utils_base.py:3302] 2026-06-09 02:32:02,845 >> chat template saved in /content/PA-SFT_output/Qwen3_5_4B_si/chat_template.jinja [INFO|tokenization_utils_base.py:2115] 2026-06-09 02:32:02,845 >> tokenizer config file saved in /content/PA-SFT_output/Qwen3_5_4B_si/tokenizer_config.json ***** train metrics ***** epoch = 1.0 total_flos = 18854816GF train_loss = 0.1214 train_runtime = 1:06:38.89 train_samples_per_second = 0.534 train_steps_per_second = 0.267 Figure saved at: /content/PA-SFT_output/Qwen3_5_4B_si/training_loss.png [WARNING|2026-06-09 02:32:03] llamafactory.extras.ploting:162 
>> No metric eval_loss to plot. [WARNING|2026-06-09 02:32:03] llamafactory.extras.ploting:162 >> No metric eval_accuracy to plot. [INFO|modelcard.py:264] 2026-06-09 02:32:03,251 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}