2024-08-30 21:54:12.390238: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-08-30 21:54:12.408272: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-30 21:54:12.429605: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-30 21:54:12.436048: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-30 21:54:12.451309: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-30 21:54:13.743493: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/transformers/training_args.py:1494: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
08/30/2024 21:54:15 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
08/30/2024 21:54:15 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=epoch,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=2,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=True,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/content/dissertation/scripts/ner/output/tb,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=f1,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=10.0,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=/content/dissertation/scripts/ner/output,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=32,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=/content/dissertation/scripts/ner/output,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=epoch,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
Downloading builder script: 0%| | 0.00/3.61k [00:00> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/config.json
[INFO|configuration_utils.py:800] 2024-08-30 21:54:27,966 >> Model config BertConfig {
  "_name_or_path": "IVN-RIN/bioBIT",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "finetuning_task": "ner",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "O",
    "1": "B-FARMACO",
    "2": "I-FARMACO"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "B-FARMACO": 1,
    "I-FARMACO": 2,
    "O": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.42.4",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 31102
}
[INFO|tokenization_utils_base.py:2161] 2024-08-30 21:54:29,333 >> loading file vocab.txt from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/vocab.txt
[INFO|tokenization_utils_base.py:2161] 2024-08-30 21:54:29,334 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/tokenizer.json
[INFO|tokenization_utils_base.py:2161] 2024-08-30 21:54:29,334 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2161] 2024-08-30 21:54:29,334 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/special_tokens_map.json
[INFO|tokenization_utils_base.py:2161] 2024-08-30 21:54:29,334 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/tokenizer_config.json
[INFO|modeling_utils.py:3556] 2024-08-30 21:54:40,888 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/model.safetensors
[INFO|modeling_utils.py:4354] 2024-08-30 21:54:40,995 >> Some weights of the model checkpoint at IVN-RIN/bioBIT were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:4366] 2024-08-30 21:54:40,995 >> Some weights of BertForTokenClassification were not initialized from the model checkpoint at IVN-RIN/bioBIT and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 0%| | 0/27198 [00:00> The following columns in the training set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: id, tokens, ner_tags. If id, tokens, ner_tags are not expected by `BertForTokenClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:2128] 2024-08-30 21:54:48,041 >> ***** Running training *****
[INFO|trainer.py:2129] 2024-08-30 21:54:48,041 >> Num examples = 27,198
[INFO|trainer.py:2130] 2024-08-30 21:54:48,041 >> Num Epochs = 10
[INFO|trainer.py:2131] 2024-08-30 21:54:48,041 >> Instantaneous batch size per device = 32
[INFO|trainer.py:2134] 2024-08-30 21:54:48,041 >> Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:2135] 2024-08-30 21:54:48,041 >> Gradient Accumulation steps = 2
[INFO|trainer.py:2136] 2024-08-30 21:54:48,041 >> Total optimization steps = 4,250
[INFO|trainer.py:2137] 2024-08-30 21:54:48,042 >> Number of trainable parameters = 109,339,395
0%| | 0/4250 [00:00> The following columns in the evaluation set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: id, tokens, ner_tags. If id, tokens, ner_tags are not expected by `BertForTokenClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:3788] 2024-08-30 21:56:36,658 >> ***** Running Evaluation *****
[INFO|trainer.py:3790] 2024-08-30 21:56:36,658 >> Num examples = 6798
[INFO|trainer.py:3793] 2024-08-30 21:56:36,658 >> Batch size = 8
0%| | 0/850 [00:00> Saving model checkpoint to /content/dissertation/scripts/ner/output/checkpoint-425
[INFO|configuration_utils.py:472] 2024-08-30 21:56:50,914 >> Configuration saved in /content/dissertation/scripts/ner/output/checkpoint-425/config.json
[INFO|modeling_utils.py:2690] 2024-08-30 21:56:52,125 >> Model weights saved in /content/dissertation/scripts/ner/output/checkpoint-425/model.safetensors
[INFO|tokenization_utils_base.py:2574] 2024-08-30 21:56:52,126 >> tokenizer config file saved in /content/dissertation/scripts/ner/output/checkpoint-425/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-08-30 21:56:52,126 >> Special tokens file saved in /content/dissertation/scripts/ner/output/checkpoint-425/special_tokens_map.json
[INFO|tokenization_utils_base.py:2574] 2024-08-30 21:56:54,071 >> tokenizer config file saved in /content/dissertation/scripts/ner/output/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-08-30 21:56:54,071 >> Special tokens file saved in /content/dissertation/scripts/ner/output/special_tokens_map.json
11%|█▏ | 488/4250 [02:21<15:03, 4.16it/s]
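The step and parameter counts in the trainer banner can be reproduced from the hyperparameters printed earlier in the log. The Python sketch below is a cross-check under stated assumptions, not the Trainer's own code: it assumes one optimizer update per `gradient_accumulation_steps` batches (rounded down), `dataloader_drop_last=False` so the last partial batch is kept, and a BERT-base encoder with no pooling layer under the token-classification head. The helper names `optimization_steps` and `bert_token_classifier_params` are hypothetical.

```python
import math

# Hyperparameters copied from the TrainingArguments dump and the trainer banner in the log.
NUM_TRAIN_EXAMPLES = 27_198
NUM_EVAL_EXAMPLES = 6_798
PER_DEVICE_TRAIN_BATCH = 32
PER_DEVICE_EVAL_BATCH = 8
GRAD_ACCUM_STEPS = 2
NUM_EPOCHS = 10
N_GPU = 1


def optimization_steps(n_examples, per_device_batch, grad_accum, epochs, n_gpu=1):
    """Return (batches per epoch, updates per epoch, total updates, effective batch).

    Assumption: dataloader_drop_last=False, so a final partial batch is kept.
    """
    train_batch = per_device_batch * max(1, n_gpu)
    batches_per_epoch = math.ceil(n_examples / train_batch)
    # Assumed rule: one optimizer update per `grad_accum` batches, rounded down, never below 1.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_updates = math.ceil(epochs * updates_per_epoch)
    effective_batch = train_batch * grad_accum
    return batches_per_epoch, updates_per_epoch, total_updates, effective_batch


def bert_token_classifier_params(vocab, hidden, layers, intermediate,
                                 max_positions, type_vocab, num_labels):
    """Count parameters of a BERT encoder plus a linear token-classification head.

    Assumption: no pooler is included under the token-classification head.
    """
    # Word, position and token-type embedding tables, plus one LayerNorm (weight + bias).
    embeddings = (vocab + max_positions + type_vocab) * hidden + 2 * hidden
    # Q, K, V and attention-output projections (hidden x hidden + bias each), plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Feed-forward up- and down-projections with biases, plus LayerNorm.
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden) + 2 * hidden
    # Linear classifier applied to every token: hidden x num_labels weights plus biases.
    classifier = hidden * num_labels + num_labels
    return embeddings + layers * (attention + ffn) + classifier


if __name__ == "__main__":
    print(optimization_steps(NUM_TRAIN_EXAMPLES, PER_DEVICE_TRAIN_BATCH,
                             GRAD_ACCUM_STEPS, NUM_EPOCHS, N_GPU))  # (850, 425, 4250, 64)
    print(math.ceil(NUM_EVAL_EXAMPLES / PER_DEVICE_EVAL_BATCH))  # 850
    # Sizes taken from the BertConfig printed in the log (3 labels: O, B-FARMACO, I-FARMACO).
    print(bert_token_classifier_params(31_102, 768, 12, 3_072, 512, 2, 3))  # 109339395
```

Under these assumptions one epoch is 425 optimizer updates, which lines up with the first epoch-end checkpoint being written to `checkpoint-425`; the 850 evaluation batches match the `0/850` evaluation progress bar, and the computed 109,339,395 parameters match the reported trainable-parameter count.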