---
license: cc-by-nc-4.0
base_model:
- prajjwal1/bert-tiny
---

This is a 'tiny' masked language model, fine-tuned from prajjwal1/bert-tiny on synthetic oncology clinical text as a preparatory step for training TinyBertOncoTagger.

Training data: https://huggingface.co/datasets/ksg-dfci/mmai-synthetic/blob/main/all_synthetic_notes.parquet

Training script: https://github.com/kenlkehl/matchminer-ai-training/blob/main/3b_train_tiny_oncbert.py

Training script call:

    accelerate launch 3b_train_tiny_oncbert.py \
      --data trial_space_lineitems.csv:trial_text \
             trial_space_lineitems.csv:this_space \
             trial_space_lineitems.csv:trial_boilerplate_text \
             all_synthetic_notes.parquet:synthetic_note \
      --output_dir ./onc_bert_tiny \
      --per_device_train_batch_size 64
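
Each `--data` entry in the training call pairs a file with the text column to read from it, written as `path:column`. The snippet below is a minimal sketch of how such entries could be split into their two parts. It is not taken from the training script: `DataSpec` and `parse_data_spec` are hypothetical names, and splitting on the last colon is an assumption about the intended format.

```python
from typing import List, NamedTuple


class DataSpec(NamedTuple):
    """One `path:column` entry from --data (hypothetical helper type)."""
    path: str
    column: str


def parse_data_spec(spec: str) -> DataSpec:
    """Split a `path:column` string into its file path and column name."""
    # Assumption: split on the LAST colon, so a path that itself
    # contains a colon is still handled.
    path, sep, column = spec.rpartition(":")
    if not sep or not path or not column:
        raise ValueError(f"expected 'path:column', got {spec!r}")
    return DataSpec(path, column)


# The four --data entries from the training call.
SPECS = [
    "trial_space_lineitems.csv:trial_text",
    "trial_space_lineitems.csv:this_space",
    "trial_space_lineitems.csv:trial_boilerplate_text",
    "all_synthetic_notes.parquet:synthetic_note",
]

parsed: List[DataSpec] = [parse_data_spec(s) for s in SPECS]
for item in parsed:
    print(f"{item.path} -> column {item.column}")
```

Read this way, the call draws three text columns from `trial_space_lineitems.csv` and one from `all_synthetic_notes.parquet`.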