library_name: transformers
license: apache-2.0
base_model: facebook/bart-large
tags:
  - generated_from_trainer
metrics:
  - bleu
model-index:
  - name: bart-large-wiki-doc-full
    results: []

bart-large-wiki-doc-full

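This model is a fine-tuned version of facebook/bart-large; the card does not record the training dataset. On the evaluation set it achieves the following results, which match the step 6500 row (epoch 3.7208) of the training results table: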

Loss: 4.9843
Sari: 51.0933
Sari Add: 12.516
Sari Keep: 45.354
Sari Del: 95.4099
Fkgl: 6.642
Bleu: 19.5688
D Sari: 0.4555
D Sari Keep: 0.3857
D Sari Del: 0.8046
D Sari Add: 0.1763
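
The relationship between the overall score and its components can be checked with a little arithmetic. The sketch below assumes the usual definition of SARI as the unweighted mean of the add, keep, and delete scores, and assumes D Sari is aggregated the same way on a 0 to 1 scale; the helper name `sari_from_components` is made up for this illustration and is not part of any library.

```python
def sari_from_components(add, keep, delete):
    """Average the three operation scores (assumed aggregation)."""
    return (add + keep + delete) / 3


# Component values copied from the evaluation results listed above.
sari = sari_from_components(add=12.516, keep=45.354, delete=95.4099)
d_sari = sari_from_components(add=0.1763, keep=0.3857, delete=0.8046)

print(f"Sari from components:   {sari:.4f} (reported 51.0933)")
print(f"D Sari from components: {d_sari:.4f} (reported 0.4555)")
```

Under that assumption both recomputed values agree with the reported ones to four decimal places, which also shows that the high Sari Del score contributes a large share of the overall Sari.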

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: OptimizerNames.ADAMW_TORCH_FUSED (fused PyTorch AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.06
num_epochs: 15
label_smoothing_factor: 0.3
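
For readers who want to reproduce a similar run, the list above might map onto trainer arguments roughly as follows. This is a sketch, not the author's training script: the keyword names are those of `Seq2SeqTrainingArguments` in Transformers, the mapping from the card's field names is an assumption, `output_dir` is a hypothetical path, and a single training device is assumed.

```python
# Keyword names follow transformers.Seq2SeqTrainingArguments; the mapping from
# the card's field names to these keywords is an assumption.
training_kwargs = {
    "output_dir": "bart-large-wiki-doc-full",  # hypothetical output path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.06,
    "num_train_epochs": 15,
    "label_smoothing_factor": 0.3,
}

# total_train_batch_size = per-device batch * accumulation steps * devices.
num_devices = 1  # assumption: the card does not state the device count
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
print(effective_batch_size)

# With Transformers installed, the dict could be passed on like this:
# from transformers import Seq2SeqTrainingArguments
# args = Seq2SeqTrainingArguments(**training_kwargs)
```

With one device the product is 8, matching the reported total_train_batch_size.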

Training results

Training Loss	Epoch	Step	Validation Loss	Sari	Sari Add	Sari Keep	Sari Del	Fkgl	Bleu	D Sari	D Sari Keep	D Sari Del	D Sari Add
5.6237	0.2862	500	5.1875	47.043	5.9566	40.0258	95.1466	7.2471	12.9924	0.4057	0.3169	0.7965	0.1038
5.0682	0.5725	1000	5.1042	48.192	9.0538	40.5192	95.0031	7.2139	15.7017	0.4288	0.3477	0.8055	0.1332
5.0363	0.8587	1500	5.0920	48.4536	9.2349	41.6306	94.4954	7.6174	20.3138	0.4245	0.3505	0.7871	0.1358
4.9284	1.1448	2000	5.0709	49.1161	9.9088	42.5012	94.9385	7.3117	18.5046	0.4339	0.3599	0.7961	0.1455
4.8644	1.4311	2500	5.0343	49.604	10.9617	42.8337	95.0164	7.1277	18.8397	0.4403	0.3681	0.7991	0.1536
4.8532	1.7173	3000	5.0192	49.3028	10.3908	42.1003	95.4172	6.9579	14.4955	0.4389	0.3588	0.8116	0.1462
4.8343	2.0034	3500	5.0058	49.2487	11.0341	42.1651	94.5467	7.2777	21.4311	0.4401	0.3693	0.7926	0.1583
4.719	2.2897	4000	5.0043	49.8632	11.2301	42.9018	95.4578	6.5783	15.364	0.4509	0.3735	0.8138	0.1654
4.6898	2.5759	4500	4.9904	49.9624	11.3392	43.3503	95.1978	6.9214	18.1433	0.449	0.3764	0.8068	0.1637
4.7061	2.8622	5000	4.9840	49.9367	11.8606	42.6481	95.3013	6.9934	16.9996	0.4516	0.3748	0.8158	0.1643
4.6392	3.1483	5500	4.9898	50.5502	11.928	44.3611	95.3614	6.9017	18.7169	0.4523	0.3813	0.8091	0.1665
4.578	3.4345	6000	4.9893	50.39	12.0634	43.6891	95.4175	6.8384	17.0137	0.4567	0.3837	0.8154	0.171
4.5981	3.7208	6500	4.9843	51.0933	12.516	45.354	95.4099	6.642	19.5688	0.4555	0.3857	0.8046	0.1763
4.5826	4.0069	7000	4.9818	50.788	12.8273	44.4249	95.1118	6.6555	21.2107	0.4594	0.3942	0.8049	0.1791
4.4927	4.2931	7500	5.0009	50.4098	12.2269	43.9055	95.0968	6.754	20.3637	0.4602	0.3948	0.8096	0.1763
4.5009	4.5794	8000	4.9908	50.7179	12.8769	43.783	95.4938	6.5702	17.565	0.4601	0.3863	0.8152	0.1787
4.5128	4.8656	8500	4.9830	50.7341	12.5935	44.2741	95.3346	6.7056	18.9024	0.4588	0.3905	0.8076	0.1783
4.457	5.1517	9000	5.0095	50.8947	12.7442	44.829	95.111	6.8278	21.5163	0.462	0.3987	0.8062	0.181
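
The Epoch and Step columns make it possible to estimate how long an epoch is and what the learning-rate schedule would look like. The sketch below is illustrative: it assumes warmup steps are the ceiling of warmup_ratio times the planned step count, and that the linear schedule ramps up from zero and then decays linearly to zero, which is how the Transformers linear schedule is understood to behave. `lr_at` is a hypothetical helper, not a library function.

```python
import math

# (epoch, step) pairs copied from the training results table.
log = [(0.2862, 500), (2.0034, 3500), (5.1517, 9000)]

# Every pair should give the same steps-per-epoch estimate after rounding.
estimates = [round(step / epoch) for epoch, step in log]
assert len(set(estimates)) == 1
steps_per_epoch = estimates[0]  # 1747

# Hyperparameters from the card.
base_lr = 5e-5
num_epochs = 15
warmup_ratio = 0.06
total_train_batch_size = 8

planned_steps = steps_per_epoch * num_epochs              # 26205
warmup_steps = math.ceil(warmup_ratio * planned_steps)    # assumed rounding
approx_train_examples = steps_per_epoch * total_train_batch_size


def lr_at(step):
    """Learning rate under linear warmup followed by linear decay (sketch)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, planned_steps - step)
    return base_lr * remaining / max(1, planned_steps - warmup_steps)


print(steps_per_epoch, planned_steps, warmup_steps, approx_train_examples)
print(f"logged fraction of planned steps: {9000 / planned_steps:.2%}")
print(f"learning rate at step 6500: {lr_at(6500):.3e}")
```

Under these assumptions the table covers only about a third of the planned 15 epochs; the card does not say why logging stops at step 9000.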

Framework versions

Transformers 4.57.3
PyTorch 2.9.1+cu128
Datasets 3.6.0
Tokenizers 0.22.1
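
To compare a local environment against these versions, something like the following could be used. It is a minimal sketch built on the standard-library `importlib.metadata`; the PyPI distribution names (for example `torch` for PyTorch) are assumptions, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the card; distribution names are assumed PyPI names.
EXPECTED = {
    "transformers": "4.57.3",
    "torch": "2.9.1+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.22.1",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (wanted, installed_or_None, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed here
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, ok) in compare_versions().items():
    status = "ok" if ok else "differs"
    print(f"{package}: expected {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily prevent loading the model; the check only reports where the environment differs from the one recorded here.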