jbhargav committed · Commit 7ea4aed · verified · 1 parent: c86f267

Model save

Files changed (4)
  1. README.md +97 -0
  2. model.safetensors +1 -1
  3. tokenizer.json +1007 -0
  4. tokenizer_config.json +87 -0
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ library_name: transformers
+ base_model: ai4bharat/IndicBART
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: gujarati-indicbart-5000
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # gujarati-indicbart-5000
+
+ This model is a fine-tuned version of [ai4bharat/IndicBART](https://huggingface.co/ai4bharat/IndicBART) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: nan
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 3e-05
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 16
+ - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 500
+ - num_epochs: 15
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 26.3763 | 0.4 | 100 | 4.0052 |
+ | 22.6829 | 0.8 | 200 | nan |
+ | 0.0 | 1.2 | 300 | nan |
+ | 0.0 | 1.6 | 400 | nan |
+ | 0.0 | 2.0 | 500 | nan |
+ | 0.0 | 2.4 | 600 | nan |
+ | 0.0 | 2.8 | 700 | nan |
+ | 0.0 | 3.2 | 800 | nan |
+ | 0.0 | 3.6 | 900 | nan |
+ | 0.0 | 4.0 | 1000 | nan |
+ | 0.0 | 4.4 | 1100 | nan |
+ | 0.0 | 4.8 | 1200 | nan |
+ | 0.0 | 5.2 | 1300 | nan |
+ | 0.0 | 5.6 | 1400 | nan |
+ | 0.0 | 6.0 | 1500 | nan |
+ | 0.0 | 6.4 | 1600 | nan |
+ | 0.0 | 6.8 | 1700 | nan |
+ | 0.0 | 7.2 | 1800 | nan |
+ | 0.0 | 7.6 | 1900 | nan |
+ | 0.0 | 8.0 | 2000 | nan |
+ | 0.0 | 8.4 | 2100 | nan |
+ | 0.0 | 8.8 | 2200 | nan |
+ | 0.0 | 9.2 | 2300 | nan |
+ | 0.0 | 9.6 | 2400 | nan |
+ | 0.0 | 10.0 | 2500 | nan |
+ | 0.0 | 10.4 | 2600 | nan |
+ | 0.0 | 10.8 | 2700 | nan |
+ | 0.0 | 11.2 | 2800 | nan |
+ | 0.0 | 11.6 | 2900 | nan |
+ | 0.0 | 12.0 | 3000 | nan |
+ | 0.0 | 12.4 | 3100 | nan |
+ | 0.0 | 12.8 | 3200 | nan |
+ | 0.0 | 13.2 | 3300 | nan |
+ | 0.0 | 13.6 | 3400 | nan |
+ | 0.0 | 14.0 | 3500 | nan |
+ | 0.0 | 14.4 | 3600 | nan |
+ | 0.0 | 14.8 | 3700 | nan |
+
+
+ ### Framework versions
+
+ - Transformers 5.0.0
+ - Pytorch 2.9.0+cu128
+ - Datasets 4.0.0
+ - Tokenizers 0.22.2
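
The hyperparameters and the Epoch/Step columns in the card above imply a few quantities the card does not state directly. The following is a minimal sketch of that arithmetic, assuming a single training device (the card does not say how many were used, but 8 × 2 = 16 is consistent with one). All variable names are illustrative.

```python
import math

# Values copied from the model card in this diff.
train_batch_size = 8
gradient_accumulation_steps = 2
num_epochs = 15
warmup_steps = 500

# Assumption: one device, which is what total_train_batch_size = 16 suggests.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# The first table row logs step 100 at epoch 0.4, so one epoch is 250 optimizer steps.
steps_per_epoch = round(100 / 0.4)
total_steps = steps_per_epoch * num_epochs

# Upper bound only: the last batch of an epoch may be partial.
max_examples_per_epoch = steps_per_epoch * total_train_batch_size

# Warmup length expressed in epochs.
warmup_epochs = warmup_steps / steps_per_epoch

# (step, training loss, validation loss) for the first three logged rows.
rows = [
    (100, 26.3763, 4.0052),
    (200, 22.6829, float("nan")),
    (300, 0.0, float("nan")),
]
first_nan_step = next(step for step, _, val_loss in rows if math.isnan(val_loss))

print(total_train_batch_size, steps_per_epoch, total_steps)
print(max_examples_per_epoch, warmup_epochs, first_nan_step)
```

Under these assumptions the run covers 3750 optimizer steps, the 500 warmup steps span the first two epochs, and the validation loss first becomes nan at step 200, while the learning rate is still warming up.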
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bd881e5e8b789fe7b8e0ebdbbbad23db0663748574e7af558f7acff69bd2f580
+ oid sha256:a9215b8df889b89b4896acf3659dc9fda3f35b4c65e6cdad8567bf4197fa36d5
  size 1762959976
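
Only the Git LFS pointer changes here: the `oid` differs while `size` stays at 1762959976 bytes. As a hedged sketch, a downloaded `model.safetensors` could be checked against the new pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is whatever the reader supplies.

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and hash the pointer records."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]

# Pointer contents after this commit, copied from the diff above.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a9215b8df889b89b4896acf3659dc9fda3f35b4c65e6cdad8567bf4197fa36d5
size 1762959976
"""

pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy would then confirm that the download corresponds to this commit rather than to its parent.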
tokenizer.json ADDED
@@ -0,0 +1,1007 @@
+ {
+   "version": "1.0",
+   "truncation": {
+     "direction": "Right",
+     "max_length": 256,
+     "strategy": "LongestFirst",
+     "stride": 0
+   },
+   "padding": {
+     "strategy": {
+       "Fixed": 256
+     },
+     "direction": "Right",
+     "pad_to_multiple_of": null,
+     "pad_id": 1,
+     "pad_type_id": 0,
+     "pad_token": "<pad>"
+   },
+   "added_tokens": [
+     {
+       "id": 0,
+       "content": "[CLS]",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 1,
+       "content": "<pad>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 2,
+       "content": "[SEP]",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 3,
+       "content": "<unk>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 5,
+       "content": "ar_AR",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 6,
+       "content": "cs_CZ",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 7,
+       "content": "de_DE",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 8,
+       "content": "en_XX",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 9,
+       "content": "es_XX",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 10,
+       "content": "et_EE",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 11,
+       "content": "fi_FI",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 12,
+       "content": "fr_XX",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 13,
+       "content": "gu_IN",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 14,
+       "content": "hi_IN",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 15,
+       "content": "it_IT",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 16,
+       "content": "ja_XX",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 17,
+       "content": "kk_KZ",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 18,
+       "content": "ko_KR",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 19,
+       "content": "lt_LT",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 20,
+       "content": "lv_LV",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 21,
+       "content": "my_MM",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 22,
+       "content": "ne_NP",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 23,
+       "content": "nl_XX",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 24,
+       "content": "ro_RO",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 25,
+       "content": "ru_RU",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 26,
+       "content": "si_LK",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 27,
+       "content": "tr_TR",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 28,
+       "content": "vi_VN",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 29,
+       "content": "zh_CN",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 30,
+       "content": "af_ZA",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 31,
+       "content": "az_AZ",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 32,
+       "content": "bn_IN",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 33,
+       "content": "fa_IR",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 34,
+       "content": "he_IL",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 35,
+       "content": "hr_HR",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 36,
+       "content": "id_ID",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 37,
+       "content": "ka_GE",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 38,
+       "content": "km_KH",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 39,
+       "content": "mk_MK",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 40,
+       "content": "ml_IN",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 41,
+       "content": "mn_MN",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 42,
+       "content": "mr_IN",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 43,
+       "content": "pl_PL",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 44,
+       "content": "ps_AF",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 45,
+       "content": "pt_XX",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 46,
+       "content": "sv_SE",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 47,
+       "content": "sw_KE",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 48,
+       "content": "ta_IN",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 49,
+       "content": "te_IN",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 50,
+       "content": "th_TH",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 51,
+       "content": "tl_XX",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 52,
+       "content": "uk_UA",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 53,
+       "content": "ur_PK",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 54,
+       "content": "xh_ZA",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 55,
+       "content": "gl_ES",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 56,
+       "content": "sl_SI",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 57,
+       "content": "[MASK]",
+       "single_word": false,
+       "lstrip": true,
+       "rstrip": false,
+       "normalized": true,
+       "special": true
+     },
+     {
+       "id": 58,
+       "content": "<s>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 59,
+       "content": "</s>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 60,
+       "content": "<2as>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 61,
+       "content": "<2bn>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 62,
+       "content": "<2en>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 63,
+       "content": "<2gu>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 64,
+       "content": "<2hi>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 65,
+       "content": "<2kn>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 66,
+       "content": "<2ml>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 67,
+       "content": "<2mr>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 68,
+       "content": "<2or>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 69,
+       "content": "<2pa>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 70,
+       "content": "<2ta>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     },
+     {
+       "id": 71,
+       "content": "<2te>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": true,
+       "special": false
+     }
+   ],
+   "normalizer": {
+     "type": "Sequence",
+     "normalizers": [
+       {
+         "type": "Replace",
+         "pattern": {
+           "Regex": "[\\n\\r\\t]"
+         },
+         "content": " "
+       },
+       {
+         "type": "NFKC"
+       },
+       {
+         "type": "Strip",
+         "strip_left": false,
+         "strip_right": true
+       },
+       {
+         "type": "Replace",
+         "pattern": {
+           "Regex": " {2,}"
+         },
+         "content": "▁"
+       }
+     ]
+   },
+   "pre_tokenizer": {
+     "type": "Metaspace",
+     "replacement": "▁",
+     "prepend_scheme": "always",
+     "split": true
+   },
+   "post_processor": {
+     "type": "TemplateProcessing",
+     "single": [
+       {
+         "SpecialToken": {
+           "id": "gu_IN",
+           "type_id": 0
+         }
+       },
+       {
+         "Sequence": {
+           "id": "A",
+           "type_id": 0
+         }
+       },
+       {
+         "SpecialToken": {
+           "id": "[SEP]",
+           "type_id": 0
+         }
+       }
+     ],
+     "pair": [
+       {
+         "SpecialToken": {
+           "id": "gu_IN",
+           "type_id": 0
+         }
+       },
+       {
+         "Sequence": {
+           "id": "A",
+           "type_id": 0
+         }
+       },
+       {
+         "Sequence": {
+           "id": "B",
+           "type_id": 0
+         }
+       },
+       {
+         "SpecialToken": {
+           "id": "[SEP]",
+           "type_id": 0
+         }
+       }
+     ],
+     "special_tokens": {
+       "[SEP]": {
+         "id": "[SEP]",
+         "ids": [
+           2
+         ],
+         "tokens": [
+           "[SEP]"
+         ]
+       },
+       "gu_IN": {
+         "id": "gu_IN",
+         "ids": [
+           13
+         ],
+         "tokens": [
+           "gu_IN"
+         ]
+       }
+     }
+   },
+   "decoder": {
+     "type": "Metaspace",
+     "replacement": "▁",
+     "prepend_scheme": "always",
+     "split": true
+   },
+   "model": {
+     "type": "Unigram",
+     "unk_id": 3,
+     "vocab": [
+       [
+         "[CLS]",
+         0.0
+       ],
+       [
+         "<pad>",
+         0.0
+       ],
+       [
+         "[SEP]",
+         0.0
+       ],
+       [
+         "<unk>",
+         0.0
+       ],
+       [
+         "▁",
+         -2.0
+       ],
+       [
+         "ar_AR",
+         0.0
+       ],
+       [
+         "cs_CZ",
+         0.0
+       ],
+       [
+         "de_DE",
+         0.0
+       ],
+       [
+         "en_XX",
+         0.0
+       ],
+       [
+         "es_XX",
+         0.0
+       ],
+       [
+         "et_EE",
+         0.0
+       ],
+       [
+         "fi_FI",
+         0.0
+       ],
+       [
+         "fr_XX",
+         0.0
+       ],
+       [
+         "gu_IN",
+         0.0
+       ],
+       [
+         "hi_IN",
+         0.0
+       ],
+       [
+         "it_IT",
+         0.0
+       ],
+       [
+         "ja_XX",
+         0.0
+       ],
+       [
+         "kk_KZ",
+         0.0
+       ],
+       [
+         "ko_KR",
+         0.0
+       ],
+       [
+         "lt_LT",
+         0.0
+       ],
+       [
+         "lv_LV",
+         0.0
+       ],
+       [
+         "my_MM",
+         0.0
+       ],
+       [
+         "ne_NP",
+         0.0
+       ],
+       [
+         "nl_XX",
+         0.0
+       ],
+       [
+         "ro_RO",
+         0.0
+       ],
+       [
+         "ru_RU",
+         0.0
+       ],
+       [
+         "si_LK",
+         0.0
+       ],
+       [
+         "tr_TR",
+         0.0
+       ],
+       [
+         "vi_VN",
+         0.0
+       ],
+       [
+         "zh_CN",
+         0.0
+       ],
+       [
+         "af_ZA",
+         0.0
+       ],
+       [
+         "az_AZ",
+         0.0
+       ],
+       [
+         "bn_IN",
+         0.0
+       ],
+       [
+         "fa_IR",
+         0.0
+       ],
+       [
+         "he_IL",
+         0.0
+       ],
+       [
+         "hr_HR",
+         0.0
+       ],
+       [
+         "id_ID",
+         0.0
+       ],
+       [
+         "ka_GE",
+         0.0
+       ],
+       [
+         "km_KH",
+         0.0
+       ],
+       [
+         "mk_MK",
+         0.0
+       ],
+       [
+         "ml_IN",
+         0.0
+       ],
+       [
+         "mn_MN",
+         0.0
+       ],
+       [
+         "mr_IN",
+         0.0
+       ],
+       [
+         "pl_PL",
+         0.0
+       ],
+       [
+         "ps_AF",
+         0.0
+       ],
+       [
+         "pt_XX",
+         0.0
+       ],
+       [
+         "sv_SE",
+         0.0
+       ],
+       [
+         "sw_KE",
+         0.0
+       ],
+       [
+         "ta_IN",
+         0.0
+       ],
+       [
+         "te_IN",
+         0.0
+       ],
+       [
+         "th_TH",
+         0.0
+       ],
+       [
+         "tl_XX",
+         0.0
+       ],
+       [
+         "uk_UA",
+         0.0
+       ],
+       [
+         "ur_PK",
+         0.0
+       ],
+       [
+         "xh_ZA",
+         0.0
+       ],
+       [
+         "gl_ES",
+         0.0
+       ],
+       [
+         "sl_SI",
+         0.0
+       ],
+       [
+         "[MASK]",
+         0.0
+       ]
+     ],
+     "byte_fallback": false
+   }
+ }
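
The `normalizer` and `post_processor` sections of this file describe a small text pipeline. The sketch below approximates it in plain Python to show what each configured step does. It is an illustration under stated assumptions, not the `tokenizers` implementation: `normalize` and `apply_single_template` are hypothetical names, Python's `rstrip()` stands in for the `Strip` normalizer, and the Unigram segmentation step between the two functions is left out entirely.

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Approximate the four normalizer steps from tokenizer.json, in order."""
    text = re.sub(r"[\n\r\t]", " ", text)       # Replace: newlines and tabs become spaces
    text = unicodedata.normalize("NFKC", text)  # NFKC compatibility normalization
    text = text.rstrip()                        # Strip: right side only (strip_left is false)
    return re.sub(r" {2,}", "\u2581", text)     # Replace: runs of 2+ spaces become "▁"

def apply_single_template(pieces: list[str]) -> list[str]:
    """Wrap one sequence as the "single" template does: gu_IN, sequence A, [SEP]."""
    return ["gu_IN", *pieces, "[SEP]"]

normalized = normalize("a\tb  c \n")

# Placeholder pieces for illustration; real pieces would come from the Unigram model.
wrapped = apply_single_template(["\u2581a", "\u2581b"])

print(repr(normalized))
print(wrapped)
```

Per the `special_tokens` map in the same section, the `gu_IN` and `[SEP]` markers added by the template correspond to ids 13 and 2.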
tokenizer_config.json ADDED
@@ -0,0 +1,87 @@
+ {
+   "backend": "tokenizers",
+   "bos_token": "[CLS]",
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "eos_token": "[SEP]",
+   "extra_special_tokens": [
+     "<s>",
+     "</s>",
+     "<2as>",
+     "<2bn>",
+     "<2en>",
+     "<2gu>",
+     "<2hi>",
+     "<2kn>",
+     "<2ml>",
+     "<2mr>",
+     "<2or>",
+     "<2pa>",
+     "<2ta>",
+     "<2te>",
+     "ar_AR",
+     "cs_CZ",
+     "de_DE",
+     "en_XX",
+     "es_XX",
+     "et_EE",
+     "fi_FI",
+     "fr_XX",
+     "gu_IN",
+     "hi_IN",
+     "it_IT",
+     "ja_XX",
+     "kk_KZ",
+     "ko_KR",
+     "lt_LT",
+     "lv_LV",
+     "my_MM",
+     "ne_NP",
+     "nl_XX",
+     "ro_RO",
+     "ru_RU",
+     "si_LK",
+     "tr_TR",
+     "vi_VN",
+     "zh_CN",
+     "af_ZA",
+     "az_AZ",
+     "bn_IN",
+     "fa_IR",
+     "he_IL",
+     "hr_HR",
+     "id_ID",
+     "ka_GE",
+     "km_KH",
+     "mk_MK",
+     "ml_IN",
+     "mn_MN",
+     "mr_IN",
+     "pl_PL",
+     "ps_AF",
+     "pt_XX",
+     "sv_SE",
+     "sw_KE",
+     "ta_IN",
+     "te_IN",
+     "th_TH",
+     "tl_XX",
+     "uk_UA",
+     "ur_PK",
+     "xh_ZA",
+     "gl_ES",
+     "sl_SI"
+   ],
+   "is_local": false,
+   "keep_accents": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<pad>",
+   "remove_space": true,
+   "sep_token": "[SEP]",
+   "src_lang": "gu_IN",
+   "tgt_lang": "gu_IN",
+   "tokenizer_class": "MBart50Tokenizer",
+   "unk_token": "<unk>",
+   "use_fast": false
+ }
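
The `model_max_length` value in this file equals `int(1e30)`, which `transformers` uses as a placeholder when no maximum length has been configured. The only concrete length limit in this commit is therefore the `max_length` of 256 in the `truncation` section of `tokenizer.json`. A small sketch of how the two values could be reconciled, where `effective_max_length` is a hypothetical helper and not a library function. How `transformers` itself combines them at call time is not shown in this diff.

```python
# int(1e30): the placeholder value meaning "no limit configured".
NO_LIMIT_SENTINEL = int(1e30)

def effective_max_length(model_max_length: int, truncation_max_length: int) -> int:
    """Pick a usable length limit, ignoring the placeholder if it is present."""
    if model_max_length >= NO_LIMIT_SENTINEL:
        return truncation_max_length
    return min(model_max_length, truncation_max_length)

# Value copied from tokenizer_config.json in this commit.
config_value = 1000000000000000019884624838656

# 256 is the truncation max_length from tokenizer.json in this commit.
limit = effective_max_length(config_value, truncation_max_length=256)

print(config_value == NO_LIMIT_SENTINEL, limit)
```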