skfrost19 committed on
Commit c12f2c6 · verified · 1 Parent(s): 8fe1eb6

Add new CrossEncoder model

Files changed (7)
  1. README.md +472 -0
  2. config.json +34 -0
  3. model.safetensors +3 -0
  4. special_tokens_map.json +37 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +65 -0
  7. vocab.txt +0 -0
README.md ADDED
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- generated_from_trainer
- dataset_size:1990000
- loss:BinaryCrossEntropyLoss
datasets:
- sentence-transformers/msmarco
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: CrossEncoder
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.5702
      name: Map
    - type: mrr@10
      value: 0.5654
      name: Mrr@10
    - type: ndcg@10
      value: 0.6447
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.3234
      name: Map
    - type: mrr@10
      value: 0.5666
      name: Mrr@10
    - type: ndcg@10
      value: 0.3802
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.6298
      name: Map
    - type: mrr@10
      value: 0.6525
      name: Mrr@10
    - type: ndcg@10
      value: 0.6893
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.5078
      name: Map
    - type: mrr@10
      value: 0.5948
      name: Mrr@10
    - type: ndcg@10
      value: 0.5714
      name: Ndcg@10
---

# CrossEncoder

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model trained on the [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
<!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("skfrost19/reranker-MiniLM-L12-H384-uncased-msmarco-bce-ep-3")
# Get scores for pairs of texts
pairs = [
    ['what symptoms might a patient with a tmd have', 'TMD sufferers have a long list of symptoms, including chronic pain (https://youtu.be/SvMaJb8o2RI), many of which are in common with Parkinson’s disease (PD) symptoms.'],
    ['what is a thermal protector', 'The word hero comes from the Greek ἥρως (hērōs), hero, warrior, particularly one such as Heracles with divine ancestry or later given divine honors. literally protector or defender.'],
    ['how many copies of call of duty wwii sold', 'Call of Duty 3. Call of Duty 3 is a World War II first-person shooter and the third installment in the Call of Duty video game series. Released on November 7, 2006, the game was developed by Treyarch, and was the first major installment in the Call of Duty series not to be developed by Infinity Ward. It was also the first not to be released on the PC platform. It was released on the PlayStation 2, PlayStation 3, Wii, Xbox, and Xbox 360.'],
    ['what is the desired temperature for the fresh food compartment in a refrigerator', 'A refrigerator maintains a temperature a few degrees above the freezing point of water. Optimum temperature range for perishable food storage is 3 to 5 °C (37 to 41 °F).emperature settings for refrigerator and freezer compartments are often given arbitrary numbers by manufacturers (for example, 1 through 9, warmest to coldest), but generally 3 to 5 °C (37 to 41 °F) is ideal for the refrigerator compartment and −18 °C (0 °F) for the freezer.'],
    ['what is gsm alarm system', 'I’m sure you would have these questions in your mind when you heard GSM alarm system at the first time. GSM alarm system is an alarm system that operating through GSM (global system for mobile communications) network; not requiring a telephone line.urthermore, in the case of burglar entering the premises and cutting the telephone line, the GSM alarm would not be affected and still work as it does not require the use of a fixed phone line. So this security alarm is ideal for the place where no fixed phone line or hard to get one.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'what symptoms might a patient with a tmd have',
    [
        'TMD sufferers have a long list of symptoms, including chronic pain (https://youtu.be/SvMaJb8o2RI), many of which are in common with Parkinson’s disease (PD) symptoms.',
        'The word hero comes from the Greek ἥρως (hērōs), hero, warrior, particularly one such as Heracles with divine ancestry or later given divine honors. literally protector or defender.',
        'Call of Duty 3. Call of Duty 3 is a World War II first-person shooter and the third installment in the Call of Duty video game series. Released on November 7, 2006, the game was developed by Treyarch, and was the first major installment in the Call of Duty series not to be developed by Infinity Ward. It was also the first not to be released on the PC platform. It was released on the PlayStation 2, PlayStation 3, Wii, Xbox, and Xbox 360.',
        'A refrigerator maintains a temperature a few degrees above the freezing point of water. Optimum temperature range for perishable food storage is 3 to 5 °C (37 to 41 °F).emperature settings for refrigerator and freezer compartments are often given arbitrary numbers by manufacturers (for example, 1 through 9, warmest to coldest), but generally 3 to 5 °C (37 to 41 °F) is ideal for the refrigerator compartment and −18 °C (0 °F) for the freezer.',
        'I’m sure you would have these questions in your mind when you heard GSM alarm system at the first time. GSM alarm system is an alarm system that operating through GSM (global system for mobile communications) network; not requiring a telephone line.urthermore, in the case of burglar entering the premises and cutting the telephone line, the GSM alarm would not be affected and still work as it does not require the use of a fixed phone line. So this security alarm is ideal for the place where no fixed phone line or hard to get one.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
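The `rank` call above returns one `{'corpus_id': ..., 'score': ...}` dict per document, best first. The dependency-free sketch below reproduces that bookkeeping so the link between per-pair scores and the ranked list is visible without downloading the model. It assumes, based on `activation_fn` being `torch.nn.modules.activation.Sigmoid` in this commit's `config.json`, that scores are sigmoid-activated logits. The logit values and the helpers `sigmoid` and `rank_from_logits` are hypothetical illustrations, not part of the sentence-transformers API.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rank_from_logits(logits: list[float]) -> list[dict]:
    """Turn one logit per (query, document) pair into a ranked list.

    Output mimics the shape shown for CrossEncoder.rank: a list of
    {"corpus_id": ..., "score": ...} dicts sorted by descending score.
    """
    scores = [sigmoid(x) for x in logits]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": scores[i]} for i in order]


# Hypothetical logits for the five documents in the example above.
example_logits = [4.2, -6.1, -3.5, -5.0, -4.8]
for hit in rank_from_logits(example_logits):
    print(hit["corpus_id"], round(hit["score"], 4))
```

Because the sigmoid is monotonic, sorting by sigmoid scores gives the same order as sorting by raw logits; the activation only changes how the scores read, not the ranking.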

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.5702 (+0.0806)     | 0.3234 (+0.0624)     | 0.6298 (+0.2101)     |
| mrr@10      | 0.5654 (+0.0879)     | 0.5666 (+0.0667)     | 0.6525 (+0.2258)     |
| **ndcg@10** | **0.6447 (+0.1043)** | **0.3802 (+0.0552)** | **0.6893 (+0.1887)** |
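The table reports MAP, MRR@10 and NDCG@10. To make these concrete, the sketch below computes textbook versions of all three for a single query from the binary relevance labels of a reranked candidate list; an evaluator would then average per-query values. The exact formulas inside `CrossEncoderRerankingEvaluator` are an assumption here, not reproduced from the library. As we understand `always_rerank_positives: true`, every known positive is kept in the candidate list, so dividing AP by the relevant documents found equals dividing by all positives.

```python
import math


def mrr_at_k(relevance: list[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance: list[int], k: int = 10) -> float:
    """Binary-relevance NDCG@k with the usual 1 / log2(rank + 1) discount."""
    dcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevance[:k], start=1))
    ideal = sorted(relevance, reverse=True)
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision(relevance: list[int]) -> float:
    """Mean of precision@i over the ranks i that hold a relevant document."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Relevance of a reranked list for one query (1 = relevant), best first.
ranked = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(mrr_at_k(ranked), ndcg_at_k(ranked), average_precision(ranked))
```

Here the first relevant passage sits at rank 2, so MRR@10 is 0.5, and AP averages precision 1/2 and 2/5 to 0.45.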

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.5078 (+0.1177)     |
| mrr@10      | 0.5948 (+0.1268)     |
| **ndcg@10** | **0.5714 (+0.1161)** |
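The `NanoBEIR_R100_mean` row appears to be the unweighted mean of the three per-dataset rows in the Cross Encoder Reranking table. The arithmetic below checks that reading against the reported values; it only re-averages the published numbers and does not re-run any evaluation.

```python
# Per-dataset scores copied from the Cross Encoder Reranking table.
per_dataset = {
    "NanoMSMARCO_R100": {"map": 0.5702, "mrr@10": 0.5654, "ndcg@10": 0.6447},
    "NanoNFCorpus_R100": {"map": 0.3234, "mrr@10": 0.5666, "ndcg@10": 0.3802},
    "NanoNQ_R100": {"map": 0.6298, "mrr@10": 0.6525, "ndcg@10": 0.6893},
}

# Unweighted mean over the three datasets, one value per metric.
mean = {
    metric: sum(scores[metric] for scores in per_dataset.values()) / len(per_dataset)
    for metric in ("map", "mrr@10", "ndcg@10")
}
for metric, value in mean.items():
    print(f"{metric}: {value:.4f}")
```

All three rounded means (0.5078, 0.5948, 0.5714) match the Nano BEIR table.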

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### msmarco

* Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
* Size: 1,990,000 training samples
* Columns: <code>query</code>, <code>passage</code>, and <code>score</code>
* Approximate statistics based on the first 1000 samples:
  |         | query | passage | score |
  |:--------|:------|:--------|:------|
  | type    | string | string | float |
  | details | <ul><li>min: 11 characters</li><li>mean: 34.61 characters</li><li>max: 124 characters</li></ul> | <ul><li>min: 82 characters</li><li>mean: 357.43 characters</li><li>max: 1034 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.49</li><li>max: 1.0</li></ul> |
* Samples:
  | query | passage | score |
  |:------|:--------|:------|
  | <code>what causes your tailbone to hurt</code> | <code>A coccyx injury results in pain and discomfort in the tailbone area (the condition is called coccydynia). These injuries may result in a bruise, dislocation, or fracture (break) of the coccyx. Although they may be slow to heal, the majority of coccyx injuries can be managed with cautious treatment.ost tailbone injuries are caused by trauma to the coccyx area. 1 A fall onto the tailbone in the seated position, usually against a hard surface, is the most common cause of coccyx injuries. 2 A direct blow to the tailbone, such as those that occur during contact sports, can injure the coccyx.</code> | <code>1.0</code> |
  | <code>what muscles do trunk lateral flexion</code> | <code>It’s the same with the External Obliques, but unlike the External Obliques, they are not visible when fully developed. Action: 1 Supports abdominal wall, assists forced respiration, aids raising intra-abdominal pressure and, with muscles of other side, abducts and rotates trunk. 2 Contraction of one side alone laterally bends the trunk to that side and rotates the trunk to the other side.</code> | <code>0.0</code> |
  | <code>brake horsepower definition</code> | <code>When the brake lights will not come on, the first thing to check is the third-brake light. If it too is not working, the brake-light switch, a bad fuse or an unplugged harness is likely.ull up on the brake pedal and if the lights go out, switch mis-alignment or pedal position error is the likely cause. The final possibility is a wire shorted to power. Unplug the brake-light switch and if the lights stay on, a short circuit is the case.</code> | <code>0.0</code> |
* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity",
      "pos_weight": null
  }
  ```
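With `activation_fn` set to `Identity`, the loss sees the raw logit of the single output label. As far as we can tell, that makes `BinaryCrossEntropyLoss` equivalent to PyTorch's `BCEWithLogitsLoss` with mean reduction, where `pos_weight: null` weights positives and negatives equally. A dependency-free sketch of the per-example loss in its numerically stable form; the function names are hypothetical:

```python
import math


def bce_with_logits(logit: float, label: float) -> float:
    """Binary cross-entropy on a raw logit (no sigmoid applied beforehand).

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))] without overflow.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def batch_loss(logits: list[float], labels: list[float]) -> float:
    """Mean of the per-example losses over a batch."""
    return sum(bce_with_logits(x, y) for x, y in zip(logits, labels)) / len(logits)


# Labels follow the dataset's `score` column: 1.0 relevant, 0.0 not relevant.
print(batch_loss([3.0, -2.0, 0.5], [1.0, 0.0, 0.0]))
```

Applying the sigmoid inside the loss rather than in the model is why training uses `Identity` while the saved `config.json` records `Sigmoid` for inference.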

### Evaluation Dataset

#### msmarco

* Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
* Size: 10,000 evaluation samples
* Columns: <code>query</code>, <code>passage</code>, and <code>score</code>
* Approximate statistics based on the first 1000 samples:
  |         | query | passage | score |
  |:--------|:------|:--------|:------|
  | type    | string | string | float |
  | details | <ul><li>min: 9 characters</li><li>mean: 33.72 characters</li><li>max: 193 characters</li></ul> | <ul><li>min: 55 characters</li><li>mean: 353.35 characters</li><li>max: 895 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.5</li><li>max: 1.0</li></ul> |
* Samples:
  | query | passage | score |
  |:------|:--------|:------|
  | <code>what symptoms might a patient with a tmd have</code> | <code>TMD sufferers have a long list of symptoms, including chronic pain (https://youtu.be/SvMaJb8o2RI), many of which are in common with Parkinson’s disease (PD) symptoms.</code> | <code>1.0</code> |
  | <code>what is a thermal protector</code> | <code>The word hero comes from the Greek ἥρως (hērōs), hero, warrior, particularly one such as Heracles with divine ancestry or later given divine honors. literally protector or defender.</code> | <code>0.0</code> |
  | <code>how many copies of call of duty wwii sold</code> | <code>Call of Duty 3. Call of Duty 3 is a World War II first-person shooter and the third installment in the Call of Duty video game series. Released on November 7, 2006, the game was developed by Treyarch, and was the first major installment in the Call of Duty series not to be developed by Infinity Ward. It was also the first not to be released on the PC platform. It was released on the PlayStation 2, PlayStation 3, Wii, Xbox, and Xbox 360.</code> | <code>0.0</code> |
* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity",
      "pos_weight": null
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 512
- `per_device_eval_batch_size`: 512
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `dataloader_num_workers`: 4
- `load_best_model_at_end`: True
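With `lr_scheduler_type: linear` (listed under All Hyperparameters) and `warmup_ratio: 0.1`, the learning rate ramps from 0 up to `2e-05` over the first 10% of optimizer steps and then decays linearly to 0. The sketch below reproduces that shape in plain Python. It assumes a single device and `gradient_accumulation_steps: 1`, so one epoch is ceil(1,990,000 / 512) = 3,887 steps (`dataloader_drop_last: False` keeps the final partial batch), and it assumes warmup steps are rounded up with `ceil`, which is how we understand the Hugging Face `Trainer` to derive them from `warmup_ratio`. Under those assumptions, step 1 corresponds to epoch 1 / 3,887 ≈ 0.0003, which matches the first row of the training logs.

```python
import math

NUM_SAMPLES = 1_990_000  # training samples (from the model card)
BATCH_SIZE = 512         # per_device_train_batch_size; single device assumed
NUM_EPOCHS = 1
PEAK_LR = 2e-05
WARMUP_RATIO = 0.1

steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)  # last partial batch counts
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(steps_per_epoch, warmup_steps)  # 3887 389
print(round(1 / steps_per_epoch, 4))  # 0.0003, the epoch logged at step 1
print(learning_rate(0), learning_rate(warmup_steps), learning_rate(total_steps))
```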

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 512
- `per_device_eval_batch_size`: 512
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch  | Step | Training Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
|:------:|:----:|:-------------:|:------------------------:|:-------------------------:|:-------------------:|:--------------------------:|
| -1     | -1   | -             | 0.6742 (+0.1337)         | 0.4023 (+0.0772)          | 0.7204 (+0.2197)    | 0.5989 (+0.1436)           |
| 0.0003 | 1    | 0.0588        | -                        | -                         | -                   | -                          |
| -1     | -1   | -             | 0.6447 (+0.1043)         | 0.3802 (+0.0552)          | 0.6893 (+0.1887)    | 0.5714 (+0.1161)           |

### Framework Versions
- Python: 3.11.5
- Sentence Transformers: 4.0.1
- Transformers: 4.50.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.6.0
- Datasets: 3.5.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
{
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "sentence_transformers": {
    "activation_fn": "torch.nn.modules.activation.Sigmoid",
    "version": "4.0.1"
  },
  "torch_dtype": "float32",
  "transformers_version": "4.50.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
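The fields in `config.json` are enough to estimate the model's size. The sketch below counts parameters for a standard BERT encoder plus the pooler and one-label classification head that `BertForSequenceClassification` adds, then compares 4 bytes per weight (`torch_dtype: float32`) with the `size 133464836` recorded in the `model.safetensors` pointer. The per-component breakdown is our own accounting under those assumptions, not something this commit states; the roughly 23 KB left over is presumably the safetensors JSON header.

```python
# Values from config.json in this commit.
vocab_size = 30522
max_position_embeddings = 512
type_vocab_size = 2
hidden = 384
intermediate = 1536
num_layers = 12
num_labels = 1


def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias of a dense layer."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # scale and shift vectors

embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * hidden + layer_norm
per_layer = (
    3 * linear(hidden, hidden)        # query, key, value projections
    + linear(hidden, hidden)          # attention output projection
    + layer_norm                      # post-attention LayerNorm
    + linear(hidden, intermediate)    # feed-forward up-projection
    + linear(intermediate, hidden)    # feed-forward down-projection
    + layer_norm                      # post-feed-forward LayerNorm
)
pooler = linear(hidden, hidden)
classifier = linear(hidden, num_labels)

total = embeddings + num_layers * per_layer + pooler + classifier
print(f"{total:,} parameters")                  # 33,360,385 parameters
print(f"{total * 4:,} bytes as float32")        # 133,441,540 bytes as float32
print(f"{133_464_836 - total * 4:,} bytes left over")  # 23,296 bytes left over
```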
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:1287301da6a19c1967530926b79e64baef914428cf5f23d76f2bc99c0de31fe2
size 133464836
special_tokens_map.json ADDED
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "max_length": 512,
  "model_max_length": 512,
  "never_split": null,
  "pad_to_multiple_of": null,
  "pad_token": "[PAD]",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "[SEP]",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
}
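A cross-encoder reads the query and passage as one BERT input. Using the special-token IDs from `tokenizer_config.json` (`[CLS]` = 101, `[SEP]` = 102), the standard BERT pair layout is `[CLS] query [SEP] passage [SEP]`, with `token_type_ids` 0 for the first segment and 1 for the second. The sketch below builds that layout from already-tokenized IDs and applies a simplified `longest_first` truncation to fit `model_max_length` = 512, trimming the longer segment from the right (`truncation_side: right`). The helper names and toy token IDs are hypothetical, and the real tokenizer's tie-breaking may differ from this sketch.

```python
CLS_ID, SEP_ID = 101, 102  # from tokenizer_config.json
MAX_LENGTH = 512           # model_max_length
NUM_SPECIAL = 3            # one [CLS] and two [SEP] tokens per pair


def truncate_longest_first(a: list[int], b: list[int], max_length: int) -> tuple[list[int], list[int]]:
    """Drop tokens from the end of the longer sequence until the pair fits.

    Simplified 'longest_first': on a tie this sketch trims the second
    sequence (the passage) first.
    """
    a, b = list(a), list(b)
    budget = max_length - NUM_SPECIAL
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    return a, b


def encode_pair(query_ids: list[int], passage_ids: list[int], max_length: int = MAX_LENGTH) -> dict:
    """Build input_ids, token_type_ids and attention_mask for one pair."""
    q, p = truncate_longest_first(query_ids, passage_ids, max_length)
    input_ids = [CLS_ID] + q + [SEP_ID] + p + [SEP_ID]
    token_type_ids = [0] * (len(q) + 2) + [1] * (len(p) + 1)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": [1] * len(input_ids),
    }


# Toy IDs: a 10-token query and a 700-token passage (values are arbitrary).
enc = encode_pair(list(range(1000, 1010)), list(range(2000, 2700)))
print(len(enc["input_ids"]), enc["input_ids"][:3], enc["token_type_ids"][11:13])
```

Because queries are usually short, `longest_first` in practice trims the passage and leaves the query intact, which is what the toy example shows: the passage is cut to 499 tokens so the full pair is exactly 512.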
vocab.txt ADDED
The diff for this file is too large to render.