---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:28539
- loss:MultipleNegativesRankingLoss
base_model: laion/clap-htsat-unfused
widget:
- source_sentence: HE DECIDED TO WRITE HER CARE OF THE WEST SIDE POST OFFICE AND ASK
    FOR AN EXPLANATION AS WELL AS TO HAVE HER MEET HIM
  sentences:
  - GRADUALLY RELIEF CAME TO ALL OF US
  - IT SEEMED AS IF HIS FAMILY TROUBLES WERE JUST BEGINNING
  - I EXPLAINED TO ANTONIA HOW THIS MEANT THAT HE WAS TWENTY FOUR YEARS OLD THAT HE
    MUST HAVE BEEN THERE WHEN WHITE MEN FIRST CAME LEFT ON FROM BUFFALO AND INDIAN
    TIMES
- source_sentence: WITHOUT A WORD PETER GOT UP AND LIT HIS LANTERN
  sentences:
  - AS LEADING TO THE MENTION OF OTHER INTERESTING EVENTS WE MUST SET THIS INROAD
    CLEARLY BEFORE THE READER
  - SHE WANTED TO MAKE SOME REFERENCE TO THEIR RELATIONS UPON THE TRAIN BUT WAS TOO
    TIMID
  - THE DISTINGUISHING MARK OF THE HENS WAS A CREST OF LAMENTABLY SCANTY GROWTH IN
    THESE LATTER DAYS BUT SO ODDLY AND WICKEDLY ANALOGOUS TO HEPZIBAH'S TURBAN THAT
    PHOEBE TO THE POIGNANT DISTRESS OF HER CONSCIENCE BUT INEVITABLY WAS LED TO FANCY
    A GENERAL RESEMBLANCE BETWIXT THESE FORLORN BIPEDS AND HER RESPECTABLE RELATIVE
- source_sentence: NOTHING COULD BE MORE NATURAL THAN SUCH AN ASSEMBLY IN SUCH A PLACE
    AT SUCH A PERIOD
  sentences:
  - BUT HE COMPROMISED BY TELLING THE BOY THAT THERE WOULD BE NO REPLY
  - MANY LITTLE WRINKLES GATHERED BETWEEN HIS EYES AS HE CONTEMPLATED THIS AND HIS
    BROW MOISTENED
  - HE DID MANAGE TO BRING HIMSELF INTO THE MOOD TO GO OUT TO CARRIE BUT WHEN HE GOT
    IN OGDEN PLACE HE THOUGHT HE SAW A MAN WATCHING HIM AND WENT AWAY
- source_sentence: DEAR SIR WE BEG TO INFORM YOU THAT WE ARE INSTRUCTED TO WAIT UNTIL
    TO MORROW THURSDAY AT ONE O'CLOCK BEFORE FILING SUIT AGAINST YOU ON BEHALF OF
    MISSUS JULIA HURSTWOOD FOR DIVORCE AND ALIMONY
  sentences:
  - THE WHITE DOUBLE ROSEBUSH HAD EVIDENTLY BEEN PROPPED UP ANEW AGAINST THE HOUSE
    SINCE THE COMMENCEMENT OF THE SEASON AND A PEAR TREE AND THREE DAMSON TREES WHICH
    EXCEPT A ROW OF CURRANT BUSHES CONSTITUTED THE ONLY VARIETIES OF FRUIT BORE MARKS
    OF THE RECENT AMPUTATION OF SEVERAL SUPERFLUOUS OR DEFECTIVE LIMBS
  - LASTLY THE ROYAL BROTHERS FELL THEMSELVES VICTIMS TO THE EPIDEMIC WHICH SO SADLY
    SIGNALIZES THEIR REIGN
  - IT IS LIKE A BANDAGE OVER ONE'S EYES TO COME INTO IT
- source_sentence: HERE THE HOLY PRELATE OF FERNS MET HIM AND RELATED A VISION IN
    WHICH HE HAD BEEN INSTRUCTED TO DEMAND THE ABOLITION OF THE IMPOST
  sentences:
  - THE SHARP SMELL OF SPIRITS WENT THROUGH THE ROOM
  - YES HOW MANY
  - QUICKLY IT WAS COVERED WITH BRIGHT RED SPOTS I THOUGHT I HAD NEVER SEEN ANY BLOOD
    SO BRIGHT
datasets:
- openslr/librispeech_asr
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
co2_eq_emissions:
  emissions: 578.4000971210925
  energy_consumed: 2.161257658642011
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 7.59
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CLAP model trained on LibriSpeech
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: librispeech eval
      type: librispeech-eval
    metrics:
    - type: cosine_accuracy@1
      value: 0.245
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.52
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.645
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.785
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.245
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.1733333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.12899999999999998
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.0785
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.245
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.52
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.645
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.785
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.503027364772325
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.41403968253968265
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.4252888359623941
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: librispeech test
      type: librispeech-test
    metrics:
    - type: cosine_accuracy@1
      value: 0.04885496183206107
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.1183206106870229
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.16908396946564885
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.2641221374045801
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.04885496183206107
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.03944020356234096
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.033816793893129776
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.026412213740458015
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.04885496183206107
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.1183206106870229
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.16908396946564885
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.2641221374045801
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.1402219692077291
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.10268266085059953
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.11950657997396778
      name: Cosine Map@100
---

# CLAP model trained on LibriSpeech

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [laion/clap-htsat-unfused](https://huggingface.co/laion/clap-htsat-unfused) on the [librispeech_asr](https://huggingface.co/datasets/openslr/librispeech_asr) dataset. It maps both text and audio into a shared 512-dimensional dense vector space, so a spoken recording and its transcript can be compared directly. It can be used for audio-to-text and text-to-audio retrieval, semantic search, clustering, and similar tasks.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [laion/clap-htsat-unfused](https://huggingface.co/laion/clap-htsat-unfused) <!-- at revision 8fa0f1c6d0433df6e97c127f64b2a1d6c0dcda8a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 512 dimensions
- **Similarity Function:** Cosine Similarity
- **Supported Modalities:** Text, Audio
- **Training Dataset:**
    - [librispeech_asr](https://huggingface.co/datasets/openslr/librispeech_asr)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'get_text_features', 'method_output_name': 'pooler_output'}, 'audio': {'method': 'get_audio_features', 'method_output_name': 'pooler_output'}}, 'module_output_name': 'sentence_embedding', 'message_format': 'auto', 'architecture': 'ClapModel'})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference. In this example, the inputs are URLs of audio files:
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs")
# Run inference on three audio clips, passed as URLs
inputs = [
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_0.wav',
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_1.wav',
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_2.wav',
]
embeddings = model.encode(inputs)
print(embeddings.shape)
# (3, 512)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4362, 0.6843],
#         [0.4362, 1.0000, 0.2179],
#         [0.6843, 0.2179, 1.0000]])
```
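
`model.similarity` applies the model's configured similarity function, which is cosine similarity for this model. As a minimal sketch of what that computes, the NumPy snippet below normalizes each embedding to unit length and takes pairwise dot products. The random vectors are stand-ins for real `model.encode` output, and `cosine_similarity_matrix` is a hypothetical helper, not part of the Sentence Transformers API.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `a` and the rows of `b`."""
    # Normalize every row to unit length so that a dot product equals the cosine
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Stand-in for `model.encode(inputs)`: three 512-dimensional embeddings
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 512)).astype(np.float32)

similarities = cosine_similarity_matrix(embeddings, embeddings)
print(similarities.shape)  # (3, 3)

# For retrieval, rank the documents (columns) for each query (row) by descending similarity
ranking = np.argsort(-similarities, axis=1)
```

For cross-modal search, the same matrix would be computed between text embeddings (rows) and audio embeddings (columns), and the top entries of each row are the retrieved clips.
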

## Evaluation

### Metrics

#### Information Retrieval

* Datasets: `librispeech-eval` and `librispeech-test`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.sentence_transformer.evaluation.InformationRetrievalEvaluator)

| Metric              | librispeech-eval | librispeech-test |
|:--------------------|:-----------------|:-----------------|
| cosine_accuracy@1   | 0.245            | 0.0489           |
| cosine_accuracy@3   | 0.52             | 0.1183           |
| cosine_accuracy@5   | 0.645            | 0.1691           |
| cosine_accuracy@10  | 0.785            | 0.2641           |
| cosine_precision@1  | 0.245            | 0.0489           |
| cosine_precision@3  | 0.1733           | 0.0394           |
| cosine_precision@5  | 0.129            | 0.0338           |
| cosine_precision@10 | 0.0785           | 0.0264           |
| cosine_recall@1     | 0.245            | 0.0489           |
| cosine_recall@3     | 0.52             | 0.1183           |
| cosine_recall@5     | 0.645            | 0.1691           |
| cosine_recall@10    | 0.785            | 0.2641           |
| **cosine_ndcg@10**  | **0.503**        | **0.1402**       |
| cosine_mrr@10       | 0.414            | 0.1027           |
| cosine_map@100      | 0.4253           | 0.1195           |
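
The numbers in this table are consistent with each query having exactly one relevant document: accuracy@k equals recall@k, and precision@k equals accuracy@k divided by k (for example, 0.52 / 3 ≈ 0.1733 on librispeech-eval). Under that assumption, every metric reduces to a function of the rank at which the relevant item is retrieved. The sketch below computes them from a list of 1-based ranks; `retrieval_metrics` is a hypothetical helper written for illustration, not the `InformationRetrievalEvaluator` source, and the rank list is made up.

```python
import math
from typing import Dict, List, Optional

def retrieval_metrics(ranks: List[Optional[int]]) -> Dict[str, float]:
    """Retrieval metrics when each query has exactly one relevant document.

    `ranks` holds the 1-based rank of the relevant document for each query,
    or None if it was not retrieved at all.
    """
    n = len(ranks)
    found = [r for r in ranks if r is not None]
    metrics = {}
    for k in (1, 3, 5, 10):
        hits = sum(1 for r in found if r <= k)
        metrics[f"accuracy@{k}"] = hits / n
        metrics[f"recall@{k}"] = hits / n           # one relevant doc: recall == accuracy
        metrics[f"precision@{k}"] = hits / (n * k)  # at most 1 of the top k can be relevant
    # Reciprocal rank, counted only if the relevant doc is in the top 10
    metrics["mrr@10"] = sum(1 / r for r in found if r <= 10) / n
    # With a single relevant doc the ideal DCG is 1, so nDCG = 1 / log2(rank + 1)
    metrics["ndcg@10"] = sum(1 / math.log2(r + 1) for r in found if r <= 10) / n
    # Average precision for a single relevant doc is 1 / rank
    metrics["map@100"] = sum(1 / r for r in found if r <= 100) / n
    return metrics

print(retrieval_metrics([1, 2, 4, 20]))
```

nDCG@10, the headline metric in bold, discounts lower ranks less steeply than MRR@10: a hit at rank 2 contributes 1/log2(3) ≈ 0.63 to nDCG but 0.5 to MRR.
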

## Training Details

### Training Dataset

#### librispeech_asr

* Dataset: [librispeech_asr](https://huggingface.co/datasets/openslr/librispeech_asr) at [71cacbf](https://huggingface.co/datasets/openslr/librispeech_asr/tree/71cacbfb7e2354c4226d01e70d77d5fca3d04ba1)
* Size: 28,539 training samples
* Columns: <code>audio</code> and <code>text</code>
* Approximate statistics based on the first 1000 samples:
  |         | audio                                                                                                 | text                                                                               |
  |:--------|:------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
  | type    | audio                                                                                                 | string                                                                             |
  | details | <ul><li>min: 1.95s</li><li>mean: 12.68s</li><li>max: 17.21s</li><li>sampling_rate: 48000 Hz</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 64.9 tokens</li><li>max: 101 tokens</li></ul> |
* Samples:
  | audio                                                                                      | text                                                                                                                                                                                                                                |
  |:-------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <audio controls src="https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/example_audio_0.wav"><code>&lt;audio 14.53s&gt;</code></audio> | <code>CHAPTER SIXTEEN I MIGHT HAVE TOLD YOU OF THE BEGINNING OF THIS LIAISON IN A FEW LINES BUT I WANTED YOU TO SEE EVERY STEP BY WHICH WE CAME I TO AGREE TO WHATEVER MARGUERITE WISHED</code>                                     |
  | <audio controls src="https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/example_audio_1.wav"><code>&lt;audio 16.09s&gt;</code></audio> | <code>MARGUERITE TO BE UNABLE TO LIVE APART FROM ME IT WAS THE DAY AFTER THE EVENING WHEN SHE CAME TO SEE ME THAT I SENT HER MANON LESCAUT FROM THAT TIME SEEING THAT I COULD NOT CHANGE MY MISTRESS'S LIFE I CHANGED MY OWN</code> |
  | <audio controls src="https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/example_audio_2.wav"><code>&lt;audio 13.29s&gt;</code></audio> | <code>I WISHED ABOVE ALL NOT TO LEAVE MYSELF TIME TO THINK OVER THE POSITION I HAD ACCEPTED FOR IN SPITE OF MYSELF IT WAS A GREAT DISTRESS TO ME THUS MY LIFE GENERALLY SO CALM</code>                                              |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false,
      "directions": [
          "query_to_doc",
          "doc_to_query"
      ],
      "partition_mode": "per_direction",
      "hardness_mode": null,
      "hardness_strength": 0.0
  }
  ```
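
With these parameters, the loss scores every audio/text pair in a batch by cosine similarity multiplied by `scale` (20.0), then applies a cross-entropy loss in which the matching pair is the correct class and the other in-batch items act as negatives, in both directions (`query_to_doc` and `doc_to_query`). The NumPy sketch below illustrates that idea under stated assumptions: each direction gets its own softmax (our reading of `partition_mode: per_direction`) and the two directions are combined by a simple average. The library's actual implementation may differ, and `bidirectional_mnrl` is a hypothetical name.

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def diagonal_cross_entropy(logits: np.ndarray) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    # Subtract the row max for numerical stability before the log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def bidirectional_mnrl(query_emb: np.ndarray, doc_emb: np.ndarray, scale: float = 20.0) -> float:
    """Sketch of an in-batch-negatives ranking loss applied in both directions."""
    scores = scale * cos_sim(query_emb, doc_emb)     # (batch, batch) scaled similarities
    query_to_doc = diagonal_cross_entropy(scores)    # each query must pick its own doc
    doc_to_query = diagonal_cross_entropy(scores.T)  # each doc must pick its own query
    return (query_to_doc + doc_to_query) / 2         # assumption: simple average

# Toy batch of 4 pairs: perfectly aligned vs. fully mismatched embeddings
audio = np.eye(4)
text_aligned = np.eye(4)
text_mismatched = np.eye(4)[::-1]
print(bidirectional_mnrl(audio, text_aligned))     # close to 0
print(bidirectional_mnrl(audio, text_mismatched))  # close to 20 (= scale)
```

Because both directions are averaged, the sketch gives the same value whichever column is treated as the query. With `per_device_train_batch_size` set to 4, each pair is contrasted against only three in-batch negatives per direction, and the `no_duplicates` batch sampler is intended to keep identical texts out of the same batch, where they would otherwise be scored as false negatives.
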

### Evaluation Dataset

#### librispeech_asr

* Dataset: [librispeech_asr](https://huggingface.co/datasets/openslr/librispeech_asr) at [71cacbf](https://huggingface.co/datasets/openslr/librispeech_asr/tree/71cacbfb7e2354c4226d01e70d77d5fca3d04ba1)
* Size: 200 evaluation samples
* Columns: <code>audio</code> and <code>text</code>
* Approximate statistics based on the first 200 samples:
  |         | audio                                                                                                | text                                                                               |
  |:--------|:-----------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
  | type    | audio                                                                                                | string                                                                             |
  | details | <ul><li>min: 1.56s</li><li>mean: 6.41s</li><li>max: 24.03s</li><li>sampling_rate: 48000 Hz</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 36.31 tokens</li><li>max: 129 tokens</li></ul> |
* Samples:
  | audio                                                                             | text                                                                                                                                         |
  |:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------|
  | <audio controls src="https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_0.wav"><code>&lt;audio 6.59s&gt;</code></audio> | <code>HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE</code>               |
  | <audio controls src="https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_1.wav"><code>&lt;audio 7.14s&gt;</code></audio> | <code>HE WOULD HAVE TO PAY HER THE MONEY WHICH SHE WOULD NOW REGULARLY DEMAND OR THERE WOULD BE TROUBLE IT DID NOT MATTER WHAT HE DID</code> |
  | <audio controls src="https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_2.wav"><code>&lt;audio 4.83s&gt;</code></audio> | <code>HURSTWOOD WALKED THE FLOOR MENTALLY ARRANGING THE CHIEF POINTS OF HIS SITUATION</code>                                                 |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false,
      "directions": [
          "query_to_doc",
          "doc_to_query"
      ],
      "partition_mode": "per_direction",
      "hardness_mode": null,
      "hardness_strength": 0.0
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 4
- `num_train_epochs`: 5
- `learning_rate`: 2e-05
- `warmup_steps`: 0.1
- `bf16`: True
- `eval_strategy`: steps
- `per_device_eval_batch_size`: 4
- `batch_sampler`: no_duplicates
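
These settings line up with the training logs: with 28,539 training samples and a batch size of 4, one epoch is ceil(28,539 / 4) = 7,135 steps, so step 7,140 falls at epoch 7,140 / 7,135 ≈ 1.0007, as logged. The sketch below reproduces that arithmetic along with a linear warmup-then-decay learning-rate schedule. Interpreting `warmup_steps: 0.1` as a fraction of all steps, rounded up, is an assumption about how this Transformers version handles fractional values, and `linear_lr` is a hypothetical helper rather than the Transformers scheduler itself.

```python
import math

num_samples = 28_539
batch_size = 4          # per_device_train_batch_size on a single GPU
num_epochs = 5
peak_lr = 2e-5
warmup_fraction = 0.1   # assumption: `warmup_steps: 0.1` means 10% of all steps

# dataloader_drop_last=False, so the final partial batch still counts as a step
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(warmup_fraction * total_steps)

def linear_lr(step: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)  # 7135 35675 3568
print(round(7140 / steps_per_epoch, 4))             # 1.0007, the epoch logged at step 7140
```
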

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `per_device_train_batch_size`: 4
- `num_train_epochs`: 5
- `max_steps`: -1
- `learning_rate`: 2e-05
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_steps`: 0.1
- `optim`: adamw_torch_fused
- `optim_args`: None
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `optim_target_modules`: None
- `gradient_accumulation_steps`: 1
- `average_tokens_across_devices`: True
- `max_grad_norm`: 1.0
- `label_smoothing_factor`: 0.0
- `bf16`: True
- `fp16`: False
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `use_cache`: False
- `neftune_noise_alpha`: None
- `torch_empty_cache_steps`: None
- `auto_find_batch_size`: False
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `include_num_input_tokens_seen`: no
- `log_level`: passive
- `log_level_replica`: warning
- `disable_tqdm`: False
- `project`: huggingface
- `trackio_space_id`: trackio
- `eval_strategy`: steps
- `per_device_eval_batch_size`: 4
- `prediction_loss_only`: True
- `eval_on_start`: False
- `eval_do_concat_batches`: True
- `eval_use_gather_object`: False
- `eval_accumulation_steps`: None
- `include_for_metrics`: []
- `batch_eval_metrics`: False
- `save_only_model`: False
- `save_on_each_node`: False
- `enable_jit_checkpoint`: False
- `push_to_hub`: False
- `hub_private_repo`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_always_push`: False
- `hub_revision`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `restore_callback_states_from_checkpoint`: False
- `full_determinism`: False
- `seed`: 42
- `data_seed`: None
- `use_cpu`: False
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `dataloader_prefetch_factor`: None
- `remove_unused_columns`: True
- `label_names`: None
- `train_sampling_strategy`: random
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `ddp_backend`: None
- `ddp_timeout`: 1800
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `deepspeed`: None
- `debug`: []
- `skip_memory_metrics`: True
- `do_predict`: False
- `resume_from_checkpoint`: None
- `warmup_ratio`: None
- `local_rank`: -1
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

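### Training Logs

Rows with an epoch and step of -1 are evaluations run outside the training loop; the last of these matches the results in the Evaluation section. Because `load_best_model_at_end` is False, those results belong to the model at the end of training rather than to the best intermediate checkpoint. Validation Loss is the MultipleNegativesRankingLoss on the 200-sample evaluation set.
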
| Epoch  | Step  | Training Loss | Validation Loss | librispeech-eval_cosine_ndcg@10 | librispeech-test_cosine_ndcg@10 |
|:------:|:-----:|:-------------:|:---------------:|:-------------------------------:|:-------------------------------:|
| -1     | -1    | -             | -               | 0.0279                          | 0.0037                          |
| 0.1001 | 714   | 1.4538        | 1.1503          | 0.0727                          | -                               |
| 0.2001 | 1428  | 0.9953        | 0.8749          | 0.0841                          | -                               |
| 0.3002 | 2142  | 0.9557        | 0.7760          | 0.1252                          | -                               |
| 0.4003 | 2856  | 0.9621        | 2.4026          | 0.0353                          | -                               |
| 0.5004 | 3570  | 0.9721        | 0.9326          | 0.0720                          | -                               |
| 0.6004 | 4284  | 0.8931        | 0.8454          | 0.0934                          | -                               |
| 0.7005 | 4998  | 0.8368        | 0.5494          | 0.1741                          | -                               |
| 0.8006 | 5712  | 0.8001        | 0.4935          | 0.2170                          | -                               |
| 0.9006 | 6426  | 0.7817        | 0.7168          | 0.1476                          | -                               |
| 1.0007 | 7140  | 0.7235        | 0.6410          | 0.1809                          | -                               |
| 1.1008 | 7854  | 0.6620        | 0.6527          | 0.1726                          | -                               |
| 1.2008 | 8568  | 0.6492        | 0.4146          | 0.2116                          | -                               |
| 1.3009 | 9282  | 0.6342        | 0.7536          | 0.1695                          | -                               |
| 1.4010 | 9996  | 0.6438        | 0.6872          | 0.1873                          | -                               |
| 1.5011 | 10710 | 0.6103        | 0.4385          | 0.2767                          | -                               |
| 1.6011 | 11424 | 0.6052        | 0.8028          | 0.1805                          | -                               |
| 1.7012 | 12138 | 0.5950        | 0.3628          | 0.2891                          | -                               |
| 1.8013 | 12852 | 0.5672        | 0.6978          | 0.2120                          | -                               |
| 1.9013 | 13566 | 0.5611        | 0.5946          | 0.1965                          | -                               |
| 2.0014 | 14280 | 0.5546        | 0.2659          | 0.3589                          | -                               |
| 2.1015 | 14994 | 0.5133        | 0.4273          | 0.2806                          | -                               |
| 2.2015 | 15708 | 0.4588        | 0.4356          | 0.2929                          | -                               |
| 2.3016 | 16422 | 0.4629        | 0.5123          | 0.2538                          | -                               |
| 2.4017 | 17136 | 0.4429        | 0.3757          | 0.3092                          | -                               |
| 2.5018 | 17850 | 0.5000        | 0.4237          | 0.3297                          | -                               |
| 2.6018 | 18564 | 0.4328        | 0.5146          | 0.3291                          | -                               |
| 2.7019 | 19278 | 0.4284        | 0.3348          | 0.3483                          | -                               |
| 2.8020 | 19992 | 0.4598        | 0.3768          | 0.3865                          | -                               |
| 2.9020 | 20706 | 0.4183        | 0.3908          | 0.2594                          | -                               |
| 3.0021 | 21420 | 0.4180        | 0.3240          | 0.3470                          | -                               |
| 3.1022 | 22134 | 0.3624        | 0.3487          | 0.4205                          | -                               |
| 3.2022 | 22848 | 0.3627        | 0.3124          | 0.3650                          | -                               |
| 3.3023 | 23562 | 0.3651        | 0.3025          | 0.3046                          | -                               |
| 3.4024 | 24276 | 0.3644        | 0.3708          | 0.4050                          | -                               |
| 3.5025 | 24990 | 0.3480        | 0.3458          | 0.3998                          | -                               |
| 3.6025 | 25704 | 0.3542        | 0.2936          | 0.4141                          | -                               |
| 3.7026 | 26418 | 0.2954        | 0.2692          | 0.3876                          | -                               |
| 3.8027 | 27132 | 0.3336        | 0.2221          | 0.3915                          | -                               |
| 3.9027 | 27846 | 0.3255        | 0.3140          | 0.4253                          | -                               |
| 4.0028 | 28560 | 0.3093        | 0.2278          | 0.4607                          | -                               |
| 4.1029 | 29274 | 0.2715        | 0.3176          | 0.4261                          | -                               |
| 4.2029 | 29988 | 0.2812        | 0.2814          | 0.4590                          | -                               |
| 4.3030 | 30702 | 0.2690        | 0.2390          | 0.4997                          | -                               |
| 4.4031 | 31416 | 0.2697        | 0.2575          | 0.4720                          | -                               |
| 4.5032 | 32130 | 0.2616        | 0.3054          | 0.4863                          | -                               |
| 4.6032 | 32844 | 0.2437        | 0.2467          | 0.4852                          | -                               |
| 4.7033 | 33558 | 0.2532        | 0.2505          | 0.5196                          | -                               |
| 4.8034 | 34272 | 0.2640        | 0.2242          | 0.4926                          | -                               |
| 4.9034 | 34986 | 0.2245        | 0.2345          | 0.4999                          | -                               |
| -1     | -1    | -             | -               | 0.5030                          | 0.1402                          |


### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 2.161 kWh
- **Carbon Emitted**: 0.578 kg of CO2-equivalent
- **Hours Used**: 7.59 hours
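
The reported figures also imply an average power draw and a grid carbon intensity. The short calculation below derives both from the CodeCarbon values in the metadata; it assumes the metadata `emissions` value is in grams (consistent with the 0.578 kg above) and that `hours_used` covers the same period as `energy_consumed`.

```python
energy_kwh = 2.161257658642011   # energy_consumed
emissions_g = 578.4000971210925  # emissions, assumed to be grams of CO2-eq
hours = 7.59                     # hours_used

avg_power_kw = energy_kwh / hours            # mean draw over the run
carbon_intensity = emissions_g / energy_kwh  # grams of CO2-eq per kWh

print(f"average power draw: {avg_power_kw:.3f} kW")
print(f"carbon intensity: {carbon_intensity:.0f} g CO2-eq/kWh")
```
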

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 5.4.0.dev0
- Transformers: 5.3.0.dev0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.13.0.dev0
- Datasets: 4.3.0
- Tokenizers: 0.22.2

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{günther2024jinaembeddings28192token,
      title={Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents},
      author={Michael Günther and Jackmin Ong and Isabelle Mohr and Alaeddine Abdessalem and Tanguy Abel and Mohammad Kalim Akram and Susana Guzman and Georgios Mastrapas and Saba Sturua and Bo Wang and Maximilian Werk and Nan Wang and Han Xiao},
      year={2024},
      eprint={2310.19923},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2310.19923},
}
```
