tomaarsen HF Staff
metadata
language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:28539
  - loss:MultipleNegativesRankingLoss
base_model: laion/clap-htsat-unfused
widget:
  - source_sentence: >-
      HE DECIDED TO WRITE HER CARE OF THE WEST SIDE POST OFFICE AND ASK FOR AN
      EXPLANATION AS WELL AS TO HAVE HER MEET HIM
    sentences:
      - GRADUALLY RELIEF CAME TO ALL OF US
      - IT SEEMED AS IF HIS FAMILY TROUBLES WERE JUST BEGINNING
      - >-
        I EXPLAINED TO ANTONIA HOW THIS MEANT THAT HE WAS TWENTY FOUR YEARS OLD
        THAT HE MUST HAVE BEEN THERE WHEN WHITE MEN FIRST CAME LEFT ON FROM
        BUFFALO AND INDIAN TIMES
  - source_sentence: WITHOUT A WORD PETER GOT UP AND LIT HIS LANTERN
    sentences:
      - >-
        AS LEADING TO THE MENTION OF OTHER INTERESTING EVENTS WE MUST SET THIS
        INROAD CLEARLY BEFORE THE READER
      - >-
        SHE WANTED TO MAKE SOME REFERENCE TO THEIR RELATIONS UPON THE TRAIN BUT
        WAS TOO TIMID
      - >-
        THE DISTINGUISHING MARK OF THE HENS WAS A CREST OF LAMENTABLY SCANTY
        GROWTH IN THESE LATTER DAYS BUT SO ODDLY AND WICKEDLY ANALOGOUS TO
        HEPZIBAH'S TURBAN THAT PHOEBE TO THE POIGNANT DISTRESS OF HER CONSCIENCE
        BUT INEVITABLY WAS LED TO FANCY A GENERAL RESEMBLANCE BETWIXT THESE
        FORLORN BIPEDS AND HER RESPECTABLE RELATIVE
  - source_sentence: >-
      NOTHING COULD BE MORE NATURAL THAN SUCH AN ASSEMBLY IN SUCH A PLACE AT
      SUCH A PERIOD
    sentences:
      - BUT HE COMPROMISED BY TELLING THE BOY THAT THERE WOULD BE NO REPLY
      - >-
        MANY LITTLE WRINKLES GATHERED BETWEEN HIS EYES AS HE CONTEMPLATED THIS
        AND HIS BROW MOISTENED
      - >-
        HE DID MANAGE TO BRING HIMSELF INTO THE MOOD TO GO OUT TO CARRIE BUT
        WHEN HE GOT IN OGDEN PLACE HE THOUGHT HE SAW A MAN WATCHING HIM AND WENT
        AWAY
  - source_sentence: >-
      DEAR SIR WE BEG TO INFORM YOU THAT WE ARE INSTRUCTED TO WAIT UNTIL TO
      MORROW THURSDAY AT ONE O'CLOCK BEFORE FILING SUIT AGAINST YOU ON BEHALF OF
      MISSUS JULIA HURSTWOOD FOR DIVORCE AND ALIMONY
    sentences:
      - >-
        THE WHITE DOUBLE ROSEBUSH HAD EVIDENTLY BEEN PROPPED UP ANEW AGAINST THE
        HOUSE SINCE THE COMMENCEMENT OF THE SEASON AND A PEAR TREE AND THREE
        DAMSON TREES WHICH EXCEPT A ROW OF CURRANT BUSHES CONSTITUTED THE ONLY
        VARIETIES OF FRUIT BORE MARKS OF THE RECENT AMPUTATION OF SEVERAL
        SUPERFLUOUS OR DEFECTIVE LIMBS
      - >-
        LASTLY THE ROYAL BROTHERS FELL THEMSELVES VICTIMS TO THE EPIDEMIC WHICH
        SO SADLY SIGNALIZES THEIR REIGN
      - IT IS LIKE A BANDAGE OVER ONE'S EYES TO COME INTO IT
  - source_sentence: >-
      HERE THE HOLY PRELATE OF FERNS MET HIM AND RELATED A VISION IN WHICH HE
      HAD BEEN INSTRUCTED TO DEMAND THE ABOLITION OF THE IMPOST
    sentences:
      - THE SHARP SMELL OF SPIRITS WENT THROUGH THE ROOM
      - YES HOW MANY
      - >-
        QUICKLY IT WAS COVERED WITH BRIGHT RED SPOTS I THOUGHT I HAD NEVER SEEN
        ANY BLOOD SO BRIGHT
datasets:
  - openslr/librispeech_asr
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
co2_eq_emissions:
  emissions: 578.4000971210925
  energy_consumed: 2.161257658642011
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 7.59
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: CLAP model trained on LibriSpeech
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: librispeech eval
          type: librispeech-eval
        metrics:
          - type: cosine_accuracy@1
            value: 0.245
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.52
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.645
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.785
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.245
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.1733333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.12899999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0785
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.245
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.52
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.645
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.785
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.503027364772325
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.41403968253968265
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.4252888359623941
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: librispeech test
          type: librispeech-test
        metrics:
          - type: cosine_accuracy@1
            value: 0.04885496183206107
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.1183206106870229
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.16908396946564885
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.2641221374045801
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.04885496183206107
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.03944020356234096
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.033816793893129776
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.026412213740458015
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.04885496183206107
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.1183206106870229
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.16908396946564885
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.2641221374045801
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.1402219692077291
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.10268266085059953
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.11950657997396778
            name: Cosine Map@100

CLAP model trained on LibriSpeech

This is a sentence-transformers model fine-tuned from laion/clap-htsat-unfused on the librispeech_asr dataset. It maps text and audio to a shared 512-dimensional dense vector space and can be used for semantic similarity, semantic search, paraphrase mining, classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: laion/clap-htsat-unfused
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 512 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modalities: Text, Audio
  • Training Dataset: librispeech_asr
  • Language: en
  • License: apache-2.0
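
As a small illustration of the similarity function listed above, the sketch below computes cosine similarity in plain Python. The helper name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of this model card; real embeddings from this model have 512 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for 512-dimensional embeddings (hypothetical values).
u = [1.0, 0.0, 1.0]
v = [2.0, 0.0, 2.0]   # same direction as u, different length
w = [0.0, 3.0, 0.0]   # orthogonal to u

same_direction = cosine_similarity(u, v)   # close to 1: length is ignored
orthogonal = cosine_similarity(u, w)       # 0: no shared direction
```

Because only the direction matters, two embeddings with very different norms can still score close to 1.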

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'get_text_features', 'method_output_name': 'pooler_output'}, 'audio': {'method': 'get_audio_features', 'method_output_name': 'pooler_output'}}, 'module_output_name': 'sentence_embedding', 'message_format': 'auto', 'architecture': 'ClapModel'})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs")
# Run inference
inputs = [
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_0.wav',
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_1.wav',
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_2.wav',
]
embeddings = model.encode(inputs)
print(embeddings.shape)
# (3, 512)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4362, 0.6843],
#         [0.4362, 1.0000, 0.2179],
#         [0.6843, 0.2179, 1.0000]])

Evaluation

Metrics

Information Retrieval

Metric librispeech-eval librispeech-test
cosine_accuracy@1 0.245 0.0489
cosine_accuracy@3 0.52 0.1183
cosine_accuracy@5 0.645 0.1691
cosine_accuracy@10 0.785 0.2641
cosine_precision@1 0.245 0.0489
cosine_precision@3 0.1733 0.0394
cosine_precision@5 0.129 0.0338
cosine_precision@10 0.0785 0.0264
cosine_recall@1 0.245 0.0489
cosine_recall@3 0.52 0.1183
cosine_recall@5 0.645 0.1691
cosine_recall@10 0.785 0.2641
cosine_ndcg@10 0.503 0.1402
cosine_mrr@10 0.414 0.1027
cosine_map@100 0.4253 0.1195
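
The table shows recall@k equal to accuracy@k and precision@k equal to accuracy@k divided by k, which is what you would expect if every query has exactly one relevant document. The sketch below illustrates that relationship under that assumption; the function names and the example ranks are hypothetical and are not taken from the evaluation above.

```python
def retrieval_metrics(ranks, k):
    """Accuracy@k, precision@k and recall@k for queries with ONE relevant document each.

    ranks: 1-based rank of the relevant document for each query.
    """
    hits = sum(1 for r in ranks if r <= k)
    n = len(ranks)
    accuracy = hits / n          # fraction of queries with a hit in the top k
    precision = hits / (n * k)   # at most 1 of the k retrieved items is relevant
    recall = hits / n            # the single relevant item is either found or not
    return accuracy, precision, recall

def mrr_at_k(ranks, k):
    """Mean reciprocal rank, counting 0 when the relevant item is outside the top k."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)

# Hypothetical ranks for four queries.
ranks = [1, 3, 7, 12]
acc5, prec5, rec5 = retrieval_metrics(ranks, k=5)
mrr10 = mrr_at_k(ranks, k=10)
```

The same relationship can be read off the table: for librispeech-eval, cosine_precision@5 (0.129) is cosine_accuracy@5 (0.645) divided by 5.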

Training Details

Training Dataset

librispeech_asr

  • Dataset: librispeech_asr at 71cacbf
  • Size: 28,539 training samples
  • Columns: audio and text
  • Approximate statistics based on the first 1000 samples:
    • audio (type: audio): min 1.95s, mean 12.68s, max 17.21s, sampling_rate 48000 Hz
    • text (type: string): min 10 tokens, mean 64.9 tokens, max 101 tokens
  • Samples:
    audio text
    CHAPTER SIXTEEN I MIGHT HAVE TOLD YOU OF THE BEGINNING OF THIS LIAISON IN A FEW LINES BUT I WANTED YOU TO SEE EVERY STEP BY WHICH WE CAME I TO AGREE TO WHATEVER MARGUERITE WISHED
    MARGUERITE TO BE UNABLE TO LIVE APART FROM ME IT WAS THE DAY AFTER THE EVENING WHEN SHE CAME TO SEE ME THAT I SENT HER MANON LESCAUT FROM THAT TIME SEEING THAT I COULD NOT CHANGE MY MISTRESS'S LIFE I CHANGED MY OWN
    I WISHED ABOVE ALL NOT TO LEAVE MYSELF TIME TO THINK OVER THE POSITION I HAD ACCEPTED FOR IN SPITE OF MYSELF IT WAS A GREAT DISTRESS TO ME THUS MY LIFE GENERALLY SO CALM
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc",
            "doc_to_query"
        ],
        "partition_mode": "per_direction",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
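
To make the parameters above more concrete, here is a minimal sketch of a symmetric in-batch-negatives ranking loss with `scale` 20 and cosine similarity. It is not the library's implementation: the function names are hypothetical, and averaging the two directions is an assumption about how "query_to_doc" and "doc_to_query" are combined.

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cross_entropy_diagonal(scores):
    """Mean cross-entropy where row i's correct column is i."""
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[i]
    return total / len(scores)

def symmetric_in_batch_loss(queries, docs, scale=20.0):
    # scores[i][j] = scale * cos(query_i, doc_j); off-diagonal pairs act as negatives.
    scores = [[scale * cos_sim(q, d) for d in docs] for q in queries]
    transposed = [list(col) for col in zip(*scores)]
    q2d = cross_entropy_diagonal(scores)       # "query_to_doc" direction
    d2q = cross_entropy_diagonal(transposed)   # "doc_to_query" direction
    return (q2d + d2q) / 2  # assumption: the two directions are averaged

# Toy 2-dimensional embeddings (hypothetical values).
audio_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[0.9, 0.1], [0.1, 0.9]]   # each text matches its own audio
swapped_text = [[0.1, 0.9], [0.9, 0.1]]   # each text matches the other audio

loss_aligned = symmetric_in_batch_loss(audio_emb, aligned_text)
loss_swapped = symmetric_in_batch_loss(audio_emb, swapped_text)
```

The loss is near zero when each audio clip is most similar to its own transcript and grows when another item in the batch scores higher, which is why larger batches supply more negatives.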
    

Evaluation Dataset

librispeech_asr

  • Dataset: librispeech_asr at 71cacbf
  • Size: 200 evaluation samples
  • Columns: audio and text
  • Approximate statistics based on the first 200 samples:
    • audio (type: audio): min 1.56s, mean 6.41s, max 24.03s, sampling_rate 48000 Hz
    • text (type: string): min 6 tokens, mean 36.31 tokens, max 129 tokens
  • Samples:
    audio text
    HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE
    HE WOULD HAVE TO PAY HER THE MONEY WHICH SHE WOULD NOW REGULARLY DEMAND OR THERE WOULD BE TROUBLE IT DID NOT MATTER WHAT HE DID
    HURSTWOOD WALKED THE FLOOR MENTALLY ARRANGING THE CHIEF POINTS OF HIS SITUATION
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc",
            "doc_to_query"
        ],
        "partition_mode": "per_direction",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4
  • num_train_epochs: 5
  • learning_rate: 2e-05
  • warmup_steps: 0.1
  • bf16: True
  • eval_strategy: steps
  • per_device_eval_batch_size: 4
  • batch_sampler: no_duplicates

All Hyperparameters

  • per_device_train_batch_size: 4
  • num_train_epochs: 5
  • max_steps: -1
  • learning_rate: 2e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: steps
  • per_device_eval_batch_size: 4
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss librispeech-eval_cosine_ndcg@10 librispeech-test_cosine_ndcg@10
-1 -1 - - 0.0279 0.0037
0.1001 714 1.4538 1.1503 0.0727 -
0.2001 1428 0.9953 0.8749 0.0841 -
0.3002 2142 0.9557 0.7760 0.1252 -
0.4003 2856 0.9621 2.4026 0.0353 -
0.5004 3570 0.9721 0.9326 0.0720 -
0.6004 4284 0.8931 0.8454 0.0934 -
0.7005 4998 0.8368 0.5494 0.1741 -
0.8006 5712 0.8001 0.4935 0.2170 -
0.9006 6426 0.7817 0.7168 0.1476 -
1.0007 7140 0.7235 0.6410 0.1809 -
1.1008 7854 0.6620 0.6527 0.1726 -
1.2008 8568 0.6492 0.4146 0.2116 -
1.3009 9282 0.6342 0.7536 0.1695 -
1.4010 9996 0.6438 0.6872 0.1873 -
1.5011 10710 0.6103 0.4385 0.2767 -
1.6011 11424 0.6052 0.8028 0.1805 -
1.7012 12138 0.5950 0.3628 0.2891 -
1.8013 12852 0.5672 0.6978 0.2120 -
1.9013 13566 0.5611 0.5946 0.1965 -
2.0014 14280 0.5546 0.2659 0.3589 -
2.1015 14994 0.5133 0.4273 0.2806 -
2.2015 15708 0.4588 0.4356 0.2929 -
2.3016 16422 0.4629 0.5123 0.2538 -
2.4017 17136 0.4429 0.3757 0.3092 -
2.5018 17850 0.5000 0.4237 0.3297 -
2.6018 18564 0.4328 0.5146 0.3291 -
2.7019 19278 0.4284 0.3348 0.3483 -
2.8020 19992 0.4598 0.3768 0.3865 -
2.9020 20706 0.4183 0.3908 0.2594 -
3.0021 21420 0.4180 0.3240 0.3470 -
3.1022 22134 0.3624 0.3487 0.4205 -
3.2022 22848 0.3627 0.3124 0.3650 -
3.3023 23562 0.3651 0.3025 0.3046 -
3.4024 24276 0.3644 0.3708 0.4050 -
3.5025 24990 0.3480 0.3458 0.3998 -
3.6025 25704 0.3542 0.2936 0.4141 -
3.7026 26418 0.2954 0.2692 0.3876 -
3.8027 27132 0.3336 0.2221 0.3915 -
3.9027 27846 0.3255 0.3140 0.4253 -
4.0028 28560 0.3093 0.2278 0.4607 -
4.1029 29274 0.2715 0.3176 0.4261 -
4.2029 29988 0.2812 0.2814 0.4590 -
4.3030 30702 0.2690 0.2390 0.4997 -
4.4031 31416 0.2697 0.2575 0.4720 -
4.5032 32130 0.2616 0.3054 0.4863 -
4.6032 32844 0.2437 0.2467 0.4852 -
4.7033 33558 0.2532 0.2505 0.5196 -
4.8034 34272 0.2640 0.2242 0.4926 -
4.9034 34986 0.2245 0.2345 0.4999 -
-1 -1 - - 0.5030 0.1402
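
The Epoch column in the log can be reproduced from the step count, the 28,539 training samples, and the batch size of 4. The sketch below assumes one optimizer step per batch on a single device (gradient_accumulation_steps is 1) and that the final partial batch is kept; the no_duplicates batch sampler could shift the exact count slightly, so treat this as an approximation.

```python
import math

TRAIN_SAMPLES = 28_539   # from "Size: 28,539 training samples"
BATCH_SIZE = 4           # per_device_train_batch_size, single GPU
EPOCHS = 5               # num_train_epochs

# dataloader_drop_last is False, so the final partial batch still counts as a step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

first_logged = epoch_at(714)      # first row of the log
last_logged = epoch_at(34986)     # last row with a training loss
```

Under these assumptions the evaluation interval of 714 steps corresponds to roughly a tenth of an epoch, matching the spacing of the rows above.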

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 2.161 kWh
  • Carbon Emitted: 0.578 kg of CO2
  • Hours Used: 7.59 hours
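
Two derived quantities can help put these figures in context: the average power draw over the run and the carbon intensity implied by the measurement. The sketch below computes both from the rounded values listed above; the variable names are illustrative, and the results are only as precise as those rounded inputs.

```python
ENERGY_KWH = 2.161     # Energy Consumed
EMISSIONS_KG = 0.578   # Carbon Emitted
HOURS = 7.59           # Hours Used

# Average draw of the measured hardware over the run, in watts.
average_power_w = ENERGY_KWH / HOURS * 1000

# Implied carbon intensity of the electricity, in grams of CO2 per kWh.
carbon_intensity_g_per_kwh = EMISSIONS_KG * 1000 / ENERGY_KWH
```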

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 5.4.0.dev0
  • Transformers: 5.3.0.dev0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0.dev0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{günther2024jinaembeddings28192token,
      title={Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents},
      author={Michael Günther and Jackmin Ong and Isabelle Mohr and Alaeddine Abdessalem and Tanguy Abel and Mohammad Kalim Akram and Susana Guzman and Georgios Mastrapas and Saba Sturua and Bo Wang and Maximilian Werk and Nan Wang and Han Xiao},
      year={2024},
      eprint={2310.19923},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2310.19923},
}