
language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:28539
  - loss:MultipleNegativesRankingLoss
base_model: laion/clap-htsat-unfused
widget:
  - source_sentence: >-
      HE DECIDED TO WRITE HER CARE OF THE WEST SIDE POST OFFICE AND ASK FOR AN
      EXPLANATION AS WELL AS TO HAVE HER MEET HIM
    sentences:
      - GRADUALLY RELIEF CAME TO ALL OF US
      - IT SEEMED AS IF HIS FAMILY TROUBLES WERE JUST BEGINNING
      - >-
        I EXPLAINED TO ANTONIA HOW THIS MEANT THAT HE WAS TWENTY FOUR YEARS OLD
        THAT HE MUST HAVE BEEN THERE WHEN WHITE MEN FIRST CAME LEFT ON FROM
        BUFFALO AND INDIAN TIMES
  - source_sentence: WITHOUT A WORD PETER GOT UP AND LIT HIS LANTERN
    sentences:
      - >-
        AS LEADING TO THE MENTION OF OTHER INTERESTING EVENTS WE MUST SET THIS
        INROAD CLEARLY BEFORE THE READER
      - >-
        SHE WANTED TO MAKE SOME REFERENCE TO THEIR RELATIONS UPON THE TRAIN BUT
        WAS TOO TIMID
      - >-
        THE DISTINGUISHING MARK OF THE HENS WAS A CREST OF LAMENTABLY SCANTY
        GROWTH IN THESE LATTER DAYS BUT SO ODDLY AND WICKEDLY ANALOGOUS TO
        HEPZIBAH'S TURBAN THAT PHOEBE TO THE POIGNANT DISTRESS OF HER CONSCIENCE
        BUT INEVITABLY WAS LED TO FANCY A GENERAL RESEMBLANCE BETWIXT THESE
        FORLORN BIPEDS AND HER RESPECTABLE RELATIVE
  - source_sentence: >-
      NOTHING COULD BE MORE NATURAL THAN SUCH AN ASSEMBLY IN SUCH A PLACE AT
      SUCH A PERIOD
    sentences:
      - BUT HE COMPROMISED BY TELLING THE BOY THAT THERE WOULD BE NO REPLY
      - >-
        MANY LITTLE WRINKLES GATHERED BETWEEN HIS EYES AS HE CONTEMPLATED THIS
        AND HIS BROW MOISTENED
      - >-
        HE DID MANAGE TO BRING HIMSELF INTO THE MOOD TO GO OUT TO CARRIE BUT
        WHEN HE GOT IN OGDEN PLACE HE THOUGHT HE SAW A MAN WATCHING HIM AND WENT
        AWAY
  - source_sentence: >-
      DEAR SIR WE BEG TO INFORM YOU THAT WE ARE INSTRUCTED TO WAIT UNTIL TO
      MORROW THURSDAY AT ONE O'CLOCK BEFORE FILING SUIT AGAINST YOU ON BEHALF OF
      MISSUS JULIA HURSTWOOD FOR DIVORCE AND ALIMONY
    sentences:
      - >-
        THE WHITE DOUBLE ROSEBUSH HAD EVIDENTLY BEEN PROPPED UP ANEW AGAINST THE
        HOUSE SINCE THE COMMENCEMENT OF THE SEASON AND A PEAR TREE AND THREE
        DAMSON TREES WHICH EXCEPT A ROW OF CURRANT BUSHES CONSTITUTED THE ONLY
        VARIETIES OF FRUIT BORE MARKS OF THE RECENT AMPUTATION OF SEVERAL
        SUPERFLUOUS OR DEFECTIVE LIMBS
      - >-
        LASTLY THE ROYAL BROTHERS FELL THEMSELVES VICTIMS TO THE EPIDEMIC WHICH
        SO SADLY SIGNALIZES THEIR REIGN
      - IT IS LIKE A BANDAGE OVER ONE'S EYES TO COME INTO IT
  - source_sentence: >-
      HERE THE HOLY PRELATE OF FERNS MET HIM AND RELATED A VISION IN WHICH HE
      HAD BEEN INSTRUCTED TO DEMAND THE ABOLITION OF THE IMPOST
    sentences:
      - THE SHARP SMELL OF SPIRITS WENT THROUGH THE ROOM
      - YES HOW MANY
      - >-
        QUICKLY IT WAS COVERED WITH BRIGHT RED SPOTS I THOUGHT I HAD NEVER SEEN
        ANY BLOOD SO BRIGHT
datasets:
  - openslr/librispeech_asr
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
co2_eq_emissions:
  emissions: 578.4000971210925
  energy_consumed: 2.161257658642011
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 7.59
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: CLAP model trained on LibriSpeech
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: librispeech eval
          type: librispeech-eval
        metrics:
          - type: cosine_accuracy@1
            value: 0.245
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.52
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.645
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.785
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.245
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.1733333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.12899999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0785
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.245
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.52
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.645
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.785
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.503027364772325
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.41403968253968265
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.4252888359623941
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: librispeech test
          type: librispeech-test
        metrics:
          - type: cosine_accuracy@1
            value: 0.04885496183206107
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.1183206106870229
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.16908396946564885
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.2641221374045801
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.04885496183206107
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.03944020356234096
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.033816793893129776
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.026412213740458015
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.04885496183206107
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.1183206106870229
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.16908396946564885
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.2641221374045801
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.1402219692077291
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.10268266085059953
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.11950657997396778
            name: Cosine Map@100

CLAP model trained on LibriSpeech

This is a sentence-transformers model fine-tuned from laion/clap-htsat-unfused on the librispeech_asr dataset. It maps both text and audio into a shared 512-dimensional dense vector space and can be used for audio-text retrieval, semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: laion/clap-htsat-unfused
Maximum Sequence Length: 512 tokens
Output Dimensionality: 512 dimensions
Similarity Function: Cosine Similarity
Supported Modalities: Text, Audio
Training Dataset:
- librispeech_asr
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'get_text_features', 'method_output_name': 'pooler_output'}, 'audio': {'method': 'get_audio_features', 'method_output_name': 'pooler_output'}}, 'module_output_name': 'sentence_embedding', 'message_format': 'auto', 'architecture': 'ClapModel'})
)

Usage

Direct Usage (Sentence Transformers)

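First install the Sentence Transformers library. This model was trained with the development version 5.4.0.dev0 (see Framework Versions), so using it with audio inputs may require a correspondingly recent release: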

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs")
# Run inference
inputs = [
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_0.wav',
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_1.wav',
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_2.wav',
]
embeddings = model.encode(inputs)
print(embeddings.shape)
# (3, 512)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4362, 0.6843],
#         [0.4362, 1.0000, 0.2179],
#         [0.6843, 0.2179, 1.0000]])

Evaluation

Metrics

Information Retrieval

Datasets: librispeech-eval and librispeech-test
Evaluated with InformationRetrievalEvaluator

Metric	librispeech-eval	librispeech-test
cosine_accuracy@1	0.245	0.0489
cosine_accuracy@3	0.52	0.1183
cosine_accuracy@5	0.645	0.1691
cosine_accuracy@10	0.785	0.2641
cosine_precision@1	0.245	0.0489
cosine_precision@3	0.1733	0.0394
cosine_precision@5	0.129	0.0338
cosine_precision@10	0.0785	0.0264
cosine_recall@1	0.245	0.0489
cosine_recall@3	0.52	0.1183
cosine_recall@5	0.645	0.1691
cosine_recall@10	0.785	0.2641
cosine_ndcg@10	0.503	0.1402
cosine_mrr@10	0.414	0.1027
cosine_map@100	0.4253	0.1195
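The table is consistent with exactly one relevant item per query: precision@k equals accuracy@k divided by k (for example, 0.52 / 3 ≈ 0.1733), and recall@k equals accuracy@k. The librispeech-test values are also exact multiples of 1/2620 (for example, 128/2620 ≈ 0.0489), which fits 2,620 test queries. Under that single-relevant-item assumption, every metric except MAP@100 reduces to a function of the rank of the correct item. The sketch below illustrates this; `single_relevant_metrics` and the `ranks` input are hypothetical names, and this is not the InformationRetrievalEvaluator implementation.

```python
import math
from typing import Dict, Optional, Sequence


def single_relevant_metrics(ranks: Sequence[Optional[int]], k: int) -> Dict[str, float]:
    """IR metrics for the case where each query has exactly one relevant item.

    ranks[i] is the 1-based rank of query i's relevant item, or None if it was not retrieved.
    """
    n = len(ranks)
    # A query counts as a hit if its relevant item appears in the top k.
    hit_ranks = [r for r in ranks if r is not None and r <= k]
    accuracy = len(hit_ranks) / n
    return {
        f"accuracy@{k}": accuracy,
        # At most one relevant item fits in the top k, so per-query precision is hit / k.
        f"precision@{k}": accuracy / k,
        # There is one relevant item in total, so recall equals accuracy.
        f"recall@{k}": accuracy,
        # Reciprocal rank, counted only when the item is inside the top k.
        f"mrr@{k}": sum(1 / r for r in hit_ranks) / n,
        # Ideal DCG is 1 for a single relevant item, so NDCG reduces to 1 / log2(rank + 1).
        f"ndcg@{k}": sum(1 / math.log2(r + 1) for r in hit_ranks) / n,
    }


# Hypothetical ranks for four queries; the fourth relevant item was never retrieved.
print(single_relevant_metrics([1, 2, 4, None], k=3))

# The same relationship reproduces the table: precision@3 = accuracy@3 / 3.
print(round(0.52 / 3, 4))  # 0.1733 (librispeech-eval)
```

This also explains why precision drops as k grows while accuracy and recall rise: the single correct item is divided over more retrieved slots.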

Training Details

Training Dataset

librispeech_asr

Dataset: librispeech_asr at 71cacbf
Size: 28,539 training samples
Columns: audio and text
Approximate statistics based on the first 1000 samples:
	audio	text
type	audio	string
details	min: 1.95s mean: 12.68s max: 17.21s sampling_rate: 48000 Hz	min: 10 tokens mean: 64.9 tokens max: 101 tokens

Samples:

audio	text
	`CHAPTER SIXTEEN I MIGHT HAVE TOLD YOU OF THE BEGINNING OF THIS LIAISON IN A FEW LINES BUT I WANTED YOU TO SEE EVERY STEP BY WHICH WE CAME I TO AGREE TO WHATEVER MARGUERITE WISHED`
	`MARGUERITE TO BE UNABLE TO LIVE APART FROM ME IT WAS THE DAY AFTER THE EVENING WHEN SHE CAME TO SEE ME THAT I SENT HER MANON LESCAUT FROM THAT TIME SEEING THAT I COULD NOT CHANGE MY MISTRESS'S LIFE I CHANGED MY OWN`
	`I WISHED ABOVE ALL NOT TO LEAVE MYSELF TIME TO THINK OVER THE POSITION I HAD ACCEPTED FOR IN SPITE OF MYSELF IT WAS A GREAT DISTRESS TO ME THUS MY LIFE GENERALLY SO CALM`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc",
        "doc_to_query"
    ],
    "partition_mode": "per_direction",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
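Conceptually, MultipleNegativesRankingLoss treats the other texts in a batch as negatives for each audio clip, and with both `directions` enabled, the other clips as negatives for each text. The pure-Python sketch below assumes that `partition_mode: "per_direction"` means one cross-entropy per direction, averaged together; it illustrates the idea with `scale` = 20 and cosine similarity and is not the library's implementation. The function names and the toy embeddings are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cross_entropy_diagonal(scores: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[i]
    return total / len(scores)


def symmetric_mnrl(anchors: List[Vector], positives: List[Vector], scale: float = 20.0) -> float:
    # scores[i][j] = scale * cos(anchor_i, positive_j); the diagonal holds the true pairs
    scores = [[scale * cosine(a, p) for p in positives] for a in anchors]
    query_to_doc = cross_entropy_diagonal(scores)
    # Transposing swaps the roles: each positive must pick out its own anchor
    transposed = [list(col) for col in zip(*scores)]
    doc_to_query = cross_entropy_diagonal(transposed)
    return (query_to_doc + doc_to_query) / 2


audio_emb = [[1.0, 0.0], [0.0, 1.0]]
text_emb = [[0.9, 0.1], [0.2, 0.8]]
print(symmetric_mnrl(audio_emb, text_emb))        # well-aligned pairs: loss close to 0
print(symmetric_mnrl(audio_emb, text_emb[::-1]))  # swapped pairs: much larger loss
```

With a per-device batch size of 4, each pair only sees three in-batch negatives per direction, which is why the `no_duplicates` batch sampler matters: a duplicate text in the batch would be scored as a false negative.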

Evaluation Dataset

librispeech_asr

Dataset: librispeech_asr at 71cacbf
Size: 200 evaluation samples
Columns: audio and text
Approximate statistics based on the first 200 samples:
	audio	text
type	audio	string
details	min: 1.56s mean: 6.41s max: 24.03s sampling_rate: 48000 Hz	min: 6 tokens mean: 36.31 tokens max: 129 tokens

Samples:

audio	text
	`HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE`
	`HE WOULD HAVE TO PAY HER THE MONEY WHICH SHE WOULD NOW REGULARLY DEMAND OR THERE WOULD BE TROUBLE IT DID NOT MATTER WHAT HE DID`
	`HURSTWOOD WALKED THE FLOOR MENTALLY ARRANGING THE CHIEF POINTS OF HIS SITUATION`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc",
        "doc_to_query"
    ],
    "partition_mode": "per_direction",
    "hardness_mode": null,
    "hardness_strength": 0.0
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 4
num_train_epochs: 5
learning_rate: 2e-05
warmup_steps: 0.1
bf16: True
eval_strategy: steps
per_device_eval_batch_size: 4
batch_sampler: no_duplicates
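The batch size and epoch count determine the length of the schedule. The arithmetic below reproduces the step counts in the training logs (epoch 1.0007 at step 7,140 implies 7,135 steps per epoch, consistent with a batch size of 4). It assumes that a fractional `warmup_steps` value such as 0.1 is interpreted as a fraction of total training steps, as `warmup_ratio` used to be; that interpretation is an assumption about the installed Transformers version, not something this card states.

```python
import math

num_samples = 28_539     # training samples
batch_size = 4           # per_device_train_batch_size (1 GPU, no gradient accumulation)
num_epochs = 5
warmup_fraction = 0.1    # warmup_steps: 0.1, assumed to be a ratio of total steps

# dataloader_drop_last is False, so the final partial batch still counts as a step
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_fraction)

print(steps_per_epoch, total_steps, warmup_steps)  # 7135 35675 3568
print(round(7140 / steps_per_epoch, 4))            # 1.0007, matching the training logs
```

Under this assumption, the learning rate ramps up over roughly the first half epoch (about 3,568 steps) and then decays linearly to zero by step 35,675.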

All Hyperparameters


per_device_train_batch_size: 4
num_train_epochs: 5
max_steps: -1
learning_rate: 2e-05
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_steps: 0.1
optim: adamw_torch_fused
optim_args: None
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
optim_target_modules: None
gradient_accumulation_steps: 1
average_tokens_across_devices: True
max_grad_norm: 1.0
label_smoothing_factor: 0.0
bf16: True
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
use_liger_kernel: False
liger_kernel_config: None
use_cache: False
neftune_noise_alpha: None
torch_empty_cache_steps: None
auto_find_batch_size: False
log_on_each_node: True
logging_nan_inf_filter: True
include_num_input_tokens_seen: no
log_level: passive
log_level_replica: warning
disable_tqdm: False
project: huggingface
trackio_space_id: trackio
eval_strategy: steps
per_device_eval_batch_size: 4
prediction_loss_only: True
eval_on_start: False
eval_do_concat_batches: True
eval_use_gather_object: False
eval_accumulation_steps: None
include_for_metrics: []
batch_eval_metrics: False
save_only_model: False
save_on_each_node: False
enable_jit_checkpoint: False
push_to_hub: False
hub_private_repo: None
hub_model_id: None
hub_strategy: every_save
hub_always_push: False
hub_revision: None
load_best_model_at_end: False
ignore_data_skip: False
restore_callback_states_from_checkpoint: False
full_determinism: False
seed: 42
data_seed: None
use_cpu: False
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_pin_memory: True
dataloader_persistent_workers: False
dataloader_prefetch_factor: None
remove_unused_columns: True
label_names: None
train_sampling_strategy: random
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
ddp_backend: None
ddp_timeout: 1800
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
deepspeed: None
debug: []
skip_memory_metrics: True
do_predict: False
resume_from_checkpoint: None
warmup_ratio: None
local_rank: -1
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss	librispeech-eval_cosine_ndcg@10	librispeech-test_cosine_ndcg@10
-1	-1	-	-	0.0279	0.0037
0.1001	714	1.4538	1.1503	0.0727	-
0.2001	1428	0.9953	0.8749	0.0841	-
0.3002	2142	0.9557	0.7760	0.1252	-
0.4003	2856	0.9621	2.4026	0.0353	-
0.5004	3570	0.9721	0.9326	0.0720	-
0.6004	4284	0.8931	0.8454	0.0934	-
0.7005	4998	0.8368	0.5494	0.1741	-
0.8006	5712	0.8001	0.4935	0.2170	-
0.9006	6426	0.7817	0.7168	0.1476	-
1.0007	7140	0.7235	0.6410	0.1809	-
1.1008	7854	0.6620	0.6527	0.1726	-
1.2008	8568	0.6492	0.4146	0.2116	-
1.3009	9282	0.6342	0.7536	0.1695	-
1.4010	9996	0.6438	0.6872	0.1873	-
1.5011	10710	0.6103	0.4385	0.2767	-
1.6011	11424	0.6052	0.8028	0.1805	-
1.7012	12138	0.5950	0.3628	0.2891	-
1.8013	12852	0.5672	0.6978	0.2120	-
1.9013	13566	0.5611	0.5946	0.1965	-
2.0014	14280	0.5546	0.2659	0.3589	-
2.1015	14994	0.5133	0.4273	0.2806	-
2.2015	15708	0.4588	0.4356	0.2929	-
2.3016	16422	0.4629	0.5123	0.2538	-
2.4017	17136	0.4429	0.3757	0.3092	-
2.5018	17850	0.5000	0.4237	0.3297	-
2.6018	18564	0.4328	0.5146	0.3291	-
2.7019	19278	0.4284	0.3348	0.3483	-
2.8020	19992	0.4598	0.3768	0.3865	-
2.9020	20706	0.4183	0.3908	0.2594	-
3.0021	21420	0.4180	0.3240	0.3470	-
3.1022	22134	0.3624	0.3487	0.4205	-
3.2022	22848	0.3627	0.3124	0.3650	-
3.3023	23562	0.3651	0.3025	0.3046	-
3.4024	24276	0.3644	0.3708	0.4050	-
3.5025	24990	0.3480	0.3458	0.3998	-
3.6025	25704	0.3542	0.2936	0.4141	-
3.7026	26418	0.2954	0.2692	0.3876	-
3.8027	27132	0.3336	0.2221	0.3915	-
3.9027	27846	0.3255	0.3140	0.4253	-
4.0028	28560	0.3093	0.2278	0.4607	-
4.1029	29274	0.2715	0.3176	0.4261	-
4.2029	29988	0.2812	0.2814	0.4590	-
4.3030	30702	0.2690	0.2390	0.4997	-
4.4031	31416	0.2697	0.2575	0.4720	-
4.5032	32130	0.2616	0.3054	0.4863	-
4.6032	32844	0.2437	0.2467	0.4852	-
4.7033	33558	0.2532	0.2505	0.5196	-
4.8034	34272	0.2640	0.2242	0.4926	-
4.9034	34986	0.2245	0.2345	0.4999	-
-1	-1	-	-	0.5030	0.1402
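Because `load_best_model_at_end` is False, the final model presumably corresponds to the end of training (final librispeech-eval NDCG@10 of 0.5030) rather than to the highest logged checkpoint. The short sketch below scans logged rows for the peak value; its input is copied from the last ten logged rows of the table above, and the variable names are illustrative.

```python
# (step, librispeech-eval cosine_ndcg@10) pairs copied from the last ten logged rows
logged = [
    (28560, 0.4607), (29274, 0.4261), (29988, 0.4590), (30702, 0.4997),
    (31416, 0.4720), (32130, 0.4863), (32844, 0.4852), (33558, 0.5196),
    (34272, 0.4926), (34986, 0.4999),
]

# Pick the row with the highest NDCG@10
best_step, best_ndcg = max(logged, key=lambda row: row[1])
final_ndcg = 0.5030  # value from the final evaluation row (-1)

print(best_step, best_ndcg)              # 33558 0.5196
print(round(best_ndcg - final_ndcg, 4))  # 0.0166 gap between the peak and the final model
```

Given how noisy the validation loss is (for example, the spike to 2.4026 at step 2856), a single-checkpoint peak may not generalize, so the gap should be read as indicative only.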

Environmental Impact

Carbon emissions were measured using CodeCarbon.

Energy Consumed: 2.161 kWh
Carbon Emitted: 0.578 kg of CO2
Hours Used: 7.59 hours
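The reported figures also imply an average power draw and a carbon intensity for the electricity used. The arithmetic below derives them from the CodeCarbon values in the card metadata (578.4 g of CO2 for 2.161 kWh over 7.59 hours); these are back-of-the-envelope derived numbers, not additional measurements.

```python
energy_kwh = 2.161257658642011   # energy_consumed from the card metadata
emissions_g = 578.4000971210925  # emissions in grams of CO2 from the card metadata
hours = 7.59

avg_power_w = energy_kwh / hours * 1000         # average draw across GPU, CPU and RAM
intensity_g_per_kwh = emissions_g / energy_kwh  # grams of CO2 per kWh consumed

print(round(avg_power_w), round(intensity_g_per_kwh))  # 285 268
```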

Training Hardware

On Cloud: No
GPU Model: 1 x NVIDIA GeForce RTX 3090
CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
RAM Size: 31.78 GB

Framework Versions

Python: 3.11.6
Sentence Transformers: 5.4.0.dev0
Transformers: 5.3.0.dev0
PyTorch: 2.10.0+cu128
Accelerate: 1.13.0.dev0
Datasets: 4.3.0
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{günther2024jinaembeddings28192token,
      title={Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents},
      author={Michael Günther and Jackmin Ong and Isabelle Mohr and Alaeddine Abdessalem and Tanguy Abel and Mohammad Kalim Akram and Susana Guzman and Georgios Mastrapas and Saba Sturua and Bo Wang and Maximilian Werk and Nan Wang and Han Xiao},
      year={2024},
      eprint={2310.19923},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2310.19923},
}