How to use tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs")
sentences = [
    "HE DECIDED TO WRITE HER CARE OF THE WEST SIDE POST OFFICE AND ASK FOR AN EXPLANATION AS WELL AS TO HAVE HER MEET HIM",
    "GRADUALLY RELIEF CAME TO ALL OF US",
    "IT SEEMED AS IF HIS FAMILY TROUBLES WERE JUST BEGINNING",
    "I EXPLAINED TO ANTONIA HOW THIS MEANT THAT HE WAS TWENTY FOUR YEARS OLD THAT HE MUST HAVE BEEN THERE WHEN WHITE MEN FIRST CAME LEFT ON FROM BUFFALO AND INDIAN TIMES",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```

This is a sentence-transformers model finetuned from laion/clap-htsat-unfused on the librispeech_asr dataset (`openslr/librispeech_asr`). It maps text and audio into a shared 512-dimensional dense vector space and can be used for semantic search (including retrieval between audio clips and their transcripts), semantic textual similarity, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'get_text_features', 'method_output_name': 'pooler_output'}, 'audio': {'method': 'get_audio_features', 'method_output_name': 'pooler_output'}}, 'module_output_name': 'sentence_embedding', 'message_format': 'auto', 'architecture': 'ClapModel'})
)
```
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs")
# Run inference
inputs = [
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_0.wav',
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_1.wav',
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_2.wav',
]
embeddings = model.encode(inputs)
print(embeddings.shape)
# [3, 512]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4362, 0.6843],
#         [0.4362, 1.0000, 0.2179],
#         [0.6843, 0.2179, 1.0000]])
```
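The similarity scores above appear to be cosine similarities: the diagonal is exactly 1.0000, cosine is the Sentence Transformers default, and the training loss below is configured with `cos_sim`. Because audio and text share one embedding space, the same computation can rank transcripts for each audio clip. The NumPy sketch below illustrates this; the embeddings are random, hypothetical stand-ins for `model.encode(...)` outputs, and `cosine_similarity_matrix` is an illustrative helper, not part of the library.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n x d) and b (m x d)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Hypothetical stand-ins for model.encode(...) outputs (512-dimensional, like this model).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(3, 512))
# Assumption for the demo: each audio clip lies close to its own transcript.
audio_emb = text_emb + 0.1 * rng.normal(size=(3, 512))

scores = cosine_similarity_matrix(audio_emb, text_emb)  # shape (3, 3): clips x transcripts
best_transcript = scores.argmax(axis=1)                  # top-ranked transcript index per clip
print(best_transcript)
```

In practice the two matrices would come from encoding audio inputs and text inputs with the same model.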
Information retrieval metrics on the `librispeech-eval` and `librispeech-test` datasets, computed with `InformationRetrievalEvaluator`:

| Metric | librispeech-eval | librispeech-test |
|---|---|---|
| cosine_accuracy@1 | 0.245 | 0.0489 |
| cosine_accuracy@3 | 0.52 | 0.1183 |
| cosine_accuracy@5 | 0.645 | 0.1691 |
| cosine_accuracy@10 | 0.785 | 0.2641 |
| cosine_precision@1 | 0.245 | 0.0489 |
| cosine_precision@3 | 0.1733 | 0.0394 |
| cosine_precision@5 | 0.129 | 0.0338 |
| cosine_precision@10 | 0.0785 | 0.0264 |
| cosine_recall@1 | 0.245 | 0.0489 |
| cosine_recall@3 | 0.52 | 0.1183 |
| cosine_recall@5 | 0.645 | 0.1691 |
| cosine_recall@10 | 0.785 | 0.2641 |
| cosine_ndcg@10 | 0.503 | 0.1402 |
| cosine_mrr@10 | 0.414 | 0.1027 |
| cosine_map@100 | 0.4253 | 0.1195 |
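The table is consistent with exactly one relevant document per query: recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (for example, 0.52 / 3 ≈ 0.1733 on `librispeech-eval`). Under that assumption every metric reduces to a function of the rank of the single correct document. The NumPy sketch below illustrates this; it is not the `InformationRetrievalEvaluator` implementation, and `retrieval_metrics` is a hypothetical helper that takes a precomputed (queries x corpus) cosine similarity matrix.

```python
import numpy as np

def retrieval_metrics(sim: np.ndarray, relevant: np.ndarray, ks=(1, 3, 5, 10)) -> dict:
    """IR metrics assuming exactly one relevant corpus item per query.

    sim[i, j] is the similarity of query i to corpus item j;
    relevant[i] is the corpus index of the single correct item for query i.
    """
    # Corpus indices sorted by descending similarity, per query.
    order = np.argsort(-sim, axis=1)
    # 1-based rank at which the relevant item appears for each query.
    ranks = np.argmax(order == relevant[:, None], axis=1) + 1
    metrics = {}
    for k in ks:
        hit = (ranks <= k).astype(float)
        metrics[f"accuracy@{k}"] = hit.mean()
        metrics[f"recall@{k}"] = hit.mean()         # one relevant item: same as accuracy@k
        metrics[f"precision@{k}"] = hit.mean() / k  # at most one hit among k results
    in_top10 = ranks <= 10
    metrics["mrr@10"] = np.where(in_top10, 1.0 / ranks, 0.0).mean()
    # With one relevant item the ideal DCG is 1, so NDCG@10 is just the discounted gain.
    metrics["ndcg@10"] = np.where(in_top10, 1.0 / np.log2(ranks + 1), 0.0).mean()
    return metrics

sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.4, 0.8],
                [0.5, 0.6, 0.7]])
print(retrieval_metrics(sim, relevant=np.arange(3)))
```

Under the same single-relevant-document assumption, MAP@100 is the reciprocal rank with a cutoff of 100 instead of 10, which is why it can only be greater than or equal to MRR@10 (0.4253 vs. 0.414 on `librispeech-eval`).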
Training dataset: `librispeech_asr`, with columns `audio` and `text`.

|  | audio | text |
|---|---|---|
| type | audio | string |

Samples:

| audio | text |
|---|---|
| *(audio clip)* | CHAPTER SIXTEEN I MIGHT HAVE TOLD YOU OF THE BEGINNING OF THIS LIAISON IN A FEW LINES BUT I WANTED YOU TO SEE EVERY STEP BY WHICH WE CAME I TO AGREE TO WHATEVER MARGUERITE WISHED |
| *(audio clip)* | MARGUERITE TO BE UNABLE TO LIVE APART FROM ME IT WAS THE DAY AFTER THE EVENING WHEN SHE CAME TO SEE ME THAT I SENT HER MANON LESCAUT FROM THAT TIME SEEING THAT I COULD NOT CHANGE MY MISTRESS'S LIFE I CHANGED MY OWN |
| *(audio clip)* | I WISHED ABOVE ALL NOT TO LEAVE MYSELF TIME TO THINK OVER THE POSITION I HAD ACCEPTED FOR IN SPITE OF MYSELF IT WAS A GREAT DISTRESS TO ME THUS MY LIFE GENERALLY SO CALM |
Loss: `MultipleNegativesRankingLoss` with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc",
        "doc_to_query"
    ],
    "partition_mode": "per_direction",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
```
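With `directions` set to both `query_to_doc` and `doc_to_query`, the loss scores every in-batch pair with cosine similarity multiplied by `scale` (20.0) and applies cross-entropy in both directions: each anchor must pick out its own partner among all partners in the batch, and vice versa, with every other in-batch item acting as a negative. The NumPy sketch below illustrates that idea under stated assumptions: the two directional losses are simply averaged, and each direction normalizes only over its own candidate set (our reading of `partition_mode: per_direction`). The library's exact reduction may differ, and `symmetric_mnrl` and its helpers are hypothetical names.

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def cross_entropy_diag(logits: np.ndarray) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def symmetric_mnrl(query_emb: np.ndarray, doc_emb: np.ndarray, scale: float = 20.0) -> float:
    """Bidirectional in-batch-negatives loss; row i of each input is a positive pair."""
    scores = scale * cos_sim(query_emb, doc_emb)
    q2d = cross_entropy_diag(scores)    # each query vs. every doc in the batch
    d2q = cross_entropy_diag(scores.T)  # each doc vs. every query in the batch
    return (q2d + d2q) / 2              # assumption: plain average of both directions

# Perfectly aligned pairs give a near-zero loss; indistinguishable pairs give log(batch size).
print(symmetric_mnrl(np.eye(4), np.eye(4)))
print(symmetric_mnrl(np.ones((4, 3)), np.ones((4, 3))))
```

Here the audio embedding of a pair would play the query role and its transcript embedding the document role; because both directions are used, the assignment does not change the loss.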
Evaluation dataset: `librispeech_asr`, with columns `audio` and `text`.

|  | audio | text |
|---|---|---|
| type | audio | string |

Samples:

| audio | text |
|---|---|
| *(audio clip)* | HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE |
| *(audio clip)* | HE WOULD HAVE TO PAY HER THE MONEY WHICH SHE WOULD NOW REGULARLY DEMAND OR THERE WOULD BE TROUBLE IT DID NOT MATTER WHAT HE DID |
| *(audio clip)* | HURSTWOOD WALKED THE FLOOR MENTALLY ARRANGING THE CHIEF POINTS OF HIS SITUATION |
Loss: `MultipleNegativesRankingLoss` with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc",
        "doc_to_query"
    ],
    "partition_mode": "per_direction",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
```
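Because `MultipleNegativesRankingLoss` treats every other transcript in a batch as a negative, a transcript that appears twice in one batch would be scored as a false negative. The training arguments set `batch_sampler: no_duplicates`, which presumably guards against this. The sketch below shows one simple greedy way to build such batches; it is an illustration under that assumption, not the library's sampler, and `no_duplicate_batches` is a hypothetical name.

```python
import random

def no_duplicate_batches(texts: list[str], batch_size: int, seed: int = 42) -> list[list[int]]:
    """Group example indices into batches in which no text value repeats."""
    rng = random.Random(seed)
    remaining = list(range(len(texts)))
    rng.shuffle(remaining)
    batches = []
    while remaining:
        batch, seen, leftover = [], set(), []
        for idx in remaining:
            # Take the example only if the batch has room and its text is new to the batch.
            if len(batch) < batch_size and texts[idx] not in seen:
                batch.append(idx)
                seen.add(texts[idx])
            else:
                leftover.append(idx)  # defer to a later batch
        batches.append(batch)
        remaining = leftover
    return batches

texts = ["a", "b", "a", "c", "a", "b"]
print(no_duplicate_batches(texts, batch_size=2))
```

Each pass always places at least one deferred example, so the loop terminates; a production sampler may instead drop or reshuffle leftovers.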
Non-default hyperparameters:

- `per_device_train_batch_size`: 4
- `num_train_epochs`: 5
- `learning_rate`: 2e-05
- `warmup_steps`: 0.1
- `bf16`: True
- `eval_strategy`: steps
- `per_device_eval_batch_size`: 4
- `batch_sampler`: no_duplicates

All hyperparameters:

- `per_device_train_batch_size`: 4
- `num_train_epochs`: 5
- `max_steps`: -1
- `learning_rate`: 2e-05
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_steps`: 0.1
- `optim`: adamw_torch_fused
- `optim_args`: None
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `optim_target_modules`: None
- `gradient_accumulation_steps`: 1
- `average_tokens_across_devices`: True
- `max_grad_norm`: 1.0
- `label_smoothing_factor`: 0.0
- `bf16`: True
- `fp16`: False
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `use_cache`: False
- `neftune_noise_alpha`: None
- `torch_empty_cache_steps`: None
- `auto_find_batch_size`: False
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `include_num_input_tokens_seen`: no
- `log_level`: passive
- `log_level_replica`: warning
- `disable_tqdm`: False
- `project`: huggingface
- `trackio_space_id`: trackio
- `eval_strategy`: steps
- `per_device_eval_batch_size`: 4
- `prediction_loss_only`: True
- `eval_on_start`: False
- `eval_do_concat_batches`: True
- `eval_use_gather_object`: False
- `eval_accumulation_steps`: None
- `include_for_metrics`: []
- `batch_eval_metrics`: False
- `save_only_model`: False
- `save_on_each_node`: False
- `enable_jit_checkpoint`: False
- `push_to_hub`: False
- `hub_private_repo`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_always_push`: False
- `hub_revision`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `restore_callback_states_from_checkpoint`: False
- `full_determinism`: False
- `seed`: 42
- `data_seed`: None
- `use_cpu`: False
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `dataloader_prefetch_factor`: None
- `remove_unused_columns`: True
- `label_names`: None
- `train_sampling_strategy`: random
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `ddp_backend`: None
- `ddp_timeout`: 1800
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `deepspeed`: None
- `debug`: []
- `skip_memory_metrics`: True
- `do_predict`: False
- `resume_from_checkpoint`: None
- `warmup_ratio`: None
- `local_rank`: -1
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training logs (rows with epoch and step -1 are evaluations run outside the training loop, before and after training):

| Epoch | Step | Training Loss | Validation Loss | librispeech-eval_cosine_ndcg@10 | librispeech-test_cosine_ndcg@10 |
|---|---|---|---|---|---|
| -1 | -1 | - | - | 0.0279 | 0.0037 |
| 0.1001 | 714 | 1.4538 | 1.1503 | 0.0727 | - |
| 0.2001 | 1428 | 0.9953 | 0.8749 | 0.0841 | - |
| 0.3002 | 2142 | 0.9557 | 0.7760 | 0.1252 | - |
| 0.4003 | 2856 | 0.9621 | 2.4026 | 0.0353 | - |
| 0.5004 | 3570 | 0.9721 | 0.9326 | 0.0720 | - |
| 0.6004 | 4284 | 0.8931 | 0.8454 | 0.0934 | - |
| 0.7005 | 4998 | 0.8368 | 0.5494 | 0.1741 | - |
| 0.8006 | 5712 | 0.8001 | 0.4935 | 0.2170 | - |
| 0.9006 | 6426 | 0.7817 | 0.7168 | 0.1476 | - |
| 1.0007 | 7140 | 0.7235 | 0.6410 | 0.1809 | - |
| 1.1008 | 7854 | 0.6620 | 0.6527 | 0.1726 | - |
| 1.2008 | 8568 | 0.6492 | 0.4146 | 0.2116 | - |
| 1.3009 | 9282 | 0.6342 | 0.7536 | 0.1695 | - |
| 1.4010 | 9996 | 0.6438 | 0.6872 | 0.1873 | - |
| 1.5011 | 10710 | 0.6103 | 0.4385 | 0.2767 | - |
| 1.6011 | 11424 | 0.6052 | 0.8028 | 0.1805 | - |
| 1.7012 | 12138 | 0.5950 | 0.3628 | 0.2891 | - |
| 1.8013 | 12852 | 0.5672 | 0.6978 | 0.2120 | - |
| 1.9013 | 13566 | 0.5611 | 0.5946 | 0.1965 | - |
| 2.0014 | 14280 | 0.5546 | 0.2659 | 0.3589 | - |
| 2.1015 | 14994 | 0.5133 | 0.4273 | 0.2806 | - |
| 2.2015 | 15708 | 0.4588 | 0.4356 | 0.2929 | - |
| 2.3016 | 16422 | 0.4629 | 0.5123 | 0.2538 | - |
| 2.4017 | 17136 | 0.4429 | 0.3757 | 0.3092 | - |
| 2.5018 | 17850 | 0.5000 | 0.4237 | 0.3297 | - |
| 2.6018 | 18564 | 0.4328 | 0.5146 | 0.3291 | - |
| 2.7019 | 19278 | 0.4284 | 0.3348 | 0.3483 | - |
| 2.8020 | 19992 | 0.4598 | 0.3768 | 0.3865 | - |
| 2.9020 | 20706 | 0.4183 | 0.3908 | 0.2594 | - |
| 3.0021 | 21420 | 0.4180 | 0.3240 | 0.3470 | - |
| 3.1022 | 22134 | 0.3624 | 0.3487 | 0.4205 | - |
| 3.2022 | 22848 | 0.3627 | 0.3124 | 0.3650 | - |
| 3.3023 | 23562 | 0.3651 | 0.3025 | 0.3046 | - |
| 3.4024 | 24276 | 0.3644 | 0.3708 | 0.4050 | - |
| 3.5025 | 24990 | 0.3480 | 0.3458 | 0.3998 | - |
| 3.6025 | 25704 | 0.3542 | 0.2936 | 0.4141 | - |
| 3.7026 | 26418 | 0.2954 | 0.2692 | 0.3876 | - |
| 3.8027 | 27132 | 0.3336 | 0.2221 | 0.3915 | - |
| 3.9027 | 27846 | 0.3255 | 0.3140 | 0.4253 | - |
| 4.0028 | 28560 | 0.3093 | 0.2278 | 0.4607 | - |
| 4.1029 | 29274 | 0.2715 | 0.3176 | 0.4261 | - |
| 4.2029 | 29988 | 0.2812 | 0.2814 | 0.4590 | - |
| 4.3030 | 30702 | 0.2690 | 0.2390 | 0.4997 | - |
| 4.4031 | 31416 | 0.2697 | 0.2575 | 0.4720 | - |
| 4.5032 | 32130 | 0.2616 | 0.3054 | 0.4863 | - |
| 4.6032 | 32844 | 0.2437 | 0.2467 | 0.4852 | - |
| 4.7033 | 33558 | 0.2532 | 0.2505 | 0.5196 | - |
| 4.8034 | 34272 | 0.2640 | 0.2242 | 0.4926 | - |
| 4.9034 | 34986 | 0.2245 | 0.2345 | 0.4999 | - |
| -1 | -1 | - | - | 0.5030 | 0.1402 |
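The hyperparameters use `lr_scheduler_type: linear` with `warmup_steps: 0.1`, which reads as a warmup over the first 10% of training followed by linear decay to zero. The logs imply roughly 7,135 optimizer steps per epoch (step 7140 is logged at epoch 1.0007). The sketch below computes that schedule in plain Python; `linear_warmup_lr` is a hypothetical helper, and both the interpretation of 0.1 as a fraction and the rounding of the warmup length up to a whole step are assumptions.

```python
import math

def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 2e-05,
                     warmup_fraction: float = 0.1) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_fraction)  # assumption: round up
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Assumption: about 7,135 steps per epoch over 5 epochs, as read off the training logs.
total = 5 * 7135
for step in (0, total // 20, total // 10, total // 2, total):
    print(step, f"{linear_warmup_lr(step, total):.2e}")
```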
Carbon emissions were measured using CodeCarbon.
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

```bibtex
@misc{günther2024jinaembeddings28192token,
    title={Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents},
    author={Michael Günther and Jackmin Ong and Isabelle Mohr and Alaeddine Abdessalem and Tanguy Abel and Mohammad Kalim Akram and Susana Guzman and Georgios Mastrapas and Saba Sturua and Bo Wang and Maximilian Werk and Nan Wang and Han Xiao},
    year={2024},
    eprint={2310.19923},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2310.19923},
}
```