---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:28539
- loss:MultipleNegativesRankingLoss
base_model: laion/clap-htsat-unfused
widget:
- source_sentence: HE DECIDED TO WRITE HER CARE OF THE WEST SIDE POST OFFICE AND ASK
FOR AN EXPLANATION AS WELL AS TO HAVE HER MEET HIM
sentences:
- GRADUALLY RELIEF CAME TO ALL OF US
- IT SEEMED AS IF HIS FAMILY TROUBLES WERE JUST BEGINNING
- I EXPLAINED TO ANTONIA HOW THIS MEANT THAT HE WAS TWENTY FOUR YEARS OLD THAT HE
MUST HAVE BEEN THERE WHEN WHITE MEN FIRST CAME LEFT ON FROM BUFFALO AND INDIAN
TIMES
- source_sentence: WITHOUT A WORD PETER GOT UP AND LIT HIS LANTERN
sentences:
- AS LEADING TO THE MENTION OF OTHER INTERESTING EVENTS WE MUST SET THIS INROAD
CLEARLY BEFORE THE READER
- SHE WANTED TO MAKE SOME REFERENCE TO THEIR RELATIONS UPON THE TRAIN BUT WAS TOO
TIMID
- THE DISTINGUISHING MARK OF THE HENS WAS A CREST OF LAMENTABLY SCANTY GROWTH IN
THESE LATTER DAYS BUT SO ODDLY AND WICKEDLY ANALOGOUS TO HEPZIBAH'S TURBAN THAT
PHOEBE TO THE POIGNANT DISTRESS OF HER CONSCIENCE BUT INEVITABLY WAS LED TO FANCY
A GENERAL RESEMBLANCE BETWIXT THESE FORLORN BIPEDS AND HER RESPECTABLE RELATIVE
- source_sentence: NOTHING COULD BE MORE NATURAL THAN SUCH AN ASSEMBLY IN SUCH A PLACE
AT SUCH A PERIOD
sentences:
- BUT HE COMPROMISED BY TELLING THE BOY THAT THERE WOULD BE NO REPLY
- MANY LITTLE WRINKLES GATHERED BETWEEN HIS EYES AS HE CONTEMPLATED THIS AND HIS
BROW MOISTENED
- HE DID MANAGE TO BRING HIMSELF INTO THE MOOD TO GO OUT TO CARRIE BUT WHEN HE GOT
IN OGDEN PLACE HE THOUGHT HE SAW A MAN WATCHING HIM AND WENT AWAY
- source_sentence: DEAR SIR WE BEG TO INFORM YOU THAT WE ARE INSTRUCTED TO WAIT UNTIL
TO MORROW THURSDAY AT ONE O'CLOCK BEFORE FILING SUIT AGAINST YOU ON BEHALF OF
MISSUS JULIA HURSTWOOD FOR DIVORCE AND ALIMONY
sentences:
- THE WHITE DOUBLE ROSEBUSH HAD EVIDENTLY BEEN PROPPED UP ANEW AGAINST THE HOUSE
SINCE THE COMMENCEMENT OF THE SEASON AND A PEAR TREE AND THREE DAMSON TREES WHICH
EXCEPT A ROW OF CURRANT BUSHES CONSTITUTED THE ONLY VARIETIES OF FRUIT BORE MARKS
OF THE RECENT AMPUTATION OF SEVERAL SUPERFLUOUS OR DEFECTIVE LIMBS
- LASTLY THE ROYAL BROTHERS FELL THEMSELVES VICTIMS TO THE EPIDEMIC WHICH SO SADLY
SIGNALIZES THEIR REIGN
- IT IS LIKE A BANDAGE OVER ONE'S EYES TO COME INTO IT
- source_sentence: HERE THE HOLY PRELATE OF FERNS MET HIM AND RELATED A VISION IN
WHICH HE HAD BEEN INSTRUCTED TO DEMAND THE ABOLITION OF THE IMPOST
sentences:
- THE SHARP SMELL OF SPIRITS WENT THROUGH THE ROOM
- YES HOW MANY
- QUICKLY IT WAS COVERED WITH BRIGHT RED SPOTS I THOUGHT I HAD NEVER SEEN ANY BLOOD
SO BRIGHT
datasets:
- openslr/librispeech_asr
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
co2_eq_emissions:
emissions: 578.4000971210925
energy_consumed: 2.161257658642011
source: codecarbon
training_type: fine-tuning
on_cloud: false
cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
ram_total_size: 31.777088165283203
hours_used: 7.59
hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CLAP model trained on COCO Captions
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: librispeech eval
type: librispeech-eval
metrics:
- type: cosine_accuracy@1
value: 0.245
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.52
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.645
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.785
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.245
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.1733333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.12899999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.0785
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.245
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.52
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.645
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.785
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.503027364772325
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.41403968253968265
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.4252888359623941
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: librispeech test
type: librispeech-test
metrics:
- type: cosine_accuracy@1
value: 0.04885496183206107
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.1183206106870229
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.16908396946564885
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.2641221374045801
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.04885496183206107
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.03944020356234096
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.033816793893129776
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.026412213740458015
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.04885496183206107
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.1183206106870229
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.16908396946564885
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.2641221374045801
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.1402219692077291
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.10268266085059953
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.11950657997396778
name: Cosine Map@100
---
# CLAP model trained on LibriSpeech
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [laion/clap-htsat-unfused](https://huggingface.co/laion/clap-htsat-unfused) on the [librispeech_asr](https://huggingface.co/datasets/openslr/librispeech_asr) dataset. It maps text and audio inputs to a 512-dimensional dense vector space and can be used for semantic similarity, semantic search, classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [laion/clap-htsat-unfused](https://huggingface.co/laion/clap-htsat-unfused)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 512 dimensions
- **Similarity Function:** Cosine Similarity
- **Supported Modalities:** Text, Audio
- **Training Dataset:**
- [librispeech_asr](https://huggingface.co/datasets/openslr/librispeech_asr)
- **Language:** en
- **License:** apache-2.0
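The similarity function listed above can be illustrated without loading the model. The following is a minimal sketch in plain Python, using toy 3-dimensional vectors as stand-ins for the 512-dimensional embeddings; the vector values and variable names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings (hypothetical values, not model output).
audio_embedding = [1.0, 2.0, 2.0]
text_embedding = [2.0, 4.0, 4.0]    # same direction, twice the length
other_embedding = [2.0, -1.0, 0.0]  # orthogonal to audio_embedding

same_direction = cosine_similarity(audio_embedding, text_embedding)
orthogonal = cosine_similarity(audio_embedding, other_embedding)
print(same_direction, orthogonal)
```

Because the score depends only on direction, scaling an embedding leaves it unchanged, which is why the two parallel vectors score as identical.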
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'get_text_features', 'method_output_name': 'pooler_output'}, 'audio': {'method': 'get_audio_features', 'method_output_name': 'pooler_output'}}, 'module_output_name': 'sentence_embedding', 'message_format': 'auto', 'architecture': 'ClapModel'})
)
```
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs")
# Run inference
inputs = [
'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_0.wav',
'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_1.wav',
'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_2.wav',
]
embeddings = model.encode(inputs)
print(embeddings.shape)
# (3, 512)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4362, 0.6843],
# [0.4362, 1.0000, 0.2179],
# [0.6843, 0.2179, 1.0000]])
```
## Evaluation
### Metrics
#### Information Retrieval
* Datasets: `librispeech-eval` and `librispeech-test`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.sentence_transformer.evaluation.InformationRetrievalEvaluator)
| Metric | librispeech-eval | librispeech-test |
|:--------------------|:-----------------|:-----------------|
| cosine_accuracy@1 | 0.245 | 0.0489 |
| cosine_accuracy@3 | 0.52 | 0.1183 |
| cosine_accuracy@5 | 0.645 | 0.1691 |
| cosine_accuracy@10 | 0.785 | 0.2641 |
| cosine_precision@1 | 0.245 | 0.0489 |
| cosine_precision@3 | 0.1733 | 0.0394 |
| cosine_precision@5 | 0.129 | 0.0338 |
| cosine_precision@10 | 0.0785 | 0.0264 |
| cosine_recall@1 | 0.245 | 0.0489 |
| cosine_recall@3 | 0.52 | 0.1183 |
| cosine_recall@5 | 0.645 | 0.1691 |
| cosine_recall@10 | 0.785 | 0.2641 |
| **cosine_ndcg@10** | **0.503** | **0.1402** |
| cosine_mrr@10 | 0.414 | 0.1027 |
| cosine_map@100 | 0.4253 | 0.1195 |
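In the table, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k (for example, 0.52 / 3 ≈ 0.1733). That pattern is what one would expect if every query has exactly one relevant document, which is an assumption here rather than something the card states. Under that assumption, the metrics can be sketched from the rank of the single relevant document per query; the function name and the toy ranks below are hypothetical.

```python
import math


def retrieval_metrics(ranks, k):
    """Metrics at cutoff k, assuming exactly one relevant document per query.

    ranks: 1-based rank of the relevant document for each query,
           or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,        # fraction of queries with a hit in the top k
        "precision": accuracy / k,   # at most one of the k results is relevant
        "recall": accuracy,          # the single relevant document is found or not
        "mrr": sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n,
        # With one relevant document the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
        "ndcg": sum(1.0 / math.log2(r + 1) for r, hit in zip(ranks, hits) if hit) / n,
    }


# Toy example: four queries, the last one never retrieves its match.
toy_ranks = [1, 3, 2, None]
metrics_at_3 = retrieval_metrics(toy_ranks, k=3)
print(metrics_at_3)
```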
## Training Details
### Training Dataset
#### librispeech_asr
* Dataset: [librispeech_asr](https://huggingface.co/datasets/openslr/librispeech_asr) at [71cacbf](https://huggingface.co/datasets/openslr/librispeech_asr/tree/71cacbfb7e2354c4226d01e70d77d5fca3d04ba1)
* Size: 28,539 training samples
* Columns: audio and text
* Approximate statistics based on the first 1000 samples:
| | audio | text |
|:--------|:------|:-------|
| type | audio | string |
* Samples:
| text |
|:-----|
| CHAPTER SIXTEEN I MIGHT HAVE TOLD YOU OF THE BEGINNING OF THIS LIAISON IN A FEW LINES BUT I WANTED YOU TO SEE EVERY STEP BY WHICH WE CAME I TO AGREE TO WHATEVER MARGUERITE WISHED |
| MARGUERITE TO BE UNABLE TO LIVE APART FROM ME IT WAS THE DAY AFTER THE EVENING WHEN SHE CAME TO SEE ME THAT I SENT HER MANON LESCAUT FROM THAT TIME SEEING THAT I COULD NOT CHANGE MY MISTRESS'S LIFE I CHANGED MY OWN |
| I WISHED ABOVE ALL NOT TO LEAVE MYSELF TIME TO THINK OVER THE POSITION I HAD ACCEPTED FOR IN SPITE OF MYSELF IT WAS A GREAT DISTRESS TO ME THUS MY LIFE GENERALLY SO CALM |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false,
"directions": [
"query_to_doc",
"doc_to_query"
],
"partition_mode": "per_direction",
"hardness_mode": null,
"hardness_strength": 0.0
}
```
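To make the parameters above concrete, here is a minimal plain-Python sketch of an in-batch ranking loss with `scale` 20 and cosine similarity, applied in both directions. It is one plausible reading of the configuration, not the library's implementation: it assumes each direction gets its own softmax over the batch and that the two direction losses are averaged. All function names and the toy vectors are hypothetical.

```python
import math


def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cross_entropy(logits, target):
    """Negative log-softmax of logits[target], computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def symmetric_in_batch_loss(anchors, positives, scale=20.0):
    """Pair i is the positive; every other item in the batch is a negative."""
    n = len(anchors)
    scores = [[scale * cos_sim(a, p) for p in positives] for a in anchors]
    # Direction 1: each anchor must rank its own positive first (rows).
    a2p = sum(cross_entropy(scores[i], i) for i in range(n)) / n
    # Direction 2: each positive must rank its own anchor first (columns).
    p2a = sum(cross_entropy([scores[j][i] for j in range(n)], i) for i in range(n)) / n
    # Assumption: the two directions are simply averaged.
    return (a2p + p2a) / 2


# Toy 2-dimensional "embeddings" (hypothetical values).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_in_batch_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = symmetric_in_batch_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss, shuffled_loss)
```

With matched pairs the loss is close to zero, and with the pairs swapped it is close to the scale value, which shows how `scale` sets the penalty for a confidently wrong ranking.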
### Evaluation Dataset
#### librispeech_asr
* Dataset: [librispeech_asr](https://huggingface.co/datasets/openslr/librispeech_asr) at [71cacbf](https://huggingface.co/datasets/openslr/librispeech_asr/tree/71cacbfb7e2354c4226d01e70d77d5fca3d04ba1)
* Size: 200 evaluation samples
* Columns: audio and text
* Approximate statistics based on the first 200 samples:
| | audio | text |
|:--------|:------|:-------|
| type | audio | string |
* Samples:
| text |
|:-----|
| HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE |
| HE WOULD HAVE TO PAY HER THE MONEY WHICH SHE WOULD NOW REGULARLY DEMAND OR THERE WOULD BE TROUBLE IT DID NOT MATTER WHAT HE DID |
| HURSTWOOD WALKED THE FLOOR MENTALLY ARRANGING THE CHIEF POINTS OF HIS SITUATION |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false,
"directions": [
"query_to_doc",
"doc_to_query"
],
"partition_mode": "per_direction",
"hardness_mode": null,
"hardness_strength": 0.0
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 4
- `num_train_epochs`: 5
- `learning_rate`: 2e-05
- `warmup_steps`: 0.1
- `bf16`: True
- `eval_strategy`: steps
- `per_device_eval_batch_size`: 4
- `batch_sampler`: no_duplicates
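As a rough sanity check on these settings, the number of optimizer steps can be estimated from the dataset size. This is back-of-the-envelope arithmetic under several assumptions that the card does not confirm: a single device, no gradient accumulation, no dropped last batch, the `no_duplicates` sampler not discarding samples, and `warmup_steps: 0.1` being interpreted as a fraction of the total steps.

```python
import math

train_samples = 28_539       # "Size" under Training Dataset
per_device_batch_size = 4    # per_device_train_batch_size
num_epochs = 5               # num_train_epochs
warmup_fraction = 0.1        # warmup_steps, read here as a fraction (assumption)

# Assumes one device and one optimizer step per batch.
steps_per_epoch = math.ceil(train_samples / per_device_batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_fraction)

print(steps_per_epoch, total_steps, warmup_steps)
```

Under those assumptions this gives 7,135 steps per epoch and 35,675 steps overall, with roughly the first tenth spent on warmup.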
#### All Hyperparameters