How to use from the
Use from the
sentence-transformers library
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("seniichev/me5-wb")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

Multilingual E5 WB

Fine-tuned version of default multilingual-e5-base for WB DS School and RAG project.

Evaluation Results

As model is used as retriever, goal was to boost its performance at cosine similarity between question and answer. With given dataset of QA pairs model performance on EmbeddingSimilarityEvaluator improved from 0.62 to 0.78.

Fine Tuning

DataLoader:

torch.utils.data.dataloader.DataLoader of length 790 with parameters:

{'batch_size': 12, 'sampler': 'torch.utils.data.sampler.SequentialSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}

Loss:

sentence_transformers.losses.ContrastiveLoss.ContrastiveLoss with parameters:

{'distance_metric': 'SiameseDistanceMetric.COSINE_DISTANCE', 'margin': 0.5, 'size_average': True}

Parameters of the fit()-Method:

{
    "epochs": 10,
    "evaluation_steps": 100,
    "evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 790,
    "weight_decay": 0.01
}
Downloads last month
1
Safetensors
Model size
0.3B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support