Sentence Similarity
sentence-transformers
PyTorch
Safetensors
Portuguese
xlm-roberta
text-embeddings-inference
How to use vabatista/sbert-mpnet-base-bm25-hard-neg-pt-br with sentence-transformers:

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("vabatista/sbert-mpnet-base-bm25-hard-neg-pt-br")

    sentences = [
        "A quem a Virgem Maria supostamente apareceu em 1858 em Lourdes, França?",
        "É uma réplica da gruta de Lourdes, na França, onde a Virgem Maria apareceu para Santa Bernadette Soubirous em 1858.",
        "No topo da cúpula de ouro do edifício principal está uma estátua de ouro da Virgem Maria."
    ]

    # Encode the sentences and compute the pairwise similarity matrix.
    embeddings = model.encode(sentences)
    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)  # [3, 3]
This sentence-similarity model was fine-tuned for 2 epochs from sentence-transformers/paraphrase-multilingual-mpnet-base-v2. We used the Brazilian Portuguese version of the SQuAD 1.1 dataset, pairing each question with the sentence that contains its answer. The objective function was MultipleNegativesRankingLoss. To generate hard negatives, we used BM25 to retrieve, for each question, similar sentences from among all sentences in the dataset that did not contain the answer.
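The BM25 hard-negative step described above can be illustrated with a minimal, dependency-free sketch. This is not the training script used for this model: the tokenizer, the `k1` and `b` values, the substring check for the answer, and the helper names `bm25_scores` and `mine_hard_negatives` are all assumptions made for illustration, and the third corpus sentence is a made-up filler.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    # Assumption: lowercase whitespace tokenization with punctuation stripped.
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per corpus sentence for the query."""
    docs = [tokenize(doc) for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def mine_hard_negatives(question, answer, corpus, top_k=1):
    """Pick the top_k best-scoring sentences that do not contain the answer."""
    scores = bm25_scores(question, corpus)
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    candidates = [
        i for i in ranked
        if scores[i] > 0 and answer.lower() not in corpus[i].lower()
    ]
    return [corpus[i] for i in candidates[:top_k]]


corpus = [
    "É uma réplica da gruta de Lourdes, na França, onde a Virgem Maria apareceu para Santa Bernadette Soubirous em 1858.",
    "No topo da cúpula de ouro do edifício principal está uma estátua de ouro da Virgem Maria.",
    "O campus fica no estado de Indiana.",  # hypothetical filler sentence
]
question = "A quem a Virgem Maria supostamente apareceu em 1858 em Lourdes, França?"
answer = "Santa Bernadette Soubirous"

hard_negatives = mine_hard_negatives(question, answer, corpus, top_k=1)
print(hard_negatives)
```

The sentence about the golden statue shares the terms "Virgem" and "Maria" with the question but does not contain the answer, which is what makes it a hard negative. Each resulting (question, positive sentence, hard negative) triplet could then be passed to MultipleNegativesRankingLoss.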
We evaluated this model on the FaQuAD Portuguese QA dataset, where it improved dense retrieval by 10% in MRR@10 compared to the base model.
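For readers unfamiliar with the metric, MRR@10 averages, over all queries, the reciprocal rank of the first relevant result within the top 10 (or 0 if none appears). The sketch below shows the computation on made-up rankings; the function name `mrr_at_k` and the document IDs are hypothetical and are not taken from the evaluation code behind the number above.

```python
def mrr_at_k(ranked_ids_per_query, relevant_ids_per_query, k=10):
    """Mean reciprocal rank of the first relevant result within the top k."""
    total = 0.0
    for ranked_ids, relevant_ids in zip(ranked_ids_per_query, relevant_ids_per_query):
        for rank, doc_id in enumerate(ranked_ids[:k], start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_ids_per_query)


# Hypothetical retrieval results for three queries (IDs are made up).
rankings = [
    ["d3", "d1", "d7"],  # relevant document at rank 1
    ["d2", "d9", "d4"],  # relevant document at rank 3
    ["d5", "d6", "d8"],  # no relevant document retrieved
]
relevant = [{"d3"}, {"d4"}, {"d0"}]

score = mrr_at_k(rankings, relevant, k=10)
print(round(score, 4))
```

Here the three queries contribute 1, 1/3, and 0, so the mean is 4/9.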