---
tags:
- sentence-transformers
- text-retrieval
- sentence-similarity
- feature-extraction
- semantic-search
- amharic
- text-embedding-inference
- transformers
pipeline_tag: sentence-similarity
library_name: sentence-transformers
license: mit
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: TestTripletEvaluator
      type: TestTripletEvaluator
    metrics:
    - type: cosine_accuracy
      value: 0.875
      name: Cosine Accuracy
---
# SentenceTransformer Fine-Tuned for Amharic Retrieval

This model is a [sentence-transformers](https://www.sbert.net) model fine-tuned on Amharic QA triplets. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic search and retrieval, semantic textual similarity, paraphrase mining, text classification, and clustering.
## Model Details

- **Model Type:** Sentence Transformer
- **Base Model:** `sentence-transformers/paraphrase-xlm-r-multilingual-v1`
- **Training Task:** Triplet loss with Matryoshka loss
- **Language:** Amharic
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
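
Because training included a Matryoshka loss, the leading dimensions of each embedding are meant to remain useful on their own. The sketch below shows the usual way to shorten such a vector: keep a prefix and rescale it to unit length. It is an illustration only. The helper name `truncate_and_normalize` is hypothetical, the 8-value vector stands in for a real 768-dimensional output, and this card does not list which truncation sizes were used in training.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values of an embedding and rescale to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # an all-zero vector cannot be normalized
    return [x / norm for x in head]

# Toy 8-dimensional vector standing in for a 768-dimensional model output.
full = [3.0, 4.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0]
short = truncate_and_normalize(full, 2)
print(short)  # [0.6, 0.8]
```

Recent sentence-transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which may be a more convenient way to get shorter vectors. How well smaller sizes work for this model depends on the dimensions used in training, so it is worth checking retrieval quality before relying on them.
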
## Training Overview

- **Training Data:** Custom Amharic QA triplets (each with a positive and a negative example)
- **Training Strategy:** The model was fine-tuned with a combination of triplet loss and Matryoshka loss, and evaluated with a `TripletEvaluator`.
- **Hyperparameters:**
  - Epochs: 3
  - Batch Size: 16
  - Learning Rate: 1e-6
  - Warmup Ratio: 0.08
  - Weight Decay: 0.05
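
To make the training objective more concrete, here is a minimal sketch of how a triplet loss can be combined with a Matryoshka objective: the same triplet loss is evaluated on several prefixes of the embedding and the results are summed. Everything specific in it is an assumption, since the card does not state the distance function, the margin, the prefix sizes, or any per-size weights. The sentence-transformers library provides `TripletLoss` and `MatryoshkaLoss` classes that implement this idea on real model outputs; the plain-Python version below only illustrates the arithmetic.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin):
    """Hinge loss: the positive should be closer than the negative by `margin`."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

def matryoshka_triplet_loss(anchor, positive, negative, dims, margin):
    """Sum the triplet loss over embedding prefixes of each size in `dims`."""
    return sum(
        triplet_loss(anchor[:d], positive[:d], negative[:d], margin)
        for d in dims
    )

# Toy 4-dimensional embeddings; real ones would have 768 dimensions.
anchor = [1.0, 0.0, 0.0, 0.0]
positive = [1.0, 0.0, 3.0, 4.0]
negative = [4.0, 4.0, 0.0, 0.0]

# Assumed prefix sizes and margin, chosen only for illustration.
loss = matryoshka_triplet_loss(anchor, positive, negative, dims=[4, 2], margin=1.0)
print(loss)  # 1.0
```

In this toy case the 2-dimensional prefix already separates the positive from the negative by more than the margin, so it contributes zero loss. Only the full 4-dimensional comparison is penalized.
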
## Evaluation

The model was evaluated on a held-out test set of triplets. Cosine accuracy is the fraction of triplets in which the anchor is closer, by cosine similarity, to the positive than to the negative:

| Metric              | Value |
|---------------------|-------|
| **Cosine Accuracy** | 0.875 |
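
The metric itself is simple to compute. The sketch below works through it on hand-made 2-dimensional vectors. The triplets are invented for illustration and are not drawn from the model's test set.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)

# Invented toy triplets: (anchor, positive, negative).
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer: correct
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.5]),  # positive is closer: correct
    ([1.0, 0.0], [0.0, 1.0], [0.8, 0.2]),   # negative is closer: incorrect
]
print(cosine_accuracy(toy_triplets))  # 0.75
```
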
## Usage

To use the model in your own project:

1. **Install Sentence Transformers:**

   ```bash
   pip install -U sentence-transformers
   ```
2. **Load the Model:**

   ```python
   from sentence_transformers import SentenceTransformer

   # Downloads the model from the Hugging Face Hub on first use.
   model = SentenceTransformer("abdulmunimjemal/xlm-r-retrieval-am-v1")

   sentences = [
       "ሰማይ ምን አይነት ቀለም ነው?",   # What color is the sky?
       "ሰማይ ሰማያዊ ቀለም አለው።",   # The sky is blue.
       "እኔ ምሳ እንጀራ በላሁ።",      # I ate injera for lunch.
       "ባህር ምን አይነት ቀለም ነው?",   # What color is the sea?
       "አየር በምድር ዙሪያ ያለ ነው።",  # Air is what surrounds the earth.
   ]
   embeddings = model.encode(sentences)
   print(embeddings.shape)  # Expected output: (5, 768)
   ```
3. **Compute Similarity:**

   ```python
   # Requires scikit-learn (pip install scikit-learn).
   from sklearn.metrics.pairwise import cosine_similarity

   similarities = cosine_similarity(embeddings, embeddings)
   print(similarities.shape)  # Expected output: (5, 5)
   ```
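
For retrieval, the same similarity scores are used to rank a corpus against a query. The sketch below shows that ranking step with small hand-written vectors so that it runs without downloading the model. In practice `query` and `corpus` would be rows returned by `model.encode(...)`, and `rank_by_similarity` is a hypothetical helper name rather than part of any library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(query_vec, corpus_vecs, top_k=3):
    """Return (index, score) pairs for the top_k most similar corpus vectors."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real sentence embeddings.
query = [1.0, 0.0, 0.0]
corpus = [
    [0.0, 1.0, 0.0],  # unrelated to the query
    [0.9, 0.1, 0.0],  # very close to the query
    [0.5, 0.5, 0.0],  # partially related
]

hits = rank_by_similarity(query, corpus, top_k=2)
print([index for index, _ in hits])  # [1, 2]
```

For larger corpora, `sentence_transformers.util.semantic_search` performs the same kind of top-k lookup on tensors and is likely the better choice.
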
## Model Architecture

Below is an outline of the model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_mean_tokens': True, ...})
)
```
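
The `Pooling` module has `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token embeddings, with padding positions excluded through the attention mask. A minimal sketch of that step follows. It uses 2-dimensional toy vectors in place of the real 768-dimensional token outputs, and `mean_pool` is a hypothetical helper written for illustration, not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for j, value in enumerate(vec):
                total[j] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]

tokens = [
    [1.0, 2.0],
    [3.0, 4.0],
    [100.0, 100.0],  # padding position, ignored by the mask
]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```
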
## Training Environment

- **Python:** 3.11.11
- **Sentence Transformers:** 3.3.1
- **Transformers:** 4.47.1
- **PyTorch:** 2.5.1+cu124
- **Accelerate:** 1.2.1
- **Datasets:** 3.2.0
- **Tokenizers:** 0.21.0
## Citation

If you use this model in your research, please cite it as follows:

```bibtex
@misc{your_model,
  title        = {SentenceTransformer Fine-Tuned for Amharic Retrieval},
  author       = {Abdulmunim J. Jemal},
  year         = {2025},
  howpublished = {Hugging Face Model Hub, \url{https://huggingface.co/abdulmunimjemal/xlm-r-retrieval-am-v1}}
}
```