Sentence Similarity
Transformers
PyTorch
ONNX
sentence-transformers
Arabic
bert
feature-extraction
miniDense
passage-retrieval
knowledge-distillation
middle-training
text-embeddings-inference
Instructions to use prithivida/miniDense_arabic_v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use prithivida/miniDense_arabic_v1 with Transformers:
    # Load model directly
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("prithivida/miniDense_arabic_v1")
    model = AutoModel.from_pretrained("prithivida/miniDense_arabic_v1")
- sentence-transformers
How to use prithivida/miniDense_arabic_v1 with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("prithivida/miniDense_arabic_v1")

    sentences = [
        "هذا شخص سعيد",
        "هذا كلب سعيد",
        "هذا شخص سعيد جدا",
        "اليوم هو يوم مشمس"
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [4, 4]
- Notebooks
- Google Colab
- Kaggle
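The sentence-transformers snippet above produces a pairwise similarity matrix. For passage retrieval, the usual next step is to rank passages against a single query by that score. The sketch below illustrates the idea in plain Python, assuming cosine similarity (the default for `model.similarity` in recent sentence-transformers releases unless the model configuration selects another function). The helper names `cosine_similarity` and `rank_passages` and the toy 3-dimensional vectors are hypothetical stand-ins; real embeddings would come from `model.encode`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(
    query_vec: Sequence[float], passage_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption: not model output).
query = [1.0, 0.0, 0.0]
passages = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # 45 degrees from the query
]
ranking = rank_passages(query, passages)
for index, score in ranking:
    print(index, round(score, 4))
```

With real embeddings, `query` would be the encoded question and `passages` the encoded corpus; for large corpora an approximate nearest-neighbour index is normally used instead of this exhaustive loop.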
Update README.md
README.md
CHANGED
@@ -63,10 +63,10 @@ pipeline_tag: sentence-similarity
 
 # Request, Terms, Disclaimers
 
+[https://github.com/sponsors/PrithivirajDamodaran](https://github.com/sponsors/PrithivirajDamodaran)
 
 <center>
 <img src="./ar_terms.png" width=250%/>
-<b><p>[https://github.com/sponsors/PrithivirajDamodaran](https://github.com/sponsors/PrithivirajDamodaran)</p><b>
 </center>
 
 
@@ -173,7 +173,7 @@ The below numbers are with mDPR model, but miniDense_arabic_v1 should give a eve
 
 | Language | ISO | nDCG@10 BM25 | nDCG@10 mDPR | nDCG@10 Hybrid |
 |-----------|-----|--------------|--------------|----------------|
-| **Arabic** | **ar** | **0.395** | **0.499** | **0.
+| **Arabic** | **ar** | **0.395** | **0.499** | **0.673** |
 
 *Note: MIRACL paper shows a different (higher) value for BM25 Arabic, So we are taking that value from BGE-M3 paper, rest all are form the MIRACL paper.*
 
@@ -184,7 +184,7 @@ So it makes sense to evaluate our models in retrieval slice of the MTEB benchmar
 ##### Long Document Retrieval
 
 <center>
-<img src="./ar_metrics_4.png" width=
+<img src="./ar_metrics_4.png" width=80%/>
 <b><p>Table 3: Detailed Arabic retrieval performance on the MultiLongDoc dev set (measured by nDCG@10)</p></b>
 </center>
 
@@ -194,7 +194,7 @@ So it makes sense to evaluate our models in retrieval slice of the MTEB benchmar
 Almost all models below are monolingual arabic models based so they have no notion of any other languages.
 
 <center>
-<img src="./ar_metrics_5.png" width=
+<img src="./ar_metrics_5.png" width=80%/>
 <b><p>Table 4: Detailed Arabic retrieval performance on the 3 X-lingual test set (measured by nDCG@10)</p></b>
 </center>
 
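Every figure touched by this diff is reported as nDCG@10. For readers unfamiliar with the metric, here is a minimal sketch of how it is commonly computed for one query, assuming linear gain and a log2 rank discount. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and this is not the evaluation code behind the numbers in the README, which come from the cited papers and benchmarks.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k results.

    relevances[i] is the graded relevance of the result at rank i + 1.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    judged_relevances: Sequence[float],
    k: int = 10,
) -> float:
    """nDCG@k for one query.

    ranked_relevances: relevance labels of the retrieved results, in ranked order.
    judged_relevances: relevance labels of all judged documents for the query,
        used to build the ideal ranking (any order).
    """
    ideal_dcg = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents were judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# One relevant document, retrieved at rank 2 instead of rank 1.
print(round(ndcg_at_k([0, 1], [1]), 4))
```

A benchmark score such as the 0.673 in the table is the mean of this per-query value over all queries in the evaluation set.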