---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - ancient-greek
  - biblical-verses
  - clustering
base_model: pranaydeeps/Ancient-Greek-BERT
pipeline_tag: sentence-similarity
library_name: sentence-transformers
license: mit
language:
  - grc
---

# Ancient Greek Variant SBERT (ONNX)

A [sentence-transformers](https://www.SBERT.net) model fine-tuned from [pranaydeeps/Ancient-Greek-BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT) for semantic similarity of Ancient Greek biblical texts.

> **Note:** This is the ONNX-converted version of [Paulanerus/AncientGreekVariantSBERT](https://huggingface.co/Paulanerus/AncientGreekVariantSBERT).

## Model Description

This model maps Ancient Greek sentences and paragraphs to a 768-dimensional dense vector space, optimized for:

- **Semantic textual similarity** between biblical verses
- **Clustering** of related passages
- **Semantic search** across biblical corpora

The model was fine-tuned with Multiple Negatives Ranking Loss, which treats the other examples in each batch as negatives for every anchor-positive pair, so that related biblical Greek texts are embedded close together and unrelated ones further apart.
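
To make the idea behind Multiple Negatives Ranking Loss concrete, here is a minimal pure-Python sketch. It is not the training code for this model: the toy vectors are made up, and the cosine similarity with `scale=20.0` follows the sentence-transformers defaults, which this model's training is only assumed to have used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positives[j] in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the whole batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy 2-dimensional "embeddings" (made up for illustration).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each anchor is closest to its own positive
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each anchor is closest to the wrong positive

print(mnr_loss(anchors, aligned) < mnr_loss(anchors, swapped))  # True
```

The loss is small when each anchor is most similar to its own positive and large otherwise, which is why larger batches (more in-batch negatives) tend to give a stronger training signal.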

## Strengths

This model excels at:

- **Variant Detection**: High similarity scores (>0.9) for verses that are textually identical or near-identical, even with minor orthographic differences
- **Semantic Clustering**: Effectively groups related passages and parallel texts
- **Robust to Spelling Variations**: Handles common manuscript variations in biblical Greek
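
As a rough illustration of variant detection, the sketch below flags every cross pair of embeddings whose cosine similarity exceeds a threshold. The helper name `find_variant_pairs` and the toy vectors are hypothetical, and the default of 0.9 simply mirrors the figure quoted above; it is not a calibrated cutoff and may need tuning for a given corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_variant_pairs(embeddings_a, embeddings_b, threshold=0.9):
    """Return (i, j, score) for every cross pair scoring above the threshold."""
    pairs = []
    for i, u in enumerate(embeddings_a):
        for j, v in enumerate(embeddings_b):
            score = cosine(u, v)
            if score > threshold:
                pairs.append((i, j, score))
    return pairs


# Toy 3-dimensional vectors standing in for real model embeddings.
a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [[0.98, 0.05, 0.0], [0.0, 0.2, 1.0]]

for i, j, score in find_variant_pairs(a, b):
    print(f"a[{i}] ~ b[{j}]: {score:.3f}")
```

With real data, the lists would hold the vectors returned by `model.encode(...)` for two manuscripts or verse collections.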

## Usage

### Installation & Training

For installation instructions and training scripts, see the [GitHub repository](https://github.com/Paulanerus/AncientGreekVariantSB).

### Inference Example

```python
import unicodedata
from sentence_transformers import SentenceTransformer, util


def strip_accents_and_lowercase(s):
    return "".join(
        c for c in unicodedata.normalize("NFD", s) if unicodedata.category(c) != "Mn"
    ).lower()


xsent = [
    "οι δε φαρισεοι ακουσαντες οτι εφιμωσε τους σαδδουκεους συνηχθησαν επι το αυτο",
    "ειπεν δε αυτοις οταν προσευχησθε λεγετε πατερ αγιασθητω το ονομα σου ελθατω η βασιλια σου γενηθητω το θελημα σου ως εν ουρανω και επι γης",
    "νυν κρισις εστιν του κοσμου νυν ο αρχων τουτου τουτου νυν ο αρχων του κοσμου τουτου εκβληθησεται εξω",
    "διο προσλαμβανεσθαι αλληλους καθως και ο χς προσελαβετο υμας εις δοξαν του θυ",
]

ysent = [
    "οι δε φαρισαιοι ακουσαντες οτι εφιμωσε τους σαδδουκεους συνηχθησαν επι το αυτο",
    "ειπεν δε αυτοις οταν προσευχησθε λεγετε πατερ αγιασθητω το ονομα σου ελθατω η βασιλια σου γενηθητω το θελημα σου ως εν ουρανω και επι γης και ρυσαι ημας απο του πονηρου",
    "νυν δε προς σε ερχομαι και ταυτα λαλω εν τω κοσμω ινα εχωσιν την χαραν την εμην πεπληρωκενην εν αυτοις",
    "μετανοησαται ουν και επιστρεψαται προς το εξαλιφθηναι υμων τας αμαρτιας",
]

xsent_norm = [strip_accents_and_lowercase(s) for s in xsent]
ysent_norm = [strip_accents_and_lowercase(s) for s in ysent]

model = SentenceTransformer("Paulanerus/AncientGreekVariantSBERT")

x_embeddings = model.encode(xsent_norm, convert_to_tensor=True)
y_embeddings = model.encode(ysent_norm, convert_to_tensor=True)

print("Similarities:")
for i in range(len(xsent_norm)):
    similarity = util.cos_sim(x_embeddings[i], y_embeddings[i]).item()
    print(f"Pair {i + 1}: {similarity:.4f}")
```

**Expected output:**

```
Similarities:
Pair 1: 0.9882  # Near-identical verses (minor spelling difference)
Pair 2: 0.9000  # Same verse, one with additional text
Pair 3: 0.1772
Pair 4: 0.1724
```
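
Since this repository hosts the ONNX export, the weights can probably be loaded through the ONNX backend of sentence-transformers instead of PyTorch. The sketch below is an assumption-laden starting point: it presumes sentence-transformers 3.2 or newer installed with the `onnx` extra, the helper `load_onnx_model` is hypothetical, and `ONNX_REPO_ID` is a placeholder because this card does not state the repository ID or where the `.onnx` file sits inside it.

```python
def load_onnx_model(repo_id):
    """Load a sentence-transformers model through ONNX Runtime.

    Assumes `pip install "sentence-transformers[onnx]"` (version 3.2 or newer).
    """
    # Imported lazily so this helper can be defined without the dependency.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(repo_id, backend="onnx")


# Hypothetical usage; replace the placeholder with this repository's ID.
# model = load_onnx_model("ONNX_REPO_ID")
# embeddings = model.encode(["εν αρχη ην ο λογος"])
```

Inputs should still be normalized (accents stripped, lowercased) before encoding, exactly as in the example above.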

## Training Details

### Base Model

- **Base**: [pranaydeeps/Ancient-Greek-BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT)
- **Architecture**: BERT-base (12 layers, 768 hidden dimensions)

### Training Configuration

| Parameter     | Value                        |
| ------------- | ---------------------------- |
| Batch Size    | 256                          |
| Epochs        | 8                            |
| Learning Rate | 2e-5                         |
| Loss Function | MultipleNegativesRankingLoss |
| Warmup        | 10% of training steps        |
| Hardware      | NVIDIA A100 80GB PCIe        |
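
To show what "10% of training steps" means in practice, the following sketch derives the step counts from the table. The dataset size of 100,000 pairs is purely hypothetical (the card does not report it), and the calculation assumes a single device with no gradient accumulation.

```python
import math

BATCH_SIZE = 256        # from the table above
EPOCHS = 8              # from the table above
WARMUP_FRACTION = 0.10  # "10% of training steps"


def training_schedule(num_pairs):
    """Return (steps_per_epoch, total_steps, warmup_steps) for a dataset size."""
    steps_per_epoch = math.ceil(num_pairs / BATCH_SIZE)
    total_steps = steps_per_epoch * EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_FRACTION)
    return steps_per_epoch, total_steps, warmup_steps


# Hypothetical dataset size, for illustration only.
print(training_schedule(100_000))  # (391, 3128, 313)
```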

### Preprocessing

All input text should be normalized before encoding by:

1. Removing diacritics/accents (NFD decomposition, then dropping the combining marks)
2. Converting to lowercase
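
The two steps can be checked on a fully accented (polytonic) phrase. This sketch reuses the `strip_accents_and_lowercase` helper from the inference example; the sample phrase is chosen here for illustration only.

```python
import unicodedata


def strip_accents_and_lowercase(s):
    # NFD splits each accented letter into a base letter plus combining marks;
    # dropping category "Mn" (nonspacing marks) removes accents, breathings,
    # and the iota subscript.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn").lower()


print(strip_accents_and_lowercase("Ἐν ἀρχῇ ἦν ὁ λόγος"))  # εν αρχη ην ο λογος
```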

## Evaluation Results

Evaluated on an information retrieval task:

| Metric       | Score |
| ------------ | ----- |
| Accuracy@1   | 0.43  |
| Accuracy@3   | 1.00  |
| Accuracy@5   | 1.00  |
| Precision@3  | 0.74  |
| Precision@10 | 0.88  |
| MRR@10       | 0.715 |
| NDCG@10      | 0.843 |
| MAP@100      | 0.911 |
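
For readers unfamiliar with these metrics, the sketch below computes the per-query values behind Accuracy@k, Precision@k, and MRR@k using their standard definitions; the reported scores would be averages over all queries. The verse IDs are invented, and it is an assumption that the evaluation here used exactly these definitions.

```python
def accuracy_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant_ids for doc in ranked_ids[:k]) / k


def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant result within the top k, or 0.0 if none."""
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking for one query; the IDs are invented.
ranked = ["v7", "v2", "v9", "v4"]
relevant = {"v2", "v4"}

print(accuracy_at_k(ranked, relevant, 1))          # 0.0
print(accuracy_at_k(ranked, relevant, 3))          # 1.0
print(precision_at_k(ranked, relevant, 2))         # 0.5
print(reciprocal_rank_at_k(ranked, relevant, 10))  # 0.5
```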

## Limitations

- Optimized specifically for biblical Ancient Greek; performance may be reduced on other varieties and genres of Ancient Greek (Classical, Homeric, etc.)
- Requires text preprocessing (accent stripping, lowercasing) for best results

## Citation

```bibtex
@misc{ancient-greek-variant-sbert,
  author = {Fröhlich, Paul},
  title = {Ancient Greek Variant SBERT: Fine-tuned Embeddings for Biblical text verses in Ancient Greek},
  year = {2026},
  howpublished = {\url{https://huggingface.co/Paulanerus/AncientGreekVariantSBERT}},
  note = {Model release}
}
```

## Acknowledgments

This model builds upon the [Ancient Greek BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT) by Singh, Rutten, and Lefever (2021).

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 513300936.