---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- ancient-greek
- biblical-verses
- clustering
base_model: pranaydeeps/Ancient-Greek-BERT
pipeline_tag: sentence-similarity
library_name: sentence-transformers
license: mit
language:
- grc
---

# Ancient Greek Variant SBERT (ONNX)

A [sentence-transformers](https://www.SBERT.net) model fine-tuned from [pranaydeeps/Ancient-Greek-BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT) for semantic similarity of Ancient Greek biblical texts.

> **Note:** This is the ONNX-converted version of [Paulanerus/AncientGreekVariantSBERT](https://huggingface.co/Paulanerus/AncientGreekVariantSBERT).

## Model Description

This model maps Ancient Greek sentences and paragraphs to a 768-dimensional dense vector space, optimized for:

- **Semantic textual similarity** between biblical verses
- **Clustering** of related passages
- **Semantic search** across biblical corpora

The model was fine-tuned using Multiple Negatives Ranking Loss to learn meaningful representations that capture semantic relationships between biblical Greek texts.

## Strengths

This model excels at:

- **Variant Detection**: High similarity scores (>0.9) for verses that are textually identical or near-identical, even with minor orthographic differences
- **Semantic Clustering**: Effectively groups related passages and parallel texts
- **Robust to Spelling Variations**: Handles common manuscript variations in biblical Greek

## Usage

### Installation & Training

For installation instructions and training scripts, see the [GitHub repository](https://github.com/Paulanerus/AncientGreekVariantSB).

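
For orientation, the Multiple Negatives Ranking Loss named in the model description can be sketched in a few lines of plain Python. This is an illustration only, not the training code from the repository: the function names `cosine` and `mnr_loss` and the toy vectors are hypothetical, and `scale=20.0` mirrors the sentence-transformers default rather than a value confirmed for this model. Each anchor is scored against every positive in the batch, and the loss is the cross-entropy of picking its own positive, so the other positives act as in-batch negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    Every other positive in the batch serves as a negative for anchor i.
    `scale` is an assumed temperature-like multiplier on the cosine scores.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D "embeddings" (hypothetical data, not model output).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the wrong anchor

print(mnr_loss(anchors, aligned) < mnr_loss(anchors, swapped))  # True
```

Larger batches supply more in-batch negatives per anchor, which is one reason this loss is usually paired with large batch sizes.
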
### Inference Example

```python
import unicodedata

from sentence_transformers import SentenceTransformer, util


def strip_accents_and_lowercase(s):
    # Decompose to NFD, drop combining marks (category "Mn"), then lowercase.
    return "".join(
        c for c in unicodedata.normalize("NFD", s) if unicodedata.category(c) != "Mn"
    ).lower()


# Each xsent[i] is compared with ysent[i].
xsent = [
    "οι δε φαρισεοι ακουσαντες οτι εφιμωσε τους σαδδουκεους συνηχθησαν επι το αυτο",
    "ειπεν δε αυτοις οταν προσευχησθε λεγετε πατερ αγιασθητω το ονομα σου ελθατω η βασιλια σου γενηθητω το θελημα σου ως εν ουρανω και επι γης",
    "νυν κρισις εστιν του κοσμου νυν ο αρχων τουτου τουτου νυν ο αρχων του κοσμου τουτου εκβληθησεται εξω",
    "διο προσλαμβανεσθαι αλληλους καθως και ο χς προσελαβετο υμας εις δοξαν του θυ",
]

ysent = [
    "οι δε φαρισαιοι ακουσαντες οτι εφιμωσε τους σαδδουκεους συνηχθησαν επι το αυτο",
    "ειπεν δε αυτοις οταν προσευχησθε λεγετε πατερ αγιασθητω το ονομα σου ελθατω η βασιλια σου γενηθητω το θελημα σου ως εν ουρανω και επι γης και ρυσαι ημας απο του πονηρου",
    "νυν δε προς σε ερχομαι και ταυτα λαλω εν τω κοσμω ινα εχωσιν την χαραν την εμην πεπληρωκενην εν αυτοις",
    "μετανοησαται ουν και επιστρεψαται προς το εξαλιφθηναι υμων τας αμαρτιας",
]

# Apply the same normalization the model expects (see Preprocessing).
xsent_norm = [strip_accents_and_lowercase(s) for s in xsent]
ysent_norm = [strip_accents_and_lowercase(s) for s in ysent]

model = SentenceTransformer("Paulanerus/AncientGreekVariantSBERT")

x_embeddings = model.encode(xsent_norm, convert_to_tensor=True)
y_embeddings = model.encode(ysent_norm, convert_to_tensor=True)

# Pairwise cosine similarity between corresponding sentences.
print("Similarities:")
for i in range(len(xsent_norm)):
    similarity = util.cos_sim(x_embeddings[i], y_embeddings[i]).item()
    print(f"Pair {i + 1}: {similarity:.4f}")
```

**Expected output:**

```
Similarities:
Pair 1: 0.9882  # Near-identical verses (minor spelling difference)
Pair 2: 0.9000  # Same verse, one with additional text
Pair 3: 0.1772
Pair 4: 0.1724
```

## Training Details

### Base Model

- **Base**: [pranaydeeps/Ancient-Greek-BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT)
- **Architecture**: BERT-base (12 layers, 768 hidden dimensions)

### Training Configuration

| Parameter     | Value                        |
| ------------- | ---------------------------- |
| Batch Size    | 256                          |
| Epochs        | 8                            |
| Learning Rate | 2e-5                         |
| Loss Function | MultipleNegativesRankingLoss |
| Warmup        | 10% of training steps        |
| Hardware      | NVIDIA A100 80GB PCIe        |

### Preprocessing

All input text should be normalized by:

1. Removing diacritics/accents (NFD normalization, then dropping combining marks)
2. Converting to lowercase

## Evaluation Results

Evaluated on an information retrieval task:

| Metric       | Score |
| ------------ | ----- |
| Accuracy@1   | 0.43  |
| Accuracy@3   | 1.00  |
| Accuracy@5   | 1.00  |
| Precision@3  | 0.74  |
| Precision@10 | 0.88  |
| MRR@10       | 0.715 |
| NDCG@10      | 0.843 |
| MAP@100      | 0.911 |

## Limitations

- Optimized specifically for biblical Ancient Greek; performance may be reduced on other varieties and genres (Classical, Homeric, etc.)
- Requires text preprocessing (accent stripping, lowercasing) for best results

## Citation

```bibtex
@misc{ancient-greek-variant-sbert,
  author = {Fröhlich, Paul},
  title = {Ancient Greek Variant SBERT: Fine-tuned Embeddings for Biblical text verses in Ancient Greek},
  year = {2026},
  howpublished = {\url{https://huggingface.co/Paulanerus/AncientGreekVariantSBERT}},
  note = {Model release}
}
```

## Acknowledgments

This model builds upon the [Ancient Greek BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT) by Singh, Rutten, and Lefever (2021).

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 513300936.
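
## Appendix: Reading the Retrieval Metrics

As a reading aid for the Evaluation Results table, the sketch below shows how Accuracy@k and MRR@k are conventionally computed from ranked retrieval lists. It is a minimal illustration under stated assumptions, not the evaluation script used for this model: the function names, verse identifiers, and rankings are all hypothetical.

```python
def accuracy_at_k(ranked, relevant, k):
    """Return 1.0 if any relevant document appears in the top k results, else 0.0."""
    return float(any(doc in relevant for doc in ranked[:k]))


def reciprocal_rank_at_k(ranked, relevant, k):
    """Return 1/rank of the first relevant document within the top k, else 0.0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical query results: (ranked document ids, set of relevant ids).
runs = [
    (["v2", "v1", "v3"], {"v1"}),  # first relevant hit at rank 2
    (["v5", "v4", "v6"], {"v5"}),  # first relevant hit at rank 1
]

# Average the per-query scores over all queries.
acc1 = sum(accuracy_at_k(r, rel, 1) for r, rel in runs) / len(runs)
mrr10 = sum(reciprocal_rank_at_k(r, rel, 10) for r, rel in runs) / len(runs)

print(f"Accuracy@1: {acc1:.2f}, MRR@10: {mrr10:.2f}")  # Accuracy@1: 0.50, MRR@10: 0.75
```

Under these definitions MRR@10 can never fall below Accuracy@1, because every query with a hit at rank 1 contributes a reciprocal rank of 1.0.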