GreekBERT Fine-tuned: Ancient Text Location Classification
Fine-tuned version of nlpaueb/bert-base-greek-uncased-v1 for multi-class classification of ancient Greek texts by geographic provenance. Given an ancient inscription or scripture fragment, the model predicts which of 15 regions or locations the text originated from.
Built as part of the Ancient Texts Provenance Challenge (Kaggle: nppe1).
Model Details
- Model type: BERT-based sequence classifier
- Base model: nlpaueb/bert-base-greek-uncased-v1
- Task: Multi-class text classification (15 classes)
- Language: Ancient/Classical Greek
- Developed by: Anand Kumar
- Training platform: Kaggle (GPU T4/P100)
- Experiment tracking: Weights & Biases (W&B)
Training Details
Dataset
- Source: Ancient Texts Provenance Challenge (Kaggle: nppe1)
- Split: 80/20 train/test split, stratified by label (seed=42)
- Classes: 15 geographic provenance labels
- Note: The dataset has significant class imbalance, so Macro-F1 is used as the primary evaluation metric
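The stratified 80/20 split can be sketched without any dependencies as follows. This is an illustration only: the helper name `stratified_split` and the toy labels are assumptions, and the original notebook may well have used a library helper such as scikit-learn's `train_test_split` with `stratify=` instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so each label keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Round per class, but keep at least one example on each side when possible.
        n_test = round(len(indices) * test_fraction)
        if len(indices) > 1:
            n_test = min(max(n_test, 1), len(indices) - 1)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy, imbalanced labels (hypothetical; not the competition data).
labels = ["A"] * 50 + ["B"] * 10 + ["C"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class rather than over the whole pool is what keeps rare provenance labels represented in the held-out set.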
Preprocessing
- Tokenized using the nlpaueb/bert-base-greek-uncased-v1 tokenizer
- Truncated to a max length of 512 tokens
- Dynamic padding via DataCollatorWithPadding
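To illustrate what truncation plus dynamic padding does to a batch, here is a dependency-free sketch. The token IDs, `pad_id=0`, and the helper name `pad_batch` are assumptions for illustration; in the actual pipeline the tokenizer and `DataCollatorWithPadding` from Transformers handle this.

```python
def pad_batch(sequences, max_length=512, pad_id=0):
    """Truncate each sequence to max_length, then pad to the longest sequence in this batch."""
    # Naive slice; a real tokenizer also preserves special tokens when truncating.
    truncated = [seq[:max_length] for seq in sequences]
    batch_len = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (batch_len - len(seq)) for seq in truncated]
    # The attention mask marks real tokens with 1 and padding with 0.
    attention_mask = [[1] * len(seq) + [0] * (batch_len - len(seq)) for seq in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token IDs; a real tokenizer would produce these.
batch = pad_batch([[101, 7, 8, 102], [101, 9, 102]], max_length=512)
print(batch["input_ids"])  # padded to the batch maximum (4), not to 512
print(batch["attention_mask"])
```

Padding to the longest sequence in each batch, instead of always to 512, avoids spending compute on padding tokens when the fragments are short.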
Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 5 |
| Per-device batch size | 32 |
| Learning rate | 5e-5 |
| LR scheduler | Linear with warmup |
| Warmup ratio | 0.1 |
| Precision | fp16 mixed precision |
| Evaluation strategy | Per epoch |
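The linear-with-warmup schedule in the table can be sketched as a plain function. The total step count below is a made-up number, and the formula follows the common convention (linear ramp to the peak over the first 10% of steps, then linear decay to zero); the exact training script is not shown in this card, so treat this as an assumption about its behaviour.

```python
def linear_warmup_lr(step, total_steps, peak_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at a given optimizer step for linear warmup followed by linear decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical; depends on dataset size, batch size 32, and 5 epochs
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total_steps))
```

With these assumed numbers the rate reaches its 5e-5 peak at step 100 and is back to half of that by step 550.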
Evaluation
Metrics
Macro-F1 was chosen as the primary metric due to class imbalance in the dataset. It evaluates performance equally across all 15 classes regardless of class frequency.
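A small sketch of why Macro-F1 and accuracy can diverge under class imbalance. The labels below are toy values, not model outputs, and `macro_f1` is a hand-written helper for illustration; in practice a library function such as scikit-learn's `f1_score(..., average="macro")` would be the usual choice.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over the classes present in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: class 0 dominates, and the classifier always predicts class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}, macro_f1={macro_f1(y_true, y_pred):.2f}")
```

In this toy case accuracy is 0.80 while Macro-F1 is about 0.44, because the minority class contributes an F1 of zero with the same weight as the majority class.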
Results
| Metric | Score |
|---|---|
| Macro-F1 | 0.51 |
| Accuracy | 0.66 |
Tracked and logged via Weights & Biases (project: nppe1, run: greek-bert)
How to Use
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "anand095/greek-bert-5epoch-lr-5e-5-warmup"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "your ancient greek text here"
# Truncate to the same 512-token limit used during training.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Inference only, so gradients are not needed.
with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class among the 15 labels.
predicted_class = torch.argmax(outputs.logits, dim=1).item()
print(f"Predicted location class: {predicted_class}")
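Since the model returns raw logits, it can be useful to look at the top few candidate classes rather than only the argmax. A dependency-free sketch, assuming the logits have been converted to a plain list (for example via `outputs.logits[0].tolist()`); the logit values and the helper names `softmax` and `top_k` are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most likely (class_index, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a 15-class model; real values come from outputs.logits.
logits = [0.1, 2.3, -1.0, 0.5, 1.9] + [0.0] * 10
for class_index, prob in top_k(logits):
    print(f"class {class_index}: {prob:.3f}")
```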
Limitations
- Trained specifically on the nppe1 Kaggle dataset, so performance on other ancient text corpora may vary
- Limited to 15 predefined geographic classes from the training data
- Model handles ancient/classical Greek text only; not suitable for modern Greek