---
language:
- ru
- en
- multilingual
license: mit
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- mteb
- e5
- contrastive-learning
base_model: intfloat/multilingual-e5-large
pipeline_tag: sentence-similarity
---

# multilingual-e5-large-finetuned-orders

Fine-tuned version of [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) for the order-offer matching task.

## Model Description

This model was fine-tuned on a Russian dataset of order-offer pairs for semantic similarity matching. On this task it raises Accuracy@1 from 49.40% to 72.93% and MRR from 0.586 to 0.811 relative to the base model (see Performance).

## Training Details

- **Base model**: intfloat/multilingual-e5-large
- **Training data**: 68,270 order-offer pairs
- **Loss function**: MultipleNegativesRankingLoss
- **Epochs**: 3
- **Batch size**: 32
- **Learning rate**: 2e-5
- **Training time**: ~22 minutes on an NVIDIA RTX PRO 6000

## Performance

Improvements for the accuracy metrics are absolute differences in percentage points (pp).

| Metric | Base E5 | Fine-tuned | Improvement |
|--------|---------|------------|-------------|
| Accuracy@1 | 49.40% | **72.93%** | +23.53 pp |
| Accuracy@5 | 69.52% | **91.20%** | +21.67 pp |
| Accuracy@10 | 76.83% | **95.11%** | +18.28 pp |
| MRR | 0.586 | **0.811** | +0.225 |

## Usage

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('olegGerbylev/multilingual-e5-large-finetuned-orders')

# Important: use E5-style prefixes ("query: " for orders, "passage: " for offers)
orders = ["query: Кабель ВВГнг 3x2.5 | 100 м"]
offers = ["passage: Кабель ВВГнг(А)-LS 3х2,5 | 100.0 м"]

# Encode each side into dense embeddings
order_embeddings = model.encode(orders)
offer_embeddings = model.encode(offers)

# Compute the cosine similarity matrix (orders x offers)
similarity = cosine_similarity(order_embeddings, offer_embeddings)
print(f"Similarity: {similarity[0][0]:.4f}")
```

## Input Format

This model expects E5-style prefixes:

- For queries (orders): `"query: "`
- For documents (offers): `"passage: "`

## License

MIT
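
## Evaluation Sketch

The Performance table reports Accuracy@k and MRR, but this card does not describe the evaluation protocol. The snippet below is a minimal sketch of how such metrics are commonly computed from an orders-by-offers similarity matrix. It assumes each order has exactly one correct offer and that ties are resolved in favor of the correct offer. The function name `retrieval_metrics` and the toy numbers are hypothetical and are not taken from the original evaluation.

```python
import numpy as np


def retrieval_metrics(similarity, gold, ks=(1, 5, 10)):
    """Compute Accuracy@k and MRR from an (n_orders, n_offers) similarity matrix.

    gold[i] is the column index of the single correct offer for order i
    (an assumption of this sketch, not a documented property of the dataset).
    """
    similarity = np.asarray(similarity, dtype=float)
    gold = np.asarray(gold)
    # Similarity score of the correct offer for each order
    gold_scores = similarity[np.arange(len(gold)), gold]
    # 1-based rank: one plus the number of offers scoring strictly higher
    ranks = 1 + (similarity > gold_scores[:, None]).sum(axis=1)
    metrics = {f"accuracy@{k}": float((ranks <= k).mean()) for k in ks}
    metrics["mrr"] = float((1.0 / ranks).mean())
    return metrics


# Toy similarity matrix: 3 orders x 4 offers (made-up numbers)
toy_similarity = np.array([
    [0.9, 0.1, 0.3, 0.2],  # correct offer (column 0) is ranked 1st
    [0.8, 0.6, 0.1, 0.7],  # correct offer (column 1) is ranked 3rd
    [0.2, 0.4, 0.5, 0.1],  # correct offer (column 2) is ranked 1st
])
toy_gold = [0, 1, 2]

print(retrieval_metrics(toy_similarity, toy_gold, ks=(1, 2, 3)))
```

With real data, `similarity` would be the output of `cosine_similarity(order_embeddings, offer_embeddings)` from the Usage section, with both sides encoded using the `query: ` and `passage: ` prefixes.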