Paper: Matryoshka Representation Learning (arXiv 2205.13147)
How to use Vinsuka/legora_model with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Vinsuka/legora_model")
sentences = [
    "What is the duration of the period mentioned in the text?",
    ". The only exception to the requirement that the plaintiff must be a lending institution in order to invoke the provisions of the Act is contained in Section 25, in terms of which a person who inter alia knowingly draws a cheque which is subsequently dishonoured by the bank for want of funds is guilty of an offence under the Act, and proceedings can be instituted against such person in the Magistrate’s",
    "? The 1st question of law is formulated on the basis that , the 1st Defendant is the licensee of the 2nd Defendant and therefore, the 1st Defendant cannot claim prescriptive title to the subject matter",
    ".50,000/ - (that is , a period of 36 months) but such “Facility” is subject to review on 30 /09/2000”, (that is, a period of about only 5 months from the date of P4)",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```

This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
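The module list reads as a three-stage pipeline: ModernBERT produces one 768-dimensional vector per token, `Pooling` with `pooling_mode_mean_tokens: True` averages the token vectors that are not padding, and `Normalize()` rescales the result to unit length. The following is a minimal pure-Python sketch of the last two stages on toy 4-dimensional vectors. The function names `mean_pool` and `l2_normalize` are hypothetical, not the library's API, and the code illustrates the arithmetic rather than reproducing the library's tensor implementation.

```python
import math
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an input that is entirely padding
    return [value / count for value in summed]


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length, as the Normalize() stage does."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


# Toy example: three "tokens" with 4-dimensional embeddings; the last one is padding.
tokens = [[1.0, 2.0, 0.0, 2.0], [3.0, 2.0, 4.0, 2.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)  # prints [0.5, 0.5, 0.5, 0.5]
```

Because every output vector has unit length, the cosine similarity of two embeddings equals their plain dot product.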
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Vinsuka/legora_model")
# Run inference
sentences = [
    'What is described in Section 25 of the Arbitration Act?',
    '. (3) The provision of subsections (1) and (2) shall apply only to the extent agreed to by the parties. (4) The arbitral tribunal shall decide according to considerations of general justice and fairness or trade usages only if the parties have expressly authorised it to do so. Section 25 of the Arbitration Act describes the form and content of the arbitral award as follows: 25',
    '. 9 and 10 based on the objection taken to them by the Counsel for HNB, despite the fact that they did not arise from the pleadings, and were altogether inconsistent with them, answered the afore-stated question of law (in respect of which this Court had granted Leave to Appeal in that case) in the affirmative and in favour of HNB, and stated as follows: “In conclusion, it needs to be emphasised',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
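Because the model was trained with `MatryoshkaLoss` on the prefixes of length 768, 512, 256, 128 and 64, a shorter embedding can be obtained by keeping only the first k dimensions. A truncated vector is no longer unit length, so it should be re-normalized if scores are computed with a raw dot product; cosine similarity is unaffected by the rescaling. Recent sentence-transformers releases expose truncation through a `truncate_dim` argument to `SentenceTransformer`. The pure-Python sketch below shows only the underlying arithmetic on made-up 4-dimensional vectors; the helper names are hypothetical.

```python
import math
from typing import List


def truncate_and_normalize(vector: List[float], k: int) -> List[float]:
    """Keep the first k dimensions (a Matryoshka prefix) and rescale to unit length."""
    prefix = vector[:k]
    norm = math.sqrt(sum(v * v for v in prefix))
    return [v / norm for v in prefix] if norm > 0 else prefix


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


# Made-up vectors standing in for real 768-dimensional embeddings.
query = [0.5, 0.5, 0.5, 0.5]
doc = [0.7, 0.1, 0.1, 0.7]

q2 = truncate_and_normalize(query, 2)
d2 = truncate_and_normalize(doc, 2)
# The raw prefix dot product understates the similarity; after re-normalization
# the dot product equals the cosine similarity of the prefixes.
print(round(dot(query[:2], doc[:2]), 3), round(dot(q2, d2), 3))  # prints 0.4 0.8

# Storage per float32 vector (4 bytes per dimension) for each trained prefix length.
for dim in (768, 512, 256, 128, 64):
    print(dim, "dims ->", dim * 4, "bytes")
```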
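Evaluation: information retrieval on the datasets dim_768, dim_512, dim_256, dim_128 and dim_64 (the suffix is the embedding dimension used), evaluated with `InformationRetrievalEvaluator`.

| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |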
|---|---|---|---|---|---|
| cosine_accuracy@1 | 0.5741 | 0.5741 | 0.5552 | 0.4971 | 0.3968 |
| cosine_accuracy@3 | 0.7616 | 0.7631 | 0.7282 | 0.6759 | 0.5581 |
| cosine_accuracy@5 | 0.8198 | 0.8212 | 0.7922 | 0.7355 | 0.6221 |
| cosine_accuracy@10 | 0.8852 | 0.875 | 0.8619 | 0.8241 | 0.7253 |
| cosine_precision@1 | 0.5741 | 0.5741 | 0.5552 | 0.4971 | 0.3968 |
| cosine_precision@3 | 0.2539 | 0.2544 | 0.2427 | 0.2253 | 0.186 |
| cosine_precision@5 | 0.164 | 0.1642 | 0.1584 | 0.1471 | 0.1244 |
| cosine_precision@10 | 0.0885 | 0.0875 | 0.0862 | 0.0824 | 0.0725 |
| cosine_recall@1 | 0.5741 | 0.5741 | 0.5552 | 0.4971 | 0.3968 |
| cosine_recall@3 | 0.7616 | 0.7631 | 0.7282 | 0.6759 | 0.5581 |
| cosine_recall@5 | 0.8198 | 0.8212 | 0.7922 | 0.7355 | 0.6221 |
| cosine_recall@10 | 0.8852 | 0.875 | 0.8619 | 0.8241 | 0.7253 |
| cosine_ndcg@10 | 0.7308 | 0.7262 | 0.7078 | 0.6568 | 0.5514 |
| cosine_mrr@10 | 0.6812 | 0.6782 | 0.6586 | 0.6038 | 0.497 |
| cosine_map@100 | 0.6852 | 0.6828 | 0.6631 | 0.609 | 0.505 |
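In this table, accuracy@k equals recall@k in every column, and precision@k equals accuracy@k divided by k (for example 0.7616 / 3 ≈ 0.2539). That pattern is consistent with each query having exactly one relevant passage, in which case every metric depends only on the rank at which that passage is retrieved. The sketch below computes the metrics under that one-relevant-passage assumption from a list of ranks; `retrieval_metrics` is a hypothetical helper, not the `InformationRetrievalEvaluator` source, and the example ranks are made up.

```python
import math
from typing import Dict, List, Optional, Sequence


def retrieval_metrics(ranks: List[Optional[int]],
                      k_values: Sequence[int] = (1, 3, 5, 10)) -> Dict[str, float]:
    """Metrics for queries that each have exactly one relevant document.

    ranks[i] is the 1-based position of query i's relevant document in the
    ranked results, or None if it was not retrieved.
    """
    n = len(ranks)
    out: Dict[str, float] = {}
    for k in k_values:
        hits = sum(1 for r in ranks if r is not None and r <= k)
        out[f"accuracy@{k}"] = hits / n
        out[f"recall@{k}"] = hits / n           # one relevant doc: recall@k == accuracy@k
        out[f"precision@{k}"] = hits / (n * k)  # at most 1 of the k slots can be relevant
    out["mrr@10"] = sum(1 / r for r in ranks if r is not None and r <= 10) / n
    # With one relevant document the ideal DCG is 1, so nDCG@10 = 1 / log2(rank + 1).
    out["ndcg@10"] = sum(1 / math.log2(r + 1) for r in ranks if r is not None and r <= 10) / n
    return out


# Made-up ranks for four queries (not taken from the evaluation above).
metrics = retrieval_metrics([1, 3, 12, None])
for name, value in sorted(metrics.items()):
    print(f"{name}: {value:.4f}")
```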
Training dataset columns: `anchor` and `positive`.

| | anchor | positive |
|---|---|---|
| type | string | string |
Samples:

| anchor | positive |
|---|---|
| How must the District Court exercise its discretion? | imposition of ‘ a’ term; (5) It is not mandatory to impose security, as evinced by the use of the conjunction “or”; (6) In imposing terms, the District Court must be mindful of the objectives of the Act, and its discretion must be exercised judicially |
| What is the source of the observation made by Christian Appu? | . Christian Appu , (1895) 1 NLR 288 observed that , “possession is "disturbed" either by an action intended to remove the possessor from the land, or by acts which prevent the possessor from enjoying the free and full use of 12 the land of which he is in the course of acquiring the dominion, and which convert his continuous user into a disconnected and divided user ” |
| What must the defendant do regarding the plaintiff's claim? | . The Court of Appeal in Ramanayake v Sampath Bank Ltd and Others [(1993) 1 Sri LR 145 at page 153] has held that, “The defendant has to deal with the plaintiff’s claim on its merits; it is not competent for the defendant to merely set out technical objections. It is also incumbent on the defendant to reveal his defence, if he has any |
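Pairs like these are trained with in-batch negatives: within a batch, every other pair's positive serves as a negative for a given anchor. If two pairs in one batch shared a text, that text would be scored both as the match and as a negative for the same anchor. The training arguments set `batch_sampler: no_duplicates` to avoid such batches. The following is a simplified sketch of the idea; the function `no_duplicate_batches` and its greedy deferral strategy are illustrative assumptions, not the sentence-transformers `NoDuplicatesBatchSampler` source.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (anchor, positive)


def no_duplicate_batches(pairs: List[Pair], batch_size: int) -> List[List[Pair]]:
    """Greedily fill batches so that no text appears twice within one batch.

    A pair whose anchor or positive is already in the current batch (or that
    does not fit) is deferred and retried for a later batch.
    """
    remaining = list(pairs)
    batches: List[List[Pair]] = []
    while remaining:
        batch: List[Pair] = []
        seen = set()
        deferred: List[Pair] = []
        for pair in remaining:
            if len(batch) < batch_size and not (set(pair) & seen):
                batch.append(pair)
                seen.update(pair)
            else:
                deferred.append(pair)
        batches.append(batch)  # always non-empty, so the loop terminates
        remaining = deferred
    return batches


# Two anchors share the positive "p1", so they must land in different batches.
pairs = [("q1", "p1"), ("q2", "p1"), ("q3", "p3"), ("q4", "p4")]
for batch in no_duplicate_batches(pairs, batch_size=2):
    print(batch)
```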
Loss: `MatryoshkaLoss` with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
```
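With these parameters, the base loss `MultipleNegativesRankingLoss` is computed on each embedding prefix (768, 512, 256, 128 and 64 dimensions) and the five results are summed with equal weights of 1; `n_dims_per_step: -1` means all five prefixes are used at every step. The base loss scores each anchor against every positive in the batch and applies cross-entropy with the anchor's own positive as the target. The pure-Python sketch below assumes cosine similarity and a scale of 20, which are the library's defaults for `MultipleNegativesRankingLoss` but are not listed in this card; the helper names are hypothetical and the code illustrates the computation rather than the library implementation. Note that with `per_device_train_batch_size: 16`, each anchor is contrasted with the 15 other positives in its batch: gradient accumulation over 8 batches sums gradients but does not enlarge this pool of negatives.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def mnrl_loss(anchors: List[Vector], positives: List[Vector], scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine scores; positives[i] is the target for
    anchors[i] and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors: List[Vector], positives: List[Vector],
                    dims: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of the base loss evaluated on each embedding prefix length."""
    return sum(
        w * mnrl_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )


# Toy batch of 3 pairs with 4-dimensional embeddings; dims (4, 2) stand in for (768, ..., 64).
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2], [0.7, 0.7, 0.0, 0.0]]
positives = [[0.9, 0.1, 0.0, 0.1], [0.1, 0.9, 0.1, 0.0], [0.6, 0.8, 0.0, 0.0]]
print(matryoshka_loss(anchors, positives, dims=(4, 2), weights=(1.0, 1.0)))
```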
Non-default hyperparameters:

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 8
- `learning_rate`: 2e-05
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

All hyperparameters:

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 8
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

Training logs:

| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|---|---|---|---|---|---|---|---|
| 0.1034 | 5 | 29.8712 | - | - | - | - | - |
| 0.2067 | 10 | 26.1323 | - | - | - | - | - |
| 0.3101 | 15 | 17.8585 | - | - | - | - | - |
| 0.4134 | 20 | 14.0232 | - | - | - | - | - |
| 0.5168 | 25 | 11.6897 | - | - | - | - | - |
| 0.6202 | 30 | 10.8431 | - | - | - | - | - |
| 0.7235 | 35 | 9.264 | - | - | - | - | - |
| 0.8269 | 40 | 11.2186 | - | - | - | - | - |
| 0.9302 | 45 | 9.9143 | - | - | - | - | - |
| 1.0 | 49 | - | 0.7134 | 0.7110 | 0.6902 | 0.6341 | 0.5282 |
| 1.0207 | 50 | 7.2581 | - | - | - | - | - |
| 1.1240 | 55 | 6.066 | - | - | - | - | - |
| 1.2274 | 60 | 6.3626 | - | - | - | - | - |
| 1.3307 | 65 | 6.8135 | - | - | - | - | - |
| 1.4341 | 70 | 5.5556 | - | - | - | - | - |
| 1.5375 | 75 | 6.0144 | - | - | - | - | - |
| 1.6408 | 80 | 6.1965 | - | - | - | - | - |
| 1.7442 | 85 | 5.596 | - | - | - | - | - |
| 1.8475 | 90 | 6.631 | - | - | - | - | - |
| 1.9509 | 95 | 6.3319 | - | - | - | - | - |
| 2.0 | 98 | - | 0.7331 | 0.7304 | 0.7074 | 0.6569 | 0.5477 |
| 2.0413 | 100 | 4.7382 | - | - | - | - | - |
| 2.1447 | 105 | 4.1516 | - | - | - | - | - |
| 2.2481 | 110 | 4.3517 | - | - | - | - | - |
| 2.3514 | 115 | 3.7044 | - | - | - | - | - |
| 2.4548 | 120 | 4.1593 | - | - | - | - | - |
| 2.5581 | 125 | 4.8081 | - | - | - | - | - |
| 2.6615 | 130 | 3.908 | - | - | - | - | - |
| 2.7649 | 135 | 3.7684 | - | - | - | - | - |
| 2.8682 | 140 | 3.8927 | - | - | - | - | - |
| 2.9509 | 144 | - | 0.7308 | 0.7262 | 0.7078 | 0.6568 | 0.5514 |
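The run combines `learning_rate: 2e-05`, `lr_scheduler_type: cosine` and `warmup_ratio: 0.1`. In Hugging Face Transformers this corresponds to a linear warmup over the first ceil(0.1 × total steps) optimizer steps, followed by a half-cosine decay to zero. The sketch below mirrors that formula in pure Python. The total of 144 optimizer steps is taken from the last row of the log and is assumed here to be the scheduler's training-step count; `lr_at_step` is a hypothetical helper. The sketch also shows the effective batch size of 16 × 8 = 128 pairs per optimizer step on one device.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 2e-5,
               warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 144  # assumption: taken from the final row of the training log
per_device_batch, grad_accum = 16, 8
print("effective batch per device:", per_device_batch * grad_accum)  # 128

# Warmup lasts ceil(144 * 0.1) = 15 steps, so the peak rate is reached at step 15.
for step in (0, 15, 79, 144):
    print(step, f"{lr_at_step(step, TOTAL_STEPS):.2e}")
```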
Citation (BibTeX):

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
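Base model: answerdotai/ModernBERT-base (this model is fine-tuned from nomic-ai/modernbert-embed-base, which is itself trained from ModernBERT-base).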