This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the finqalab_embedding_finetune dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
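As a rough illustration of what modules (1) and (2) compute, the sketch below mean-pools token vectors while ignoring padding positions (`pooling_mode_mean_tokens: True`) and then rescales the result to unit length, as `Normalize()` does. It is a simplified pure-Python stand-in: `mean_pool` and `l2_normalize` are hypothetical helper names, and the real modules operate on batched PyTorch tensors.

```python
import math
from typing import List

def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                summed[i] += x
            count += 1
    count = max(count, 1)  # avoid dividing by zero for an all-padding input
    return [s / count for s in summed]

def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length, as the final Normalize() step does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

# Toy input: 3 tokens of dimension 2; the last token is padding and must be ignored.
tokens = [[1.0, 3.0], [3.0, 1.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)   # [2.0, 2.0]
embedding = l2_normalize(pooled)   # both components equal 1 / sqrt(2)
print(pooled, embedding)
```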
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Ch333tah/modernbert-finqalab-embeddings")
# Run inference
sentences = [
    "Question: I'm experiencing issues with logging in to the app. What should I do? Answer: In case you are facing any issues, Try closing the app and opening it again. Try clearing the cache or updating to the latest version. If the issue still persists, contact our customer support through Whatsapp (+923003672522). Email your query at support@finqalab.com.pk",
    "My app won't let me log in! Help!",
    "Question: The app is not loading properly on my device. What could be the problem? Answer: If the app isn't loading properly: Please check if you have a stable internet connection. Try refreshing the screen 2-3 times. Close the app and open it again. Try clearing the cache, check for app updates or reinstall the app. If the issue still persists, contact our customer support through Whatsapp (+923003672522) or email your query at support@finqalab.com",
]
embeddings = model.encode(sentences)  # NumPy array, one 768-dim row per sentence
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # torch.Tensor
print(similarities.shape)
# torch.Size([3, 3])
```
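Because `Normalize()` makes every embedding unit-length, the cosine similarity of two embeddings equals their dot product, so ranking candidate answers for a query only needs dot products. The sketch below assumes precomputed, already-normalized vectors (toy 2-D values standing in for the 768-dimensional output of `model.encode`); `rank` is a hypothetical helper, not a sentence-transformers function.

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def rank(query: Vector, corpus: List[Vector], top_k: int = 2) -> List[Tuple[int, float]]:
    """Hypothetical helper: return (index, score) for the top_k corpus vectors.
    Assumes all vectors are unit-length, so the dot product equals cosine similarity."""
    scored = [(i, dot(query, doc)) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy unit-length 2-D vectors standing in for real embeddings.
query = [0.6, 0.8]
corpus = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
print(rank(query, corpus))
# cosine() agrees with dot() here because every vector has length 1.
print(cosine(query, corpus[2]), dot(query, corpus[2]))
```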
Evaluated with TripletEvaluator on the ai-job-train, ai-job-valid and ai-job-test datasets:

| Metric | ai-job-train | ai-job-valid | ai-job-test |
|---|---|---|---|
| cosine_accuracy | 0.9604 | 0.9524 | 0.9130 |
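The `cosine_accuracy` above is a triplet metric: a triplet counts as correct when the anchor is more similar (by cosine) to the positive than to the negative, and the score is the fraction of correct triplets. A minimal pure-Python sketch of that computation on made-up 2-D vectors; the helper names are illustrative and are not TripletEvaluator internals.

```python
import math
from typing import List, Tuple

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def triplet_cosine_accuracy(triplets: List[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of (anchor, positive, negative) triplets where the anchor is
    strictly closer to the positive than to the negative."""
    hits = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return hits / len(triplets)

triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer -> counted
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),  # positive is closer -> counted
    ([1.0, 1.0], [1.0, 0.0], [1.0, 1.2]),  # negative is closer -> missed
    ([1.0, 0.0], [1.0, 0.2], [1.0, 0.5]),  # positive is closer -> counted
]
print(triplet_cosine_accuracy(triplets))
# 0.75
```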
Training dataset columns: Pos_Context, Query, and Neg_Context

| | Pos_Context | Query | Neg_Context |
|---|---|---|---|
| type | string | string | string |
Samples:

| Pos_Context | Query | Neg_Context |
|---|---|---|
| Question: I did not receive a verification email? What should I do? Answer: Please check your spam folder and double check your registered email address. If you still don't see a verification email, contact our customer support department at support@finqalab.com. | My verification email didn't arrive, any ideas? | Question: I did not receive my instant transfer in Finqalab account within 10 minutes. What should I do? Answer: If your instant transfer hasn't been credited to your Finqalab account within 10 minutes, please email us at support@finqalab.com with a screenshot of the receipt or send it to us on Whatsapp (+923003672522). Our team will review the issue, escalate it to the bank by sending the transaction receipt, and follow up to ensure your funds are credited promptly. |
| Question: What are the applicable CGT rates for RDA Account Holders? Answer: Filer rates are applied to RDA account holders irrespective of their status (Filer or Non-filer). | How are capital gains taxes handled for someone with an RDA account? | Question: What does Minimum Lot Size mean? Answer: This means that you need to buy a minimum quantity for a share. In case of ETFs the minimum lot size is 500 or in multiples of 500 shares. Whereas, for non-ETF stocks the minimum lot size is 1 share. |
| Question: How do I receive bonus shares? Answer: Bonus shares are distributed to shareholders based on a ratio announced by the company. For example, if a company declares a 20% bonus issue, you will receive 2 additional shares for every 10 shares you already own. | What's the deal with getting extra shares? | Question: Do I have to pay for bonus shares? Answer: No, bonus shares are issued free of charge. They are typically paid for by utilizing the company's retained earnings or reserves. |
Loss: MultipleNegativesRankingLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
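To make this configuration concrete, here is a minimal pure-Python sketch of the in-batch-negatives objective that MultipleNegativesRankingLoss implements (Henderson et al., 2017, cited below): each anchor's cosine similarities to every positive and hard negative in the batch are multiplied by `scale` (20.0 here) and scored with cross-entropy, with the matching positive as the correct class. The `mnrl_loss` helper and toy vectors are illustrative; the library version works on batched PyTorch tensors. Note that sentence-transformers hands dataset columns to the loss by position, so the first column plays the anchor role regardless of its name.

```python
import math
from typing import List

Vector = List[float]

def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnrl_loss(anchors: List[Vector], positives: List[Vector],
              negatives: List[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy in which anchor i must pick positives[i] out of every
    positive and hard negative in the batch (in-batch negatives)."""
    candidates = positives + negatives  # columns 0..len(positives)-1 are the positives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the matching positive
    return total / len(anchors)

# Toy batch of two (anchor, positive, hard negative) triplets in 2-D.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[0.5, 0.5], [0.6, 0.4]]
print(mnrl_loss(anchors, positives, negatives))
```

With identical scores for all candidates the loss reduces to the log of the number of candidates, and it grows when the anchors' true positives are swapped out.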
Evaluation dataset columns: Pos_Context, Query, and Neg_Context

| | Pos_Context | Query | Neg_Context |
|---|---|---|---|
| type | string | string | string |
Samples:

| Pos_Context | Query | Neg_Context |
|---|---|---|
| Question: How will I know if my biometric verification is pending or close to the deadline? Answer: We receive reports every Monday and Friday regarding users with pending biometric verifications. If you have less than 7 days remaining, we will contact you to remind you to complete the process or provide alternate solutions if necessary. | What happens if my biometric verification is about to expire? | Question: I entered my CNIC in the Bioverify app and got CNIC not eligible for this service message. What does this mean? Answer: This means that the Biometric verification is not required. |
| Question: How long do I have to complete the biometric verification? Answer: Once you receive the OTP, you must finish the biometric process within 45 days. An automated email will remind you to complete it. | What's the deadline for that fingerprint thing after I get the code? | Question: Can I complete the biometric verification from outside Pakistan? Answer: Yes, if you are currently abroad, you can complete the biometric process either online or by visiting an NCCPL office when you return to Pakistan, provided it's within 45 days of account activation. |
| Question: Is historical price data available for stocks in the app? Answer: Yes, it is. | Can I see stock prices from the past in this app? | Question: How often is the stock data updated in the app? Answer: The stock data is updated in real time. |
Loss: MultipleNegativesRankingLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
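MultipleNegativesRankingLoss treats the other texts in a batch as negatives, so the same text appearing twice in one batch would be penalized as a false negative; the `batch_sampler: no_duplicates` hyperparameter listed below addresses this. The following is a simplified, hypothetical greedy version of that idea, not the library's own sampler: a row joins the current batch only if it shares no text with rows already in it, and is otherwise deferred to a later batch. The placeholder texts are made up.

```python
from typing import Dict, List, Set

def no_duplicate_batches(rows: List[Dict[str, str]], batch_size: int) -> List[List[int]]:
    """Hypothetical greedy batcher: a row joins the current batch only if none of its
    texts already appear there; otherwise it is deferred to a later batch."""
    remaining = list(range(len(rows)))
    batches: List[List[int]] = []
    while remaining:
        batch: List[int] = []
        seen: Set[str] = set()
        deferred: List[int] = []
        for idx in remaining:
            texts = set(rows[idx].values())
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        # Never empty when batch_size >= 1: the first remaining row always fits.
        batches.append(batch)
        remaining = deferred
    return batches

rows = [
    {"Pos_Context": "A", "Query": "q1", "Neg_Context": "B"},
    {"Pos_Context": "B", "Query": "q2", "Neg_Context": "C"},  # shares "B" with row 0
    {"Pos_Context": "D", "Query": "q3", "Neg_Context": "E"},
]
print(no_duplicate_batches(rows, batch_size=2))
# [[0, 2], [1]]
```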
Non-default training hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `batch_sampler`: no_duplicates

All training hyperparameters:

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

Training logs:

| Epoch | Step | ai-job-train_cosine_accuracy | ai-job-valid_cosine_accuracy | ai-job-test_cosine_accuracy |
|---|---|---|---|---|
| -1 | -1 | 0.9604 | 0.9524 | 0.9130 |
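The schedule settings (`learning_rate`: 2e-05, `lr_scheduler_type`: linear, `warmup_ratio`: 0.1, `warmup_steps`: 0) imply a learning rate that ramps up linearly over the first 10% of optimizer steps and then decays linearly to zero. The sketch assumes, as in transformers' `TrainingArguments`, that warmup steps are derived as ceil(warmup_ratio × total steps) when `warmup_steps` is 0. The total of 100 steps is hypothetical, since the card does not state how many steps training ran, and `linear_warmup_lr` is an illustrative helper.

```python
import math

def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 2e-05,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate after `step` optimizer steps under a linear schedule with warmup."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # derived because warmup_steps is 0
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Then decay linearly from base_lr down to 0 at the final step.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Hypothetical run of 100 optimizer steps.
for step in (0, 5, 10, 55, 100):
    print(step, linear_warmup_lr(step, total_steps=100))
```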
BibTeX for Sentence Transformers:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
BibTeX for MultipleNegativesRankingLoss:

```bibtex
@misc{henderson2017efficient,
    title = "Efficient Natural Language Response Suggestion for Smart Reply",
    author = "Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil",
    year = "2017",
    eprint = "1705.00652",
    archivePrefix = "arXiv",
    primaryClass = "cs.CL",
}
```
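Base model: answerdotai/ModernBERT-base (this model was finetuned from nomic-ai/modernbert-embed-base, which is itself derived from ModernBERT-base).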