CrossEncoder based on nreimers/BERT-Tiny_L-2_H-128_A-2

This is a Cross Encoder model finetuned from nreimers/BERT-Tiny_L-2_H-128_A-2 using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("lucasflins/CE-BERT-Tiny_L-2_H-128_A-2-BCE-20260207-084638")
# Get scores for pairs of texts
pairs = [
    ['aparador off white', 'aparador para sala ambiente classic off white nature - imcal'],
    ['itatiaia renova', 'balcão itatiaia branco renova com 3 portas e 2 gavetas'],
    ['cozinhas moduladas completas 100 mdf', 'cozinha completa 7 peças 100% mdf com portas de vidro americana'],
    ['caixa de som', 'caixa de som torre double 12 2300w bluetooth pulse - ps736'],
    ['escrivaninha 90cm', 'escrivaninha mesa escritório dobrável 90cm industrial steel quadra'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'aparador off white',
    [
        'aparador para sala ambiente classic off white nature - imcal',
        'balcão itatiaia branco renova com 3 portas e 2 gavetas',
        'cozinha completa 7 peças 100% mdf com portas de vidro americana',
        'caixa de som torre double 12 2300w bluetooth pulse - ps736',
        'escrivaninha mesa escritório dobrável 90cm industrial steel quadra',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
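
The list returned by `model.rank` holds one dictionary per candidate, sorted by descending score, with `corpus_id` pointing back into the input list. Below is a minimal sketch of mapping that output back to the texts and keeping only the strongest hits. The helper name `top_matches`, the threshold, and the score values are made up for illustration; they are not outputs of this model.

```python
def top_matches(ranks, documents, top_k=3, min_score=0.0):
    """Map rank entries back to their texts, best first (hypothetical helper)."""
    # Sort defensively so the helper also works on unsorted input.
    ordered = sorted(ranks, key=lambda r: r["score"], reverse=True)
    # Drop weak candidates, then cut to the requested length.
    kept = [r for r in ordered if r["score"] >= min_score][:top_k]
    return [(documents[int(r["corpus_id"])], float(r["score"])) for r in kept]


documents = [
    "aparador para sala ambiente classic off white nature - imcal",
    "balcão itatiaia branco renova com 3 portas e 2 gavetas",
    "cozinha completa 7 peças 100% mdf com portas de vidro americana",
    "caixa de som torre double 12 2300w bluetooth pulse - ps736",
    "escrivaninha mesa escritório dobrável 90cm industrial steel quadra",
]

# Illustrative scores only, shaped like the output of model.rank.
example_ranks = [
    {"corpus_id": 0, "score": 0.62},
    {"corpus_id": 1, "score": 0.18},
    {"corpus_id": 4, "score": 0.11},
    {"corpus_id": 2, "score": 0.09},
    {"corpus_id": 3, "score": 0.05},
]

for text, score in top_matches(example_ranks, documents, top_k=2, min_score=0.1):
    print(f"{score:.2f}\t{text}")
```

Recent Sentence Transformers releases also let `rank` take `top_k` and `return_documents=True` directly, so a helper like this is mainly useful when a score threshold is needed as well.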

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,891,914 training samples
  • Columns: query, text, and label
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 2 characters, mean 22.67 characters, max 80 characters
    • text (string): min 20 characters, mean 67.39 characters, max 188 characters
    • label (float): min 0.0, mean 0.47, max 1.0
  • Samples:
    query | text | label
    tenis under armour masculino | tênis under armour charged slight 3 se | 0.9281755975185405
    jogos de cama casal 100 algodao 4 pecas | lençol casal 400 fios 3 peças toque 100% macio com fronhas | 0.5433727088037827
    geladeira para caminhao | geladeira portátil 40l 12v/24v 110v/220v caminhão ônibus van | 0.7394712159210983
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
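
The labels here are floats between 0 and 1 rather than hard 0/1 classes, which changes how the loss values should be read. The sketch below assumes the loss is the standard binary cross-entropy computed on the raw logit (an Identity activation followed by a sigmoid inside the loss); the function names are illustrative. It shows two things: a logit of 0 always costs ln 2 ≈ 0.6931, and with a soft label the lowest reachable loss is the entropy of that label, not zero.

```python
import math


def bce_with_logits(logit, label):
    """Binary cross-entropy between sigmoid(logit) and a label in [0, 1]."""
    # Numerically stable form of softplus(logit) - logit * label.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def binary_entropy(p):
    """Entropy of a Bernoulli(p) label, the floor of the loss for that label."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


label = 0.47  # the mean training label from the statistics

# An uninformed model that outputs logit 0 pays ln 2 for any label.
print(round(bce_with_logits(0.0, label), 4))  # 0.6931

# The best possible logit for this label is log(p / (1 - p)).
best_logit = math.log(label / (1.0 - label))
print(bce_with_logits(best_logit, label) - binary_entropy(label))  # approximately 0
```

Read this way, the validation loss of 0.6922 logged before training is close to the ln 2 baseline, and small absolute improvements are expected when many labels sit near 0.5.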
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,323,175 evaluation samples
  • Columns: query, text, and label
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 3 characters, mean 22.38 characters, max 76 characters
    • text (string): min 15 characters, mean 66.75 characters, max 150 characters
    • label (float): min 0.0, mean 0.46, max 1.0
  • Samples:
    query | text | label
    aparador off white | aparador para sala ambiente classic off white nature - imcal | 0.3710301443240389
    itatiaia renova | balcão itatiaia branco renova com 3 portas e 2 gavetas | 0.35256627704291904
    cozinhas moduladas completas 100 mdf | cozinha completa 7 peças 100% mdf com portas de vidro americana | 0.2038212525937755
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 2048
  • per_device_eval_batch_size: 32
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • log_level: info
  • tf32: True
  • load_best_model_at_end: True
  • hub_strategy: end
  • hub_private_repo: True
  • hub_always_push: True
  • eval_on_start: True
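
The batch size and epoch count determine the step counts that appear in the training logs. A short worked calculation, assuming a single device, no gradient accumulation, and the last partial batch kept (`dataloader_drop_last: False`):

```python
import math

train_samples = 3_891_914  # size of the training dataset
batch_size = 2048          # per_device_train_batch_size
epochs = 10                # num_train_epochs

# One optimizer step per batch; the final partial batch still counts.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
# warmup_ratio 0.1, written as a division by 10 to avoid float rounding.
warmup_steps = math.ceil(total_steps / 10)

print(steps_per_epoch)  # 1901
print(total_steps)      # 19010
print(warmup_steps)     # 1901
```

The 1901 steps per epoch agree with the step logged at epoch 1.0, and under these assumptions the warmup phase covers roughly the first epoch.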

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 2048
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: info
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: end
  • hub_private_repo: True
  • hub_always_push: True
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: True
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
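
With `lr_scheduler_type: linear`, `learning_rate: 5e-05`, and `warmup_ratio: 0.1`, the learning rate ramps up linearly and then decays linearly to zero. The sketch below reproduces that shape in plain Python. The defaults of 1901 warmup steps and 19010 total steps are assumptions derived from 1901 steps per epoch (the step logged at epoch 1.0) and a full 10-epoch run; `linear_lr` is an illustrative name, not a library function.

```python
def linear_lr(step, peak_lr=5e-5, warmup_steps=1901, total_steps=19010):
    """Learning rate at a given optimizer step for linear warmup and decay."""
    if step < warmup_steps:
        # Ramp from 0 up to the peak during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay from the peak down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each logged epoch, under the assumptions above.
for epoch in range(0, 5):
    step = epoch * 1901
    print(epoch, step, f"{linear_lr(step):.2e}")
```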

Training Logs

Epoch | Step | Training Loss | Validation Loss
0 | 0 | - | 0.6922
1.0 | 1901 | 0.6888 | 0.6872
2.0 | 3802 | 0.6802 | 0.6879
3.0 | 5703 | 0.675 | 0.6902
4.0 | 7604 | 0.6715 | 0.6912
  • The bold row denotes the saved checkpoint.
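
With `load_best_model_at_end: True`, the trainer restores the best evaluated checkpoint when training finishes. The sketch below applies that rule to the logged rows, assuming the selection metric is the validation loss with lower being better (the hyperparameter list does not show a different metric). It illustrates the selection rule only and is not a statement of which checkpoint was actually saved.

```python
# (epoch, step, training_loss, validation_loss) copied from the training logs.
logs = [
    (0.0, 0, None, 0.6922),
    (1.0, 1901, 0.6888, 0.6872),
    (2.0, 3802, 0.6802, 0.6879),
    (3.0, 5703, 0.675, 0.6902),
    (4.0, 7604, 0.6715, 0.6912),
]

# Pick the row with the lowest validation loss.
best = min(logs, key=lambda row: row[3])
print(f"epoch {best[0]}, step {best[1]}, validation loss {best[3]}")
# epoch 1.0, step 1901, validation loss 0.6872

# Gap between validation and training loss for the rows that report both.
for epoch, step, train_loss, val_loss in logs:
    if train_loss is not None:
        print(epoch, round(val_loss - train_loss, 4))
```

In the logged rows the training loss keeps falling while the validation loss rises after epoch 1, so the gap widens with each epoch.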

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.6
  • PyTorch: 2.7.1+cu128
  • Accelerate: 1.9.0
  • Datasets: 4.1.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 4.39M params (Safetensors, tensor type F32)