bge-reranker-v2-m3 (ONNX INT8)

ONNX INT8 quantization of BAAI/bge-reranker-v2-m3 for browser & edge inference.

Files

File                        Convention                             Use with
onnx/model_int8.onnx        Modern (Transformers.js v3 standard)   dtype: 'int8'
onnx/model_quantized.onnx   Legacy (Optimum)                       dtype: 'q8'

Both files are byte-identical and contain the same INT8 weights. Pick whichever matches your loader convention.
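
To make the dtype-to-file convention concrete, here is a small sketch that mirrors the table above. It is illustrative only, not the loader's internal code: `DTYPE_SUFFIX` and `onnxPathFor` are hypothetical names, and only the two dtypes shipped in this repo are listed.

```javascript
// Suffix per dtype, taken from the Files table above (other dtypes omitted).
const DTYPE_SUFFIX = { int8: '_int8', q8: '_quantized' };

// Hypothetical helper: returns the repo-relative ONNX path for a given dtype.
function onnxPathFor(dtype, baseName = 'model') {
  const suffix = DTYPE_SUFFIX[dtype];
  if (suffix === undefined) {
    throw new Error(`No file in this repo for dtype '${dtype}'`);
  }
  return `onnx/${baseName}${suffix}.onnx`;
}
```

Calling `onnxPathFor('int8')` or `onnxPathFor('q8')` yields the two file names listed in the table; any other dtype throws, since this repo ships no other variant.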

Quantization

  • Method: Optimum dynamic INT8 quantization
  • Source: BAAI/bge-reranker-v2-m3 (multilingual cross-encoder, XLM-RoBERTa-large, 568M params)
  • Size: ~570 MB (vs ~2.3 GB for the FP32 source)
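
The size figures follow from the parameter count. The sketch below is a back-of-the-envelope estimate that assumes every parameter is stored at 4 bytes (FP32) or 1 byte (INT8); real files also carry graph metadata and quantization scales, and some tensors may stay in FP32, so treat the numbers as approximate. `weightBytes` is a hypothetical helper name.

```javascript
// Parameter count from the Source line above.
const PARAMS = 568e6;

// Hypothetical helper: raw weight storage in bytes for a given precision.
function weightBytes(params, bytesPerParam) {
  return params * bytesPerParam;
}

const fp32GB = weightBytes(PARAMS, 4) / 1e9; // 4 bytes per FP32 weight
const int8MB = weightBytes(PARAMS, 1) / 1e6; // 1 byte per INT8 weight

console.log(`FP32: ~${fp32GB.toFixed(2)} GB, INT8: ~${int8MB.toFixed(0)} MB`);
```

The estimate lands close to the ~2.3 GB and ~570 MB figures quoted above, which is the roughly 4x reduction expected when going from 32-bit to 8-bit weights.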

Usage (Transformers.js)

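import { AutoTokenizer, AutoModelForSequenceClassification } from '@huggingface/transformers';

const model_id = 'tss-deposium/bge-reranker-v2-m3-onnx-int8';

// The tokenizer encodes each (query, doc) pair as a single input sequence.
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

// dtype selects the ONNX file: 'int8' -> model_int8.onnx, 'q8' -> model_quantized.onnx
const model = await AutoModelForSequenceClassification.from_pretrained(
  model_id,
  { dtype: 'int8' }  // or 'q8' for legacy file name
);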

Notes

  • Cross-encoder: tokenize each (query, doc) pair together as one input; the relevance score is the output logit (apply a sigmoid if you want a score between 0 and 1).
  • Multilingual (100+ langs); handles cross-language reranking better than bge-reranker-base.
  • For the smaller variant (279 MB, faster but weaker cross-lang), see Xenova/bge-reranker-base.
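
The cross-encoder flow from the first note can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's pipeline: `rankByLogits` and `rerank` are hypothetical helper names, the model is assumed to return one logit per (query, doc) pair, and the sigmoid is just a common way to map reranker logits to a 0 to 1 score.

```javascript
// Logistic function: maps a raw logit to a score in (0, 1).
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Hypothetical pure helper: pairs each document with its score and
// sorts from most to least relevant. `logits` holds one value per document.
function rankByLogits(documents, logits) {
  return documents
    .map((text, index) => ({ index, text, score: sigmoid(logits[index]) }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical wrapper: loads the tokenizer and model, scores every
// (query, doc) pair in one batch, and returns the ranked documents.
async function rerank(query, documents, dtype = 'int8') {
  const { AutoTokenizer, AutoModelForSequenceClassification } =
    await import('@huggingface/transformers');

  const modelId = 'tss-deposium/bge-reranker-v2-m3-onnx-int8';
  const tokenizer = await AutoTokenizer.from_pretrained(modelId);
  const model = await AutoModelForSequenceClassification.from_pretrained(modelId, { dtype });

  // Repeat the query once per document so each pair is encoded together.
  const inputs = tokenizer(
    new Array(documents.length).fill(query),
    { text_pair: documents, padding: true, truncation: true },
  );

  // Assumption: logits has shape [documents.length, 1], so its flat data
  // array contains exactly one logit per document.
  const { logits } = await model(inputs);
  return rankByLogits(documents, Array.from(logits.data));
}
```

A call such as `await rerank('What is a panda?', ['The giant panda is a bear native to China.', 'Paris is the capital of France.'])` returns the documents ordered by score. Keeping `rankByLogits` separate from model loading makes the ranking step easy to check without downloading the weights.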