bge-reranker-v2-m3 (ONNX INT8)
ONNX INT8 quantization of BAAI/bge-reranker-v2-m3 for browser & edge inference.
Files
| File | Convention | Use with |
|---|---|---|
| `onnx/model_int8.onnx` | Modern (Transformers.js v3 standard) | `dtype: 'int8'` |
| `onnx/model_quantized.onnx` | Legacy (Optimum) | `dtype: 'q8'` |
Both files are byte-identical and contain the same INT8 weights. Pick whichever matches your loader convention.
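The naming convention above can be sketched as a simple lookup. `onnxFileFor` is a hypothetical helper written for illustration only; it is not part of Transformers.js and covers just the two `dtype` values listed for this repository.

```javascript
// Hypothetical helper (not a Transformers.js API): maps a `dtype` option
// to the file in this repository that it selects.
const DTYPE_TO_FILE = new Map([
  ['int8', 'onnx/model_int8.onnx'],    // modern naming
  ['q8', 'onnx/model_quantized.onnx'], // legacy naming
]);

function onnxFileFor(dtype) {
  const file = DTYPE_TO_FILE.get(dtype);
  if (file === undefined) {
    throw new Error(`No file in this repository for dtype '${dtype}'`);
  }
  return file;
}
```

Since both files hold the same weights, the choice affects only which file name the loader requests.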
Quantization
- Method: Optimum dynamic INT8 quantization
- Source: `BAAI/bge-reranker-v2-m3` (multilingual cross-encoder, XLM-RoBERTa-large, 568M params)
- Size: ~570 MB (vs ~2.3 GB for the FP32 source)
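The sizes above follow roughly from the parameter count. The sketch below is a back-of-the-envelope estimate that assumes every weight is stored at the stated width and ignores graph metadata and any tensors left in higher precision; `weightSizeMB` is an illustrative name.

```javascript
// Rough estimate: parameter count times bytes per weight, in megabytes.
const PARAMS = 568e6; // 568M parameters, as stated above

function weightSizeMB(params, bytesPerWeight) {
  return (params * bytesPerWeight) / 1e6;
}

const fp32MB = weightSizeMB(PARAMS, 4); // 2272 MB, i.e. ~2.3 GB
const int8MB = weightSizeMB(PARAMS, 1); // 568 MB, i.e. ~570 MB
```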
Usage (Transformers.js)
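```js
import { AutoTokenizer, AutoModel } from '@huggingface/transformers';

const model_id = 'tss-deposium/bge-reranker-v2-m3-onnx-int8';

// The tokenizer encodes each (query, doc) pair for the cross-encoder.
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModel.from_pretrained(
  model_id,
  { dtype: 'int8' } // or 'q8' for the legacy file name
);
```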
Notes
- Cross-encoder: tokenize `(query, doc)` together, score from logits.
- Multilingual (100+ langs); handles cross-language reranking better than `bge-reranker-base`.
- For the smaller variant (279 MB, faster but weaker cross-lang), see `Xenova/bge-reranker-base`.
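The "score from logits" step can be sketched as follows. This is a minimal example that assumes the model emits one raw logit per `(query, doc)` pair and that a sigmoid is an acceptable way to map logits to a 0–1 relevance score; `sigmoid` and `rankByLogits` are illustrative names, not library functions.

```javascript
// Map a raw logit to a relevance score in (0, 1).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Assumption: logits[i] is the single raw logit for the pair (query, docs[i]).
// `logits` may be a plain array or a typed array such as Float32Array.
function rankByLogits(docs, logits) {
  if (docs.length !== logits.length) {
    throw new Error('Expected one logit per document');
  }
  return docs
    .map((text, index) => ({ index, text, score: sigmoid(logits[index]) }))
    .sort((a, b) => b.score - a.score); // most relevant first
}
```

Because the sigmoid is monotonic, sorting by raw logit gives the same order; the normalized score is mainly useful for thresholding or display.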