rerank-indonesia

A lightweight Indonesian (Bahasa Indonesia) cross-encoder reranker, small enough to serve on a cheap CPU VPS yet competitive with a 17× larger model.

It is built by Margin-MSE knowledge distillation: a strong multilingual teacher, BAAI/bge-reranker-v2-m3 (568M params), supervises the tiny student cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 on in-domain Indonesian (query, positive, negative) triplets from TyDi QA and MIRACL-id (with BM25 + dense hard-negative mining). The student learns the teacher's score margin between relevant and non-relevant passages.

Built as part of flashIndorank.

Evaluation

MIRACL-id official retrieve-then-rerank protocol (BM25 top-100 → rerank, 960 dev queries, pytrec_eval):

model	params	nDCG@10	MRR@10	Recall@100
`cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` (base)	tiny	0.656	0.623	0.760
this model (in-domain distillation)	tiny	0.701	0.677	0.760
`BAAI/bge-reranker-v2-m3` (teacher)	568M	0.712	0.689	0.760

The distilled student improves nDCG@10 by +4.5 points over the base while staying within ~1 point of the 568M teacher — roughly 98% of the teacher's ranking quality at a fraction of the size and latency. (Recall@100 is the BM25 first-stage ceiling and bounds all rerankers.)
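For readers less familiar with the metrics in the table, the sketch below shows how nDCG@10 and MRR@10 are commonly computed for a single query with binary relevance labels. It is an illustration only, not the pytrec_eval implementation behind the reported numbers; the function names and the toy ranking are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, num_relevant, k=10):
    """nDCG@k: DCG of the ranking divided by the DCG of an ideal ranking.

    `relevances` holds binary labels (1 = relevant) in ranked order;
    `num_relevant` is the total number of relevant passages for the query.
    """
    ideal = dcg_at_k([1] * min(num_relevant, k), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mrr_at_k(relevances, k=10):
    """Reciprocal rank of the first relevant passage in the top k, else 0."""
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking: relevant passages at ranks 2 and 4, two relevant in total.
ranking = [0, 1, 0, 1, 0]
print(ndcg_at_k(ranking, num_relevant=2), mrr_at_k(ranking))
```

The reported figures are these per-query values averaged over all evaluated queries.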

How it compares to hosted commercial rerankers

An independent cross-system check on 300 MIRACL-id dev queries (BM25 top-100 → rerank). Every reranker is scored with the same harness, the same BM25 candidates, and the same metric implementation, so the comparison is apples-to-apples. NVIDIA and Cohere were called through the OpenRouter rerank API.

reranker	nDCG@10	MRR@10	cost / availability
BM25 (no rerank)	0.393	0.330	—
this model (int8 ONNX, CPU)	0.655	0.633	free · local · offline
`nvidia/llama-nemotron-rerank-vl-1b-v2`	0.656	0.632	hosted API
`cohere/rerank-v3.5`	0.664	0.636	paid API
`cohere/rerank-4-pro`	0.665	0.640	paid API

Takeaways:

Effectively tied with NVIDIA's hosted reranker (nDCG@10 0.655 vs 0.656, and marginally ahead on MRR@10), while running free and offline on CPU.
Within ~0.01 nDCG (~1.5%) of Cohere's strongest commercial reranker.

Honesty note: the absolute scores in this comparison are slightly lower than the 0.701 reported above because this is a 300-query slice scored with flashIndorank's own metric harness, not the full 960-query pytrec_eval run. The relative standing (≈ NVIDIA, just under Cohere) is the point. A smaller 30-query slice was even noisier and is not a reliable signal — prefer these 300-query (or the full 960) numbers.
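One way to judge whether a gap between two rerankers is within the noise of a given slice size is a paired bootstrap over per-query scores. The sketch below is an assumption about how such a check could be run, not part of the flashIndorank harness; the function name, the per-query values, and the seed are all hypothetical.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for mean(scores_a) - mean(scores_b).

    Both lists hold per-query metric values (e.g. nDCG@10) for the same
    queries in the same order, so queries are resampled as pairs.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty lists of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Resample queries with replacement and record the mean difference each time.
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


# Hypothetical per-query nDCG@10 values for two rerankers on six queries.
system_a = [1.0, 0.63, 0.0, 0.5, 1.0, 0.39]
system_b = [1.0, 0.50, 0.0, 0.63, 1.0, 0.43]
low, high = paired_bootstrap_ci(system_a, system_b)
print(f"95% CI for the mean difference: [{low:.3f}, {high:.3f}]")
```

If the interval contains 0, the two systems are not distinguishable at that sample size. Smaller slices give wider intervals.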

Usage

sentence-transformers

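from sentence_transformers import CrossEncoder

# Load the distilled reranker from the Hugging Face Hub.
model = CrossEncoder("madebyaris/rerank-indonesia")
# Query: "How do I lose weight?"
query = "Bagaimana cara menurunkan berat badan?"
passages = [
    # Relevant: regular exercise and a healthy diet help reduce body weight.
    "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh.",
    # Not relevant: global gold prices rose sharply over the past week.
    "Harga emas global naik tajam dalam sepekan terakhir.",
]
# Score each (query, passage) pair; a higher score means more relevant.
scores = model.predict([[query, p] for p in passages])
print(scores)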
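predict returns one relevance score per (query, passage) pair, in input order. A minimal sketch of turning those scores into a ranked list follows; rank_passages is a hypothetical helper, and the scores in the example are made-up placeholders rather than real model output.

```python
def rank_passages(passages, scores, top_k=None):
    """Return (passage, score) pairs sorted from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Placeholder scores for illustration only; real values come from model.predict().
example_passages = ["passage about gold prices", "passage about diet and exercise"]
example_scores = [0.02, 0.93]
for text, score in rank_passages(example_passages, example_scores):
    print(f"{score:.2f}  {text}")
```

Recent sentence-transformers releases also expose a CrossEncoder.rank method that returns sorted results directly, which may be preferable to a hand-written helper.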

Lightweight ONNX (int8) via flashIndorank

from huggingface_hub import snapshot_download
from flashindorank import CustomReranker
from flashrank import RerankRequest

# Download only the ONNX files from the model repository.
path = snapshot_download("madebyaris/rerank-indonesia", allow_patterns=["onnx/*"])
ranker = CustomReranker(f"{path}/onnx")
# Rerank the candidate passages for one query.
out = ranker.rerank(RerankRequest(
    query="Bagaimana cara menurunkan berat badan?",
    passages=[{"id": 1, "text": "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh."}],
))
print(out)

Training

Student / base: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
Teacher: BAAI/bge-reranker-v2-m3
Method: Margin-MSE knowledge distillation (Hofstätter et al., 2020) — label = teacher(q, pos) - teacher(q, neg)
Data: in-domain Indonesian triplets from TyDi QA + MIRACL-id train, BM25 + dense hard negatives
Hyperparameters: 3 epochs, lr 8e-6, bf16, MarginMSELoss (CrossEncoderTrainer)

See TRAINING.md.
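The Margin-MSE objective listed above can be written out in a few lines. This is a plain-Python sketch of the loss for illustration, not the MarginMSELoss implementation used in training; the function name and the score values are hypothetical.

```python
def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins.

    Each argument is a list of scores, one entry per (query, pos, neg) triplet.
    The label for a triplet is teacher(q, pos) - teacher(q, neg).
    """
    lengths = {len(student_pos), len(student_neg), len(teacher_pos), len(teacher_neg)}
    if len(lengths) != 1 or not student_pos:
        raise ValueError("all score lists must be non-empty and of equal length")
    squared_errors = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical scores for two triplets.
loss = margin_mse_loss(
    student_pos=[2.0, 1.0], student_neg=[0.5, 1.5],
    teacher_pos=[4.0, 3.0], teacher_neg=[1.0, 2.0],
)
print(loss)
```

Because only the difference between positive and negative scores enters the loss, the student is free to use a different score scale than the teacher as long as the margins match.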

Roadmap — what's next to improve

The model is already quality-competitive with hosted rerankers; the remaining wins, highest-leverage first:

Close the small gap to Cohere (quality). Re-distill on combined data — mMARCO-id (~400k triplets) + in-domain TyDi/MIRACL-id upsampled ~3× so in-domain signal isn't diluted — for a few more epochs. Targets pushing nDCG@10 past 0.70 toward the teacher ceiling.
Stronger / ensemble teacher. Distill from a larger teacher (e.g. BAAI/bge-reranker-v2-gemma) or an ensemble of teacher margins to raise the distillation ceiling above the current ~0.712.
Harder negatives. Re-mine negatives with a stronger dense retriever than the current BM25 + dense mining setup; cross-encoders learn most from hard negatives.
Lift the real ceiling = better first-stage retrieval. MIRACL nDCG is capped by Recall@100 (~0.71–0.76 here). A better retriever (multilingual-e5 / BGE-M3 dense, or hybrid BM25+dense) raises the candidates the reranker sees — likely a bigger end-to-end win than any reranker tweak.
Faster CPU serving. The int8 ONNX is quality-ready; latency is the lever. Length-sorted mini-batching (cut padding waste), an optional multi-threaded ONNX mode, and a lower default max_length (256) materially reduce CPU latency and RAM.
Broaden evaluation. Run the hosted-reranker comparison on the full 960-query MIRACL-id set, and add other domains (e-commerce, news) so the quality claim generalizes beyond Wikipedia QA.
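Length-sorted mini-batching, mentioned under faster CPU serving in the list above, groups passages of similar length so that each batch pads to a shorter maximum. The sketch below is one possible shape of the idea, not flashIndorank's implementation; length_sorted_batches, score_in_length_order, the score_batch callable, and the use of character length as a proxy for token count are all assumptions.

```python
def length_sorted_batches(texts, batch_size):
    """Yield lists of original indices, grouped so similar lengths share a batch."""
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


def score_in_length_order(texts, score_batch, batch_size=16):
    """Score texts batch by batch and return scores in the original order.

    `score_batch` is a hypothetical callable: list of texts -> list of scores.
    """
    scores = [None] * len(texts)
    for indices in length_sorted_batches(texts, batch_size):
        batch_scores = score_batch([texts[i] for i in indices])
        # Write each score back to the position of its original text.
        for i, s in zip(indices, batch_scores):
            scores[i] = s
    return scores


# Toy stand-in scorer: returns the character length of each text.
texts = ["a long passage " * 4, "short", "medium length text", "tiny"]
print(list(length_sorted_batches(texts, batch_size=2)))
print(score_in_length_order(texts, lambda batch: [len(t) for t in batch], batch_size=2))
```

Restoring the original order matters because callers expect scores aligned with the passages they passed in. In a real reranker, sorting by tokenized length would track padding cost more closely than character length does.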

License

Apache-2.0, inherited from the base model. TyDi QA and MIRACL are Apache-2.0.


