rerank-indonesia

A lightweight Indonesian (Bahasa Indonesia) cross-encoder reranker, small enough to serve on a cheap CPU VPS yet competitive with a 17× larger model.

It is built by Margin-MSE knowledge distillation: a strong multilingual teacher, BAAI/bge-reranker-v2-m3 (568M params), supervises the tiny student cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 on in-domain Indonesian (query, positive, negative) triplets from TyDi QA and MIRACL-id (with BM25 + dense hard-negative mining). The student learns the teacher's score margin between relevant and non-relevant passages.
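
As a rough illustration of the objective described above, the sketch below computes a Margin-MSE loss in plain Python. The function name `margin_mse_loss` and the example scores are assumptions made for this illustration; the actual training uses the library loss listed in the Training section.

```python
from typing import Sequence


def margin_mse_loss(
    student_pos: Sequence[float],
    student_neg: Sequence[float],
    teacher_pos: Sequence[float],
    teacher_neg: Sequence[float],
) -> float:
    """Mean squared error between student and teacher score margins."""
    n = len(student_pos)
    if n == 0 or not (n == len(student_neg) == len(teacher_pos) == len(teacher_neg)):
        raise ValueError("all score lists must be non-empty and equally long")
    total = 0.0
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        student_margin = sp - sn  # how far the student separates pos from neg
        teacher_margin = tp - tn  # the distillation label for this triplet
        total += (student_margin - teacher_margin) ** 2
    return total / n


# Illustrative scores only; they are not outputs of the real models.
loss = margin_mse_loss(
    student_pos=[2.0, 1.0],
    student_neg=[0.5, 1.5],
    teacher_pos=[4.0, 3.0],
    teacher_neg=[1.0, 0.0],
)
print(loss)  # 7.25
```

The loss depends only on the difference between the two scores, so the student is free to use a different score scale than the teacher as long as it reproduces the teacher's margins.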

Built as part of flashIndorank.

Evaluation

MIRACL-id official retrieve-then-rerank protocol (BM25 top-100 → rerank, 960 dev queries, pytrec_eval):

| model | params | nDCG@10 | MRR@10 | Recall@100 |
|---|---|---|---|---|
| cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 (base) | tiny | 0.656 | 0.623 | 0.760 |
| this model (in-domain distillation) | tiny | 0.701 | 0.677 | 0.760 |
| BAAI/bge-reranker-v2-m3 (teacher) | 568M | 0.712 | 0.689 | 0.760 |

The distilled student improves nDCG@10 by 4.5 points over the base (0.656 to 0.701) while staying about 1 point below the 568M teacher (0.712), which is roughly 98% of the teacher's ranking quality at a fraction of the size and latency. Recall@100 is identical in every row because a reranker only reorders the BM25 top-100 candidates; it is the first-stage ceiling that bounds all rerankers.
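
For readers unfamiliar with the metrics, here is a minimal sketch of per-query nDCG@10, MRR@10, and Recall@100, assuming linear gain and a log2 rank discount. The helper names, document ids, and judgments are made up for illustration; the reported numbers come from pytrec_eval, not from this code.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    # The ideal ranking uses every judged document, retrieved or not,
    # so relevant documents missed by the first stage lower the score.
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mrr_at_k(ranked_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if qrels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: List[str], qrels: Dict[str, int], k: int = 100) -> float:
    relevant = {doc_id for doc_id, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)


# Hypothetical query with two relevant documents.
qrels = {"d1": 1, "d2": 1}
bm25_order = ["d3", "d1", "d4", "d2"]  # first-stage ranking
reranked = ["d1", "d2", "d3", "d4"]    # same candidates, reordered

print(ndcg_at_k(bm25_order, qrels), ndcg_at_k(reranked, qrels))
print(mrr_at_k(bm25_order, qrels), mrr_at_k(reranked, qrels))
print(recall_at_k(bm25_order, qrels), recall_at_k(reranked, qrels))
```

Reranking changes nDCG and MRR but leaves recall untouched, because the candidate set is the same before and after.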

How it compares to hosted commercial rerankers

An independent cross-system check on 300 MIRACL-id dev queries (BM25 top-100 → rerank). Every reranker is scored with the same harness, the same BM25 candidates, and the same metric implementation, so the comparison is apples-to-apples. NVIDIA and Cohere were called through the OpenRouter rerank API.

| reranker | nDCG@10 | MRR@10 | cost / availability |
|---|---|---|---|
| BM25 (no rerank) | 0.393 | 0.330 | |
| this model (int8 ONNX, CPU) | 0.655 | 0.633 | free · local · offline |
| nvidia/llama-nemotron-rerank-vl-1b-v2 | 0.656 | 0.632 | hosted API |
| cohere/rerank-v3.5 | 0.664 | 0.636 | paid API |
| cohere/rerank-4-pro | 0.665 | 0.640 | paid API |

Takeaways:

  • Effectively tied with NVIDIA's hosted reranker (nDCG@10 0.655 vs 0.656, MRR@10 0.633 vs 0.632), while running free and offline on CPU.
  • Within 0.01 nDCG@10 (about 1.5% relative) of Cohere's strongest commercial reranker (0.655 vs 0.665).

Honesty note: the absolute scores in this comparison are slightly lower than the 0.701 reported above because this is a 300-query slice scored with flashIndorank's own metric harness, not the full 960-query pytrec_eval run. The relative standing (≈ NVIDIA, just under Cohere) is the point. A smaller 30-query slice was even noisier and is not a reliable signal — prefer these 300-query (or the full 960) numbers.
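
To make the point about slice size concrete, the sketch below runs a percentile bootstrap over per-query scores. The bootstrap is an illustration chosen here, not a procedure the evaluation above reports using, and the per-query scores are synthetic random numbers rather than MIRACL-id results; `bootstrap_ci` is a hypothetical helper.

```python
import random
from typing import Sequence, Tuple


def bootstrap_ci(
    per_query: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean of per-query scores."""
    rng = random.Random(seed)
    n = len(per_query)
    # Resample queries with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(per_query, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic per-query scores in [0, 1]; NOT real evaluation data.
data_rng = random.Random(42)
scores_300 = [data_rng.random() for _ in range(300)]
scores_30 = scores_300[:30]

lo_300, hi_300 = bootstrap_ci(scores_300)
lo_30, hi_30 = bootstrap_ci(scores_30)
width_300 = hi_300 - lo_300
width_30 = hi_30 - lo_30
print(f"300 queries: interval width {width_300:.3f}")
print(f" 30 queries: interval width {width_30:.3f}")
```

The interval shrinks roughly with the square root of the number of queries, which is why differences of a few thousandths are hard to resolve on a small slice.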

Usage

sentence-transformers

from sentence_transformers import CrossEncoder

# Load the reranker from the Hugging Face Hub.
model = CrossEncoder("madebyaris/rerank-indonesia")

query = "Bagaimana cara menurunkan berat badan?"  # "How do I lose weight?"
passages = [
    "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh.",  # on-topic
    "Harga emas global naik tajam dalam sepekan terakhir.",  # off-topic
]

# Score every (query, passage) pair; one score is returned per pair.
scores = model.predict([[query, p] for p in passages])
print(scores)
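
The scores come back in the same order as the input passages, so turning them into a ranking is a separate step. A minimal sketch of that step follows, assuming a higher score means a more relevant passage; `rank_passages` is a hypothetical helper and the scores are placeholders, not real model output.

```python
from typing import List, Sequence, Tuple


def rank_passages(passages: Sequence[str], scores: Sequence[float]) -> List[Tuple[float, str]]:
    """Pair each passage with its score and sort best-first."""
    if len(passages) != len(scores):
        raise ValueError("need exactly one score per passage")
    pairs = [(float(score), passage) for score, passage in zip(scores, passages)]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


passages = [
    "Harga emas global naik tajam dalam sepekan terakhir.",
    "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh.",
]
placeholder_scores = [-4.0, 3.5]  # stand-ins for model.predict(...) output

ranked = rank_passages(passages, placeholder_scores)
for score, passage in ranked:
    print(f"{score:+.2f}  {passage}")
```

Converting each score with `float` keeps the helper usable with the array-like values that prediction functions typically return.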

Lightweight ONNX (int8) via flashIndorank

from huggingface_hub import snapshot_download
from flashindorank import CustomReranker
from flashrank import RerankRequest

# Download only the ONNX files from the model repo.
path = snapshot_download("madebyaris/rerank-indonesia", allow_patterns=["onnx/*"])

# Load the reranker from the downloaded onnx/ directory.
ranker = CustomReranker(f"{path}/onnx")

# Each passage is a dict with an id and its text.
out = ranker.rerank(RerankRequest(
    query="Bagaimana cara menurunkan berat badan?",
    passages=[{"id": 1, "text": "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh."}],
))
print(out)

Training

  • Student / base: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
  • Teacher: BAAI/bge-reranker-v2-m3
  • Method: Margin-MSE knowledge distillation (Hofstätter et al., 2020) — label = teacher(q, pos) - teacher(q, neg)
  • Data: in-domain Indonesian triplets from TyDi QA + MIRACL-id train, BM25 + dense hard negatives
  • Hyperparameters: 3 epochs, lr 8e-6, bf16, MarginMSELoss (CrossEncoderTrainer)

See TRAINING.md.

Roadmap — what's next to improve

The model is already quality-competitive with hosted rerankers; the remaining wins, highest-leverage first:

  1. Close the small gap to Cohere (quality). Re-distill on combined data — mMARCO-id (~400k triplets) + in-domain TyDi/MIRACL-id upsampled ~3× so in-domain signal isn't diluted — for a few more epochs. Targets pushing nDCG@10 past 0.70 toward the teacher ceiling.
  2. Stronger / ensemble teacher. Distill from a larger teacher (e.g. BAAI/bge-reranker-v2-gemma) or an ensemble of teacher margins to raise the distillation ceiling above the current ~0.712.
  3. Harder negatives. Re-mine negatives with a strong dense retriever (not just BM25); cross-encoders learn most from hard negatives.
  4. Lift the real ceiling = better first-stage retrieval. MIRACL nDCG is capped by Recall@100 (~0.71–0.76 here). A better retriever (multilingual-e5 / BGE-M3 dense, or hybrid BM25+dense) raises the candidates the reranker sees — likely a bigger end-to-end win than any reranker tweak.
  5. Faster CPU serving. The int8 ONNX is quality-ready; latency is the lever. Length-sorted mini-batching (cut padding waste), an optional multi-threaded ONNX mode, and a lower default max_length (256) materially reduce CPU latency and RAM.
  6. Broaden evaluation. Run the hosted-reranker comparison on the full 960-query MIRACL-id set and add other domains (e-commerce, news) so the quality claim generalizes beyond Wikipedia QA.
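
The length-sorted mini-batching mentioned in item 5 can be sketched as follows. This is an illustration under stated assumptions, not flashIndorank's implementation: whitespace word count stands in for tokenizer length, and both function names are hypothetical.

```python
from typing import Iterator, List, Sequence


def length_sorted_batches(texts: Sequence[str], batch_size: int) -> Iterator[List[int]]:
    """Yield batches of original indices so similar lengths share a batch."""
    # Word count is a cheap proxy for token count (an assumption of this sketch).
    order = sorted(range(len(texts)), key=lambda i: len(texts[i].split()))
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


def padded_cost(texts: Sequence[str], batches: Sequence[Sequence[int]]) -> int:
    """Total positions processed when each batch is padded to its longest text."""
    return sum(
        len(batch) * max(len(texts[i].split()) for i in batch)
        for batch in batches
    )


# Four passages with 2, 40, 3, and 38 words.
texts = [" ".join(["kata"] * n) for n in (2, 40, 3, 38)]

naive_batches = [[0, 1], [2, 3]]  # arrival order, batch size 2
sorted_batches = list(length_sorted_batches(texts, batch_size=2))

naive_cost = padded_cost(texts, naive_batches)
sorted_cost = padded_cost(texts, sorted_batches)
print(sorted_batches)           # [[0, 2], [3, 1]]
print(naive_cost, sorted_cost)  # 156 86
```

Because the batches carry original indices, the caller can write each score back to its original position after inference, so the output order is unaffected.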

License

Apache-2.0, inherited from the base model. TyDi QA and MIRACL are Apache-2.0.

Model size: 0.1B params (Safetensors, tensor type F32)