rerank-indonesia
A lightweight Indonesian (Bahasa Indonesia) cross-encoder reranker, small enough to serve on a cheap CPU VPS yet competitive with a 17× larger model.
It is built by Margin-MSE knowledge distillation: a strong multilingual teacher, BAAI/bge-reranker-v2-m3 (568M params), supervises the tiny student, cross-encoder/mmarco-mMiniLMv2-L12-H384-v1, on in-domain Indonesian (query, positive, negative) triplets from TyDi QA and MIRACL-id (with BM25 + dense hard-negative mining). The student learns the teacher's score margin between relevant and non-relevant passages.
Built as part of flashIndorank.
Evaluation
MIRACL-id official retrieve-then-rerank protocol (BM25 top-100 → rerank,
960 dev queries, pytrec_eval):
| model | params | nDCG@10 | MRR@10 | Recall@100 |
|---|---|---|---|---|
| cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 (base) | tiny | 0.656 | 0.623 | 0.760 |
| this model (in-domain distillation) | tiny | 0.701 | 0.677 | 0.760 |
| BAAI/bge-reranker-v2-m3 (teacher) | 568M | 0.712 | 0.689 | 0.760 |
The distilled student improves nDCG@10 by 4.5 points over the base (0.656 → 0.701) while staying about 1 point below the 568M teacher (0.712), which is roughly 98% of the teacher's ranking quality at a fraction of the size and latency. (Recall@100 is the BM25 first-stage ceiling and bounds all rerankers.)
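For readers less familiar with the metrics, here is a minimal sketch of how nDCG@10 and MRR@10 can be computed for a single query, assuming binary relevance labels. The reported numbers come from pytrec_eval, not from this code; the helper names (`dcg_at_k`, `ndcg_at_k`, `mrr_at_k`) and the toy passage ids are hypothetical. Per-query values are then typically averaged over all queries.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k gains (rank 1 has discount log2(2) = 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """nDCG@k with binary relevance: gain is 1 for a relevant passage, else 0."""
    gains = [1 if pid in relevant_ids else 0 for pid in ranked_ids]
    # The ideal ranking places every relevant passage first, up to k of them.
    ideal_dcg = dcg_at_k([1] * min(len(relevant_ids), k), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0


# Toy example: the only relevant passage sits at rank 2.
ranked = ["d3", "d1", "d7"]
relevant = {"d1"}
print(round(ndcg_at_k(ranked, relevant), 3))  # 0.631, i.e. 1 / log2(3)
print(mrr_at_k(ranked, relevant))             # 0.5
```

Because a reranker only reorders the BM25 candidates, a relevant passage that BM25 missed contributes to the ideal DCG but can never be recovered, which is why the first-stage recall bounds every row in the table.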
How it compares to hosted commercial rerankers
An independent cross-system check on 300 MIRACL-id dev queries (BM25 top-100
→ rerank). Every reranker is scored with the same harness, the same BM25
candidates, and the same metric implementation, so the comparison is
apples-to-apples. NVIDIA and Cohere were called through the OpenRouter rerank API.
| reranker | nDCG@10 | MRR@10 | cost / availability |
|---|---|---|---|
| BM25 (no rerank) | 0.393 | 0.330 | - |
| this model (int8 ONNX, CPU) | 0.655 | 0.633 | free · local · offline |
| nvidia/llama-nemotron-rerank-vl-1b-v2 | 0.656 | 0.632 | hosted API |
| cohere/rerank-v3.5 | 0.664 | 0.636 | paid API |
| cohere/rerank-4-pro | 0.665 | 0.640 | paid API |
Takeaways:
- Effectively tied with NVIDIA's hosted reranker (nDCG@10 0.655 vs 0.656, MRR@10 0.633 vs 0.632) while running free and offline on CPU.
- Within 0.01 nDCG (1.5%) of Cohere's strongest commercial reranker.
Honesty note: the absolute scores in this comparison are lower than the 0.701 reported above (0.655 here) because this is a 300-query slice scored with flashIndorank's own metric harness, not the full 960-query pytrec_eval run. The relative standing (≈ NVIDIA, just under Cohere) is the point. A smaller 30-query slice was even noisier and is not a reliable signal; prefer these 300-query (or the full 960-query) numbers.
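The shared-candidate setup described in this section can be sketched as follows: every reranker receives the same first-stage candidate list and only reorders it. This is an illustration under stated assumptions, not flashIndorank's harness. `overlap_score` is a toy stand-in for a real reranker, and the passage ids and texts are made up.

```python
def overlap_score(query, passage):
    """Toy relevance score: number of shared lowercase tokens (stand-in for a real reranker)."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def rerank(query, candidates, score_fn):
    """Reorder (id, text) candidates by descending score; ties keep first-stage order."""
    scores = [score_fn(query, text) for _, text in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i][0] for i in order]


def recall(ranked_ids, relevant_ids):
    """Fraction of relevant passages present anywhere in the ranked list."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids) & relevant_ids) / len(relevant_ids)


# Same candidates for every system; "p9" is relevant but was missed by the first stage.
candidates = [
    ("p1", "Harga emas global naik tajam"),
    ("p2", "Olahraga teratur membantu menurunkan berat badan"),
]
relevant = {"p2", "p9"}
query = "cara menurunkan berat badan"

rerankers = {"no_rerank": lambda q, p: 0.0, "overlap": overlap_score}
for name, score_fn in rerankers.items():
    ranked = rerank(query, candidates, score_fn)
    print(name, ranked, recall(ranked, relevant))
```

Both systems end up with the same recall over the candidate list, since reranking changes order but not membership; only rank-sensitive metrics such as nDCG@10 and MRR@10 separate them.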
Usage
sentence-transformers
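```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("madebyaris/rerank-indonesia")

# "How do I lose weight?"
query = "Bagaimana cara menurunkan berat badan?"
passages = [
    # "Regular exercise and a healthy diet help reduce body weight."
    "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh.",
    # "Global gold prices rose sharply over the past week."
    "Harga emas global naik tajam dalam sepekan terakhir.",
]

# One relevance score per (query, passage) pair; higher means more relevant.
scores = model.predict([[query, p] for p in passages])
print(scores)
```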
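The scores come back in the same order as the input pairs, so turning them into a ranking is a sort. A minimal sketch follows; `rank_passages` is a hypothetical helper, and the scores in the example are placeholder values, not real model output. Recent sentence-transformers releases also expose a `rank` method on `CrossEncoder` that does this for you.

```python
def rank_passages(passages, scores, top_k=None):
    """Pair each passage with its score and return them best-first."""
    ranked = sorted(
        zip(passages, (float(s) for s in scores)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


passages = [
    "Harga emas global naik tajam dalam sepekan terakhir.",
    "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh.",
]
placeholder_scores = [-1.0, 2.0]  # made-up values for illustration only

for passage, score in rank_passages(passages, placeholder_scores):
    print(f"{score:+.2f}  {passage}")
```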
Lightweight ONNX (int8) via flashIndorank
```python
from huggingface_hub import snapshot_download
from flashindorank import CustomReranker
from flashrank import RerankRequest

# Download only the ONNX files from the model repository.
path = snapshot_download("madebyaris/rerank-indonesia", allow_patterns=["onnx/*"])
ranker = CustomReranker(f"{path}/onnx")

out = ranker.rerank(RerankRequest(
    query="Bagaimana cara menurunkan berat badan?",
    passages=[{"id": 1, "text": "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh."}],
))
print(out)
```
Training
- Student / base: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
- Teacher: BAAI/bge-reranker-v2-m3
- Method: Margin-MSE knowledge distillation (Hofstätter et al., 2020), with label = teacher(q, pos) - teacher(q, neg)
- Data: in-domain Indonesian triplets from TyDi QA + MIRACL-id train, BM25 + dense hard negatives
- Optimizer: 3 epochs, lr 8e-6, bf16, MarginMSELoss (CrossEncoderTrainer)
See TRAINING.md.
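The Margin-MSE label is the teacher's score gap for a triplet, and the loss penalizes the squared difference between that gap and the student's. A plain-Python sketch of the arithmetic follows. It is not the sentence-transformers MarginMSELoss implementation used in training; `margin_mse` and the example scores are hypothetical.

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins over a batch."""
    total = 0.0
    for s_pos, s_neg, t_pos, t_neg in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        label = t_pos - t_neg        # teacher(q, pos) - teacher(q, neg)
        prediction = s_pos - s_neg   # student(q, pos) - student(q, neg)
        total += (prediction - label) ** 2
    return total / len(student_pos)


# One triplet: the teacher separates pos and neg by 4.0, the student only by 2.0.
print(margin_mse([2.0], [0.0], [5.0], [1.0]))  # 4.0
```

Only the margin is matched, so the student's absolute score scale is free to differ from the teacher's as long as the gaps between positives and negatives agree.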
Roadmap — what's next to improve
The model is already quality-competitive with hosted rerankers; the remaining wins, highest-leverage first:
- Close the small gap to Cohere (quality). Re-distill on combined data (mMARCO-id, ~400k triplets, plus in-domain TyDi/MIRACL-id upsampled ~3× so the in-domain signal isn't diluted) for a few more epochs. The target is to push nDCG@10 past 0.70 toward the teacher ceiling.
- Stronger / ensemble teacher. Distill from a larger teacher (e.g. BAAI/bge-reranker-v2-gemma) or an ensemble of teacher margins to raise the distillation ceiling above the current ~0.712.
- Harder negatives. Re-mine negatives with a strong dense retriever (not just BM25); cross-encoders learn most from hard negatives.
- Lift the real ceiling = better first-stage retrieval. MIRACL nDCG is capped by Recall@100 (~0.71–0.76 here). A better retriever (multilingual-e5 / BGE-M3 dense, or hybrid BM25+dense) improves the candidates the reranker sees, which is likely a bigger end-to-end win than any reranker tweak.
- Faster CPU serving. The int8 ONNX is quality-ready; latency is the lever. Length-sorted mini-batching (to cut padding waste), an optional multi-threaded ONNX mode, and a lower default max_length (256) materially reduce CPU latency and RAM.
- Broaden evaluation. Report the full 960-query MIRACL-id run and add other domains (e-commerce, news) so the quality claim generalizes beyond Wikipedia QA.
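As an illustration of the length-sorted mini-batching idea in the roadmap, the sketch below groups texts of similar length so each batch pads to a shorter maximum, then restores the original order of the scores. It is a sketch, not flashIndorank's implementation: the function names are hypothetical, and character count is used as a cheap proxy for token count.

```python
def length_sorted_batches(texts, batch_size):
    """Yield (original_indices, batch) with similarly sized texts grouped together."""
    # Character count stands in for token count here (assumption).
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield idx, [texts[i] for i in idx]


def score_in_batches(texts, score_batch_fn, batch_size=32):
    """Score texts in length-sorted batches and return scores in input order."""
    scores = [None] * len(texts)
    for idx, batch in length_sorted_batches(texts, batch_size):
        for i, score in zip(idx, score_batch_fn(batch)):
            scores[i] = score
    return scores


def padded_size(batches):
    """Total cells after padding every batch to its longest member."""
    return sum(len(batch) * max(len(t) for t in batch) for batch in batches)


texts = ["a" * 1, "b" * 10, "c" * 2, "d" * 9]
naive = [texts[0:2], texts[2:4]]
sorted_batches = [batch for _, batch in length_sorted_batches(texts, 2)]
print(padded_size(naive), padded_size(sorted_batches))  # 38 24
```

In this toy case sorting cuts the padded size from 38 to 24 cells; the real saving depends on how widely passage lengths vary within a candidate list.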
License
Apache-2.0, inherited from the base model. TyDi QA and MIRACL are Apache-2.0.