---
language:
- id
license: apache-2.0
library_name: sentence-transformers
pipeline_tag: text-ranking
tags:
- reranker
- cross-encoder
- text-ranking
- indonesian
- bahasa-indonesia
- knowledge-distillation
- flashrank
- onnx
base_model: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
datasets:
- google-research-datasets/tydiqa
- miracl/miracl
metrics:
- mrr
- ndcg
---

# rerank-indonesia

A lightweight **Indonesian (Bahasa Indonesia) cross-encoder reranker**, small
enough to serve on a cheap CPU VPS yet competitive with a 17× larger model.
It is built by **Margin-MSE knowledge distillation**: a strong multilingual
teacher, [`BAAI/bge-reranker-v2-m3`](https://huggingface.co/BAAI/bge-reranker-v2-m3)
(568M params), supervises the tiny student
[`cross-encoder/mmarco-mMiniLMv2-L12-H384-v1`](https://huggingface.co/cross-encoder/mmarco-mMiniLMv2-L12-H384-v1)
on in-domain Indonesian (query, positive, negative) triplets from **TyDi QA** and
**MIRACL-id** (with BM25 + dense hard-negative mining). The student learns the
teacher's score *margin* between relevant and non-relevant passages.

Built as part of [flashIndorank](https://github.com/madebyaris/flashIndorank).
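
The margin objective can be illustrated with a minimal plain-Python sketch, assuming one scalar score per (query, passage) pair. The score values are made-up placeholders, and real training uses `MarginMSELoss` over batched tensors rather than this loop.

```python
from typing import Sequence


def margin_mse(
    teacher_pos: Sequence[float],
    teacher_neg: Sequence[float],
    student_pos: Sequence[float],
    student_neg: Sequence[float],
) -> float:
    """Mean squared error between student and teacher score margins.

    Index i is one (query, positive, negative) triplet. The target is the
    teacher margin teacher(q, pos) - teacher(q, neg); the student is penalised
    for predicting a different margin, not a different raw score.
    """
    n = len(teacher_pos)
    if not (len(teacher_neg) == len(student_pos) == len(student_neg) == n):
        raise ValueError("all score lists must have the same length")
    total = 0.0
    for tp, tn, sp, sn in zip(teacher_pos, teacher_neg, student_pos, student_neg):
        target = tp - tn     # teacher margin (the training label)
        predicted = sp - sn  # student margin
        total += (predicted - target) ** 2
    return total / n


# Placeholder scores for two triplets (illustrative only, not real model outputs).
loss = margin_mse([8.0, 5.0], [2.0, 4.0], [3.0, 1.0], [-3.0, 0.5])
print(loss)
```

In the first triplet the student's raw scores (3, -3) differ from the teacher's (8, 2), but the margins match, so it contributes zero loss. This is what lets a small student copy the teacher's ranking behaviour without reproducing its score scale.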

## Evaluation

**MIRACL-id** official retrieve-then-rerank protocol (BM25 top-100 → rerank,
960 dev queries, `pytrec_eval`):

| model | params | nDCG@10 | MRR@10 | Recall@100 |
| --- | --- | --- | --- | --- |
| `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` (base) | tiny | 0.656 | 0.623 | 0.760 |
| **this model** (in-domain distillation) | **tiny** | **0.701** | **0.677** | 0.760 |
| `BAAI/bge-reranker-v2-m3` (teacher) | 568M | 0.712 | 0.689 | 0.760 |

The distilled student improves nDCG@10 by **+4.5 points** over the base
(0.656 → 0.701) while staying within **~1 point of the 568M teacher**, reaching
roughly 98% of the teacher's nDCG@10 (0.701 / 0.712) at a fraction of the size
and latency. Recall@100 is identical for all three models because it is fixed by
the BM25 first stage: reranking only reorders those 100 candidates, so it bounds
what any reranker can achieve.
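
For reference, here is a minimal single-query sketch of the three metrics, assuming binary relevance labels (as in MIRACL's judgments). It is not `pytrec_eval` itself (details such as tie handling may differ), the document IDs are hypothetical, and a full evaluation averages each value over all queries.

```python
import math
from typing import Sequence, Set


def ndcg_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """nDCG@k with binary gains and a log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranking[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    for rank, doc in enumerate(ranking[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 100) -> float:
    """Fraction of relevant documents that appear anywhere in the top k."""
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranking[:k])) / len(relevant)


# Hypothetical reranked list for one query with two relevant passages.
ranking = ["d3", "d7", "d1", "d9"]
relevant = {"d7", "d9"}
print(ndcg_at_k(ranking, relevant), mrr_at_k(ranking, relevant), recall_at_k(ranking, relevant))
```

Because `recall_at_k` ignores order inside the top k, reordering a fixed BM25 top-100 can never change Recall@100, which is why that column reads 0.760 for every model.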

### How it compares to hosted commercial rerankers

A separate cross-system check was run on **300 MIRACL-id `dev` queries** (BM25
top-100 → rerank). Every reranker is scored with the **same** harness, the **same**
BM25 candidates, and the **same** metric implementation, so the comparison is
apples-to-apples. NVIDIA and Cohere were called through the OpenRouter rerank API.

| reranker | nDCG@10 | MRR@10 | cost / availability |
| --- | --- | --- | --- |
| BM25 (no rerank) | 0.393 | 0.330 | — |
| **this model** (int8 ONNX, CPU) | 0.655 | 0.633 | **free · local · offline** |
| `nvidia/llama-nemotron-rerank-vl-1b-v2` | 0.656 | 0.632 | hosted API |
| `cohere/rerank-v3.5` | 0.664 | 0.636 | paid API |
| `cohere/rerank-4-pro` | 0.665 | 0.640 | paid API |

Takeaways:

- **Effectively tied with NVIDIA's hosted reranker** (nDCG@10 0.655 vs 0.656, and
  marginally *ahead* on MRR@10, 0.633 vs 0.632), while running free and offline on CPU.
- Within **~0.01 nDCG@10 (~1.5%)** of `cohere/rerank-4-pro`, the strongest Cohere
  reranker tested.

> Honesty note: the absolute scores in this comparison are lower than the
> **0.701** reported above because this is a 300-query slice scored with
> flashIndorank's own metric harness, not the full 960-query `pytrec_eval` run. The
> **relative** standing (≈ NVIDIA, just under Cohere) is the point. A smaller
> 30-query slice was even noisier and is not a reliable signal; prefer these
> 300-query numbers (or the full 960-query run).
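
Whether a 0.001 nDCG@10 gap (vs NVIDIA) or a 0.010 gap (vs `cohere/rerank-4-pro`) is meaningful depends on per-query variance, which the table does not show. One common check is a paired bootstrap over queries, sketched here under the assumption that per-query nDCG@10 values are available for both systems. The per-query scores below are synthetic placeholders, not results from this model card.

```python
import random
from typing import Sequence, Tuple


def paired_bootstrap_ci(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Mean per-query difference (a - b) and a percentile bootstrap CI."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equally long per-query score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample whole queries with replacement, keeping each (a, b) pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lo, hi


# Synthetic per-query nDCG@10 values for two rerankers (placeholders only).
gen = random.Random(42)
system_a = [gen.random() for _ in range(300)]
system_b = [min(1.0, max(0.0, s + gen.gauss(0.0, 0.1))) for s in system_a]
mean_diff, ci_low, ci_high = paired_bootstrap_ci(system_a, system_b)
print(f"mean diff {mean_diff:+.3f}, 95% CI [{ci_low:+.3f}, {ci_high:+.3f}]")
```

If the interval contains 0, the gap is not distinguishable from noise at that sample size. With only 30 queries the interval widens considerably, which is consistent with the 30-query slice being an unreliable signal.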
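
## Usage

### sentence-transformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("madebyaris/rerank-indonesia")

# "How do I lose weight?"
query = "Bagaimana cara menurunkan berat badan?"
passages = [
    # "Regular exercise and a healthy diet help reduce body weight." (relevant)
    "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh.",
    # "Global gold prices rose sharply over the past week." (irrelevant)
    "Harga emas global naik tajam dalam sepekan terakhir.",
]

# Score each (query, passage) pair; a higher score means more relevant.
scores = model.predict([(query, p) for p in passages])
print(scores)

# List passages from most to least relevant.
for score, passage in sorted(zip(scores, passages), reverse=True):
    print(f"{score:.3f}  {passage}")
```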

### Lightweight ONNX (int8) via flashIndorank

```python
from huggingface_hub import snapshot_download
from flashindorank import CustomReranker
from flashrank import RerankRequest

# Download only the int8 ONNX export (the onnx/ folder) from the Hub.
path = snapshot_download("madebyaris/rerank-indonesia", allow_patterns=["onnx/*"])
ranker = CustomReranker(f"{path}/onnx")

# FlashRank-style request: each passage is a dict with an "id" and a "text".
out = ranker.rerank(RerankRequest(
    query="Bagaimana cara menurunkan berat badan?",
    passages=[{"id": 1, "text": "Olahraga teratur dan pola makan sehat membantu mengurangi bobot tubuh."}],
))
print(out)
```

## Training

- Student / base: `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1`
- Teacher: `BAAI/bge-reranker-v2-m3`
- Method: Margin-MSE knowledge distillation (Hofstätter et al., 2020), with
  `label = teacher(q, pos) - teacher(q, neg)`
- Data: in-domain Indonesian triplets from TyDi QA + MIRACL-id train,
  BM25 + dense hard negatives
- Training setup: 3 epochs, lr 8e-6, bf16, `MarginMSELoss` (`CrossEncoderTrainer`)

See [TRAINING.md](https://github.com/madebyaris/flashIndorank/blob/main/TRAINING.md).
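
The data bullets above can be sketched as a small triplet builder: for each query, take the highest-ranked retrieved passages that are not labelled relevant as hard negatives, pair them with each positive, and attach the teacher margin as the label. The `teacher_score` callable, the `negatives_per_positive` parameter, the word-overlap toy teacher and the inputs are all hypothetical stand-ins; the actual mining pipeline is the one described in TRAINING.md, not this code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set


@dataclass
class Triplet:
    query: str
    positive: str
    negative: str
    label: float  # teacher(q, pos) - teacher(q, neg)


def build_triplets(
    query: str,
    retrieved: Sequence[str],
    positives: Set[str],
    teacher_score: Callable[[str, str], float],
    negatives_per_positive: int = 2,  # hypothetical knob, not from the model card
) -> List[Triplet]:
    """Mine hard negatives from a retriever ranking and label them with teacher margins."""
    # Hard negatives: the highest-ranked retrieved passages that are not known positives.
    hard_negatives = [p for p in retrieved if p not in positives][:negatives_per_positive]
    triplets = []
    for pos in sorted(positives):
        pos_score = teacher_score(query, pos)
        for neg in hard_negatives:
            triplets.append(Triplet(query, pos, neg, pos_score - teacher_score(query, neg)))
    return triplets


def toy_teacher(q: str, p: str) -> float:
    # Hypothetical stand-in for bge-reranker-v2-m3: counts shared lowercase words.
    return float(len(set(q.lower().split()) & set(p.lower().split())))


query = "ibu kota indonesia"  # "capital of Indonesia"
retrieved = [  # a pretend BM25 ranking, best first
    "jakarta adalah ibu kota indonesia",  # "Jakarta is the capital of Indonesia"
    "kota bandung di jawa barat",         # "the city of Bandung in West Java"
    "indonesia negara kepulauan",         # "Indonesia is an archipelagic country"
    "resep nasi goreng",                  # "fried rice recipe"
]
positives = {"jakarta adalah ibu kota indonesia"}
triplets = build_triplets(query, retrieved, positives, toy_teacher)
for t in triplets:
    print(t.negative, t.label)
```

Taking the very top non-positive hits maximizes difficulty but also raises the risk of unlabelled relevant passages (false negatives). Margin labels soften this somewhat: a near-zero teacher margin tells the student that the pair is close rather than forcing a hard separation.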

## Roadmap: what's next to improve

The model is already competitive in quality with hosted rerankers. The remaining
wins, highest-leverage first:

1. **Close the small gap to Cohere (quality).** Re-distill on **combined** data
   (mMARCO-id, ~400k triplets, plus in-domain TyDi/MIRACL-id upsampled ~3× so the
   in-domain signal isn't diluted) for a few more epochs, aiming to push nDCG@10
   past 0.70 toward the teacher ceiling.
2. **Stronger / ensemble teacher.** Distill from a larger teacher (e.g.
   `BAAI/bge-reranker-v2-gemma`) or an ensemble of teacher margins to raise the
   distillation ceiling above the current ~0.712.
3. **Harder negatives.** Re-mine negatives with a strong *dense* retriever (not just
   BM25); cross-encoders learn most from hard negatives.
4. **Lift the real ceiling with better first-stage retrieval.** MIRACL nDCG is capped
   by `Recall@100` (~0.71–0.76 here). A better retriever (multilingual-e5 / BGE-M3
   dense, or hybrid BM25+dense) improves the candidates the reranker sees, which is
   likely a bigger end-to-end win than any reranker tweak.
5. **Faster CPU serving.** The int8 ONNX model is quality-ready; latency is the lever.
   Length-sorted mini-batching (to cut padding waste), an optional multi-threaded
   ONNX mode, and a lower default `max_length` (256) should materially reduce CPU
   latency and RAM.
6. **Broaden evaluation.** Report the commercial comparison on the full 960-query
   MIRACL-id run and add other domains (e-commerce, news) so the quality claim
   generalizes beyond Wikipedia QA.
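
Length-sorted mini-batching (item 5) can be sketched as follows: sort passages by length so each batch pads to a similar size, score batch by batch, then write scores back in the original order. The `score_batch` callable, the fake scorer and the use of character count as a proxy for token length are assumptions for illustration; a real implementation would sort by tokenized length and call the ONNX session.

```python
from typing import Callable, List, Sequence


def score_length_sorted(
    query: str,
    passages: Sequence[str],
    score_batch: Callable[[str, List[str]], List[float]],  # hypothetical model call
    batch_size: int = 16,
) -> List[float]:
    """Score passages in length-sorted batches; return scores in the input order."""
    # Passage indices ordered by length (characters as a cheap proxy for tokens).
    order = sorted(range(len(passages)), key=lambda i: len(passages[i]))
    scores = [0.0] * len(passages)
    for start in range(0, len(order), batch_size):
        batch_ids = order[start:start + batch_size]
        batch_scores = score_batch(query, [passages[i] for i in batch_ids])
        # Scatter each score back to the slot of the passage it belongs to.
        for i, s in zip(batch_ids, batch_scores):
            scores[i] = s
    return scores


def padded_cells(lengths: Sequence[int], batch_size: int) -> int:
    """Total positions processed if every batch is padded to its longest item."""
    return sum(
        len(lengths[i:i + batch_size]) * max(lengths[i:i + batch_size])
        for i in range(0, len(lengths), batch_size)
    )


def fake_scorer(query: str, batch: List[str]) -> List[float]:
    # Hypothetical scorer: returns each passage's length so the ordering is checkable.
    return [float(len(p)) for p in batch]


lengths = [300, 12, 250, 8, 40, 290]
passages = ["a" * n for n in lengths]
print(score_length_sorted("q", passages, fake_scorer, batch_size=2))
print(padded_cells(lengths, 2), padded_cells(sorted(lengths), 2))  # 1680 1124
```

In this toy case sorting cuts the padded work from 1680 to 1124 positions (about a third less) without changing any score, because each passage is still scored independently against the query.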

## License

Apache-2.0, inherited from the base model. TyDi QA and MIRACL are Apache-2.0.