---
language:
- multilingual
base_model:
- google/t5gemma-2-270m-270m
pipeline_tag: text-ranking
datasets:
- KaLM-Embedding/KaLM-embedding-finetuning-data
- Shitao/bge-m3-data
tags:
- reranker
- encoder-decoder
- FBNL
license: mit
---
# KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking
We present KaLM-Reranker-V1, a fast but not late-interaction (FBNL) reranker that decouples query and passage computation while retaining expressive relevance modeling.
Built on an encoder-decoder architecture, KaLM-Reranker-V1 uses the encoder to pre-encode passages with Matryoshka embedding pooling (MEP), while the decoder models the system instruction, user instruction, and query intent. Cross-attention from the decoder to the encoder outputs then captures relevance between the query context and the passage representations. The design is fast because passages are encoded independently of the query and can be precomputed. Yet it is not late interaction: late-interaction models combine separately computed query and passage representations with a fixed similarity operator, whereas here the decoder models relevance through cross-attention.
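The card does not spell out how Matryoshka embedding pooling compresses a passage or how the decoder's cross-attention is parameterized. As a toy illustration only, the NumPy sketch below assumes MEP mean-pools consecutive windows of `ratio` encoder states (one reading of the 1x-32x range in the "MEP Support" column of the Model Details table) and uses single-head scaled dot-product attention without learned projections. `pool_tokens` and `cross_attention` are hypothetical helpers, and random arrays stand in for real encoder and decoder states.

```python
import numpy as np

def pool_tokens(states: np.ndarray, ratio: int) -> np.ndarray:
    """Mean-pool consecutive windows of `ratio` token states (assumed MEP-like compression).

    states: (num_tokens, dim) encoder outputs for one non-empty passage.
    Returns (ceil(num_tokens / ratio), dim); the last window may hold fewer tokens.
    """
    num_tokens, dim = states.shape
    num_windows = -(-num_tokens // ratio)  # ceiling division
    pad = num_windows * ratio - num_tokens
    padded = np.concatenate([states, np.zeros((pad, dim))], axis=0)
    counts = np.full(num_windows, ratio, dtype=float)
    counts[-1] -= pad  # divide the last window by its real token count
    return padded.reshape(num_windows, ratio, dim).sum(axis=1) / counts[:, None]

def cross_attention(queries: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Toy single-head scaled dot-product attention from query states to passage states."""
    scores = queries @ memory.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ memory

rng = np.random.default_rng(0)
dim = 640  # Nano's document token dimension
# Offline: encode each passage once and cache its pooled states (random stand-ins here).
passage_states = rng.standard_normal((300, dim))
cached = pool_tokens(passage_states, ratio=8)  # 300 tokens -> 38 vectors
# Online: decoder-side states for instruction + query attend over the cached passage.
query_states = rng.standard_normal((12, dim))
context = cross_attention(query_states, cached)
print(cached.shape, context.shape)  # (38, 640) (12, 640)
```

Under this assumption only the pooled passage states need to be stored, and the query-time attention cost scales with the number of pooled vectors rather than the original passage length.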
We instantiate KaLM-Reranker-V1 in three sizes, Nano, Small, and Large, with 0.27B, 1B, and 4B activated parameters, respectively.
Extensive experiments on BEIR, MIRACL, and LMEB show that the KaLM-Reranker-V1 series achieves reranking performance competitive with strong industrial rerankers while substantially reducing online (query-time) overhead.
## Model Details
| Models | Activated Params. | Non-Embedding Params. | Embedding Params. | #Layers | Sequence Length | Document Token Dim. | MEP Support | Instruction Aware |
|---|---|---|---|---|---|---|---|---|
| KaLM-Reranker-V1-Nano | 0.27B | 100M | 168M | 18 | 128K | 640 | 1x-32x | Yes |
| KaLM-Reranker-V1-Small | 1B | 698M | 302M | 26 | 128K | 1152 | 1x-32x | Yes |
| KaLM-Reranker-V1-Large | 4B | 3209M | 675M | 34 | 128K | 2560 | 1x-32x | Yes |
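As a rough sanity check on the table, the snippet below confirms that the non-embedding and embedding counts add up to the activated parameter counts (up to rounding) and estimates how much storage one cached passage would take. It assumes fp16 storage and reads an MEP ratio of k as keeping about 1/k of a passage's token vectors; neither assumption is stated in the card, the 512-token passage is hypothetical, and `cached_bytes` is a hypothetical helper.

```python
import math

# Assumptions (not from the model card): fp16 storage (2 bytes per value) and
# that an MEP ratio of k keeps about 1/k of a passage's token vectors.
MODELS = {
    # name: (non-embedding params, embedding params, document token dim), from the table
    "Nano": (100e6, 168e6, 640),
    "Small": (698e6, 302e6, 1152),
    "Large": (3209e6, 675e6, 2560),
}

def cached_bytes(num_tokens: int, dim: int, ratio: int, bytes_per_value: int = 2) -> int:
    """Bytes to store one passage after keeping ceil(num_tokens / ratio) vectors."""
    return math.ceil(num_tokens / ratio) * dim * bytes_per_value

for name, (non_emb, emb, dim) in MODELS.items():
    total_b = (non_emb + emb) / 1e9
    print(f"{name}: {total_b:.2f}B params, "
          f"512-token passage: {cached_bytes(512, dim, 1) / 1024:.0f} KiB at 1x, "
          f"{cached_bytes(512, dim, 32) / 1024:.0f} KiB at 32x")
# Nano: 0.27B params, 512-token passage: 640 KiB at 1x, 20 KiB at 32x
# Small: 1.00B params, 512-token passage: 1152 KiB at 1x, 36 KiB at 32x
# Large: 3.88B params, 512-token passage: 2560 KiB at 1x, 80 KiB at 32x
```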
## Prompt Template
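```python
# Placeholder inputs (illustrative only).
task_instruction = "Given a web search query, retrieve relevant passages that answer the query"
query = "What is the capital of France?"
document = "Paris is the capital and largest city of France."
# Encoder input: the passage alone, so it can be encoded offline.
document_prompt = f"<Document>: {document}"
# Decoder input: system instruction, task instruction, and query.
query_prompt = (
    f"<bos><start_of_turn>user\n"
    f"Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".\n\n"
    f"<Instruct>: {task_instruction}\n"
    f"<Query>: {query}<end_of_turn>\n"
    f"<start_of_turn>model\n\n\n\n"
)
```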
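The template asks the model to answer only "yes" or "no". One common way to turn such a prompt into a ranking score, assumed here because the card does not document the scoring step, is to take the decoder's next-token logits for "yes" and "no" after the model turn and apply a two-way softmax. The sketch below shows only that last step: `relevance_score` and `rerank` are hypothetical helpers, and the logits are made-up numbers rather than model outputs, since obtaining them (token IDs, forward pass) is not described in the card.

```python
import math

def relevance_score(logit_yes: float, logit_no: float) -> float:
    """P("yes") under a softmax restricted to the "yes" and "no" tokens (assumed convention).

    Equivalent to sigmoid(logit_yes - logit_no), computed without overflow.
    """
    diff = logit_yes - logit_no
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)

def rerank(candidates: list[tuple[str, float, float]]) -> list[tuple[str, float]]:
    """Sort (passage_id, logit_yes, logit_no) triples by descending relevance score."""
    scored = [(pid, relevance_score(y, n)) for pid, y, n in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up logits for three passages, standing in for real model outputs.
ranking = rerank([("p1", 0.5, 1.5), ("p2", 2.0, -1.0), ("p3", 0.0, 0.0)])
print([pid for pid, _ in ranking])  # ['p2', 'p3', 'p1']
```

Because the two-way softmax equals sigmoid(logit_yes - logit_no), only the logit difference affects the ranking under this convention.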

