---
language:
- multilingual
base_model:
- google/t5gemma-2-270m-270m
pipeline_tag: text-ranking
datasets:
- KaLM-Embedding/KaLM-embedding-finetuning-data
- Shitao/bge-m3-data
tags:
- reranker
- encoder-decoder
- FBNL
license: mit
---
<h1 align="center">KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking</h1>
<p align="center">
  <a href="https://huggingface.co/collections/KaLM-Embedding/lychee-kalm-reranker">
    <img src="https://img.shields.io/badge/%F0%9F%A4%97_Collection-Model-ffbd45.svg" alt="HF Collection">
  </a>
  <a href="https://arxiv.org/abs/2506.20923">
    <img src="https://img.shields.io/badge/Paper-KaLM--Reranker--V1-d4333f?logo=arxiv&logoColor=white&colorA=cccccc&colorB=d4333f&style=flat" alt="Paper">
  </a>
</p>

We present `KaLM-Reranker-V1`, a fast but not late-interaction (FBNL) reranker that decouples query and passage computation while retaining expressive relevance modeling.
Built on an encoder-decoder architecture, KaLM-Reranker-V1 uses the encoder to pre-encode passages with Matryoshka embedding pooling (MEP), while the decoder models the system instruction, user instruction, and query intent; cross-attention then captures relevance between the query context and the passage representations.
The design is fast because passage encoding is decoupled from the query, yet it is not late interaction because cross-attention preserves rich relevance modeling.
We instantiate KaLM-Reranker-V1 in three sizes, `Nano`, `Small`, and `Large`, with `0.27B`, `1B`, and `4B` activated parameters, respectively.

Extensive experiments on BEIR, MIRACL, and LMEB show that the KaLM-Reranker-V1 series achieves competitive reranking performance compared with strong industrial rerankers while significantly reducing online overhead.

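To make the idea of compressing pre-encoded passages concrete, here is a minimal toy sketch. This card does not specify how Matryoshka embedding pooling works internally, so the non-overlapping mean pooling below is an assumption chosen only to illustrate what a compression factor does to the number of stored passage vectors. The function name `mean_pool_groups` is hypothetical and not part of any released API.

```python
def mean_pool_groups(token_vectors, factor):
    """Average each run of `factor` consecutive token vectors into one vector.

    ASSUMPTION: plain mean pooling is a stand-in for MEP, which this card
    does not describe in detail. A short final group is averaged as-is.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    pooled = []
    for start in range(0, len(token_vectors), factor):
        group = token_vectors[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(vec[i] for vec in group) / len(group) for i in range(dim)])
    return pooled


# Toy passage: 8 token vectors of dimension 2 (the real models use 640 to 2560).
tokens = [[float(t), float(2 * t)] for t in range(8)]

for factor in (1, 2, 4, 8):
    compressed = mean_pool_groups(tokens, factor)
    print(f"{factor}x compression -> {len(compressed)} vectors to cache")
```

Because the passage side never sees the query, whatever pooling is used can run once offline, and only the compressed vectors need to be kept for cross-attention at query time.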
# Model Details

| Models | Activated Params. | Non-Embedding Params. | Embedding Params. | #Layers | Sequence Length | Document Token Dim. | MEP Support | Instruction Aware |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| [KaLM-Reranker-V1-Nano](https://huggingface.co/KaLM-Embedding/KaLM-Reranker-V1-Nano) | 0.27B | 100M | 168M | 18 | 128K | 640 | 1x-32x | Yes |
| [KaLM-Reranker-V1-Small](https://huggingface.co/KaLM-Embedding/KaLM-Reranker-V1-Small) | 1B | 698M | 302M | 26 | 128K | 1152 | 1x-32x | Yes |
| [KaLM-Reranker-V1-Large](https://huggingface.co/KaLM-Embedding/KaLM-Reranker-V1-Large) | 4B | 3209M | 675M | 34 | 128K | 2560 | 1x-32x | Yes |

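As a rough guide to what the "Document Token Dim." and "MEP Support" columns mean for an index of pre-encoded passages, the sketch below estimates the cache size of a single passage. The dimensions come from the table above; everything else is an assumption: that an Nx setting divides the number of stored vectors by N (rounded up), that values are stored in 16-bit floats, and the helper name `cached_passage_bytes`, which is hypothetical.

```python
from math import ceil

# Document token dimensions, copied from the Model Details table.
DOC_TOKEN_DIM = {"Nano": 640, "Small": 1152, "Large": 2560}


def cached_passage_bytes(num_tokens, model_size, compression=1, bytes_per_value=2):
    """Estimate the bytes needed to cache one pre-encoded passage.

    ASSUMPTIONS: `compression` reduces the vector count by that factor
    (rounded up), and each value takes `bytes_per_value` bytes (2 = fp16).
    """
    if not 1 <= compression <= 32:
        raise ValueError("the table lists MEP support from 1x to 32x")
    num_vectors = ceil(num_tokens / compression)
    return num_vectors * DOC_TOKEN_DIM[model_size] * bytes_per_value


# Compare the uncompressed and most compressed settings for a 512-token passage.
for size in DOC_TOKEN_DIM:
    full = cached_passage_bytes(512, size, compression=1)
    packed = cached_passage_bytes(512, size, compression=32)
    print(f"{size}: {full:,} bytes at 1x, {packed:,} bytes at 32x")
```

Under these assumptions the cache shrinks linearly with the compression factor, so the choice of factor is a trade-off between index size and how much passage detail cross-attention can see.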
# Prompt Template

Passage side (encoder input):

```python
f"<Document>: {document}"
```

Instruction and query side (decoder input):

```python
(
    f"<bos><start_of_turn>user\n"
    f"Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".\n\n"
    f"<Instruct>: {task_instruction}\n"
    f"<Query>: {query}<end_of_turn>\n"
    f"<start_of_turn>model\n\n\n\n"
)
```
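
The two templates can be wrapped in small helpers so that passages and queries are always formatted the same way. This is only a convenience sketch: the function names and the example instruction are hypothetical, and the strings are copied from the templates above without any claim about how the tokenizer treats special tokens such as `<bos>`.

```python
# Fixed judging preamble, copied verbatim from the prompt template.
JUDGE_PREAMBLE = (
    "Judge whether the Document meets the requirements based on the Query "
    'and the Instruct provided. Note that the answer can only be "yes" or "no".'
)


def build_document_prompt(document: str) -> str:
    """Format a passage for the encoder side."""
    return f"<Document>: {document}"


def build_query_prompt(task_instruction: str, query: str) -> str:
    """Format the instruction and query for the decoder side."""
    return (
        "<bos><start_of_turn>user\n"
        f"{JUDGE_PREAMBLE}\n\n"
        f"<Instruct>: {task_instruction}\n"
        f"<Query>: {query}<end_of_turn>\n"
        "<start_of_turn>model\n\n\n\n"
    )


# Hypothetical example inputs, for illustration only.
instruction = "Given a web search query, retrieve relevant passages that answer the query"
print(build_document_prompt("Paris is the capital of France."))
print(build_query_prompt(instruction, "What is the capital of France?"))
```

Keeping the passage prompt free of any query text is what allows passages to be encoded ahead of time and reused across queries.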

# Evaluation

## BEIR

## MIRACL

## LMEB