	---
	language:
	- multilingual
	base_model:
	- google/t5gemma-2-270m-270m
	pipeline_tag: text-ranking
	datasets:
	- KaLM-Embedding/KaLM-embedding-finetuning-data
	- Shitao/bge-m3-data
	tags:
	- reranker
	- encoder-decoder
	- FBNL
	license: mit
	---




	<h1 align="center">KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking</h1>


	<p align="center">
	<a href="https://huggingface.co/collections/KaLM-Embedding/lychee-kalm-reranker">
	<img src="https://img.shields.io/badge/%F0%9F%A4%97_Collection-Model-ffbd45.svg" alt="HF Collection">
	</a>
	<a href="https://arxiv.org/abs/2506.20923">
	<img src="https://img.shields.io/badge/Paper-KaLM--Reranker--V1-d4333f?logo=arxiv&logoColor=white&colorA=cccccc&colorB=d4333f&style=flat" alt="Paper">
	</a>
	</p>


	We present `KaLM-Reranker-V1`, a fast but not late-interaction (FBNL) reranker that decouples query and passage computation while retaining expressive relevance modeling.

	Built on an encoder-decoder architecture, KaLM-Reranker-V1 uses the encoder to pre-encode passages with Matryoshka embedding pooling, while the decoder models the system instruction, user instruction, and query intent; cross-attention then captures relevance between the query context and passage representations.
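	This design makes KaLM-Reranker-V1 fast, because passages are encoded independently of the query and can be pre-encoded, yet it is not a late-interaction model, because cross-attention preserves rich relevance modeling between the query context and the passage representations.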
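
The decoupling described above can be illustrated with a toy, pure-Python sketch. It is not the released implementation: the 2-dimensional vectors stand in for encoder outputs, `mean_pool_groups` is a hypothetical stand-in for Matryoshka embedding pooling (assumed here to average runs of consecutive token vectors), and `cross_attend` is plain single-head scaled dot-product attention rather than the model's actual decoder layers.

```python
import math

def mean_pool_groups(token_vecs, ratio):
    """Average each run of `ratio` consecutive token vectors.

    Assumption: a simple stand-in for Matryoshka embedding pooling,
    whose exact form is not specified in this README.
    """
    pooled = []
    for start in range(0, len(token_vecs), ratio):
        group = token_vecs[start:start + ratio]
        dim = len(group[0])
        pooled.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return pooled

def cross_attend(query_vec, passage_vecs):
    """Single-head scaled dot-product attention of one query state over passage vectors."""
    scale = math.sqrt(len(query_vec))
    logits = [sum(q * k for q, k in zip(query_vec, vec)) / scale for vec in passage_vecs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(passage_vecs[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, passage_vecs)) for d in range(dim)]
    return weights, context

# Offline step: encode and compress the passage once, then cache the result.
passage_tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
cache = mean_pool_groups(passage_tokens, ratio=2)  # 4 token vectors -> 2 stored vectors

# Online step: only the query side runs, attending over the cached vectors.
query_state = [1.0, 0.0]
weights, context = cross_attend(query_state, cache)
```

In this sketch the passage is never re-encoded at query time; the query state only reads the cached vectors, which is the property the paragraph above attributes to the design.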

	We instantiate KaLM-Reranker-V1 in three sizes, `Nano`, `Small`, and `Large`, with `0.27B`, `1B`, and `4B` activated parameters, respectively.


	![kalm-reranker-v1 architecture](./assets/framework.jpg)


	Extensive experiments on BEIR, MIRACL, and LMEB show that the KaLM-Reranker-V1 series achieves competitive reranking performance compared with strong industrial rerankers while significantly reducing online overhead.

	# Model Details
	| Models | Activated Params. | Non-Embedding Params. | Embedding Params. | #Layers | Sequence Length | Document Token Dim. | MEP Support | Instruction Aware |
	| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
	| [KaLM-Reranker-V1-Nano](https://huggingface.co/KaLM-Embedding/KaLM-Reranker-V1-Nano) | 0.27B | 100M | 168M | 18 | 128K | 640 | 1x-32x | Yes |
	| [KaLM-Reranker-V1-Small](https://huggingface.co/KaLM-Embedding/KaLM-Reranker-V1-Small) | 1B | 698M | 302M | 26 | 128K | 1152 | 1x-32x | Yes |
	| [KaLM-Reranker-V1-Large](https://huggingface.co/KaLM-Embedding/KaLM-Reranker-V1-Large) | 4B | 3209M | 675M | 34 | 128K | 2560 | 1x-32x | Yes |
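
As a quick sanity check on the table, the snippet below adds the non-embedding and embedding counts for each model. It assumes, without confirmation from the authors, that the activated-parameter column is the rounded sum of those two columns; the dictionary layout and function name are illustrative only.

```python
# Parameter counts in millions, copied from the Model Details table.
models = {
    "KaLM-Reranker-V1-Nano": {"non_embedding": 100, "embedding": 168, "activated_b": 0.27},
    "KaLM-Reranker-V1-Small": {"non_embedding": 698, "embedding": 302, "activated_b": 1.0},
    "KaLM-Reranker-V1-Large": {"non_embedding": 3209, "embedding": 675, "activated_b": 4.0},
}

def total_params_millions(entry):
    """Sum of non-embedding and embedding parameters, in millions."""
    return entry["non_embedding"] + entry["embedding"]

totals = {name: total_params_millions(entry) for name, entry in models.items()}

for name, total in totals.items():
    print(f"{name}: {total}M summed vs. {models[name]['activated_b']}B reported")
```

The sums come to 268M, 1000M, and 3884M, which round to the reported 0.27B, 1B, and 4B.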


	# Prompt Template
	```python
	f"<Document>: {document}"

	```
	```python
	(
	f"<bos><start_of_turn>user\n"
	f"Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".\n\n"
	f"<Instruct>: {task_instruction}\n"
	f"<Query>: {query}<end_of_turn>\n"
	f"<start_of_turn>model\n\n\n\n"
	)

	```
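
The two templates above can be wrapped in small helper functions. This is a minimal sketch: the function names and the example instruction, query, and document are hypothetical, and the split into an encoder-side document prompt and a decoder-side query prompt follows the architecture description rather than any published loading code.

```python
def build_document_prompt(document: str) -> str:
    """Passage-side input: the document wrapped in the document template."""
    return f"<Document>: {document}"

def build_query_prompt(task_instruction: str, query: str) -> str:
    """Query-side input: judging instruction, task instruction, and query in the chat template."""
    return (
        "<bos><start_of_turn>user\n"
        "Judge whether the Document meets the requirements based on the Query and the Instruct provided. "
        'Note that the answer can only be "yes" or "no".\n\n'
        f"<Instruct>: {task_instruction}\n"
        f"<Query>: {query}<end_of_turn>\n"
        "<start_of_turn>model\n\n\n\n"
    )

# Example values are illustrative only.
doc_prompt = build_document_prompt("Paris is the capital of France.")
query_prompt = build_query_prompt(
    "Given a web search query, retrieve relevant passages that answer the query",
    "What is the capital of France?",
)
```

Because the document prompt does not depend on the query or the instruction, it can be built and encoded once per passage and reused across queries.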

	![kalm-reranker-v1 template](./assets/template.jpg)
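
Since the template restricts the answer to "yes" or "no", one common way to obtain a relevance score is a softmax over the logits of those two answer tokens. The sketch below assumes that convention; this README does not state how the released models compute their scores, and the function names, document ids, and logit values are made up for illustration.

```python
import math

def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the answer logits, returning the probability of "yes"."""
    peak = max(yes_logit, no_logit)  # subtract the max for numerical stability
    yes_exp = math.exp(yes_logit - peak)
    no_exp = math.exp(no_logit - peak)
    return yes_exp / (yes_exp + no_exp)

def rank_documents(doc_ids, logit_pairs):
    """Sort documents by descending probability of "yes"."""
    scored = [
        (doc_id, yes_probability(yes_logit, no_logit))
        for doc_id, (yes_logit, no_logit) in zip(doc_ids, logit_pairs)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical (yes_logit, no_logit) pairs for three candidate documents.
ranking = rank_documents(
    ["doc_a", "doc_b", "doc_c"],
    [(0.2, 1.5), (3.0, -1.0), (0.0, 0.0)],
)
```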


	# Evaluation
	## BEIR


	## MIRACL


	## LMEB