Instructions to use Qwen/Qwen3-Reranker-4B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Qwen/Qwen3-Reranker-4B with Transformers:
    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Qwen3-Reranker is a text-only causal LM, so it loads with AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-4B")
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-4B")
- sentence-transformers
How to use Qwen/Qwen3-Reranker-4B with sentence-transformers:
    from sentence_transformers import CrossEncoder

    model = CrossEncoder("Qwen/Qwen3-Reranker-4B")

    query = "Which planet is known as the Red Planet?"
    passages = [
        "Venus is often called Earth's twin because of its similar size and proximity.",
        "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
        "Jupiter, the largest planet in our solar system, has a prominent red spot.",
        "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
    ]

    # Score each (query, passage) pair; a higher score means more relevant
    scores = model.predict([(query, passage) for passage in passages])
    print(scores)
- Notebooks
- Google Colab
- Kaggle
Working GGUF for llama.cpp (native Windows/Linux, no WSL needed)
Hi, most community GGUF conversions of Qwen3-Reranker are broken with llama.cpp (missing cls.output.weight tensor, producing scores like 4.5e-23 instead of real relevance scores). See llama.cpp#16407 for details.
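If you want to check a GGUF you already have, here is a rough sketch that looks for that tensor by name. It assumes the `gguf` Python package that ships with llama.cpp is installed (`pip install gguf`); the helper names and the file name in the comment are made up for illustration.

```python
# Tensor name taken from the post above; broken conversions lack it.
REQUIRED_TENSOR = "cls.output.weight"


def has_rank_head(tensor_names):
    """Return True if the classifier output tensor is among the names."""
    return REQUIRED_TENSOR in set(tensor_names)


def gguf_tensor_names(path):
    """List the tensor names stored in a GGUF file.

    Imported lazily so the check above can be used without gguf installed.
    """
    from gguf import GGUFReader

    reader = GGUFReader(path)
    return [tensor.name for tensor in reader.tensors]


# Example (hypothetical local file name):
# print(has_rank_head(gguf_tensor_names("Qwen3-Reranker-4B-f16.gguf")))
```

A file where this prints False is likely one of the conversions that produces the near-zero scores described above.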
I've converted all three sizes (0.6B, 4B, 8B) using the official convert_hf_to_gguf.py and verified they work:
Collection: https://huggingface.co/collections/Voodisss/qwen3-reranker-gguf-for-llamacpp
4B: https://huggingface.co/Voodisss/Qwen3-Reranker-4B-GGUF-llama_cpp
Works natively on Windows and Linux with llama-server.exe or llama-cli: no WSL, no vLLM, no Docker containers that refuse to release RAM. Just:
llama-server -m Qwen3-Reranker-4B-f16.gguf --reranking --pooling rank --embedding
Then call /v1/rerank and get real scores.
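A minimal sketch of that call in Python, using only the standard library. It assumes the server is listening on llama-server's default address (127.0.0.1:8080) and that the response has a `results` list whose items carry `index` and `relevance_score`; the function names are just for illustration, so adjust them and the URL to your setup.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port.
RERANK_URL = "http://127.0.0.1:8080/v1/rerank"


def build_payload(query, documents):
    """Build the JSON body for the rerank request."""
    return {"query": query, "documents": list(documents)}


def order_by_score(documents, results):
    """Pair each document with its score, highest score first.

    Assumes each result item has "index" and "relevance_score" keys.
    """
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


def rerank(query, documents, url=RERANK_URL, timeout=60):
    """POST the query and documents to the server and return ranked pairs."""
    body = json.dumps(build_payload(query, documents)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)["results"]
    return order_by_score(documents, results)


# Example call (needs the server from the command above to be running):
# query = "Which planet is known as the Red Planet?"
# passages = ["Venus is often called Earth's twin.",
#             "Mars is often referred to as the Red Planet."]
# for passage, score in rerank(query, passages):
#     print(f"{score:.4f}  {passage}")
```

With a working GGUF, the relevant passage should come back with a clearly higher score than the others rather than values like 4.5e-23.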