Working GGUF for llama.cpp (native Windows/Linux, no WSL needed)

by Voodisss - opened Mar 9

Hi, most community GGUF conversions of Qwen3-Reranker are broken in llama.cpp: they are missing the cls.output.weight tensor, so reranking returns scores like 4.5e-23 instead of real relevance scores. See llama.cpp#16407 for details.

I've converted all three sizes (0.6B, 4B, 8B) using the official convert_hf_to_gguf.py and verified they work:

Collection: https://huggingface.co/collections/Voodisss/qwen3-reranker-gguf-for-llamacpp
4B: https://huggingface.co/Voodisss/Qwen3-Reranker-4B-GGUF-llama_cpp

Works natively on Windows and Linux with llama-server.exe or llama-cli — no WSL, no vLLM, no Docker containers that refuse to release RAM. Just:

llama-server -m Qwen3-Reranker-4B-f16.gguf --reranking --pooling rank --embedding

Then call /v1/rerank and get real scores.
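
For anyone who wants to script that call, here is a minimal Python sketch using only the standard library. It assumes the server is on llama-server's default address (127.0.0.1:8080) and that the response has a "results" list whose items carry "index" and "relevance_score" fields. Check both against your llama.cpp build. The function names are my own, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default host and port.
SERVER_URL = "http://127.0.0.1:8080/v1/rerank"


def build_request(query, documents, url=SERVER_URL):
    """Build a POST request carrying the query and the candidate documents."""
    body = json.dumps({"query": query, "documents": documents}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rank_documents(response_json, documents):
    """Pair each document with its score, best match first.

    Assumes each item in response_json["results"] has "index" and
    "relevance_score" fields.
    """
    results = sorted(
        response_json["results"],
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


def rerank(query, documents, url=SERVER_URL):
    """Send the request to a running server and return (document, score) pairs."""
    with urllib.request.urlopen(build_request(query, documents, url)) as resp:
        return rank_documents(json.load(resp), documents)
```

With the server from the command above running, rerank("What is the capital of France?", ["Paris is the capital of France.", "Llamas are camelids."]) should return the two documents ordered by score. If every score comes back vanishingly small (on the order of 1e-23), you are probably hitting the broken-conversion problem described at the top.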
