Model Details

Model Description

This repository contains the merged full model produced from the best Qwen3 reranker training run in QAnchor, a finance-oriented RAG and reranking pipeline for question answering over Chinese A-share annual reports.

This model is a Qwen3-based cross-encoder reranker fine-tuned to score and rank candidate document chunks for Chinese financial question answering. It is intended for same-document reranking over chunks extracted from annual reports and related financial filings.

  • Model type: Cross-encoder reranker implemented as a sequence classification model
  • Language(s) (NLP): Chinese (primary fine-tuning domain)
  • License: Apache-2.0
  • Finetuned from model: tomaarsen/Qwen3-Reranker-0.6B-seq-cls

Results

Compared with the same retrieval pipeline without fine-tuning, the fine-tuned Qwen3 reranker achieved:

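| Metric  | Base   | Fine-tuned | Absolute Gain | Relative Gain |
|---------|--------|------------|---------------|---------------|
| MRR@10  | 0.6115 | 0.7758     | +0.1643       | +26.9%        |
| NDCG@10 | 0.7572 | 0.8761     | +0.1189       | +15.7%        |
| P@10    | 0.1920 | 0.2280     | +0.0360       | +18.8%        |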

Why this base model

This reranker is fine-tuned from tomaarsen/Qwen3-Reranker-0.6B-seq-cls, a sequence-classification adaptation of Qwen/Qwen3-Reranker-0.6B.

We selected this base because the QAnchor training pipeline optimizes grouped candidate scores with a custom listwise ranking loss built on top of outputs.logits. Using the seq-cls variant allows the model to fit naturally into a standard AutoModelForSequenceClassification + LoRA training workflow.

By contrast, the original Qwen3 reranker uses a different scoring path based on the final-token "yes" / "no" logits of a causal language model, which would require a different training interface.

Uses

Direct Use

This model is intended to rerank candidate document chunks returned by a first-stage retriever.

Recommended use cases include:

  • Chinese financial-document reranking
  • same-document candidate reranking
  • question answering pipelines over A-share annual reports and related filings

Downstream Use

Typical downstream usage is:

  1. Retrieve a candidate set with a first-stage retriever
  2. Format each (query, document) pair using the training-time template
  3. Score each candidate with this reranker
  4. Sort by score and keep top-k
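The four steps above can be sketched as a small function with a pluggable scorer. This is an illustrative outline, not the QAnchor implementation: `rerank`, `score_fn`, and `toy_overlap_score` are hypothetical names, and in practice `score_fn` would format the pair with the training-time template and return the model's logit.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and return the top_k by score."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # Higher score means more relevant. The sort is stable, so candidates
    # with equal scores keep the order produced by the first-stage retriever.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_overlap_score(query: str, document: str) -> float:
    """Stand-in scorer for illustration only: counts shared characters."""
    return float(len(set(query) & set(document)))


if __name__ == "__main__":
    retrieved = ["2023年营业收入同比增长", "董事会成员名单", "公司营业收入构成"]
    for doc, score in rerank("2023年营业收入", retrieved, toy_overlap_score, top_k=2):
        print(score, doc)
```

Keeping the scorer as a callable lets the same sorting and truncation logic be reused with the base reranker, this model, or a test stub.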

Out-of-Scope Use

This model is not intended for:

  • standalone generative QA
  • direct answer generation without retrieval
  • cross-document retrieval without a first-stage candidate generator
  • legal, accounting, or investment advice
  • settings where training-time formatting is not preserved

Bias, Risks, and Limitations

This model inherits limitations from both the upstream Qwen3 reranker family and the QAnchor training setup.

Key limitations include:

  • The fine-tuning domain is Chinese financial QA, especially A-share annual reports
  • The training and evaluation setup assumes same-document reranking rather than open-domain retrieval
  • The model is sensitive to input formatting and was trained with pair_format=qwen3_template
  • Training data is weakly supervised and not publicly released
  • No official hosted inference SLA or latency benchmark is provided in this repository

Recommendations

  • Preserve the qwen3_template formatting logic at inference time
  • Use this model only after a first-stage retriever has produced a candidate set
  • Validate quality on your own financial-document distribution before production use
  • Do not treat reranker scores as calibrated probabilities or final answers

How to Get Started with the Model

Python

from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "souflex56/qanchor-reranker-qwen3-0.6b-merged"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

Training-time formatting

This model was trained with pair_format=qwen3_template, not a plain raw (query, document) pair.

Conceptually, each pair is formatted as:

<|im_start|>system
Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>
<|im_start|>user
<Instruct>: Given a web search query, retrieve relevant passages that answer the query
<Query>: {query}
<Document>: {document}<|im_end|>
<|im_start|>assistant
<think>

</think>

A minimal helper is:

def format_qwen3_template(query: str, document: str) -> str:
    return (
        '<|im_start|>system\n'
        'Judge whether the Document meets the requirements based on the Query and the Instruct provided. '
        'Note that the answer can only be "yes" or "no".<|im_end|>\n'
        '<|im_start|>user\n'
        '<Instruct>: Given a web search query, retrieve relevant passages that answer the query\n'
        f'<Query>: {query}\n'
        f'<Document>: {document}<|im_end|>\n'
        '<|im_start|>assistant\n'
        '<think>\n\n</think>\n\n'
    )
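Once pairs are formatted, scoring might look like the sketch below. It assumes the checkpoint returns a single logit per pair (as the base seq-cls model does) and that the tokenizer and model config define a pad token so that batched inference works. `chunked` and `score_formatted_pairs` are hypothetical helper names, and the batch size is arbitrary.

```python
from typing import List, Sequence


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split a sequence into consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def score_formatted_pairs(
    model,
    tokenizer,
    texts: Sequence[str],
    batch_size: int = 8,
    max_length: int = 768,  # matches the training-time max length
) -> List[float]:
    """Return one raw relevance logit per already-formatted pair."""
    import torch  # imported lazily so the helpers load without torch installed

    scores: List[float] = []
    model.eval()
    for batch in chunked(texts, batch_size):
        # Note: truncation may cut the template suffix of very long documents;
        # how QAnchor handled this during training is not stated here.
        inputs = tokenizer(
            batch,
            padding=True,
            truncation=True,
            max_length=max_length,
            return_tensors="pt",
        )
        inputs = {name: tensor.to(model.device) for name, tensor in inputs.items()}
        with torch.no_grad():
            logits = model(**inputs).logits  # assumed shape: (batch, 1)
        scores.extend(logits.squeeze(-1).float().tolist())
    return scores
```

Pass the strings produced by `format_qwen3_template` as `texts`. The returned values are raw logits suitable for sorting, not calibrated probabilities.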

Training Details

Training Data

This model was fine-tuned on weakly supervised reranker data constructed in the QAnchor pipeline.

The release does not include the training dataset; it covers model artifacts, metadata, and documentation only.

Training data characteristics:

  • Chinese financial QA domain
  • A-share annual reports and related financial filings
  • Sample structure: query + pos_text + neg_texts
  • Reverse-mined weak supervision with blacklist-based isolation from gold evaluation data
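To make the sample structure concrete, the sketch below models one record and expands it into a candidate group. The field names `query`, `pos_text`, and `neg_texts` come from the description above; the `RerankSample` class, the `build_group` helper, and the choice to place the positive first are illustrative assumptions. The default of 7 negatives mirrors the "Max negatives" hyperparameter.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RerankSample:
    """One weakly supervised training record (hypothetical container)."""
    query: str
    pos_text: str
    neg_texts: List[str] = field(default_factory=list)


def build_group(
    sample: RerankSample, max_negatives: int = 7
) -> Tuple[List[Tuple[str, str]], int]:
    """Expand a sample into (query, document) pairs plus the positive's index."""
    negatives = sample.neg_texts[:max_negatives]  # cap the number of negatives
    pairs = [(sample.query, sample.pos_text)]
    pairs += [(sample.query, neg) for neg in negatives]
    return pairs, 0  # in this sketch the positive is always placed first
```

Each group therefore holds at most 8 candidates: one positive and up to 7 hard negatives for the same query.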

Dataset statistics for the best released run:

  • Train queries: 179
  • Dev queries: 20
  • Train samples: 1274
  • Dev samples: 247

Training Procedure

Preprocessing

Key preprocessing steps in the QAnchor pipeline:

  • PDF chunking into parent/child hierarchical chunks
  • first-stage retrieval with embedding + BM25 + RRF
  • reverse mining to construct positive / hard-negative triplets
  • blacklist filtering and query-level train/dev splitting
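The RRF step in the list above fuses the embedding and BM25 rankings by summing reciprocal ranks. A minimal sketch of standard reciprocal rank fusion follows; the constant `k = 60` is the value commonly used in the literature and is an assumption here, since the QAnchor setting is not stated. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk ids into one list, best first."""
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by id for determinism.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


if __name__ == "__main__":
    embedding_ranking = ["chunk_a", "chunk_b", "chunk_c"]
    bm25_ranking = ["chunk_b", "chunk_c", "chunk_d"]
    print(reciprocal_rank_fusion([embedding_ranking, bm25_ranking]))
```

Chunks that appear near the top of both lists accumulate the highest fused score, which is why RRF needs no score normalization across retrievers.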

Training Hyperparameters

  • Training regime: no mixed precision flags enabled (fp16=false, bf16=false)
  • Pair format: qwen3_template
  • Max length: 768
  • Max negatives: 7
  • Learning rate: 2e-5
  • Epochs: 3
  • Batch size: 1
  • Gradient accumulation steps: 8

Speeds, Sizes, Times

  • Hardware type: NVIDIA GeForce RTX 4090
  • Training runtime (best released run): 1752.8 s (about 29.2 min)
  • Deployment note: no official hosted inference benchmark is published in this repository

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluation was performed on the QAnchor Stage 1 gold evaluation setting:

  • Gold eval queries: 50
  • Candidates per query: 20
  • Candidate source: Hybrid RRF retrieval output

Factors

The main evaluation setting measures reranking quality for:

  • Chinese financial QA
  • same-document reranking
  • candidate chunks from annual reports and related filings
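For reference, the reported metrics (MRR@10, NDCG@10, P@10) can be computed per query as sketched below and then averaged over queries. This assumes binary relevance labels listed in reranked order; the exact definitions used in the QAnchor evaluation are not given here, so treat the function names and conventions as illustrative.

```python
import math
from typing import Sequence


def reciprocal_rank_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """1 / rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def precision_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Fraction of the top k positions that hold a relevant item."""
    return sum(1 for rel in relevance[:k] if rel > 0) / k


def ndcg_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """DCG of the top k divided by the DCG of the ideal ordering."""
    def dcg(labels: Sequence[int]) -> float:
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(labels, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0
```

MRR@10 is the mean of `reciprocal_rank_at_k` over all evaluation queries. Note that P@10 is bounded by the number of relevant chunks per query divided by 10, so low absolute values are expected when each query has only a few relevant chunks.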

Technical Specifications

Model Architecture and Objective

  • Base architecture: Qwen3 reranker family
  • Fine-tuning interface: sequence classification
  • Objective: candidate reranking for query-document pairs
  • Training objective in QAnchor: listwise softmax cross-entropy over grouped candidates
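A listwise softmax cross-entropy over one candidate group can be illustrated with plain floats as below. This is a sketch of the general technique, assuming a single positive per group; the actual QAnchor loss operates on `outputs.logits` tensors and may differ in details such as temperature or weighting. The function name is hypothetical.

```python
import math
from typing import Sequence


def listwise_softmax_ce(scores: Sequence[float], positive_index: int = 0) -> float:
    """Negative log-probability of the positive under a softmax over the group."""
    if not scores:
        raise ValueError("scores must not be empty")
    max_score = max(scores)  # subtract the max for numerical stability
    log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    # -log(softmax(scores)[positive_index]) = logsumexp(scores) - positive score
    return log_sum_exp - scores[positive_index]


if __name__ == "__main__":
    # One positive followed by seven negatives, all scored equally.
    print(listwise_softmax_ce([0.0] * 8))
```

The loss decreases as the positive's score rises relative to the negatives in the same group, which is what pushes the positive toward the top of the ranking.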

Compute Infrastructure

Hardware

  • Single NVIDIA GeForce RTX 4090 GPU

Software

  • Transformers
  • PEFT
  • Accelerate
  • PyTorch

Citation

If you use this model, please cite the QAnchor repository and the upstream Qwen3 reranker family.

@misc{qanchor_reranker_2026,
  title={QAnchor Qwen3 Reranker Release},
  author={souflex56},
  year={2026},
  howpublished={\url{https://github.com/souflex56/QAnchor}}
}

Model Card Contact

For questions about this release, please open an issue in the QAnchor repository.
