license: mit
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: peft
language:
- en
pipeline_tag: text-generation
tags:
- relation-extraction
- information-extraction
- qlora
- lora
- peft
- nlp
datasets:
- Despina/re_gentune
Qwen2.5-0.5B-Instruct — RE GenTune (2-shot)
A sub-billion-parameter language model fine-tuned for relation extraction (RE). This is the headline checkpoint from the paper "Sub-Billion, Super-Frontier: Fine-Tuned Small Language Models Rival Zero-Shot Frontier LLMs on General and Literary Relation Extraction" (arXiv:2606.22606).
Despite having only 0.5B parameters, this model reaches a general-domain average of 0.83 (positive-class micro-F1), compared with 0.69 for GPT-5.4 and 0.66 for Claude Sonnet 4.6, both evaluated under the same minimal zero-shot protocol. This does not imply that small models are intrinsically stronger than frontier LLMs; it shows that targeted task adaptation lets a 4-bit model deployable on a single consumer GPU outperform general-purpose frontier systems under this protocol. An in-domain RoBERTa baseline also exceeds both frontier models, indicating that the advantage stems from task adaptation rather than from generative decoding.
It is a QLoRA (LoRA) adapter on top of Qwen/Qwen2.5-0.5B-Instruct, tuned on the GenTune general-domain mixture using the 2-shot prompt style.
What it does
Given a sentence and two marked entities, the model outputs only the relation label that holds between them (one label, no explanation).
Usage
This repo is a PEFT LoRA adapter, so load the base model and attach the adapter:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
BASE = "Qwen/Qwen2.5-0.5B-Instruct"
ADAPTER = "Despina/Qwen2.5-0.5B-Instruct-re_gentune-2-shot"
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()
system_prompt = (
"You are a relation extraction system. Be concise and direct. "
"Output ONLY the relation type that holds between the two mentioned entities. "
"Do not output any explanation, punctuation, or extra text — only the label."
)
user_prompt = (
"Sentence: Steve Jobs co-founded Apple in Cupertino.\n"
"Entity 1: Steve Jobs\n"
"Entity 2: Apple\n"
"Relation:"
)
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
]
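# Ask for a dict so the call behaves the same across transformers versions
# and the attention mask is passed to generate() along with the input ids.
inputs = tokenizer.apply_chat_template(
messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
# Greedy decoding; a relation label needs only a few tokens.
out = model.generate(**inputs, max_new_tokens=16, do_sample=False)
# Decode only the newly generated tokens, i.e. everything after the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0, prompt_len:], skip_special_tokens=True).strip())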
For best results, match the format the model was trained on: a system prompt that asks for the label only and, optionally, two in-context examples before the query (the 2-shot regime). In the paper, the strongest results come from a schema-enumerated variant, in which the allowed label set for the target dataset is injected into the system prompt.
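A minimal sketch of how the 2-shot and schema-enumerated prompts might be assembled. The query layout is taken from the usage example in this card; everything else is an assumption: the helper names `format_query` and `build_messages`, the choice to pass the in-context examples as user/assistant turns, the wording of the "Allowed relation types" line, and the example labels. The exact training format is defined in the paper and the reproduction code, so check it there before relying on this.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (sentence, entity 1, entity 2, gold label); a hypothetical container for demos.
Example = Tuple[str, str, str, str]


def format_query(sentence: str, entity1: str, entity2: str) -> str:
    """Render one query in the 'Sentence / Entity 1 / Entity 2 / Relation:' layout."""
    return (
        f"Sentence: {sentence}\n"
        f"Entity 1: {entity1}\n"
        f"Entity 2: {entity2}\n"
        "Relation:"
    )


def build_messages(
    system_prompt: str,
    sentence: str,
    entity1: str,
    entity2: str,
    examples: Sequence[Example] = (),
    labels: Optional[Sequence[str]] = None,
) -> List[Dict[str, str]]:
    """Build a chat message list with optional demos and an optional label schema."""
    system = system_prompt
    if labels:
        # ASSUMPTION: the wording of this schema line is illustrative only.
        system += "\nAllowed relation types: " + ", ".join(labels)

    messages = [{"role": "system", "content": system}]
    # ASSUMPTION: each demo is a user turn followed by an assistant turn.
    for ex_sentence, ex_e1, ex_e2, ex_label in examples:
        messages.append({"role": "user", "content": format_query(ex_sentence, ex_e1, ex_e2)})
        messages.append({"role": "assistant", "content": ex_label})
    messages.append({"role": "user", "content": format_query(sentence, entity1, entity2)})
    return messages


# Hypothetical demos and labels, for illustration only.
demo_messages = build_messages(
    "Output ONLY the relation type that holds between the two entities.",
    "Steve Jobs co-founded Apple in Cupertino.",
    "Steve Jobs",
    "Apple",
    examples=[
        ("Marie Curie was born in Warsaw.", "Marie Curie", "Warsaw", "place_of_birth"),
        ("Paris is the capital of France.", "Paris", "France", "capital_of"),
    ],
    labels=["founder_of", "place_of_birth", "capital_of", "no_relation"],
)
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of `messages` in the usage example above.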
Training
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct |
| Method | QLoRA (4-bit NF4, bf16 compute, double quant) |
| LoRA | r = 32, α = 64, dropout = 0.05; targets: q/k/v/o + gate/up/down proj |
| Training data | Despina/re_gentune (GenTune general-domain mixture), 2-shot prompts |
| Objective | Generate the relation label only |
| Epochs | 2 |
| Learning rate | 1e-4 |
| Effective batch | 4 × 2 grad-accum = 8 |
| Max sequence length | 1024 |
GenTune aggregates seven general-domain RE datasets: TACRED, SemEval-2010 Task 8, CoNLL04, NYT11, GIDS, Re-DocRED, and REBEL.
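As a rough guide, the hyperparameters listed in this section could be expressed with `BitsAndBytesConfig` from transformers and `LoraConfig` from PEFT as sketched below. This is not the authors' training script: the `task_type` value and the mapping of "q/k/v/o + gate/up/down proj" onto Qwen2 module names are assumptions, and the constant names are illustrative. Building the 4-bit config requires the `bitsandbytes` package to be installed.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with bf16 compute and double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# LoRA settings: r = 32, alpha = 64, dropout = 0.05.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    # ASSUMPTION: Qwen2 module names for the attention and MLP projections.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # ASSUMPTION: not stated in this card
)

# Optimization settings (illustrative constant names).
PER_DEVICE_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 2
EFFECTIVE_BATCH_SIZE = PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS  # 8
LEARNING_RATE = 1e-4
NUM_EPOCHS = 2
MAX_SEQ_LENGTH = 1024
```

`bnb_config` would be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, and `lora_config` to `peft.get_peft_model`; see the reproduction repository for the actual setup.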
Evaluation
Models are scored with positive-class micro-F1 (the no-relation class is excluded from the average). Averaged over the general-domain benchmarks, the model scores 0.83, versus 0.69 for GPT-5.4 and 0.66 for Claude Sonnet 4.6, both evaluated under a minimal zero-shot protocol. As the paper stresses, this reflects targeted task adaptation rather than any intrinsic superiority of small models. See the paper for the full 30-configuration matrix, literary-domain results, and the RoBERTa discriminative baseline.
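For readers unfamiliar with the metric, positive-class micro-F1 can be sketched as follows. This uses the common TACRED-style definition, in which only non-negative labels count toward precision and recall; the paper's scorer may differ in details. The function name and the `"no_relation"` label string are assumptions.

```python
from typing import Sequence


def positive_micro_f1(
    gold: Sequence[str],
    pred: Sequence[str],
    negative_label: str = "no_relation",  # ASSUMPTION: label name varies by dataset
) -> float:
    """Micro-F1 over positive relations; the negative class is never counted as a hit."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Correct predictions of a positive (non-negative) relation.
    correct = sum(1 for g, p in zip(gold, pred) if g == p and g != negative_label)
    # Everything the system predicted as positive, and everything that is positive.
    predicted_positive = sum(1 for p in pred if p != negative_label)
    gold_positive = sum(1 for g in gold if g != negative_label)

    precision = correct / predicted_positive if predicted_positive else 0.0
    recall = correct / gold_positive if gold_positive else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data with hypothetical labels: 2 correct positives,
# 3 predicted positives, 3 gold positives.
gold_labels = ["founder_of", "no_relation", "place_of_birth", "place_of_birth"]
pred_labels = ["founder_of", "place_of_birth", "place_of_birth", "no_relation"]
score = positive_micro_f1(gold_labels, pred_labels)
```

In the toy data, precision and recall are both 2/3, so the F1 is also 2/3. A correct "no_relation" prediction earns no credit, but a positive label predicted for a "no_relation" instance still lowers precision.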
Limitations
- Trained to emit a single relation label; it is not a general-purpose chat model.
- Tuned on general-domain text; expect degradation on out-of-domain / literary inputs (see the cross-domain analysis in the paper).
- Inherits the biases and licensing constraints of its underlying datasets.
Links
- Paper: arXiv:2606.22606
- Code / reproduction: https://github.com/DespinaChristou/compact-relex
- Training dataset: Despina/re_gentune
Citation
If you use this model, please cite:
@article{christou2026subbillion,
title = {Sub-Billion, Super-Frontier: Fine-Tuned Small Language Models Rival
Zero-Shot Frontier LLMs on General and Literary Relation Extraction},
author = {Christou, Despina and Tsoumakas, Grigorios},
journal = {arXiv preprint arXiv:2606.22606},
year = {2026},
url = {https://arxiv.org/abs/2606.22606}
}