Llama-3.2-3B-Instruct — RE GenTune (2-shot)
Built with Llama. This is a fine-tuned derivative of Meta's Llama-3.2-3B-Instruct and is governed by the Llama 3.2 Community License.
A 3B language model fine-tuned for relation extraction (RE). This is the best-performing general-domain checkpoint from the paper "Sub-Billion, Super-Frontier: Fine-Tuned Small Language Models Rival Zero-Shot Frontier LLMs on General and Literary Relation Extraction" (arXiv:2606.22606).
It reaches a general-domain average of 0.844 (positive-class micro-F1), the highest general-domain score across all 30 tuned configurations in the paper, compared with 0.69 for GPT-5.4 and 0.66 for Claude Sonnet 4.6 under the same minimal zero-shot protocol. As the paper stresses, this does not imply that small models are intrinsically stronger than frontier LLMs; it shows that targeted task adaptation lets a compact 4-bit model, deployable on a single consumer GPU, outperform general-purpose frontier systems under this protocol. An in-domain RoBERTa baseline also exceeds both frontier models, indicating that the advantage stems from task adaptation rather than generative decoding.
It is a QLoRA (LoRA) adapter on top of meta-llama/Llama-3.2-3B-Instruct, tuned on the GenTune general-domain mixture using the 2-shot prompt style.
What it does
Given a sentence and two marked entities, the model outputs only the relation label that holds between them (one label, no explanation).
Usage
This repo is a PEFT LoRA adapter, so load the base model and attach the adapter:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.2-3B-Instruct"
ADAPTER = "Despina/Llama-3.2-3B-Instruct-re_gentune-2-shot"

# Load the tokenizer from the adapter repo and the base weights in bf16.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
# Attach the LoRA adapter to the base model and switch to inference mode.
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

system_prompt = (
    "You are a relation extraction system. Be concise and direct. "
    "Output ONLY the relation type that holds between the two mentioned entities. "
    "Do not output any explanation, punctuation, or extra text — only the label."
)
user_prompt = (
    "Sentence: Steve Jobs co-founded Apple in Cupertino.\n"
    "Entity 1: Steve Jobs\n"
    "Entity 2: Apple\n"
    "Relation:"
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# return_dict=True returns input_ids together with the attention mask,
# so both can be passed to generate().
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; a relation label only needs a few tokens.
out = model.generate(**inputs, max_new_tokens=16, do_sample=False)

# Decode only the newly generated tokens (everything after the prompt).
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0, prompt_len:], skip_special_tokens=True).strip())
For best results, match the format the model was trained on: a system prompt asking for the label only and, optionally, two in-context examples before the query (the 2-shot regime). According to the paper, the strongest results come from a schema-enumerated variant, in which the allowed label set for the target dataset is injected into the system prompt.
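A minimal sketch of how the 2-shot and schema-enumerated prompts might be assembled. The exact training template is not given in this card, so the turn layout (each in-context example as a user turn answered by an assistant turn), the "Allowed relation types" wording, the helper names `format_query` and `build_messages`, and the example labels are all assumptions for illustration; check the reproduction repository for the real template.

```python
from typing import Optional, Sequence, Tuple

Query = Tuple[str, str, str]   # (sentence, entity 1, entity 2)
Shot = Tuple[Query, str]       # (query, gold relation label)

# Shortened for this sketch; in practice reuse the full system prompt
# from the Usage section.
BASE_SYSTEM_PROMPT = (
    "You are a relation extraction system. Be concise and direct. "
    "Output ONLY the relation type that holds between the two mentioned entities."
)


def format_query(sentence: str, entity1: str, entity2: str) -> str:
    """Render one query in the Sentence / Entity 1 / Entity 2 / Relation layout."""
    return (
        f"Sentence: {sentence}\n"
        f"Entity 1: {entity1}\n"
        f"Entity 2: {entity2}\n"
        "Relation:"
    )


def build_messages(
    query: Query,
    shots: Sequence[Shot] = (),
    labels: Optional[Sequence[str]] = None,
) -> list:
    """Build a chat message list with optional in-context shots and label schema."""
    system = BASE_SYSTEM_PROMPT
    if labels:
        # Assumed wording for the schema-enumerated variant.
        system += " Allowed relation types: " + ", ".join(labels) + "."
    messages = [{"role": "system", "content": system}]
    for shot_query, label in shots:
        # Assumption: each shot is a user turn answered by the gold label.
        messages.append({"role": "user", "content": format_query(*shot_query)})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": format_query(*query)})
    return messages


# Hypothetical shots and label set, for illustration only.
shots = [
    (("Marie Curie was born in Warsaw.", "Marie Curie", "Warsaw"), "place_of_birth"),
    (("Paris is the capital of France.", "Paris", "France"), "capital_of"),
]
labels = ["place_of_birth", "capital_of", "founded_by", "no_relation"]
messages = build_messages(
    ("Steve Jobs co-founded Apple in Cupertino.", "Steve Jobs", "Apple"),
    shots=shots,
    labels=labels,
)
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` in place of the zero-shot list from the Usage section.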
Training
| Setting | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Method | QLoRA (4-bit NF4, bf16 compute, double quantization) |
| LoRA | r = 64, α = 128, dropout = 0.05; target modules: q/k/v/o + gate/up/down proj |
| Training data | Despina/re_gentune (GenTune general-domain mixture), 2-shot prompts |
| Objective | Generate the relation label only |
| Epochs | 2 |
| Learning rate | 1e-4 |
| Effective batch size | 4 × 2 gradient-accumulation steps = 8 |
| Max sequence length | 1024 |
GenTune aggregates seven general-domain RE datasets: TACRED, SemEval-2010 Task 8, CoNLL04, NYT11, GIDS, Re-DocRED, and REBEL.
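For readers who want to reproduce a similar setup, the hyperparameters in the table above could be expressed roughly as follows. This is a sketch, not the authors' training script: the keyword names are those accepted by `peft.LoraConfig` and `transformers.BitsAndBytesConfig`, the `*_proj` module names are the standard Llama layer names for "q/k/v/o + gate/up/down", and `task_type` and the helper `build_configs` are assumptions not stated in this card.

```python
# LoRA settings from the training table.
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# 4-bit NF4 quantization with double quantization and bf16 compute.
BNB_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}

# Effective batch size = batch size per step x gradient-accumulation steps.
BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2
EFFECTIVE_BATCH = BATCH_SIZE * GRAD_ACCUM_STEPS

# LoRA scales the low-rank update by alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_configs():
    """Instantiate the library config objects (requires transformers and peft)."""
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**BNB_KWARGS), LoraConfig(**LORA_KWARGS)
```

The quantization config would be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, and the LoRA config to `peft.get_peft_model`; the remaining settings (2 epochs, learning rate 1e-4, max sequence length 1024) belong to whichever trainer is used.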
Evaluation
Scores are positive-class micro-F1 (the no-relation class is excluded from the average). On the general-domain benchmarks the model reaches an average of 0.844, the top score in the paper, versus 0.69 for GPT-5.4 and 0.66 for Claude Sonnet 4.6, both evaluated under a minimal zero-shot protocol. As the paper stresses, this reflects targeted task adaptation rather than any intrinsic superiority of small models. See the paper for the full 30-configuration matrix, literary-domain results, and the RoBERTa discriminative baseline.
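As an illustration of the metric, positive-class micro-F1 can be computed along the following lines. This is a sketch of the common convention (as in the TACRED scorer), not the paper's evaluation code; the function name `positive_micro_f1` and the `"no_relation"` label string are assumptions, and each dataset may name its negative class differently.

```python
from typing import Iterable, Sequence


def positive_micro_f1(
    gold: Sequence[str],
    pred: Sequence[str],
    negative_labels: Iterable[str] = ("no_relation",),
) -> float:
    """Micro-F1 over positive relation labels; the negative class is not scored."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    negatives = set(negative_labels)
    true_pos = pred_pos = gold_pos = 0
    for g, p in zip(gold, pred):
        if p not in negatives:
            pred_pos += 1          # model predicted some positive relation
        if g not in negatives:
            gold_pos += 1          # a positive relation actually holds
            if g == p:
                true_pos += 1      # correct positive prediction
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical labels: 1 correct positive,
# 2 predicted positives, 2 gold positives -> P = R = F1 = 0.5.
gold = ["founded_by", "no_relation", "located_in", "no_relation"]
pred = ["founded_by", "located_in", "no_relation", "no_relation"]
score = positive_micro_f1(gold, pred)
```

Correctly predicting the negative class earns no credit under this metric, while predicting a positive label where none holds lowers precision.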
Limitations
- Trained to emit a single relation label; it is not a general-purpose chat model.
- Tuned on general-domain text; expect degradation on out-of-domain / literary inputs (see the cross-domain analysis in the paper).
- Inherits the biases and licensing constraints of its underlying datasets.
Links
- Paper: arXiv:2606.22606
- Code / reproduction: https://github.com/DespinaChristou/compact-relex
- Training dataset: Despina/re_gentune
License
This model is a derivative of Meta Llama 3.2 and is licensed under the Llama 3.2 Community License. Use is subject to Meta's Acceptable Use Policy. "Built with Llama."
Citation
If you use this model, please cite:
@article{christou2026subbillion,
  title   = {Sub-Billion, Super-Frontier: Small Language Models Rival Zero-Shot Frontier LLMs on General and Literary Relation Extraction},
  author  = {Christou, Despina and Tsoumakas, Grigorios},
  journal = {arXiv preprint arXiv:2606.22606},
  year    = {2026},
  url     = {https://arxiv.org/abs/2606.22606}
}