---
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
library_name: peft
language:
- en
pipeline_tag: text-generation
tags:
- relation-extraction
- information-extraction
- literary-nlp
- qlora
- lora
- peft
- llama
- nlp
datasets:
- Despina/re_mixtune
---

# Llama-3.2-3B-Instruct — RE MixTune (2-shot)

> Built with Llama. This is a fine-tuned derivative of Meta's Llama-3.2-3B-Instruct and is
> governed by the [Llama 3.2 Community License](https://www.llama.com/llama3_2/license/).

A 3B language model fine-tuned for **relation extraction (RE)** across **both general-domain and
literary text**. This is the best single "does-both" checkpoint from the paper *"Sub-Billion,
Super-Frontier: Fine-Tuned Small Language Models Rival Zero-Shot Frontier LLMs on General and
Literary Relation Extraction"* ([arXiv:2606.22606](https://arxiv.org/abs/2606.22606)).

Trained on a domain-balanced mixture, a single checkpoint scores **0.827** on the general-domain
average and **0.825** on the literary average (positive-class micro-F1), close to each domain
specialist's in-domain peak. For reference, zero-shot frontier LLMs under the same minimal
protocol reach 0.69 (GPT-5.4) and 0.66 (Claude Sonnet 4.6) on general-domain RE, and GPT-5.4
reaches 0.578 on the two-benchmark literary average. As the paper stresses, this gap reflects
targeted task adaptation rather than any intrinsic superiority of small models over frontier LLMs.

It is a **QLoRA (LoRA) adapter** on top of
[`meta-llama/Llama-3.2-3B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct),
tuned on the **MixTune** balanced general+literary mixture using the **2-shot** prompt style.

## What it does

Given a sentence and two marked entities, the model outputs **only the relation label** that
holds between them (one label, no explanation). Unlike the domain specialists, this checkpoint is
meant to serve both general and literary inputs from a single model.

## Usage

This repo is a PEFT LoRA adapter, so load the base model and attach the adapter:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.2-3B-Instruct"
ADAPTER = "Despina/Llama-3.2-3B-Instruct-re_mixtune-2-shot"

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

system_prompt = (
    "You are a relation extraction system. Be concise and direct. "
    "Output ONLY the relation type that holds between the two mentioned entities. "
    "Do not output any explanation, punctuation, or extra text — only the label."
)
user_prompt = (
    "Sentence: Steve Jobs co-founded Apple in Cupertino.\n"
    "Entity 1: Steve Jobs\n"
    "Entity 2: Apple\n"
    "Relation:"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
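# Tokenize via the chat template; return_dict=True also yields the attention mask.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; the label is short, so a small token budget is enough.
out = model.generate(**inputs, max_new_tokens=16, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0, prompt_len:], skip_special_tokens=True).strip())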
```

For best results, match the format the model was trained on: a system prompt asking for the
label only and, optionally, two in-context examples before the query (the **2-shot** regime).
A **schema-enumerated** variant, where the allowed label set for the target dataset is
injected into the system prompt, gives the strongest results in the paper.
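
The 2-shot and schema-enumerated variants can be sketched as a small message builder. This is only an illustration. The helper names (`format_query`, `build_messages`), the layout of demonstrations as prior user/assistant turns, the wording of the label-list sentence, and the example labels are all assumptions, not the paper's exact templates; check the reproduction repo for the format actually used in training.

```python
def format_query(sentence, entity1, entity2):
    """Render one instance in the same layout as the user prompt in the usage snippet."""
    return (
        f"Sentence: {sentence}\n"
        f"Entity 1: {entity1}\n"
        f"Entity 2: {entity2}\n"
        "Relation:"
    )


def build_messages(system_prompt, query, demos=(), labels=None):
    """Build chat messages with optional demonstrations and an optional label schema.

    demos:  iterable of (sentence, entity1, entity2, label) tuples (hypothetical layout).
    labels: optional allowed label set to enumerate in the system prompt.
    """
    if labels:
        # Assumed wording for the schema-enumerated variant.
        system_prompt = f"{system_prompt} Allowed labels: {', '.join(labels)}."
    messages = [{"role": "system", "content": system_prompt}]
    for sentence, e1, e2, label in demos:
        # Assumption: each demonstration is a prior user/assistant exchange.
        messages.append({"role": "user", "content": format_query(sentence, e1, e2)})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": format_query(*query)})
    return messages


# Illustrative demonstrations and labels, not taken from any dataset schema.
demos = [
    ("Marie Curie was born in Warsaw.", "Marie Curie", "Warsaw", "place_of_birth"),
    ("Tim Cook leads Apple.", "Tim Cook", "Apple", "employer"),
]
messages = build_messages(
    "Output ONLY the relation label.",  # stand-in; use the full system prompt in practice
    ("Steve Jobs co-founded Apple in Cupertino.", "Steve Jobs", "Apple"),
    demos=demos,
    labels=["place_of_birth", "employer", "founder", "no_relation"],
)
print(len(messages))  # 6: system + 2 demos (2 turns each) + query
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` in the same way as in the usage snippet.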

## Training

| Setting | Value |
|---|---|
| Base model | `meta-llama/Llama-3.2-3B-Instruct` |
| Method | QLoRA (4-bit NF4, bf16 compute, double quant) |
| LoRA | r = 64, α = 128, dropout = 0.05; targets: q/k/v/o + gate/up/down proj |
| Training data | `Despina/re_mixtune` (domain-balanced general+literary mixture), 2-shot prompts |
| Objective | Generate the relation label only |
| Epochs | 2 |
| Learning rate | 1e-4 |
| Effective batch | 4 × 2 grad-accum = 8 |
| Max sequence length | 1024 |
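
The table maps onto `transformers` and `peft` configuration objects roughly as follows. This is a sketch under assumptions, not the authors' training script: the `target_modules` names follow the usual Llama layer naming, `task_type` is inferred from the text-generation setup, and the trainer, optimizer, and scheduler are not shown. It needs `bitsandbytes` installed alongside `transformers` and `peft`; see the reproduction repo for the real configuration.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with bf16 compute and double quantization (from the table).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# LoRA hyperparameters from the table; module names assume the standard Llama layout.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# Effective batch size: a batch of 4 with 2 gradient-accumulation steps.
effective_batch = 4 * 2
```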

**MixTune** is a domain-balanced (~50/50) mixture that draws equal numbers of general and
literary examples from the seven general-domain datasets (TACRED, SemEval-2010 Task 8, CoNLL04,
NYT11, GIDS, Re-DocRED, REBEL) and the two literary datasets (Biographical, PG-Fiction).
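
As a rough illustration of what "domain-balanced" means here, the sketch below draws the same number of examples from a general pool and a literary pool and shuffles them together. The function name `balanced_mixture`, the cap at the smaller pool, and the fixed seed are assumptions made for the example; the actual recipe (for instance, any per-dataset quotas) is described in the paper and the reproduction repo.

```python
import random


def balanced_mixture(general, literary, seed=0):
    """Draw equal numbers of examples from each domain and shuffle them together."""
    rng = random.Random(seed)
    n = min(len(general), len(literary))  # assumption: cap at the smaller pool
    mixed = rng.sample(general, n) + rng.sample(literary, n)
    rng.shuffle(mixed)
    return mixed


# Toy pools: each example is tagged with its domain for easy counting.
general_pool = [("general", i) for i in range(10)]
literary_pool = [("literary", i) for i in range(4)]
mixed = balanced_mixture(general_pool, literary_pool)
print(len(mixed))  # 8: four examples from each domain
```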

## Evaluation

Scored with **positive-class micro-F1** (the no-relation class is excluded from the average).
Evaluated on all nine benchmarks, the model scores **0.827** on the general-domain average and
**0.825** on the literary average, making it the strongest single-model choice when one model
must cover both domains. For reference, zero-shot GPT-5.4 and Claude Sonnet 4.6 reach 0.69 and
0.66 on general-domain RE, and GPT-5.4 reaches 0.578 on literary RE, under a minimal zero-shot
protocol. As the paper stresses, this gap reflects targeted task adaptation rather than any
intrinsic superiority of small models. See the paper for the full 30-configuration matrix and
the RoBERTa discriminative baseline.
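
For readers unfamiliar with the metric, positive-class micro-F1 can be sketched as below: true positives, predicted positives, and gold positives are pooled across all relation labels, and the no-relation class counts toward none of them. The function name and the `"no_relation"` label string are assumptions for illustration; the paper's scorer may name or normalize labels differently.

```python
def positive_micro_f1(gold, pred, negative_label="no_relation"):
    """Micro-averaged F1 over positive labels; the negative class is excluded."""
    # A true positive is a correct prediction of a non-negative gold label.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != negative_label)
    n_pred = sum(1 for p in pred if p != negative_label)  # predicted positives
    n_gold = sum(1 for g in gold if g != negative_label)  # gold positives
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["founder", "no_relation", "employer", "founder"]
pred = ["founder", "employer", "no_relation", "founder"]
print(round(positive_micro_f1(gold, pred), 3))  # 0.667 (precision = recall = 2/3)
```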

## Limitations

- Trained to emit a single relation label; it is not a general-purpose chat model.
- A single-model generalist: a domain specialist (GenTune or LitTune) may edge it out slightly
  on its own domain.
- PG-Fiction labels are annotated by a GPT-4-class model, so the model partly learns that
  annotator's label distribution on literary inputs.
- Inherits the biases and licensing constraints of its underlying datasets.

## Links

- **Paper:** [arXiv:2606.22606](https://arxiv.org/abs/2606.22606)
- **Code / reproduction:** https://github.com/DespinaChristou/compact-relex
- **Training dataset:** [`Despina/re_mixtune`](https://huggingface.co/datasets/Despina/re_mixtune)

## License

This model is a derivative of Meta Llama 3.2 and is licensed under the
[Llama 3.2 Community License](https://www.llama.com/llama3_2/license/). Use is subject to Meta's
Acceptable Use Policy. "Built with Llama."

## Citation

If you use this model, please cite:

```bibtex
@article{christou2026subbillion,
  title        = {Sub-Billion, Super-Frontier: Small Language Models Rival
                  Zero-Shot Frontier LLMs on General and Literary Relation Extraction},
  author       = {Christou, Despina and Tsoumakas, Grigorios},
  journal      = {arXiv preprint arXiv:2606.22606},
  year         = {2026},
  url          = {https://arxiv.org/abs/2606.22606}
}
```