editlens-qwen3-4b-merged-v3

Qwen3-4B fine-tuned with QLoRA on a 2026-vintage, three-generator reproduction of the EditLens dataset, with the LoRA adapter merged into the base model in bf16. It is the successor to editlens-qwen3-4b-merged-v2 and adds Gemini-3 Flash as a third generator alongside Claude Sonnet 4.6 and GPT-5.3. One third of the existing v2 ai_edited/ai_generated rows were replaced with Gemini-generated equivalents, which preserves the 1:1:1 human:edited:generated ratio.

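Trained for the EditLens task (arXiv:2510.03154): classify text by how much AI editing it has received. The model predicts a continuous score in [0, 1] from a 4-bucket softmax: the hard prediction bucket_pred ∈ {0, 1, 2, 3} maps to bucket / 3, and the Quick start code below reports the probability-weighted expected bucket divided by 3.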
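
A minimal pure-Python sketch of how a 4-way logit vector turns into a hard bucket and a continuous score, following the expected-bucket formula used in the Quick start code. The helper names `softmax` and `editlens_score` are illustrative and not part of the released code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def editlens_score(logits):
    """Return (argmax bucket, expected bucket / (num_buckets - 1))."""
    probs = softmax(logits)
    bucket = max(range(len(probs)), key=probs.__getitem__)
    expected_bucket = sum(i * p for i, p in enumerate(probs))
    return bucket, expected_bucket / (len(probs) - 1)

# A confident "AI-generated" prediction: bucket 3, score close to 1.
print(editlens_score([-4.0, -2.0, 0.0, 6.0]))
```

The expected-bucket score moves smoothly as probability mass shifts between neighbouring buckets, whereas argmax / 3 can only take the four values 0, 1/3, 2/3, and 1.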

What changed vs `editlens-qwen3-4b-merged-v2`

Attribute	v2	v3 (this model)
Generators (mix)	Claude Sonnet 4.6 + GPT-5.3 (50/50)	Claude Sonnet 4.6 + GPT-5.3 + Gemini-3-Flash-Preview (~33/33/33)
Source domains	6 (5 + Twitter)	6 (same)
Word-count window	20–800	20–800 (same)
Train rows	75,375	75,316
Val / Test rows	3,200 / 7,567	3,234 / 7,658
Editing prompts	301 (paper Appendix K subset)	301 (same)
Embedding model (cosine teacher)	Linq-Embed-Mistral	Linq-Embed-Mistral (same)
Bucket thresholds	lo=0.03, hi=0.15	lo=0.03, hi=0.15 (same)
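
The move from a 50/50 to a roughly 33/33/33 generator mix follows from simple arithmetic. The sketch below assumes the replaced third was drawn evenly from both existing generators, which the card does not state explicitly; the function name and the generator keys are illustrative.

```python
from fractions import Fraction

def mix_after_replacement(old_mix, new_generator, replaced_fraction):
    """Shrink every existing share by the replaced fraction and give it to the new generator."""
    keep = 1 - replaced_fraction
    new_mix = {name: share * keep for name, share in old_mix.items()}
    new_mix[new_generator] = replaced_fraction
    return new_mix

v2_mix = {"claude-sonnet-4.6": Fraction(1, 2), "gpt-5.3": Fraction(1, 2)}
v3_mix = mix_after_replacement(v2_mix, "gemini-3-flash", Fraction(1, 3))
for name, share in v3_mix.items():
    print(name, share)
```

Each original generator keeps 1/2 × 2/3 = 1/3 of the AI rows, and the new generator takes the remaining 1/3. The human rows are untouched, so the 1:1:1 class ratio is unchanged.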

When to use which model

If you want detection on...	Best choice
2026-vintage Sonnet/GPT/Gemini outputs	v3
2026-vintage Sonnet/GPT outputs only (Gemini coverage not needed)	v2
2025-vintage GPT-4.1 / Claude Sonnet 4 / Gemini 2.5 Flash / Llama 3.3	v1 (`editlens-qwen3-4b-merged`)
Older 2022-2023 generators (chatgpt, gpt4, mistral-chat) on RAID-style benchmarks	v1

For ensembling: v1 + v3 covers most generator vintages from 2022 → 2026.
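
One simple way to combine the two models is to average their scores. This is only a sketch: the card does not evaluate any ensembling rule, so the weighted mean, the function name `ensemble_score`, and the example numbers are all assumptions. Each per-model score is presumed to come from the Quick start procedure.

```python
def ensemble_score(scores, weights=None):
    """Weighted mean of per-model scores; `scores` maps model name -> score in [0, 1]."""
    if not scores:
        raise ValueError("need at least one score")
    if weights is None:
        weights = {name: 1.0 for name in scores}  # unweighted mean by default
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total

# Hypothetical scores for one text from the two checkpoints.
print(ensemble_score({"v1": 0.2, "v3": 0.6}))
```

Taking the maximum instead of the mean would favour recall on AI-touched text at the cost of more false positives on human text. Which rule works better would need to be checked on a validation set.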

Quick start

The score head is a custom LayerNorm + Linear (NormedLinear) module rather than a bare Linear, so it doesn't auto-load via from_pretrained. Reattach and copy weights manually:

import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from safetensors import safe_open
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class NormedLinear(nn.Module):
    """LayerNorm followed by a bias-free Linear; stands in for the default score head."""
    def __init__(self, hidden_size, num_labels, dtype=torch.bfloat16):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size, dtype=dtype)
        self.linear = nn.Linear(hidden_size, num_labels, bias=False, dtype=dtype)
    def forward(self, x):
        return self.linear(self.norm(x))

MODEL = "DarrenJiaImbue/editlens-qwen3-4b-merged-v3"
tok = AutoTokenizer.from_pretrained(MODEL)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
    tok.padding_side = "left"

# Load the backbone; the default score head it creates is replaced below.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, dtype=torch.bfloat16).to("cuda")
n = model.config.num_labels
model.score = NormedLinear(model.config.hidden_size, n).to("cuda", dtype=torch.bfloat16)

# Copy the trained head weights straight from the checkpoint file.
sf = hf_hub_download(MODEL, "model.safetensors")
with safe_open(sf, framework="pt") as f:
    model.score.norm.weight.data.copy_(f.get_tensor("score.norm.weight"))
    model.score.norm.bias.data.copy_(f.get_tensor("score.norm.bias"))
    model.score.linear.weight.data.copy_(f.get_tensor("score.linear.weight"))
model.config.pad_token_id = tok.pad_token_id
model.eval()

text = "The original text..."
enc = tok(text, return_tensors="pt", truncation=True, max_length=1024).to("cuda")
with torch.no_grad(), torch.autocast("cuda", dtype=torch.bfloat16):
    logits = model(**enc).logits
probs = logits.float().softmax(-1).cpu().numpy()[0]
bucket = int(probs.argmax())              # hard 4-way prediction
score = float(probs @ [0, 1, 2, 3]) / 3   # expected bucket, rescaled to [0, 1]
print(f"bucket={bucket} score={score:.3f}")

Bucket interpretation

Bucket	Approx cosine distance from source	Meaning
0	≤ 0.03	Verbatim human
1	0.03–0.07	Light AI touch-up
2	0.07–0.15	Heavier AI rewrite
3	≥ 0.15	AI-generated
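
The table can be read as a lookup from cosine distance to bucket. The sketch below uses the edges 0.03, 0.07, and 0.15 from the table; how values exactly on an edge are assigned is an assumption (0.03 goes to bucket 0 and 0.15 to bucket 3 as the table states, and 0.07 is placed in bucket 2), and the function name is illustrative.

```python
LO, MID, HI = 0.03, 0.07, 0.15  # edges taken from the bucket table

def bucket_from_distance(distance):
    """Map a cosine distance from the source text to a bucket in {0, 1, 2, 3}."""
    if distance < 0:
        raise ValueError("cosine distance must be non-negative")
    if distance <= LO:
        return 0  # verbatim human
    if distance < MID:
        return 1  # light AI touch-up
    if distance < HI:
        return 2  # heavier AI rewrite
    return 3      # AI-generated

for d in (0.0, 0.05, 0.10, 0.40):
    print(d, bucket_from_distance(d))
```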

Test-set accuracy

Thresholds are calibrated on each model's own val.csv. Numbers are ternary accuracy (the 4-class output collapsed to 3 classes: human / edited / AI-generated), except for the RAID row, which is binary.

Test set	Description	v1	v2	v3
`pangram/editlens_iclr` test (paper)	2025-vintage GPT-4.1 / Sonnet 4 / Gemini 2.5 / Llama 3.3	0.912	0.858	0.904
v3 in-domain test	2026-vintage Sonnet 4.6 / GPT-5.3 / Gemini-3-Flash	0.846	0.915	0.930
Enron holdout (OOD email domain)	same generators as paper	0.860	0.868	0.874
RAID 10K (binary)	2022-2023 chatgpt / gpt4 / llama-chat / mistral-chat	0.982	0.940	0.964
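
The card does not spell out how the thresholds are calibrated. One plausible reading, sketched here purely as an assumption, is to pick two cut points on the continuous score that maximise ternary accuracy on validation data. The grid search, the label strings, and all function names are hypothetical.

```python
from itertools import combinations

def to_ternary(score, t_low, t_high):
    """Collapse a continuous score into human / edited / ai_generated."""
    if score < t_low:
        return "human"
    if score < t_high:
        return "edited"
    return "ai_generated"

def ternary_accuracy(scores, labels, t_low, t_high):
    hits = sum(to_ternary(s, t_low, t_high) == y for s, y in zip(scores, labels))
    return hits / len(labels)

def calibrate(scores, labels, grid_size=101):
    """Coarse grid search over ordered threshold pairs (t_low < t_high)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(combinations(grid, 2),
               key=lambda pair: ternary_accuracy(scores, labels, *pair))

# Toy validation data, not taken from val.csv.
val_scores = [0.05, 0.10, 0.40, 0.50, 0.90, 0.95]
val_labels = ["human", "human", "edited", "edited", "ai_generated", "ai_generated"]
t_low, t_high = calibrate(val_scores, val_labels)
print(t_low, t_high, ternary_accuracy(val_scores, val_labels, t_low, t_high))
```

The thresholds found on validation data would then be applied unchanged to the test set. The exhaustive grid is quadratic in `grid_size`, so a coarser grid may be preferable on a full validation split.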

v3 binary modes (in-domain test)

Mode	Accuracy	Macro F1
human_vs_ai	0.998	0.998
human_vs_rest	0.959	0.954
ai_vs_rest	0.972	0.968
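
The mode names are not defined in the card. The sketch below assumes that human_vs_ai drops the edited rows, that the two *_vs_rest modes merge the remaining classes, and that a prediction is a single threshold on the score. These readings, the label strings, and the function name are all assumptions.

```python
# mode -> (gold labels counted as negative, gold labels counted as positive)
MODES = {
    "human_vs_ai": ({"human"}, {"ai_generated"}),
    "human_vs_rest": ({"human"}, {"edited", "ai_generated"}),
    "ai_vs_rest": ({"human", "edited"}, {"ai_generated"}),
}

def evaluate_mode(scores, labels, mode, threshold=0.5):
    """Return (accuracy, macro F1) for one binary mode."""
    negative, positive = MODES[mode]
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        if label in positive:
            gold = True
        elif label in negative:
            gold = False
        else:
            continue  # row is outside this mode, e.g. edited rows in human_vs_ai
        pred = score >= threshold
        if pred and gold:
            tp += 1
        elif pred:
            fp += 1
        elif gold:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no rows belong to this mode")

    def f1(hits, false_a, false_b):
        denom = 2 * hits + false_a + false_b
        return 2 * hits / denom if denom else 0.0

    accuracy = (tp + tn) / total
    macro_f1 = (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2  # mean of per-class F1
    return accuracy, macro_f1

toy_scores = [0.1, 0.2, 0.5, 0.9]
toy_labels = ["human", "human", "edited", "ai_generated"]
print(evaluate_mode(toy_scores, toy_labels, "ai_vs_rest"))
```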

Per-generator results on the paper's test set (effect of the generator rebalance)

v2's most notable weakness was Gemini-2.5 ai_generated detection, where it scored 0.852 against v1's 0.951. v3's three-way generator mix recovers this and reaches 0.957, slightly above v1. On gpt-4.1 and llama-3.3-70B, v3 improves on v2 but remains below v1.

Generator	v1	v2	v3
gemini-2.5-flash ai_generated	0.951	0.852	0.957
claude-sonnet-4 ai_generated	0.976	0.973	0.989
gpt-4.1 ai_generated	0.958	0.867	0.937
llama-3.3-70B ai_generated	1.000	0.946	0.973

License

CC BY-NC-SA 4.0 (matches the original EditLens release).

Citation

@misc{thai2025editlensquantifyingextentai,
  title={EditLens: Quantifying the Extent of AI Editing in Text},
  author={Katherine Thai and Bradley Emi and Elyas Masrour and Mohit Iyyer},
  year={2025},
  eprint={2510.03154},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
}

