editlens-qwen3-4b-merged-v3
Qwen3-4B fine-tuned with QLoRA on a 2026-vintage 3-generator reproduction of the EditLens dataset, with the LoRA adapter merged into the base in bf16. Successor to editlens-qwen3-4b-merged-v2: adds Gemini-3 Flash as a third generator (alongside Claude Sonnet 4.6 and GPT-5.3) by replacing 1/3 of the existing v2 ai_edited/ai_generated rows with Gemini-generated equivalents, preserving the 1:1:1 human:edited:generated ratio.
Trained for the EditLens task (arXiv:2510.03154): classify text by how much AI editing it has received. Predicts a continuous score in [0, 1] from a 4-bucket softmax (bucket_pred ∈ {0, 1, 2, 3} mapped to score = bucket / 3).
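The bucket-to-score mapping can be read two ways, and a small sketch makes the difference concrete. The hard reading takes the most likely bucket and divides by 3, as described above. The soft reading divides the probability-weighted bucket index by 3, which is what the Quick start snippet computes. The function names `hard_score` and `soft_score` are illustrative and not part of the released code.

```python
def hard_score(probs):
    """Score from the most likely bucket: argmax(probs) / 3 for four buckets."""
    bucket = max(range(len(probs)), key=lambda i: probs[i])
    return bucket / (len(probs) - 1)


def soft_score(probs):
    """Score from the expected bucket index: sum(i * p_i) / 3 for four buckets."""
    expected = sum(i * p for i, p in enumerate(probs))
    return expected / (len(probs) - 1)


# Example softmax output over the four buckets (made-up numbers).
probs = [0.1, 0.6, 0.2, 0.1]
print(f"hard={hard_score(probs):.3f} soft={soft_score(probs):.3f}")
```

The hard score can only take the values 0, 1/3, 2/3, and 1, while the soft score varies continuously with the model's confidence.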
What changed vs editlens-qwen3-4b-merged-v2
| | v2 | v3 (this model) |
|---|---|---|
| Generators (mix) | Claude Sonnet 4.6 + GPT-5.3 (50/50) | Claude Sonnet 4.6 + GPT-5.3 + Gemini-3-Flash-Preview (~33/33/33) |
| Source domains | 6 (5 + Twitter) | 6 (same) |
| Word-count window | 20–800 | 20–800 (same) |
| Train rows | 75,375 | 75,316 |
| Val / Test rows | 3,200 / 7,567 | 3,234 / 7,658 |
| Editing prompts | 301 (paper Appendix K subset) | 301 (same) |
| Embedding model (cosine teacher) | Linq-Embed-Mistral | Linq-Embed-Mistral (same) |
| Bucket thresholds | lo=0.03, hi=0.15 | lo=0.03, hi=0.15 (same) |
When to use which model
| If you want detection on... | Best choice |
|---|---|
| 2026-vintage Sonnet/GPT/Gemini outputs | v3 |
| 2026-vintage Sonnet/GPT outputs (no Gemini coverage needed) | v2 |
| 2025-vintage GPT-4.1 / Claude Sonnet 4 / Gemini 2.5 Flash / Llama 3.3 | v1 (editlens-qwen3-4b-merged) |
| Older 2022-2023 generators (chatgpt, gpt4, mistral-chat) on RAID-style benchmarks | v1 |
For ensembling: v1 + v3 covers most generator vintages from 2022 → 2026.
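This card does not say how the two models should be combined. As a minimal sketch, assuming each model has already produced a score in [0, 1] for the same text, the scores could be averaged or their maximum taken. The function `ensemble_scores` and its `mode` argument are hypothetical, and neither rule has been evaluated here.

```python
def ensemble_scores(v1_score, v3_score, mode="mean"):
    """Combine two per-text scores in [0, 1] (illustrative rules, not the author's method)."""
    for s in (v1_score, v3_score):
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    if mode == "mean":
        # Simple average: both models get equal say.
        return (v1_score + v3_score) / 2
    if mode == "max":
        # Flag the text if either model thinks it is heavily AI-edited.
        return max(v1_score, v3_score)
    raise ValueError(f"unknown mode: {mode}")


print(ensemble_scores(0.2, 0.8), ensemble_scores(0.2, 0.8, mode="max"))
```

Taking the maximum favors recall across generator vintages at the cost of more false positives on human text; averaging is the more conservative choice.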
Quick start
The score head is a custom LayerNorm + Linear (NormedLinear) module rather than a bare Linear, so it doesn't auto-load via from_pretrained. Reattach and copy weights manually:
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from safetensors import safe_open
from transformers import AutoModelForSequenceClassification, AutoTokenizer


class NormedLinear(nn.Module):
    """LayerNorm followed by a bias-free Linear classification head."""

    def __init__(self, hidden_size, num_labels, dtype=torch.bfloat16):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size, dtype=dtype)
        self.linear = nn.Linear(hidden_size, num_labels, bias=False, dtype=dtype)

    def forward(self, x):
        return self.linear(self.norm(x))


MODEL = "DarrenJiaImbue/editlens-qwen3-4b-merged-v3"

tok = AutoTokenizer.from_pretrained(MODEL)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
tok.padding_side = "left"

# Note: older transformers releases name this argument torch_dtype.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, dtype=torch.bfloat16).to("cuda")

# Replace the default head with NormedLinear, then copy the trained weights in.
n = model.config.num_labels
model.score = NormedLinear(model.config.hidden_size, n).to("cuda", dtype=torch.bfloat16)
sf = hf_hub_download(MODEL, "model.safetensors")
with safe_open(sf, framework="pt") as f:
    model.score.norm.weight.data.copy_(f.get_tensor("score.norm.weight"))
    model.score.norm.bias.data.copy_(f.get_tensor("score.norm.bias"))
    model.score.linear.weight.data.copy_(f.get_tensor("score.linear.weight"))
model.config.pad_token_id = tok.pad_token_id
model.eval()

text = "The original text..."
enc = tok(text, return_tensors="pt", truncation=True, max_length=1024).to("cuda")
with torch.no_grad(), torch.autocast("cuda", dtype=torch.bfloat16):
    logits = model(**enc).logits

probs = logits.float().softmax(-1).cpu().numpy()[0]
bucket = int(probs.argmax())
# Expected bucket index under the softmax, rescaled to [0, 1].
score = float(probs @ [0, 1, 2, 3]) / 3
print(f"bucket={bucket} score={score:.3f}")
Bucket interpretation
| Bucket | Approx cosine distance from source | Meaning |
|---|---|---|
| 0 | ≤ 0.03 | Verbatim human |
| 1 | 0.03–0.07 | Light AI touch-up |
| 2 | 0.07–0.15 | Heavier AI rewrite |
| 3 | ≥ 0.15 | AI-generated |
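The table above can be expressed as a small lookup. This is a sketch under stated assumptions, not the dataset's labeling code: the distance is taken to be cosine distance between an embedding of the source text and an embedding of the edited text (the card names Linq-Embed-Mistral as the embedding model), and the treatment of values that fall exactly on 0.07 is a guess, since the table leaves it open. The helper names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def bucket_for_distance(d):
    """Map a cosine distance to a bucket using the edges 0.03, 0.07, 0.15."""
    if d <= 0.03:  # table: bucket 0 is "≤ 0.03"
        return 0
    if d < 0.07:   # assumption: exactly 0.07 goes to bucket 2
        return 1
    if d < 0.15:   # table: bucket 3 is "≥ 0.15"
        return 2
    return 3


for d in (0.01, 0.05, 0.10, 0.30):
    print(d, bucket_for_distance(d))
```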
Test-set accuracy
Thresholds are calibrated on each model's own val.csv. Numbers are ternary accuracy (the 4-class output collapsed to 3 classes: human / edited / AI-generated), except for the RAID row, which is binary.
| Test set | Description | v1 | v2 | v3 |
|---|---|---|---|---|
| pangram/editlens_iclr test (paper) | 2025-vintage GPT-4.1 / Sonnet 4 / Gemini 2.5 / Llama 3.3 | 0.912 | 0.858 | 0.904 |
| v3 in-domain test | 2026-vintage Sonnet 4.6 / GPT-5.3 / Gemini-3-Flash | 0.846 | 0.915 | 0.930 |
| Enron holdout (OOD email domain) | same generators as paper | 0.860 | 0.868 | 0.874 |
| RAID 10K (binary) | 2022-2023 chatgpt / gpt4 / llama-chat / mistral-chat | 0.982 | 0.940 | 0.964 |
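The calibration procedure itself is not described in this card. One plausible reading, sketched below, is a grid search over two cutoffs on the continuous score that maximizes ternary accuracy on validation data. Everything here is an assumption made for illustration: the function names, the cutoff names `t_human` and `t_ai`, the grid resolution, and the toy data.

```python
from itertools import combinations


def ternary_from_score(score, t_human, t_ai):
    """Collapse a score in [0, 1] to one of three labels using two cutoffs."""
    if score < t_human:
        return "human"
    if score < t_ai:
        return "ai_edited"
    return "ai_generated"


def calibrate(scores, labels, steps=100):
    """Return (accuracy, t_human, t_ai) for the best cutoff pair on a grid."""
    grid = [i / steps for i in range(1, steps)]
    best = (-1.0, None, None)
    # combinations() yields pairs with t_human < t_ai.
    for t_human, t_ai in combinations(grid, 2):
        correct = sum(
            ternary_from_score(s, t_human, t_ai) == y
            for s, y in zip(scores, labels)
        )
        acc = correct / len(labels)
        if acc > best[0]:
            best = (acc, t_human, t_ai)
    return best


# Toy validation set (made-up scores and labels).
val_scores = [0.05, 0.10, 0.40, 0.50, 0.90, 0.95]
val_labels = ["human", "human", "ai_edited", "ai_edited",
              "ai_generated", "ai_generated"]
print(calibrate(val_scores, val_labels, steps=20))
```

The cutoffs found on validation data would then be applied unchanged to each test set.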
v3 binary modes (in-domain test)
| Mode | Accuracy | Macro F1 |
|---|---|---|
| human_vs_ai | 0.998 | 0.998 |
| human_vs_rest | 0.959 | 0.954 |
| ai_vs_rest | 0.972 | 0.968 |
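The mode names suggest how the three labels are regrouped into two classes, although the card does not define them. The sketch below assumes that human_vs_ai drops the ai_edited rows, that human_vs_rest groups ai_edited with ai_generated, and that ai_vs_rest groups human with ai_edited. It also shows macro F1 as the unweighted mean of the two per-class F1 scores. The helper names are hypothetical.

```python
def to_binary(label, mode):
    """Return 1 or 0 for the two classes, or None when the row is dropped."""
    if mode == "human_vs_ai":
        if label == "ai_edited":
            return None  # assumption: edited rows are excluded in this mode
        return int(label == "ai_generated")
    if mode == "human_vs_rest":
        return int(label != "human")
    if mode == "ai_vs_rest":
        return int(label == "ai_generated")
    raise ValueError(f"unknown mode: {mode}")


def macro_f1(y_true, y_pred):
    """Unweighted mean of the F1 scores for class 0 and class 1."""
    f1s = []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


labels = ["human", "ai_edited", "ai_generated"]
print([to_binary(label, "human_vs_rest") for label in labels])
```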
Per-generator on the paper's test set (the rebalance question)
The v2 model's notable weakness was Gemini-2.5 ai_generated detection, where it scored 0.852 vs v1's 0.951. v3's 3-way generator mix closes that gap and slightly exceeds v1 (0.957). GPT-4.1 detection also recovers (0.867 to 0.937) but stays below v1 (0.958):
| Generator | v1 | v2 | v3 |
|---|---|---|---|
| gemini-2.5-flash ai_generated | 0.951 | 0.852 | 0.957 |
| claude-sonnet-4 ai_generated | 0.976 | 0.973 | 0.989 |
| gpt-4.1 ai_generated | 0.958 | 0.867 | 0.937 |
| llama-3.3-70B ai_generated | 1.000 | 0.946 | 0.973 |
License
CC BY-NC-SA 4.0 (matches the original EditLens release).
Citation
@misc{thai2025editlensquantifyingextentai,
title={EditLens: Quantifying the Extent of AI Editing in Text},
author={Katherine Thai and Bradley Emi and Elyas Masrour and Mohit Iyyer},
year={2025},
eprint={2510.03154},
archivePrefix={arXiv},
primaryClass={cs.CL},
}