editlens-qwen3-4b-merged-v2

Qwen3-4B fine-tuned with QLoRA on a 2026-vintage reproduction of the EditLens dataset, with the LoRA adapter merged into the base in bf16. Like editlens-qwen3-4b-merged, but trained on data generated by newer LLMs and with an added Twitter source domain.

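Trained for the EditLens task (arXiv:2510.03154): classify text by how much AI editing it has received. The model outputs a 4-bucket softmax over buckets {0, 1, 2, 3}. The hard prediction is bucket_pred = argmax, and the continuous score in [0, 1] is the probability-weighted mean bucket index divided by 3, as computed in the Quick start code.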
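
The sketch below walks through that arithmetic on raw logits: the hard bucket is the argmax, while the score used in the Quick start code is the expected bucket index divided by 3. The helper name `bucket_and_score` and the example logits are illustrative assumptions, not part of the released code.

```python
import math

def bucket_and_score(logits):
    """Return (argmax bucket, expected bucket / 3) for four bucket logits."""
    if len(logits) != 4:
        raise ValueError("expected one logit per bucket (4 total)")
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Hard prediction: index of the most probable bucket.
    bucket = max(range(4), key=lambda k: probs[k])
    # Soft prediction: expected bucket index, rescaled from [0, 3] to [0, 1].
    score = sum(k * p for k, p in enumerate(probs)) / 3
    return bucket, score

# Hypothetical logits, chosen only to illustrate the computation.
bucket, score = bucket_and_score([0.1, 2.0, 1.2, -1.0])
print(f"bucket={bucket} score={score:.3f}")
```

The two outputs can disagree in a useful way: a text whose probability mass is split between buckets 1 and 2 gets a score between 1/3 and 2/3 even though the argmax has to pick one of them.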

What changed vs editlens-qwen3-4b-merged (v1)

| | v1 | v2 (this model) |
|---|---|---|
| Source-text domains | 5 (Amazon/Google reviews, Reddit writing prompts, fineweb_edu, news) | 6 (adds Twitter) |
| Word-count window | 75–800 | 20–800 (admits short-form text) |
| Generators (mix) | GPT-4.1 + Sonnet 4 + Gemini 2.5 Flash + Llama-3.3-70B | Claude Sonnet 4.6 + GPT-5.3 |
| Embedding model (cosine) | Linq-AI-Research/Linq-Embed-Mistral | Linq-AI-Research/Linq-Embed-Mistral (same) |
| Bucket thresholds | lo=0.03, hi=0.15 | lo=0.03, hi=0.15 (same) |
| Train rows | 60,000 | 75,375 |
| Val / Test rows | 2,400 / 6,000 | 3,200 / 7,567 |
| Editing prompts | 303 (paper Appendix K) | 301 (verified subset of 303) |

Use v2 if you care about detection on 2026-era LLM outputs or short-form social-media text. Use v1 if you specifically need a detector trained on the paper's generator mix (including Gemini 2.5 Flash and Llama-3.3-70B).
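
Since v2 was trained on texts in the 20–800 word window (75–800 for v1), it can help to check whether an input falls inside that range before trusting a score. The following is a minimal sketch; the card does not say how words were counted when the dataset was built, so whitespace splitting and inclusive bounds are assumptions, and `in_training_window` is a hypothetical helper name.

```python
def in_training_window(text, lo=20, hi=800):
    """True if the whitespace-delimited word count lies in [lo, hi].

    Defaults follow the v2 window; pass lo=75 for the v1 window.
    Whitespace tokenization is an assumption of this sketch.
    """
    n_words = len(text.split())
    return lo <= n_words <= hi

# A short social-media-style post: inside the v2 window, outside v1's.
post = " ".join(["word"] * 30)
print(in_training_window(post), in_training_window(post, lo=75))
```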

Quick start

The score head is a custom LayerNorm + Linear (NormedLinear) module rather than a bare Linear, so it doesn't auto-load via from_pretrained. Reattach and copy weights manually:

import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from safetensors import safe_open
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class NormedLinear(nn.Module):
    """Score head: LayerNorm followed by a bias-free Linear."""
    def __init__(self, hidden_size, num_labels, dtype=torch.bfloat16):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size, dtype=dtype)
        self.linear = nn.Linear(hidden_size, num_labels, bias=False, dtype=dtype)
    def forward(self, x):
        return self.linear(self.norm(x))

MODEL = "DarrenJiaImbue/editlens-qwen3-4b-merged-v2"
tok = AutoTokenizer.from_pretrained(MODEL)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
    tok.padding_side = "left"

# Load the backbone. The default score head cannot hold the checkpoint's
# NormedLinear weights, so it is replaced right after loading.
# (Older transformers releases spell the keyword torch_dtype instead of dtype.)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, dtype=torch.bfloat16).to("cuda")
n = model.config.num_labels
model.score = NormedLinear(model.config.hidden_size, n).to("cuda", dtype=torch.bfloat16)

# Copy the trained head weights directly from the checkpoint file.
sf = hf_hub_download(MODEL, "model.safetensors")
with safe_open(sf, framework="pt") as f:
    model.score.norm.weight.data.copy_(f.get_tensor("score.norm.weight"))
    model.score.norm.bias.data.copy_(f.get_tensor("score.norm.bias"))
    model.score.linear.weight.data.copy_(f.get_tensor("score.linear.weight"))
model.config.pad_token_id = tok.pad_token_id
model.eval()

text = "The original text..."
enc = tok(text, return_tensors="pt", truncation=True, max_length=1024).to("cuda")
with torch.no_grad(), torch.autocast("cuda", dtype=torch.bfloat16):
    logits = model(**enc).logits
probs = logits.float().softmax(-1).cpu().numpy()[0]
bucket = int(probs.argmax())             # hard 4-way prediction
score = float(probs @ [0, 1, 2, 3]) / 3  # expected bucket, rescaled to [0, 1]
print(f"bucket={bucket} score={score:.3f}")

Bucket interpretation

| Bucket | Approx. cosine distance from source | Meaning |
|---|---|---|
| 0 | ≤ 0.03 | Verbatim human |
| 1 | 0.03–0.07 | Light AI touch-up |
| 2 | 0.07–0.15 | Heavier AI rewrite |
| 3 | ≥ 0.15 | AI-generated |
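
Read as a function, the table above maps a cosine distance to a bucket label. The sketch below is one way to write that mapping. The cut points come from the table, but which bucket an exact boundary value such as 0.07 belongs to is an assumption here, and `distance_to_bucket` is a hypothetical helper rather than the dataset's actual labeling code.

```python
def distance_to_bucket(cos_dist):
    """Map a cosine distance from the source text to a bucket in {0, 1, 2, 3}."""
    if cos_dist <= 0.03:
        return 0  # verbatim human
    if cos_dist < 0.07:
        return 1  # light AI touch-up
    if cos_dist < 0.15:
        return 2  # heavier AI rewrite (0.07 itself lands here by assumption)
    return 3      # AI-generated

for d in (0.0, 0.05, 0.10, 0.40):
    print(d, distance_to_bucket(d))
```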

Test-set accuracy

Threshold calibrated on val.csv, evaluated on test.csv from the v2 dataset (~7.6K rows, 6 domains incl. Twitter):

| Mode | Accuracy | Macro F1 |
|---|---|---|
| human_vs_ai | 0.999 | 0.999 |
| human_vs_rest | 0.958 | 0.953 |
| ai_vs_rest | 0.957 | 0.953 |
| ternary | 0.915 | 0.914 |
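
The card says the threshold is calibrated on val.csv but does not describe the procedure. As one plausible sketch for a binary mode, the function below scans candidate cuts on validation scores and keeps the one with the highest accuracy. The selection criterion, the candidate grid, and the name `calibrate_threshold` are all assumptions for illustration.

```python
def calibrate_threshold(scores, labels):
    """Pick the cut on `scores` that maximizes accuracy against 0/1 `labels`.

    An example is predicted positive when score >= threshold.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    xs = sorted(set(scores))
    # Candidates: the lowest score, midpoints between adjacent distinct
    # scores, and a value just above the highest score.
    candidates = [xs[0]] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1e-9]

    def accuracy(t):
        hits = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        return hits / len(labels)

    return max(candidates, key=accuracy)

# Hypothetical validation scores and labels (1 = positive class).
val_scores = [0.05, 0.10, 0.20, 0.70, 0.80, 0.95]
val_labels = [0, 0, 0, 1, 1, 1]
t = calibrate_threshold(val_scores, val_labels)
print(f"threshold={t:.3f}")
```

The chosen threshold would then be frozen and applied unchanged to test scores, so the test metrics are not tuned on test data.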

For comparison, results on the original pangram/editlens_iclr test set, which is out-of-distribution for v2 because it uses generators v2 was never trained on:

| Mode | v2 (this model) | v1 (editlens-qwen3-4b-merged) |
|---|---|---|
| human_vs_ai | 1.000 | 1.000 |
| human_vs_rest | 0.915 | 0.937 |
| ai_vs_rest | 0.943 | 0.975 |
| ternary | 0.858 | 0.912 |

The v2 model trades some performance on v1-era generators, which it never saw (0.022 to 0.054 lower than v1 on the original test set in every mode except human_vs_ai), for stronger performance on 2026 generators and short-form text. If you want both, use the two models as an ensemble.
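
The card does not specify how to combine the two models. One simple option, sketched below under that caveat, is a weighted average of the continuous scores produced by v1 and v2 for the same text. `ensemble_score` and its `weight_v2` parameter are hypothetical names, and the equal default weighting is an untested assumption rather than a recommended setting.

```python
def ensemble_score(score_v1, score_v2, weight_v2=0.5):
    """Weighted average of two [0, 1] scores; weight_v2 is v2's share."""
    if not 0.0 <= weight_v2 <= 1.0:
        raise ValueError("weight_v2 must be in [0, 1]")
    return (1.0 - weight_v2) * score_v1 + weight_v2 * score_v2

# Hypothetical scores for one text from each model.
print(f"{ensemble_score(0.20, 0.60):.3f}")
```

Any threshold applied to the combined score would need to be recalibrated on validation data, since thresholds tuned for a single model do not carry over automatically.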

License

CC BY-NC-SA 4.0 (matches the original EditLens release).

Citation

@misc{thai2025editlensquantifyingextentai,
  title={EditLens: Quantifying the Extent of AI Editing in Text},
  author={Katherine Thai and Bradley Emi and Elyas Masrour and Mohit Iyyer},
  year={2025},
  eprint={2510.03154},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
}