---
language: en
license: mit
base_model: microsoft/deberta-v3-base
tags:
- text-classification
- ai-text-detection
- sentence-level
- hat-bench
- deberta
- hatbench-detector
datasets:
- HAT-Baselines/hatbench
library_name: peft
---

# deberta-v3-base-hatbench-AD-flipmargin-seed0

Sentence-level AI-text detector for **HAT-Bench**, variant **A+D (hard BCE + flip-margin + LoRA)**, seed 0. Part of the HAT-Baselines detector suite.

This model predicts a per-sentence `y_score ∈ [0, 1]` indicating whether each sentence has been AI-modified.

## Test metrics (HAT-Bench pooled test set)

| Metric | Value |
|---|---|
| Headline macro-F1 (pooled) | **0.8544** |
| Human F1 | 0.8439 |
| AI F1 | 0.8649 |
| v1–v7 macro-F1 (partial-AI only) | 0.8256 |
| Accuracy | 0.8551 |
| AUROC | 0.9405 |

## Training recipe

- **base_model**: microsoft/deberta-v3-base
- **max_seq_len**: 512
- **fine_tuning**: LoRA (r=16, α=32, dropout=0.1, targets=query_proj/key_proj/value_proj)
- **loss**: BCE + flip-margin (flip_weight=0.3, flip_margin=1.0)
- **sampler**: EssayGroupBatchSampler
- **batch_size**: sampler-driven (1 essay per batch, ~9 sentences/version per essay)
- **grad_accum**: sampler-driven
- **effective_batch_size**: 1 essay group (all versions jointly) per optimizer step
- **epochs**: 5
- **lr**: 2e-5
- **weight_decay**: 0.01
- **warmup_frac**: 0.1
- **bf16**: yes
- **seed**: 0
- **best-ckpt selection**: dev macro_f1

Reproduction command (from the sentence-trajectory research worktree):

```bash
conda run -n omni-text python research/exp/41_flip_margin.py --epochs 5 --seed 0
```

**W&B run:** https://wandb.ai/jiacheng-liu-19-mbzuai/hat_bench/runs/7c5d1vac

## Loading

```python
import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
from huggingface_hub import hf_hub_download

REPO = "HAT-Baselines/deberta-v3-base-hatbench-AD-flipmargin-seed0"
BASE = "microsoft/deberta-v3-base"

# Tokenizer and LoRA adapter live in subfolders of the model repo.
tok = AutoTokenizer.from_pretrained(REPO, subfolder="tokenizer")
base = AutoModel.from_pretrained(BASE)
encoder = PeftModel.from_pretrained(base, REPO, subfolder="adapter")

# sentence-classifier head: Linear(768, 256) -> GELU -> Dropout -> Linear(256, 1)
head = torch.nn.Sequential(
    torch.nn.Linear(768, 256),
    torch.nn.GELU(),
    torch.nn.Dropout(0.1),
    torch.nn.Linear(256, 1),
)
head_path = hf_hub_download(REPO, "head.pt")
head.load_state_dict(torch.load(head_path, map_location="cpu"))

# Switch to inference mode so dropout is disabled in the encoder and the head.
encoder.eval()
head.eval()
```

To score a document per sentence: mean-pool the encoder hidden states over each sentence's token span, feed each pooled vector through `head`, and apply a sigmoid to the logit to get `y_score`. See `research/utils/data.py :: segment_mean` in the research worktree for the exact pooling.

## Citation

If you use this model, please cite HAT-Bench (TBD).
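
## Per-sentence scoring sketch

The per-sentence procedure described under Loading (mean-pool each sentence's token span, apply `head`, then sigmoid) can be sketched as follows. This is a minimal illustration, not the exact `segment_mean` from the research worktree: the function name `score_sentences`, the half-open `(start, end)` token-span format, and the random stand-in tensors are assumptions made for this example. With the real model, `hidden` would presumably be `encoder(**inputs).last_hidden_state[0]`, and the spans would have to be derived from the tokenizer output.

```python
import torch


def score_sentences(hidden, spans, head):
    """Mean-pool token states per sentence, then map each pooled vector to a score.

    hidden: (seq_len, hidden_dim) token-level encoder states for one document.
    spans:  list of (start, end) token indices, end exclusive (assumed format).
    head:   callable mapping (n, hidden_dim) -> (n, 1) logits.
    Returns a (num_sentences,) tensor of y_score values in [0, 1].
    """
    # One pooled vector per sentence: the mean over that sentence's tokens.
    pooled = torch.stack([hidden[start:end].mean(dim=0) for start, end in spans])
    with torch.no_grad():
        logits = head(pooled).squeeze(-1)
    return torch.sigmoid(logits)


# Stand-in inputs so the sketch runs without downloading the model.
torch.manual_seed(0)
hidden = torch.randn(12, 768)        # 12 tokens, DeBERTa-v3-base hidden size
spans = [(1, 5), (5, 9), (9, 11)]    # three hypothetical sentence spans
head = torch.nn.Sequential(          # same architecture as the head in Loading
    torch.nn.Linear(768, 256),
    torch.nn.GELU(),
    torch.nn.Dropout(0.1),
    torch.nn.Linear(256, 1),
).eval()                             # eval mode disables dropout

y_score = score_sentences(hidden, spans, head)
print(y_score.shape)  # torch.Size([3])
```

An empty span (`start == end`) would produce NaN from the mean, so real span extraction should skip or guard against such cases.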