HAT-Baselines/deberta-v3-base-hatbench-AD-flipmargin-seed0

````diff
@@ -36,21 +36,19 @@ Part of the HAT-Baselines detector suite. This model predicts a per-sentence
 - **base_model**: microsoft/deberta-v3-base
 - **max_seq_len**: 512
-- **batch_size**: 4 (A) / 2 (A+B) / sampler-driven (A+D)
-- **grad_accum**: 2 / 4 / sampler-driven
+- **fine_tuning**: LoRA (r=16, α=32, dropout=0.1, targets=query_proj/key_proj/value_proj)
+- **loss**: BCE + flip-margin (flip_weight=0.3, flip_margin=1.0)
+- **sampler**: EssayGroupBatchSampler
+- **batch_size**: sampler-driven (1 essay per batch, ~9 sentences/version per essay)
+- **grad_accum**: sampler-driven
+- **effective_batch_size**: 1 essay group (all versions jointly) per optimizer step
+- **epochs**: 5
 - **lr**: 2e-5
 - **weight_decay**: 0.01
 - **warmup_frac**: 0.1
 - **bf16**: yes
 - **seed**: 0
-- **epochs**: 5
 - **best-ckpt selection**: dev macro_f1
-- **loss**: BCE + flip-margin (weight=0.3, margin=1.0)
-- **lora_r**: 16
-- **lora_alpha**: 32
-- **lora_dropout**: 0.1
-- **lora_targets**: query_proj, key_proj, value_proj
-- **sampler**: EssayGroupBatchSampler (1 essay = 1 batch)
 Reproduction command (from the sentence-trajectory research worktree):
 ```bash
````
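The new `fine_tuning` line folds in the LoRA settings that the old card listed as separate `lora_*` keys. To make r=16 and α=32 concrete, the plain-Python sketch below applies a standard LoRA update to one frozen projection, y = Wx + (α/r)·B(A·dropout(x)). With these values the low-rank update is scaled by 32/16 = 2. This is not this repo's training code: the helper names `lora_forward` and `is_lora_target` are hypothetical. The sketch also assumes the usual LoRA formulation, where the update is scaled by α/r and dropout is applied only to the adapter input. If the training script uses the `peft` library (the card does not say), the same settings would correspond to `LoraConfig(r=16, lora_alpha=32, lora_dropout=0.1, target_modules=["query_proj", "key_proj", "value_proj"])`.

```python
import random

# Values copied from the card's fine_tuning line.
LORA_R = 16
LORA_ALPHA = 32
LORA_DROPOUT = 0.1
LORA_TARGETS = ("query_proj", "key_proj", "value_proj")

# Standard LoRA scales the low-rank update by alpha / r (here 32 / 16 = 2.0).
SCALING = LORA_ALPHA / LORA_R


def matvec(m, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, training=False, rng=None):
    """Hypothetical helper: y = W x + (alpha / r) * B (A dropout(x)).

    w: frozen base weight, d_out x d_in
    a: trainable down-projection, r x d_in
    b: trainable up-projection, d_out x r (zero-initialised in standard LoRA)
    """
    base = matvec(w, x)
    if training:
        rng = rng or random.Random(0)
        keep = 1.0 - LORA_DROPOUT
        # Inverted dropout, applied only to the input of the LoRA branch.
        x = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    update = matvec(b, matvec(a, x))
    return [yb + SCALING * yu for yb, yu in zip(base, update)]


def is_lora_target(module_name):
    """Hypothetical helper: True if the module's last name component is a LoRA target."""
    return module_name.split(".")[-1] in LORA_TARGETS
```

For example, when B is still all zeros, `lora_forward` returns exactly the frozen projection's output. This is why LoRA fine-tuning starts from the base model's behaviour and only the small A and B matrices are trained.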
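According to the card, `EssayGroupBatchSampler` places one essay per batch and processes all versions of that essay jointly. Batch size therefore depends on essay length instead of being a fixed number, which is why `batch_size` and `grad_accum` are now listed as sampler-driven. Below is a minimal sketch of a sampler with that behaviour. It assumes each dataset row carries an essay id. The class name `EssayGroupSamplerSketch` and its arguments are hypothetical and are not the repo's implementation.

```python
import random
from collections import defaultdict


class EssayGroupSamplerSketch:
    """Yield one list of dataset indices per essay, covering all of its versions.

    Hypothetical stand-in for EssayGroupBatchSampler: assumes essay_ids[i]
    is the essay id of dataset row i (one row per sentence per version).
    """

    def __init__(self, essay_ids, shuffle=True, seed=0):
        groups = defaultdict(list)
        for idx, essay_id in enumerate(essay_ids):
            groups[essay_id].append(idx)
        # dict preserves first-seen order, so unshuffled output is deterministic.
        self.groups = list(groups.values())
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        """Change the shuffle order per epoch while staying reproducible."""
        self.epoch = epoch

    def __iter__(self):
        order = list(range(len(self.groups)))
        if self.shuffle:
            random.Random(self.seed + self.epoch).shuffle(order)
        for g in order:
            # One batch = every row of one essay, across all versions.
            yield list(self.groups[g])

    def __len__(self):
        return len(self.groups)
```

PyTorch's `DataLoader` accepts any iterable of index lists through its `batch_sampler` argument. An object like this could therefore be passed as `batch_sampler`, together with a collate function that pads variable-sized batches. If each batch triggers one optimizer step, this matches the card's "1 essay group per optimizer step".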