zcahjl3 committed on
Commit d267e2f · verified · 1 Parent(s): 6804017

initial push: A+D (hard BCE + flip-margin + LoRA) seed 0 (macro_f1=0.8544)

Files changed (1)
  1. README.md +7 -9
README.md CHANGED
@@ -36,21 +36,19 @@ Part of the HAT-Baselines detector suite. This model predicts a per-sentence
 
  - **base_model**: microsoft/deberta-v3-base
  - **max_seq_len**: 512
- - **batch_size**: 4 (A) / 2 (A+B) / sampler-driven (A+D)
- - **grad_accum**: 2 / 4 / sampler-driven
+ - **fine_tuning**: LoRA (r=16, α=32, dropout=0.1, targets=query_proj/key_proj/value_proj)
+ - **loss**: BCE + flip-margin (flip_weight=0.3, flip_margin=1.0)
+ - **sampler**: EssayGroupBatchSampler
+ - **batch_size**: sampler-driven (1 essay per batch, ~9 sentences/version per essay)
+ - **grad_accum**: sampler-driven
+ - **effective_batch_size**: 1 essay group (all versions jointly) per optimizer step
+ - **epochs**: 5
  - **lr**: 2e-5
  - **weight_decay**: 0.01
  - **warmup_frac**: 0.1
  - **bf16**: yes
  - **seed**: 0
- - **epochs**: 5
  - **best-ckpt selection**: dev macro_f1
- - **loss**: BCE + flip-margin (weight=0.3, margin=1.0)
- - **lora_r**: 16
- - **lora_alpha**: 32
- - **lora_dropout**: 0.1
- - **lora_targets**: query_proj, key_proj, value_proj
- - **sampler**: EssayGroupBatchSampler (1 essay = 1 batch)
 
  Reproduction command (from the sentence-trajectory research worktree):
  ```bash