phammminhhieu committed on
Commit 212ec41 · verified · 1 Parent(s): d91ab76

Add model card

  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ language:
+ - en
+ - vi
+ tags:
+ - mamba
+ - hypernetwork
+ - persona
+ - grpo
+ - personalization
+ license: mit
+ ---
+
+ # Mamba Hypernetwork Personalization v2
+
+ Mamba-based hypernetwork trained with GRPO to inject persona-conditioned deltas into LLM attention layers.
+
+ ## Architecture
+ - **Hypernetwork:** Mamba SSM encoder + delta heads (LoRA-style)
+ - **Target LLM:** Injected via forward hooks on q_proj / v_proj (8 layers)
+ - **Training:** GRPO with combined reward (RM + CR + PL + DIV)
+
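A minimal sketch of how a LoRA-style delta could be added to a projection through a PyTorch forward hook, to make the "Target LLM" bullet above concrete. It is not the repository's code: the toy `q_proj` layer, the rank, the random `A` and `B` factors (which in the real model would presumably be produced by the delta heads), and the helper name `make_delta_hook` are all assumptions for illustration. Only `DELTA_SCALE` is taken from the model card.

```python
import torch
import torch.nn as nn

DELTA_SCALE = 0.003  # value listed in the model card's Training Config


def make_delta_hook(A, B, scale=DELTA_SCALE):
    """Hypothetical helper: build a forward hook that adds a low-rank delta.

    The hook returns output + scale * (x @ A^T) @ B^T, i.e. the layer behaves
    as if its weight were W + scale * B @ A, without modifying W itself.
    """
    def hook(module, inputs, output):
        x = inputs[0]  # forward hooks receive the positional inputs as a tuple
        return output + scale * (x @ A.t()) @ B.t()
    return hook


torch.manual_seed(0)
hidden, rank = 16, 4  # toy sizes, assumed for the example

# Stand-in for one attention projection of the target LLM.
q_proj = nn.Linear(hidden, hidden, bias=False)

# Assumed low-rank factors; random here, hypernetwork outputs in practice.
A = torch.randn(rank, hidden)
B = torch.randn(hidden, rank)

x = torch.randn(2, 5, hidden)  # (batch, sequence, hidden)

base = q_proj(x)                                  # unmodified output
handle = q_proj.register_forward_hook(make_delta_hook(A, B))
patched = q_proj(x)                               # output with the delta injected
handle.remove()                                   # detach the hook
restored = q_proj(x)                              # back to the base behaviour
```

Because the hook only changes the layer output, removing the handle restores the original model exactly, which is one plausible reason to prefer hooks over editing weights in place.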
+ ## Training Config
+ - LR: 1e-5 (cosine schedule)
+ - LAMBDA_GRPO: 0.2
+ - LAMBDA_KL: 0.08
+ - DELTA_SCALE: 0.003
+ - Epochs: 5 | Steps: 350
+
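For reference, a cosine learning-rate schedule with the listed peak LR and step count could look like the sketch below. The model card does not say whether warmup or a nonzero floor was used, so the absence of warmup and `min_lr=0.0` are assumptions, and the function name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 1e-5      # "LR" from the Training Config
TOTAL_STEPS = 350   # "Steps" from the Training Config


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, min_lr=0.0):
    """Cosine decay from base_lr at step 0 to min_lr at total_steps.

    Assumes no warmup phase; steps outside [0, total_steps] are clamped.
    """
    step = min(max(step, 0), total_steps)
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, the midpoint, and the end of training.
schedule_samples = [cosine_lr(s) for s in (0, 175, 350)]
```

Under these assumptions the rate starts at the full 1e-5, passes through half of it at step 175, and reaches the floor at step 350.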
+ ## Reward Weights
+ | Metric | Weight | Description |
+ |--------|--------|-------------|
+ | RM | +0.55 | Persona grounding |
+ | CR | +0.25 | Context relevance |
+ | PL | -0.30 | Persona leakage penalty |
+ | DIV | +0.10 | Response diversity |
+
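One way to read the table above is as a weighted sum of per-response scores, which GRPO then turns into group-relative advantages. The sketch below assumes exactly that: a linear combination using the listed weights, followed by the standard GRPO normalization (subtract the group mean, divide by the group standard deviation). The score values, the function names, and the `eps` term are illustrative assumptions, not taken from the training code.

```python
from statistics import mean, pstdev

# Weights copied from the Reward Weights table.
REWARD_WEIGHTS = {"RM": 0.55, "CR": 0.25, "PL": -0.30, "DIV": 0.10}


def combined_reward(scores):
    """Weighted sum of the four metric scores (assumed linear combination)."""
    return sum(REWARD_WEIGHTS[name] * scores[name] for name in REWARD_WEIGHTS)


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up metric scores for three responses sampled for the same prompt.
group = [
    {"RM": 0.9, "CR": 0.8, "PL": 0.1, "DIV": 0.5},
    {"RM": 0.4, "CR": 0.6, "PL": 0.5, "DIV": 0.5},
    {"RM": 0.7, "CR": 0.7, "PL": 0.0, "DIV": 0.2},
]

rewards = [combined_reward(s) for s in group]
advantages = group_advantages(rewards)
```

With the negative PL weight, a response that leaks persona details is pushed below the group mean even when its other scores are reasonable, as the second response here illustrates.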
+ ## Checkpoint Info
+ - Saved at: epoch 5, step 400
+ - Date: 2026-05-13
+
+ ## Files
+ - `mamba_weights_only.pt`: model weights only (for inference)
+ - `ckpt_e5_s350.pt`: full checkpoint (for resuming training)