Add model card
README.md
ADDED
@@ -0,0 +1,44 @@
---
language:
- en
- vi
tags:
- mamba
- hypernetwork
- persona
- grpo
- personalization
license: mit
---

# Mamba Hypernetwork Personalization v2

Mamba-based hypernetwork trained with GRPO to inject persona-conditioned deltas into LLM attention layers.

## Architecture
- **Hypernetwork:** Mamba SSM encoder + delta heads (LoRA-style)
- **Target LLM:** Deltas are injected via forward hooks on `q_proj` / `v_proj` (8 layers)
- **Training:** GRPO with combined reward (RM + CR + PL + DIV)
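
The hook-based injection described above can be illustrated with a minimal PyTorch sketch. It is not this repository's implementation: `make_delta_hook`, the factor shapes, and the stand-in `q_proj` layer are hypothetical, and treating `DELTA_SCALE` as a multiplier on a low-rank update is an assumption. In the real model the factors `A` and `B` would presumably come from the hypernetwork's delta heads; here they are random placeholders.

```python
import torch
import torch.nn as nn

DELTA_SCALE = 0.003  # value listed in the model card; its exact role is assumed


def make_delta_hook(A, B, scale=DELTA_SCALE):
    """Build a forward hook that adds a low-rank (LoRA-style) update.

    A: (rank, in_features), B: (out_features, rank). Hypothetical placeholders
    for whatever the hypernetwork's delta heads emit.
    """
    def hook(module, inputs, output):
        x = inputs[0]
        # Returning a tensor from a forward hook replaces the module output.
        return output + scale * (x @ A.T) @ B.T
    return hook


torch.manual_seed(0)
q_proj = nn.Linear(16, 16, bias=False)  # stand-in for one attention q_proj
A = torch.randn(4, 16)                  # rank-4 down-projection (assumed rank)
B = torch.randn(16, 4)                  # rank-4 up-projection

x = torch.randn(2, 5, 16)               # (batch, seq_len, hidden)
base = q_proj(x)                        # output without the hook

handle = q_proj.register_forward_hook(make_delta_hook(A, B))
patched = q_proj(x)                     # output with the delta added
handle.remove()                         # detach the hook; layer is unmodified
restored = q_proj(x)
```

Because the update is applied in a hook and never written into the layer's weights, removing the handle returns the target layer to its original behavior.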

## Training Config
- LR: 1e-5 (cosine schedule)
- LAMBDA_GRPO: 0.2
- LAMBDA_KL: 0.08
- DELTA_SCALE: 0.003
- Epochs: 5 | Steps: 350
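
For reference, a cosine schedule over the listed step count might look like the sketch below. The model card does not say whether warmup is used or what the final learning rate is, so the absence of warmup and the decay to `min_lr=0.0` are assumptions, and `cosine_lr` is a hypothetical helper name.

```python
import math

BASE_LR = 1e-5     # "LR" from the config above
TOTAL_STEPS = 350  # "Steps" from the config above


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, min_lr=0.0):
    """Cosine decay from base_lr to min_lr (no warmup; assumed)."""
    # Clamp so that out-of-range steps stay at the schedule's endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate starts at `BASE_LR`, reaches half of it at the midpoint (step 175), and decays to `min_lr` at step 350.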

## Reward Weights
| Metric | Weight | Description |
|--------|--------|-------------|
| RM | +0.55 | Persona grounding |
| CR | +0.25 | Context relevance |
| PL | -0.30 | Persona leakage penalty |
| DIV | +0.10 | Response diversity |
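
One plausible reading of the table is a weighted sum of the four per-response scores, followed by the group-relative standardization that GRPO applies to rewards within a sampled group. The sketch below follows that reading. The linear combination rule, the function names, and the choice of population standard deviation are assumptions, not details confirmed by this model card.

```python
from statistics import mean, pstdev

# Weights copied from the table above; PL is a penalty, hence the negative sign.
REWARD_WEIGHTS = {"RM": 0.55, "CR": 0.25, "PL": -0.30, "DIV": 0.10}


def combined_reward(scores):
    """Weighted sum of per-metric scores (assumed combination rule)."""
    return sum(weight * scores[name] for name, weight in REWARD_WEIGHTS.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is also common
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: two hypothetical responses that differ only in persona leakage.
clean = combined_reward({"RM": 1.0, "CR": 1.0, "PL": 0.0, "DIV": 1.0})
leaky = combined_reward({"RM": 1.0, "CR": 1.0, "PL": 1.0, "DIV": 1.0})
advantages = group_relative_advantages([clean, leaky])
```

With these weights, a response that leaks persona details scores 0.30 lower than an otherwise identical one, so it receives the negative advantage within its group.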

## Checkpoint Info
- Saved at: epoch 5, step 400
- Date: 2026-05-13

## Files
- `mamba_weights_only.pt`: model weights only (for inference)
- `ckpt_e5_s350.pt`: full checkpoint (for resuming training)