phammminhhieu/mamba-hypernetwork-personalization_v2


phammminhhieu committed 24 days ago

Commit 212ec41 (verified), 1 parent: d91ab76

Add model card

Files changed (1)

README.md (added) +44 -0

	@@ -0,0 +1,44 @@

+---
+language:
+- en
+- vi
+tags:
+- mamba
+- hypernetwork
+- persona
+- grpo
+- personalization
+license: mit
+---
+# Mamba Hypernetwork Personalization v2
+Mamba-based hypernetwork trained with GRPO to inject persona-conditioned deltas into LLM attention layers.
+## Architecture
+- **Hypernetwork:** Mamba SSM encoder + delta heads (LoRA-style)
+- **Target LLM:** Injected via forward hooks on q_proj / v_proj (8 layers)
+- **Training:** GRPO with combined reward (RM + CR + PL + DIV)
+## Training Config
+- LR: 1e-5 (cosine schedule)
+- LAMBDA_GRPO: 0.2
+- LAMBDA_KL: 0.08
+- DELTA_SCALE: 0.003
+- Epochs: 5 | Steps: 350
+## Reward Weights
+| Metric | Weight | Description |
+|--------|--------|-------------|
+| RM     | +0.55  | Persona grounding |
+| CR     | +0.25  | Context relevance |
+| PL     | -0.30  | Persona leakage penalty |
+| DIV    | +0.10  | Response diversity |
+## Checkpoint Info
+- Saved at: epoch 5, step 400
+- Date: 2026-05-13
+## Files
+- `mamba_weights_only.pt` — model weights only (for inference)
+- `ckpt_e5_s350.pt` — full checkpoint (for resume training)
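The Architecture section says the hypernetwork's delta heads produce LoRA-style deltas that are injected through forward hooks on `q_proj` / `v_proj` in 8 layers. The sketch below shows one way that wiring could look in PyTorch. It assumes that each delta head maps a persona embedding to low-rank factors A and B, and that the hook adds `DELTA_SCALE * (x @ A @ B)` to the projection output. `ToyAttention`, `DeltaHead`, `make_delta_hook`, `attach_persona_deltas`, the rank of 4 and all dimensions are hypothetical. A random persona vector stands in for the Mamba SSM encoder.

```python
import torch
import torch.nn as nn

D_MODEL, D_PERSONA, RANK = 64, 32, 4  # hypothetical sizes
N_LAYERS = 8                          # README: hooks on 8 layers
DELTA_SCALE = 0.003                   # README: DELTA_SCALE

class ToyAttention(nn.Module):
    """Hypothetical stand-in for an LLM attention block exposing q_proj / v_proj."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)

class DeltaHead(nn.Module):
    """Hypothetical delta head: persona vector -> LoRA-style factors A (d_in x r), B (r x d_out)."""
    def __init__(self, d_persona: int, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.shape_a, self.shape_b = (d_in, rank), (rank, d_out)
        self.to_a = nn.Linear(d_persona, d_in * rank)
        self.to_b = nn.Linear(d_persona, rank * d_out)

    def forward(self, z: torch.Tensor):
        return self.to_a(z).view(*self.shape_a), self.to_b(z).view(*self.shape_b)

def make_delta_hook(a: torch.Tensor, b: torch.Tensor, scale: float):
    """Forward hook that adds scale * (x @ A @ B) to the projection's output."""
    def hook(module, inputs, output):
        x = inputs[0]                        # nn.Linear receives a single positional input
        return output + scale * (x @ a @ b)  # returning a tensor replaces the module output
    return hook

def attach_persona_deltas(layers, heads, z, scale=DELTA_SCALE):
    """Register one hook per (layer, projection); return the handles for later removal."""
    handles = []
    for layer, layer_heads in zip(layers, heads):
        for name in ("q_proj", "v_proj"):
            a, b = layer_heads[name](z)
            proj = getattr(layer, name)
            handles.append(proj.register_forward_hook(make_delta_hook(a, b, scale)))
    return handles

torch.manual_seed(0)
layers = nn.ModuleList([ToyAttention(D_MODEL) for _ in range(N_LAYERS)])
heads = nn.ModuleList([
    nn.ModuleDict({n: DeltaHead(D_PERSONA, D_MODEL, D_MODEL, RANK) for n in ("q_proj", "v_proj")})
    for _ in range(N_LAYERS)
])
z = torch.randn(D_PERSONA)  # stand-in for the Mamba encoder's persona embedding

x = torch.randn(2, 5, D_MODEL)
handles = attach_persona_deltas(layers, heads, z)
q_personalized = layers[0].q_proj(x)   # base projection + persona delta
for h in handles:
    h.remove()                         # detach the persona again
q_base = layers[0].q_proj(x)           # plain projection
print((q_personalized - q_base).abs().max())
```

Returning a tensor from a forward hook replaces the module's output. The handles returned by `register_forward_hook` let the persona be detached without touching the base weights. In this sketch A and B are computed once per persona. A training loop would instead recompute them inside each forward pass so that gradients reach the delta heads on every step.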
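The Training Config lists a cosine schedule at LR 1e-5 together with `LAMBDA_GRPO` and `LAMBDA_KL`. It does not say how the two lambdas combine. The sketch below makes three assumptions: 350 total scheduler steps, no warmup, and decay to zero. It also adds one hypothetical reading of the loss weighting: an LM loss plus weighted GRPO and KL terms. `cosine_lr`, `total_loss` and that loss decomposition are assumptions and are not taken from the card.

```python
import math

BASE_LR = 1e-5      # README: LR 1e-5
TOTAL_STEPS = 350   # README: Steps 350 (assumed to be the full schedule length)
MIN_LR = 0.0        # assumption: no warmup, decay all the way to zero
LAMBDA_GRPO = 0.2   # README
LAMBDA_KL = 0.08    # README

def cosine_lr(step: int, base_lr: float = BASE_LR, total_steps: int = TOTAL_STEPS,
              min_lr: float = MIN_LR) -> float:
    """Half-cosine decay from base_lr at step 0 to min_lr at total_steps."""
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def total_loss(lm_loss: float, grpo_loss: float, kl: float) -> float:
    """Hypothetical objective: LM loss plus weighted GRPO and KL-to-reference terms."""
    return lm_loss + LAMBDA_GRPO * grpo_loss + LAMBDA_KL * kl

for s in (0, 175, 350):
    print(s, cosine_lr(s))
```

In PyTorch the same decay is available as `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=350, eta_min=0.0)`, stepped once per optimizer step.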
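The Reward Weights table defines the combined reward as a weighted sum in which persona leakage (PL) enters with a negative weight. The sketch below computes that sum. It then computes the group-relative advantage that GRPO is usually described with: rewards for several sampled responses to the same prompt are normalised by the group mean and standard deviation. It assumes each metric is already a scalar score per response (for example in [0, 1]). The function names and the sample scores are illustrative only.

```python
from statistics import mean, pstdev

# README reward weights; PL is negative because it penalises persona leakage.
REWARD_WEIGHTS = {"RM": 0.55, "CR": 0.25, "PL": -0.30, "DIV": 0.10}

def combined_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-response metric scores."""
    return sum(w * scores[k] for k, w in REWARD_WEIGHTS.items())

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalise rewards within the group sampled for one prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Illustrative scores for three sampled responses to one prompt (made-up numbers).
group = [
    {"RM": 0.8, "CR": 0.6, "PL": 0.1, "DIV": 0.5},
    {"RM": 0.4, "CR": 0.7, "PL": 0.5, "DIV": 0.3},
    {"RM": 0.9, "CR": 0.2, "PL": 0.0, "DIV": 0.9},
]
rewards = [combined_reward(s) for s in group]
advantages = group_advantages(rewards)
print(rewards, advantages)
```

Advantages are centred and scaled within each group. A shift or rescaling applied to every reward in the group therefore leaves the update essentially unchanged. Only the ranking and relative spread of responses to the same prompt matter.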
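The Files section distinguishes a weights-only file for inference from a full checkpoint for resuming training. The card does not document what the full checkpoint contains. The sketch below therefore assumes a common PyTorch layout: a dict with hypothetical `model`, `optimizer`, `epoch` and `step` keys. A tiny `nn.Linear` stands in for the hypernetwork, and both files are written to a temporary directory so the sketch runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 4)  # stand-in for the hypernetwork
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

with tempfile.TemporaryDirectory() as tmp:
    weights_path = os.path.join(tmp, "mamba_weights_only.pt")
    ckpt_path = os.path.join(tmp, "ckpt_e5_s350.pt")

    # Inference file: only the model's state_dict.
    torch.save(model.state_dict(), weights_path)
    # Resume file: hypothetical layout with optimizer state and progress counters.
    torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict(),
                "epoch": 5, "step": 350}, ckpt_path)

    # Inference: load weights into a freshly built model.
    fresh = nn.Linear(4, 4)
    fresh.load_state_dict(torch.load(weights_path, map_location="cpu"))

    # Resume: restore model, optimizer and counters, then continue training.
    ckpt = torch.load(ckpt_path, map_location="cpu")
    resumed = nn.Linear(4, 4)
    resumed_opt = torch.optim.AdamW(resumed.parameters(), lr=1e-5)
    resumed.load_state_dict(ckpt["model"])
    resumed_opt.load_state_dict(ckpt["optimizer"])
    start_epoch, start_step = ckpt["epoch"], ckpt["step"]
```

For the real files, inspect `ckpt.keys()` first. Recent PyTorch releases default `torch.load` to `weights_only=True`, which refuses arbitrary pickled objects. A checkpoint that stores anything beyond tensors and plain containers would need `weights_only=False`, which should only be used for files you trust.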