ashishnair committed on
Commit 06a22fe · verified · 1 Parent(s): f34c3a0

Update README.md

Files changed (1)
  1. README.md +0 -5
README.md CHANGED
@@ -126,26 +126,21 @@ The `moral_scenarios` drop is the most significant. MMLU moral scenarios test ri
126
 
127
  ### Stage 1 — Base merge (DARE-TIES)
128
  Merged `meta-llama/Llama-3.1-8B` (weight 0.3 / density 0.5) with `Gurubot/self-after-dark` (weight 0.7 / density 0.8). Personality-dominant — the character model leads; the base provides structural stability.
129
- Output: `merged_weird1`
130
 
131
  ### Stage 2 — SFT round 1
132
  Fine-tuned on 2,000-sample curated human texting corpus — short, emotionally varied conversational fragments.
133
  Train loss: 1.7368 · Runtime: 44.8 min · 2.23 samples/sec
134
- Output: `merged_weird1_sft`
135
 
136
  ### Stage 3 — Second merge (DARE-TIES)
137
  Merged `merged_weird1_sft` (weight 0.7 / density 0.8) with `meta-llama/Llama-3.1-8B-Instruct` (weight 0.3 / density 0.5). Recovers instruction-following capacity at low weight without overwriting persona.
138
- Output: `merged_B_weird2`
139
 
140
  ### Stage 4 — SFT round 2
141
  Fine-tuned on ~900-sample GPT-generated multi-persona instruction dataset covering 10+ named personas (Zoe, Maya, Iris, Hana, Tess, Kim, Vera, Cass, Nora, and others) across varied professional backgrounds and emotional states.
142
  Train loss: 1.1821 · Runtime: 31.4 min · 1.44 samples/sec
143
- Output: `merged_weird2_sft1`
144
 
145
  ### Stage 5 — SFT round 3 (final)
146
  Final fine-tuning pass on the original dialogue corpus to re-anchor conversational naturalness after instruction alignment.
147
  Train loss: 1.4733 · Runtime: 44.8 min · 2.23 samples/sec
148
- Output: **Llama-Ione-8B-roleplay-v1**
149
 
150
  ### Training statistics
151
 
 
126
 
127
  ### Stage 1 — Base merge (DARE-TIES)
128
  Merged `meta-llama/Llama-3.1-8B` (weight 0.3 / density 0.5) with `Gurubot/self-after-dark` (weight 0.7 / density 0.8). Personality-dominant — the character model leads; the base provides structural stability.
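
  The weight and density values above can be read against a toy version of a DARE-TIES merge. The sketch below is illustrative only: it works on short Python lists instead of model tensors, the function names (`dare_prune`, `dare_ties_merge`) are hypothetical, and dividing by the summed weights of the agreeing models is a choice made for this example. It is not the tooling or configuration actually used for this model, which the README does not specify.

```python
import random

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def dare_prune(delta, density, rng):
    """DARE step: keep each entry with probability `density`, rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]

def dare_ties_merge(base, models, seed=0):
    """Merge toy parameter vectors.

    base:   list of floats (shared base parameters)
    models: list of (params, weight, density) tuples, each params the same length as base
    """
    rng = random.Random(seed)
    weighted = []
    for params, weight, density in models:
        # Task vector: how far this model has moved away from the base.
        delta = [p - b for p, b in zip(params, base)]
        pruned = dare_prune(delta, density, rng)
        weighted.append(([weight * d for d in pruned], weight))

    merged = []
    for i, b in enumerate(base):
        # TIES step: elect a sign from the summed weighted deltas,
        # then keep only the contributions that agree with it.
        elected = sign(sum(vec[i] for vec, _ in weighted))
        agree = [(vec[i], w) for vec, w in weighted
                 if vec[i] != 0.0 and sign(vec[i]) == elected]
        total_weight = sum(w for _, w in agree)
        update = sum(d for d, _ in agree) / total_weight if total_weight else 0.0
        merged.append(b + update)
    return merged

# Toy usage with the (weight, density) pairs quoted for Stage 1.
base = [0.0, 0.0, 0.0, 0.0]
model_a = [0.2, -0.1, 0.4, 0.0]   # stands in for the low-weight model
model_b = [0.5, 0.3, -0.2, 0.1]   # stands in for the high-weight model
merged = dare_ties_merge(base, [(model_a, 0.3, 0.5), (model_b, 0.7, 0.8)])
print(merged)
```

  In this sketch a higher density means fewer of a model's deltas are dropped, and a higher weight means that model is more likely to win the sign election where the two disagree.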
 
129
 
130
  ### Stage 2 — SFT round 1
131
  Fine-tuned on 2,000-sample curated human texting corpus — short, emotionally varied conversational fragments.
132
  Train loss: 1.7368 · Runtime: 44.8 min · 2.23 samples/sec
 
133
 
134
  ### Stage 3 — Second merge (DARE-TIES)
135
  Merged `merged_weird1_sft` (weight 0.7 / density 0.8) with `meta-llama/Llama-3.1-8B-Instruct` (weight 0.3 / density 0.5). Recovers instruction-following capacity at low weight without overwriting persona.
 
136
 
137
  ### Stage 4 — SFT round 2
138
  Fine-tuned on ~900-sample GPT-generated multi-persona instruction dataset covering 10+ named personas (Zoe, Maya, Iris, Hana, Tess, Kim, Vera, Cass, Nora, and others) across varied professional backgrounds and emotional states.
139
  Train loss: 1.1821 · Runtime: 31.4 min · 1.44 samples/sec
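
  As a rough sanity check on the reported figures, runtime multiplied by throughput gives the approximate number of samples processed in a run. The sketch below applies that to Stage 2 and Stage 4. The dataset sizes are the approximate values quoted above, the helper name `samples_processed` is hypothetical, and reading the ratio as a number of passes over the data is an assumption, since the README does not state epoch counts.

```python
def samples_processed(runtime_min, samples_per_sec):
    """Approximate samples seen during a run: minutes -> seconds, times throughput."""
    return runtime_min * 60 * samples_per_sec

# (runtime in minutes, samples/sec, approximate dataset size) as reported above.
stages = {
    "Stage 2": (44.8, 2.23, 2000),
    "Stage 4": (31.4, 1.44, 900),
}

for name, (runtime_min, rate, dataset_size) in stages.items():
    total = samples_processed(runtime_min, rate)
    passes = total / dataset_size
    print(f"{name}: ~{total:.0f} samples processed, ~{passes:.1f} passes over the dataset")
```

  Both stages come out at roughly three passes over their datasets, which would fit a three-epoch schedule, though that is an inference from the numbers and is not confirmed by the source.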
 
140
 
141
  ### Stage 5 — SFT round 3 (final)
142
  Final fine-tuning pass on the original dialogue corpus to re-anchor conversational naturalness after instruction alignment.
143
  Train loss: 1.4733 · Runtime: 44.8 min · 2.23 samples/sec
 
144
 
145
  ### Training statistics
146