joelleachkar committed e85fa38 (verified) · 1 parent: 5f63d12

Update README for v3 GRPO general knowledge model

  1. README.md +40 -3
README.md CHANGED
@@ -1,3 +1,40 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+ ## v3 GRPO general knowledge model
+
+ Updated: 2026-06-04 11:22 UTC
+
+ This repository stores the final v3 GRPO general knowledge model for the CS-552 2026 Databand project.
+
+ Model source on the training cluster:
+
+ /scratch/general_knowledge_sft_v3_lora_grpo/outputs/grpo_v3_maxredux_4000/final
+
+ The model was trained from the v3 LoRA SFT model using GRPO on the MMLU-Pro / MMLU-Redux general-knowledge data split.
+
+ The final model files were verified locally before upload, including:
+
+ - config.json
+ - generation_config.json
+ - model.safetensors
+ - tokenizer.json
+ - tokenizer_config.json
+ - chat_template.jinja
+
+ Important generation/config fields:
+
+ - bos_token_id = 151643
+ - eos_token_id = 151645
+ - pad_token_id = 151643
+ - use_cache = True
+ - generation eos_token_id = [151645, 151643]
+ - temperature = 0.1
+ - top_k = 20
+ - top_p = 0.8
+
+ Expected output format:
+
+ \boxed{LETTER}
+
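The README added in this commit lists sampling settings and a `\boxed{LETTER}` output format. As a minimal sketch of how a consumer might use them, the Python below collects the listed values in a dictionary and extracts the final boxed letter from a completion. The names `SAMPLING_SETTINGS`, `BOXED_LETTER`, and `extract_boxed_letter` are hypothetical and do not come from the repository. The `A`-`J` range is an assumption based on MMLU-Pro questions having up to ten options, and taking the last match when several boxes appear is a design choice rather than documented behavior.

```python
import re
from typing import Optional

# Values copied from the README's "Important generation/config fields" list.
SAMPLING_SETTINGS = {
    "temperature": 0.1,
    "top_k": 20,
    "top_p": 0.8,
    "eos_token_id": [151645, 151643],
    "pad_token_id": 151643,
}

# Assumption: the answer is one uppercase letter A-J inside \boxed{...},
# optionally padded with whitespace.
BOXED_LETTER = re.compile(r"\\boxed\{\s*([A-J])\s*\}")


def extract_boxed_letter(text: str) -> Optional[str]:
    """Return the letter of the last \\boxed{...} in text, or None if absent."""
    matches = BOXED_LETTER.findall(text)
    return matches[-1] if matches else None
```

With the `transformers` library, these settings could plausibly be forwarded as keyword arguments, for example `model.generate(**inputs, do_sample=True, **SAMPLING_SETTINGS)`. Whether the uploaded `generation_config.json` already enables sampling is not stated in the README, so `do_sample=True` is an assumption here.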