---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- llm-as-rnn
- clinical
- mimic-iv
- grpo
- lora-merged
---

# gabriel-xiong/qwen3-8b-grpo-v2-epoch1

GRPO-trained memory-update head for the LLM-as-RNN clinical pipeline. LoRA merged into base model for direct inference (no PEFT runtime needed).

- **Base model**: `Qwen/Qwen3-8B`
- **Training experiment**: `phase1_v2judge_run02_Qwen3_8b_3epoch`
- **Source checkpoint**: `global_step_17`
- **Repo SHA at merge time**: `68ca31d`

## Run manifest

```
# Run manifest — generated by run_phase1.sh
project_name     : llmrnn_grpo
experiment_name  : phase1_v2judge_run02_Qwen3_8b_3epoch
launched_at      : 2026-06-01T16:13:50Z
hostname         : c315-005.ls6.tacc.utexas.edu

# Code identity
git_sha          : 68ca31d
git_status       : clean
launcher_script  : /var/spool/slurmd/job3200030/slurm_script

# Judge / reward
rubric_yaml      : training/configs/rubric_v2_rubricARM_scalar.yaml
rubric_yaml_sha1 : e2f89fecfbcce51b98fcaad84a4b83128cb5c64d
judge_model      : OpenRubrics/RubricARM-8B-Judge
judge_endpoint   : http://localhost:8001

# Data
train_parquet    : /scratch/11566/gabriel_xiong/data/llm_as_rnn/train.parquet
val_parquet      : /scratch/11566/gabriel_xiong/data/llm_as_rnn/train.parquet

# Policy + GRPO
base_model       : Qwen/Qwen3-8B
lora_rank        : 16
lora_alpha       : 32
lr               : 3e-6
kl_coef          : 0.001
kl_loss_type     : low_var_kl
G_rollouts       : 8
train_batch      : 16
ppo_mini_batch   : 16
total_epochs     : 3
```
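
The manifest is plain `key : value` text, so it can be read back programmatically when comparing runs. The sketch below is illustrative only: `parse_manifest` and the derived variable names are hypothetical helpers, not part of `run_phase1.sh`. It assumes `train_batch` counts prompts (so each batch yields `train_batch * G_rollouts` sampled responses) and uses the standard LoRA scaling factor `lora_alpha / lora_rank`.

```python
# Subset of the run manifest above, copied verbatim (only the fields used here).
MANIFEST = """\
# Data
train_parquet    : /scratch/11566/gabriel_xiong/data/llm_as_rnn/train.parquet
val_parquet      : /scratch/11566/gabriel_xiong/data/llm_as_rnn/train.parquet

# Policy + GRPO
lora_rank        : 16
lora_alpha       : 32
G_rollouts       : 8
train_batch      : 16
ppo_mini_batch   : 16
total_epochs     : 3
"""


def parse_manifest(text: str) -> dict[str, str]:
    """Hypothetical helper: parse 'key : value' lines, skipping blanks and '#' comments."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as URLs stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


fields = parse_manifest(MANIFEST)

# Standard LoRA scaling applied to the low-rank update: alpha / rank.
lora_scale = int(fields["lora_alpha"]) / int(fields["lora_rank"])

# Assumption: train_batch counts prompts, each sampled G_rollouts times.
responses_per_batch = int(fields["train_batch"]) * int(fields["G_rollouts"])

# The manifest lists the same parquet file for training and validation.
val_is_train = fields["val_parquet"] == fields["train_parquet"]

print(f"lora_scale={lora_scale}, responses_per_batch={responses_per_batch}, "
      f"val_is_train={val_is_train}")
```

With the values recorded above, this gives a LoRA scale of 2.0 and 128 responses per batch under the stated assumption, and it flags that `val_parquet` and `train_parquet` point to the same file.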