gabriel-xiong/qwen3-8b-grpo-v6-epoch1
GRPO-trained memory-update head for the LLM-as-RNN clinical pipeline. The LoRA adapter is merged into the base model weights, so the checkpoint can be used for inference directly, without the PEFT runtime.
- Base model: Qwen/Qwen3-8B
- Training experiment: phase1_v6judge_run01_Qwen3_8b_3epoch
- Source checkpoint: global_step_17
- Repo SHA at merge time: 68ca31d
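Because the adapter is merged, each adapted weight matrix already contains its low-rank update. The sketch below illustrates why merging is lossless, assuming the standard PEFT convention that a LoRA layer computes `W x + (lora_alpha / lora_rank) * B A x` (the manifest does not state whether rsLoRA scaling was used, which would change the factor). With `lora_rank 16` and `lora_alpha 32` from the run manifest, the scaling factor is 2.0. The toy layer sizes and helper names (`merge_lora`, `max_merge_error`) are hypothetical and written in pure Python for illustration only.

```python
import random

LORA_RANK = 16                     # lora_rank from the run manifest
LORA_ALPHA = 32                    # lora_alpha from the run manifest
SCALING = LORA_ALPHA / LORA_RANK   # 2.0, assuming standard (non-rsLoRA) scaling


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b, scale=1.0):
    """Return a + scale * b, elementwise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def merge_lora(w, lora_a, lora_b, scaling=SCALING):
    """Fold the low-rank update into the dense weight: W' = W + scaling * B @ A."""
    return add(w, matmul(lora_b, lora_a), scaling)


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def max_merge_error(seed=0, d_out=8, d_in=32):
    """Compare adapter-path output with merged-weight output on a toy layer."""
    rng = random.Random(seed)
    w = rand_matrix(d_out, d_in, rng)        # frozen base weight
    a = rand_matrix(LORA_RANK, d_in, rng)    # A: r x d_in
    b = rand_matrix(d_out, LORA_RANK, rng)   # B: d_out x r
    x = rand_matrix(d_in, 1, rng)            # one input column vector
    # Unmerged: W x + scaling * B (A x), i.e. base layer plus adapter modules
    unmerged = add(matmul(w, x), matmul(b, matmul(a, x)), SCALING)
    # Merged: a single dense matmul with W', no adapter modules needed
    merged = matmul(merge_lora(w, a, b), x)
    return max(abs(u[0] - m[0]) for u, m in zip(unmerged, merged))


if __name__ == "__main__":
    print(SCALING, max_merge_error())
```

The two paths agree up to floating-point rounding, so merging trades the ability to swap adapters at runtime for one dense matmul per adapted layer, with the same architecture as the base model.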
Run manifest
# Run manifest — generated by run_phase1.sh
project_name : llmrnn_grpo
experiment_name : phase1_v6judge_run01_Qwen3_8b_3epoch
launched_at : 2026-06-09T07:05:19Z
hostname : c316-008.ls6.tacc.utexas.edu
# Code identity
git_sha : 68ca31d
git_status : dirty
launcher_script : /var/spool/slurmd/job3217956/slurm_script
# Judge / reward
rubric_yaml : training/configs/rubric_v6_structured_postmortem.yaml
rubric_yaml_sha1 : 726ddf579973de17bc6800e744107a19f235ca54
judge_model : OpenRubrics/RubricARM-8B-Judge
judge_endpoint : http://localhost:8001
# Data
train_parquet : /scratch/11566/gabriel_xiong/data/llm_as_rnn/train.parquet
val_parquet : /scratch/11566/gabriel_xiong/data/llm_as_rnn/train.parquet
# Policy + GRPO
base_model : Qwen/Qwen3-8B
lora_rank : 16
lora_alpha : 32
lr : 3e-6
kl_coef : 0.001
kl_loss_type : low_var_kl
G_rollouts : 8
train_batch : 16
ppo_mini_batch : 16
total_epochs : 3
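The GRPO settings above can be read as follows, sketched under assumptions. The `kl_loss_type` value `low_var_kl` matches an option name in the verl framework, so this sketch assumes verl-style semantics: outcome-level normalization of judge rewards within each group of `G_rollouts` responses, the clamped k3 estimator for the KL term, and `ppo_mini_batch` counted in prompts rather than responses. None of this is confirmed by the manifest itself, the token-level KL aggregation is simplified to a plain mean, and the function names are hypothetical.

```python
import math
import statistics

# Values copied from the run manifest
G_ROLLOUTS = 8        # responses sampled per prompt (one GRPO group)
TRAIN_BATCH = 16      # prompts per training step
PPO_MINI_BATCH = 16   # prompts per optimizer update (assumed verl semantics)
KL_COEF = 0.001       # weight of the KL loss term


def group_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mean) / (std + eps) for r in rewards]


def low_var_kl(logprob, ref_logprob):
    """Per-token k3 estimate of KL(policy || reference), clamped to [-10, 10]."""
    log_ratio = ref_logprob - logprob
    kl = math.exp(log_ratio) - log_ratio - 1.0
    return max(min(kl, 10.0), -10.0)


def kl_loss(logprobs, ref_logprobs):
    """Mean per-token KL over one response, scaled by kl_coef (simplified aggregation)."""
    terms = [low_var_kl(lp, ref) for lp, ref in zip(logprobs, ref_logprobs)]
    return KL_COEF * sum(terms) / len(terms)


def step_shape(train_batch=TRAIN_BATCH, group_size=G_ROLLOUTS, mini_batch=PPO_MINI_BATCH):
    """Return (rollouts scored per step, optimizer updates per step per PPO epoch)."""
    return train_batch * group_size, train_batch // mini_batch


if __name__ == "__main__":
    # Hypothetical judge scores for one prompt's 8 rollouts
    print(group_advantages([0.9, 0.4, 0.4, 0.7, 0.1, 0.4, 0.8, 0.3]))
    print(step_shape())  # (128, 1)
```

Under these assumptions each training step sends 16 × 8 = 128 rollouts to the judge, and because `train_batch` equals `ppo_mini_batch`, the policy takes a single optimizer update per step. A group in which every rollout receives the same score contributes zero advantage and therefore no policy-gradient signal.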