---
library_name: transformers
tags:
- generated_from_trainer
- trl
- grpo
- hf_jobs
- clinical-recruitment
- openenv
- long-horizon
- lora
license: mit
base_model: Qwen/Qwen3-1.7B
---

# Clinical Recruitment GRPO Agent (Best Run, 80 Steps)

This is the best-performing LoRA adapter trained for the [Adaptive Clinical Recruitment Environment](https://huggingface.co/spaces/pratimassaravanan/clinical-recruitment), produced by an SFT warmup followed by 80 steps of GRPO on Qwen3-1.7B (NVIDIA L4, 24GB).

**Hackathon positioning: Theme #2 (Super Long-Horizon Planning).**

## Training Summary

| Metric | Value |
|--------|-------|
| Base model | Qwen/Qwen3-1.7B |
| Method | SFT warmup + 80-step GRPO |
| GPU | NVIDIA L4 (24GB) via HF Jobs |
| Duration | 142 min total (3.5 min SFT + 141.7 min GRPO) |
| Reward (start) | 0.269 |
| Reward (end) | 0.331 |
| Tool calls/step (start) | 3.5 |
| Tool calls/step (end) | 11 |
| Enrollment/rollout | 3-4 patients |
| Zero-std collapse rate | 10% (8/80 steps, intermittent) |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |

## What the Model Learned

- Calls `screen_patient` as its first action (earlier runs collapsed to `adjust_strategy`)
- Follows the correct pipeline: screen -> recontact -> enrollment
- Enrolls 3-4 patients per episode on easy_bench
- Makes 11+ tool calls per rollout (up from 3.5)
- Recovers from intermittent collapse without sustained degradation

## Training Plots

![GRPO 80-Step Training](https://huggingface.co/pratimassaravanan/clinical-recruitment-artifacts/resolve/main/article_assets/grpo_80step_training.png)

![All Runs Comparison](https://huggingface.co/pratimassaravanan/clinical-recruitment-artifacts/resolve/main/article_assets/all_runs_comparison.png)

## Honest Limits

- Enrollment stays at 3-4 of the 80 target patients per episode
- Zero-std collapse still occurs on ~10% of steps
- Reward plateaued at 0.33 after 80 steps

## Links

- **Live environment**: [pratimassaravanan-clinical-recruitment.hf.space](https://pratimassaravanan-clinical-recruitment.hf.space)
- **HF Space**: [pratimassaravanan/clinical-recruitment](https://huggingface.co/spaces/pratimassaravanan/clinical-recruitment)
- **Training script**: `train_sft_grpo_hfjob.py` in the Space repo
- **30-step first run**: [pratimassaravanan/grpo_output](https://huggingface.co/pratimassaravanan/grpo_output)
- **SFT+REINFORCE model**: [pratimassaravanan/clinical-qwen3-4b-sft-lora](https://huggingface.co/pratimassaravanan/clinical-qwen3-4b-sft-lora)
- **Artifacts**: [pratimassaravanan/clinical-recruitment-artifacts](https://huggingface.co/pratimassaravanan/clinical-recruitment-artifacts)

## Framework Versions

- TRL: 1.2.0
- Transformers: 5.6.2
- PyTorch: 2.11.0
- PEFT: 0.19.1
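
## Zero-Std Collapse, Illustrated

The "zero-std collapse rate" reported in this card counts steps where every rollout in a group receives the same reward. GRPO normalizes rewards within a group, so identical rewards yield all-zero advantages and no learning signal for that step. The sketch below illustrates the idea in plain Python. It is not the code from `train_sft_grpo_hfjob.py`: the helper names `group_advantages` and `zero_std_rate`, the `eps` and `tol` values, the use of population standard deviation, and the sample rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages: (r - group mean) / (group std + eps).

    Assumption: population std and eps=1e-4 are illustrative choices,
    not values read from the actual training run.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def zero_std_rate(reward_groups, tol=1e-8):
    """Fraction of groups whose rewards are (numerically) all identical."""
    collapsed = sum(1 for group in reward_groups if pstdev(group) <= tol)
    return collapsed / len(reward_groups)


# Hypothetical reward groups, NOT taken from the real run.
collapsed_group = [0.25, 0.25, 0.25, 0.25]  # every rollout scored the same
healthy_group = [0.20, 0.30, 0.35, 0.25]    # rewards differ across rollouts

# A collapsed group produces only zero advantages.
print(group_advantages(collapsed_group))

# 8 collapsed steps out of 80 gives the 10% rate quoted in the summary table.
steps = [collapsed_group] * 8 + [healthy_group] * 72
print(zero_std_rate(steps))
```

Applying a check like `zero_std_rate` to logged per-step rewards is one way to track whether collapse is intermittent or sustained.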