ZeroTime-Bot: Medical Triage Alignment with GRPO and Unsloth

Community Article Published January 16, 2026

From "Over-Triage" to Clinical Accuracy
The Challenge: Taming AI's Safety Bias in Medical Triage
The Solution: Unleashing GRPO with Unsloth
Our Training Journey: Overcoming Colab's Challenges
The Heart of Alignment: Our Reward Function
The Proof: Evaluation Results
🚀 Experience ZeroTime-Bot Yourself!
🔗 Project Links
From "Over-Triage" to Clinical Accuracy

The Challenge: Taming AI's Safety Bias in Medical Triage

In critical sectors like healthcare, AI's inherent "safety-first" bias often leads to over-triage. A model, eager not to miss a critical condition, might classify even a minor injury (like a stubbed toe) as an "Emergency" (Level 1). While seemingly benign, this "triage inflation" can dangerously overburden emergency rooms, diverting resources from truly critical patients.

Our goal with ZeroTime-Bot was to create an AI medical triage agent that:

Maintains patient safety for genuine emergencies.
Accurately de-escalates minor injuries to appropriate non-urgent levels.
Provides transparent reasoning for its decisions.

The Solution: Unleashing GRPO with Unsloth

To tackle this challenge, we leveraged a powerful combination of technologies:

Unsloth: For efficient and significantly faster fine-tuning of Large Language Models (LLMs). Unsloth's optimizations were critical in iterating quickly on our RL experiments within Google Colab's High-RAM environment.
GRPO (Group Relative Policy Optimization): A cutting-edge Reinforcement Learning (RL) algorithm designed for LLM alignment. Unlike traditional Supervised Fine-Tuning (SFT) which teaches imitation, GRPO enables the model to reason through a reward-based feedback loop.

Why GRPO for Medical Triage?

GRPO offers several key advantages for this specific task:

"Critic-Less" RL: It sidesteps the need for a separate Value (Critic) model, which drastically reduces VRAM requirements and training time. This allowed us to align a robust Llama-3.1 8B model effectively.
Comparative Learning: For each medical scenario, GRPO generates a "group" of possible reasoning paths and triage decisions. It then compares these generated responses against a human-defined reward function.
Targeted Alignment: This comparative approach allowed us to specifically penalize "over-triage" while rewarding clinically sound and safe decisions.

Our Training Journey: Overcoming Colab's Challenges

The path to ZeroTime-Bot wasn't without its hurdles:

Colab Runtime Instability: Initial attempts faced frequent NameError and ModuleNotFoundError due to Colab runtime resets and library version mismatches.
High-RAM Requirement: The 16-bit merge phase of the 8B Llama model required a dedicated High-RAM Colab Pro instance to prevent memory crashes.
GGUF Export Issues: Ensuring the final 5GB .gguf file correctly synced to Google Drive for Hugging Face upload required specific shutil.copy commands to bypass Colab's syncing latency.
Ollama Integration: Getting the model running locally on Mac via Ollama involved precise Modelfile creation and troubleshooting invalid model name errors, reinforcing the importance of clean terminal commands.

The Heart of Alignment: Our Reward Function

The core of GRPO's success lay in our custom reward functions, especially the correctness_reward_func. This function wasn't just about matching an exact answer; it was about shaping the model's judgment:

def correctness_reward_func(prompts, completions, answer, **kwargs):
    """
    Rewards the model for accurately assigning triage levels (1, 2, or 3),
    with a specific focus on penalizing over-triage for minor conditions.
    """
    responses = [completion[0]['content'] for completion in completions]
    extracted = [re.search(r"<answer>(.*?)</answer>", r).group(1).strip() if re.search(r"<answer>(.*?)</answer>", r) else "" for r in responses]
    
    rewards = []
    for ext, ans in zip(extracted, answer):
        # High reward for correct match, especially for de-escalation
        if str(ext) == str(ans):
            rewards.append(2.0)
        # Moderate penalty for over-triage (e.g., Level 3 becomes Level 1/2)
        elif int(ext) < int(ans): # If the model triaged higher than it should have
            rewards.append(-1.0)
        else: # Incorrect or under-triage
            rewards.append(0.0) # Or a smaller penalty
    return rewards

# An additional reward function encouraged structured <reasoning> and <answer> tags.

This reward structure directly addressed the "stubbed toe" problem, teaching the model that a stable patient with a non-critical injury should not be triaged as an emergency, even if they report pain.

The Proof: Evaluation Results

Our GRPO-aligned ZeroTime-Bot significantly improved triage accuracy compared to the base Llama-3.1 model:

Patient Scenario	Base Llama-3.1	ZeroTime-Bot (GRPO)	Status
Stubbed Toe (Walks, Redness)	Level 1 (Emergency)	Level 3 (Non-Urgent)	✅ Fixed Bias
Chest Pain (Shortness of breath)	Level 1 (Emergency)	Level 1 (Emergency)	✅ Kept Safety
Minor Fever (Stable Vitals)	Level 2 (Urgent)	Level 3 (Non-Urgent)	✅ Fixed Bias

🚀 Experience ZeroTime-Bot Yourself!

You can test ZeroTime-Bot locally on your own machine:

Install Ollama: Download and install Ollama from ollama.com.
Download Model Weights: Get the medical_triage_q4_k_m.gguf file from the "Files and versions" tab of our Hugging Face Repository.

Prepare Modelfile: In the same directory as the GGUF, create a Modelfile with the following content (update the FROM path to your downloaded GGUF):

FROM /Users/yourname/path/to/Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf

PARAMETER temperature 0.1

SYSTEM """
You are a professional medical triage officer. 
Assign a level: 1 (Emergency), 2 (Urgent), or 3 (Non-Urgent).
Be precise and do not over-triage minor injuries.
"""

Create and Run: Open your terminal in the same directory and execute:
```
ollama create medicalbot -f Modelfile
ollama run medicalbot
```

🔗 Project Links

Hugging Face Model: hjogidasani/medical-triage-llama-3.1-8b
GitHub Repository: DevCraft89/MedicalBot

Models mentioned in this article 1

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote