RoBERTa Medical Triage

This repository contains a fine-tuned RoBERTa model for medical pre-triage text classification. The model receives a natural-language symptom description and predicts one of three triage-oriented risk levels:

  • self_monitor
  • consult_gp
  • urgent

This model was developed as part of a bachelor's thesis project by Cristian Untaru at the West University of Timișoara, Faculty of Informatics.

Model Description

The model is based on FacebookAI/roberta-base and was fine-tuned for a three-class medical pre-triage classification task.

RoBERTa is a general-purpose Transformer encoder that keeps the BERT architecture but revises its pretraining recipe: longer training on more data, larger batches, dynamic masking, and no next-sentence-prediction objective. In this project, it is used to test whether a general-language RoBERTa model can perform well on symptom-based medical pre-triage classification after fine-tuning.

The model is not a diagnostic system. It is designed for academic and experimental use in the context of natural language processing and medical pre-triage research.

Intended Use

The model can be used for:

  • classifying symptom descriptions into preliminary triage levels;
  • supporting an academic medical pre-triage assistant prototype;
  • comparing transformer-based models for medical text classification;
  • experimenting with NLP-based symptom interpretation.

Example input:

I have chest pain, shortness of breath, and I feel dizzy.

Example output:

urgent

Labels

The model predicts one of the following labels:

| Label | Meaning |
|---|---|
| self_monitor | The symptoms may be monitored by the patient, assuming no worsening or additional warning signs. |
| consult_gp | The patient should consider consulting a general practitioner or a non-emergency medical professional. |
| urgent | The symptoms may require urgent medical attention or emergency evaluation. |

The label mapping is also available in the repository in label_map.json.
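
As a minimal sketch of working with such a mapping, the snippet below parses a JSON label map and builds both lookup directions. The JSON layout and the ID order shown here are assumptions for illustration; the actual contents of label_map.json are not reproduced in this card and should be checked in the repository. The helper name `load_label_maps` is hypothetical.

```python
import json

# Hypothetical contents of a label map file. The real label_map.json may use
# a different layout or a different ID order.
example_label_map_json = '{"0": "self_monitor", "1": "consult_gp", "2": "urgent"}'


def load_label_maps(raw_json: str) -> tuple[dict[int, str], dict[str, int]]:
    """Parse a JSON object of the form {"<id>": "<label>"} into two dicts."""
    raw = json.loads(raw_json)
    # JSON object keys are always strings, so convert them back to integers.
    id2label = {int(class_id): label for class_id, label in raw.items()}
    label2id = {label: class_id for class_id, label in id2label.items()}
    return id2label, label2id


id2label, label2id = load_label_maps(example_label_map_json)
print(id2label[2])
print(label2id["consult_gp"])
```

Converting the keys to integers matters because the class index returned by `argmax` is an integer, while keys read from JSON are strings.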

Dataset

The classifier was fine-tuned on cristian-untaru/symcat-medical-triage-dataset, a SymCAT-derived dataset created for symptom-based medical pre-triage classification.

The dataset contains natural-language medical symptom examples labeled into three triage categories:

  • self_monitor
  • consult_gp
  • urgent

MedQuAD was not used for fine-tuning this classifier. Instead, the processed MedQuAD corpus is used separately in the broader assistant system as a retrieval dataset for contextual medical question-answer information:

cristian-untaru/medquad-retrieval-pretriage

Dataset split used for fine-tuning:

| Split | Number of examples |
|---|---|
| Train | 490 |
| Validation | 105 |
| Test | 106 |
| Total | 701 |

The split was stratified in order to preserve the distribution of the three triage classes across train, validation, and test subsets.
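
The following is a minimal sketch of a stratified split in plain Python, using roughly the 70/15/15 proportions implied by the split sizes. It is not the script used for this project: the function name `stratified_split`, the rounding rule, and the toy labels are assumptions, and the exact per-split counts depend on how rounding is handled within each class.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists with similar class proportions."""
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        # Whatever remains goes to the test split, so no example is dropped.
        test += indices[n_train + n_val:]
    return train, val, test


# Toy labels (hypothetical, not the real class distribution).
toy_labels = ["self_monitor"] * 40 + ["consult_gp"] * 40 + ["urgent"] * 20
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting each class separately is what keeps the class proportions similar across the three subsets; a single global shuffle would only do so on average.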

Training Details

Base model:

FacebookAI/roberta-base

Training configuration:

| Parameter | Value |
|---|---|
| Maximum sequence length | 128 |
| Epochs | 5 |
| Train batch size | 16 |
| Evaluation batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Seed | 42 |
| Mixed precision | FP16 |
| Early stopping patience | 3 |
| Best model metric | Macro F1 |
| GPU | Tesla T4 |
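
To make the warmup ratio concrete, here is a rough sketch of the step arithmetic and of a linear warmup followed by linear decay. It assumes a single GPU, no gradient accumulation, and a linear scheduler, none of which is stated in the configuration above; the helper `linear_schedule_lr` is a hypothetical name, not a library function.

```python
import math

TRAIN_EXAMPLES = 490   # size of the training split
BATCH_SIZE = 16
EPOCHS = 5
BASE_LR = 2e-5
WARMUP_RATIO = 0.1

# Assumes one optimizer step per batch (no gradient accumulation, one device).
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_schedule_lr(step: int) -> float:
    """Learning rate at a given optimizer step (assumed linear schedule)."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to BASE_LR during warmup.
        return BASE_LR * step / warmup_steps
    # Decay linearly from BASE_LR to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
print(linear_schedule_lr(warmup_steps))
```

Under these assumptions the step count is an upper bound, since early stopping can end training before the final epoch.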

Additional training metadata is included in:

  • training_args.bin
  • training_config.json

Evaluation Results

The model was evaluated on the held-out test set.

| Metric | Value |
|---|---|
| Accuracy | 0.6792 |
| Macro Precision | 0.6908 |
| Macro Sensitivity | 0.6742 |
| Macro Specificity | 0.8355 |
| Macro F1 | 0.6772 |
| Macro AUC (OvR) | 0.8523 |
| Macro IoU / Jaccard | 0.5181 |
| Test Loss | 0.8294 |

Per-class precision, sensitivity, and F1-score:

| Class | Precision | Sensitivity | F1-score | Support |
|---|---|---|---|---|
| self_monitor | 0.68 | 0.70 | 0.69 | 33 |
| consult_gp | 0.63 | 0.60 | 0.62 | 40 |
| urgent | 0.65 | 0.67 | 0.66 | 33 |
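
For readers unfamiliar with the macro-averaged metrics reported in this section, the sketch below derives them from a confusion matrix in plain Python. The 3×3 matrix is a made-up toy example, not the confusion matrix of this model, and the function name `macro_metrics` is hypothetical. AUC is left out because it requires predicted probabilities rather than hard predictions.

```python
def macro_metrics(confusion):
    """Macro-averaged metrics from a square confusion matrix.

    confusion[i][j] counts examples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    names = ("precision", "sensitivity", "specificity", "f1", "iou")
    sums = dict.fromkeys(names, 0.0)

    for k in range(n_classes):
        # One-vs-rest counts for class k.
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        tn = total - tp - fn - fp
        sums["precision"] += tp / (tp + fp) if tp + fp else 0.0
        sums["sensitivity"] += tp / (tp + fn) if tp + fn else 0.0
        sums["specificity"] += tn / (tn + fp) if tn + fp else 0.0
        sums["f1"] += 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        sums["iou"] += tp / (tp + fp + fn) if tp + fp + fn else 0.0

    # Macro averaging gives every class the same weight, regardless of support.
    macro = {name: value / n_classes for name, value in sums.items()}
    macro["accuracy"] = correct / total
    return macro


# Toy confusion matrix (hypothetical). Rows are true classes, columns are
# predictions, in the order self_monitor, consult_gp, urgent.
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]
toy_result = macro_metrics(toy_confusion)
for name, value in toy_result.items():
    print(name, round(value, 4))
```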

How to Use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo_id = "cristian-untaru/roberta-medical-triage"

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

text = "I have chest pain, shortness of breath, and I feel dizzy."

# Tokenize with the same maximum sequence length used during fine-tuning.
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=128
)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    # Convert logits to class probabilities and pick the most likely class.
    probabilities = torch.softmax(outputs.logits, dim=-1)
    predicted_class_id = torch.argmax(probabilities, dim=-1).item()

# Map the predicted class ID back to its triage label.
id2label = model.config.id2label

predicted_label = id2label[predicted_class_id]
confidence = probabilities[0][predicted_class_id].item()

print("Predicted label:", predicted_label)
print("Confidence:", round(confidence, 4))
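
The confidence value in this example is the softmax probability of the predicted class. As a small illustration of that step without loading the model, the sketch below applies a numerically stable softmax to a made-up logit vector. The logits and the label order are hypothetical; the real order should be read from model.config.id2label.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    # Subtracting the maximum avoids overflow in exp() for large logits.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits and label order, for illustration only.
labels = ["self_monitor", "consult_gp", "urgent"]
logits = [-1.2, 0.3, 2.1]

probabilities = softmax(logits)
# Index of the highest probability, equivalent to argmax.
predicted_id = max(range(len(probabilities)), key=probabilities.__getitem__)

print(labels[predicted_id], round(probabilities[predicted_id], 4))
```

Softmax scores from a fine-tuned classifier are not necessarily calibrated, so a high value here indicates the model's relative preference among the three labels rather than a measured likelihood.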

Repository Files

| File | Description |
|---|---|
| config.json | Model architecture and classification configuration. |
| model.safetensors | Fine-tuned model weights. |
| tokenizer.json | Tokenizer vocabulary and processing pipeline. |
| tokenizer_config.json | Tokenizer configuration. |
| vocab.json | RoBERTa tokenizer vocabulary file. |
| merges.txt | RoBERTa byte-pair encoding merge rules. |
| label_map.json | Mapping between class IDs and triage labels. |
| training_args.bin | Training arguments saved by the Hugging Face Trainer. |
| training_config.json | Additional training configuration for reproducibility. |
| .gitattributes | Git LFS configuration for large model files. |
| README.md | Model card documentation. |

Related Dataset and Model Repositories

Related dataset repositories:

  • cristian-untaru/symcat-medical-triage-dataset
  • cristian-untaru/medquad-retrieval-pretriage

Related model repositories:

A fourth model repository may be added separately after training and publication.

Limitations

This model has several important limitations:

  • It was trained on a small academic dataset.
  • It should not be used as a standalone medical decision-making system.
  • It does not replace professional medical advice, diagnosis, or treatment.
  • It may produce incorrect predictions for ambiguous, incomplete, rare, or complex symptom descriptions.
  • It does not have access to patient history, vital signs, medical records, age, comorbidities, medication history, or physical examination findings.
  • The model was trained on English text and should not be assumed to perform reliably in other languages without additional validation.
  • The labels were derived through weak supervision and rule-based triage logic, not through clinically validated manual annotation.
  • The model is based on a general-purpose RoBERTa checkpoint, not a biomedical-domain RoBERTa model.
  • The final test performance is lower than the validation performance, which may reflect the small dataset size, weak-supervision labeling noise, and the difficulty of generalizing across symptom-condition examples.
  • In manual qualitative tests, the model may show uncertainty or a tendency to favor consult_gp for some ambiguous or severe symptom descriptions; results should therefore be interpreted with caution.

Medical Disclaimer

This model is intended only for academic, research, and prototype development purposes. It does not provide medical diagnosis and must not be used as a substitute for professional medical judgment.

In case of severe, worsening, or life-threatening symptoms, users should contact emergency medical services or a qualified healthcare professional.

Author

Cristian Untaru
Faculty of Informatics
West University of Timișoara

