RoBERTa Medical Triage

This repository contains a fine-tuned RoBERTa model for medical pre-triage text classification. The model receives a natural-language symptom description and predicts one of three triage-oriented risk levels:

self_monitor
consult_gp
urgent

This model was developed as part of a bachelor's thesis project by Cristian Untaru at the West University of Timișoara, Faculty of Informatics.

Model Description

The model is based on FacebookAI/roberta-base and was fine-tuned for a three-class medical pre-triage classification task.

RoBERTa is a general-purpose Transformer encoder that keeps the BERT architecture but revises its pretraining procedure: it trains longer on more data with larger batches, uses dynamic masking, and drops the next-sentence prediction objective. In this project, it is used to evaluate whether a general-language RoBERTa model can perform effectively on symptom-based medical pre-triage classification after fine-tuning.

The model is not a diagnostic system. It is designed for academic and experimental use in the context of natural language processing and medical pre-triage research.

Intended Use

The model can be used for:

classifying symptom descriptions into preliminary triage levels;
supporting an academic medical pre-triage assistant prototype;
comparing transformer-based models for medical text classification;
experimenting with NLP-based symptom interpretation.

Example input:

I have chest pain, shortness of breath, and I feel dizzy.

Example output:

urgent

Labels

The model predicts one of the following labels:

Label	Meaning
`self_monitor`	The symptoms may be monitored by the patient, assuming no worsening or additional warning signs.
`consult_gp`	The patient should consider consulting a general practitioner or a non-emergency medical professional.
`urgent`	The symptoms may require urgent medical attention or emergency evaluation.

The label mapping is also available in the repository in label_map.json.
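The layout of label_map.json is not shown in this card, so the following is only a minimal sketch of how such a mapping could be read into the usual `id2label` / `label2id` pair. The JSON string, the ID order, and the helper name `load_label_map` are assumptions for illustration; check the actual file before relying on any particular ID.

```python
import json
from typing import Dict, Tuple

# ASSUMPTION: illustrative contents only. The real label_map.json may use a
# different layout or a different ID order.
EXAMPLE_LABEL_MAP_JSON = '{"0": "self_monitor", "1": "consult_gp", "2": "urgent"}'


def load_label_map(raw_json: str) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Parse an id -> label JSON object and build both lookup directions."""
    raw = json.loads(raw_json)
    # JSON object keys are always strings, so convert them back to integers.
    id2label = {int(class_id): label for class_id, label in raw.items()}
    label2id = {label: class_id for class_id, label in id2label.items()}
    return id2label, label2id


id2label, label2id = load_label_map(EXAMPLE_LABEL_MAP_JSON)
print(id2label)
print(label2id)
```

To use the real file, read its text with `open("label_map.json").read()` and pass it to the helper, adapting the parsing if the layout differs.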

Dataset

The classifier was fine-tuned on cristian-untaru/symcat-medical-triage-dataset, a SymCAT-derived dataset created for symptom-based medical pre-triage classification.

The dataset contains natural-language medical symptom examples labeled into three triage categories:

self_monitor
consult_gp
urgent

MedQuAD was not used for fine-tuning this classifier. Instead, the processed MedQuAD corpus is used separately in the broader assistant system as a retrieval dataset for contextual medical question-answer information:

cristian-untaru/medquad-retrieval-pretriage

Dataset split used for fine-tuning:

Split	Number of examples
Train	490
Validation	105
Test	106
Total	701

The split is roughly 70/15/15 and was stratified to preserve the distribution of the three triage classes across the train, validation, and test subsets.
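As an illustration of what stratification means here, the sketch below splits example indices class by class so that each subset keeps roughly the same label proportions. It is not the script used for this project (libraries such as scikit-learn offer an equivalent `stratify` option); the function name `stratified_split`, the 70/15/15 fractions, and the toy label counts are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split example indices into train/validation/test, one class at a time."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Hypothetical toy labels; these class counts are NOT those of the real dataset.
labels = ["self_monitor"] * 40 + ["consult_gp"] * 40 + ["urgent"] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
```

Because the rounding is done per class, the subset sizes on a real dataset can differ by an example or two from the exact fractions.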

Training Details

Base model:

FacebookAI/roberta-base

Training configuration:

Parameter	Value
Maximum sequence length	128
Epochs	5
Train batch size	16
Evaluation batch size	32
Learning rate	2e-5
Weight decay	0.01
Warmup ratio	0.1
Seed	42
Mixed precision	FP16
Early stopping patience	3
Best model metric	Macro F1
GPU	Tesla T4
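
To make the schedule implied by this configuration concrete, the sketch below derives the number of optimizer steps and warmup steps from the values in the tables above. It assumes a single GPU, no gradient accumulation, a kept final partial batch, warmup steps rounded up, and a run that is not cut short by early stopping; none of these details are stated in this card.

```python
import math

# Values taken from the tables in this model card.
TRAIN_EXAMPLES = 490
TRAIN_BATCH_SIZE = 16
EPOCHS = 5
WARMUP_RATIO = 0.1


def schedule_steps(num_examples, batch_size, epochs, warmup_ratio):
    """Return (steps_per_epoch, total_steps, warmup_steps) for a full run."""
    # ASSUMPTION: one optimizer step per batch, last partial batch included.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    # ASSUMPTION: the warmup length is rounded up to a whole step.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


steps_per_epoch, total_steps, warmup_steps = schedule_steps(
    TRAIN_EXAMPLES, TRAIN_BATCH_SIZE, EPOCHS, WARMUP_RATIO
)
print(steps_per_epoch, total_steps, warmup_steps)  # 31 155 16
```

Under these assumptions the learning rate ramps up over the first 16 of at most 155 steps; with early stopping the run may end before all 155 steps are taken.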

Additional training metadata is included in:

training_args.bin
training_config.json

Evaluation Results

The model was evaluated on the held-out test set.

Metric	Value
Accuracy	0.6792
Macro Precision	0.6908
Macro Sensitivity	0.6742
Macro Specificity	0.8355
Macro F1	0.6772
Macro AUC OvR	0.8523
Macro IoU / Jaccard	0.5181
Test Loss	0.8294

Per-class precision, sensitivity, and F1-score:

Class	Precision	Sensitivity	F1-score	Support
`self_monitor`	0.68	0.70	0.69	33
`consult_gp`	0.63	0.60	0.62	40
`urgent`	0.65	0.67	0.66	33
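
For readers unfamiliar with the macro-averaged metrics listed above, the following sketch shows how they can be derived from a confusion matrix using one-vs-rest counts per class (sensitivity is the same quantity as recall). The function name `macro_metrics` and the toy matrix are assumptions for illustration; the matrix is not this model's confusion matrix. Macro AUC is omitted because it requires predicted probabilities, not just counts.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def macro_metrics(cm):
    """Accuracy and macro-averaged one-vs-rest metrics.

    cm is a square confusion matrix: rows are true classes, columns are
    predicted classes.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = {"precision": [], "sensitivity": [], "specificity": [],
                 "f1": [], "iou": []}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true k, predicted other
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, true other
        tn = total - tp - fn - fp
        per_class["precision"].append(_ratio(tp, tp + fp))
        per_class["sensitivity"].append(_ratio(tp, tp + fn))
        per_class["specificity"].append(_ratio(tn, tn + fp))
        per_class["f1"].append(_ratio(2 * tp, 2 * tp + fp + fn))
        per_class["iou"].append(_ratio(tp, tp + fp + fn))
    result = {name: sum(values) / n for name, values in per_class.items()}
    result["accuracy"] = _ratio(sum(cm[k][k] for k in range(n)), total)
    return result


# Hypothetical toy matrix (NOT this model's results).
toy_cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
for name, value in macro_metrics(toy_cm).items():
    print(f"{name}: {value:.4f}")
```

Macro averaging weights every class equally, which is why it can differ from accuracy when class supports are unequal.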

How to Use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo_id = "cristian-untaru/roberta-medical-triage"

# Load the tokenizer and the fine-tuned classifier from the Hub.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

text = "I have chest pain, shortness of breath, and I feel dizzy."

# Tokenize with the same maximum sequence length used during training.
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=128
)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=-1)
    predicted_class_id = torch.argmax(probabilities, dim=-1).item()

# Map the predicted class ID back to its triage label.
id2label = model.config.id2label

predicted_label = id2label[predicted_class_id]
confidence = probabilities[0][predicted_class_id].item()

print("Predicted label:", predicted_label)
print("Confidence:", round(confidence, 4))
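
The snippet above reports only the top label. When the full distribution over the three labels is of interest, a small helper like the one below can rank every label by probability. This is a dependency-free sketch: `softmax` and `rank_labels` are hypothetical helper names, and the logits and ID order in the example are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up logits and ID order, for illustration only.
example_id2label = {0: "self_monitor", 1: "consult_gp", 2: "urgent"}
for label, prob in rank_labels([0.2, 1.1, 2.4], example_id2label):
    print(f"{label}: {prob:.3f}")
```

With the objects from the snippet above, the call would be `rank_labels(outputs.logits[0].tolist(), model.config.id2label)`.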

Repository Files

File	Description
`config.json`	Model architecture and classification configuration.
`model.safetensors`	Fine-tuned model weights.
`tokenizer.json`	Tokenizer vocabulary and processing pipeline.
`tokenizer_config.json`	Tokenizer configuration.
`vocab.json`	RoBERTa tokenizer vocabulary file.
`merges.txt`	RoBERTa byte-pair encoding merge rules.
`label_map.json`	Mapping between class IDs and triage labels.
`training_args.bin`	Training arguments saved by the Hugging Face Trainer.
`training_config.json`	Additional training configuration for reproducibility.
`.gitattributes`	Git LFS configuration for large model files.
`README.md`	Model Card documentation.

Related Dataset and Model Repositories

Related dataset repositories:

Related model repositories:

A fourth model repository may be added separately after training and publication.

Limitations

This model has several important limitations:

It was trained on a small academic dataset.
It should not be used as a standalone medical decision-making system.
It does not replace professional medical advice, diagnosis, or treatment.
It may produce incorrect predictions for ambiguous, incomplete, rare, or complex symptom descriptions.
It does not have access to patient history, vital signs, medical records, age, comorbidities, medication history, or physical examination findings.
The model was trained in English and should not be assumed to perform reliably on other languages without additional validation.
The labels were derived through weak supervision and rule-based triage logic, not through clinically validated manual annotation.
The model is based on a general-purpose RoBERTa checkpoint, not a biomedical-domain RoBERTa model.
The final test performance is lower than the validation performance, which may reflect the small dataset size, weak-supervision labeling noise, and the difficulty of generalizing across symptom-condition examples.
In manual qualitative tests, the model may show uncertainty or a tendency to favor consult_gp for some ambiguous or severe symptom descriptions; results should therefore be interpreted with caution.

Medical Disclaimer

This model is intended only for academic, research, and prototype development purposes. It does not provide medical diagnosis and must not be used as a substitute for professional medical judgment.

In case of severe, worsening, or life-threatening symptoms, users should contact emergency medical services or a qualified healthcare professional.

Author

Cristian Untaru
Faculty of Informatics
West University of Timișoara
