license: mit
language: en
datasets:
  - SetFit/amazon_reviews_multi_en
metrics:
  - accuracy
pipeline_tag: text-classification
tags:
  - sentiment-analysis
  - roberta
  - multi-class-classification

RoBERTa Fine-tuned on Amazon Reviews (5-Star Rating)

Model Description

This model is a fine-tuned version of roberta-base for 5-class sentiment classification, predicting star ratings (1-5) from Amazon product reviews.

Comparison with DistilBERT

This model was trained as part of a model comparison study:

Model	Parameters	Accuracy	Off-by-one Accuracy	Inference Speed
DistilBERT	67M	54.95%	92.45%	1.83x faster
RoBERTa	125M	59.90%	95.10%	Baseline

RoBERTa is about 5 percentage points more accurate than DistilBERT (59.90% vs. 54.95%), while DistilBERT runs inference 1.83x faster.
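The card does not define off-by-one accuracy. A minimal sketch, assuming it means the share of predictions that land within one star of the true rating (the function names and the toy labels below are illustrative, not taken from the study):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true star rating exactly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def off_by_one_accuracy(y_true, y_pred):
    """Fraction of predictions within one star of the true rating (assumed definition)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy star ratings (1-5), made up for illustration only
y_true = [1, 2, 3, 4, 5, 5, 1, 3]
y_pred = [1, 3, 3, 5, 5, 3, 2, 1]

print(accuracy(y_true, y_pred))             # 0.375
print(off_by_one_accuracy(y_true, y_pred))  # 0.75
```

Under this definition, every exact match also counts as an off-by-one match, so the second number can never be lower than the first.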

Training Data

Dataset: SetFit/amazon_reviews_multi_en
Train samples: 20,000 (subset)
Test samples: 2,000 (subset)
Classes: 1 star, 2 stars, 3 stars, 4 stars, 5 stars

Training Procedure

Base model: roberta-base
Epochs: 3
Batch size: 16
Learning rate: 2e-5
Max sequence length: 256
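For a sense of the training budget these settings imply, here is a rough calculation of the number of optimizer steps. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these are stated in the card.

```python
import math

# Values taken from the card
train_samples = 20_000
batch_size = 16
epochs = 3

# Assumption: one optimizer step per batch (single device, no gradient accumulation)
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 1250
print(total_steps)      # 3750
```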

Usage

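from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline("text-classification", model="Nav772/roberta-amazon-reviews-5star")

# Classify one review; the result is a list with one dict holding the predicted label and its score
result = classifier("This product exceeded my expectations! Great quality.")
print(result)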
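Because adjacent star ratings are often close, it can help to look at the scores for all five classes instead of only the top label. Calling the pipeline with `top_k=None` returns a score per class. The sketch below post-processes such a list into a score-weighted average rating. The `LABEL_0` to `LABEL_4` names and their mapping to 1-5 stars are assumptions (check the model's `id2label` config for the real names), and the example scores are made up, not actual model output.

```python
def expected_stars(scores, label_to_star):
    """Score-weighted average star rating from a list of {"label", "score"} dicts."""
    total = sum(item["score"] for item in scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    weighted = sum(label_to_star[item["label"]] * item["score"] for item in scores)
    return weighted / total


# Hypothetical label names; the model's real labels may differ
LABEL_TO_STAR = {f"LABEL_{i}": i + 1 for i in range(5)}

# Made-up scores in the shape returned for one text with top_k=None
example_scores = [
    {"label": "LABEL_0", "score": 0.0},
    {"label": "LABEL_1", "score": 0.0},
    {"label": "LABEL_2", "score": 0.0},
    {"label": "LABEL_3", "score": 0.5},
    {"label": "LABEL_4", "score": 0.5},
]

print(expected_stars(example_scores, LABEL_TO_STAR))  # 4.5
```

A review split evenly between 4 and 5 stars yields 4.5 here, which conveys more than a single hard label would.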

When to Use This Model

Choose RoBERTa when accuracy is the priority and latency is less critical
Choose DistilBERT when you need faster inference or have resource constraints

Demo

Try the model comparison demo: sentiment-model-comparison

Limitations

Trained on Amazon product reviews; may not generalize to other review domains
Adjacent star ratings (e.g., 2 vs 3 stars) are inherently difficult to distinguish
English language only