---
license: mit
language: en
datasets:
- SetFit/amazon_reviews_multi_en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- sentiment-analysis
- roberta
- multi-class-classification
---

# RoBERTa Fine-tuned on Amazon Reviews (5-Star Rating)

## Model Description

This model is a fine-tuned version of `roberta-base` for 5-class sentiment classification: given the text of an Amazon product review, it predicts the star rating (1 to 5).

## Comparison with DistilBERT

This model was trained as part of a model comparison study:

| Model | Parameters | Accuracy | Off-by-one Accuracy | Inference Speed |
|-------|------------|----------|---------------------|-----------------|
| DistilBERT | 67M | 54.95% | 92.45% | 1.83x faster |
| **RoBERTa** | **125M** | **59.90%** | **95.10%** | Baseline |

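RoBERTa scores about 5 percentage points higher in exact accuracy (59.90% vs. 54.95%) and 2.65 points higher in off-by-one accuracy (95.10% vs. 92.45%), at the cost of slower inference: DistilBERT runs 1.83x faster.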
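
The two accuracy columns can be computed along the following lines. This is a minimal sketch, not the study's evaluation code: it assumes "off-by-one accuracy" means the share of predictions that land within one star of the true rating, and the function names and sample ratings are hypothetical.

```python
from typing import Sequence


def exact_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true star rating exactly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def off_by_one_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions at most one star away from the true rating."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ratings, for illustration only (not taken from the study)
true_stars = [5, 4, 3, 1, 2]
pred_stars = [5, 5, 1, 1, 3]

print(exact_accuracy(true_stars, pred_stars))       # 0.4
print(off_by_one_accuracy(true_stars, pred_stars))  # 0.8
```

Under this definition, off-by-one accuracy is always at least as high as exact accuracy, which is consistent with the table above.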

## Training Data

- **Dataset**: SetFit/amazon_reviews_multi_en
- **Train samples**: 20,000 (subset)
- **Test samples**: 2,000 (subset)
- **Classes**: 1 star, 2 stars, 3 stars, 4 stars, 5 stars

## Training Procedure

- **Base model**: roberta-base
- **Epochs**: 3
- **Batch size**: 16
- **Learning rate**: 2e-5
- **Max sequence length**: 256
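
As a rough sanity check on training length, the settings above imply the number of optimizer steps computed below. This sketch assumes a single device, no gradient accumulation, and that a final partial batch would be kept; none of these are stated in this card, and the variable names are illustrative.

```python
import math

train_samples = 20_000  # training subset size listed under "Training Data"
batch_size = 16
epochs = 3

# Assumption: one optimizer step per batch (no gradient accumulation)
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 1250 3750
```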

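## Usage

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
classifier = pipeline("text-classification", model="Nav772/roberta-amazon-reviews-5star")

# Returns a list with one dict per input, each holding the top `label` and its `score`
result = classifier("This product exceeded my expectations! Great quality.")
print(result)
```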

## When to Use This Model

- Choose **RoBERTa** when accuracy is the priority and latency is less critical
- Choose **DistilBERT** when you need faster inference or have resource constraints
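
To check whether the latency difference matters for a given workload, one option is to time both models on representative inputs. The helper below is a hypothetical sketch: the name `mean_latency` and the stand-in `dummy_predict` are not from the study, and how the 1.83x figure was measured is not stated in this card. In practice `predict` would be a `transformers` pipeline like the one under Usage.

```python
import time
from typing import Callable, Sequence


def mean_latency(predict: Callable[[str], object],
                 texts: Sequence[str],
                 warmup: int = 1) -> float:
    """Average wall-clock seconds per call of `predict` over `texts`."""
    if not texts:
        raise ValueError("texts must not be empty")
    # Warm-up calls so that one-time setup cost is not counted
    for text in texts[:warmup]:
        predict(text)
    start = time.perf_counter()
    for text in texts:
        predict(text)
    return (time.perf_counter() - start) / len(texts)


# Stand-in for a real classifier, for illustration only
def dummy_predict(text: str) -> int:
    return len(text)


reviews = ["Great quality.", "Stopped working after a week."]
latency = mean_latency(dummy_predict, reviews)
print(f"{latency * 1000:.3f} ms per review")
```

Dividing the mean latency of one model by that of the other, measured on the same inputs and hardware, gives a speedup ratio of the same kind as the one in the comparison table.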

## Demo

Try the model comparison demo: [sentiment-model-comparison](https://huggingface.co/spaces/Nav772/sentiment-model-comparison)

## Limitations

- Trained on Amazon product reviews; may not generalize to other review domains
- Adjacent star ratings (e.g., 2 vs 3 stars) are inherently difficult to distinguish
- English language only