Introduction
TuPi-BERT-Base is a BERT model fine-tuned for binary classification of hate speech in Portuguese. It is derived from BERTimbau base and specialized for hate speech detection. For more details about the underlying model, please refer to the BERTimbau repository.
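Because the task is binary, the classification head produces two logits per text, and the probability of the positive class reduces to a sigmoid of their difference: softmax(z)[1] = sigmoid(z[1] - z[0]). The NumPy sketch below checks this identity and applies a decision threshold. Treating index 1 as the hate-speech class and using a 0.5 threshold are both hypothetical choices; the real label mapping should be read from `config.id2label`.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def positive_probability(logits, positive_index=1):
    """P(positive class) from a batch of two-class logits, shape (batch, 2).

    positive_index=1 is a hypothetical choice; check config.id2label.
    """
    logits = np.asarray(logits, dtype=float)
    other = 1 - positive_index
    # With exactly two classes, softmax(z)[k] == sigmoid(z[k] - z[other]).
    return sigmoid(logits[:, positive_index] - logits[:, other])

logits = np.array([[2.0, -1.0], [0.3, 1.7]])
p = positive_probability(logits)
p_softmax = softmax(logits)[:, 1]   # same values, computed the long way
decisions = p >= 0.5                # hypothetical 0.5 decision threshold
print(p, p_softmax, decisions)
```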
The performance of language models can vary notably when the domain of the test data differs from that of the training data. To build a Portuguese language model specialized for hate speech classification, the original BERTimbau model was fine-tuned on the TuPi Hate Speech Dataset, which was collected from several social networks.
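The fine-tuning step is a standard supervised training loop. A minimal sketch follows, assuming PyTorch and Transformers are installed. To keep it offline and fast, it swaps in a tiny randomly initialized BERT and synthetic token ids for BERTimbau and the TuPi dataset; these substitutions and all hyperparameters are hypothetical and do not describe the authors' actual training setup.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)

# Hypothetical tiny configuration so the sketch runs offline in seconds.
# The real TuPi models start from pretrained BERTimbau weights instead.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    hidden_dropout_prob=0.0,
    attention_probs_dropout_prob=0.0,
    num_labels=2,  # binary: hate speech vs. not
)
model = BertForSequenceClassification(config)

# Synthetic stand-in for a tokenized in-domain batch (8 texts, 16 tokens each).
input_ids = torch.randint(1, config.vocab_size, (8, 16))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

def train_step():
    model.train()
    # Passing integer labels makes the model return a cross-entropy loss.
    out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

losses = [train_step() for _ in range(50)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

In a real run, the batch would come from the tokenizer applied to the labeled social-network texts, iterated over several epochs.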
Available models
| Model | Arch. | #Layers | #Params |
|---|---|---|---|
| FpOliveira/tupi-bert-base-portuguese-cased | BERT-Base | 12 | 109M |
| FpOliveira/tupi-bert-large-portuguese-cased | BERT-Large | 24 | 334M |
| FpOliveira/tupi-bert-base-portuguese-cased-multiclass-multilabel | BERT-Base | 12 | 109M |
| FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel | BERT-Large | 24 | 334M |
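The parameter counts in the table can be reproduced from the standard BERT layer shapes. The sketch below counts the embeddings, encoder layers, pooler and a two-label linear head. The vocabulary size of 29,794 is an assumption taken from BERTimbau's published configuration; the multiclass-multilabel variants would differ only by their slightly larger output layer.

```python
def bert_classifier_params(vocab_size, hidden, layers, intermediate,
                           max_positions=512, type_vocab=2, num_labels=2):
    """Count parameters of a BERT encoder + pooler + linear classification head."""

    def linear(n_in, n_out):
        # Weight matrix plus bias vector.
        return n_in * n_out + n_out

    # Word, position and token-type tables plus one LayerNorm (weight + bias).
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    per_layer = (
        3 * linear(hidden, hidden)        # query, key, value projections
        + linear(hidden, hidden)          # attention output projection
        + 2 * hidden                      # attention LayerNorm
        + linear(hidden, intermediate)    # feed-forward up-projection
        + linear(intermediate, hidden)    # feed-forward down-projection
        + 2 * hidden                      # output LayerNorm
    )
    pooler = linear(hidden, hidden)
    classifier = linear(hidden, num_labels)
    return embeddings + layers * per_layer + pooler + classifier

VOCAB = 29794  # assumed BERTimbau vocabulary size
base = bert_classifier_params(VOCAB, hidden=768, layers=12, intermediate=3072)
large = bert_classifier_params(VOCAB, hidden=1024, layers=24, intermediate=4096)
print(f"base ≈ {base / 1e6:.0f}M, large ≈ {large / 1e6:.0f}M")
# prints "base ≈ 109M, large ≈ 334M"
```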
Example usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
import torch
import numpy as np
from scipy.special import softmax

def classify_hate_speech(model_name, text):
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    config = AutoConfig.from_pretrained(model_name)

    # Tokenize input text and prepare model input
    # (truncation keeps long texts within BERT's 512-token limit)
    model_input = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

    # Get model output scores without tracking gradients
    with torch.no_grad():
        output = model(**model_input)
    scores = softmax(output.logits.numpy(), axis=1)
    ranking = np.argsort(scores[0])[::-1]

    # Print the results, highest-scoring label first
    for i, rank in enumerate(ranking):
        label = config.id2label[int(rank)]
        score = scores[0, rank]
        print(f"{i + 1}) Label: {label} Score: {score:.4f}")

# Example usage
model_name = "FpOliveira/tupi-bert-base-portuguese-cased"
text = "Bom dia, flor do dia!!"  # "Good morning, flower of the day!!"
classify_hate_speech(model_name, text)
```
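The example above ranks labels with a softmax, which fits the binary checkpoint. For the multiclass-multilabel checkpoints listed in the table, a common approach (assumed here, not confirmed by this card) is an independent sigmoid per label followed by a threshold, so a text can receive zero, one or several labels. A minimal NumPy sketch with hypothetical label names and a hypothetical 0.5 threshold:

```python
import numpy as np

def multilabel_predict(logits, id2label, threshold=0.5):
    """Return, for each row of logits, the labels whose sigmoid score passes threshold."""
    logits = np.asarray(logits, dtype=float)
    probs = 1.0 / (1.0 + np.exp(-logits))  # element-wise sigmoid, one score per label
    results = []
    for row in probs:
        picked = [(id2label[i], float(p)) for i, p in enumerate(row) if p >= threshold]
        picked.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
        results.append(picked)
    return results

# Hypothetical label names, for illustration only
id2label = {0: "label_a", 1: "label_b", 2: "label_c"}
logits = [[2.0, -3.0, 0.5], [-1.0, -2.0, -0.5]]
print(multilabel_predict(logits, id2label))
```

With a real checkpoint, `logits` would be `output.logits.numpy()` and `id2label` would be `model.config.id2label`.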