---
license: mit
datasets:
- FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary
language:
- pt
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
base_model: neuralmind/bert-base-portuguese-cased
widget:
- text: 'Bom dia, flor do dia!!'
---
## Introduction

TuPi-BERT-Base is a BERT model fine-tuned specifically for binary classification of hate speech in Portuguese. Derived from [BERTimbau Base](https://huggingface.co/neuralmind/bert-base-portuguese-cased), it adapts that general-purpose Portuguese model to hate speech detection.

For more details or specific inquiries about the base model, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).

The performance of language models can vary notably when the domain of the test data differs from that of the training data. To build a Portuguese language model specialized in hate speech classification, the original BERTimbau model was fine-tuned on the [TuPi Hate Speech Dataset](https://huggingface.co/datasets/FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary), which is sourced from diverse social networks.
## Available models

| Model | Arch. | #Layers | #Params |
| ---------------------------------------- | ---------- | ------- | ------- |
| `FpOliveira/tupi-bert-base-portuguese-cased` | BERT-Base | 12 | 109M |
| `FpOliveira/tupi-bert-large-portuguese-cased` | BERT-Large | 24 | 334M |
| `FpOliveira/tupi-bert-base-portuguese-cased-multiclass-multilabel` | BERT-Base | 12 | 109M |
| `FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel` | BERT-Large | 24 | 334M |
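
The table mixes binary checkpoints with `multiclass-multilabel` variants, and the two kinds of classification head are usually decoded differently. The sketch below is a minimal illustration under an assumption this card does not confirm: that the multilabel checkpoints score each label independently, so their logits are decoded with a per-label sigmoid and a threshold, while the binary models use a softmax across labels as in the example usage. The function names and the 0.5 threshold are hypothetical; check each checkpoint's own card and `config.id2label` before relying on this.

```python
import numpy as np


def decode_single_label(logits: np.ndarray) -> np.ndarray:
    """Softmax over the label axis: one probability distribution per text.

    Suits the binary checkpoints, where exactly one label applies.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def decode_multi_label(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Independent sigmoid per label, then a threshold.

    ASSUMPTION: the multiclass-multilabel checkpoints score labels independently;
    the 0.5 threshold is a hypothetical default, not a documented value.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs >= threshold


if __name__ == "__main__":
    fake_logits = np.array([[2.0, -1.0, 0.3]])  # made-up logits for illustration
    print(decode_single_label(fake_logits))  # each row sums to 1
    print(decode_multi_label(fake_logits))   # any number of labels may be True
```

With softmax, raising one label's score necessarily lowers the others; with per-label sigmoids, a text can be flagged for several categories at once, which is what a multilabel setup needs.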

## Example usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
import torch
import numpy as np
from scipy.special import softmax


def classify_hate_speech(model_name, text):
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    config = AutoConfig.from_pretrained(model_name)

    # Tokenize input text and prepare model input (truncate texts longer than the model limit)
    model_input = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

    # Get model output scores as probabilities over the labels
    with torch.no_grad():
        output = model(**model_input)
    scores = softmax(output.logits.numpy(), axis=1)
    ranking = np.argsort(scores[0])[::-1]  # label indices, highest score first

    # Print the results
    for i, rank in enumerate(ranking):
        label = config.id2label[int(rank)]
        score = scores[0, rank]
        print(f"{i + 1}) Label: {label} Score: {score:.4f}")


# Example usage
model_name = "FpOliveira/tupi-bert-base-portuguese-cased"
text = "Bom dia, flor do dia!!"
classify_hate_speech(model_name, text)
```
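
The function above reloads the model, tokenizer, and config on every call, which is slow when classifying many texts. A minimal sketch of a reusable alternative, assuming the same checkpoint: `HateSpeechClassifier` and `rank_predictions` are hypothetical names, not part of the model's or Transformers' API. The model is loaded once, texts are scored in a single padded batch, and the softmax-and-rank step is factored into a pure NumPy function that can be checked without downloading anything.

```python
import numpy as np


def rank_predictions(logits: np.ndarray, id2label: dict) -> list:
    """Softmax each row of (batch, num_labels) logits and rank labels by score.

    Hypothetical helper: the same softmax + argsort logic as classify_hate_speech,
    applied to every text in a batch.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    ranked = []
    for row in probs:
        order = np.argsort(row)[::-1]  # highest probability first
        ranked.append([(id2label[int(i)], float(row[i])) for i in order])
    return ranked


class HateSpeechClassifier:
    """Hypothetical wrapper that loads the checkpoint once and reuses it."""

    def __init__(self, model_name: str = "FpOliveira/tupi-bert-base-portuguese-cased"):
        # Deferred import: rank_predictions stays usable without transformers installed.
        from transformers import AutoModelForSequenceClassification, AutoTokenizer

        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.model.eval()  # inference mode (disables dropout)

    def __call__(self, texts: list) -> list:
        import torch

        # Pad to the longest text in the batch; truncate anything over the model limit.
        inputs = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = self.model(**inputs).logits.numpy()
        return rank_predictions(logits, self.model.config.id2label)
```

For example, `HateSpeechClassifier()(["Bom dia, flor do dia!!"])` returns one ranked list of `(label, score)` pairs per input text; the label names come from the checkpoint's `config.id2label`.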