---
license: openrail++
language:
- en
---
This is a fine-tuned DeBERTa model for detecting human values in arguments.
The model is part of the ensemble that was the best-performing system in the SemEval-2023 task [Detecting Human Values in arguments](https://touche.webis.de/semeval23/touche23-web/index.html).
It was trained and tested on a dataset of 9324 annotated [arguments](https://zenodo.org/record/7550385#.ZEPzcfzP330).
The whole ensemble system achieved an F1 score of 0.56 in the competition. This model alone achieves an F1 score of 0.55.
Code for retraining the ensemble is available in this [repo](https://github.com/danielschroter/human_value_detector).
## Model Usage
This model is built on custom code, so the Inference API cannot be used directly.
To use the model, follow the steps below.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# The model ships custom code, so trust_remote_code=True is required when loading it.
tokenizer = AutoTokenizer.from_pretrained("tum-nlp/Deberta_Human_Value_Detector")
trained_model = AutoModelForSequenceClassification.from_pretrained("tum-nlp/Deberta_Human_Value_Detector", trust_remote_code=True)

example_text = 'We should ban whaling because whales are a species at the risk of extinction'

# Tokenize into fixed-length PyTorch tensors (padded or truncated to 512 tokens).
encoding = tokenizer(
    example_text,
    add_special_tokens=True,
    max_length=512,
    truncation=True,
    return_token_type_ids=False,
    padding="max_length",
    return_attention_mask=True,
    return_tensors='pt',
)

# Run inference without tracking gradients; the result holds one score per label.
with torch.no_grad():
    test_prediction = trained_model(encoding["input_ids"], encoding["attention_mask"])
    test_prediction = test_prediction["output"].flatten().numpy()
```
## Prediction
To make a prediction, map the model outputs to the corresponding labels.
During the competition, a threshold of 0.25 was used to binarize the output.
```python
THRESHOLD = 0.25
LABEL_COLUMNS = ['Self-direction: thought','Self-direction: action','Stimulation','Hedonism','Achievement','Power: dominance','Power: resources','Face','Security: personal',
                 'Security: societal','Tradition','Conformity: rules','Conformity: interpersonal','Humility','Benevolence: caring','Benevolence: dependability','Universalism: concern','Universalism: nature','Universalism: tolerance','Universalism: objectivity']

# test_prediction comes from the Model Usage snippet above.
print("Predictions:")
for label, prediction in zip(LABEL_COLUMNS, test_prediction):
    # Skip labels whose score falls below the threshold.
    if prediction < THRESHOLD:
        continue
    print(f"{label}: {prediction}")
```
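The paper cited below describes choosing one global decision threshold that maximizes the F1 score. The following is a minimal sketch of that idea, not the authors' code: the function names (`binarize`, `macro_f1`, `best_global_threshold`), the candidate grid, and the validation numbers are all hypothetical, and averaging F1 across labels (macro F1) is an assumption about how the score is aggregated.

```python
import numpy as np

def binarize(scores, threshold=0.25):
    """Return a 0/1 array: 1 where the score reaches the threshold."""
    return (np.asarray(scores) >= threshold).astype(int)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over the columns of two 0/1 matrices."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = ((y_true == 1) & (y_pred == 1)).sum(axis=0)
    fp = ((y_true == 0) & (y_pred == 1)).sum(axis=0)
    fn = ((y_true == 1) & (y_pred == 0)).sum(axis=0)
    denom = 2 * tp + fp + fn
    # Convention used here: a label with no positives and no predictions scores 0.
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())

def best_global_threshold(y_true, scores, candidates):
    """Pick the single threshold, shared by all labels, with the highest macro F1."""
    return max(candidates, key=lambda t: macro_f1(y_true, binarize(scores, t)))

# Synthetic validation data (made-up numbers): 4 arguments, 3 labels.
val_scores = np.array([[0.90, 0.20, 0.15],
                       [0.35, 0.60, 0.20],
                       [0.15, 0.40, 0.80],
                       [0.20, 0.15, 0.35]])
val_truth = np.array([[1, 0, 0],
                      [1, 1, 0],
                      [0, 1, 1],
                      [0, 0, 1]])
candidates = [0.1, 0.3, 0.5, 0.7]  # hypothetical search grid

threshold = best_global_threshold(val_truth, val_scores, candidates)
print(threshold, macro_f1(val_truth, binarize(val_scores, threshold)))
```

With real data, `val_scores` would hold the model outputs for a held-out set, one row per argument and one column per entry of `LABEL_COLUMNS`, and `binarize(test_prediction, threshold)` would replace the manual comparison in the loop above.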
## Citation
```
@inproceedings{schroter-etal-2023-adam,
    title = "{A}dam-Smith at {S}em{E}val-2023 Task 4: Discovering Human Values in Arguments with Ensembles of Transformer-based Models",
    author = "Schroter, Daniel and
      Dementieva, Daryna and
      Groh, Georg",
    editor = {Ojha, Atul Kr. and
      Do{\u{g}}ru{\"o}z, A. Seza and
      Da San Martino, Giovanni and
      Tayyar Madabushi, Harish and
      Kumar, Ritesh and
      Sartori, Elisa},
    booktitle = "Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.semeval-1.74",
    doi = "10.18653/v1/2023.semeval-1.74",
    pages = "532--541",
    abstract = "This paper presents the best-performing approach alias {``}Adam Smith{''} for the SemEval-2023 Task 4: {``}Identification of Human Values behind Arguments{''}. The goal of the task was to create systems that automatically identify the values within textual arguments. We train transformer-based models until they reach their loss minimum or f1-score maximum. Ensembling the models by selecting one global decision threshold that maximizes the f1-score leads to the best-performing system in the competition. Ensembling based on stacking with logistic regressions shows the best performance on an additional dataset provided to evaluate the robustness ({``}Nahj al-Balagha{''}). Apart from outlining the submitted system, we demonstrate that the use of the large ensemble model is not necessary and that the system size can be significantly reduced.",
}
```