---
language: en
license: apache-2.0
base_model: microsoft/deberta-v3-base
tags:
- text-classification
- deberta-v3
datasets:
- ealvaradob/phishing-dataset
- ucberkeley-dlab/measuring-hate-speech
- cardiffnlp/tweet_eval
- lmsys/toxic-chat
- tasksource/jigsaw_toxicity
- KoalaAI/Text-Moderation-Multilingual
---

# Constellation One

An experimental text classification model fine-tuned from `microsoft/deberta-v3-base` for [Cockatoo](https://cockatoo.dev/).

This model is licensed under the `Apache-2.0` license.

**Available Labels:**

```json
"id2label": {
  "0": "scam",
  "1": "violence",
  "2": "harassment",
  "3": "hate_speech",
  "4": "toxicity",
  "5": "obscenity"
}
```

## Performance

Constellation One achieves near-SOTA performance within its weight class and is strongest at detecting scams and harassment.

With the default threshold of 0.5 for every label, the model has very high recall (~0.9) in all categories. Raising the per-label thresholds trades some of that recall for fewer false positives: after tuning, recall drops to ~0.81, but F1 increases to ~0.74.

### Evaluation (Untuned Thresholds):

**Thresholds:**

```python
LABEL_THRESHOLDS = {
    'scam': 0.5,
    'violence': 0.5,
    'harassment': 0.5,
    'hate_speech': 0.5,
    'toxicity': 0.5,
    'obscenity': 0.5
}
```

![Recall Metrics](assets/graphs/untuned/recall_deberta.png)

![Precision Metrics](assets/graphs/untuned/precision_deberta.png)

![F1 Metrics](assets/graphs/untuned/f1_deberta.png)

---

### Evaluation (Tuned Thresholds):

**Thresholds:**

```python
LABEL_THRESHOLDS = {
    'scam': 0.60,
    'violence': 0.73,
    'harassment': 0.70,
    'hate_speech': 0.80,
    'toxicity': 0.75,
    'obscenity': 0.85
}
```

![Recall Metrics](assets/graphs/tuned/recall_deberta.png)

![Precision Metrics](assets/graphs/tuned/precision_deberta.png)

![F1 Metrics](assets/graphs/tuned/f1_deberta.png)

---

## Resources:

Training/Inference server: https://github.com/DominicTWHV/Cockatoo_ML_Training/

Training Metrics: https://cockatoo.dev/ml-training.html

## Datasets Used | Citations

| Dataset | License | Link |
| --- | --- | --- |
| **Phishing Dataset** | MIT | [Hugging Face](https://huggingface.co/datasets/ealvaradob/phishing-dataset) |
| **Measuring Hate Speech** | CC-BY-4.0 | [Hugging Face](https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech) |
| **Tweet Eval (SemEval-2019)** | [See Citation]* | [Hugging Face](https://huggingface.co/datasets/cardiffnlp/tweet_eval) |
| **Toxic Chat** | CC-BY-NC-4.0 | [Hugging Face](https://huggingface.co/datasets/lmsys/toxic-chat) |
| **Jigsaw Toxicity** | Apache-2.0 | [Hugging Face](https://huggingface.co/datasets/tasksource/jigsaw_toxicity) |
| **Text Moderation Multilingual** | Apache-2.0 | [Hugging Face](https://huggingface.co/datasets/KoalaAI/Text-Moderation-Multilingual) |

---

### Citation: ucberkeley-dlab/measuring-hate-speech

```bibtex
@article{kennedy2020constructing,
  title={Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application},
  author={Kennedy, Chris J and Bacon, Geoff and Sahn, Alexander and von Vacano, Claudia},
  journal={arXiv preprint arXiv:2009.10277},
  year={2020}
}
```

### Citation: cardiffnlp/tweet_eval

```bibtex
@inproceedings{basile-etal-2019-semeval,
  title = "{S}em{E}val-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in {T}witter",
  author = "Basile, Valerio and Bosco, Cristina and Fersini, Elisabetta and Nozza, Debora and Patti, Viviana and Rangel Pardo, Francisco Manuel and Rosso, Paolo and Sanguinetti, Manuela",
  booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
  year = "2019",
  address = "Minneapolis, Minnesota, USA",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/S19-2007",
  doi = "10.18653/v1/S19-2007",
  pages = "54--63"
}
```

### Citation: lmsys/toxic-chat

```bibtex
@misc{lin2023toxicchat,
  title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
  author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
  year={2023},
  eprint={2310.17389},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

### Citation: KoalaAI/Text-Moderation-Multilingual

```bibtex
@misc{text-moderation-large,
  title={Text-Moderation-Multilingual: A Multilingual Text Moderation Dataset},
  author={[KoalaAI]},
  year={2025},
  note={Aggregated from ifmain's and OpenAI's moderation datasets}
}
```
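
## Example: Applying Label Thresholds

The per-label thresholds listed under Performance imply that each label is scored independently and flagged when its score passes its own cutoff. The sketch below illustrates that idea in plain Python. It rests on several assumptions that this card does not confirm: that the model emits one logit per label in `id2label` order, that scores are obtained with a sigmoid (multi-label setup), and that a label is flagged when its probability is greater than or equal to the threshold. The helper names `sigmoid` and `flag_labels` and the example logits are hypothetical and are not taken from the Cockatoo training or inference code.

```python
import math

# Label order copied from the id2label mapping in this card.
ID2LABEL = {
    0: "scam",
    1: "violence",
    2: "harassment",
    3: "hate_speech",
    4: "toxicity",
    5: "obscenity",
}

# Tuned thresholds copied from the "Evaluation (Tuned Thresholds)" section.
LABEL_THRESHOLDS = {
    "scam": 0.60,
    "violence": 0.73,
    "harassment": 0.70,
    "hate_speech": 0.80,
    "toxicity": 0.75,
    "obscenity": 0.85,
}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def flag_labels(logits, thresholds=LABEL_THRESHOLDS):
    """Return {label: probability} for every label at or above its threshold.

    Assumption: `logits` holds one raw score per label, in ID2LABEL order.
    """
    flagged = {}
    for idx, logit in enumerate(logits):
        label = ID2LABEL[idx]
        prob = sigmoid(logit)
        if prob >= thresholds[label]:
            flagged[label] = prob
    return flagged


if __name__ == "__main__":
    # Made-up logits for illustration only; not real model output.
    example_logits = [2.0, -3.0, 1.0, 0.5, 1.2, -1.0]
    print(sorted(flag_labels(example_logits)))
```

With these made-up logits, the tuned thresholds flag `scam`, `harassment`, and `toxicity`. Passing a dictionary with 0.5 for every label would additionally flag `hate_speech`, which shows how the untuned setting favors recall over precision.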