SearleSpeechActBert
A DistilBERT-based classifier that automatically assigns speech acts according to J. R. Searle's taxonomy of illocutionary acts.
The model was developed and presented in the following publication:
Klaus Schmidt, Andreas Niekler, Cathleen Kantner, and Manuel Burghardt. 2023. Classifying speech acts in political communication: A transformer-based approach with weak supervision and active learning. In 2023 18th Conference on Computer Science and Intelligence Systems (FedCSIS), pages 739–748. IEEE.
Labels
The classifier predicts the following speech act categories:
- assertive
- expressive
- commissive
- directive
- declarative
- none
The none category is used for sentences that do not contain a recognizable speech act.
Model Details
- Base model: distilbert-base-uncased
- Architecture: DistilBertForSequenceClassification
- Fine-tuning framework: Transformers
- Language: English
The original training process employed active learning using the small-text library with:
- 10 iterations
- 20 queried samples per iteration
- PredictionEntropy query strategy
The best-performing iteration checkpoint was selected as the final model.
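The idea behind an entropy-based query strategy can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the small-text implementation: the function names and the toy probabilities below are assumptions made for the example. Each unlabeled sentence is scored by the Shannon entropy of its predicted class distribution, and the highest-scoring sentences are sent for annotation.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def query_most_uncertain(pool_probs, n_queries=20):
    """Return indices of the n_queries pool items with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: prediction_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:n_queries]


# Toy pool (made-up numbers): predicted probabilities over the six labels.
pool = [
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.02],  # confident prediction
    [1 / 6] * 6,                           # maximally uncertain
    [0.50, 0.30, 0.05, 0.05, 0.05, 0.05],  # moderately uncertain
]

selected = query_most_uncertain(pool, n_queries=2)
print(selected)
```

With 20 queries per iteration over 10 iterations, a loop of this kind would add 200 manually annotated samples in total.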
Intended Use
This model is intended for:
- computational pragmatics research
- political communication research
- discourse analysis
Potential use cases include:
- analysis of political speeches
- annotation assistance
- corpus exploration
The model is not intended for high-stakes decision-making.
Training Data
Training data consisted of:
- State of the Union speeches (1918–2018)
- United Nations General Debate speeches
Initial labels were generated using weak supervision with skweak, followed by additional annotation through active learning.
Evaluation
Evaluation set size: 118 instances
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Assertive | 0.71 | 0.89 | 0.79 |
| Expressive | 0.82 | 0.60 | 0.69 |
| Commissive | 0.83 | 1.00 | 0.91 |
| Directive | 0.78 | 0.96 | 0.86 |
| Declarative | 1.00 | 0.67 | 0.80 |
| None | 0.80 | 0.50 | 0.62 |
| Macro Avg | 0.82 | 0.77 | 0.78 |
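As a consistency check, the F1 column and the macro averages can be recomputed from the reported precision and recall values. The sketch below uses only the rounded numbers from the table, so it shows how the figures relate to each other rather than reproducing the original evaluation script.

```python
# (precision, recall, reported F1) per class, copied from the table above.
reported = {
    "Assertive": (0.71, 0.89, 0.79),
    "Expressive": (0.82, 0.60, 0.69),
    "Commissive": (0.83, 1.00, 0.91),
    "Directive": (0.78, 0.96, 0.86),
    "Declarative": (1.00, 0.67, 0.80),
    "None": (0.80, 0.50, 0.62),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}

# Macro averaging weights every class equally, regardless of its support.
n_classes = len(reported)
macro_precision = sum(p for p, _, _ in reported.values()) / n_classes
macro_recall = sum(r for _, r, _ in reported.values()) / n_classes
macro_f1 = sum(per_class_f1.values()) / n_classes

for name, value in per_class_f1.items():
    print(f"{name:<12} F1 = {value:.2f}")
print(f"Macro avg    P = {macro_precision:.2f}, R = {macro_recall:.2f}, F1 = {macro_f1:.2f}")
```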
Limitations
- Small evaluation dataset
- Uncertainty around edge cases and ambiguous speech acts
- Cannot handle sentences with multiple embedded speech acts
- Performance may degrade outside political discourse
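One way to work with these limitations in an annotation-assistance setting is to route low-confidence predictions to a human reviewer. The sketch below assumes predictions in the `{'label': ..., 'score': ...}` format returned by the text-classification pipeline. The helper name, the threshold of 0.6, and the sample scores are hypothetical and would need tuning on real data.

```python
def split_by_confidence(predictions, threshold=0.6):
    """Separate predictions into confident ones and ones needing manual review.

    `threshold` is an illustrative value, not one recommended by the authors.
    """
    confident, needs_review = [], []
    for index, pred in enumerate(predictions):
        target = confident if pred["score"] >= threshold else needs_review
        target.append((index, pred["label"], pred["score"]))
    return confident, needs_review


# Made-up predictions in the pipeline's output format.
sample_predictions = [
    {"label": "directive", "score": 0.74},
    {"label": "assertive", "score": 0.41},
    {"label": "commissive", "score": 0.88},
]

confident, needs_review = split_by_confidence(sample_predictions)
print("confident:", confident)
print("needs review:", needs_review)
```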
Ethical Considerations
The training data consists of speeches by public political figures delivered in public settings. It includes mentions of armed conflict, humanitarian crises, ethnic violence, and sensitive geopolitical topics. Users should exercise caution when applying the model to politically sensitive analyses.
Usage
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="syslen/SearleSpeechActBert"
)
classifier("We urge all nations to cooperate.")
Example output:
[{'label': 'directive', 'score': 0.7415215373039246}]
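For corpus exploration, a speech can be split into sentences, classified sentence by sentence, and summarized as label counts. The following is a minimal sketch: `split_sentences` and `tally_speech_acts` are hypothetical helpers, and the regex splitter is a naive assumption that will mishandle abbreviations, so a proper sentence tokenizer is preferable for real corpora.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tally_speech_acts(classifier, text):
    """Classify each sentence and count the predicted labels.

    `classifier` is any callable that maps a list of strings to a list of
    {'label': ..., 'score': ...} dicts, such as the pipeline created above.
    """
    sentences = split_sentences(text)
    if not sentences:
        return Counter()
    predictions = classifier(sentences)
    return Counter(pred["label"] for pred in predictions)


# Usage with the real model (requires transformers and a model download):
#   from transformers import pipeline
#   classifier = pipeline("text-classification", model="syslen/SearleSpeechActBert")
#   print(tally_speech_acts(classifier, speech_text))
```

Because the model assigns one label per input, a sentence that carries several speech acts still contributes a single count.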
Citation
If you use this model in academic work, please cite the associated publication.
@INPROCEEDINGS{schmidt23,
author={Schmidt, Klaus and Niekler, Andreas and Kantner, Cathleen and Burghardt, Manuel},
booktitle={2023 18th Conference on Computer Science and Intelligence Systems (FedCSIS)},
title={Classifying Speech Acts in Political Communication: A Transformer-based Approach with Weak Supervision and Active Learning},
year={2023},
pages={739-748},
doi={10.15439/2023F3485}
}