SearleSpeechActBert

A DistilBERT-based classifier for automatic speech act classification, following J. R. Searle's taxonomy of illocutionary acts.

The model was developed and presented in the following publication:

Klaus Schmidt, Andreas Niekler, Cathleen Kantner, and Manuel Burghardt. 2023. Classifying speech acts in political communication: A transformer-based approach with weak supervision and active learning. In 2023 18th Conference on Computer Science and Intelligence Systems (FedCSIS), pages 739–748. IEEE.

Labels

The classifier predicts the following speech act categories:

assertive
expressive
commissive
directive
declarative
none

The none category is used for sentences that do not contain a recognizable speech act.

Model Details

Base model: distilbert-base-uncased
Architecture: DistilBertForSequenceClassification
Fine-tuning framework: Transformers
Language: English

The original training process employed active learning using the small-text library with:

10 iterations
20 queried samples per iteration
PredictionEntropy query strategy

The best-performing iteration checkpoint was selected as the final model.
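
As a rough illustration of what an entropy-based query strategy does, the sketch below ranks unlabeled examples by the Shannon entropy of their predicted class distribution and picks the most uncertain ones. It is a dependency-free approximation of the idea, not the small-text implementation; the function names and the toy probabilities are assumptions made for this example.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def query_most_uncertain(pool_probs, n_queries=20):
    """Return indices of the n_queries pool items with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: prediction_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:n_queries]

# Toy pool of predicted distributions over the six labels (made-up numbers).
pool = [
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.02],  # confident prediction
    [1 / 6] * 6,                            # maximally uncertain
    [0.40, 0.40, 0.05, 0.05, 0.05, 0.05],  # torn between two classes
]

# Ask for the two most uncertain items; the confident one is left out.
selected = query_most_uncertain(pool, n_queries=2)
print(selected)
```

In each active learning iteration, the selected items would be sent to an annotator and the model retrained on the enlarged labeled set.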

Intended Use

This model is intended for:

computational pragmatics research
political communication research
discourse analysis

Potential use cases include:

analysis of political speeches
annotation assistance
corpus exploration

The model is not intended for high-stakes decision-making.

Training Data

Training data consisted of:

State of the Union speeches (1918–2018)
United Nations General Debate speeches

Initial labels were generated using weak supervision with skweak, followed by additional annotation through active learning.
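
To make the idea of weak supervision concrete, here is a minimal sketch in which several heuristic labelling functions vote on a sentence and the majority label becomes its initial (noisy) label. This does not use skweak's API and does not reproduce the rules from the publication: the keyword heuristics, function names, and the plain majority vote are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from collections import Counter

ABSTAIN = None  # a labelling function returns this when it has no opinion

# Hypothetical keyword heuristics, for illustration only.
def lf_commissive(sentence):
    s = sentence.lower()
    return "commissive" if ("we will" in s or "i promise" in s) else ABSTAIN

def lf_directive(sentence):
    s = sentence.lower()
    return "directive" if ("we urge" in s or "must" in s) else ABSTAIN

def lf_expressive(sentence):
    s = sentence.lower()
    return "expressive" if ("thank" in s or "congratulate" in s) else ABSTAIN

LABELLING_FUNCTIONS = [lf_commissive, lf_directive, lf_expressive]

def weak_label(sentence):
    """Aggregate labelling-function votes by simple majority."""
    votes = [lf(sentence) for lf in LABELLING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN  # no heuristic fired; leave the sentence unlabeled
    # On a tie, the label voted for first in LABELLING_FUNCTIONS order wins.
    return Counter(votes).most_common(1)[0][0]

print(weak_label("We urge all nations to cooperate."))
```

Sentences on which every function abstains stay unlabeled, which is where the subsequent active learning annotation can add the most information.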

Evaluation

Evaluation set size: 118 instances

Class	Precision	Recall	F1
Assertive	0.71	0.89	0.79
Expressive	0.82	0.60	0.69
Commissive	0.83	1.00	0.91
Directive	0.78	0.96	0.86
Declarative	1.00	0.67	0.80
None	0.80	0.50	0.62
Macro Avg	0.82	0.77	0.78
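
For readers who want to check how the columns relate, the following sketch recomputes each F1 score as the harmonic mean of precision and recall, and the macro averages as unweighted means over the six classes. The numbers are copied from the table above; the variable and function names are chosen for this example only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) per class, as given in the table.
reported = {
    "assertive":   (0.71, 0.89, 0.79),
    "expressive":  (0.82, 0.60, 0.69),
    "commissive":  (0.83, 1.00, 0.91),
    "directive":   (0.78, 0.96, 0.86),
    "declarative": (1.00, 0.67, 0.80),
    "none":        (0.80, 0.50, 0.62),
}

# Recompute F1 from precision and recall, rounded like the table.
recomputed_f1 = {
    label: round(f1_score(p, r), 2) for label, (p, r, _) in reported.items()
}

# Macro average: unweighted mean over classes, ignoring class frequency.
n = len(reported)
macro_precision = round(sum(p for p, _, _ in reported.values()) / n, 2)
macro_recall = round(sum(r for _, r, _ in reported.values()) / n, 2)
macro_f1 = round(sum(f for _, _, f in reported.values()) / n, 2)

print(recomputed_f1)
print(macro_precision, macro_recall, macro_f1)
```

Because the macro average weights every class equally, the low recall on the none and expressive classes pulls the overall recall down even if those classes are rare in the evaluation set.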

Limitations

Small evaluation dataset (118 instances), so per-class scores rest on few examples
Uncertainty around edge cases and ambiguous speech acts
Cannot handle sentences with multiple embedded speech acts
Performance may degrade outside political discourse

Ethical Considerations

The training data consists of speeches by public political figures delivered in public settings. It includes mentions of armed conflict, humanitarian crises, ethnic violence, and sensitive geopolitical topics. Users should exercise caution when applying the model to politically sensitive analyses.

Usage

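from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="syslen/SearleSpeechActBert"
)

# Classify a single sentence; the result is a list with one label/score dict
print(classifier("We urge all nations to cooperate."))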

Example output:

[{'label': 'directive', 'score': 0.7415215373039246}]
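
For corpus exploration, predictions in this format can be aggregated into label counts. The sketch below is one possible way to do that; the helper name `tally_speech_acts`, the `low_confidence` bucket, and the 0.5 score threshold are assumptions for illustration, and the predictions are made-up values in the same shape as the example output.

```python
from collections import Counter

def tally_speech_acts(predictions, min_score=0.5):
    """Count predicted labels; predictions below min_score are grouped apart."""
    counts = Counter()
    for pred in predictions:
        # min_score is a hypothetical cut-off, not a value from the model card.
        label = pred["label"] if pred["score"] >= min_score else "low_confidence"
        counts[label] += 1
    return counts

# Made-up predictions shaped like the pipeline output shown above.
example_predictions = [
    {"label": "directive", "score": 0.74},
    {"label": "assertive", "score": 0.91},
    {"label": "directive", "score": 0.42},
]

counts = tally_speech_acts(example_predictions)
print(counts)
```

With the real model, passing a list of sentences to `classifier(...)` returns a list of such dictionaries, one per sentence, which can be handed to the helper directly.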

Citation

If you use this model in academic work, please cite the associated publication.

@INPROCEEDINGS{schmidt23,
  author={Schmidt, Klaus and Niekler, Andreas and Kantner, Cathleen and Burghardt, Manuel},
  booktitle={2023 18th Conference on Computer Science and Intelligence Systems (FedCSIS)},
  title={Classifying Speech Acts in Political Communication: A Transformer-based Approach with Weak Supervision and Active Learning},
  year={2023},
  pages={739-748},
  doi={10.15439/2023F3485}
}

Safetensors

Model size: 67M params
Tensor type: F32