SearleSpeechActBert

A DistilBERT-based classifier that assigns sentences to speech act categories from J. R. Searle's taxonomy of illocutionary acts.

The model was developed and presented in the following publication:

Klaus Schmidt, Andreas Niekler, Cathleen Kantner, and Manuel Burghardt. 2023. Classifying speech acts in political communication: A transformer-based approach with weak supervision and active learning. In 2023 18th Conference on Computer Science and Intelligence Systems (FedCSIS), pages 739–748. IEEE.

Labels

The classifier predicts the following speech act categories:

  • assertive
  • expressive
  • commissive
  • directive
  • declarative
  • none

The none category is used for sentences that do not contain a recognizable speech act.
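Downstream scripts are easier to maintain if the six labels are defined in one place. The Python sketch below does this. The one-line glosses paraphrase Searle's categories and are not text shipped with the model. `SPEECH_ACT_LABELS` and `normalize_label` are hypothetical helper names.

```python
# Hypothetical constant: the six labels the classifier predicts, each with a
# one-line paraphrase of the corresponding category (not part of the model config).
SPEECH_ACT_LABELS = {
    "assertive": "commits the speaker to the truth of a proposition",
    "expressive": "expresses a psychological state (e.g., thanks, regret)",
    "commissive": "commits the speaker to a future course of action",
    "directive": "attempts to get the hearer to do something",
    "declarative": "brings about a change in the world by being uttered",
    "none": "no recognizable speech act",
}


def normalize_label(raw: str) -> str:
    """Lower-case a label (e.g. 'Directive' from a report table) and check it is known."""
    label = raw.strip().lower()
    if label not in SPEECH_ACT_LABELS:
        raise ValueError(f"unknown speech act label: {raw!r}")
    return label


print(normalize_label("Directive"))  # directive
```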

Model Details

  • Base model: distilbert-base-uncased
  • Architecture: DistilBertForSequenceClassification
  • Fine-tuning framework: Hugging Face Transformers
  • Language: English
  • Parameters: 67M (F32 weights, Safetensors format)

The model was trained with active learning using the small-text library, with the following settings:

  • 10 iterations
  • 20 queried samples per iteration
  • PredictionEntropy query strategy

The checkpoint from the best-performing iteration was selected as the final model.
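Prediction-entropy sampling scores each unlabeled sentence by the entropy of the model's predicted class distribution. It then queries the most uncertain ones for annotation. With the settings above (10 iterations, 20 queries each), that amounts to up to 200 newly labeled samples on top of the initial set. The following is a plain-Python sketch of this selection rule, not small-text's own implementation. `prediction_entropy`, `query_most_uncertain`, and the toy probabilities are illustrative assumptions.

```python
import math
from typing import List, Sequence


def prediction_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def query_most_uncertain(prob_matrix: List[Sequence[float]], n: int = 20) -> List[int]:
    """Return indices of the n unlabeled samples with the highest prediction entropy."""
    ranked = sorted(
        range(len(prob_matrix)),
        key=lambda i: prediction_entropy(prob_matrix[i]),
        reverse=True,
    )
    return ranked[:n]


# Toy pool of 4 unlabeled sentences over the 6 classes (made-up probabilities).
pool_probs = [
    [0.95, 0.01, 0.01, 0.01, 0.01, 0.01],  # confident
    [1 / 6] * 6,                           # maximally uncertain
    [0.50, 0.50, 0.0, 0.0, 0.0, 0.0],      # torn between two classes
    [0.90, 0.10, 0.0, 0.0, 0.0, 0.0],      # fairly confident
]
print(query_most_uncertain(pool_probs, n=2))  # [1, 2]
```

In a full loop, the queried sentences would be annotated and added to the training set, and the model retrained before the next query round.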

Intended Use

This model is intended for:

  • computational pragmatics research
  • political communication research
  • discourse analysis

Potential use cases include:

  • analysis of political speeches
  • annotation assistance
  • corpus exploration

The model is not intended for high-stakes decision-making.

Training Data

Training data consisted of:

  • State of the Union speeches (1918–2018)
  • United Nations General Debate speeches

Initial labels were generated using weak supervision with skweak, followed by additional annotation through active learning.
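Weak supervision combines several noisy labeling functions, each of which either votes for a label or abstains, into a single provisional label. The sketch below illustrates the idea with hypothetical keyword heuristics. These are not the rules used to train this model. It also uses a plain majority vote for simplicity; skweak provides its own, more sophisticated aggregation models. Ties and full abstentions are left unlabeled, which is where manual annotation through active learning can take over.

```python
from collections import Counter
from typing import Callable, List, Optional

# A labeling function returns a label, or None to abstain.
LabelingFunction = Callable[[str], Optional[str]]


# Hypothetical keyword heuristics, for illustration only.
def lf_directive(sentence: str) -> Optional[str]:
    s = sentence.lower()
    return "directive" if any(cue in s for cue in ("we urge", "i ask", "must")) else None


def lf_commissive(sentence: str) -> Optional[str]:
    s = sentence.lower()
    return "commissive" if any(cue in s for cue in ("we will", "i promise", "we pledge")) else None


def lf_expressive(sentence: str) -> Optional[str]:
    s = sentence.lower()
    return "expressive" if any(cue in s for cue in ("thank", "we regret", "congratulate")) else None


def weak_label(sentence: str, lfs: List[LabelingFunction]) -> Optional[str]:
    """Majority vote over non-abstaining labeling functions; None on tie or no votes."""
    votes = Counter(label for label in (lf(sentence) for lf in lfs) if label is not None)
    if not votes:
        return None  # every function abstained
    (top, top_count), *rest = votes.most_common()
    if rest and rest[0][1] == top_count:
        return None  # tie: leave for manual annotation
    return top


LFS = [lf_directive, lf_commissive, lf_expressive]
print(weak_label("We urge all nations to cooperate.", LFS))  # directive
print(weak_label("We will, and we must, act.", LFS))         # None (tie)
```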

Evaluation

Evaluation set size: 118 instances

Class        Precision  Recall  F1
Assertive    0.71       0.89    0.79
Expressive   0.82       0.60    0.69
Commissive   0.83       1.00    0.91
Directive    0.78       0.96    0.86
Declarative  1.00       0.67    0.80
None         0.80       0.50    0.62
Macro avg    0.82       0.77    0.78
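Each F1 score is the harmonic mean of precision and recall. The macro average is the unweighted mean over the six classes, so the rare "none" and "declarative" classes count as much as "assertive". The following sketch recomputes these figures from the per-class precision and recall values in the table. The results agree with the reported F1 column and Macro avg row to two decimals. Only the table values are taken from this card; the variable names are illustrative.

```python
# Per-class (precision, recall) copied from the evaluation table.
REPORTED = {
    "assertive": (0.71, 0.89),
    "expressive": (0.82, 0.60),
    "commissive": (0.83, 1.00),
    "directive": (0.78, 0.96),
    "declarative": (1.00, 0.67),
    "none": (0.80, 0.50),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


per_class_f1 = {c: f1(p, r) for c, (p, r) in REPORTED.items()}

# Macro averages: unweighted means over classes, ignoring class frequency.
macro_p = sum(p for p, _ in REPORTED.values()) / len(REPORTED)
macro_r = sum(r for _, r in REPORTED.values()) / len(REPORTED)
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for cls, score in per_class_f1.items():
    print(f"{cls:12s} F1={score:.2f}")
print(f"macro P={macro_p:.2f} R={macro_r:.2f} F1={macro_f1:.2f}")  # macro P=0.82 R=0.77 F1=0.78
```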

Limitations

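  • Small evaluation set (118 instances), so per-class scores are imprecise
  • Predictions are less reliable for edge cases and ambiguous speech acts
  • Each sentence receives exactly one label, so sentences containing multiple or embedded speech acts cannot be represented fully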
  • Performance may degrade outside political discourse

Ethical Considerations

The training data consists of speeches by public political figures delivered in public settings. It includes mentions of armed conflict, humanitarian crises, ethnic violence, and sensitive geopolitical topics. Users should exercise caution when applying the model to politically sensitive analyses.

Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="syslen/SearleSpeechActBert"
)

# Returns a list with one {'label', 'score'} dict per input string.
print(classifier("We urge all nations to cooperate."))

Example output:

[{'label': 'directive', 'score': 0.7415215373039246}]
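For corpus exploration, a common next step is to classify many sentences and count the predicted speech acts. Called on a list of strings with default settings, the pipeline returns one {'label', 'score'} dict per input. The sketch below accepts any callable with that shape, so it runs offline with a stand-in classifier. `tally_speech_acts`, `fake_classifier`, the "uncertain" bin, and the `min_score` cut-off are hypothetical. The cut-off is not a threshold recommended by the model authors.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

# Shape of a text-classification pipeline called on a list of strings.
Classifier = Callable[[List[str]], List[Dict[str, Any]]]


def tally_speech_acts(sentences: List[str], classify: Classifier,
                      min_score: float = 0.5) -> Counter:
    """Count predicted labels; predictions scoring below min_score go to 'uncertain'."""
    counts: Counter = Counter()
    for pred in classify(sentences):
        label = pred["label"] if pred["score"] >= min_score else "uncertain"
        counts[label] += 1
    return counts


# Offline stand-in for the real pipeline; replace with the `classifier` object
# from the usage example to run the actual model.
def fake_classifier(batch: List[str]) -> List[Dict[str, Any]]:
    return [
        {"label": "directive", "score": 0.9} if "urge" in s.lower()
        else {"label": "assertive", "score": 0.4}
        for s in batch
    ]


speech = ["We urge all nations to cooperate.", "The economy grew last year."]
print(tally_speech_acts(speech, fake_classifier))  # Counter({'directive': 1, 'uncertain': 1})
```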

Citation

If you use this model in academic work, please cite the associated publication.

@INPROCEEDINGS{schmidt23,
  author={Schmidt, Klaus and Niekler, Andreas and Kantner, Cathleen and Burghardt, Manuel},
  booktitle={2023 18th Conference on Computer Science and Intelligence Systems (FedCSIS)},
  title={Classifying Speech Acts in Political Communication: A Transformer-based Approach with Weak Supervision and Active Learning},
  year={2023},
  pages={739-748},
  doi={10.15439/2023F3485}
}