mmbert-is-political

mmbert-is-political is a fine-tuned ModernBERT sequence classification model for detecting whether a text is political or non-political.

The model predicts one of two labels:

non_political
political

Model Details

Model type: ModernBERT for sequence classification
Architecture: ModernBertForSequenceClassification
Task: Binary text classification
Labels:
- 0: non_political
- 1: political
Maximum sequence length: 8192 tokens
Pooling: Mean pooling
Problem type: Single-label classification
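
The label ids and the single-label setup listed above suggest that the two output logits are turned into class probabilities with a softmax, with the larger one deciding the label. A minimal pure-Python sketch of that step follows. The function name `logits_to_prediction` and the example logits are illustrative assumptions, not part of the model's API.

```python
import math

# Label ids as listed in the model details above.
ID2LABEL = {0: "non_political", 1: "political"}


def logits_to_prediction(logits):
    """Map a pair of raw logits to a label and its softmax probability."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # The predicted class is the index with the highest probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": ID2LABEL[best], "score": probs[best]}


# Made-up logits for illustration only.
print(logits_to_prediction([-1.2, 2.3]))
```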

Intended Use

This model is intended to classify social media posts and news-style texts as political or non-political.

A text is considered political if it discusses political actors or institutions, elections, public policy, governance, macroeconomic issues, or international/geopolitical affairs. Examples include texts about politicians, parties, immigration policy, healthcare reform, inflation, NATO, the EU, or the war in Ukraine.

A text is considered non-political if it focuses on topics unrelated to politics or public policy. Examples include entertainment, sports, lifestyle, travel, food, technology products, weather, nature, or personal well-being.

Training Data

The model was trained on texts from two source types:

Social media posts from politicians on Instagram, X, and Facebook
Newspaper articles from German, British, and US outlets

The political actors and outlets represented in the training data come from Germany, the United Kingdom, and the United States.

The training labels are synthetic: they were generated with Llama 3 70B, and the model was trained directly on these annotations.

Usage

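from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

repo_id = "Sami92/mmbert-is-political"

# Load the tokenizer and the fine-tuned classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# truncation=True cuts inputs that exceed the model's maximum sequence length.
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
)

text = "The government announced a new immigration policy today."
result = classifier(text)

# result is a list with one dict containing the predicted label and its score.
print(result)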
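
When the classifier is applied to many texts, it can be convenient to work with a single probability for the political class. The sketch below post-processes pipeline output. It assumes each prediction is a dict with `label` and `score` keys and that the label strings match the names listed in this card. `political_probability`, `flag_political`, and the 0.5 threshold are hypothetical helpers and defaults, not part of the model or of transformers.

```python
def political_probability(prediction):
    """Return P(political) from one {"label": ..., "score": ...} dict."""
    score = prediction["score"]
    # With two classes, the non_political score is the complement.
    return score if prediction["label"] == "political" else 1.0 - score


def flag_political(predictions, threshold=0.5):
    """Return one boolean per prediction: True if P(political) >= threshold."""
    return [political_probability(p) >= threshold for p in predictions]


# Hand-written stand-ins for pipeline output, not real model scores.
example_predictions = [
    {"label": "political", "score": 0.97},
    {"label": "non_political", "score": 0.91},
]
print(flag_political(example_predictions))  # [True, False]
```

With the `classifier` from the usage example, `flag_political(classifier(list_of_texts))` would apply the same logic to real predictions, since the pipeline accepts a list of strings and returns one dict per text.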

Evaluation

Metrics

Accuracy

Results

The model was evaluated on a test set of 100 texts (news articles and social media posts from the UK, the US, and Germany), each labeled by two annotators.

Overall accuracy: 0.85
Accuracy on news articles: 0.88
Accuracy on social media posts: 0.82
Accuracy on English texts: 0.88
Accuracy on German texts: 0.80
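
The breakdown above reports accuracy per text type and per language. A sketch of how such per-group accuracy can be computed is shown below. The record fields (`gold`, `pred`, `source`, `lang`) and the four toy records are invented for illustration and are not the actual evaluation data.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Compute accuracy separately for each value of record[key]."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[key]
        total[group] += 1
        correct[group] += int(record["gold"] == record["pred"])
    return {group: correct[group] / total[group] for group in total}


# Toy records, invented for illustration only.
records = [
    {"gold": "political", "pred": "political", "source": "news", "lang": "en"},
    {"gold": "non_political", "pred": "political", "source": "post", "lang": "de"},
    {"gold": "non_political", "pred": "non_political", "source": "post", "lang": "en"},
    {"gold": "political", "pred": "political", "source": "news", "lang": "de"},
]

print(accuracy_by_group(records, "source"))  # {'news': 1.0, 'post': 0.5}
print(accuracy_by_group(records, "lang"))    # {'en': 1.0, 'de': 0.5}
```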

Limitations

The model was trained on synthetic labels rather than manually verified annotations. As a result, predictions may reflect labeling errors, ambiguities, or biases from the annotation process.

The training data focuses on German, British, and US political and media contexts. Performance may differ for texts from other countries, languages, political systems, or media environments.

Safetensors

Model size: 0.3B params
Tensor type: F32