mmbert-is-political

mmbert-is-political is a fine-tuned ModernBERT sequence classification model for detecting whether a text is political or non-political.

The model predicts one of two labels:

  • non_political
  • political

Model Details

  • Model type: ModernBERT for sequence classification
  • Architecture: ModernBertForSequenceClassification
  • Task: Binary text classification
  • Labels:
    • 0: non_political
    • 1: political
  • Maximum sequence length: 8192 tokens
  • Pooling: Mean pooling
  • Problem type: Single-label classification
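The label indices above determine how raw model outputs map to label names. The sketch below assumes you already have the two logits for one text, for example from `model(**inputs).logits` after tokenizing with the tokenizer from the Usage section. It applies a softmax to get per-label probabilities and takes the argmax as the predicted label. The `ID2LABEL` dictionary is copied by hand from the list above. In practice the same mapping should also be available as `model.config.id2label`, but treat that as an assumption and check it.

```python
import math

# Hand-copied from the label list above (index -> label name).
ID2LABEL = {0: "non_political", 1: "political"}


def decode_logits(logits):
    """Turn the two raw logits for one text into (label, probabilities)."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = {ID2LABEL[i]: e / total for i, e in enumerate(exps)}
    # Single-label classification: the highest-scoring class wins.
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best], probs


label, probs = decode_logits([-1.2, 2.3])
print(label, round(probs["political"], 3))
```

The logits here are made-up numbers for illustration. They are not output from the model.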

Intended Use

This model is intended to classify social media posts and news-style texts as political or non-political.

A text is considered political if it discusses political actors or institutions, elections, public policy, governance, macroeconomic issues, or international/geopolitical affairs. Examples include texts about politicians, parties, immigration policy, healthcare reform, inflation, NATO, the EU, or the war in Ukraine.

A text is considered non-political if it focuses on topics unrelated to politics or public policy. Examples include entertainment, sports, lifestyle, travel, food, technology products, weather, nature, or personal well-being.

Training Data

The model was trained on texts from multiple source types:

  • Social media posts from politicians on Instagram, X, and Facebook
  • Newspaper articles from German, British, and US outlets

The political actors and outlets represented in the training data come from Germany, the United Kingdom, and the United States.

The training labels are synthetic: they were generated with Llama 3 70B rather than annotated by humans, and the model was trained on these labels.

Usage

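```python
# ModernBERT models require a recent transformers release (4.48 or later).
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

repo_id = "Sami92/mmbert-is-political"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# truncation=True shortens inputs that exceed the model's maximum sequence length.
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
)

text = "The government announced a new immigration policy today."
result = classifier(text)

# A list with one {"label": ..., "score": ...} dict for the input text.
print(result)
```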
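By default the pipeline returns only the top label and its score. You may want a stricter or looser cutoff than the argmax. One option is to ask the pipeline for the scores of both labels with `top_k=None` and then threshold the `political` score yourself. In recent transformers versions, a single string input then returns a list of `{"label": ..., "score": ...}` dicts. The `is_political` helper below is a hypothetical post-processing step. It is not part of the model or the library, and the threshold value is a free parameter, not a recommended setting.

```python
def is_political(label_scores, threshold=0.5):
    """Return True if the 'political' score meets the threshold.

    `label_scores` is a list of {"label": str, "score": float} dicts for one
    text, in the format the text-classification pipeline returns.
    """
    scores = {d["label"]: d["score"] for d in label_scores}
    return scores.get("political", 0.0) >= threshold


# Hand-written scores in the pipeline's output format (not real model output).
example = [
    {"label": "political", "score": 0.62},
    {"label": "non_political", "score": 0.38},
]
print(is_political(example))                 # True  (default cutoff 0.5)
print(is_political(example, threshold=0.8))  # False (stricter cutoff)
```

With the real model, you would pass the output of `classifier(text, top_k=None)` to `is_political`.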

Evaluation

Metrics

  • Accuracy

Results

The model was evaluated on a test set of 100 texts (news articles and social media posts from the UK, the US, and Germany), labeled by two annotators.

  • Overall Accuracy: 0.85
  • Accuracy on news: 0.88
  • Accuracy on posts: 0.82
  • Accuracy on English texts: 0.88
  • Accuracy on German texts: 0.80
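The figures above are plain accuracies. They are computed over the whole test set and again within subsets by source type (news vs. posts) and by language. A minimal sketch of that breakdown follows. The records are made-up toy examples, not the actual test set, and the field names `source`, `lang`, `gold`, and `pred` are hypothetical.

```python
from collections import defaultdict


def accuracy(records):
    """Fraction of records whose prediction matches the gold label."""
    return sum(r["pred"] == r["gold"] for r in records) / len(records)


def accuracy_by_group(records, key):
    """Accuracy computed separately for each value of `key` (e.g. 'source')."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r[key]
        total[group] += 1
        correct[group] += int(r["pred"] == r["gold"])
    return {group: correct[group] / total[group] for group in total}


# Toy records (NOT the real evaluation data).
records = [
    {"source": "news", "lang": "EN", "gold": "political", "pred": "political"},
    {"source": "news", "lang": "DE", "gold": "non_political", "pred": "non_political"},
    {"source": "post", "lang": "EN", "gold": "political", "pred": "non_political"},
    {"source": "post", "lang": "DE", "gold": "non_political", "pred": "non_political"},
]

print(accuracy(records))                   # 0.75
print(accuracy_by_group(records, "source"))  # {'news': 1.0, 'post': 0.5}
print(accuracy_by_group(records, "lang"))    # {'EN': 0.5, 'DE': 1.0}
```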

Limitations

The model was trained on synthetic labels rather than manually verified annotations. As a result, predictions may reflect labeling errors, ambiguities, or biases from the annotation process.

The training data focuses on German, British, and US political and media contexts. Performance may differ for texts from other countries, languages, political systems, or media environments.

Model Files

  • Format: Safetensors
  • Model size: 0.3B parameters
  • Tensor type: F32