---
license: cc-by-4.0
language:
- ar
base_model: UBC-NLP/MARBERT
pipeline_tag: text-classification
tags:
- arabic
- extremism-detection
- content-moderation
- isis
- distillation
- marbert
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: marbert-isis-detector
  results:
  - task:
      type: text-classification
      name: Arabic ISIS binary classification
    metrics:
    - type: accuracy
      value: 0.90
    - type: f1
      value: 0.91
      name: ISIS F1
    - type: precision
      value: 0.88
      name: ISIS precision
    - type: recall
      value: 0.94
      name: ISIS recall
---

# marbert-isis-detector

A MARBERT-based binary classifier for Arabic ISIS content, fine-tuned on a corpus of 500,000 Arabic tweets labeled by a taxonomy-guided LLM pipeline. The model separates pro-ISIS (`ISIS`) from non-ISIS (`NOT-ISIS`) tweets at the post level. It is intended as an efficient first-pass filter whose flagged posts a more expensive LLM classifier can then verify.

This checkpoint accompanies the paper *"Extremism Detection and Counter-Messaging with Large Language Models"* (Alfifi, Kaghazgaran, Caverlee). The code for training and evaluation, the 2,000-tweet evaluation set with LLM predictions, and the prompts used to generate the training labels are released in a companion GitHub repository: [majidalfifi/extremism-llm](https://github.com/majidalfifi/extremism-llm).

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("majidalfifi/marbert-isis-detector")
model = AutoModelForSequenceClassification.from_pretrained("majidalfifi/marbert-isis-detector")

text = "your Arabic tweet here"
# Pad and truncate to the 128-token sequence length used in training.
inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding="max_length")
with torch.no_grad():  # gradients are not needed for inference
    logits = model(**inputs).logits
# Map the highest-scoring class index to its label (0 = ISIS, 1 = NOT-ISIS).
label = model.config.id2label[int(logits.argmax(dim=-1))]
print(label)  # -> "ISIS" or "NOT-ISIS"
```

For batch inference over a CSV or line-delimited file, see [`train_marbert.py`](https://github.com/majidalfifi/extremism-llm/blob/main/train_marbert.py) in the companion repository, which supports an `--eval-only --checkpoint majidalfifi/marbert-isis-detector` mode.

## Label mapping

| id | label |
|---|---|
| 0 | ISIS |
| 1 | NOT-ISIS |
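
Because the positive class sits at index 0 rather than the more common index 1, it is easy to read the wrong logit when thresholding. The following is a minimal sketch, in plain Python, of turning a pair of logits into a label and an ISIS probability. The `decode` helper and its `isis_threshold` parameter are hypothetical names introduced here for illustration; they are not part of the released code.

```python
import math

# Mapping copied from the table above: index 0 is the positive (ISIS) class.
ID2LABEL = {0: "ISIS", 1: "NOT-ISIS"}


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, isis_threshold=0.5):
    """Return (label, isis_probability) for one [ISIS, NOT-ISIS] logit pair.

    isis_threshold is a hypothetical knob for trading recall against
    precision; the model card itself only describes argmax decoding.
    """
    p_isis = softmax(logits)[0]  # index 0, not 1
    label = ID2LABEL[0] if p_isis >= isis_threshold else ID2LABEL[1]
    return label, p_isis


label, p = decode([2.0, -1.0])
print(label, round(p, 3))  # -> ISIS 0.953
```

With the default threshold of 0.5 this matches argmax decoding for two classes; raising the threshold would flag fewer posts, which may suit settings where false positives are costly.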

## Training data

The model was fine-tuned on a balanced 500,000-tweet corpus (250,000 pro-ISIS + 250,000 NOT-ISIS) constructed by the LLM-labeling pipeline described in the paper:

1. An LLM iteratively induces a taxonomy of extremist content from 20,000 seed pro-ISIS + 20,000 seed NOT-ISIS tweets.
2. The LLM then applies the taxonomy to classify 1,000,000 random Arabic tweets and to refine the labels of a larger pool of tweets from pro-ISIS accounts. The resulting 500,000 labeled tweets form the training corpus for this model.

The underlying corpus was drawn from a 2015 Arabic Twitter archive and is not redistributed with this model (Twitter/X Terms of Service).

## Training procedure

The classifier was fine-tuned from `UBC-NLP/MARBERT` using a standard BERT-for-sequence-classification head. Hyperparameters match those reported in the paper:

| Setting | Value |
|---|---|
| Base model | [UBC-NLP/MARBERT](https://huggingface.co/UBC-NLP/MARBERT) |
| Max sequence length | 128 |
| Batch size | 64 |
| Epochs | 5 |
| Learning rate | 2e-6 |
| Optimizer | AdamW |
| LR schedule | Linear warmup (10%) → linear decay |
| Gradient clipping | 1.0 (max norm) |
| Hardware | 4× NVIDIA RTX A6000 (DataParallel) |
| Random seed | 42 |
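
To make the LR schedule row concrete, here is a minimal sketch of a linear-warmup, linear-decay schedule in plain Python. It assumes the 2e-6 learning rate in the table is the peak value reached at the end of warmup, and it follows the same shape as `get_linear_schedule_with_warmup` in `transformers`; whether the training script uses that exact scheduler is not confirmed here. The function name `lr_at_step` and the 1,000-step total are illustrative only.

```python
def lr_at_step(step, total_steps, peak_lr=2e-6, warmup_frac=0.10):
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to peak_lr over the first warmup_frac of training,
    then falls linearly back to 0 by total_steps.
    """
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Illustrative run with 1,000 total steps (not the real step count).
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps=1000))
```

In this illustration the rate reaches its peak at step 100 (10% of training) and is back at zero by step 1,000.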

## Evaluation

On a 10% held-out test split (~50,000 tweets), the paper reports:

| Metric | ISIS class | NOT-ISIS class | Overall |
|---|---|---|---|
| Precision | 0.88 | 0.93 | — |
| Recall | 0.94 | 0.87 | — |
| F1 | 0.91 | 0.90 | — |
| Accuracy | — | — | 0.90 |

See Table 3 of the paper for scaling results at 1K, 10K, 100K, 250K, and 500K training sizes.
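
As a quick consistency check on the per-class numbers above, F1 can be recomputed as the harmonic mean of precision and recall. The sketch below uses only the values from the table; the helper name `f1` is introduced here for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation table.
reported = {
    "ISIS": {"precision": 0.88, "recall": 0.94, "f1": 0.91},
    "NOT-ISIS": {"precision": 0.93, "recall": 0.87, "f1": 0.90},
}

for name, m in reported.items():
    computed = f1(m["precision"], m["recall"])
    print(f"{name}: computed F1 = {computed:.3f}, reported = {m['f1']:.2f}")
```

Both recomputed values round to the reported F1 scores at two decimal places.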

## Intended use and limitations

**Intended use.** This model is intended as an automated first-pass filter for detecting pro-ISIS Arabic social media content in research settings — for example, as a cost-effective precursor to a more expensive taxonomy-guided LLM classifier that verifies flagged posts, or as a baseline for Arabic extremism-detection research. The label `ISIS` should be read as "appears to endorse, recruit for, or glorify ISIS-affiliated groups" rather than as "mentions ISIS"; the training corpus includes many NOT-ISIS tweets that reference ISIS in neutral or opposing terms.
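
The first-pass-filter pattern described above can be sketched as a two-stage cascade. This is an illustration under stated assumptions, not the pipeline from the paper: `cascade`, `fast_filter`, and `llm_verify` are hypothetical names, and the two stand-in functions at the bottom exist only so the sketch runs without a model or an API key.

```python
from typing import Callable, Iterable, List, Tuple


def cascade(
    texts: Iterable[str],
    fast_filter: Callable[[str], str],
    llm_verify: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run the cheap classifier on every text; send only flagged texts
    to the expensive verifier, whose label then overrides the first pass."""
    results = []
    for text in texts:
        label = fast_filter(text)
        if label == "ISIS":
            # Only posts flagged by the first pass incur the LLM cost.
            label = llm_verify(text)
        results.append((text, label))
    return results


# Stand-in callables for demonstration only (hypothetical keyword rules).
def fake_filter(text: str) -> str:
    return "ISIS" if "flag" in text else "NOT-ISIS"


def fake_verifier(text: str) -> str:
    return "ISIS" if "confirmed" in text else "NOT-ISIS"


print(cascade(["flag confirmed", "flag only", "benign"], fake_filter, fake_verifier))
```

In practice `fast_filter` would wrap this model and `llm_verify` would wrap the taxonomy-guided LLM classifier. Note that posts the first pass labels `NOT-ISIS` are never re-checked in this design, so the recall of the cascade is bounded by the recall of the filter.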

**Out-of-scope use.** The model has not been validated for:

- Non-Arabic languages (including Arabic-script text in other languages).
- Extremist content from other movements (e.g., far-right or white-supremacist groups, or other jihadist groups). The model was trained only on ISIS-related material from 2015.
- Automated enforcement, account suspension, or any high-stakes moderation decision without human review. False positives on this task have real consequences for individuals.

**Limitations.**

- **Temporal drift.** The underlying Twitter archive is from 2015, when ISIS messaging took specific linguistic forms. Current extremist rhetoric — ISIS-inspired or otherwise — may differ. Performance on recent content is unlikely to match the reported numbers.
- **Dialectal coverage.** Although MARBERT was pretrained on dialectal Arabic, the fine-tuning labels were generated by an LLM, so some dialects and script variants may be underrepresented or labeled less reliably.
- **Label noise.** Training labels come from an LLM (GPT-4o with a taxonomy-guided prompt), not human adjudication. While the paper validates the taxonomy against human judgment on a 2,000-tweet evaluation set, individual training labels may be noisy.
- **Content-type mismatch.** The model was trained on short tweets. Longer documents are truncated at 128 tokens, and non-text content (images, video) is ignored, so performance on such inputs is unknown.

**Ethical considerations.** Extremism classification is sensitive. Users should consult the *Limitations* and *Ethical Considerations* sections of the accompanying paper before deploying this model in any setting beyond research. The model is released under CC-BY-4.0 to encourage responsible use, but any sharing of model outputs about individuals should comply with applicable laws and platform policies.