---
license: cc-by-4.0
language:
- ar
base_model: UBC-NLP/MARBERT
pipeline_tag: text-classification
tags:
- arabic
- extremism-detection
- content-moderation
- isis
- distillation
- marbert
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: marbert-isis-detector
  results:
  - task:
      type: text-classification
      name: Arabic ISIS binary classification
    metrics:
    - type: accuracy
      value: 0.90
    - type: f1
      value: 0.91
      name: ISIS F1
    - type: precision
      value: 0.88
      name: ISIS precision
    - type: recall
      value: 0.94
      name: ISIS recall
---

# marbert-isis-detector

A MARBERT-based binary classifier for Arabic ISIS content, fine-tuned on a corpus of 500,000 Arabic tweets labeled by a taxonomy-guided LLM pipeline. The model identifies pro-ISIS (`ISIS`) versus non-ISIS (`NOT-ISIS`) tweets at the post level and is intended to serve as an efficient first-pass filter that a more expensive LLM classifier can then verify.

This checkpoint accompanies the paper *"Extremism Detection and Counter-Messaging with Large Language Models"* (Alfifi, Kaghazgaran, Caverlee). The code for training and evaluation, the 2,000-tweet evaluation set with LLM predictions, and the prompts used to generate the training labels are released in a companion GitHub repository: [majidalfifi/extremism-llm](https://github.com/majidalfifi/extremism-llm).
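The first-pass-filter role described above can be illustrated with a minimal sketch. It assumes the classifier's raw logits are already available and uses the id-to-label mapping documented in this card (0 = `ISIS`, 1 = `NOT-ISIS`). The names `cascade`, `verify`, and `threshold` are hypothetical and do not come from the paper or the companion repository; `verify` stands in for whatever more expensive second-stage classifier is used.

```python
import math
from typing import Callable, List, Sequence, Tuple

ISIS_ID = 0  # per this card's label mapping: 0 = ISIS, 1 = NOT-ISIS


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cascade(
    texts: Sequence[str],
    logits_batch: Sequence[Sequence[float]],
    verify: Callable[[str], str],   # hypothetical second-stage classifier
    threshold: float = 0.5,         # assumed cut-off, not taken from the paper
) -> List[Tuple[str, str]]:
    """Send only posts flagged by the first-pass model to `verify`."""
    results = []
    for text, logits in zip(texts, logits_batch):
        p_isis = softmax(logits)[ISIS_ID]
        if p_isis >= threshold:
            # Flagged by the cheap model: defer to the expensive verifier.
            label = verify(text)
        else:
            # Not flagged: accept the cheap model's decision.
            label = "NOT-ISIS"
        results.append((text, label))
    return results


if __name__ == "__main__":
    # Placeholder texts and made-up logits, for illustration only.
    demo_texts = ["post A", "post B"]
    demo_logits = [[2.0, -1.0], [-3.0, 1.0]]
    print(cascade(demo_texts, demo_logits, verify=lambda t: "ISIS"))
```

Lowering `threshold` sends more posts to the second stage (higher recall, higher cost); the right value depends on the deployment and would need to be tuned on held-out data.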
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("majidalfifi/marbert-isis-detector")
model = AutoModelForSequenceClassification.from_pretrained("majidalfifi/marbert-isis-detector")

text = "your Arabic tweet here"
inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding="max_length")
with torch.no_grad():
    logits = model(**inputs).logits
label = model.config.id2label[int(logits.argmax(dim=-1))]
print(label)  # -> "ISIS" or "NOT-ISIS"
```

For batch inference over a CSV or line-delimited file, see [`train_marbert.py`](https://github.com/majidalfifi/extremism-llm/blob/main/train_marbert.py) in the companion repository, which supports an `--eval-only --checkpoint majidalfifi/marbert-isis-detector` mode.

## Label mapping

| id | label |
|---|---|
| 0 | ISIS |
| 1 | NOT-ISIS |

## Training data

The model was fine-tuned on a balanced 500,000-tweet corpus (250,000 pro-ISIS + 250,000 NOT-ISIS) constructed by the LLM-labeling pipeline described in the paper:

1. An LLM iteratively induces a taxonomy of extremist content from 20,000 seed pro-ISIS + 20,000 seed NOT-ISIS tweets.
2. The taxonomy is then used to classify 1,000,000 random Arabic tweets and to refine labels for a larger pool of pro-ISIS-account tweets; the 500,000 final labels feed this model.

The underlying corpus was drawn from a 2015 Arabic Twitter archive and is not redistributed with this model (Twitter/X Terms of Service).

## Training procedure

The classifier was fine-tuned from `UBC-NLP/MARBERT` using a standard BERT-for-sequence-classification head.
Hyperparameters match those reported in the paper:

| Setting | Value |
|---|---|
| Base model | [UBC-NLP/MARBERT](https://huggingface.co/UBC-NLP/MARBERT) |
| Max sequence length | 128 |
| Batch size | 64 |
| Epochs | 5 |
| Learning rate | 2e-6 |
| Optimizer | AdamW |
| LR schedule | Linear warmup (10%) → linear decay |
| Gradient clipping | 1.0 (max norm) |
| Hardware | 4× NVIDIA RTX A6000 (DataParallel) |
| Random seed | 42 |

## Evaluation

On a 10% held-out test split (~50,000 tweets), the paper reports:

| Metric | ISIS class | NOT-ISIS class | Overall |
|---|---|---|---|
| Precision | 0.88 | 0.93 | - |
| Recall | 0.94 | 0.87 | - |
| F1 | 0.91 | 0.90 | - |
| Accuracy | - | - | 0.90 |

See Table 3 of the paper for scaling results at 1K, 10K, 100K, 250K, and 500K training sizes.

## Intended use and limitations

**Intended use.** This model is intended as an automated first-pass filter for detecting pro-ISIS Arabic social media content in research settings, for example as a cost-effective precursor to a more expensive taxonomy-guided LLM classifier that verifies flagged posts, or as a baseline for Arabic extremism-detection research. The label `ISIS` should be read as "appears to endorse, recruit for, or glorify ISIS-affiliated groups" rather than as "mentions ISIS"; the training corpus includes many NOT-ISIS tweets that reference ISIS in neutral or opposing terms.

**Out-of-scope use.** The model has not been validated for:

- Non-Arabic languages (including Arabic-script text in other languages).
- Extremism from other ideological movements (e.g., far-right, other jihadist groups, white-supremacist content). It is trained specifically on ISIS-era material.
- Automated enforcement, account suspension, or any high-stakes moderation decision without human review. False positives on this task have real consequences for individuals.

**Limitations.**

- **Temporal drift.** The underlying Twitter archive is from 2015, when ISIS messaging took specific linguistic forms.
  Current extremist rhetoric, ISIS-inspired or otherwise, may differ. Performance on recent content is unlikely to match the reported numbers.
- **Dialectal coverage.** Although MARBERT was pretrained on dialectal Arabic, the training labels were generated by an LLM and may underrepresent some dialects and script variants.
- **Label noise.** Training labels come from an LLM (GPT-4o with a taxonomy-guided prompt), not human adjudication. While the paper validates the taxonomy against human judgment on a 2,000-tweet evaluation set, individual training labels may be noisy.
- **Content-type mismatch.** The model was trained on short tweets. Longer documents or multimodal content will be truncated at 128 tokens and performance is undefined.

**Ethical considerations.** Extremism classification is sensitive. Users should consult the accompanying paper's *Limitations* and *Ethical Considerations* sections before deploying this model in any setting beyond research. The model is released under CC-BY-4.0 to encourage responsible use, but redistribution of model outputs on individuals should comply with relevant laws and platform policies.
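
For readers reproducing the training procedure, the learning-rate schedule listed in the hyperparameter table (linear warmup over the first 10% of steps, then linear decay) can be sketched in plain Python. This is an illustration of the schedule's shape under stated assumptions, not the authors' training code: the function name `linear_warmup_decay` is hypothetical, the decay is assumed to reach zero at the final step, and the step count in the demo is arbitrary. The base rate of 2e-6 and the 10% warmup fraction are taken from the table.

```python
def linear_warmup_decay(
    step: int,
    total_steps: int,
    base_lr: float = 2e-6,      # learning rate from the hyperparameter table
    warmup_frac: float = 0.10,  # 10% warmup from the hyperparameter table
) -> float:
    """Learning rate at `step`: linear ramp to base_lr, then linear decay to 0."""
    warmup_steps = round(total_steps * warmup_frac)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps (assumed).
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 1000  # arbitrary step count, for illustration only
    for s in (0, 50, 100, 550, 1000):
        print(s, linear_warmup_decay(s, total))
```

The same shape is available in `transformers` as `get_linear_schedule_with_warmup`; whether the released training script uses that helper is not stated in this card.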