|
|
| # ArgueBot — Argument Scheme & Fallacy Classifier |
|
|
| This model is a fine-tuned version of `roberta-large` trained on a combined dataset of argument schemes and logical fallacies. It classifies any input text into one of **24 categories** — either a valid argument scheme type or a logical fallacy type — in a single inference pass. |
|
|
| --- |
|
|
| ## Model Details |
|
|
| | Property | Value | |
| |---|---| |
| | Base Model | `roberta-large` | |
| | Task | 24-class text classification | |
| | Argument Scheme Classes | 11 | |
| | Fallacy Classes | 13 | |
| | Total Classes | 24 | |
|
|
| --- |
|
|
| ## Training Parameters |
|
|
| These are the exact parameters used to train this model: |
|
|
| | Parameter | Value | Description | |
| |---|---|---| |
| | Learning Rate | `2e-5` | Step size for AdamW optimiser | |
| | Batch Size | `8` | Samples per training step | |
| | Max Token Length | `128` | Maximum input tokens per sample | |
| | Weight Decay | `0.01` | L2 regularisation in AdamW | |
| | Max Epochs | `20` | Training ceiling — early stopping cuts this short | |
| | Gradient Clip | `1.0` | Max gradient norm to prevent exploding gradients | |
| | Train / Val / Test Split | `70 / 15 / 15` | Stratified split | |
| | Optimiser | `AdamW` | Adaptive learning rate with weight decay | |
|
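The card states a stratified 70 / 15 / 15 split but does not say how it was implemented. The following is a minimal pure-Python sketch of one way to do it, assuming each class is shuffled and cut independently; `stratified_split` and the toy labels are hypothetical and are not taken from the training code.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that keep per-class proportions."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed, mirroring random_state=42 in the card
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# Toy labels (illustrative only, not the real class distribution)
labels = ["Ad Hominem"] * 20 + ["Argument from Sign"] * 40
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument, applied twice, would likely be the more convenient choice.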
|
| ### Early Stopping |
|
|
| | Parameter | Value | Description | |
| |---|---|---| |
| | Patience | `3` | Stops after 3 consecutive epochs with no improvement | |
| | Min Delta | `0.001` | Minimum change that counts as an improvement | |
| | Monitor | `val_loss` | Watches validation loss to detect overfitting | |
|
|
| The best model checkpoint is saved automatically whenever validation loss improves. When early stopping triggers, the best weights are restored before evaluation. |
|
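The early-stopping rule above can be sketched as a small tracker. This is an illustration under stated assumptions, not the training script itself: the `EarlyStopping` class is hypothetical, the loss values are made up, and it assumes that an epoch counts as an improvement only when validation loss drops by more than `min_delta`.

```python
class EarlyStopping:
    """Tracks validation loss and signals when training should stop."""

    def __init__(self, patience=3, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True once patience is exhausted."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: a real training loop would save a checkpoint here.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


val_losses = [1.90, 1.62, 1.55, 1.5495, 1.58, 1.60, 1.57]  # made-up values
stopper = EarlyStopping(patience=3, min_delta=0.001)
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(epoch, loss):
        break

print(f"stopped after epoch {epoch}, best epoch was {stopper.best_epoch}")
```

Epoch 4 lowers the loss by only 0.0005, which is below `min_delta`, so it counts toward the patience budget. Training stops after epoch 6 and the weights from epoch 3 would be the ones restored.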
|
| ### Data Preprocessing |
|
|
| | Step | What it does | |
| |---|---| |
| | Exact deduplication | Removes rows where both text and label are identical | |
| | Near-duplicate removal | Removes rows whose text also appears elsewhere with a different label (conflicting annotations) | |
| | Short text removal | Drops texts with fewer than 5 words | |
| | Class weight balancing | Computes class weights with `sklearn.utils.class_weight.compute_class_weight(class_weight="balanced", ...)` across all 24 classes | |
| | Shuffle | Fixed `random_state=42` for reproducibility | |
|
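The preprocessing steps in the table can be approximated in plain Python as follows. This is a sketch under assumptions: the helper names and sample rows are hypothetical, a text with conflicting labels is assumed to be dropped entirely (the card does not say whether one copy is kept), and the weight formula `n_samples / (n_classes * count)` is the one scikit-learn documents for its `"balanced"` mode.

```python
from collections import Counter, defaultdict


def preprocess(rows, min_words=5):
    """rows: list of (text, label) pairs. Returns the cleaned list."""
    # 1. Exact deduplication: keep the first copy of each (text, label) pair.
    unique = list(dict.fromkeys(rows))

    # 2. Conflicting labels: drop every text that carries more than one label.
    labels_per_text = defaultdict(set)
    for text, label in unique:
        labels_per_text[text].add(label)
    consistent = [(t, l) for t, l in unique if len(labels_per_text[t]) == 1]

    # 3. Short texts: drop anything with fewer than `min_words` words.
    return [(t, l) for t, l in consistent if len(t.split()) >= min_words]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Made-up sample rows for illustration
rows = [
    ("Everyone is buying it, so it must be good.", "Ad Populum"),
    ("Everyone is buying it, so it must be good.", "Ad Populum"),        # exact duplicate
    ("Smoke is rising, so there is a fire nearby.", "Argument from Sign"),
    ("Smoke is rising, so there is a fire nearby.", "False Causality"),  # conflicting label
    ("He is wrong.", "Ad Hominem"),                                      # too short
    ("You cannot trust her view because she failed math.", "Ad Hominem"),
    ("The crowd loves this phone, therefore it is the best.", "Ad Populum"),
]
clean = preprocess(rows)
weights = balanced_class_weights([label for _, label in clean])
print(len(clean), weights)
```

Rarer classes receive larger weights, which is what lets a weighted loss counteract the imbalance between the scheme and fallacy classes.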
|
| --- |
|
|
| ## Supported Labels |
|
|
| ### ✅ Argument Schemes (valid arguments) |
|
|
| 1. Argument from Analogy |
| 2. Argument from Alternatives |
| 3. Argument from Cause to Effect |
| 4. Argument from Commitment |
| 5. Argument from Example |
| 6. Argument from Expert Opinion |
| 7. Argument from Negative Consequences |
| 8. Argument from Positive Consequences |
| 9. Argument from Practical Reasoning |
| 10. Argument from Sign |
| 11. Argument from Values |
|
|
| ### ⚡ Fallacy Types |
|
|
| 1. Ad Hominem |
| 2. Ad Populum |
| 3. Appeal to Emotion |
| 4. Circular Reasoning |
| 5. Equivocation |
| 6. Fallacy of Credibility |
| 7. Fallacy of Extension |
| 8. Fallacy of Logic |
| 9. Fallacy of Relevance |
| 10. False Causality |
| 11. False Dilemma |
| 12. Faulty Generalization |
| 13. Intentional |
|
|
| ## Dataset |
|
|
| | Source | Type | Classes | Samples | |
| |---|---|---|---| |
| | EthiX + Macagno | Argument schemes | 11 | 1829 | |
| | LOGIC dataset | Fallacy types | 13 | 4124 | |
|
|
| - **Text column**: `Argument` |
| - **Label column**: `Label` |
| - Both datasets were combined into a single CSV before training |
|
|
|
|
| ## Results |
|
|
| | Metric | Value | |
| |---|---| |
| | Accuracy | 0.54 | |
| | Macro F1-score | 0.49 | |
|
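Macro F1 averages the per-class F1 scores with equal weight, so weak performance on small classes pulls it below accuracy. The toy example below illustrates the arithmetic only; the labels and predictions are invented and say nothing about this model's per-class behaviour.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: class "A" is common, class "B" is rare and half of it is missed.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Here accuracy is 0.900 while macro F1 is about 0.804, because the rare class contributes an F1 of only 2/3. For real evaluations, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.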
|
| ## How to Load and Use the Model |
|
|
| ### Step 1 — Install |
|
|
| ```bash |
| pip install transformers torch requests |
| ``` |
|
|
| ### Step 2 — Load from Hugging Face |
|
|
| ```python |
| import requests |
| import torch |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| |
| MODEL_PATH = "Isabelbinu/roberta-large-argumentscheme-fallacy-classifier" |
| |
| tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH) |
| model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH) |
| model.eval()  # inference mode: disables dropout |
| |
| # metadata.json holds the id-to-label mapping and the ids of the argument-scheme classes |
| resp = requests.get(f"https://huggingface.co/{MODEL_PATH}/resolve/main/metadata.json", timeout=30) |
| resp.raise_for_status() |
| meta = resp.json() |
| label_map = {int(k): v for k, v in meta["label_map"].items()} |
| scheme_ids = set(meta["scheme_ids"]) |
| |
| def predict(text): |
|     """Classify one text and return the verdict, label, and confidence.""" |
|     # Truncate to 128 tokens, the same limit used during training |
|     inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128) |
|     with torch.no_grad(): |
|         probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze() |
|     pred_id = int(probs.argmax()) |
|     return { |
|         "verdict": "✅ Valid Argument" if pred_id in scheme_ids else "⚡ Fallacy", |
|         "label": label_map[pred_id], |
|         "confidence": f"{probs[pred_id].item():.1%}", |
|     } |
| |
| # Test 1 |
| print(predict("Introducing a four-day work week will boost employee wellbeing, reduce burnout, and ultimately increase overall productivity.")) |
| |
| # Test 2 |
| print(predict("Don't trust him — he was caught lying before, so everything he says is wrong.")) |
| ``` |
|
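Because the reported accuracy is 0.54, the single top label is wrong fairly often, and it can help to look at the runner-up classes as well. The sketch below shows one way to rank the top-k labels from raw logits. It is an illustration only: `top_k`, the toy label map, and the toy logits are hypothetical, and plain Python lists stand in for the model output so the example runs without downloading the model.

```python
import math


def top_k(logits, label_map, k=3):
    """Apply softmax to raw logits and return the k most probable (label, prob) pairs."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(label_map[i], probs[i]) for i in ranked[:k]]


# Toy stand-ins: the real label map has 24 entries and comes from metadata.json
toy_label_map = {0: "Ad Hominem", 1: "Ad Populum", 2: "Argument from Sign", 3: "False Dilemma"}
toy_logits = [2.0, 0.5, 1.0, -1.0]

results = top_k(toy_logits, toy_label_map, k=2)
for label, prob in results:
    print(f"{label}: {prob:.1%}")
```

With the real model, the logits for a single input could be obtained with `model(**inputs).logits.squeeze().tolist()` and passed in together with `label_map`.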
|
|
|
|
|
| ## Applications |
|
|
| - **Education**: Teach critical thinking by identifying argument types in real texts |
| - **Debate Analysis**: Evaluate the quality of reasoning in speeches and essays |
| - **Fact-checking**: Flag logically flawed reasoning in news and social media |
| - **Media Literacy**: Help readers identify manipulation tactics in persuasive content |
| - **AI Assistants**: Add argumentation reasoning to conversational AI systems |
|
|
|
|
|
|
|
|
|
|
| ## License |
|
|
| This model is licensed under the MIT License. |
|
|