---
language:
- en
license: mit
tags:
- text-classification
- phishing-detection
- security
- fraud-detection
- distilbert
- onnx
pipeline_tag: text-classification
---

# FraudFoxAI Phishing Detection Model

Fine-tuned DistilBERT model for detecting phishing and fraudulent emails. Trained on 565,000+ curated emails with 99.71% accuracy.

## Model Details

- **Base Model**: distilbert-base-uncased
- **Training Data**: 565,293 curated emails from multiple sources
- **Inference Runtime**: ONNX Runtime (PyTorch + ONNX available)
- **Classes**:
  - LABEL_0: Legitimate Email
  - LABEL_1: Phishing/Fraud Email

## Performance

| Metric | Score |
|---|---|
| **Accuracy** | 99.71% |
| **F1 Score** | 0.9871 |
| **Precision** | 0.9897 |
| **Recall** | 0.9846 |

## Training Data

Trained on **565,293 curated emails** from multiple sources:

- Corporate email archives (legitimate emails)
- Reported phishing samples
- Known 419/advance-fee fraud emails
- Community-sourced spam and scam samples

Continuously improved with user feedback.
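
The figures in the Performance table can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall; the function and variable names are illustrative and not part of the model's tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the Performance table.
reported_precision = 0.9897
reported_recall = 0.9846
reported_f1 = 0.9871

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}")  # computed F1 = 0.9871
```

The recomputed value matches the reported F1 to four decimal places, so the three metrics are mutually consistent.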

## Training Configuration

- Epochs: 2
- Batch Size: 32
- Warmup Steps: 1,000
- Weight Decay: 0.01
- Max Length: 512 tokens
- Framework: PyTorch + Transformers
- Training Time: ~12 hours on Colab GPU

## Usage

### ONNX Runtime (recommended, low memory)

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the tokenizer and the ONNX model from the Hub
tokenizer = AutoTokenizer.from_pretrained("xanderabim/fraudfoxai-phishing")
model = ORTModelForSequenceClassification.from_pretrained("xanderabim/fraudfoxai-phishing")

# Tokenize as NumPy arrays; truncation keeps the input within the model's maximum length
inputs = tokenizer("URGENT: Verify your account now!", return_tensors="np", truncation=True)
outputs = model(**inputs)  # outputs.logits holds one raw score per class
```

### PyTorch

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the PyTorch weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("xanderabim/fraudfoxai-phishing")
model = AutoModelForSequenceClassification.from_pretrained("xanderabim/fraudfoxai-phishing")

# Tokenize as PyTorch tensors; truncation keeps the input within the model's maximum length
inputs = tokenizer("URGENT: Verify your account now!", return_tensors="pt", truncation=True)
outputs = model(**inputs)  # outputs.logits holds one raw score per class
```

## Sample Predictions

| Email | Phishing Score | Verdict |
|---|---|---|
| "URGENT: Your PayPal account has been suspended!" | 99.99% | PHISHING |
| "Hi team, meeting at 2pm tomorrow" | 0.00% | SAFE |
| "Congratulations! You've won $1,000,000!" | 98.66% | PHISHING |
| "Meeting notes from yesterday attached" | 0.00% | SAFE |
| "Dear valued customer, your package delivery failed" | 99.92% | PHISHING |

## Production API

Deployed at: https://fraudfoxai.xanderabim.workers.dev

Or forward any email to: **check@fraudfox.ai**

## Limitations

- English language only
- Max 512 tokens per input
- May flag aggressive marketing emails as phishing
- Subject-only inputs are less accurate than full emails (subject + body)

## License

MIT

## Author

[@xanderabim](https://huggingface.co/xanderabim)
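
## Appendix: From Logits to a Phishing Score

The Usage snippets stop at raw logits, while the Sample Predictions table reports percentages and verdicts. The card does not say how one is derived from the other. A minimal sketch, assuming the score is the softmax probability of LABEL_1 and that the verdict uses a 0.5 cutoff (both are assumptions, as are the helper names `softmax`, `phishing_score`, and `verdict`):

```python
import math
from typing import List, Sequence

# Assumed label order, taken from the Classes list: index 1 = LABEL_1 (phishing).
PHISHING_INDEX = 1


def softmax(row: Sequence[float]) -> List[float]:
    """Convert one row of logits into probabilities that sum to 1."""
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def phishing_score(logits_row: Sequence[float]) -> float:
    """Probability assigned to the phishing class for a single email."""
    return softmax(logits_row)[PHISHING_INDEX]


def verdict(score: float, threshold: float = 0.5) -> str:
    """Map a phishing probability to a label; the 0.5 threshold is an assumption."""
    return "PHISHING" if score >= threshold else "SAFE"


# Made-up logits for two emails, used only to exercise the helpers.
example_logits = [[-4.0, 5.0], [6.0, -6.0]]
scores = [phishing_score(row) for row in example_logits]
for s in scores:
    print(f"{s:.2%} {verdict(s)}")
```

With either snippet from the Usage section, `outputs.logits.tolist()` should give a list of rows that can be passed to `phishing_score` one at a time. Given the limitation about aggressive marketing emails, a higher threshold may be worth testing on your own data.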