---
language:
- en
license: mit
tags:
- text-classification
- phishing-detection
- security
- fraud-detection
- distilbert
- onnx
pipeline_tag: text-classification
---

# FraudFoxAI Phishing Detection Model

A fine-tuned DistilBERT model for detecting phishing and fraudulent emails. It was trained on more than 565,000 curated emails and reaches 99.71% accuracy.

## Model Details

- **Base Model**: distilbert-base-uncased
- **Training Data**: 565,293 curated emails from multiple sources
- **Inference Runtime**: ONNX Runtime recommended (both PyTorch and ONNX weights are available)
- **Classes**:
  - LABEL_0: Legitimate Email
  - LABEL_1: Phishing/Fraud Email
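The raw `LABEL_0`/`LABEL_1` names can be translated to readable class names in application code. A minimal sketch, where `CLASS_NAMES` and `readable_label` are hypothetical names chosen for illustration and are not part of the model repository:

```python
# Hypothetical mapping from the model's raw label names to the class
# descriptions listed above.
CLASS_NAMES = {
    "LABEL_0": "Legitimate Email",
    "LABEL_1": "Phishing/Fraud Email",
}


def readable_label(raw_label: str) -> str:
    """Return the human-readable class name for a raw model label."""
    return CLASS_NAMES[raw_label]


print(readable_label("LABEL_1"))  # Phishing/Fraud Email
```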

## Performance

| Metric | Score |
|---|---|
| **Accuracy** | 99.71% |
| **F1 Score** | 0.9871 |
| **Precision** | 0.9897 |
| **Recall** | 0.9846 |
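
As a quick consistency check, the F1 score in the table can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The variable names below are illustrative:

```python
# Precision and recall as reported in the table above.
precision = 0.9897
recall = 0.9846

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.4f}")  # 0.9871
```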

## Training Data

Trained on **565,293 curated emails** from multiple sources:

- Corporate email archives (legitimate emails)
- Reported phishing samples
- Known 419/advance-fee fraud emails
- Community-sourced spam and scam samples

The model is continuously improved using user feedback.

## Training Configuration

- Epochs: 2
- Batch Size: 32
- Warmup Steps: 1,000
- Weight Decay: 0.01
- Max Length: 512 tokens
- Framework: PyTorch + Transformers
- Training Time: ~12 hours on Colab GPU
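
The hyperparameters above could be expressed as keyword arguments for `transformers.TrainingArguments`. This is a sketch, not the original training script: it assumes the batch size is per device (a single Colab GPU), and the `output_dir` path in the comment is hypothetical. The learning rate is not listed, so it is left at the library default.

```python
# Hyperparameters from the list above, keyed by the matching
# transformers.TrainingArguments parameter names.
training_config = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 32,  # assumes a single GPU
    "warmup_steps": 1000,
    "weight_decay": 0.01,
}

# Max length is applied at tokenization time, not via TrainingArguments.
MAX_LENGTH = 512

# With transformers installed, the dict can be unpacked like this
# (output_dir is a hypothetical path):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="./fraudfox-checkpoints", **training_config)
```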

## Usage

### ONNX Runtime (recommended, low memory)

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xanderabim/fraudfoxai-phishing")
model = ORTModelForSequenceClassification.from_pretrained("xanderabim/fraudfoxai-phishing")

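# Tokenize to NumPy arrays; inputs longer than 512 tokens are truncated
inputs = tokenizer("URGENT: Verify your account now!", return_tensors="np", truncation=True, max_length=512)
outputs = model(**inputs)
logits = outputs.logits  # shape (1, 2): raw scores for LABEL_0 and LABEL_1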
```

### PyTorch

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xanderabim/fraudfoxai-phishing")
model = AutoModelForSequenceClassification.from_pretrained("xanderabim/fraudfoxai-phishing")

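# Tokenize to PyTorch tensors; inputs longer than 512 tokens are truncated
inputs = tokenizer("URGENT: Verify your account now!", return_tensors="pt", truncation=True, max_length=512)
outputs = model(**inputs)
logits = outputs.logits  # shape (1, 2): raw scores for LABEL_0 and LABEL_1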
```
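
Both snippets return raw logits rather than probabilities. One way to turn the two logits into a phishing probability is a softmax over `[LABEL_0, LABEL_1]`. The helper below is a dependency-free sketch; `phishing_score` is a hypothetical name, and the logits in the example are illustrative, not real model output.

```python
import math


def phishing_score(logits):
    """Softmax over [LABEL_0, LABEL_1] logits; returns P(LABEL_1)."""
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


# Illustrative logits only, not real model output.
print(round(phishing_score([0.0, 0.0]), 2))  # 0.5
```

With either runtime, the logits for a single input can be passed as a plain list, for example `phishing_score(outputs.logits[0].tolist())`.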

## Sample Predictions

| Email | Phishing Score | Verdict |
|---|---|---|
| "URGENT: Your PayPal account has been suspended!" | 99.99% | PHISHING |
| "Hi team, meeting at 2pm tomorrow" | 0.00% | SAFE |
| "Congratulations! You've won $1,000,000!" | 98.66% | PHISHING |
| "Meeting notes from yesterday attached" | 0.00% | SAFE |
| "Dear valued customer, your package delivery failed" | 99.92% | PHISHING |
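
The verdict column can be derived from the phishing score with a simple cutoff. The sketch below assumes a 0.5 threshold, which is an assumption and not a documented value for this model; `verdict` and `format_row` are hypothetical helper names.

```python
def verdict(score: float, threshold: float = 0.5) -> str:
    """Map a phishing probability in [0, 1] to a verdict string.

    The 0.5 default threshold is an assumption, not a documented value.
    """
    return "PHISHING" if score >= threshold else "SAFE"


def format_row(email: str, score: float) -> str:
    """Render one row in the same layout as the table above."""
    return f'| "{email}" | {score:.2%} | {verdict(score)} |'


print(format_row("Hi team, meeting at 2pm tomorrow", 0.0))
# | "Hi team, meeting at 2pm tomorrow" | 0.00% | SAFE |
```

A stricter or looser threshold trades false positives against missed phishing emails, so it is worth tuning on your own mail.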

## Production API

Deployed at: https://fraudfoxai.xanderabim.workers.dev

Or forward any email to: **check@fraudfox.ai**

## Limitations

- English language only
- Max 512 tokens per input (longer emails are truncated when `truncation=True` is set)
- May flag aggressive marketing emails as phishing
- Subject-only inputs are less accurate than full email (subject + body)
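
Since full emails score more accurately than subjects alone, it can help to combine subject and body into one string before tokenizing. The helper below is a sketch: `build_input` is a hypothetical name, and the blank-line separator is an assumption because the input format used during training is not documented here.

```python
def build_input(subject: str, body: str) -> str:
    """Join subject and body into a single string for classification.

    The blank-line separator is an assumption; the training-time format
    is not documented here.
    """
    parts = [subject.strip(), body.strip()]
    # Skip empty parts so a missing body does not leave a dangling separator.
    return "\n\n".join(p for p in parts if p)


text = build_input("Account notice", "Please review the attached statement.")
print(text)
```

Pass the resulting string to the tokenizer with `truncation=True` so that anything beyond 512 tokens is cut off rather than raising an error.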

## License

MIT

## Author

[@xanderabim](https://huggingface.co/xanderabim)