---
language:
- en
license: mit
tags:
- text-classification
- phishing-detection
- email-security
- deberta-v3
- transformers
datasets:
- zefang-liu/phishing-email-dataset
metrics:
- accuracy
- f1
- precision
- recall
base_model: microsoft/deberta-v3-large
pipeline_tag: text-classification
---

# Phishing Email Detector (DeBERTa-v3-large)

A DeBERTa-v3-large model fine-tuned to detect phishing emails.

## Model Description

This model is based on `microsoft/deberta-v3-large` and fine-tuned to classify emails as phishing or safe.

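### 🔒 100% Recall Achieved

With the decision threshold set to 0.0007, the model **detects 100% of phishing emails** in the evaluation reported under Performance, at the cost of lower precision (89.15%).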

## Performance

### Default Setting (Threshold 0.5)

| Metric | Value |
|--------|-------|
| Accuracy | 97.59% |
| F1-score | 96.99% |
| Precision | 95.01% |
| Recall | 99.04% |

### Maximum Security Setting (Threshold 0.0007) - **Recall 100%**

| Metric | Value |
|--------|-------|
| Accuracy | 95.23% |
| F1-score | 94.26% |
| Precision | 89.15% |
| **Recall** | **100.00%** |

## Usage

### Basic Usage (Default Threshold)

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="takumi123xxx/phishing-email-detector-deberta-v3")
result = classifier("Your email text here")
print(result)
```

### Maximum Security (100% Recall)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("takumi123xxx/phishing-email-detector-deberta-v3")
tokenizer = AutoTokenizer.from_pretrained("takumi123xxx/phishing-email-detector-deberta-v3")

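THRESHOLD = 0.0007  # Threshold reported to reach 100% recall (see Performance)

def detect_phishing(text):
    """Classify one email, flagging it when P(phishing) >= THRESHOLD."""
    # Emails longer than 512 tokens are truncated, matching the training max length.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        phishing_prob = probs[0][1].item()  # index 1 = Phishing Email (see Labels)

    is_phishing = phishing_prob >= THRESHOLD
    return {
        "is_phishing": is_phishing,
        "phishing_probability": phishing_prob,
        "label": "Phishing Email" if is_phishing else "Safe Email"
    }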

# Example
result = detect_phishing("Congratulations! You've won $1,000,000. Click here to claim your prize!")
print(result)
```
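
To see why such a small threshold changes the decision, here is a minimal sketch in pure Python that applies the same softmax-then-threshold logic to a hand-picked pair of logits. The logit values and the helper names `softmax` and `decide` are illustrative assumptions, not outputs or APIs of this model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits, threshold):
    """Flag as phishing when P(label 1) meets or exceeds the threshold."""
    phishing_prob = softmax(logits)[1]
    return phishing_prob >= threshold, phishing_prob

# HYPOTHETICAL logits: the classifier leans strongly toward "Safe Email" (index 0).
example_logits = [5.0, -1.0]

flag_default, prob = decide(example_logits, 0.5)
flag_strict, _ = decide(example_logits, 0.0007)

print(f"P(phishing) = {prob:.4f}")
print("threshold 0.5    ->", flag_default)
print("threshold 0.0007 ->", flag_strict)
```

An email that the default threshold would pass as safe is still flagged under the stricter setting whenever its phishing probability is at least 0.0007, which is where the extra false positives come from.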

## Training Details

- **Base Model**: microsoft/deberta-v3-large
- **Dataset**: [zefang-liu/phishing-email-dataset](https://huggingface.co/datasets/zefang-liu/phishing-email-dataset)
- **Training Samples**: 14,904
- **Validation Samples**: 1,863
- **Test Samples**: 1,864
- **Epochs**: 2.15 (Early Stopping)
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Max Length**: 512
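
As a quick sanity check on the figures above, the following sketch works out the split proportions and the approximate number of optimizer steps. It assumes that 16 is the effective batch size (single device, no gradient accumulation), which this card does not state.

```python
import math

train, val, test = 14_904, 1_863, 1_864
total = train + val + test

# Split proportions: roughly 80/10/10.
fractions = {
    name: n / total
    for name, n in [("train", train), ("validation", val), ("test", test)]
}

# ASSUMPTION: 16 is the effective batch size (no gradient accumulation).
batch_size = 16
steps_per_epoch = math.ceil(train / batch_size)

# 2.15 epochs is the point at which early stopping ended training.
approx_steps = round(2.15 * train / batch_size)

print(total)
print({k: round(v, 3) for k, v in fractions.items()})
print(steps_per_epoch, approx_steps)
```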

## Labels

- `0`: Safe Email
- `1`: Phishing Email

## Threshold Recommendation

| Use Case | Threshold | Recall | False Positives |
|----------|-----------|--------|-----------------|
| Balanced | 0.5 | 99.04% | 38 |
| High Security | 0.0007 | 100.00% | 89 |
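
The reported metrics can be cross-checked from confusion counts. In the sketch below, the false-positive counts come from the table above and the total of 1,864 is the test-set size from Training Details. The count of 731 phishing emails and the 7 missed at threshold 0.5 are not stated in this card: they are back-derived values that reproduce the reported percentages, under the assumption that the metrics were computed on the test set. Treat them as an inference.

```python
def metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 as percentages."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "accuracy": round(100 * accuracy, 2),
        "precision": round(100 * precision, 2),
        "recall": round(100 * recall, 2),
        "f1": round(100 * f1, 2),
    }

TOTAL = 1_864      # test samples (from Training Details)
POSITIVES = 731    # INFERRED phishing count, not stated in the card
NEGATIVES = TOTAL - POSITIVES

# Threshold 0.5: 38 false positives (table); 7 false negatives is INFERRED.
balanced = metrics(tp=POSITIVES - 7, fp=38, fn=7, tn=NEGATIVES - 38)

# Threshold 0.0007: 89 false positives (table); 100% recall means fn = 0.
high_security = metrics(tp=POSITIVES, fp=89, fn=0, tn=NEGATIVES - 89)

print(balanced)
print(high_security)
```

Under these counts, moving from 0.5 to 0.0007 recovers the 7 missed phishing emails while adding 51 false positives.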

## Limitations

- Trained on English emails only
- May miss novel phishing techniques that are absent from the training data
- False positives increase when using lower thresholds
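
The trade-off in the last bullet can be explored on your own labeled data. Below is a minimal sketch, using made-up scores rather than this model's outputs, of picking the highest threshold that still reaches a target recall and counting the false positives it costs. The function name `highest_threshold_for_recall` and the sample data are hypothetical.

```python
import math

def highest_threshold_for_recall(probs, labels, target_recall=1.0):
    """Return (threshold, false_positives), where threshold is the largest
    value t such that flagging prob >= t reaches the target recall."""
    positives = sorted((p for p, y in zip(probs, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must be caught to hit the target recall.
    needed = max(1, math.ceil(target_recall * len(positives)))
    threshold = positives[needed - 1]
    false_positives = sum(
        1 for p, y in zip(probs, labels) if y == 0 and p >= threshold
    )
    return threshold, false_positives

# SYNTHETIC scores and labels (1 = phishing, 0 = safe), for illustration only.
probs = [0.99, 0.97, 0.60, 0.002, 0.40, 0.05, 0.001, 0.0001]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

full_recall = highest_threshold_for_recall(probs, labels, target_recall=1.0)
partial_recall = highest_threshold_for_recall(probs, labels, target_recall=0.75)

print("recall 100%:", full_recall)
print("recall 75%: ", partial_recall)
```

A threshold chosen this way is tuned to the data it was selected on, so the recall it achieves there may not carry over to a different mail stream.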

## License

MIT License