Commit fc0c984 (verified), committed by xanderabim
1 Parent(s): 4e18cf6

Upload README.md with huggingface_hub

  1. README.md +10 -16
README.md CHANGED
@@ -9,23 +9,17 @@ tags:
  - fraud-detection
  - distilbert
  - onnx
- datasets:
- - CEAS_08
- - Phishing_Email
- - enron_data_fraud_labeled
- - Nigerian_5
- - TREC_07
  pipeline_tag: text-classification
  ---
 
  # FraudFoxAI Phishing Detection Model
 
- Fine-tuned DistilBERT model for detecting phishing and fraudulent emails. Trained on 565,000+ emails from 5 datasets with 99.71% accuracy.
+ Fine-tuned DistilBERT model for detecting phishing and fraudulent emails. Trained on 565,000+ curated emails with 99.71% accuracy.
 
  ## Model Details
 
  - **Base Model**: distilbert-base-uncased
- - **Training Data**: 565,293 emails from 5 combined datasets
+ - **Training Data**: 565,293 curated emails from multiple sources
  - **Inference Runtime**: ONNX Runtime (PyTorch + ONNX available)
  - **Classes**:
  - LABEL_0: Legitimate Email
@@ -42,14 +36,14 @@ Fine-tuned DistilBERT model for detecting phishing and fraudulent emails. Traine
 
  ## Training Data
 
- | Dataset | Emails | Description |
- |---|---|---|
- | Enron Fraud | 447,417 | Corporate fraud/legitimate emails |
- | TREC_07 | 53,757 | TREC spam detection corpus |
- | CEAS_08 | 39,154 | Conference on Email and Anti-Spam |
- | Phishing_Email | 18,634 | Labeled phishing/safe emails |
- | Nigerian_5 | 6,331 | 419/advance-fee fraud emails |
- | **Total** | **565,293** | |
+ Trained on **565,293 curated emails** from multiple sources:
+
+ - Corporate email archives (legitimate emails)
+ - Reported phishing samples
+ - Known 419/advance-fee fraud emails
+ - Community-sourced spam and scam samples
+
+ Continuously improved with user feedback.
 
  ## Training Configuration
 