---
library_name: transformers
tags:
- phishing-detection
- binary-classification
- bert
- nlp
---
# Model Card for Fine-tuned BERT-Base-Uncased on Phishing Site Classification
## Model Details
### Model Description
This model is a fine-tuned version of [BERT-Base-Uncased](https://huggingface.co/google-bert/bert-base-uncased) for phishing site classification. It classifies a website as "Safe" or "Not Safe" based on textual input.
- **Developed by:** [shogun-the-great](https://huggingface.co/shogun-the-great)
- **Model type:** Binary Classification (Safe vs Not Safe)
- **Language(s):** English
- **License:** Apache-2.0 (or specify your license)
- **Finetuned from model:** `google-bert/bert-base-uncased`
### Model Sources
- **Dataset:** [shawhin/phishing-site-classification](https://huggingface.co/datasets/shawhin/phishing-site-classification)
## Uses
### Direct Use
This model can be directly used for phishing detection by classifying text into two categories: "Safe" and "Not Safe." Typical use cases include:
- Integrating with browser extensions for real-time website classification.
- Analyzing textual data for phishing indicators.
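
A two-class sequence classifier emits two logits per input. As a minimal sketch of turning those logits into a label plus a confidence score, the helper below applies a softmax and a cutoff. The label order (index 0 = "Safe", index 1 = "Not Safe") is taken from the usage snippet in this card; the `classify` helper, its `threshold` parameter, and the sample logits are illustrative assumptions, not part of the model's API.

```python
import math

# Label order assumed from this card's usage snippet:
# index 0 = "Safe", index 1 = "Not Safe".
LABELS = ("Safe", "Not Safe")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (label, p_not_safe); label is "Not Safe" when p_not_safe >= threshold."""
    p_not_safe = softmax(logits)[1]
    label = LABELS[1] if p_not_safe >= threshold else LABELS[0]
    return label, p_not_safe


# Hypothetical logits for illustration, not real model output.
label, p = classify([-1.2, 2.3])
print(label, round(p, 3))
```

Raising `threshold` flags fewer inputs as "Not Safe"; lowering it flags more. Which setting is appropriate depends on the deployment and is not specified by this card.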
### Downstream Use
Users can fine-tune the model further for other binary classification tasks or on datasets from similar domains.
### Out-of-Scope Use
This model might not perform well for:
- Non-English text.
- Adversarial phishing attacks or heavily obfuscated text.
- Tasks unrelated to text-based classification.
## Bias, Risks, and Limitations
### Bias
The model's predictions are influenced by the dataset used during fine-tuning. Any biases in the training data may be reflected in the predictions.
### Risks
- False positives: Legitimate websites flagged as phishing.
- False negatives: Some phishing sites might not be detected.
- Potential vulnerabilities to adversarial examples.
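
The two error types above can be measured on a labeled evaluation set. The following is a rough sketch, assuming labels are encoded as 1 = "Not Safe" and 0 = "Safe" (as in this card's usage snippet); the `error_rates` helper and the sample data are made up for illustration and are not results for this model.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate).

    Encoding assumed: 1 = "Not Safe" (phishing), 0 = "Safe".
    """
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe flagged as phishing
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Made-up labels and predictions, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]
fpr, fnr = error_rates(y_true, y_pred)
print(f"FPR={fpr:.2f} FNR={fnr:.2f}")
```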
### Recommendations
- Regularly update the dataset and model to stay aligned with emerging phishing patterns.
- Use in combination with other security measures for robust phishing detection.
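
One way to combine the model with another security measure is to layer it with a domain blocklist, flagging a site if either signal fires. This is only a sketch under stated assumptions: the `BLOCKLIST` contents, the `is_flagged` helper, and the example URLs are hypothetical, and the model label is passed in as a plain string rather than computed here.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real deployment would load a maintained feed.
BLOCKLIST = {"login-example-phish.test"}


def is_flagged(url, model_label):
    """Flag a URL if its host is blocklisted OR the model labeled it "Not Safe"."""
    host = (urlparse(url).hostname or "").lower()
    return host in BLOCKLIST or model_label == "Not Safe"


print(is_flagged("https://login-example-phish.test/reset", "Safe"))
print(is_flagged("https://example.org/", "Safe"))
```

An OR rule like this favors catching more phishing sites at the cost of more false positives; an AND rule would make the opposite trade.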
## How to Get Started with the Model
You can load the fine-tuned model directly from the Hugging Face Hub:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model from the Hugging Face Hub
model_name = "shogun-the-great/finetuned-bert-phishing-site-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example usage
text = "Enter your login credentials to claim a free reward!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

# Get the predicted label (index 1 = "Not Safe", index 0 = "Safe")
prediction = outputs.logits.argmax(dim=-1).item()
print("Prediction:", "Not Safe" if prediction == 1 else "Safe")
```