	---
	library_name: transformers
	tags:
	- phishing-detection
	- binary-classification
	- bert
	- nlp
	---

	# Model Card for Fine-tuned BERT-Base-Uncased on Phishing Site Classification

	## Model Details

	### Model Description

	This model is a fine-tuned version of [BERT-Base-Uncased](https://huggingface.co/google-bert/bert-base-uncased) for phishing site classification. Given a textual input, the model predicts whether the corresponding website is "Safe" or "Not Safe".

	- Developed by: [shogun-the-great](https://huggingface.co/shogun-the-great)
	- Model type: Binary Classification (Safe vs Not Safe)
	- Language(s): English
	- License: Apache-2.0 (or specify your license)
	- Finetuned from model: `google-bert/bert-base-uncased`

	### Model Sources

	- Dataset: [shawhin/phishing-site-classification](https://huggingface.co/datasets/shawhin/phishing-site-classification)

	## Uses

	### Direct Use

	This model can be directly used for phishing detection by classifying text into two categories: "Safe" and "Not Safe." Typical use cases include:

	- Integrating with browser extensions for real-time website classification.
	- Analyzing textual data for phishing indicators.
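
For real-time integrations such as a browser extension, the model's two raw logits usually need to be turned into a label and a confidence score. The following is a minimal, dependency-free sketch. It assumes, as the "How to Get Started" example does, that class index 1 means "Not Safe" and index 0 means "Safe"; the helper names `softmax` and `logits_to_label` are hypothetical. With the PyTorch model, `torch.softmax(outputs.logits, dim=-1)` yields the same probabilities.

```python
import math

# Assumed label mapping (matches the card's example: index 1 -> "Not Safe").
LABELS = {0: "Safe", 1: "Not Safe"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits):
    """Hypothetical helper: return (label, probability of that label)."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


# Example with made-up logits for a single input (not real model output).
label, confidence = logits_to_label([-1.2, 2.3])
print(label, round(confidence, 3))
```

In practice, `outputs.logits[0].tolist()` from the loading example could be passed to `logits_to_label`.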

	### Downstream Use

	Users can fine-tune the model further on other binary classification tasks or on datasets from similar domains.

	### Out-of-Scope Use

	This model may not perform well on:
	- Non-English text.
	- Adversarial phishing attacks or heavily obfuscated text.
	- Tasks unrelated to text-based classification.

	## Bias, Risks, and Limitations

	### Bias

	The model's predictions are influenced by the dataset used during fine-tuning. If the training data contains biases, these may be reflected in the predictions.

	### Risks

	- False positives: Legitimate websites flagged as phishing.
	- False negatives: Phishing sites classified as safe.
	- Adversarial examples: The model may be vulnerable to inputs crafted to evade detection.
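
The balance between false positives and false negatives can be measured on a labelled validation set and shifted by moving the decision threshold on the predicted probability of "Not Safe". The sketch below uses made-up scores and labels purely for illustration; they are not results from this model. It assumes label 1 = "Not Safe" and that each score is that class's probability, and `error_rates` is a hypothetical helper name.

```python
def error_rates(scores, labels, threshold=0.5):
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    scores: predicted probability of "Not Safe" (class 1) per example.
    labels: ground truth, 1 = "Not Safe" (phishing), 0 = "Safe".
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Made-up validation scores and labels, for illustration only.
scores = [0.05, 0.40, 0.55, 0.70, 0.30, 0.65, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

for t in (0.3, 0.5, 0.7):
    fpr, fnr = error_rates(scores, labels, t)
    print(f"threshold={t:.1f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

Raising the threshold flags fewer legitimate sites but misses more phishing sites, and lowering it does the opposite; the right operating point depends on which error is more costly in a given deployment.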

	### Recommendations

	- Regularly update the dataset and model to stay aligned with emerging phishing patterns.
	- Use in combination with other security measures for robust phishing detection.
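
One way to follow the second recommendation is to treat the model's score as one signal among several. The sketch below is hypothetical: the blocklist, allowlist, thresholds, and the `decide` function are placeholders, not part of this model or of any particular security product.

```python
from urllib.parse import urlparse

# Hypothetical hard-coded lists standing in for real threat-intelligence feeds.
BLOCKLIST = {"login-verify-account.example"}
ALLOWLIST = {"example.com"}


def decide(url, not_safe_prob, block_at=0.9, warn_at=0.5):
    """Combine list lookups with the model's "Not Safe" probability.

    Returns "block", "warn", or "allow". Thresholds are illustrative only.
    """
    host = urlparse(url).hostname or ""
    if host in BLOCKLIST:
        return "block"  # known-bad host: the model score is not consulted
    if host in ALLOWLIST:
        return "allow"  # known-good host: overrides the model score
    if not_safe_prob >= block_at:
        return "block"
    if not_safe_prob >= warn_at:
        return "warn"  # e.g. show a warning page instead of blocking outright
    return "allow"


print(decide("https://login-verify-account.example/reset", 0.1))
print(decide("https://unknown-site.example/", 0.72))
```

Letting the allowlist take priority reduces false positives on known sites, at the cost of trusting those sites even if they are later compromised.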

	## How to Get Started with the Model

	You can load the fine-tuned model directly from the Hugging Face Hub:

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load the tokenizer and model from the Hugging Face Hub
	model_name = "shogun-the-great/finetuned-bert-phishing-site-classification"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	# Example usage
	text = "Enter your login credentials to claim a free reward!"
	inputs = tokenizer(text, return_tensors="pt", truncation=True)

	# Gradients are not needed for inference
	with torch.no_grad():
	    outputs = model(**inputs)

	# Get the predicted label (class 1 = "Not Safe", class 0 = "Safe")
	logits = outputs.logits
	prediction = logits.argmax(dim=-1).item()
	print("Prediction:", "Not Safe" if prediction == 1 else "Safe")
	```