---
library_name: transformers
tags:
- phishing-detection
- binary-classification
- bert
- nlp
---
# Model Card for Fine-tuned BERT-Base-Uncased on Phishing Site Classification

## Model Details

### Model Description

This model is a fine-tuned version of [BERT-Base-Uncased](https://huggingface.co/google-bert/bert-base-uncased) for phishing site classification. Given textual input, it predicts whether a website is "Safe" or "Not Safe".

- **Developed by:** [shogun-the-great](https://huggingface.co/shogun-the-great)
- **Model type:** Binary classification (Safe vs. Not Safe)
- **Language(s):** English
- **License:** Apache-2.0
- **Finetuned from model:** `google-bert/bert-base-uncased`

### Model Sources

- **Dataset:** [shawhin/phishing-site-classification](https://huggingface.co/datasets/shawhin/phishing-site-classification)

## Uses

### Direct Use

This model can be used directly for phishing detection by classifying text into two categories: "Safe" and "Not Safe". Typical use cases include:

- Integrating with browser extensions for real-time website classification.
- Analyzing textual data for phishing indicators.
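
For integrations such as a browser extension, an argmax alone gives no sense of how confident the model is. The following is a minimal pure-Python sketch of turning the model's two output logits into a verdict plus a "Not Safe" probability. It assumes, as the example in "How to Get Started with the Model" does, that index 0 is "Safe" and index 1 is "Not Safe". The helper name `logits_to_verdict` and the 0.5 default threshold are illustrative choices, not part of the model.

```python
import math

# Assumed label order, matching the card's example (index 1 = "Not Safe").
LABELS = ("Safe", "Not Safe")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_verdict(logits, threshold=0.5):
    """Hypothetical helper: map two logits to (label, P(Not Safe)).

    The input is reported as "Not Safe" when P(Not Safe) >= threshold.
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits (Safe, Not Safe)")
    p_not_safe = softmax(logits)[1]
    label = LABELS[1] if p_not_safe >= threshold else LABELS[0]
    return label, p_not_safe


# With the Transformers example, pass outputs.logits[0].tolist() here.
print(logits_to_verdict([-1.2, 2.3]))
```

Raising `threshold` makes the model flag fewer sites, which trades false positives for false negatives.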

### Downstream Use

Users can fine-tune the model further on other binary classification tasks or on datasets from similar domains.
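
When fine-tuning further, the new dataset's labels need to map onto the indices the classification head already uses. Below is a minimal pure-Python sketch, assuming the same 0 = "Safe", 1 = "Not Safe" order as the card's example; `ID2LABEL`, `LABEL2ID`, and `encode_labels` are hypothetical names. In Transformers, matching `id2label` and `label2id` dictionaries can also be passed as keyword arguments to `AutoModelForSequenceClassification.from_pretrained` so that the model config carries readable label names.

```python
# Assumed label order, matching the card's example (index 1 = "Not Safe").
ID2LABEL = {0: "Safe", 1: "Not Safe"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def encode_labels(examples, label2id=LABEL2ID):
    """Hypothetical helper: replace string labels with integer ids.

    Each example is a dict with "text" and a string "label". The output
    uses the key "labels", the argument name Transformers models use for
    computing the loss. Unknown labels raise KeyError, so a dataset with
    a mismatched label scheme fails loudly instead of training silently.
    """
    return [
        {"text": ex["text"], "labels": label2id[ex["label"]]}
        for ex in examples
    ]


# Toy examples for illustration only.
new_data = [
    {"text": "Verify your account now or it will be suspended", "label": "Not Safe"},
    {"text": "Quarterly newsletter from your local library", "label": "Safe"},
]
print(encode_labels(new_data))
```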

### Out-of-Scope Use

This model may not perform well on:

- Non-English text.
- Adversarial phishing attacks or heavily obfuscated text.
- Tasks other than text classification.

## Bias, Risks, and Limitations

### Bias

The model's predictions are shaped by the dataset used for fine-tuning. If the training data contains biases, these may be reflected in the predictions.
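
One dataset bias that is easy to check is class imbalance. A minimal sketch, assuming the labels are available as a plain list of 0/1 ids (for example, a label column read from the fine-tuning dataset); `label_shares` is a hypothetical helper, and the toy labels below are not statistics from the real dataset.

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of examples carrying each label id."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in sorted(counts.items())}


# Toy labels for illustration only.
print(label_shares([0, 0, 0, 1]))  # {0: 0.75, 1: 0.25}
```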

### Risks

- False positives: legitimate websites may be flagged as phishing.
- False negatives: some phishing sites may go undetected.
- Vulnerability to adversarial examples.
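
How many false positives and false negatives occur depends on the threshold applied to the model's "Not Safe" probability. The sketch below measures that trade-off on a labeled validation set. `error_counts` is a hypothetical helper. Label 1 is assumed to mean "Not Safe". The probabilities and labels are toy values, not results from this model.

```python
def error_counts(p_not_safe, true_labels, threshold=0.5):
    """Count (false positives, false negatives) at a given threshold.

    p_not_safe: predicted probabilities that each input is "Not Safe".
    true_labels: ground-truth ids (1 = "Not Safe", 0 = "Safe").
    """
    if len(p_not_safe) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    fp = fn = 0
    for p, y in zip(p_not_safe, true_labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 0:
            fp += 1  # legitimate site flagged as phishing
        elif predicted == 0 and y == 1:
            fn += 1  # phishing site missed
    return fp, fn


# Toy validation data for illustration only.
probs = [0.9, 0.6, 0.4, 0.2, 0.7]
labels = [1, 0, 1, 0, 1]
for t in (0.3, 0.5, 0.65, 0.8):
    print(t, error_counts(probs, labels, threshold=t))
```

On this toy data, raising the threshold from 0.3 to 0.8 removes the false positive but increases the number of missed phishing examples.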

### Recommendations

- Regularly update the dataset and model to stay aligned with emerging phishing patterns.
- Use the model in combination with other security measures for robust phishing detection.
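
As an illustration of pairing the model with other measures, here is a minimal sketch of a policy in which explicit host lists override the model's score. Everything in it is hypothetical: the `combined_verdict` name, the idea of a blocklist and allowlist (for example, from a threat-intelligence feed or an organization's own records), and the precedence order. `p_not_safe` is assumed to be the softmax probability of the "Not Safe" class computed from the model's logits. URLs are expected to include a scheme, since `urlparse` only extracts a hostname when one is present.

```python
from urllib.parse import urlparse


def combined_verdict(url, p_not_safe, blocklist, allowlist, threshold=0.5):
    """Hypothetical policy: known-bad hosts always fail, known-good hosts
    always pass, and everything else falls back to the model score."""
    host = (urlparse(url).hostname or "").lower()
    if host in blocklist:  # checked first so a known-bad host never passes
        return "Not Safe"
    if host in allowlist:
        return "Safe"
    return "Not Safe" if p_not_safe >= threshold else "Safe"


# Toy lists and scores for illustration only.
blocklist = {"login-verify.example"}
allowlist = {"www.example.org"}
print(combined_verdict("https://login-verify.example/reset", 0.1, blocklist, allowlist))
print(combined_verdict("https://www.example.org/", 0.9, blocklist, allowlist))
print(combined_verdict("https://unknown.example/", 0.8, blocklist, allowlist))
```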

## How to Get Started with the Model

You can load the fine-tuned model directly from the Hugging Face Hub:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model from the Hugging Face Hub
model_name = "shogun-the-great/finetuned-bert-phishing-site-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example usage
text = "Enter your login credentials to claim a free reward!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

# Get the predicted label (index 0 = "Safe", index 1 = "Not Safe")
prediction = outputs.logits.argmax(dim=-1).item()
print("Prediction:", "Not Safe" if prediction == 1 else "Safe")
```