---
license: apache-2.0
language:
- en
tags:
- nsfw
- FALCONSAI
---
# Falconsai/nsfw_image_detection for NSFW Image Classification (2026 Edition)

## Model Description

The **Fine-Tuned Vision Transformer (ViT) V2** is a transformer encoder adapted for image classification. It builds on the "google/vit-base-patch16-224-in21k" baseline, and this 2026 edition has been retrained and fine-tuned to improve accuracy in moderating visual content.

During the 2026 training phase, we used a dynamic learning rate scheduler (starting at 3e-5) and an effective batch size of 64, reached through gradient accumulation. This configuration is intended to improve computational efficiency while helping the model handle complex, high-resolution visual contexts better than its predecessor.
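The relationship between the per-step batch and the effective batch can be sketched as below. This is a minimal illustration, not the actual training configuration: the model card states only the effective size (64) and the starting learning rate (3e-5), so the per-device batch of 16 and the 4 accumulation steps are assumptions chosen to multiply out to 64.

```python
def effective_batch_size(per_device_batch: int,
                         accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of samples that contribute to each optimizer update."""
    return per_device_batch * accumulation_steps * num_devices


# Stated in the model card
INITIAL_LEARNING_RATE = 3e-5

# Hypothetical split; only the product (64) comes from the model card
PER_DEVICE_BATCH = 16
ACCUMULATION_STEPS = 4

print(effective_batch_size(PER_DEVICE_BATCH, ACCUMULATION_STEPS))
```

With the `transformers` `TrainingArguments` class, these values would map onto `per_device_train_batch_size`, `gradient_accumulation_steps`, and `learning_rate`.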

The most significant upgrade in this release is the **expanded dataset**. Moving beyond the legacy 80,000-image corpus, this model was trained on a curated proprietary dataset of over **1.2 million images**. The dataset adds substantially more variability and balances the "normal" and "nsfw" classes to reduce false positives (e.g., misclassifying classical art or medical imagery as NSFW) and to capture nuanced, borderline visual patterns.

The result is a robust model aimed at enterprise use in automated content safety, moderation, and trust-and-safety compliance.

---

## Gated Model Access

This model is **gated**. To use it in your environment:
1. **Request Access:** Log in to Hugging Face and click "Agree" on the [Falconsai/nsfw_image_detection_2026](https://huggingface.co/Falconsai/nsfw_image_detection_2026) page.
2. **Authentication:** You must provide a [Hugging Face User Access Token](https://huggingface.co/settings/tokens) (with 'Read' permissions) via the `token` parameter in your code or by running `huggingface-cli login`.
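
To avoid hard-coding the token in source files, it can be read from an environment variable instead. The helper below is a minimal sketch: `resolve_hf_token` is a hypothetical name, and `HF_TOKEN` is used because it is the variable that `huggingface_hub` itself reads.

```python
import os


def resolve_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in an environment variable.

    Raises RuntimeError when the variable is unset or empty, so a missing
    token fails early instead of surfacing later as an access error.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face token with 'Read' permissions."
        )
    return token
```

The returned value can then be passed wherever the examples in this card use `token="YOUR_HF_TOKEN_HERE"`.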

---

## Intended Uses & Limitations

### Intended Uses

* **Automated Content Moderation**: The primary use is the real-time classification and filtering of NSFW (Not Safe for Work) images across social platforms, forums, and cloud storage.
* **Trust and Safety Pipelines**: Acts as a high-confidence first pass in multi-tiered human-in-the-loop moderation systems.
* **Edge-Device Deployment**: The ONNX/YOLO-compatible versions are optimized for fast inference on edge devices or mobile environments.

### Limitations

* **Domain Specificity**: This model is specialized for NSFW image classification. Its attention heads and weights are tuned for that single task; applying it to general object detection or unrelated classification tasks will yield poor results.
* **Cultural Context**: Although the model is heavily optimized, the definition of NSFW varies across cultures. Users should calibrate confidence thresholds based on their specific community guidelines.
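
Threshold calibration can be sketched as a small post-processing step over the classifier output. The example assumes the list-of-dicts format returned by the Hugging Face image-classification `pipeline` and the "normal" and "nsfw" class names from this card; the function name `is_nsfw`, the threshold values, and the sample scores are illustrative only.

```python
def is_nsfw(predictions, threshold=0.8, nsfw_label="nsfw"):
    """Flag an image when the NSFW score meets or exceeds the threshold.

    predictions: list of {"label": str, "score": float} dicts.
    """
    for entry in predictions:
        if entry["label"] == nsfw_label:
            return entry["score"] >= threshold
    # No NSFW entry in the output: treat the image as not flagged
    return False


# Illustrative scores, not real model output
sample = [
    {"label": "normal", "score": 0.35},
    {"label": "nsfw", "score": 0.65},
]

print(is_nsfw(sample, threshold=0.5))  # True: permissive threshold flags it
print(is_nsfw(sample, threshold=0.8))  # False: strict threshold lets it pass
```

A lower threshold catches more borderline content at the cost of more false positives, so the right value depends on how a community weighs the two kinds of error.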

---

## How to Use

### 1. Using Hugging Face `pipeline` (High-Level Helper)

```python
from PIL import Image
from transformers import pipeline

# Load image
img = Image.open("<path_to_image_file>")

# Initialize 2026 pipeline
classifier = pipeline(
    "image-classification", 
    model="Falconsai/nsfw_image_detection_2026",
    token="YOUR_HF_TOKEN_HERE"
)
result = classifier(img)

# result is a list of {"label": ..., "score": ...} dicts
print(result)
```

### 2. Loading the Model Directly (PyTorch)

```python
import torch
from PIL import Image
from transformers import AutoModelForImageClassification, ViTImageProcessor

# Load image
img = Image.open("<path_to_image_file>")

# Initialize model and processor
model_name = "Falconsai/nsfw_image_detection_2026"
model = AutoModelForImageClassification.from_pretrained(model_name, token="YOUR_HF_TOKEN_HERE")
processor = ViTImageProcessor.from_pretrained(model_name, token="YOUR_HF_TOKEN_HERE")

# Run inference
with torch.no_grad():
    inputs = processor(images=img, return_tensors="pt")
    outputs = model(**inputs)
    logits = outputs.logits

# Extract prediction
predicted_label = logits.argmax(-1).item()
print(f"Predicted Class: {model.config.id2label[predicted_label]}")

```
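
The snippet above reports only the top class. When a confidence value is needed, for instance to apply a custom threshold, the logits can be converted to probabilities with a softmax. The sketch below uses plain Python so it runs without a model download; the logit values are illustrative, not real model output.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Illustrative logits for a two-class model
example_logits = [2.0, -1.0]
probabilities = softmax(example_logits)
print(probabilities)
```

On the PyTorch tensor from the snippet above, `torch.softmax(logits, dim=-1)` performs the same computation.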

### 3. Running the ONNX / YOLOv9 Version

For high-speed, localized inference, you can use the ONNX exported model.

```python
import os
import json
import numpy as np
import onnxruntime as ort
import matplotlib.pyplot as plt
from PIL import Image

def predict_with_yolov9(image_path, model_path, labels_path, input_size):
    """Run inference using the converted YOLOv9 ONNX model."""
    
    with open(labels_path, "r") as f:
        labels = json.load(f)

    # Preprocess image
    original_image = Image.open(image_path).convert("RGB")
    image_resized = original_image.resize(input_size, Image.Resampling.BILINEAR)
    image_np = np.array(image_resized, dtype=np.float32) / 255.0
    image_np = np.transpose(image_np, (2, 0, 1))  # [C, H, W]
    input_tensor = np.expand_dims(image_np, axis=0).astype(np.float32)

    # Load YOLOv9 ONNX model
    session = ort.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name

    # Run inference
    outputs = session.run([output_name], {input_name: input_tensor})
    predictions = outputs[0]

    # Postprocess
    # Flat argmax over the output; assumes a single-image batch
    predicted_index = int(np.argmax(predictions))
    predicted_label = labels[str(predicted_index)]

    return predicted_label, original_image

def display_single_prediction(image_path, model_path, labels_path, input_size=(224, 224)):
    """Predicts and visually displays the result."""
    try:
        prediction, img = predict_with_yolov9(image_path, model_path, labels_path, input_size)
        
        fig, ax = plt.subplots(1, 1, figsize=(6, 6))
        ax.imshow(img)
        ax.set_title(f"Prediction: {prediction}", fontsize=14, fontweight='bold')
        ax.axis("off")
        
        plt.tight_layout()
        plt.show()
    except Exception as e:
        print(f"Error processing {image_path}: {e}")

# --- Execution Example ---
if __name__ == "__main__":
    from huggingface_hub import hf_hub_download
    
    # 1. Configuration
    hf_token = "YOUR_HF_TOKEN_HERE"  # Replace with your actual Hugging Face Read Token
    repo_id = "Falconsai/nsfw_image_detection_2026"
    img_path = "path/to/your/single_image.jpg"
    
    # 2. Download gated files from the specific 'yolo' subfolder
    try:
        # Filenames as listed in the repository's 'yolo' subfolder
        model_onnx = hf_hub_download(
            repo_id=repo_id, 
            filename="falconsai_yolov9_nsfw_model.pt", 
            subfolder="yolo", 
            token=hf_token
        )
        labels_json = hf_hub_download(
            repo_id=repo_id, 
            filename="labels.json", 
            subfolder="yolo", 
            token=hf_token
        )
        
        # 3. Run Inference
        if os.path.exists(img_path):
            # Note: predict_with_yolov9 uses onnxruntime, so this only works if
            # 'falconsai_yolov9_nsfw_model.pt' is an ONNX file despite its .pt extension.
            display_single_prediction(img_path, model_onnx, labels_json)
        else:
            print(f"Image file not found at: {img_path}")
            
    except Exception as e:
        print(f"Access Denied or File Not Found: {e}")
        print("Ensure you have accepted the gate terms on HF and your token is correct.")

```

---

## Training Data & 2026 Metrics

### Dataset Expansion

The 2026 iteration uses a proprietary dataset of **1,250,000 images** (roughly 15.6x the 80,000 images of the legacy version). The dataset underwent deduplication, bias mitigation, and edge-case augmentation (e.g., complex lighting, varying resolutions, and non-photographic explicit material such as digital art).

### Performance Comparison

This comprehensive dataset, paired with modernized training infrastructure, resulted in significantly tighter evaluation metrics and faster runtime processing.

| Metric | Legacy Version (80k dataset) | 2026 Version (1.25M dataset) | Improvement |
| --- | --- | --- | --- |
| **Evaluation Loss** | 0.0746 | **0.0124** | *About 83% lower loss* |
| **Evaluation Accuracy** | 98.03% | **99.71%** | *+1.68 percentage points* |
| **Evaluation Runtime (s)** | 304.98 | **184.20** | *About 40% shorter runtime* |
| **Samples per Second** | 52.46 | **86.15** | *+64% throughput* |
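
The improvement figures follow from simple arithmetic on the table values, sketched below. The error rate is derived here as one minus accuracy; it is not a separately reported metric.

```python
def relative_change(old, new):
    """Fractional change from old to new (negative means a decrease)."""
    return (new - old) / old


# Values copied from the table above
legacy = {"loss": 0.0746, "accuracy": 0.9803, "runtime_s": 304.98, "samples_per_s": 52.46}
current = {"loss": 0.0124, "accuracy": 0.9971, "runtime_s": 184.20, "samples_per_s": 86.15}

accuracy_gain_points = (current["accuracy"] - legacy["accuracy"]) * 100
loss_change = relative_change(legacy["loss"], current["loss"])
runtime_change = relative_change(legacy["runtime_s"], current["runtime_s"])
throughput_change = relative_change(legacy["samples_per_s"], current["samples_per_s"])
# Derived metric: share of misclassified evaluation samples
error_rate_change = relative_change(1 - legacy["accuracy"], 1 - current["accuracy"])

print(f"Accuracy gain: {accuracy_gain_points:.2f} percentage points")
print(f"Loss change: {loss_change:.1%}")
print(f"Runtime change: {runtime_change:.1%}")
print(f"Throughput change: {throughput_change:.1%}")
print(f"Error-rate change: {error_rate_change:.1%}")
```

Expressed as an error rate, the accuracy gain is larger than the 1.68-point figure suggests: misclassifications fall from 1.97% to 0.29% of evaluation samples.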

---

## Ethical Considerations & Disclaimer

It is essential to use this model responsibly and ethically. Automated moderation models should be implemented alongside human oversight, especially when dealing with sensitive content, account bans, or legal compliance.

*Disclaimer:* The model's performance reflects the data it was fine-tuned on. While rigorous bias mitigation was performed, edge cases may still result in false positives or negatives. Users must assess the model's suitability against their specific community guidelines.

## References

* [Hugging Face Model Hub](https://huggingface.co/models)
* [Vision Transformer (ViT) Paper](https://arxiv.org/abs/2010.11929)
* [ImageNet-21k Dataset](http://www.image-net.org/)