---
license: apache-2.0
language:
- en
tags:
- deepfake
- ai-detection
- computer-vision
- image-classification
- pytorch
- transformers
- siglip2
- synthetic-media
datasets:
- manjilkarki/deepfake-and-real-images
- hamzaboulahia/hardfakevsrealfaces
- muhammadbilal6305/200k-real-vs-ai-visuals-by-mbilal
- ayushmandatta1/deepdetect-2025
- hiddenplant/sut-project
metrics:
- accuracy
- f1
- precision
- recall
- auc
pipeline_tag: image-classification
base_model: google/siglip2-base-patch16-224
library_name: transformers
---

# DeepGuard (Deepfake Model)

## Model Overview

This is a fine-tuned version of **`google/siglip2-base-patch16-224`**, trained as a binary image classifier that detects AI-generated and deepfake images. It is the core inference engine powering the [DeepGuard AI Media Forensics App](https://huggingface.co/spaces/king1oo1/deepguard-ai-detector).

The model distinguishes between `Real` photographs and `Fake` (AI-generated or deepfake) images. It builds on the SigLIP2 vision encoder and is fine-tuned on a balanced 40,000-image subset sampled from a multi-source pool of over 330,000 images. The training data covers face-manipulation deepfakes as well as outputs from modern generators such as Midjourney, Stable Diffusion, and DALL·E.

| Property | Value |
| :--- | :--- |
| **Architecture** | SigLIP2 (Vision Transformer) |
| **Base Model** | `google/siglip2-base-patch16-224` |
| **Input Resolution** | 224x224 pixels |
| **Number of Classes** | 2 (`Real`, `Fake`) |
| **Model Size** | ~372 MB |
| **License** | Apache 2.0 |

## Datasets

The model was trained on a balanced dataset of **40,000 images** (20,000 real, 20,000 fake), sampled from the five sources below so that training covers a range of forgery types and generators.

| Dataset Name | Source | Description |
| :--- | :--- | :--- |
| **Deepfake and Real Images** | `manjilkarki/deepfake-and-real-images` | A foundational dataset of 190k human faces, split evenly between real and manipulated images created by various deepfake techniques. Images are 256x256 pixels. |
| **HardFake vs Real Faces** | `hamzaboulahia/hardfakevsrealfaces` | A challenging test-oriented dataset of 1,288 high-quality images (700 fake, 589 real) designed to push the limits of detection models. Fake faces are generated using StyleGAN2, and real faces feature diverse attributes. |
| **GRAVEX-200K** | `muhammadbilal6305/200k-real-vs-ai-visuals-by-mbilal` | A comprehensive multisource dataset of 200,000 face images, curated from six major sources including FaceForensics++, DFDC, Celeb-DF, and Stable Diffusion outputs (SD 1.5, 2.1, XL). |
| **DeepDetect-2025** | `ayushmandatta1/deepdetect-2025` | A large-scale dataset of over 112,000 images spanning diverse categories (people, animals, nature, urban, artworks), generated by cutting-edge models like DALL·E 3, Midjourney, and Stable Diffusion 3. |
| **Super GenAI (SUT-Project)** | `hiddenplant/sut-project` | A dataset featuring high-fidelity images from the latest generative models, including Midjourney V6, Flux, and NanoBanana (SDXL), covering landscapes, portraits, and urban scenes. |
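
The card does not describe how the 40,000-image subset was drawn, so the following is only a minimal sketch of one way to take a class-balanced random sample from a pooled list of records. The `(path, label, source)` record layout, the `balanced_sample` helper, and the toy pool are all hypothetical and used for illustration only.

```python
import random
from collections import Counter


def balanced_sample(records, per_class, seed=0):
    """Draw `per_class` records for each label from a pooled record list.

    `records` is a list of (path, label, source) tuples, where label is
    "Real" or "Fake". This layout is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    by_label = {"Real": [], "Fake": []}
    for record in records:
        by_label[record[1]].append(record)

    sample = []
    for label, items in by_label.items():
        if len(items) < per_class:
            raise ValueError(f"not enough {label} images: {len(items)} < {per_class}")
        # Sample without replacement so no image appears twice
        sample.extend(rng.sample(items, per_class))

    rng.shuffle(sample)  # mix the two classes before building batches
    return sample


# Toy pool standing in for the merged source datasets (hypothetical paths)
pool = [
    (f"src{i % 5}/img_{i}.jpg", "Fake" if i % 3 == 0 else "Real", f"src{i % 5}")
    for i in range(300)
]
subset = balanced_sample(pool, per_class=50, seed=42)
label_counts = Counter(label for _, label, _ in subset)
```

For the split described above, `per_class` would be 20,000. A per-source quota could be layered on top if some sources should not dominate the sample.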

## Training Procedure

The model was fine-tuned using a progressive unfreezing strategy to adapt the pre-trained SigLIP2 encoder while preventing catastrophic forgetting. All training was performed on a Tesla T4 GPU in Google Colab.

### Training Hyperparameters

| Stage | Epochs | Learning Rate | Trainable Parameters | Description |
| :--- | :--- | :--- | :--- | :--- |
| **Stage 1** | 2 | 1e-3 | Classifier head only | Warm-up phase to adapt the new binary classification head. |
| **Stage 2** | 3 | 5e-5 | Classifier + Top 6 Transformer Blocks | Gradual unfreezing to allow the model to learn task-specific features. |
| **Stage 3** | 2 | 1e-5 | All layers | Full model fine-tuning with a very low learning rate for final convergence. |

- **Batch Size:** 32
- **Optimizer:** AdamW
- **Scheduler:** Cosine Annealing
- **Loss Function:** Cross-Entropy Loss
- **Data Augmentation:** Random Horizontal Flip, Random Rotation (10°), Color Jitter
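
The three-stage schedule above can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the training script used for this model: `ToyEncoderClassifier` is a hypothetical stand-in for the real encoder, the attribute names `blocks` and `head` are invented for the sketch, and the scheduler is assumed to step once per epoch.

```python
import torch
from torch import nn


class ToyEncoderClassifier(nn.Module):
    """Hypothetical stand-in: 12 encoder blocks followed by a binary head."""

    def __init__(self, dim=16, num_blocks=12, num_classes=2):
        super().__init__()
        self.embed = nn.Linear(dim, dim)
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_blocks)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.embed(x)
        for block in self.blocks:
            x = torch.relu(block(x))
        return self.head(x)


# Epochs and learning rates follow the table above; top_blocks=None means all layers
STAGES = [
    {"name": "stage1", "epochs": 2, "lr": 1e-3, "top_blocks": 0},
    {"name": "stage2", "epochs": 3, "lr": 5e-5, "top_blocks": 6},
    {"name": "stage3", "epochs": 2, "lr": 1e-5, "top_blocks": None},
]


def configure_stage(model, stage):
    """Set requires_grad for one stage and build its optimizer and scheduler."""
    unfreeze_all = stage["top_blocks"] is None
    for param in model.parameters():
        param.requires_grad = unfreeze_all
    # The classifier head is trainable in every stage
    for param in model.head.parameters():
        param.requires_grad = True
    # Unfreeze only the last N encoder blocks when a count is given
    if not unfreeze_all and stage["top_blocks"] > 0:
        for block in model.blocks[-stage["top_blocks"]:]:
            for param in block.parameters():
                param.requires_grad = True

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=stage["lr"])
    # Assumption: one scheduler step per epoch within the stage
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=stage["epochs"])
    return optimizer, scheduler


model = ToyEncoderClassifier()
trainable_per_stage = {}
for stage in STAGES:
    optimizer, scheduler = configure_stage(model, stage)
    trainable_per_stage[stage["name"]] = sum(
        p.numel() for p in model.parameters() if p.requires_grad
    )
    # The training loop for stage["epochs"] epochs would run here
```

Building a fresh optimizer per stage keeps frozen parameters out of the optimizer state and lets each stage start its cosine schedule from its own learning rate.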

### Performance Metrics

Results on a held-out validation set:

| Metric | Score |
| :--- | :--- |
| **Accuracy** | 78.5% |
| **AUC** | > 0.86 |
| **F1 Score** | ~0.78 |
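
For readers who want to reproduce this kind of evaluation on their own validation split, the metrics reported here can be computed from labels, predictions, and scores as in the sketch below. The helper names and the toy inputs are hypothetical, and `Fake` is assumed to be the positive class (label 1). The AUC is computed by direct pairwise comparison, which is simple but quadratic in the number of samples.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with label 1 (Fake) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (made-up labels and scores, not results from this model)
toy_metrics = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
toy_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```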

## Usage

You can load and use this model directly with the Hugging Face `transformers` library.

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "king1oo1/ai-vs-real-deepfake-model"  # Replace with your actual model ID
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)
model.eval()

# Load and preprocess an image
image = Image.open("path/to/your/image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    # Assumes index 0 = Real and index 1 = Fake; check model.config.id2label to confirm
    real_prob = probs[0][0].item() * 100
    fake_prob = probs[0][1].item() * 100

print(f"Fake probability: {fake_prob:.2f}%")
print(f"Real probability: {real_prob:.2f}%")
print(f"Verdict: {'FAKE' if fake_prob > 50 else 'REAL'}")