---
license: apache-2.0
language:
- en
tags:
- deepfake
- ai-detection
- computer-vision
- image-classification
- pytorch
- transformers
- siglip2
- synthetic-media
datasets:
- manjilkarki/deepfake-and-real-images
- hamzaboulahia/hardfakevsrealfaces
- muhammadbilal6305/200k-real-vs-ai-visuals-by-mbilal
- ayushmandatta1/deepdetect-2025
- hiddenplant/sut-project
metrics:
- accuracy
- f1
- precision
- recall
- auc
pipeline_tag: image-classification
base_model: google/siglip2-base-patch16-224
library_name: transformers
---

# DeepGuard (Deepfake Model)

## Model Overview

This is a fine-tuned version of **`google/siglip2-base-patch16-224`**, specifically trained for binary image classification to detect AI-generated and deepfake images. It is the core inference engine powering the [DeepGuard AI Media Forensics App](https://huggingface.co/spaces/king1oo1/deepguard-ai-detector).

The model distinguishes between `Real` photographs and `Fake` (AI-generated or deepfake) images. By leveraging the powerful SigLIP2 vision-language encoder and training it on a diverse, multi-source dataset of over 330,000 images, this model demonstrates robust performance in identifying synthetic media, including outputs from modern generators like Midjourney, Stable Diffusion, and DALL·E.

| Property | Value |
| :--- | :--- |
| **Architecture** | SigLIP2 (Vision Transformer) |
| **Base Model** | `google/siglip2-base-patch16-224` |
| **Input Resolution** | 224x224 pixels |
| **Number of Classes** | 2 (`Real`, `Fake`) |
| **Model Size** | ~372 MB |
| **License** | Apache 2.0 |

## Datasets

The model was trained on a carefully curated, balanced dataset of **40,000 images** (20,000 real, 20,000 fake), sampled from five diverse, high-quality sources to ensure robustness and generalization across various forgery types.

| Dataset Name | Source | Description |
| :--- | :--- | :--- |
| **Deepfake and Real Images** | `manjilkarki/deepfake-and-real-images` | A foundational dataset of 190k human faces, split evenly between real and manipulated images created by various deepfake techniques. Images are 256x256 pixels. |
| **HardFake vs Real Faces** | `hamzaboulahia/hardfakevsrealfaces` | A challenging test-oriented dataset of 1,288 high-quality images (700 fake, 589 real) designed to push the limits of detection models. Fake faces are generated using StyleGAN2, and real faces feature diverse attributes. |
| **GRAVEX-200K** | `muhammadbilal6305/200k-real-vs-ai-visuals-by-mbilal` | A comprehensive multisource dataset of 200,000 face images, curated from six major sources including FaceForensics++, DFDC, Celeb-DF, and Stable Diffusion outputs (SD 1.5, 2.1, XL). |
| **DeepDetect-2025** | `ayushmandatta1/deepdetect-2025` | A large-scale dataset of over 112,000 images spanning diverse categories (people, animals, nature, urban, artworks), generated by cutting-edge models like DALL·E 3, Midjourney, and Stable Diffusion 3. |
| **Super GenAI (SUT-Project)** | `hiddenplant/sut-project` | A dataset featuring high-fidelity images from the latest generative models, including Midjourney V6, Flux, and NanoBanana (SDXL), covering landscapes, portraits, and urban scenes. |

## Training Procedure

The model was fine-tuned using a progressive unfreezing strategy to adapt the pre-trained SigLIP2 encoder while preventing catastrophic forgetting. All training was performed on a Tesla T4 GPU in Google Colab.

### Training Hyperparameters

| Stage | Epochs | Learning Rate | Trainable Parameters | Description |
| :--- | :--- | :--- | :--- | :--- |
| **Stage 1** | 2 | 1e-3 | Classifier head only | Warm-up phase to adapt the new binary classification head. |
| **Stage 2** | 3 | 5e-5 | Classifier + Top 6 Transformer Blocks | Gradual unfreezing to allow the model to learn task-specific features. |
| **Stage 3** | 2 | 1e-5 | All layers | Full model fine-tuning with a very low learning rate for final convergence. |

- **Batch Size:** 32
- **Optimizer:** AdamW
- **Scheduler:** Cosine Annealing
- **Loss Function:** Cross-Entropy Loss
- **Data Augmentation:** Random Horizontal Flip, Random Rotation (10°), Color Jitter

### Performance Metrics

Results on a held-out validation set:

| Metric | Score |
| :--- | :--- |
| **Accuracy** | 78.5% |
| **AUC** | > 0.86 |
| **F1 Score** | ~0.78 |

## Usage

You can load and use this model directly with the Hugging Face `transformers` library.

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "king1oo1/ai-vs-real-deepfake-model"  # Replace with your actual model ID
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)
model.eval()

# Load and preprocess an image
image = Image.open("path/to/your/image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)

# Class order assumed here: index 0 = Real, index 1 = Fake.
# Check model.config.id2label if the labels look swapped.
fake_prob = probs[0][1].item() * 100
real_prob = probs[0][0].item() * 100

print(f"Fake probability: {fake_prob:.2f}%")
print(f"Real probability: {real_prob:.2f}%")
print(f"Verdict: {'FAKE' if fake_prob > 50 else 'REAL'}")
```
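
The three-stage schedule in the Training Hyperparameters table can be expressed as a rule for choosing which parameters receive gradients. The sketch below is illustrative only and is not the original training script. `STAGES`, `HEAD_PREFIX`, `BLOCK_PATTERN`, and `trainable_names` are hypothetical names, and the parameter naming (`classifier.` for the head, `encoder.layers.<i>.` for transformer blocks) is an assumption that should be checked against `model.named_parameters()` on the loaded checkpoint.

```python
import re

# Stage settings copied from the Training Hyperparameters table.
STAGES = {
    1: {"epochs": 2, "lr": 1e-3, "top_blocks": 0, "train_all": False},
    2: {"epochs": 3, "lr": 5e-5, "top_blocks": 6, "train_all": False},
    3: {"epochs": 2, "lr": 1e-5, "top_blocks": 0, "train_all": True},
}

# Assumed naming scheme; verify it against model.named_parameters().
HEAD_PREFIX = "classifier."
BLOCK_PATTERN = re.compile(r"encoder\.layers\.(\d+)\.")


def trainable_names(stage, param_names):
    """Return the parameter names that should be trainable in `stage`."""
    cfg = STAGES[stage]
    names = list(param_names)
    if cfg["train_all"]:
        return set(names)

    # Infer the number of transformer blocks from the highest block index.
    block_ids = []
    for name in names:
        match = BLOCK_PATTERN.search(name)
        if match:
            block_ids.append(int(match.group(1)))
    num_blocks = max(block_ids) + 1 if block_ids else 0
    first_trainable = num_blocks - cfg["top_blocks"]

    selected = set()
    for name in names:
        match = BLOCK_PATTERN.search(name)
        if name.startswith(HEAD_PREFIX):
            selected.add(name)  # the classification head is always trained
        elif match and int(match.group(1)) >= first_trainable:
            selected.add(name)  # only the top `top_blocks` blocks
    return selected


# Applying a stage to a loaded model would look roughly like this:
#   names = [n for n, _ in model.named_parameters()]
#   keep = trainable_names(2, names)
#   for n, p in model.named_parameters():
#       p.requires_grad = n in keep
```

Working on names rather than modules keeps the selection logic easy to inspect before any weights are touched. The optimizer for each stage would then be rebuilt over the parameters with `requires_grad=True`, using that stage's learning rate.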