---
title: DamageLensAI
sdk: docker
emoji: ⚡
colorFrom: red
colorTo: purple
pinned: true
---
# 🚗 DamageLens: AI-Powered Car Damage Detection

[![Python 3.11+](https://img.shields.io/badge/Python-3.11%2B-brightgreen)](https://python.org)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.0%2B-red)](https://pytorch.org)
[![FastAPI](https://img.shields.io/badge/FastAPI-Latest-teal)](https://fastapi.tiangolo.com)
[![CI Pipeline](https://github.com/junaidariie/DamageLensAI/actions/workflows/ci.yaml/badge.svg)](https://github.com/junaidariie/DamageLensAI/actions/workflows/ci.yaml)
[![License](https://img.shields.io/badge/License-MIT-orange)](LICENSE)

---

## ⚠️ Important Notes

> **Cold Startup Time**: The API may take **4-5 minutes** on the first request to warm up the models. Subsequent predictions will be significantly faster.

> **Model Size**: The Fusion model is computationally intensive. Individual predictions typically complete in 30-60 seconds depending on hardware.

---

**APP LINK** : https://junaidariie.github.io/DamageLensAI/

**HF REPO** : https://huggingface.co/spaces/junaid17/DamageLensAI/tree/main

---

## 📋 Table of Contents

- [Overview](#-overview)
- [Features](#-features)
- [Architecture](#-architecture)
- [Model Performance](#-model-performance)
- [CI Pipeline](#-ci-pipeline)
- [Setup & Installation](#-setup--installation)
- [Usage](#-usage)
- [API Documentation](#-api-documentation)
- [Model Optimization](#-model-optimization)
- [Dataset & Training](#-dataset--training)
- [Web UI Features](#-web-ui-features)
- [Directory Structure](#-directory-structure)
- [Limitations & Known Issues](#-limitations--known-issues)

---

## 🎯 Overview

**DamageLens** is an advanced AI system for detecting and classifying car damage using multi-model fusion architecture. It combines the power of **ResNet-18**, **EfficientNet-V2-S**, and **ConvNeXt-Small** to achieve robust damage classification across vehicle front and rear sections.

The system can identify six damage categories:
- ✅ Front Normal / Front Breakage / Front Crushed
- ✅ Rear Normal / Rear Breakage / Rear Crushed

Additionally, it uses **YOLO object detection** to localize damage regions with bounding boxes.

---

## ✨ Features

| Feature | Description |
|---------|-------------|
| **Dual Model Architecture** | ResNet (lightweight) and Fusion (high-accuracy) options |
| **Grad-CAM Visualization** | Understand which image regions drive predictions |
| **Real-time YOLO Detection** | Localize damage with confidence scores |
| **FP16 Optimization** | Reduced model size (788MB → 135MB) with minimal accuracy loss |
| **FastAPI Backend** | High-performance REST API with async support |
| **Responsive Web UI** | Modern, interactive web interface with real-time feedback |
| **Static File Serving** | Efficient caching and delivery of results |
| **CI/CD Pipeline** | Automated testing via GitHub Actions on every push/PR |
| **HuggingFace Integration** | Models auto-downloaded from HF Hub on first startup |

---

## 🏗️ Architecture

### System Overview

```
┌──────────────────────────────────────────────────────┐
│                   Frontend (Web UI)                  │
│  HTML / CSS / JavaScript  (Dark Mode, Glassmorphism) │
│  ├─ Drag & Drop Image Upload                         │
│  ├─ Model Selection (Fusion / ResNet)                │
│  └─ Real-time Result Tabs (Prediction/GradCAM/YOLO)  │
└───────────────────┬──────────────────────────────────┘
                    │ REST API (JSON)
┌───────────────────▼──────────────────────────────────┐
│              FastAPI Backend  (app.py)               │
│  ├─ POST /predict/resnet    → ResNet inference       │
│  ├─ POST /predict/fusion    → Fusion inference       │
│  ├─ POST /predict?mode=*    → Grad-CAM generation    │
│  └─ POST /predict/yolo      → YOLO detection         │
│                                                      │
│  Lifespan: models loaded once at startup             │
│  Static:   /static/uploads  /static/results          │
└──────┬───────────┬──────────────┬────────────────────┘
       │           │              │
┌──────▼──┐  ┌─────▼──────┐  ┌───▼──────────┐
│ ResNet  │  │   Fusion   │  │  YOLO v11m   │
│  (77%)  │  │   (84%)    │  │  Detection   │
└──────┬──┘  └─────┬──────┘  └───┬──────────┘
       │           │              │
       └─────┬─────┘              │
             │                    │
     ┌───────▼──────┐    ┌────────▼────────┐
     │  Grad-CAM    │    │  Bounding Boxes │
     │  Heatmaps    │    │  + Confidence   │
     └──────────────┘    └─────────────────┘
```

### Model Loading (scripts/load_models.py)

```
Startup
  │
  ├─ hf_hub_download("junaid17/car-damage-classifier")
  │       └─> ResnetCarDamagePredictor(checkpoint, class_map)
  │
  ├─ hf_hub_download("junaid17/best_fusion_model_fp16")
  │       └─> FusionCarDamagePredictor(checkpoint, class_map)
  │
  └─ hf_hub_download("junaid17/Yolo_Model")
          └─> YOLO(checkpoint)
```

### Fusion Model (High Accuracy — 84%)

```
┌─────────────────────────────────────────────────────────────────┐
│                          INPUT IMAGE                            │
│                         (3, 260, 260)                           │
└────────────────┬────────────────────────────────┬──────────────┘
                 │                                │
         ┌───────▼────────┐             ┌─────────▼────────┐
         │ EfficientNet-  │             │  ConvNeXt-Small  │
         │ V2-S Backbone  │             │  Backbone        │
         │                │             │                  │
         │ Frozen except  │             │ Frozen except    │
         │ features[5,6,7]│             │ stages[2,3] +    │
         │ (unfrozen)     │             │ layernorm        │
         └───────┬────────┘             └─────────┬────────┘
                 │                                │
         ┌───────▼────────┐             ┌─────────▼────────┐
         │ AdaptiveAvg    │             │  Pooler Output   │
         │ Pool → Flatten │             │                  │
         └───────┬────────┘             └─────────┬────────┘
                 │  (1280,)                        │  (768,)
                 └──────────────┬─────────────────┘
                                │
                        ┌───────▼────────┐
                        │  CONCATENATE   │
                        │  1280 + 768    │
                        │  = (2048,)     │
                        └───────┬────────┘
                                │
                    ┌───────────▼───────────┐
                    │   FUSION HEAD         │
                    │  Dropout(0.4)         │
                    │  Linear(2048 → 512)   │
                    │  LayerNorm(512)       │
                    │  GELU()               │
                    │  Dropout(0.3)         │
                    │  Linear(512 → 256)    │
                    │  LayerNorm(256)       │
                    │  GELU()               │
                    │  Dropout(0.2)         │
                    │  Linear(256 → 6)      │
                    └───────────┬───────────┘
                                │
                        ┌───────▼────────┐
                        │ OUTPUT LOGITS  │
                        │  (6 classes)   │
                        └────────────────┘
```

**Optimizer**: AdamW with per-group learning rates
- EfficientNet features[5]: lr=1e-5
- EfficientNet features[6,7]: lr=3e-5
- ConvNeXt stages[2,3] + layernorm: lr=3e-5
- Fusion head: lr=1e-4
- Loss: CrossEntropyLoss with label_smoothing=0.1
- Early stopping patience: 7

### ResNet-18 (Lightweight — 77%)

```
┌──────────────────────────────────┐
│      INPUT IMAGE                 │
│     (3, 128, 128)                │
└───────────────┬──────────────────┘
                │
        ┌───────▼─────────┐
        │   ResNet-18     │
        │   Backbone      │
        │                 │
        │  Frozen except  │
        │  layer3, layer4 │
        └───────┬─────────┘
                │  (512,)
        ┌───────▼─────────────────────┐
        │  Classification Head        │
        │  Dropout(0.5)               │
        │  Linear(512 → 256)          │
        │  ReLU()                     │
        │  Dropout(0.3)               │
        │  Linear(256 → 6 classes)    │
        └───────┬─────────────────────┘
                │
        ┌───────▼──────────┐
        │  OUTPUT LOGITS   │
        │  (6 classes)     │
        └──────────────────┘
```

**Optimizer**: AdamW with per-group learning rates
- layer3: lr=1e-5
- layer4: lr=1e-5
- fc head: lr=1e-4
- Loss: CrossEntropyLoss
- Early stopping patience: 7

### YOLO v11m Integration

```
┌─────────────────────────────┐
│   INPUT IMAGE               │
│   imgsz=640, conf=0.05      │
└──────────────┬──────────────┘
               │
       ┌───────▼────────┐
       │  YOLO v11m     │
       │  Inference     │
       └───────┬────────┘
               │
    ┌──────────┴──────────┐
    │                     │
┌───▼───────┐      ┌──────▼──────┐
│ Bboxes    │      │ Confidence  │
│ (x1,y1,   │      │ Scores +    │
│  x2,y2)   │      │ Class Label │
└───┬───────┘      └──────┬──────┘
    └──────────┬──────────┘
               │
       ┌───────▼────────┐
       │ result.plot()  │
       │ Save to disk   │
       └────────────────┘
```

### Grad-CAM Pipeline (scripts/gradcam.py)

```
Image Path
    │
    ├─ ResNet mode:  target_layer = model.layer4[-1]
    └─ Fusion mode:  target_layer = model.eff_features[-1]
                     (FP16 → FP32 cast on CPU automatically)
    │
    ├─ Register forward hook  (_GradCAMHook)
    ├─ Forward pass → score.backward()
    ├─ acts [C,H,W]  ×  weights (mean of grads) → CAM [H,W]
    ├─ ReLU → normalize → resize to original dims
    └─ cv2.applyColorMap(COLORMAP_JET) → addWeighted overlay
```

### Data Pipeline (src/data/)

```
Raw Images (data/dataset/)
    │
    ├─ ingestion.py   → scan folders, build file list
    ├─ preprocessing.py → validate / clean images
    ├─ augmentation.py  → train/val transforms
    │     ResNet:  Resize(128,128) + HFlip + Rotation(15°) + ColorJitter
    │     Fusion:  Resize(260,260) + HFlip + Rotation(10°) + ColorJitter
    └─ dataset.py   → ImageFolder DataLoaders
                       (train 80% / val 20%, seed=42)
```

### Export & Deployment (src/export/)

```
Trained Checkpoints (checkpoints/)
    │
    ├─ conver_model.py         → FP32 → FP16 conversion
    │                            788MB → 135MB (82.9% reduction)
    └─ upload_to_huggingface.py → HfApi upload to:
          junaid17/new-damagelens-resnet-classifier
          junaid17/new-damagelens-fusion-fp16
          junaid17/new-damagelens-yolo-detector
```

---

## 📊 Model Performance

### Fusion Model (High Accuracy — 84% Overall)

**Classification Report:**

![Fusion Classification Report](assets/fusion_classification_report.png)

**Confusion Matrix:**

![Fusion Confusion Matrix](assets/fusion_confusion_matrix.png)

**Training Curves:**

![Fusion Training Curves](assets/fusion_training_curves.png)

---

### ResNet-18 (Lightweight — 77% Overall)

**Classification Report:**

![ResNet Classification Report](assets/resnet_classification_report.png)

**Confusion Matrix:**

![ResNet Confusion Matrix](assets/resnet_confusion_matrix.png)

**Training Curves:**

![ResNet Training Curves](assets/resnet_training_curves.png)

---

### YOLO Detection Results

![YOLO Detection Sample](assets/yolo_detection_sample.jpg)

---

## 🔁 CI Pipeline

DamageLens uses **GitHub Actions** for continuous integration. Every push or pull request to `main`, `master`, or `dev` triggers the full test suite automatically.

**CI Screenshot (GitHub Actions — All Tests Passing):**

![CI Pipeline Passing](assets/ci_pipeline_passing.png)

### What the pipeline tests:

| Step | Test File | What it covers |
|------|-----------|----------------|
| Config | `test_config.py` | Paths, constants, class map |
| Ingestion | `test_ingestion.py` | Dataset folder scanning |
| Preprocessing | `test_preprocessing.py` | Image validation & cleaning |
| Augmentation | `test_augmentation.py` | Transform pipelines |
| Dataset | `test_dataset.py` | DataLoader creation |
| ResNet Architecture | `test_resnet_model.py` | Model init & forward pass |
| ResNet Training | `test_train_resnet.py` | Smoke test training loop |

### Pipeline config (`.github/workflows/ci.yaml`):
- Runs on: `ubuntu-latest`
- Python: `3.10`
- Triggers: push & PR to `main` / `master` / `dev`

---

## 🚀 Setup & Installation

### Prerequisites

- Python 3.11+
- CUDA 11.8+ (for GPU acceleration, optional but recommended)
- 8GB+ RAM (16GB recommended for Fusion model)

### Installation Steps

```bash
# Clone the repository
git clone https://github.com/junaid17/damagelens.git
cd DamageLens

# Create virtual environment
python -m venv myvenv
source myvenv/bin/activate  # On Windows: myvenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create required directories
mkdir -p static/uploads static/results checkpoints assets
```

### Download Pre-trained Models

Models are automatically downloaded from Hugging Face on first use:
- `car-damage-classifier.pt` — ResNet-18 checkpoint
- `best_fusion_model_fp16.pt` — Fusion model (FP16 optimized, 135MB)
- `damage_detector.pt` — YOLO v11m model

---

## 💻 Usage

### Running the FastAPI Server

```bash
uvicorn app:app --reload --host 127.0.0.1 --port 8000
```

Open your browser at `http://127.0.0.1:8000`

#### Quick Start:
1. Upload a car image (JPG/PNG)
2. Select analysis mode: **Fusion** (accurate) or **ResNet** (fast)
3. Click "Run AI Analysis"
4. View results in tabs:
   - 📊 **Prediction**: Confidence scores and probabilities
   - 👀 **Grad-CAM**: Visualize which regions influenced the prediction
   - 🎯 **YOLO**: Damage bounding boxes with confidence

### Python API Example

```python
import requests

with open('car_image.jpg', 'rb') as f:
    files = {'image': f}
    resp = requests.post('http://127.0.0.1:8000/predict/resnet', files=files)
    print(resp.json())

with open('car_image.jpg', 'rb') as f:
    files = {'image': f}
    resp = requests.post('http://127.0.0.1:8000/predict/fusion', files=files)
    print(resp.json())
```

---

## 📡 API Documentation

### `POST /predict/resnet`
```
Content-Type: multipart/form-data
Body: image (File)

Response:
{
  "status": "success",
  "prediction": {
    "Rear Normal": 0.47,
    "Front Normal": 0.25,
    ...
  }
}
```

### `POST /predict/fusion`
```
Content-Type: multipart/form-data
Body: image (File)

Response:
{
  "status": "success",
  "prediction": {
    "Rear Normal": 0.49,
    "Front Normal": 0.35,
    ...
  }
}
```

### `POST /predict?mode={resnet|fusion}` — Grad-CAM
```
Content-Type: multipart/form-data
Body: file (File), mode (String)

Response:
{
  "status": "success",
  "mode": "fusion",
  "original_image": "/static/uploads/{uuid}_input.jpg",
  "selected_viz": "/static/results/{uuid}_fusion.jpg",
  "resnet_viz": null,
  "fusion_viz": "/static/results/{uuid}_fusion.jpg"
}
```

### `POST /predict/yolo`
```
Content-Type: multipart/form-data
Body: file (File)

Response:
{
  "status": "success",
  "original_image": "/static/uploads/{uuid}_input.jpg",
  "yolo_image": "/static/results/{uuid}_yolo.jpg",
  "detections": [
    { "label": "damage", "confidence": 0.87, "box": [x1, y1, x2, y2] }
  ],
  "total_detections": 2,
  "message": "Detections found"
}
```

---

## 🔧 Model Optimization

### FP16 Conversion (Fusion Model)

```
Original Model (FP32):     788 MB
Optimized Model (FP16):    135 MB
───────────────────────────────────
Compression Ratio:         82.9% reduction ✅
Accuracy Loss:             < 1%            ⚠️
Speed Improvement:         ~1.3x faster   ⚡
```

The system auto-detects FP16 checkpoints at load time:

```python
if first_tensor.dtype == torch.float16:
    model = model.half()

# Grad-CAM on CPU: FP16 → FP32 cast applied automatically
if is_half:
    model = model.float()
```

---

## 📚 Dataset & Training

### Data Constraints

- **Total Samples**: ~1,800 images
- **Train/Val Split**: 80/20 (seed=42)
- **Classes**: 6 (F_Breakage, F_Crushed, F_Normal, R_Breakage, R_Crushed, R_Normal)
- **YOLO subset**: ~100 annotated images (train/val split)

### Data Augmentation

| Transform | ResNet | Fusion |
|-----------|--------|--------|
| Resize | 128×128 | 260×260 |
| RandomHorizontalFlip | ✅ | ✅ |
| RandomRotation | ±15° | ±10° |
| ColorJitter (b/c/s) | ±20% | ±15% |
| ImageNet Normalize | ✅ | ✅ |

### Training Configuration

| Setting | ResNet | Fusion |
|---------|--------|--------|
| Backbone | ResNet-18 | EfficientNet-V2-S + ConvNeXt-Small |
| Frozen layers | All except layer3, layer4 | All except features[5,6,7] / stages[2,3] |
| Optimizer | AdamW | AdamW (per-group LR) |
| Loss | CrossEntropyLoss | CrossEntropyLoss (label_smoothing=0.1) |
| Early stopping | patience=7 | patience=7 |
| Input size | 128×128 | 260×260 (EfficientNet) / 224×224 (ConvNeXt) |

---

## 🎨 Web UI Features

- Dark mode glassmorphism design
- Drag & drop image upload
- Model selection dropdown (Fusion / ResNet)
- Real-time confidence bar animation
- Tab navigation: Prediction → Grad-CAM → YOLO
- Scan line effect during processing
- Plotly bar chart for class probabilities
- Side-by-side original vs heatmap comparison

---

## 🔍 Grad-CAM Visualization

Gradient-weighted Class Activation Mapping highlights which image regions most influenced the model's prediction.

```
Original Image    +    Grad-CAM Heatmap    =    Overlay
                       Red   = High importance
                       Blue  = Low importance
```

- ResNet: hooks into `layer4[-1]`
- Fusion: hooks into `eff_features[-1]` (EfficientNet's last block)

---

## 📋 Directory Structure

```
DamageLens/
├── app.py                              # FastAPI app + all endpoints
├── index.html                          # Web UI
├── requirements.txt
├── README.md
│
├── .github/
│   └── workflows/
│       └── ci.yaml                     # GitHub Actions CI pipeline
│
├── assets/                             # ← Place README images here
│   ├── fusion_classification_report.png
│   ├── fusion_confusion_matrix.png
│   ├── fusion_training_curves.png
│   ├── resnet_classification_report.png
│   ├── resnet_confusion_matrix.png
│   ├── resnet_training_curves.png
│   ├── yolo_detection_sample.png
│   └── ci_pipeline_passing.png
│
├── scripts/
│   ├── prediction_helper.py            # ResNet + Fusion model classes & inference
│   ├── gradcam.py                      # Grad-CAM (ResNet + Fusion, CPU-optimized)
│   ├── load_models.py                  # HF Hub download + model initialization
│   └── yolo_predict.py                 # YOLO inference + bbox drawing
│
├── src/
│   ├── config.py                       # Paths, hyperparams, class map
│   ├── data/
│   │   ├── ingestion.py                # Dataset folder scanning
│   │   ├── preprocessing.py            # Image validation
│   │   ├── augmentation.py             # Train/val transforms
│   │   └── dataset.py                  # DataLoader creation
│   ├── models/
│   │   ├── resnet_model.py             # CarClassifierResNet
│   │   └── fusion_model.py             # FusionClassifier
│   ├── training/
│   │   ├── trainer.py                  # Generic train loop (single + dual input)
│   │   ├── train_resnet.py             # ResNet training entry point
│   │   ├── train_fusion.py             # Fusion training entry point
│   │   └── train_yolo.py               # YOLO fine-tuning
│   └── export/
│       ├── conver_model.py             # FP32 → FP16 conversion
│       └── upload_to_huggingface.py    # HF Hub upload script
│
├── checkpoints/
│   ├── best_resnet_model.pt
│   ├── best_fusion_model_fp16.pt
│   ├── damage_detector.pt
│   └── yolo11m.pt
│
├── Notebooks/
│   ├── Resnet18_fine_tuning_final.ipynb
│   ├── EfficientNet_ConvNext_Fusion.ipynb
│   └── damage_detector_yolo.ipynb
│
├── test/
│   ├── test_config.py
│   ├── test_ingestion.py
│   ├── test_preprocessing.py
│   ├── test_augmentation.py
│   ├── test_dataset.py
│   ├── test_resnet_model.py
│   ├── test_fusion_model.py
│   ├── test_train_resnet.py
│   ├── test_train_fusion.py
│   ├── test_train_yolo.py
│   ├── test_model_conversion.py
│   └── test_upload_to_huggingface.py
│
├── data/
│   ├── dataset/                        # 6-class image folders
│   │   ├── F_Breakage/
│   │   ├── F_Crushed/
│   │   ├── F_Normal/
│   │   ├── R_Breakage/
│   │   ├── R_Crushed/
│   │   └── R_Normal/
│   └── yolo/                           # YOLO annotated subset
│       ├── train/images + labels/
│       ├── val/images + labels/
│       └── dataset_custom.yaml
│
└── static/
    ├── uploads/                        # Temp uploaded images
    └── results/                        # Generated Grad-CAM / YOLO outputs
```

---

## ⚠️ Limitations & Known Issues

### Data Constraints
- **Limited Training Data**: ~1,800 samples — may show variance on edge cases
- **Class Imbalance**: Rear Crushed class has fewer samples, affecting recall

### Performance

| Metric | Value | Note |
|--------|-------|------|
| ResNet Inference | ~500ms | Fast, lower accuracy |
| Fusion Inference | 30-60s | Accurate, computationally heavy |
| Cold Startup | 4-5 min | HF Hub download + model warmup |
| GPU Memory | ~4GB | For Fusion model |
| ResNet Accuracy | 77% | Lightweight trade-off |
| Fusion Accuracy | 84% | Best accuracy |

### Technical Limitations
- Fusion accuracy is **7% higher** than ResNet (84% vs 77%)
- YOLO model may miss small or partially occluded damage
- Grad-CAM is for diagnostic/explainability purposes only
- Batch processing not currently supported
- FP16 Grad-CAM on CPU requires automatic FP32 cast (handled internally)