Heritage Temple Damage Assessment – Mixture-of-Experts (MoE)

Model Description

This is a Mixture-of-Experts (MoE) ensemble for automatically assessing structural damage in heritage temple images. It combines four pre‑trained expert models:

ResNet50 – texture‑sensitive, good for fine cracks and surface damage.
EfficientNet‑B4 – balanced accuracy/speed, robust to varying image quality.
ViT‑Base (patch16_224) – captures global context and structural deformations.
YOLO fallback CNN – a lightweight custom CNN that acts as a robust fallback for heavily corrupted or low‑resolution images.

A learned gating network dynamically weights the experts’ contributions per image. The final output is one of three damage classes:

Class	Criticality Grade
Undamaged	STABLE
Partial Damage	MINOR
Damaged	CRITICAL
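The gated combination can be sketched in a few lines. This is a minimal NumPy illustration, not the repository's implementation. It assumes three things: each expert emits a probability vector over the three classes, the gating network emits one raw score per expert that is normalised with a softmax, and the confidence score is the top probability of the mixture. The names `CLASSES`, `CRITICALITY` and `combine_experts` are hypothetical.

```python
import numpy as np

# Hypothetical constants; class order and grades follow the table above.
CLASSES = ["Undamaged", "Partial Damage", "Damaged"]
CRITICALITY = {"Undamaged": "STABLE", "Partial Damage": "MINOR", "Damaged": "CRITICAL"}


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def combine_experts(expert_probs: np.ndarray, gate_logits: np.ndarray) -> dict:
    """Mix per-expert class probabilities using softmax gate weights.

    expert_probs: (n_experts, n_classes); each row is one expert's class distribution.
    gate_logits:  (n_experts,); raw gate scores for this image (assumed interface).
    """
    gate_weights = softmax(gate_logits)       # non-negative, sums to 1
    mixed = gate_weights @ expert_probs       # convex combination, still sums to 1
    idx = int(np.argmax(mixed))
    label = CLASSES[idx]
    return {
        "predicted_class": label,
        "criticality": CRITICALITY[label],
        "confidence": float(mixed[idx]),      # assumption: top mixed probability
        "gate_weights": gate_weights.tolist(),
        "probs": mixed.tolist(),
    }


# Four experts in an assumed order (ResNet50, EfficientNet-B4, ViT, fallback CNN).
expert_probs = np.array([
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.3, 0.2],
])
result = combine_experts(expert_probs, np.array([0.0, 1.0, 0.5, -2.0]))
print(result["predicted_class"], result["criticality"])  # Partial Damage MINOR
```

Because the gate weights sum to 1, the mixture is itself a valid class distribution, so a single expert with a low gate weight (here the fallback CNN) can disagree without changing the outcome.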

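The model also outputs per‑expert predictions, gate weights, and a continuous confidence score. A fallback chain (gated ensemble → uniform ensemble → mock prediction) keeps inference available in production: if the gate fails, the experts are averaged with equal weights, and if that also fails, a mock (placeholder) output is returned.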
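The fallback chain can be read as a sequence of progressively simpler predictors, each tried only if the previous one fails. The sketch below assumes that "uniform ensemble" means averaging the expert probabilities with equal weights and that "mock" means a fixed uniform placeholder tagged as such; `predict_with_fallback`, `gate_fn` and the `source` field are hypothetical, not the repository's API.

```python
from typing import Callable, Optional

import numpy as np

N_CLASSES = 3  # Undamaged, Partial Damage, Damaged


def predict_with_fallback(expert_probs: Optional[np.ndarray],
                          gate_fn: Callable[[], np.ndarray]) -> dict:
    """Gated mixture -> uniform ensemble -> mock, each tried only if the previous step fails.

    expert_probs: (n_experts, n_classes) probabilities, or None if the experts failed.
    gate_fn:      hypothetical callable returning this image's gate weights, shape (n_experts,).
    """
    if expert_probs is not None:
        # Step 1: gated mixture, accepted only if the gate output is well-formed.
        try:
            weights = np.asarray(gate_fn(), dtype=float)
            if weights.shape != (expert_probs.shape[0],) or not np.all(np.isfinite(weights)):
                raise ValueError("invalid gate output")
            return {"probs": weights @ expert_probs, "source": "gate"}
        except Exception:
            pass  # any gate failure falls through to the uniform ensemble
        # Step 2: uniform ensemble, every expert weighted equally.
        return {"probs": expert_probs.mean(axis=0), "source": "uniform"}
    # Step 3: mock output (assumed: uniform placeholder) so callers always get a result.
    return {"probs": np.full(N_CLASSES, 1.0 / N_CLASSES), "source": "mock"}


def broken_gate() -> np.ndarray:
    raise RuntimeError("gate unavailable")  # simulates a gate failure


probs = np.array([[0.2, 0.7, 0.1], [0.4, 0.4, 0.2]])
print(predict_with_fallback(probs, lambda: np.array([0.5, 0.5]))["source"])  # gate
print(predict_with_fallback(probs, broken_gate)["source"])                   # uniform
print(predict_with_fallback(None, broken_gate)["source"])                    # mock
```

Tagging each result with its source lets downstream users discount predictions that did not come from the full gated model.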

Intended Uses & Limitations

Intended use: Automated preliminary damage screening for heritage site managers, conservation architects, and NGOs. The model is designed for visible‑spectrum images such as drone and phone photographs or digitised archival photographs.

Limitations:

The training set is moderately imbalanced (fewer “Damaged” samples). Performance on rare damage types (e.g., severe spalling) may be lower.
The model was trained on a combination of publicly available damage datasets (concrete cracks, disaster infrastructure, surface cracks), none of which is specific to heritage temples. It may therefore not generalise equally to all temple architectures (e.g., brick vs. stone).
Very low‑resolution (< 224×224) or heavily compressed images degrade accuracy.
The model outputs only discrete damage classes, not a continuous severity score (the confidence score measures certainty, not severity); a continuous severity score is left to future work.
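Because inputs below 224×224 degrade accuracy, a screening pipeline may want to flag them before trusting a prediction. A minimal sketch: the 224‑pixel threshold comes from the limitation above, while the function name, the shorter‑side rule, and the choice to warn rather than reject are assumptions.

```python
from typing import Optional, Tuple

MIN_SIDE = 224  # from the stated limitation: inputs below 224x224 degrade accuracy


def resolution_warning(size: Tuple[int, int], min_side: int = MIN_SIDE) -> Optional[str]:
    """Return a warning if the shorter side is below min_side, else None.

    size: (width, height), e.g. PIL's Image.size.
    """
    width, height = size
    if min(width, height) < min_side:
        return (f"image is {width}x{height}, below {min_side}x{min_side}; "
                "treat the prediction as low-reliability")
    return None


print(resolution_warning((160, 300)))  # prints a warning
print(resolution_warning((640, 480)))  # None
```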

Training Data

The model was fine‑tuned on a curated dataset of ~4,800 training images aggregated from:

Concrete crack images (classification)
Surface crack detection
Disaster infrastructure damage (CDD)
Building damage assessment datasets
QuakeSet (limited, due to access restrictions)

Images were resized to 224×224, augmented (random crop, flip, rotate, colour jitter, coarse dropout), and split 70/15/15 for training/validation/test. Class‑weighted sampling and focal loss were used to handle imbalance.
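Class‑weighted sampling and focal loss are both standard remedies for class imbalance. The PyTorch sketch below shows one common form of each; the focusing parameter `gamma=2.0`, the optional per‑class `alpha` weights, and inverse‑frequency sample weights are assumptions, since the card does not state the values used.

```python
from typing import Optional

import torch
import torch.nn.functional as F
from torch.utils.data import WeightedRandomSampler


def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0,
               alpha: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Multi-class focal loss, averaged over the batch.

    FL = -alpha_t * (1 - p_t)^gamma * log(p_t); gamma=0 and alpha=None gives cross-entropy.
    """
    log_probs = F.log_softmax(logits, dim=-1)                       # (N, C)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # log-prob of true class
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt                          # down-weights easy examples
    if alpha is not None:
        loss = alpha[targets] * loss                                # optional per-class weights
    return loss.mean()


def class_balanced_sampler(labels: torch.Tensor) -> WeightedRandomSampler:
    """Draw each sample with probability inversely proportional to its class count."""
    counts = torch.bincount(labels)
    sample_weights = 1.0 / counts[labels].double()
    return WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
```

The sampler replaces shuffling: pass it as `DataLoader(train_set, batch_size=..., sampler=sampler)` and leave `shuffle` unset, since PyTorch rejects `shuffle=True` together with a sampler.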

Training Procedure

All experts were initialised with ImageNet‑1k weights and fine‑tuned for 25 epochs: 5 with the backbone frozen, then 20 with it unfrozen. The gating network was then trained for 15 epochs with the experts frozen, using cross‑entropy plus a load‑balancing loss weighted by 0.01. Gradient accumulation (effective batch size 64), an exponential moving average (EMA) of the weights, and mixup were also applied. Training was done on a single Tesla T4 GPU (Kaggle).
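The gating objective combines cross‑entropy with 0.01 times a load‑balancing term. The card does not say which balancing term was used; the sketch below assumes a common choice, the squared coefficient of variation of each expert's total gate weight over the batch, which is zero when all experts receive equal weight and grows as a few experts dominate. It also assumes the gate outputs softmax weights and that cross‑entropy is taken on the gated mixture of the frozen experts' probabilities. Both function names are hypothetical.

```python
import torch
import torch.nn.functional as F


def load_balancing_loss(gate_weights: torch.Tensor, eps: float = 1e-10) -> torch.Tensor:
    """Squared coefficient of variation of per-expert importance (an assumed choice).

    gate_weights: (batch, n_experts) softmax outputs of the gating network.
    """
    importance = gate_weights.sum(dim=0)                # total weight per expert
    mean = importance.mean()
    variance = ((importance - mean) ** 2).mean()
    return variance / (mean ** 2 + eps)


def gating_loss(gate_logits: torch.Tensor, expert_probs: torch.Tensor,
                targets: torch.Tensor, balance_coef: float = 0.01) -> torch.Tensor:
    """Cross-entropy on the gated mixture plus balance_coef * load-balancing loss.

    gate_logits:  (batch, n_experts) raw scores from the trainable gate.
    expert_probs: (batch, n_experts, n_classes) class probabilities from the frozen
                  experts, e.g. computed under torch.no_grad().
    targets:      (batch,) integer class labels.
    """
    gate_weights = F.softmax(gate_logits, dim=-1)
    mixed = torch.einsum("be,bec->bc", gate_weights, expert_probs)  # (batch, n_classes)
    ce = F.nll_loss(torch.log(mixed.clamp_min(1e-8)), targets)
    return ce + balance_coef * load_balancing_loss(gate_weights)
```

Since the expert probabilities carry no gradient, backpropagating this loss updates only the gate's parameters, matching the "frozen experts" stage described above.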

Evaluation Results

On the held‑out test set (1,028 images):

Metric	Value
Accuracy	0.9850
Weighted F1	0.9853
Per‑class F1 (Undamaged)	0.99
Per‑class F1 (Partial)	1.00
Per‑class F1 (Damaged)	0.95

Expert‑only performance (test F1):

ResNet50: 0.9467
EfficientNet‑B4: 0.9641
ViT‑B16: 0.9792
YOLO fallback: 0.6278

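The MoE ensemble (weighted F1 0.9853) outperforms every individual expert, the strongest of which is ViT‑B16 (F1 0.9792). No uniform‑average ensemble baseline is reported, so this comparison alone does not separate the benefit of adaptive weighting from that of ensembling itself.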
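Accuracy, weighted F1 and per‑class F1 can be computed from true and predicted labels as sketched below. The card does not say how its metrics were computed; this sketch assumes "Weighted F1" means per‑class F1 averaged with weights proportional to each class's support, the same definition as scikit‑learn's `f1_score(..., average="weighted")`. The toy labels are illustrative only.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           n_classes: int = 3) -> Tuple[float, float, Dict[int, float]]:
    """Return (accuracy, support-weighted F1, per-class F1) for integer labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    support = Counter(y_true)                      # number of true samples per class
    per_class: Dict[int, float] = {}
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0   # F1 = 2TP / (2TP + FP + FN)
    weighted_f1 = sum(per_class[c] * support[c] for c in range(n_classes)) / len(pairs)
    return accuracy, weighted_f1, per_class


# Toy labels (0 = Undamaged, 1 = Partial Damage, 2 = Damaged), for illustration only.
acc, wf1, f1_per_class = classification_metrics([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2])
print(acc, wf1, f1_per_class)
```

Weighting by support means the smaller "Damaged" class contributes less to the weighted F1, which is why its lower per‑class F1 (0.95) pulls the headline figure down only slightly.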

How to Use

The model is hosted on the Hugging Face Hub and requires trust_remote_code=True because it includes a custom MoE architecture.

```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForImageClassification

# Load the model from the Hub; trust_remote_code=True is required for the custom MoE class
model = AutoModelForImageClassification.from_pretrained(
    "monarch8661/moe",
    trust_remote_code=True,
)
model.eval()

# Download an image and convert it to RGB
url = "https://example.com/temple_damage.jpg"
response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()
image = Image.open(response.raw).convert("RGB")

# Run inference; the custom MoE forward takes a PIL image and returns a dict with all details
with torch.no_grad():
    outputs = model(image)

print(outputs["predicted_class"])  # e.g., "Partial Damage"
print(outputs["criticality"])      # e.g., "MINOR"
print(outputs["confidence"])       # e.g., 0.92
print(outputs["gate_weights"])     # one weight per expert, e.g., [0.21, 0.45, 0.30, 0.04]
print(outputs["per_expert"])       # list of per-expert predictions
```
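Since the model is intended for preliminary screening, its output will usually be routed to a human reviewer. The sketch below reads only the `criticality` and `confidence` keys from the example above; the 0.80 confidence threshold, the action labels, and the function name `triage` are hypothetical choices, not part of the model.

```python
from typing import Any, Mapping


def triage(outputs: Mapping[str, Any], min_confidence: float = 0.80) -> str:
    """Map one output dict from the model to a hypothetical review action."""
    if outputs["criticality"] == "CRITICAL":
        return "urgent-review"         # always escalate suspected critical damage
    if outputs["confidence"] < min_confidence:
        return "manual-review"         # uncertain prediction: ask a person to check
    if outputs["criticality"] == "MINOR":
        return "scheduled-inspection"
    return "no-action"                 # STABLE and confident enough
```

With the example values above (criticality "MINOR", confidence 0.92), `triage(outputs)` returns "scheduled-inspection". Escalating "CRITICAL" before checking confidence is a deliberate choice: for heritage structures, a missed critical case costs more than an unnecessary review.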
