---
library_name: pytorch
tags:
- bioacoustics
- audio-classification
- spectrogram
- cnn
- vgg16
- dolphin
- bottlenose-dolphin
- whistle
- whistle-detection
- openwhistle
pipeline_tag: audio-classification
---

# OpenWhistle CNN VGG16

`OpenWhistleNeurIPS26/OpenWhistle-CNN-VGG16` is a supervised VGG16-based PyTorch classifier for bottlenose dolphin whistle detection.

The model is part of the OpenWhistle family and predicts whether a spectrogram window contains a whistle or only noise.

## Model Details

- Model type: VGG16-based CNN classifier
- Framework: PyTorch
- Task: binary classification
- Labels: `whistle` vs `noise`
- Input: `224x224` RGB spectrogram
- Checkpoint: `model_vgg_final_best.pt`
- Best epoch: 4
- Best validation loss: 0.1805

The model operates on spectrogram image windows rather than raw waveform audio.

## Training and Evaluation Data

The model was trained and evaluated using a session-disjoint train/validation/test protocol.
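Session-disjoint means that no recording session contributes windows to more than one split. The sketch below shows one way to check that property for a set of splits. The function name and the session IDs are hypothetical and are not taken from the OpenWhistle codebase.

```python
def check_session_disjoint(split_sessions):
    """Return overlapping sessions between every pair of splits.

    split_sessions maps a split name to an iterable of session IDs.
    An empty result means the splits are session-disjoint.
    """
    names = sorted(split_sessions)
    overlaps = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = set(split_sessions[first]) & set(split_sessions[second])
            if shared:
                overlaps[(first, second)] = sorted(shared)
    return overlaps


# Hypothetical session IDs, for illustration only.
example_splits = {
    "train": ["s01", "s02", "s03"],
    "validation": ["s04"],
    "test": ["s05", "s06"],
}
assert check_session_disjoint(example_splits) == {}
```

Running a check like this before training guards against leakage, where windows from one session appear in both training and evaluation data.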

Split summary:

- Train: 53,828 windows across 195 sessions
- Validation: 5,980 windows across 26 sessions
- Test: 16,708 windows across 261 sessions

Each split is balanced between whistle and noise windows.

Test set composition:

- 8,354 whistle windows
- 8,354 matched noise windows

The model is intended for use with the OpenWhistle CNN/detection workflow and related bottlenose dolphin whistle detection datasets.

## Intended Use

This model is intended as a supervised whistle detector for bottlenose dolphin acoustic recordings.

Potential uses include:

- detecting whistle-like spectrogram windows
- filtering long recordings before manual review
- generating candidate whistle detections for downstream analysis
- benchmarking whistle detection workflows on OpenWhistle-style spectrogram windows

This is a binary detector, not a whistle category classifier. It predicts whistle presence versus noise.
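As an illustration of how window-level predictions could feed manual review, the sketch below groups consecutive windows whose whistle probability passes a threshold into candidate segments. The function name, the 0.5 threshold, and the example probabilities are assumptions made for this example. They are not part of the released workflow.

```python
def candidate_segments(whistle_probs, threshold=0.5):
    """Group consecutive above-threshold windows into index ranges.

    whistle_probs is a list of per-window whistle probabilities in
    recording order. Returns (start, end) window indices, end inclusive.
    """
    segments = []
    start = None
    for index, prob in enumerate(whistle_probs):
        if prob >= threshold:
            if start is None:
                start = index  # a new candidate segment begins here
        elif start is not None:
            segments.append((start, index - 1))
            start = None
    if start is not None:
        # The recording ended while a segment was still open.
        segments.append((start, len(whistle_probs) - 1))
    return segments


# Hypothetical per-window probabilities.
print(candidate_segments([0.1, 0.9, 0.8, 0.2, 0.7]))
```

Converting window indices to timestamps depends on the window length and hop size used to generate the spectrograms, which this card does not specify.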

## Metrics

Validation metrics:

- Loss: 0.1805
- Accuracy: 0.9460
- F1: 0.9443
- Precision: 0.9747
- Recall: 0.9157

Test metrics:

- Loss: 0.1409
- Accuracy: 0.9723
- F1: 0.9725
- Precision: 0.9652
- Recall: 0.9799

Confusion matrix counts are available in `run_summary.json`, alongside `validation_confusion_matrix.csv` and `test_confusion_matrix.csv`.
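The reported F1 scores can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short check below uses only the rounded values listed above, so agreement is expected only to about three decimal places.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in this model card.
reported = {
    "validation": (0.9747, 0.9157, 0.9443),
    "test": (0.9652, 0.9799, 0.9725),
}

for split, (precision, recall, f1) in reported.items():
    recomputed = f1_from(precision, recall)
    # Inputs are rounded to four decimals, so allow a small tolerance.
    assert abs(recomputed - f1) < 5e-4, split
```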

## Input Format

The model expects:

- spectrogram image input
- RGB format
- spatial size: `224x224`
- normalized tensor input matching the project inference pipeline
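A minimal sketch of shaping an RGB spectrogram image into a `224x224` batch tensor follows. The normalization constants are the common ImageNet statistics and are an assumption. The project inference pipeline may resize and normalize differently, and its settings should take precedence. The function name `to_model_input` is hypothetical.

```python
import torch
import torch.nn.functional as F

# ASSUMPTION: ImageNet channel statistics. Replace these with the values
# used by the project inference pipeline.
ASSUMED_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
ASSUMED_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def to_model_input(image):
    """Convert a (3, H, W) float image in [0, 1] to a (1, 3, 224, 224) batch."""
    if image.ndim != 3 or image.shape[0] != 3:
        raise ValueError("expected an RGB tensor shaped (3, H, W)")
    batch = image.unsqueeze(0).float()
    # Resize to the spatial size the model expects.
    batch = F.interpolate(batch, size=(224, 224), mode="bilinear", align_corners=False)
    return (batch - ASSUMED_MEAN) / ASSUMED_STD


# Random data standing in for a spectrogram image.
example = to_model_input(torch.rand(3, 300, 400))
print(example.shape)
```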

The checkpoint is designed to be loaded through the OpenWhistle/DolphinWhistleExtractor PyTorch codebase.

## Loading

```python
import torch

# Load on CPU so no GPU is required; move the model to a device afterwards.
checkpoint_path = "model_vgg_final_best.pt"
checkpoint = torch.load(checkpoint_path, map_location="cpu")
```

Exact model reconstruction should use the VGG16 model definition from the OpenWhistle/DolphinWhistleExtractor codebase.
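This card does not state whether the checkpoint stores a bare `state_dict`, a dictionary with extra training metadata, or a full pickled module. The hypothetical helper below summarizes whatever `torch.load` returned, so the object can be inspected before it is passed to the project model definition. It makes no assumption about specific key names.

```python
import torch


def summarize_checkpoint(obj, max_items=5):
    """Describe a loaded checkpoint object without assuming its layout."""
    if isinstance(obj, torch.nn.Module):
        return {"kind": "module", "class": type(obj).__name__}
    if isinstance(obj, dict):
        tensors = {k: tuple(v.shape) for k, v in obj.items() if torch.is_tensor(v)}
        if tensors and len(tensors) == len(obj):
            # Every value is a tensor, which is what a plain state_dict looks like.
            return {
                "kind": "state_dict",
                "num_tensors": len(tensors),
                "first": list(tensors.items())[:max_items],
            }
        return {"kind": "dict", "keys": list(obj.keys())[:max_items]}
    return {"kind": type(obj).__name__}


# Stand-in object; in practice pass the result of torch.load(...).
print(summarize_checkpoint({"layer.weight": torch.zeros(2, 3)}))
```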

## Implementation Notes

The VGG16 spectrogram-classification workflow was originally prototyped in a Keras/TensorFlow training script using ImageNet-pretrained VGG16 features. The released checkpoint is the PyTorch version of this workflow.

Evaluation metrics and reporting use standard Python scientific tooling, including scikit-learn for ROC/AUC, F1, precision, and recall.

## Files

This repository contains:

- `model_vgg_final_best.pt`
- `run_summary.json`
- `validation_confusion_matrix.csv`
- `test_confusion_matrix.csv`
- `validation_session_metrics.csv`
- `test_session_metrics.csv`
- training and ROC plots in `figures/`

## Limitations

- The model is specialized for bottlenose dolphin whistle detection on spectrogram windows.
- Performance may change on other species, hydrophones, recording conditions, or spectrogram generation settings.
- The model predicts whistle presence versus noise and does not classify whistle identity or whistle category.
- Downstream ecological or behavioral interpretations should be validated independently.

## License

The license for this model has not yet been specified. Please contact the model authors or maintainers before using it for redistribution or commercial purposes.