---
library_name: pytorch
tags:
- bioacoustics
- audio-classification
- spectrogram
- cnn
- vgg16
- dolphin
- bottlenose-dolphin
- whistle
- whistle-detection
- openwhistle
pipeline_tag: audio-classification
---

# OpenWhistle CNN VGG16

`OpenWhistleNeurIPS26/OpenWhistle-CNN-VGG16` is a supervised VGG16-based PyTorch classifier for bottlenose dolphin whistle detection. The model is part of the OpenWhistle family and predicts whether a spectrogram window contains a whistle or noise.

## Model Details

- Model type: VGG16-based CNN classifier
- Framework: PyTorch
- Task: binary classification
- Labels: `whistle` vs `noise`
- Input: `224x224` RGB spectrogram
- Checkpoint: `model_vgg_final_best.pt`
- Best epoch: 4
- Best validation loss: 0.1805

The model operates on spectrogram image windows rather than raw waveform audio.

## Training and Evaluation Data

The model was trained and evaluated using a session-disjoint train/validation/test protocol.

Split summary:

- Train: 53,828 windows across 195 sessions
- Validation: 5,980 windows across 26 sessions
- Test: 16,708 windows across 261 sessions

Each split is balanced between whistle and noise windows.

Test set composition:

- 8,354 whistle windows
- 8,354 matched noise windows

The model is intended for use with the OpenWhistle CNN/detection workflow and related bottlenose dolphin whistle detection datasets.

## Intended Use

This model is intended as a supervised whistle detector for bottlenose dolphin acoustic recordings.

Potential uses include:

- detecting whistle-like spectrogram windows
- filtering long recordings before manual review
- generating candidate whistle detections for downstream analysis
- benchmarking whistle detection workflows on OpenWhistle-style spectrogram windows

This is a binary detector, not a whistle category classifier. It predicts whistle presence versus noise.

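As a rough illustration of the filtering and candidate-generation uses listed above, per-window whistle probabilities can be thresholded and consecutive positive windows merged into candidate segments. This is a minimal sketch and not part of the OpenWhistle codebase: the function name `merge_whistle_windows`, the 0.5 threshold, and the window and hop durations are hypothetical, and the probabilities are made-up values standing in for model outputs.

```python
from typing import List, Optional, Tuple


def merge_whistle_windows(
    probs: List[float],
    window_s: float,
    hop_s: float,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Merge consecutive above-threshold windows into (start, end) segments in seconds.

    Window i is assumed to cover [i * hop_s, i * hop_s + window_s].
    All parameters are illustrative assumptions, not OpenWhistle settings.
    """
    segments: List[Tuple[float, float]] = []
    run_start: Optional[int] = None  # index of the first window in the current positive run
    for i, p in enumerate(probs):
        if p >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The run ended at window i - 1, so close the segment there.
            segments.append((run_start * hop_s, (i - 1) * hop_s + window_s))
            run_start = None
    if run_start is not None:
        # The recording ended while a positive run was still open.
        segments.append((run_start * hop_s, (len(probs) - 1) * hop_s + window_s))
    return segments


# Made-up whistle probabilities for six consecutive, non-overlapping windows.
example_probs = [0.1, 0.8, 0.9, 0.2, 0.7, 0.3]
candidates = merge_whistle_windows(example_probs, window_s=1.0, hop_s=1.0)
print(candidates)
```

The resulting segments are only candidates for manual review. The threshold and any minimum-duration filtering would need to be tuned against the actual recordings and spectrogram settings.
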
## Metrics

Validation metrics:

- Loss: 0.1805
- Accuracy: 0.9460
- F1: 0.9443
- Precision: 0.9747
- Recall: 0.9157

Test metrics:

- Loss: 0.1409
- Accuracy: 0.9723
- F1: 0.9725
- Precision: 0.9652
- Recall: 0.9799

Confusion matrix counts are available in `run_summary.json`.

## Input Format

The model expects:

- spectrogram image input
- RGB format
- spatial size: `224x224`
- normalized tensor input matching the project inference pipeline

The checkpoint is designed to be loaded through the OpenWhistle/DolphinWhistleExtractor PyTorch codebase.

## Loading

```python
import torch

checkpoint_path = "model_vgg_final_best.pt"
checkpoint = torch.load(checkpoint_path, map_location="cpu")
```

Exact model reconstruction should use the VGG16 model definition from the OpenWhistle/DolphinWhistleExtractor codebase.

## Implementation Notes

The VGG16 spectrogram-classification workflow was originally prototyped in a Keras/TensorFlow training script using ImageNet-pretrained VGG16 features. The released checkpoint is the PyTorch version of this workflow.

Evaluation metrics and reporting use standard Python scientific tooling, including scikit-learn for ROC/AUC, F1, precision, and recall.

## Files

This repository contains:

- `model_vgg_final_best.pt`
- `run_summary.json`
- `validation_confusion_matrix.csv`
- `test_confusion_matrix.csv`
- `validation_session_metrics.csv`
- `test_session_metrics.csv`
- training and ROC plots in `figures/`

## Limitations

- The model is specialized for bottlenose dolphin whistle detection on spectrogram windows.
- Performance may change on other species, hydrophones, recording conditions, or spectrogram generation settings.
- The model predicts whistle presence versus noise and does not classify whistle identity or whistle category.
- Downstream ecological or behavioral interpretations should be validated independently.

## License

The license for this model has not yet been specified.
Please contact the model authors or maintainers before using it for redistribution or commercial purposes.
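
## Metric Consistency Check

The reported F1 scores can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The snippet below is a small sketch of that check using only the figures from the Metrics section; the helper name `f1_from_precision_recall` and the tolerance are illustrative choices, and the code does not read `run_summary.json`.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the Metrics section of this model card.
reported = {
    "validation": {"precision": 0.9747, "recall": 0.9157, "f1": 0.9443},
    "test": {"precision": 0.9652, "recall": 0.9799, "f1": 0.9725},
}

results = {}
for split, m in reported.items():
    recomputed = f1_from_precision_recall(m["precision"], m["recall"])
    # The inputs are rounded to four decimals, so allow a small tolerance.
    results[split] = abs(recomputed - m["f1"]) < 5e-4
    print(f"{split}: recomputed F1 = {recomputed:.4f}, consistent = {results[split]}")
```

Exact counts should still be taken from the confusion matrix files, because recomputing from rounded values only confirms agreement to roughly four decimal places.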