Automated Cetacean Vocalization Detection using Passive Acoustic Monitoring

Passive acoustic monitoring (PAM) is one of the most effective tools available for studying marine mammals in their natural habitat. Unlike visual surveys, PAM systems can operate continuously, at depth, across large ocean areas, and in weather conditions that make ship-based observation impossible. The challenge is not collecting acoustic data but extracting reliable detections from recordings dominated by ocean noise, shipping traffic, biological interference, and equipment artifacts.

My research has focused on solving exactly this detection problem, developing automated pipelines that identify cetacean vocalizations in real hydrophone recordings without requiring a human analyst to review every spectrogram. The methods I have published span Bryde's whales, Antarctic blue whales, and mysticete species more broadly, and they share a common methodological thread: combining probabilistic temporal modeling with information-theoretic and spectral feature extraction to achieve reliable detection under real offshore noise conditions.

Why Not Just Use Deep Learning?

Deep learning approaches, particularly convolutional neural networks trained on spectrogram images, have shown impressive results on some cetacean detection benchmarks. But they come with real practical limitations in marine bioacoustics. They require large quantities of labeled training data, which is expensive and time-consuming to produce from real hydrophone recordings. They are opaque: it is difficult to understand why a detection was or was not triggered, which matters when the output informs conservation decisions. And they can be brittle when the acoustic environment shifts, for example when a new noise source enters the area or when the recording equipment changes.

The methods I developed work differently. They are trained on relatively small amounts of labeled data, they are interpretable in terms of the signal features that drive each detection, and they generalize reasonably well across recording conditions because they capture fundamental acoustic properties of cetacean calls rather than memorizing specific spectrogram patterns.

Multiscale Sample Entropy

Sample entropy measures the unpredictability of a time series. A purely random signal has high entropy. A structured signal, like a cetacean vocalization with its characteristic frequency modulation and amplitude envelope, has lower entropy at certain temporal scales.

Multiscale sample entropy extends this by computing entropy at multiple coarse-graining scales. The signal is averaged over non-overlapping blocks of increasing length, and sample entropy is computed on each coarse-grained series. The resulting entropy profile across scales, the multiscale entropy signature, turns out to be a highly discriminative feature for distinguishing cetacean calls from ocean noise. Noise has a characteristic entropy profile that increases monotonically with scale. Whale calls interrupt this pattern in ways that reflect their internal acoustic structure. This approach formed the basis of our 2025 paper in Entropy on blue whale vocalization detection.
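
As a rough illustration of the coarse-graining and entropy steps described above, the following NumPy sketch computes a multiscale sample entropy profile. The function names, the template length m = 2, and the tolerance of 0.2 times the standard deviation of the original series are common defaults assumed here for illustration. They are not necessarily the settings used in the published work.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy of a 1-D series for template length m and absolute tolerance r."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def count_matches(length):
        # Use the same number of templates (n - m) for both template lengths
        templates = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Count ordered pairs within tolerance, excluding self-matches
        return np.sum(dist <= r) - len(templates)

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return np.inf  # entropy is undefined when no template matches exist
    return -np.log(a / b)

def coarse_grain(x, scale):
    """Average the series over non-overlapping blocks of the given length."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // scale
    return x[:n_blocks * scale].reshape(n_blocks, scale).mean(axis=1)

def multiscale_entropy(x, max_scale=5, m=2, r_factor=0.2):
    """Sample entropy at scales 1..max_scale, tolerance fixed from the original series."""
    r = r_factor * np.std(x)
    return np.array([sample_entropy(coarse_grain(x, s), m, r)
                     for s in range(1, max_scale + 1)])

# Toy comparison: white noise versus a pure tone (both hypothetical test signals)
rng = np.random.default_rng(0)
noise = rng.standard_normal(600)
tone = np.sin(2 * np.pi * np.arange(600) / 50)
print("noise profile:", multiscale_entropy(noise))
print("tone profile: ", multiscale_entropy(tone))
```

The pairwise distance matrix makes this version quadratic in memory, so it is only suitable for short frames. A production pipeline would likely use a more economical implementation.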

Wavelet Feature Extraction

Wavelet decomposition provides a multiresolution view of a signal's frequency content. Unlike the Fourier transform, which analyzes the entire signal at once, wavelets localize energy in both time and frequency simultaneously. This makes them particularly well suited for cetacean vocalizations, which are short, frequency-modulated, and often swept across a specific band.

Using Daubechies-4 wavelets, we decompose each signal frame into sub-bands and compute the normalized energy at each decomposition level. The resulting energy distribution is highly characteristic of whale call types. Bryde's whale short pulse calls concentrate energy in a narrow frequency band that is clearly distinguishable from the broadband energy profile of background noise. This approach was central to our 2024 paper in Ecological Informatics on Antarctic blue whale classification.
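
The sub-band energy feature can be sketched with the PyWavelets library as follows. This is a minimal example under stated assumptions: `db4` is PyWavelets' name for the Daubechies wavelet with four vanishing moments, which is assumed to be what "Daubechies-4" refers to, and the decomposition depth, sampling rate, and test signals are hypothetical choices made for illustration.

```python
import numpy as np
import pywt

def wavelet_band_energies(frame, wavelet="db4", level=4):
    """Normalized energy per sub-band, ordered [cA_level, cD_level, ..., cD_1]."""
    coeffs = pywt.wavedec(np.asarray(frame, dtype=float), wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    total = energies.sum()
    # Guard against an all-zero frame
    return energies / total if total > 0 else energies

# Hypothetical frame: 1024 samples at an assumed 1 kHz sampling rate
fs = 1000
t = np.arange(1024) / fs
rng = np.random.default_rng(0)

e_tone = wavelet_band_energies(np.sin(2 * np.pi * 90 * t))   # narrowband signal
e_noise = wavelet_band_energies(rng.standard_normal(1024))   # broadband noise

print("tone: ", np.round(e_tone, 3))
print("noise:", np.round(e_noise, 3))
```

With these settings the detail bands cover roughly 250 to 500 Hz (cD1), 125 to 250 Hz (cD2), 62.5 to 125 Hz (cD3), and 31.25 to 62.5 Hz (cD4), so a narrowband signal piles its energy into one band while white noise spreads across all of them in proportion to bandwidth.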

Hidden Markov Model Detection

Cetacean calls have temporal structure. They are not isolated events but sequences with characteristic duration, inter-call intervals, and acoustic evolution over time. A detector that processes each frame independently misses this temporal context and produces both missed detections and false alarms that a temporally aware model would avoid.

Gaussian Hidden Markov Models capture this structure by modeling the sequence of feature observations as a stochastic process with hidden states that correspond to background noise and whale call. The Viterbi algorithm finds the most probable state sequence given the observed features, naturally smoothing out brief spurious detections and capturing the characteristic duration of cetacean vocalizations. Combining HMM classification with feature extraction based on dynamic mode decomposition (DMD), we achieved approximately 95% detection accuracy on real offshore hydrophone datasets, as reported in Ecological Informatics in 2021.
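
To make the smoothing effect concrete, here is a small hand-written Viterbi decoder in NumPy for a two-state HMM with scalar Gaussian emissions. It is a sketch, not the published implementation: the means, variances, and transition probabilities are hypothetical values chosen for the example, whereas a real pipeline would estimate them from data and use multidimensional features.

```python
import numpy as np

def gaussian_log_likelihood(obs, means, variances):
    """Log density of each scalar observation under each state; shape (T, K)."""
    obs = np.asarray(obs, dtype=float)[:, None]
    return -0.5 * (np.log(2 * np.pi * variances) + (obs - means) ** 2 / variances)

def viterbi(log_lik, log_start, log_trans):
    """Most probable state path given per-frame log-likelihoods."""
    T, K = log_lik.shape
    delta = np.empty((T, K))             # best log-score of any path ending in each state
    backptr = np.zeros((T, K), dtype=int)
    delta[0] = log_start + log_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # scores[i, j]: move from i to j
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + log_lik[t]
    # Trace the best path backwards from the final frame
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path

# Hypothetical model: state 0 = background noise, state 1 = whale call
means = np.array([0.0, 3.0])
variances = np.array([1.0, 1.0])
log_start = np.log([0.9, 0.1])
log_trans = np.log([[0.99, 0.01],    # noise tends to persist
                    [0.05, 0.95]])   # calls last several frames

# Toy feature track: one isolated spike and one sustained six-frame call
obs = np.zeros(30)
obs[10] = 2.0
obs[20:26] = 3.0

log_lik = gaussian_log_likelihood(obs, means, variances)
framewise = np.argmax(log_lik, axis=1)   # independent per-frame decision
path = viterbi(log_lik, log_start, log_trans)

print("frame-wise:", framewise)
print("Viterbi:   ", path)
```

The frame-wise rule flags the isolated spike at frame 10, while the Viterbi path keeps it in the noise state because a one-frame excursion does not repay the cost of two unlikely transitions. The sustained call is detected by both.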

The PAM Cetacean Detection Project

The open-source project at github.com/babalolaseyip/pam-cetacean-detection implements this complete pipeline end to end. It generates synthetic Bryde's whale short pulse calls embedded in spectrally shaped ocean noise at realistic SNR conditions, extracts multiscale sample entropy and wavelet sub-band energy features, trains a Gaussian HMM on normal ocean noise, and evaluates detection performance using precision, recall, F1, and posterior probability visualizations.
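
The signal generation step can be pictured along the following lines. This is an independent sketch and not code taken from the repository: the function names, the 1/f noise shaping, the 40 Hz half-second pulse, and the 250 Hz sampling rate are all placeholder assumptions, not measured Bryde's whale call parameters.

```python
import numpy as np

def shaped_noise(n, fs, alpha=1.0, rng=None):
    """Unit-variance noise whose power spectrum falls off roughly as 1/f**alpha."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    freqs[0] = freqs[1]                  # avoid division by zero at DC
    spectrum /= freqs ** (alpha / 2)     # amplitude scaling is the square root of power
    noise = np.fft.irfft(spectrum, n=n)
    return noise / np.std(noise)

def short_pulse(fs, duration=0.5, f0=40.0):
    """Hann-windowed tone burst used as a stand-in for a short pulse call."""
    t = np.arange(int(duration * fs)) / fs
    return np.hanning(len(t)) * np.sin(2 * np.pi * f0 * t)

def embed_call(noise, pulse, start, snr_db):
    """Add the pulse at `start`, scaled to the target SNR over the call segment."""
    signal = noise.copy()
    segment = slice(start, start + len(pulse))
    noise_power = np.mean(noise[segment] ** 2)
    pulse_power = np.mean(pulse ** 2)
    gain = np.sqrt(noise_power / pulse_power * 10 ** (snr_db / 10))
    signal[segment] += gain * pulse
    labels = np.zeros(len(noise), dtype=int)   # per-sample ground truth
    labels[segment] = 1
    return signal, labels

fs = 250                                  # assumed sampling rate in Hz
rng = np.random.default_rng(0)
noise = shaped_noise(10 * fs, fs, alpha=1.0, rng=rng)
pulse = short_pulse(fs)
signal, labels = embed_call(noise, pulse, start=1000, snr_db=5.0)
print(signal.shape, int(labels.sum()))
```

Defining the SNR over the call segment only, as done here, is one convention among several. Replacing `shaped_noise` and `embed_call` with a loader for real recordings and annotations is the swap the project design is meant to allow.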

The notebook runs directly in the browser via Binder with no installation required, and the figures are publication quality. The project is designed so that the synthetic data can be replaced with real hydrophone recordings by swapping the signal generation step, making it immediately useful for researchers with access to real PAM data.

Detection performance on the synthetic dataset reaches 99.3% accuracy, with precision 0.91, recall 1.00, and F1 0.95, in line with the roughly 95% accuracy reported in the peer-reviewed literature on real datasets.
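
For readers who want to check how these figures relate to one another, the short sketch below computes the same metrics from binary frame labels. The helper names and the ten-frame toy example are hypothetical. Only the final line uses numbers from the text, confirming that precision 0.91 and recall 1.00 give an F1 of 0.95.

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for binary labels (1 = call, 0 = noise)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 call frames among 10, with one false alarm
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(detection_metrics(y_true, y_pred))

# Consistency check on the reported precision and recall
print(round(f1_from(0.91, 1.00), 2))  # 0.95
```

Accuracy counts correctly rejected noise frames, so it can sit well above F1 whenever noise frames greatly outnumber call frames, as in the toy example.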