Ecol Evol. 2024 Jul 3;14(7):e11636.
doi: 10.1002/ece3.11636. eCollection 2024 Jul.

BASSA: New software tool reveals hidden details in visualisation of low-frequency animal sounds

Benjamin A Jancovich et al. Ecol Evol.

Abstract

The study of animal sounds in biology and ecology relies heavily upon time-frequency (TF) visualisation, most commonly using the short-time Fourier transform (STFT) spectrogram. This method, however, has an inherent bias towards either temporal or spectral detail, which can lead to misinterpretation of complex animal sounds. An ideal TF visualisation should accurately convey the structure of the sound in both frequency and time; the STFT often cannot meet this requirement. We evaluate the accuracy of four TF visualisation methods (superlet transform [SLT], continuous wavelet transform [CWT] and two STFTs) using a synthetic test signal. We then apply these methods to visualise sounds of the Chagos blue whale, Asian elephant, southern cassowary, eastern whipbird, mulloway fish and American crocodile. We show that the SLT visualises the test signal with 18.48%–28.08% less error than the other methods. A comparison between our visualisations of animal sounds and their literature descriptions indicates that the STFT's bias may have caused misinterpretations in describing pygmy blue whale songs and elephant rumbles. We suggest that using the SLT to visualise low-frequency animal sounds may prevent such misinterpretations. Finally, we employ the SLT to develop 'BASSA', an open-source GUI application for the Windows platform that offers a no-code, user-friendly tool for analysing short-duration recordings of low-frequency animal sounds. The SLT visualises low-frequency animal sounds with improved accuracy, in a user-friendly format, minimising the risk of misinterpretation while requiring less technical expertise than the STFT. Using this method could propel advances in acoustics-driven studies of animal communication, vocal production methods, phonation and species identification.
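
The bias described in the abstract follows from simple arithmetic on the STFT settings. The sketch below is not part of BASSA or the authors' MATLAB code; it works through the two configurations given in the Figure 1 caption (sampling frequency 250 Hz, window lengths of 250 and 25 samples). The function name `stft_resolution` and its dictionary keys are illustrative, and the frequency resolution is approximated as fs/n, which ignores the widening that depends on window shape.

```python
def stft_resolution(fs_hz, window_len, overlap_frac, nfft):
    """Approximate time/frequency figures for one STFT configuration.

    All names here are illustrative. The fs/n resolution is a rule of thumb
    that ignores the window-shape factor.
    """
    hop_samples = window_len * (1.0 - overlap_frac)
    return {
        "window_s": window_len / fs_hz,       # time span smeared into one column
        "hop_s": hop_samples / fs_hz,         # spacing between successive columns
        "resolution_hz": fs_hz / window_len,  # approx. smallest separable gap
        "bin_spacing_hz": fs_hz / nfft,       # grid spacing after zero-padding
    }


# Settings from the Figure 1 caption (fs = 250 Hz, nFFT = 2048).
long_window = stft_resolution(250, 250, 0.95, 2048)
short_window = stft_resolution(250, 25, 0.50, 2048)

for label, cfg in (("n = 250", long_window), ("n = 25", short_window)):
    print(f"{label}: window {cfg['window_s']:.2f} s, "
          f"resolution ~{cfg['resolution_hz']:.1f} Hz, "
          f"bin spacing {cfg['bin_spacing_hz']:.3f} Hz")
```

Under this approximation the 250-sample window separates components about 1 Hz apart but spans 1 s, while the 25-sample window spans 0.1 s but separates only components about 10 Hz apart. Both share the same bin spacing of roughly 0.122 Hz, because zero-padding to nFFT = 2048 interpolates the spectrum without adding resolution.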

Keywords: BASSA; Fourier transform; animal communication; animal vocalisation; bioacoustics; phonation; software; spectrogram; vocal production.

Conflict of interest statement

The authors declare no conflicts of interest and no relevant commercial relationships.

Figures

FIGURE 1
An illustration of the visualisation bias that can occur due to the trade‐off between temporal and spectral resolution. A synthetic signal with a sampling frequency of 250 Hz is visualised as (a) a waveform in the time domain; (b) an STFT spectrogram biased towards frequency resolution, n = 250, overlap = 95% and nFFT = 2048; and (c) an STFT spectrogram biased towards time resolution, n = 25, overlap = 50% and nFFT = 2048. Detail is lost in both spectrograms, and neither fully captures the character of the signal.
FIGURE 2
The Chagos pygmy blue whale (Balaenoptera musculus brevicauda) song, (a) in the time domain, (b) in the frequency domain and (c) in the time–frequency domain, visualised using an STFT spectrogram. Fs = 120 Hz, n = 500, overlap = 90%, nFFT = 4096. Recording taken at the Chagos Archipelago, Indian Ocean, isolated by Dr Emmanuelle Leroy from the CTBTO's IMS hydrophone dataset, used with permission.
FIGURE 3
An example of a simple animal sound is the call of an eastern whipbird (Psophodes olivaceus). The call is visualised as (a) a waveform in the time domain, (b) a magnitude spectrum in the frequency domain and (c) a short‐time Fourier transform spectrogram in the time–frequency domain. Neither of the visualisations in (a) or (b) conveys the full character of the signal. Recording is an excerpt taken from a longer recording, ‘Psophodes olivaceus (ML557908221)’, by David Secomb at Cardinia, Victoria, Australia, courtesy of The Macaulay Library at the Cornell Lab of Ornithology, and was used with permission.
FIGURE 4
Programmatic flow diagram of the MATLAB code written to perform quantitative analysis and comparison of TFR methods.
FIGURE 5
Construction of the synthetic test signal represented in the time domain. Panel (a) shows the carrier signal x_c(t), panel (b) shows the amplitude modulation signal x_m(t) and panel (c) shows the final test signal x(t).
FIGURE 6
An example ground truth matrix for the carrier signal x_c(t), and the upper and lower sideband components usb_i(t) and lsb_i(t) for i = [1:3]. Fs = 250 Hz, f_min = 10 Hz, f_max = Fs/2, f_0 = 50, f_1 = 30, f_mod0 = 2, f_mod1 = 7 and f_res = 0.2 Hz.
FIGURE 7
The BASSA pre‐processing screen. This screen allows for trimming, resampling, amplitude normalisation and playback, as well as time‐domain and frequency‐domain visualisation export.
FIGURE 8
The BASSA superlet scalogram screen. This screen allows for configuration of the SLT, figure generation as well as post‐processing and export of the visualisation.
FIGURE 9
(a) Root mean square error (RMSE) between each algorithmic time–frequency representation (TFR) and the ground truth TFR. In addition to the matrix RMSE (total error), results are also given for the mean RMSE of every row (spectral error) and every column (temporal error). Lower error values indicate better agreement with ground truth. (b) Structural similarity index (SSI) between each algorithmic TFR and the ground truth TFR. Higher SSI scores indicate better agreement with ground truth.
FIGURE 10
(a) Ground truth time–frequency representation (TFR) of the test signal. (b) 50‐pt STFT spectrogram (short‐STFT) of the test signal; overlap = 75%, nFFT = 1250, Fs = 250 Hz and window shape = Hann. (c) 250‐pt STFT spectrogram (long‐STFT) of the test signal; overlap = 75%, nFFT = 1250, Fs = 250 Hz and window shape = Hann. (d) CWT scalogram of the test signal; Fs = 250 Hz, time‐bandwidth product = 200 and wavelet type = symmetric, analytic Morse. (e) SLT scalogram of the test signal; Fs = 250 Hz, initial number of cycles in superlet = 3, interval of superresolution orders = [10, 40] and multiplicative superresolution. An ideal TFR bears maximum resemblance to the ground truth TFR.
FIGURE 11
A recording of the Chagos pygmy blue whale (Balaenoptera musculus brevicauda) song, originally recorded at 250 Hz. The recording is visualised as: (a) a time‐domain waveform; (b) a short‐STFT spectrogram Fs = 120 Hz, n = 50, overlap = 90% and nFFT = 1200; (c) a long‐STFT spectrogram Fs = 120 Hz, n = 250, overlap = 90% and nFFT = 1200; (d) a CWT scalogram, time‐bandwidth product = 60 and voices per octave = 48; (e) an SLT scalogram, initial superlet cycles = 4, multiplicative superresolution order interval = 10:40, frequency range = 10–60 Hz and frequency resolution = 0.1 Hz. Recording taken at the Chagos Archipelago, Indian Ocean, isolated by Dr Emmanuelle Leroy from the CTBTO's IMS hydrophone dataset, used with permission.
FIGURE 12
A recording of two consecutive Asian elephant (Elephas maximus) rumbles, produced by two individuals, originally recorded at 48 kHz. The recording is visualised as: (a) a time‐domain waveform; (b) a short‐STFT spectrogram, Fs = 500 Hz, n = 50, overlap = 90% and nFFT = 5000; (c) a long‐STFT spectrogram, Fs = 500 Hz, n = 250, overlap = 90% and nFFT = 5000; (d) a CWT scalogram, time‐bandwidth product = 60 and voices per octave = 48; (e) an SLT scalogram, initial superlet cycles = 4, multiplicative superresolution order interval = 10:40, frequency range = 0–250 Hz and frequency resolution = 0.1 Hz. Recording provided by Marc Anderson of Wild Ambience, used with permission.
FIGURE 13
A recording of an eastern whipbird (Psophodes olivaceus) song, originally recorded at 48 kHz. The recording is visualised as: (a) a time‐domain waveform; (b) a short‐STFT spectrogram, Fs = 20 kHz, n = 50, overlap = 90% and nFFT = 10,000; (c) a long‐STFT spectrogram, Fs = 20 kHz, n = 250, overlap = 90% and nFFT = 10,000; (d) a CWT scalogram, time‐bandwidth product = 60 and voices per octave = 48; (e) an SLT scalogram, initial superlet cycles = 4, multiplicative superresolution order interval = 10:40, frequency range = 1–10 kHz and frequency resolution = 2 Hz. Recording is an excerpt taken from a longer recording, ‘Psophodes olivaceus (ML557908221)’, by David Secomb at Cardinia, Victoria, Australia, courtesy of The Macaulay Library at the Cornell Lab of Ornithology, and was used with permission.
FIGURE 14
A recording of a southern cassowary (Casuarius casuarius) grunt, originally recorded at 48 kHz. The recording is visualised as: (a) a time‐domain waveform; (b) a short‐STFT spectrogram, Fs = 2 kHz, n = 50, overlap = 90% and nFFT = 20,000; (c) a long‐STFT spectrogram, Fs = 2 kHz, n = 250, overlap = 90% and nFFT = 20,000; (d) a CWT scalogram, time‐bandwidth product = 60 and voices per octave = 48; (e) an SLT scalogram, initial superlet cycles = 4, multiplicative superresolution order interval = 10:40, frequency range = 0 Hz to 1 kHz and frequency resolution = 0.1 Hz. Recording provided by Marc Anderson of Wild Ambience, used with permission.
FIGURE 15
A recording of a mulloway (Argyrosomus japonicus) grunt, originally recorded at 5208 Hz. The recording is visualised as: (a) a time‐domain waveform; (b) a short‐STFT spectrogram, Fs = 2 kHz, n = 50, overlap = 90% and nFFT = 20,000; (c) a long‐STFT spectrogram, Fs = 2 kHz, n = 250, overlap = 90% and nFFT = 20,000; (d) a CWT scalogram, time‐bandwidth product = 60 and voices per octave = 48; (e) an SLT scalogram, initial superlet cycles = 4, multiplicative superresolution order interval = 10:40, frequency range = 10 Hz to 1 kHz and frequency resolution = 0.1 Hz. Original recording made at Swan River, Western Australia (Parsons et al., 2013), sourced from fishsounds.com (Looby et al., 2023).
FIGURE 16
A recording of an American crocodile (Crocodylus acutus) mating growl, originally recorded at 16 kHz. The recording is visualised as: (a) a time‐domain waveform; (b) a short‐STFT spectrogram, Fs = 2 kHz, n = 50, overlap = 90% and nFFT = 20,000; (c) a long‐STFT spectrogram, Fs = 2 kHz, n = 250, overlap = 90% and nFFT = 20,000; (d) a CWT scalogram, time‐bandwidth product = 60 and voices per octave = 48; (e) an SLT scalogram, initial superlet cycles = 4, multiplicative superresolution order interval = 10:40, frequency range = 10 Hz to 1 kHz and frequency resolution = 0.1 Hz. Original recording was provided by Benko and Perc (2009) and was used with permission.
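
The three error figures described in the Figure 9 caption (total, row-wise mean and column-wise mean RMSE) can be sketched as follows. This is one reading of the caption, not the authors' MATLAB implementation. The names `rmse` and `tfr_errors` and the toy matrices are hypothetical, and the code simply follows the caption in treating the row-wise mean as the spectral error and the column-wise mean as the temporal error.

```python
import math


def rmse(values_a, values_b):
    """Root mean square error between two equal-length sequences."""
    if len(values_a) != len(values_b) or len(values_a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(values_a, values_b))
    return math.sqrt(squared / len(values_a))


def tfr_errors(estimate, truth):
    """Return (total, mean row-wise, mean column-wise) RMSE for two matrices.

    Matrices are lists of equal-length rows with identical shapes.
    """
    # Total error: RMSE over every cell of the matrix.
    flat_est = [v for row in estimate for v in row]
    flat_tru = [v for row in truth for v in row]
    total = rmse(flat_est, flat_tru)

    # Mean of the per-row RMSE values.
    row_scores = [rmse(r_est, r_tru) for r_est, r_tru in zip(estimate, truth)]
    row_mean = sum(row_scores) / len(row_scores)

    # Mean of the per-column RMSE values (transpose with zip).
    col_scores = [rmse(c_est, c_tru)
                  for c_est, c_tru in zip(zip(*estimate), zip(*truth))]
    col_mean = sum(col_scores) / len(col_scores)
    return total, row_mean, col_mean


# Toy 2 x 3 example: the estimate has one spurious cell in the second row.
truth = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
estimate = [[0.0, 1.0, 0.0],
            [0.0, 1.0, 0.0]]

total_err, row_err, col_err = tfr_errors(estimate, truth)
print(f"total {total_err:.4f}, row mean {row_err:.4f}, column mean {col_err:.4f}")
```

In this toy case the three figures differ even though only one cell is wrong, which shows why the caption reports them separately: the same error is weighted differently depending on whether it is averaged over the whole matrix, along rows or along columns.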

References

    1. Arts, L. P. A., & van den Broek, E. L. (2022). The fast continuous wavelet transformation (fCWT) for real‐time, high‐quality, noise‐resistant time–frequency analysis. Nature Computational Science, 2(1), 47–58. doi: 10.1038/s43588-021-00183-z
    2. Baotic, A., Stoeger, A. S., Li, D., Tang, C., & Charlton, B. D. (2014). The vocal repertoire of infant giant pandas (Ailuropoda melanoleuca). Bioacoustics, 23(1), 15–28. doi: 10.1080/09524622.2013.798744
    3. Beeck, V. C., Heilmann, G., Kerscher, M., & Stoeger, A. S. (2022). Sound visualization demonstrates velopharyngeal coupling and complex spectral variability in Asian elephants. Animals, 12(16), 2119. doi: 10.3390/ani12162119
    4. Benko, T. P., & Perc, M. (2009). Nonlinearities in mating sounds of American crocodiles. Biosystems, 97(3), 154–159. doi: 10.1016/j.biosystems.2009.05.011
    5. Bhatt, K., Jayanthi, N., & Kumar, M. (2023). High‐resolution superlet transform based techniques for Parkinson's disease detection using speech signal. Applied Acoustics, 214, 109657. doi: 10.1016/j.apacoust.2023.109657
