Automatic detection of microchiroptera echolocation calls from field recordings using machine learning algorithms
Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab, Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA
May 19, 2005
Overview
• Motivations for acoustic bat detection
• Machine learning paradigm
• Detection experiments
• Conclusions
Bat detection motivations
• Bats are among the most diverse yet least studied mammals (~25% of all mammal species are bats).
• Bats affect agriculture and carry diseases (directly or through parasites).
• The acoustical domain is significant for echolocating bats, and acoustic monitoring is non-invasive.
• Recorded data can be voluminous, so automated algorithms for objective and repeatable detection and classification are desired.
Conventional methods
• Conventional bat detection/classification parallels the acoustic-phonetic paradigm of automatic speech recognition (ASR) from the 1970s.
• Characteristics of acoustic phonetics:
  • Originally mimicked human expert methods
  • First, boundaries between regions determined
  • Second, features for each region extracted
  • Third, features compared with decision trees, DFA
• Limitations:
  • Boundaries ill-defined, sensitive to noise
  • Many feature extraction algorithms with varying degrees of noise robustness
Machine learning
• Acoustic phonetics gave way to machine learning for ASR in the 1980s.
• Advantages:
  • Decisions based on more information
  • Mature statistical foundation for algorithms
  • Frame-based features, from expert knowledge
  • Improved noise robustness
    • For bats: increased detection range
Detection experiments
• Database of bat calls
  • 7 different recording sites, 8 species
  • 1265 hand-labeled calls (from spectrogram readings)
• Detection experiment design
  • Discrete events: 20-ms bins
  • Discrete outcomes: yes or no, does a bin contain any part of a bat call?
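The bin-based ground truth described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `label_bins`, the representation of hand labels as (start, end) times in milliseconds, and the example values are all assumptions; only the 20-ms bin width comes from the slide.

```python
import math

BIN_MS = 20.0  # bin width stated on the slide


def label_bins(calls_ms, duration_ms, bin_ms=BIN_MS):
    """Return one bool per bin: True if any part of a call overlaps the bin.

    calls_ms is a list of hypothetical hand labels, (start_ms, end_ms).
    """
    n_bins = math.ceil(duration_ms / bin_ms)
    labels = [False] * n_bins
    for start, end in calls_ms:
        first = int(start // bin_ms)
        # A call ending exactly on a bin edge does not spill into the next bin.
        last = int(math.ceil(end / bin_ms)) - 1
        for b in range(max(first, 0), min(last, n_bins - 1) + 1):
            labels[b] = True
    return labels


# Two made-up calls in a 100-ms recording: one inside bin 0,
# one straddling the boundary between bins 1 and 2.
labels = label_bins([(5.0, 12.0), (38.0, 43.0)], duration_ms=100.0)
```

A detector's per-bin decisions can then be compared against these labels one bin at a time.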
Detectors
• Baseline
  • Threshold for frame energy
• Gaussian mixture model (GMM)
  • Model of probability distribution of call features
  • Threshold for model output probability
• Hidden Markov model (HMM)
  • Similar to GMM, but includes temporal constraints through piecewise-stationary states
  • Threshold for model output probability along Viterbi path
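As a rough illustration of the GMM detector, the sketch below scores each feature frame under a diagonal-covariance Gaussian mixture and thresholds the log-likelihood. The diagonal covariance, the two-feature layout, every parameter value, and the threshold are assumptions made for the example; the slides do not give the trained model or its settings.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one frame under the mixture (log-sum-exp for stability)."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def detect(frames, weights, means, variances, threshold):
    """One yes/no decision per frame: does the call model score exceed the threshold?"""
    return [gmm_log_likelihood(f, weights, means, variances) > threshold
            for f in frames]


# Hypothetical two-component model over (power in dB, frequency in kHz).
WEIGHTS = [0.6, 0.4]
MEANS = [[20.0, 40.0], [15.0, 60.0]]
VARIANCES = [[25.0, 100.0], [25.0, 100.0]]

decisions = detect([[18.0, 45.0], [0.0, 5.0]],
                   WEIGHTS, MEANS, VARIANCES, threshold=-12.0)
```

Sweeping the threshold trades sensitivity against specificity. The HMM detector differs in that frames are scored jointly along the best state sequence rather than independently.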
Feature extraction
• Baseline
  • Normalization: session noise floor at 0 dB
  • Feature: frame power
• Machine learning
  • Blackman window, zero-padded FFT
  • Normalization: log amplitude mean subtraction
    • From ASR: ~cepstral mean subtraction
    • Removes transfer function of recording environment
    • Mean across time for each FFT bin
  • Features:
    • Maximum FFT amplitude, dB
    • Frequency at maximum amplitude, Hz
    • First and second temporal derivatives (slope, concavity)
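The machine-learning front end listed above might look roughly like the sketch below: a Blackman-windowed, zero-padded spectrum in dB, per-bin mean subtraction across time, and the peak amplitude with its frequency. The frame length, FFT size, 250-kHz sampling rate, and function names are assumptions for illustration, and a naive DFT stands in for an FFT so that the example needs only the standard library.

```python
import cmath
import math


def blackman(n):
    """Symmetric Blackman window of length n."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * i / (n - 1))
            + 0.08 * math.cos(4 * math.pi * i / (n - 1)) for i in range(n)]


def log_spectrum_db(frame, n_fft, floor=1e-12):
    """Window a frame, zero-pad to n_fft, return dB amplitude for bins 0..n_fft/2."""
    window = blackman(len(frame))
    padded = [s * w for s, w in zip(frame, window)] + [0.0] * (n_fft - len(frame))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        # Naive DFT; an FFT gives the same values far faster.
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(padded))
        spectrum.append(20.0 * math.log10(max(abs(acc), floor)))
    return spectrum


def mean_subtract(spectra):
    """Subtract, for each FFT bin, its mean log amplitude across all frames."""
    n_frames = len(spectra)
    means = [sum(column) / n_frames for column in zip(*spectra)]
    return [[v - m for v, m in zip(row, means)] for row in spectra]


def peak_features(spectrum_db, fs, n_fft):
    """Return (maximum amplitude in dB, frequency in Hz at that maximum)."""
    k = max(range(len(spectrum_db)), key=spectrum_db.__getitem__)
    return spectrum_db[k], k * fs / n_fft


FS = 250_000   # assumed sampling rate, Hz
N_FFT = 64     # assumed zero-padded length for a 32-sample frame
tone = [math.cos(2 * math.pi * 78_125 * n / FS) for n in range(32)]
amp_db, freq_hz = peak_features(log_spectrum_db(tone, N_FFT), FS, N_FFT)
```

In a full pipeline, `mean_subtract` would be applied to the spectra of a whole recording session before `peak_features`, so that any fixed spectral coloration from the recording environment cancels out of each bin.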
Feature extraction examples
• Six features: power (P), frequency (F), ΔP, ΔF, ΔΔP, ΔΔF
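A six-dimensional feature vector of this kind (power, frequency, and the first and second temporal derivatives of each, as listed under feature extraction) can be assembled as in the sketch below. The central-difference estimate with clamped edges is an assumption; the slides do not say how the derivatives were computed, and ASR systems often use a regression over several frames instead.

```python
def delta(track):
    """Central-difference slope per frame, with edge frames clamped."""
    last = len(track) - 1
    return [(track[min(t + 1, last)] - track[max(t - 1, 0)]) / 2.0
            for t in range(len(track))]


def six_features(power_db, freq_hz):
    """Stack P, F, their slopes, and their concavities into one tuple per frame."""
    dp, df = delta(power_db), delta(freq_hz)
    ddp, ddf = delta(dp), delta(df)
    return list(zip(power_db, freq_hz, dp, df, ddp, ddf))


# Made-up tracks for a short downward frequency sweep.
features = six_features([0.0, 1.0, 4.0, 9.0],
                        [60e3, 55e3, 50e3, 45e3])
```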
Conclusions
• Machine learning algorithms improve detection when specificity is high (> 0.6).
• HMM slightly superior to GMM because it uses more temporal information, but slower to train and test.
• Hand labels were determined from spectrograms and are biased towards high-power calls.
• Machine learning models are applicable to other species.
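For reference, sensitivity and specificity over the yes/no bin decisions can be computed as in this sketch. The function name and the example vectors are illustrative assumptions and do not reproduce any result from the experiments.

```python
def sensitivity_specificity(truth, decisions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, d in zip(truth, decisions) if t and d)
    fn = sum(1 for t, d in zip(truth, decisions) if t and not d)
    tn = sum(1 for t, d in zip(truth, decisions) if not t and not d)
    fp = sum(1 for t, d in zip(truth, decisions) if not t and d)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Made-up labels and detector outputs for five bins.
sens, spec = sensitivity_specificity(
    [True, True, False, False, False],
    [True, False, False, True, False],
)
```

Repeating this for a range of detector thresholds traces out the operating curve on which detectors can be compared at a given specificity.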
Bioacoustic applications
• To apply machine learning to other species:
  • Determine ground truth training data through expert hand labels
  • Extract relevant frame-based features, considering domain-specific noise sources (echoes, propeller noise, other biological sources)
  • Train models of features from hand-labeled data
  • Consider training “silence” models for discriminant detection/classification
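The last two steps can be sketched with the simplest possible models: one Gaussian fitted to a feature from hand-labeled call frames, another fitted to "silence" frames, and a log-likelihood ratio between them as the detection score. The single-Gaussian models, the one-dimensional power feature, and the training values are all assumptions for illustration; a GMM or HMM would replace them in practice.

```python
import math
import statistics


def fit_gaussian(values):
    """Fit mean and (population) variance to labeled training values."""
    return statistics.mean(values), statistics.pvariance(values)


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian (var must be positive)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood_ratio(x, call_model, silence_model):
    """Positive when the call model explains x better than the silence model."""
    return log_gauss(x, *call_model) - log_gauss(x, *silence_model)


# Made-up frame powers (dB) taken from hand-labeled regions.
call_model = fit_gaussian([10.0, 11.0, 12.0, 13.0])
silence_model = fit_gaussian([0.0, 1.0, 2.0, 3.0])

score_call = log_likelihood_ratio(11.0, call_model, silence_model)
score_quiet = log_likelihood_ratio(1.0, call_model, silence_model)
```

Thresholding the ratio at zero picks whichever model fits better; moving the threshold shifts the balance between missed calls and false alarms.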
Further information
• http://www.cnel.ufl.edu/~markskow
• markskow@cnel.ufl.edu
Acknowledgements
• Bat data kindly provided by Brock Fenton, U. of Western Ontario, Canada