Statistical automatic identification of microchiroptera from echolocation calls: Lessons learned from human automatic speech recognition
Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab, Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA
November 19, 2004
Overview
• Motivations for bat acoustic research
• Review bat call classification methods
• Contrast with 1970s human ASR
• Experiments
• Conclusions
Bat research motivations
• Bats are among:
  • the most diverse,
  • the most endangered,
  • and the least studied mammals.
• Close relationship with insects
  • agricultural impact
  • disease vectors
• Acoustical research is non-invasive and covers a significant domain (echolocation)
• Simplified biological acoustic communication system (compared to human speech)
Echolocation calls
• Features (holistic)
  • Frequency extrema
  • Duration
  • Shape
  • Number of harmonics
  • Call interval
[Figure: Mexican free-tailed calls, concatenated]
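The holistic features listed on this slide can be illustrated with a minimal sketch that summarizes one call from its frequency contour. Everything here is an assumption made for illustration: the `CallFeatures` container, the `holistic_features` helper, the use of mean slope as a crude stand-in for "shape", and the made-up contour values. It is not the authors' feature extractor.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CallFeatures:
    f_min_khz: float
    f_max_khz: float
    duration_ms: float
    mean_slope_khz_per_ms: float  # crude, assumed stand-in for "shape"


def holistic_features(freq_khz: Sequence[float], frame_step_ms: float) -> CallFeatures:
    """Summarize one call from its frequency contour (one value per frame)."""
    if len(freq_khz) < 2:
        raise ValueError("need at least two frames to measure duration and slope")
    # Duration spans the gaps between the first and last frame.
    duration_ms = (len(freq_khz) - 1) * frame_step_ms
    # Overall sweep rate from the first frame to the last frame.
    slope = (freq_khz[-1] - freq_khz[0]) / duration_ms
    return CallFeatures(min(freq_khz), max(freq_khz), duration_ms, slope)


# Made-up downward sweep, for illustration only (not measured data).
track_khz = [50.0, 42.0, 36.0, 31.0, 28.0, 26.0]
feats = holistic_features(track_khz, frame_step_ms=1.0)
print(feats)
```

Number of harmonics and call interval need the full spectrogram and the call onset times, so they are left out of this sketch.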
Current classification methods
• Expert spectrogram readers
  • Manual or automatic feature extraction
  • Comparison with exemplar spectrograms
• Automatic classification
  • Decision trees
  • Discriminant function analysis
Parallels the knowledge-based approach to human ASR from the 1970s (acoustic phonetics, expert systems, cognitive approach).
Acoustic phonetics
[Figure: phoneme labels DH AH F UH T B AO L G EY EM IH Z OW V ER]
• Bottom-up paradigm
  • Frames, boundaries, groups, phonemes, words
• Manual or automatic feature extraction
  • Determined by experts to be important for speech
• Classification
  • Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path
Acoustic phonetics limitations
• Variability of conversational speech
• Complex rules, difficult to implement
• Feature estimates brittle
• Variable noise robustness
• Hard decisions, errors accumulate
Human ASR shifted to the information-theoretic (machine learning) paradigm, which is better able to account for the variability of speech and noise.
Information-theoretic ASR
• Data-driven models from computer science
  • Non-parametric: dynamic time warp (DTW)
  • Parametric: hidden Markov model (HMM)
• Frame-based
• Expert information in feature extraction
• Models account for feature, temporal variability
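To make the non-parametric option concrete, here is a minimal sketch of textbook dynamic time warping on one-dimensional frame sequences, followed by nearest-template classification. The function names, the absolute-difference local cost, the unconstrained warping path, and the toy templates are all assumptions for illustration; the slides do not specify which DTW variant was used.

```python
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cumulative cost of the best monotonic alignment of a with b."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: cheapest alignment of the first i frames of a
    # with the first j frames of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b frame is repeated
                cost[i][j - 1],      # b advances, a frame is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify_by_dtw(query: Sequence[float],
                    templates: Dict[str, List[Sequence[float]]]) -> str:
    """Return the label of the stored template closest to the query."""
    scored = [(dtw_distance(query, t), label)
              for label, examples in templates.items()
              for t in examples]
    return min(scored)[1]


# Made-up frequency contours in kHz, for illustration only.
templates = {
    "downsweep": [[50.0, 40.0, 30.0, 25.0]],
    "flat": [[25.0, 25.0, 25.0, 25.0]],
}
query = [50.0, 45.0, 40.0, 30.0, 26.0, 25.0]
print(classify_by_dtw(query, templates))
```

Because every query is compared against every stored template, there is nothing to train, but classification cost grows with the number of templates.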
Data collection
• UF Bat House, home to 60,000 bats
  • Mexican free-tailed bat (vast majority)
  • Evening bat
  • Southeastern myotis
• Continuous recording
  • 90 minutes around sunset
  • ~20,000 calls
• Equipment:
  • B&K mic (4939), 100 kHz
  • B&K preamp (2670)
  • Custom amp/AA filter
  • NI 6036E 200 kS/s A/D card
  • Laptop, Matlab
Experiment design
• Hand labels
  • 436 calls (2% of data)
  • Four classes, a priori probabilities: 34%, 40%, 20%, 6%
• All experiments on hand-labeled data only
• No hand-labeled calls excluded from experiments
[Figure: four panels labeled 1 to 4]
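As a small worked example of what these priors imply, the sketch below converts the rounded percentages into approximate per-class call counts and the accuracy of always guessing the most common class. The variable names are assumptions, and the counts are only approximate because the slide gives rounded percentages (they sum to 435, not 436).

```python
# Figures taken from the slide: 436 hand-labeled calls, four class priors.
n_calls = 436
priors = {1: 0.34, 2: 0.40, 3: 0.20, 4: 0.06}

# Approximate call counts per class, reconstructed from rounded percentages.
approx_counts = {label: round(n_calls * p) for label, p in priors.items()}

# Accuracy of a trivial classifier that always predicts the largest class.
majority_baseline = max(priors.values())

print(approx_counts)
print(majority_baseline)
```

The majority-class figure gives a floor against which classification accuracies on this label set can be read.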
Experiments
• Baseline
  • Features
    • Zero crossing
    • MUSIC super resolution frequency estimator
  • Classifier
    • Discriminant function analysis, quadratic boundaries
• DTW and HMM
  • Features
    • Frequency (MUSIC), log energy, first derivatives (HMM only)
  • HMM
    • 5 states/model
    • 4 Gaussian mixtures/state
    • diagonal covariances
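The zero-crossing baseline feature can be sketched as follows: locate upward zero crossings, refine each with linear interpolation, and invert the mean period. This is a generic zero-crossing estimator written under assumptions (the function name, the interpolation step, and the synthetic 40 kHz test tone are illustrative); only the 200 kS/s sample rate comes from the data collection slide.

```python
import math
from typing import Sequence


def zero_crossing_frequency(samples: Sequence[float], sample_rate_hz: float) -> float:
    """Estimate the dominant frequency from upward zero crossings."""
    crossing_times = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linear interpolation between the two samples around the crossing.
            frac = -prev / (cur - prev)
            crossing_times.append((i - 1 + frac) / sample_rate_hz)
    if len(crossing_times) < 2:
        raise ValueError("need at least two upward crossings")
    # Mean period between successive upward crossings.
    mean_period_s = (crossing_times[-1] - crossing_times[0]) / (len(crossing_times) - 1)
    return 1.0 / mean_period_s


# Synthetic 1 ms tone, for illustration only (not recorded data).
sample_rate_hz = 200_000.0
tone_hz = 40_000.0
samples = [math.sin(2.0 * math.pi * tone_hz * n / sample_rate_hz + 0.3)
           for n in range(200)]
estimate_hz = zero_crossing_frequency(samples, sample_rate_hz)
print(estimate_hz)
```

On a clean tone the estimate lands close to the true frequency; on real recordings, noise adds spurious crossings.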
Results
• Baseline, zero crossing
  • Leave one out: 72.5% correct
  • Repeated trials: 72.5 ± 4% (mean ± std)
• Baseline, MUSIC
  • Leave one out: 79.1%
  • Repeated trials: 77.5 ± 4%
• DTW, MUSIC
  • Leave one out: 74.5%
  • Repeated trials: 74.1 ± 4%
• HMM, MUSIC
  • Test on train: 85.3%
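The leave-one-out figures above come from holding out each labeled call in turn, training on the rest, and scoring the held-out call. A minimal sketch of that protocol follows. The nearest-centroid classifier is a deliberately simple stand-in (not the discriminant function analysis, DTW, or HMM used in the experiments), and the one-dimensional toy data are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple


def leave_one_out_accuracy(
    features: List[float],
    labels: List[str],
    fit_predict: Callable[[List[float], List[str], float], str],
) -> float:
    """Hold out each example once; train on the rest; score the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


def nearest_centroid(train_x: List[float], train_y: List[str], query: float) -> str:
    """Stand-in classifier: pick the class whose mean is closest to the query."""
    sums: Dict[str, Tuple[float, int]] = {}
    for x, y in zip(train_x, train_y):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return min(sums, key=lambda y: abs(query - sums[y][0] / sums[y][1]))


# Made-up one-dimensional features, for illustration only.
toy_x = [24.0, 25.0, 26.0, 34.0, 38.0, 40.0, 42.0]
toy_y = ["A", "A", "A", "A", "B", "B", "B"]
loo_acc = leave_one_out_accuracy(toy_x, toy_y, nearest_centroid)
print(loo_acc)
```

A test-on-train score, by contrast, evaluates on the same calls used for training, so it is not directly comparable to a leave-one-out score.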
Confusion matrices
[Figure: four confusion matrices, one per method: Baseline, zero crossing; Baseline, MUSIC; DTW, MUSIC; HMM, MUSIC]
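A confusion matrix of the kind shown on this slide can be tallied from true and predicted labels as in the sketch below. The helper names and the toy labels are assumptions for illustration; the actual matrix entries from the slide's figure are not recoverable here and are not reproduced.

```python
from typing import Hashable, List, Sequence


def confusion_matrix(true_labels: Sequence[Hashable],
                     predicted_labels: Sequence[Hashable],
                     classes: Sequence[Hashable]) -> List[List[int]]:
    """Rows index the true class, columns index the predicted class."""
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of examples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total


# Made-up labels for four classes, for illustration only.
toy_true = [1, 1, 2, 2, 2, 3, 4]
toy_pred = [1, 2, 2, 2, 3, 3, 4]
toy_matrix = confusion_matrix(toy_true, toy_pred, classes=[1, 2, 3, 4])
for row in toy_matrix:
    print(row)
print(accuracy(toy_matrix))
```

Off-diagonal entries show which class pairs are confused, which a single accuracy figure hides.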
Conclusions
• Human ASR algorithms applicable to bat echolocation calls
• Experiments
  • Weakness: accuracy of class labels
  • HMM most accurate (test-on-train result), undertrained
  • MUSIC frequency estimate robust, slow
• Machine learning
  • DTW: fast training, slow classification
  • HMM: slow training, fast classification
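One way to get a feel for the "undertrained" remark is to count the emission parameters implied by the HMM configuration from the Experiments slide (5 states per model, 4 Gaussian mixtures per state, diagonal covariances). The sketch below assumes a 4-dimensional feature vector (frequency, log energy, and their first derivatives), which is one reading of the slide rather than a stated fact, and it ignores transition probabilities because the topology is not given.

```python
def gmm_hmm_emission_params(n_states: int, n_mixtures: int, feature_dim: int) -> int:
    """Free emission parameters of one HMM with diagonal-covariance GMM states."""
    # Each mixture component has a mean vector and a diagonal variance vector.
    per_component = 2 * feature_dim
    # Mixture weights sum to one, so one weight per state is not free.
    per_state = n_mixtures * per_component + (n_mixtures - 1)
    return n_states * per_state


# 5 states and 4 mixtures come from the slides; feature_dim=4 is an assumption.
params_per_model = gmm_hmm_emission_params(n_states=5, n_mixtures=4, feature_dim=4)
params_all_models = 4 * params_per_model  # one model per class, four classes
print(params_per_model, params_all_models)
```

Under these assumptions each class model has 175 emission parameters, while the smallest class makes up only about 6% of the 436 hand-labeled calls. Training uses many frames per call, so this is only a rough indication of how little data backs each model.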
Further information
• http://www.cnel.ufl.edu/~markskow
• markskow@cnel.ufl.edu
• DTW reference:
  • L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
• HMM reference:
  • L. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Readings in Speech Recognition, A. Waibel and K.-F. Lee, Eds., pp. 267–296. Kaufmann, San Mateo, CA, 1990.