Statistical automatic identification of microchiroptera from echolocation calls Lessons learned from human automatic speech recognition. Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and Computer Engineering University of Florida Gainesville, FL, USA December 1, 2004.
Overview • Motivations for bat acoustic research • Review bat call classification methods • Contrast with 1970s human ASR • Machine learning vs. expert knowledge • Experiments • Conclusions and future work
Bat research motivations • Bats are among: • the most diverse (25% of all mammal species), • the most endangered, • and the least studied mammals. • Close relationship with insects • agricultural impact • disease vectors • Acoustical research • non-invasive (compared to netting) • significant domain (echolocation)
More motivations • Calls simple compared to human speech • Same goals as human ASR • Detection • Feature extraction • Classification • Noise-robust performance • Easier to design/develop models • Domain between toy problems and ASR
Bat echolocation • Ultrasonic, brief chirps (~active sonar) • Determine range, velocity of nearby objects (clutter, prey, other bats) • Tailored for task, environment [Figure: Tadarida brasiliensis (Mexican free-tailed bat). Audio: 10x time-expanded search calls]
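Time expansion by a factor of 10 slows playback to one tenth of the recording rate, so every frequency is divided by 10 and every duration is multiplied by 10. The sketch below only illustrates that arithmetic; the 25 kHz, 5 ms call is an assumed example, not a value from the slides.

```python
def time_expand(freq_hz, duration_s, factor=10):
    """Return (frequency, duration) as heard after slowing playback by `factor`."""
    return freq_hz / factor, duration_s * factor


# Illustrative values only (not from the slides): a 25 kHz, 5 ms call.
heard_freq_hz, heard_duration_s = time_expand(25_000.0, 0.005)
print(heard_freq_hz, heard_duration_s)
```

With these assumed values the ultrasonic call lands well inside the range of human hearing, which is why time-expanded recordings can be auditioned directly.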
Echolocation calls • Two characteristics • Frequency modulated (range information) • Constant frequency (velocity information) • Features (holistic) • Freq. extrema • Duration • Shape • # harmonics • Call interval [Figure: Mexican free-tailed calls, concatenated]
Current classification methods • Expert sonogram readers • Manual or automatic feature extraction • Griffin 1958, Fenton and Bell 1981 • Comparison with exemplar sonograms • Decision trees • Automatic classification • Discriminant function analysis • By far the most popular method in literature • Available in statistical software packages (SAS, SPSS) • Others • Artificial neural networks, Parsons 2001 • Spectrogram correlation, Pettersson Elektronik AB These methods parallel the 1970s acoustic-phonetic approach to human ASR.
Acoustic phonetics • Bottom up paradigm • Frames, boundaries, groups, phonemes, words • Mimics techniques of expert spectrogram readers • Manual or automatic feature extraction • Formants, voicing, duration, intensity, transitions • Classification • Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path [Figure labels (phonemes): DH AH F UH T B AO L G EY EM IH Z OW V ER]
Acoustic phonetics limitations • Variability of conversational speech • Complex rules, difficult to train • Boundaries difficult to define • Coarticulation, reduction • Feature estimates brittle • Variable noise robustness • Hard decisions, errors accumulate Human ASR shifted to the machine learning paradigm by the 1980s: it is better able to account for the variability of speech and noise.
Machine learning ASR • Data-driven models • Non-parametric: dynamic time warp (DTW) • Parametric: hidden Markov model (HMM) • Frame-based • Identical features from every frame • Expert information in feature extraction • Models account for feature and temporal variability Machine learning dominates state-of-the-art ASR.
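As a rough illustration of the non-parametric option, the sketch below computes a textbook DTW cost between two per-frame frequency tracks and classifies a query by its nearest template. The local cost (absolute difference), the template names, and all numeric values are assumptions made for the example; the slides do not specify these details.

```python
def dtw_distance(a, b):
    """Textbook DTW cost between two 1-D sequences (absolute-difference local cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template with the smallest DTW cost to `query`."""
    return min(templates, key=lambda item: dtw_distance(query, item[1]))[0]


# Hypothetical per-frame frequency tracks in kHz (illustrative only).
templates = [
    ("down-sweep", [50, 45, 40, 35, 30]),
    ("flat", [40, 40, 40, 40]),
]
query = [50, 48, 44, 40, 36, 31, 30]
print(classify(query, templates))
```

Because every query is compared against every stored template, classification cost grows with the size of the template set, while "training" amounts to storing examples.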
Data collection • UF Bat House, home to 60,000 bats • Mexican free-tailed bat (vast majority) • Evening bat • Southeastern myotis • Continuous recording • 90 minutes around sunset • ~20,000 calls • Equipment: • B&K mic (4939), 100 kHz • B&K preamp (2670) • Custom amp/AA filter • NI 6036E 200kS/s A/D card • Laptop, Matlab • Portable
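A back-of-the-envelope check of the recording setup: sampling at 200 kS/s gives a Nyquist limit of 100 kHz, and 90 minutes of continuous recording produces about a billion samples. The 2 bytes per sample below is an assumption (16-bit storage), not something stated on the slide.

```python
SAMPLE_RATE_HZ = 200_000   # A/D rate from the slide
RECORD_MINUTES = 90        # recording length from the slide
BYTES_PER_SAMPLE = 2       # assumption: samples stored as 16-bit integers

nyquist_hz = SAMPLE_RATE_HZ / 2
n_samples = SAMPLE_RATE_HZ * RECORD_MINUTES * 60
size_gb = n_samples * BYTES_PER_SAMPLE / 1e9

print(f"Nyquist: {nyquist_hz / 1000:.0f} kHz")
print(f"Samples: {n_samples:,}")
print(f"Storage: {size_gb:.2f} GB (under the 16-bit assumption)")
```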
Experiment design • Hand labels as ground truth • Narrowband spectrogram • 436 calls (2% of data) in 3 hours (80x real time) • Four classes, a priori probabilities: 34%, 40%, 20%, 6% • All experiments on hand-labeled data only • No hand-labeled calls excluded from experiments [Figure panels: classes 1, 2, 3, 4]
Methods
• Baseline, from the literature
  • Features
    • Duration
    • Zero crossing: Fmin, Fmax, Fmax_energy
    • MUSIC super resolution frequency estimator
  • Classifier
    • Discriminant function analysis, quadratic boundaries
• DTW and HMM
  • Features
    • Frequency (MUSIC), log energy, Δs (HMM only)
  • HMM
    • 5 states/model
    • 4 Gaussian mixtures/state, diagonal covariances
• Tests
  • Leave one out
  • Repeated trials: 25% test data, 1000 trials
  • Test on train data (HMM only)
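The zero-crossing features in the baseline can be sketched as follows: estimate a frequency from the spacing of successive positive-going zero crossings, then take the extrema as Fmin and Fmax. This is a minimal illustration on a synthetic, noise-free linear chirp; the chirp parameters (50 kHz down to 25 kHz over 5 ms) and the function names are assumptions, and the slides do not describe the exact zero-crossing procedure used. Fmax_energy is not computed here.

```python
import math


def chirp(f_start_hz, f_end_hz, duration_s, fs_hz):
    """Synthetic linear FM chirp (illustrative stand-in for a recorded call)."""
    n = int(duration_s * fs_hz)
    samples = []
    for i in range(n):
        t = i / fs_hz
        phase = 2 * math.pi * (f_start_hz * t
                               + 0.5 * (f_end_hz - f_start_hz) * t * t / duration_s)
        samples.append(math.sin(phase))
    return samples


def zero_crossing_freqs(x, fs_hz):
    """One frequency estimate per cycle, from positive-going zero crossings."""
    times = []
    for i in range(1, len(x)):
        if x[i - 1] < 0.0 <= x[i]:
            # Linear interpolation of the crossing instant between two samples.
            frac = -x[i - 1] / (x[i] - x[i - 1])
            times.append((i - 1 + frac) / fs_hz)
    return [1.0 / (t1 - t0) for t0, t1 in zip(times, times[1:])]


FS_HZ = 200_000  # matches the A/D rate quoted in the data collection slide
call = chirp(50_000.0, 25_000.0, 0.005, FS_HZ)
freqs = zero_crossing_freqs(call, FS_HZ)
f_min, f_max = min(freqs), max(freqs)
duration_s = len(call) / FS_HZ
print(f"Fmin ~ {f_min / 1000:.1f} kHz, Fmax ~ {f_max / 1000:.1f} kHz, "
      f"duration = {duration_s * 1000:.1f} ms")
```

On real recordings the crossing times are perturbed by noise and echoes, which is one reason a zero-crossing estimate can be less robust than a subspace method such as MUSIC.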
Results
• Baseline, zero crossing
  • Leave one out: 72.5% correct
  • Repeated trials: 72.5 ± 4% (mean ± std)
• Baseline, MUSIC
  • Leave one out: 79.1%
  • Repeated trials: 77.5 ± 4%
• DTW
  • Leave one out: 74.5%
  • Repeated trials: 74.1 ± 4%
• HMM
  • Test on train: 85.3%
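The "mean ± std" figures come from repeated random splits with 25% of the data held out. The sketch below shows one way such a protocol could be run. The nearest-class-mean classifier and the toy one-dimensional data are placeholders chosen to keep the example short; they are not the classifiers or data from the experiments.

```python
import random
import statistics


def nearest_mean_fit(train):
    """Placeholder classifier: store the mean feature value of each class."""
    sums = {}
    for x, label in train:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def nearest_mean_predict(model, x):
    return min(model, key=lambda label: abs(x - model[label]))


def repeated_trials(data, n_trials=1000, test_fraction=0.25, seed=0):
    """Mean and standard deviation of accuracy over random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, round(len(data) * test_fraction))
    accuracies = []
    for _ in range(n_trials):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = nearest_mean_fit(train)
        correct = sum(nearest_mean_predict(model, x) == y for x, y in test)
        accuracies.append(correct / n_test)
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy, well-separated data (illustrative only).
toy = ([(30 + 0.5 * i, "A") for i in range(20)]
       + [(50 + 0.5 * i, "B") for i in range(20)])
mean_acc, std_acc = repeated_trials(toy)
print(f"{100 * mean_acc:.1f} ± {100 * std_acc:.1f}% (mean ± std)")
```

Leave-one-out is the limiting case in which each test set holds a single call and every call is tested exactly once.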
Confusion matrices [Figure: confusion matrices for four classifiers: baseline (zero crossing), baseline (MUSIC), DTW, HMM]
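The matrices themselves did not survive extraction. For reference, a confusion matrix for the four hand-labeled classes could be tallied as sketched below; the label lists are made-up toy values, not results from the experiments.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of calls on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Toy labels for illustration only.
true_labels = [1, 1, 2, 2, 3, 4]
predicted = [1, 2, 2, 2, 3, 3]
cm = confusion_matrix(true_labels, predicted, classes=[1, 2, 3, 4])
for row in cm:
    print(row)
print(f"accuracy = {accuracy(cm):.3f}")
```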
Comments • Experiments • Weakness: accuracy of class labels • No labeled calls excluded, realistic • HMM most accurate, but undertrained • MUSIC frequency estimate robust, but 1000x slower than zero-crossing analysis (ZCA) (20x real time) • Machine learning • Expert information still necessary • Feature extraction (dimensionality reduction) • Model parameters • DTW: fast training, slow classification • HMM: slow training, fast classification (real time)
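HMM classification is fast because scoring a call needs only one forward pass per class model. The sketch below scores a frequency track against two hand-set left-to-right models with the forward algorithm in the log domain. It is deliberately simplified relative to the slides: one Gaussian per state instead of four-component mixtures, a single 1-D feature, and parameters set by hand rather than trained. All names and numbers are assumptions for illustration.

```python
import math

NEG_INF = float("-inf")


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def left_to_right_model(means, var=4.0, stay=0.5):
    """Hand-set left-to-right HMM: each state either stays or moves one step on."""
    n = len(means)
    log_pi = [0.0] + [NEG_INF] * (n - 1)          # always start in state 0
    log_a = [[NEG_INF] * n for _ in range(n)]
    for s in range(n):
        if s + 1 < n:
            log_a[s][s] = math.log(stay)
            log_a[s][s + 1] = math.log(1.0 - stay)
        else:
            log_a[s][s] = 0.0                     # final state absorbs
    return {"log_pi": log_pi, "log_a": log_a, "means": means, "vars": [var] * n}


def forward_loglik(obs, model):
    """Log-likelihood of `obs` under `model` via the forward algorithm."""
    n = len(model["means"])
    emit = lambda s, x: log_gauss(x, model["means"][s], model["vars"][s])
    alpha = [model["log_pi"][s] + emit(s, obs[0]) for s in range(n)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[r] + model["log_a"][r][s] for r in range(n)])
                 + emit(s, x) for s in range(n)]
    return logsumexp(alpha)


# Five states per model, as on the Methods slide; the means (kHz) are made up.
models = {
    "sweep": left_to_right_model([50.0, 45.0, 40.0, 35.0, 30.0]),
    "flat": left_to_right_model([40.0] * 5),
}
call = [50.0, 46.0, 44.0, 40.0, 36.0, 34.0, 30.0]
scores = {name: forward_loglik(call, m) for name, m in models.items()}
best = max(scores, key=scores.get)
print(best)
```

The per-frame cost is a fixed number of operations per state, so scoring scales linearly with call length; the expensive part, parameter estimation, happens once at training time.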
Future work • Ultimate goal • Real-time portable system for species ID • Commercial product possibilities • Feature extraction • Robust • Broadband noise • Echoes • Unknown distance between bat and microphone • Chirp model, echo model • Faster frequency estimates • Match assumptions of classifiers
More future work • Detection • Replace energy-based method with principled statistical methods using frame-based features • Classification • Accurate class labels for training • Netting • Record from known bat roosts (preferred) • Pseudo-sinusoidal input • Oscillator network • Echo state network
Information • markskow@cnel.ufl.edu • http://www.cnel.ufl.edu/~markskow