Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and Computer Engineering

Statistical automatic identification of microchiroptera from echolocation calls: Lessons learned from human automatic speech recognition
Mark D. Skowronski
Computational Neuro-Engineering Lab, Electrical and Computer Engineering
University of Florida, Gainesville, FL, USA
December 1, 2004

Overview
• Motivations for bat acoustic research
• Review bat call classification methods
• Contrast with 1970s human ASR
• Machine learning vs. expert knowledge
• Experiments
• Conclusions and future work

Bat research motivations
• Bats are among:
  • the most diverse (25% of all mammal species),
  • the most endangered,
  • and the least studied mammals.
• Close relationship with insects
  • agricultural impact
  • disease vectors
• Acoustical research
  • non-invasive (compared to netting)
  • significant domain (echolocation)

More motivations
• Calls simple compared to human speech
• Same goals as human ASR
  • Detection
  • Feature extraction
  • Classification
  • Noise-robust performance
• Easier to design/develop models
• Domain between toy problems and ASR

Bat echolocation
• Ultrasonic, brief chirps (~active sonar)
• Determine range, velocity of nearby objects (clutter, prey, other bats)
• Tailored for task, environment
[Figure: Tadarida brasiliensis (Mexican free-tailed bat)]
[Audio: 10x time-expanded search calls]

Echolocation calls
• Two characteristics
  • Frequency modulated (range information)
  • Constant frequency (velocity information)
• Features (holistic)
  • Freq. extrema
  • Duration
  • Shape
  • # harmonics
  • Call interval
[Figure: Mexican free-tailed calls, concatenated]

Current classification methods
• Expert sonogram readers
  • Manual or automatic feature extraction
    • Griffin 1958, Fenton and Bell 1981
  • Comparison with exemplar sonograms
  • Decision trees
• Automatic classification
  • Discriminant function analysis
    • By far the most popular method in literature
    • Available in statistical software packages (SAS, SPSS)
  • Others
    • Artificial neural networks, Parsons 2001
    • Spectrogram correlation, Pettersson Elektronik AB
Parallels the 1970s acoustic-phonetic approach to human ASR.

Acoustic phonetics
[Figure: phoneme labels DH AH F UH T B AO L G EY EM IH Z OW V ER]
• Bottom up paradigm
  • Frames, boundaries, groups, phonemes, words
• Mimics techniques of expert spectrogram readers
• Manual or automatic feature extraction
  • Formants, voicing, duration, intensity, transitions
• Classification
  • Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path

Acoustic phonetics limitations
• Variability of conversational speech
  • Complex rules, difficult to train
• Boundaries difficult to define
  • Coarticulation, reduction
• Feature estimates brittle
  • Variable noise robustness
• Hard decisions, errors accumulate
Human ASR shifted to the machine learning paradigm by the 1980s, which is better able to account for the variability of speech and noise.

Machine learning ASR
• Data-driven models
  • Non-parametric: dynamic time warp (DTW)
  • Parametric: hidden Markov model (HMM)
• Frame-based
  • Identical features from every frame
• Expert information in feature extraction
• Models account for feature, temporal variabilities
Machine learning dominates state-of-the-art ASR.
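
As an illustration of the non-parametric, frame-based approach named above, here is a minimal sketch of dynamic time warping used for nearest-template classification. It is not the implementation behind the experiments: the function names, the Euclidean local distance, the unconstrained warping path, and the toy one-dimensional frames (frequency in kHz) are all assumptions made for the example.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature frames."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = local + min(cost[i - 1][j],      # frame of a repeated
                                     cost[i][j - 1],      # frame of b repeated
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def classify(call, templates):
    """Return the label of the template closest to `call` under DTW."""
    return min(templates, key=lambda t: dtw_distance(call, t[1]))[0]

# Hypothetical toy data: one frequency value (kHz) per frame.
sweep = [(50.0,), (40.0,), (30.0,), (25.0,)]          # FM-like downward sweep
flat = [(25.0,), (25.0,), (25.0,), (25.0,)]           # CF-like constant tone
stretched = [(50.0,), (50.0,), (40.0,), (30.0,), (30.0,), (25.0,)]  # slower sweep

templates = [("fm", sweep), ("cf", flat)]
label = classify(stretched, templates)
```

The stretched sweep aligns with the sweep template at zero cost because DTW absorbs the difference in duration, which is the temporal variability the slide refers to. Classification requires one DTW computation per stored template.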

Data collection
• UF Bat House, home to 60,000 bats
  • Mexican free-tailed bat (vast majority)
  • Evening bat
  • Southeastern myotis
• Continuous recording
  • 90 minutes around sunset
  • ~20,000 calls
• Equipment:
  • B&K mic (4939), 100 kHz
  • B&K preamp (2670)
  • Custom amp/AA filter
  • NI 6036E 200 kS/s A/D card
  • Laptop, Matlab
• Portable
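
The recording parameters above imply a usable bandwidth and a data volume that can be worked out directly. The sketch below does the arithmetic; the sample rate and recording length come from the slide, while the 2 bytes per sample is an assumption (the slide does not state the sample width).

```python
sample_rate = 200_000       # samples per second, A/D rate from the slide
recording_minutes = 90      # continuous recording length from the slide
bytes_per_sample = 2        # ASSUMPTION: 16-bit samples, not stated on the slide

nyquist_hz = sample_rate / 2                        # highest representable frequency
n_samples = sample_rate * recording_minutes * 60    # samples in one recording
size_gb = n_samples * bytes_per_sample / 1e9        # raw size under the assumption above

print(nyquist_hz, n_samples, size_gb)
```

The Nyquist frequency of 100 kHz matches the 100 kHz figure listed for the microphone.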

Experiment design
• Hand labels as ground truth
  • Narrowband spectrogram
  • 436 calls (2% of data) in 3 hours (80x real time)
• Four classes, a priori: 34, 40, 20, 6%
• All experiments on hand-labeled data only
• No hand-labeled calls excluded from experiments
[Figure: panels labeled 1, 2, 3, 4]
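
The class priors give a floor against which any classifier on this data can be judged. The following sketch computes two trivial baselines from the numbers on the slide; the variable names are illustrative, and the per-class counts are only approximate because the slide percentages are rounded.

```python
priors = [0.34, 0.40, 0.20, 0.06]   # a priori class proportions from the slide
n_calls = 436                        # hand-labeled calls from the slide

# Approximate per-class counts; rounded percentages need not sum to 436.
approx_counts = [round(p * n_calls) for p in priors]

# Accuracy of always guessing the most common class.
majority_baseline = max(priors)

# Expected accuracy of guessing at random in proportion to the priors.
proportional_baseline = sum(p * p for p in priors)

print(approx_counts, majority_baseline, round(proportional_baseline, 3))
```

Always guessing the largest class would be right 40% of the time, and guessing in proportion to the priors about 32% of the time.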

Methods
• Baseline, from the literature
  • Features
    • Duration
    • Zero crossing: Fmin, Fmax, Fmax_energy
    • MUSIC super resolution frequency estimator
  • Classifier
    • Discriminant function analysis, quadratic boundaries
• DTW and HMM
  • Features
    • Frequency (MUSIC), log energy, Δs (HMM only)
  • HMM
    • 5 states/model
    • 4 Gaussian mixtures/state, diagonal covariances
• Tests
  • Leave one out
  • Repeated trials: 25% test data, 1000 trials
  • Test on train data (HMM only)
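
To make the zero-crossing baseline feature concrete, here is a minimal sketch of a zero-crossing frequency estimate on a steady tone. It is an illustration under stated assumptions rather than the estimator used in the experiments: the function name, the use of upward crossings only, the linear interpolation, and the 40 kHz test tone are all choices made for the example. Only the 200 kS/s sample rate is taken from the slides.

```python
import math

def zero_crossing_frequency(samples, sample_rate):
    """Estimate the frequency of a roughly sinusoidal signal from upward zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linear interpolation locates the crossing between samples i-1 and i.
            crossings.append(i - 1 + prev / (prev - cur))
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    periods = len(crossings) - 1
    # Full periods divided by the time they span (in seconds).
    return periods * sample_rate / (crossings[-1] - crossings[0])

sample_rate = 200_000   # samples per second, as on the data collection slide
tone_hz = 40_000        # hypothetical test tone, not from the slides
samples = [math.sin(2 * math.pi * tone_hz * n / sample_rate + 0.3) for n in range(400)]

estimate = zero_crossing_frequency(samples, sample_rate)
print(round(estimate))
```

For a frequency-modulated call, the same estimate could be applied to short consecutive frames to obtain a frequency track, from which extrema such as Fmin and Fmax can be read. The estimate depends on clean crossings, which is one reason such features can be sensitive to noise.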

Results
• Baseline, zero crossing
  • Leave one out: 72.5% correct
  • Repeated trials: 72.5 ± 4% (mean ± std)
• Baseline, MUSIC
  • Leave one out: 79.1%
  • Repeated trials: 77.5 ± 4%
• DTW
  • Leave one out: 74.5%
  • Repeated trials: 74.1 ± 4%
• HMM
  • Test on train: 85.3%
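
The ± 4% spread in the repeated trials is about what the test-set size alone would produce. The sketch below checks this with the binomial standard deviation of an accuracy estimate. It assumes that test calls are classified independently and ignores variation from the changing training set, so it is a rough sanity check rather than a reproduction of the reported figures.

```python
import math

n_calls = 436          # hand-labeled calls (from the slides)
test_fraction = 0.25   # fraction held out per repeated trial (from the slides)
n_test = round(n_calls * test_fraction)   # test calls per trial

def binomial_std(accuracy, n):
    """Standard deviation of an accuracy estimate over n independent test items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)

# Mean repeated-trial accuracies reported on the slide.
reported = [("zero crossing", 0.725), ("MUSIC", 0.775), ("DTW", 0.741)]
for name, acc in reported:
    print(f"{name}: {100 * binomial_std(acc, n_test):.1f}%")
```

With 109 test calls per trial, each of the three values comes out near 4%. Under these assumptions the differences between the methods are about one standard deviation of a single trial.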

Confusion matrices
[Figure: confusion matrices for baseline (zero crossing), baseline (MUSIC), DTW, and HMM]

Comments
• Experiments
  • Weakness: accuracy of class labels
  • No labeled calls excluded, realistic
  • HMM most accurate, but undertrained
  • MUSIC frequency estimate robust, but 1000x slower than zero-crossing analysis, ZCA (20x real time)
• Machine learning
  • Expert information still necessary
    • Feature extraction (dimensionality reduction)
    • Model parameters
  • DTW: fast training, slow classification
  • HMM: slow training, fast classification (real time)

Future work
• Ultimate goal
  • Real-time portable system for species ID
  • Commercial product possibilities
• Feature extraction
  • Robust
    • Broadband noise
    • Echoes
    • Unknown distance between bat and microphone
  • Chirp model, echo model
  • Faster frequency estimates
  • Match assumptions of classifiers

More future work
• Detection
  • Replace energy-based method with principled statistical methods using frame-based features
• Classification
  • Accurate class labels for training
    • Netting
    • Record from known bat roosts (preferred)
  • Pseudo-sinusoidal input
    • Oscillator network
    • Echo state network
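
For reference, an energy-based detector of the kind the slide proposes to replace can be sketched as a threshold on frame log energy. The slides do not describe the detector that was actually used, so the frame length, threshold, non-overlapping frames, and function names below are all hypothetical.

```python
import math

def frame_log_energy(samples, frame_len):
    """Log energy of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        energies.append(math.log10(energy + 1e-12))  # small offset avoids log(0)
    return energies

def detect_calls(samples, frame_len, threshold):
    """Return (first_frame, last_frame) pairs where log energy exceeds the threshold."""
    segments, start, idx = [], None, -1
    for idx, e in enumerate(frame_log_energy(samples, frame_len)):
        if e > threshold and start is None:
            start = idx                       # a call begins
        elif e <= threshold and start is not None:
            segments.append((start, idx - 1))  # the call ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, idx))         # call still active at end of signal
    return segments

# Hypothetical test signal: silence, a short tone burst, silence.
burst = [math.sin(2 * math.pi * n / 5 + 0.3) for n in range(100)]
signal = [0.0] * 200 + burst + [0.0] * 200

segments = detect_calls(signal, frame_len=50, threshold=0.0)
print(segments)
```

A fixed threshold like this makes a hard decision per frame and depends on the noise level and on the distance to the bat, which is in line with the motivation on the slide for a statistical, frame-based alternative.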

Information
• markskow@cnel.ufl.edu
• http://www.cnel.ufl.edu/~markskow
