
Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and Computer Engineering

Statistical automatic identification of microchiroptera from echolocation calls: Lessons learned from human automatic speech recognition. Mark D. Skowronski, Computational Neuro-Engineering Lab, Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA. December 1, 2004.


Presentation Transcript


  1. Statistical automatic identification of microchiroptera from echolocation calls: Lessons learned from human automatic speech recognition • Mark D. Skowronski • Computational Neuro-Engineering Lab, Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA • December 1, 2004

  2. Overview • Motivations for bat acoustic research • Review bat call classification methods • Contrast with 1970s human ASR • Machine learning vs. expert knowledge • Experiments • Conclusions and future work

  3. Bat research motivations • Bats are among: • the most diverse (25% of all mammal species), • the most endangered, • and the least studied mammals. • Close relationship with insects • agricultural impact • disease vectors • Acoustical research • non-invasive (compared to netting) • significant domain (echolocation)

  4. More motivations • Calls simple compared to human speech • Same goals as human ASR • Detection • Feature extraction • Classification • Noise-robust performance • Easier to design/develop models • Domain between toy problems and ASR

  5. Bat echolocation • Ultrasonic, brief chirps (~active sonar) • Determine range, velocity of nearby objects (clutter, prey, other bats) • Tailored for task, environment • Example species: Tadarida brasiliensis (Mexican free-tailed bat) • [Audio: 10x time-expanded search calls]

  6. Echolocation calls • Two characteristics • Frequency modulated (range information) • Constant frequency (velocity information) • Features (holistic) • Freq. extrema • Duration • Shape • # harmonics • Call interval • [Figure: Mexican free-tailed calls, concatenated]

  7. Current classification methods • Expert sonogram readers • Manual or automatic feature extraction • Griffin 1958, Fenton and Bell 1981 • Comparison with exemplar sonograms • Decision trees • Automatic classification • Discriminant function analysis • By far the most popular method in literature • Available in statistical software packages (SAS, SPSS) • Others • Artificial neural networks, Parsons 2001 • Spectrogram correlation, Pettersson Elektronik AB Parallels the 1970s acoustic-phonetic approach to human ASR.

  8. Acoustic phonetics • Bottom up paradigm • Frames, boundaries, groups, phonemes, words • Mimics techniques of expert spectrogram readers • Manual or automatic feature extraction • Formants, voicing, duration, intensity, transitions • Classification • Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path • [Figure: spectrogram labeled with the phonemes DH AH F UH T B AO L G EY EM IH Z OW V ER]

  9. Acoustic phonetics limitations • Variability of conversational speech • Complex rules, difficult to train • Boundaries difficult to define • Coarticulation, reduction • Feature estimates brittle • Variable noise robustness • Hard decisions, errors accumulate Shifted to machine learning paradigm of human ASR by 1980s: better able to account for variability of speech, noise.

  10. Machine learning ASR • Data-driven models • Non-parametric: dynamic time warp (DTW) • Parametric: hidden Markov model (HMM) • Frame-based • Identical features from every frame • Expert information in feature extraction • Models account for feature, temporal variabilities Machine learning dominates state-of-the-art ASR.
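The non-parametric option named on slide 10, dynamic time warping, can be illustrated with a minimal sketch. It assumes each call has already been reduced to a one-dimensional sequence of per-frame frequency estimates in kHz and that classification is nearest-template; the function names, the absolute-difference local cost, and the toy templates are illustrative assumptions, not details taken from the presentation.

```python
def dtw_distance(a, b):
    """Cumulative DTW cost between two 1-D feature sequences (assumed local cost: |a_i - b_j|)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # allowed steps: advance in a, advance in b, or advance in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(call, templates):
    """Return the label of the stored template with the smallest DTW cost to the call."""
    return min(templates, key=lambda t: dtw_distance(call, t[1]))[0]


# Hypothetical templates: a downward frequency sweep and a constant-frequency call (kHz per frame)
templates = [("FM", [50, 45, 40, 35, 30]), ("CF", [40, 40, 40, 40])]
label = classify([50, 48, 45, 41, 36, 31, 30], templates)
```

Every stored template is compared against every incoming call, so there is nothing to train but classification cost grows with the number of templates.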

  11. Data collection • UF Bat House, home to 60,000 bats • Mexican free-tailed bat (vast majority) • Evening bat • Southeastern myotis • Continuous recording • 90 minutes around sunset • ~20,000 calls • Equipment: • B&K mic (4939), 100 kHz • B&K preamp (2670) • Custom amp/AA filter • NI 6036E 200kS/s A/D card • Laptop, Matlab • Portable

  12. Experiment design • Hand labels as ground truth • Narrowband spectrogram • 436 calls (2% of data) in 3 hours (80x real time) • Four classes, a priori probabilities: 34%, 40%, 20%, 6% • All experiments on hand-labeled data only • No hand-labeled calls excluded from experiments • [Figure: panels labeled 1, 2, 3, 4]

  13. Methods • Baseline, from the literature • Features • Duration • Zero crossing: Fmin, Fmax, Fmax_energy • MUSIC super resolution frequency estimator • Classifier • Discriminant function analysis, quadratic boundaries • DTW and HMM • Features • Frequency (MUSIC), log energy, Δs (HMM only) • HMM • 5 states/model • 4 Gaussian mixtures/state, diagonal covariances • Tests • Leave one out • Repeated trials: 25% test data, 1000 trials • Test on train data (HMM only)
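The zero-crossing baseline features on slide 13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the presentation's implementation: it assumes a clean, single-harmonic call, estimates one frequency per cycle from the spacing of upward zero crossings, and reports only duration, Fmin, and Fmax (Fmax_energy is omitted). The function names and the synthetic 50 kHz to 25 kHz chirp are hypothetical; the 200 kS/s rate matches the A/D card listed on slide 11.

```python
import math


def zero_crossing_freqs(x, fs):
    """One frequency estimate (Hz) per cycle, from the spacing of upward zero crossings."""
    times = []
    for n in range(1, len(x)):
        if x[n - 1] < 0 <= x[n]:
            # linearly interpolate the crossing instant between samples n-1 and n
            frac = -x[n - 1] / (x[n] - x[n - 1])
            times.append((n - 1 + frac) / fs)
    return [1.0 / (t1 - t0) for t0, t1 in zip(times, times[1:])]


def call_features(x, fs):
    """Duration plus frequency extrema of a single, already-detected call."""
    freqs = zero_crossing_freqs(x, fs)
    return {"duration_s": len(x) / fs, "fmin_hz": min(freqs), "fmax_hz": max(freqs)}


# Synthetic test call (assumption): linear downward sweep, 50 kHz to 25 kHz in 5 ms
fs = 200_000
n_samples = 1000
f0, f1 = 50_000.0, 25_000.0
rate = (f1 - f0) / (n_samples / fs)  # sweep rate in Hz per second
chirp = [math.sin(2 * math.pi * (f0 * (n / fs) + 0.5 * rate * (n / fs) ** 2))
         for n in range(n_samples)]
features = call_features(chirp, fs)
```

Near 50 kHz there are only about four samples per cycle at this rate, so the interpolated crossings carry an error of roughly a few percent, and the estimate degrades further once noise or echoes add spurious crossings.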

  14. Results • Baseline, zero crossing • Leave one out: 72.5% correct • Repeated trials: 72.5 ± 4% (mean ± std) • Baseline, MUSIC • Leave one out: 79.1% • Repeated trials: 77.5 ± 4% • DTW • Leave one out: 74.5% • Repeated trials: 74.1 ± 4% • HMM • Test on train: 85.3%
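The two test protocols behind these numbers, leave one out and repeated random trials with 25% held out, can be sketched generically. The nearest-class-mean classifier below is only a stand-in chosen to keep the example short; it is not the discriminant function analysis, DTW, or HMM used in the experiments, and the function names and toy data are assumptions.

```python
import random
import statistics


def fit(train):
    """Stand-in classifier (assumption): store the mean feature value of each class."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    return {y: statistics.fmean(xs) for y, xs in by_label.items()}


def predict(model, x):
    """Assign the class whose stored mean is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


def accuracy(train, test):
    model = fit(train)
    return sum(predict(model, x) == y for x, y in test) / len(test)


def leave_one_out(data):
    """Hold out each sample once, train on the rest, and average the hit rate."""
    return statistics.fmean(
        accuracy(data[:i] + data[i + 1:], [data[i]]) for i in range(len(data))
    )


def repeated_trials(data, test_fraction=0.25, trials=1000, seed=0):
    """Random train/test splits; returns (mean, standard deviation) of accuracy."""
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(data)))
    scores = []
    for _ in range(trials):
        shuffled = data[:]
        rng.shuffle(shuffled)
        scores.append(accuracy(shuffled[n_test:], shuffled[:n_test]))
    return statistics.fmean(scores), statistics.stdev(scores)


# Toy data (assumption): one feature per call, two well-separated classes
data = [(20 + 0.1 * i, "a") for i in range(10)] + [(40 + 0.1 * i, "b") for i in range(10)]
loo = leave_one_out(data)
mean_acc, std_acc = repeated_trials(data, trials=50)
```

Leave one out gives a single accuracy figure, while the repeated trials also give a spread, which is why only the latter is reported as mean ± std.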

  15. Confusion matrices • [Figure: four confusion matrices, one each for baseline (zero crossing), baseline (MUSIC), DTW, and HMM]

  16. Comments • Experiments • Weakness: accuracy of class labels • No labeled calls excluded, realistic • HMM most accurate, but undertrained • MUSIC frequency estimate robust, but 1000x slower than ZCA (20x real time) • Machine learning • Expert information still necessary • Feature extraction (dimensionality reduction) • Model parameters • DTW: fast training, slow classification • HMM: slow training, fast classification (real time)

  17. Future work • Ultimate goal • Real-time portable system for species ID • Commercial product possibilities • Feature extraction • Robust • Broadband noise • Echoes • Unknown distance between bat and microphone • Chirp model, echo model • Faster frequency estimates • Match assumptions of classifiers

  18. More future work • Detection • Replace energy-based method with principled statistical methods using frame-based features • Classification • Accurate class labels for training • Netting • Record from known bat roosts (preferred) • Pseudo-sinusoidal input • Oscillator network • Echo state network

  19. Information • markskow@cnel.ufl.edu • http://www.cnel.ufl.edu/~markskow
