
Automatic Speaker Recognition in Military Environment


  1. Automatic Speaker Recognition in Military Environment Corinna Harwardt 28.10.2009

  2. Overview • Basics of automatic speaker recognition • The VerA system • GMM-UBM based system • Results on militarily relevant audio data • High-Level Features for improved speaker recognition • The problem addressed in the PhD thesis: different degrees of vocal effort in training and test data

  3. Speaker recognition • The goal of speaker recognition is to determine the probability that a given speech signal was uttered by a particular speaker.
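
A standard way to express this probability, which also underlies the GMM-UBM baseline described later, is the likelihood ratio between a model of the claimed speaker and a universal background model (UBM); this formulation is a general textbook convention rather than something stated on the slide:

Λ(X) = p(X | λ_speaker) / p(X | λ_UBM),   accept the identity claim if log Λ(X) ≥ θ

Here X is the sequence of feature vectors extracted from the test signal, λ_speaker the claimed speaker's model, λ_UBM the background model trained on many speakers, and θ a decision threshold that trades off false acceptances against false rejections.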

  4. Speaker identification

  5. Speaker verification

  6. Typical configuration of a speaker recognition system

  7. VerA • VerA – SprecherVerifikation militärisch relevanter Audiodaten (speaker verification on militarily relevant audio data) • Baseline: MFCC, GMM-UBM based system • Energy-based VAD (voice activity detection) • MFCC (mel-frequency cepstral coefficients): acoustic features originally developed for speech recognition applications, calculated on short frames of the signal (20 ms) • GMM (Gaussian mixture models): statistical modeling of the features extracted from the signal (e.g. MFCC)
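
A minimal sketch of this pipeline, assuming librosa for feature extraction and scikit-learn for the mixture models; the frame size follows the slide, but the VAD threshold, number of MFCCs and mixture sizes are illustrative assumptions, and a full GMM-UBM system would derive the speaker model by MAP adaptation of the UBM rather than by independent training as shown here:

import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(signal, sr, frame_ms=20):
    # Frame-based MFCC extraction on ~20 ms analysis windows.
    n_fft = int(sr * frame_ms / 1000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                                n_fft=n_fft, hop_length=n_fft // 2)
    # Energy-based VAD: keep only frames whose RMS energy exceeds a
    # (hypothetical) fraction of the maximum frame energy.
    energy = librosa.feature.rms(y=signal, frame_length=n_fft,
                                 hop_length=n_fft // 2)[0]
    voiced = energy > 0.1 * energy.max()
    return mfcc[:, voiced].T  # frames x coefficients

# Universal background model trained on features from many speakers.
ubm = GaussianMixture(n_components=512, covariance_type='diag')
# ubm.fit(background_features)

# Target speaker model; training it independently of the UBM is a
# simplification made purely for this sketch.
speaker_gmm = GaussianMixture(n_components=512, covariance_type='diag')
# speaker_gmm.fit(enrollment_features)

def verification_score(test_features):
    # Average log-likelihood ratio between speaker model and UBM.
    return speaker_gmm.score(test_features) - ubm.score(test_features)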

  8. Preliminary results on the Kiel corpus

  9. Preliminary results on militarily relevant audio data

  10. Comparison to other systems

  11. High-Level Features I • … are features that rely on linguistic content or are calculated on spans of the signal longer than the roughly 20 ms frames used in standard frame-based approaches • … may use prosodic, phonetic or idiolectal information, for example • … provide information complementary to acoustically motivated features such as MFCCs • … should therefore be used in addition to acoustic features, not instead of them • … are relatively robust against distortions

  12. High-Level Features II • Goal: choose a high-level feature that does not require a speech recognizer • High-level features under consideration (see the sketch below): • F0 statistics as proposed by Reynolds et al. (2002) and Rose (2002) • Formant statistics (Becker et al. 2008)
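
As an illustration of the F0-statistics option, the sketch below computes simple long-term pitch statistics with librosa's pYIN tracker; the pitch search range and the particular statistics are assumptions chosen for illustration and do not reproduce the exact feature sets of the cited works:

import numpy as np
import librosa

def f0_statistics(signal, sr):
    # Frame-wise fundamental frequency; unvoiced frames come back as NaN.
    f0, voiced_flag, _ = librosa.pyin(signal, fmin=50.0, fmax=400.0, sr=sr)
    f0 = f0[voiced_flag]              # keep voiced frames only
    log_f0 = np.log(f0)
    # A low-dimensional high-level feature vector, intended to be used
    # in addition to the acoustic (MFCC-based) features, not instead.
    return np.array([log_f0.mean(), log_f0.std(),
                     np.median(f0), f0.max() - f0.min()])

Such a vector can be modeled with its own (small) GMM and the resulting score fused with the acoustic score; no speech recognizer is required, which is exactly the selection criterion stated above.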

  13. Different degrees of vocal effort • Problem: recognition performance degrades for several speech processing tasks if speech with high vocal effort is used without additional training (Becker et al. 2008). • The goal is either to find features for speaker recognition that are robust to both normal and high vocal effort, or to find a method to predict the changes in acoustic features caused by raised vocal effort.
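
Purely for illustration (not part of the thesis or the VerA system), the effect of such a train/test mismatch could be quantified by training a speaker model on normal-effort features only and comparing its scores on matched and high-effort test material, e.g. using the mfcc_features helper from the earlier sketch:

from sklearn.mixture import GaussianMixture

def train_speaker_model(normal_effort_features, n_components=64):
    # Speaker model trained exclusively on normal vocal effort.
    gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
    gmm.fit(normal_effort_features)
    return gmm

def effort_mismatch_gap(gmm, matched_features, high_effort_features):
    # Positive gap = average log-likelihood drop caused by raised vocal
    # effort, i.e. the degradation the thesis aims to counteract.
    return gmm.score(matched_features) - gmm.score(high_effort_features)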

  14. References • D. Reynolds et al.: SuperSID Project Final Report – Exploiting High-Level Information for High-Performance Speaker Recognition. Department of Defense / National Science Foundation, 2002. • P. Rose: Forensic Speaker Identification. Taylor & Francis, 2002. • T. Becker, M. Jessen and C. Grigoras: Forensic Speaker Verification Using Formant Features and Gaussian Mixture Models. 9th Annual Conference of the International Speech Communication Association (Interspeech), 2008.
