Automatic Speaker Recognition in Military Environment Corinna Harwardt 28.10.2009
Overview
• Basics of automatic speaker recognition
• The VerA system
• GMM-UBM-based system
• Results on militarily relevant audio data
• High-level features for improved speaker recognition
• The problem addressed in the PhD thesis: different degrees of vocal effort in training and test data
Speaker recognition
• The goal of speaker recognition is to determine the probability that a given speech signal was uttered by a particular speaker.
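In practice a verification system usually outputs a score such as a log-likelihood ratio (LLR) between a target-speaker model and a non-target model rather than a probability. Turning that score into the probability named above also needs a prior. The following is a minimal sketch, assuming the score is a calibrated natural-log LLR and that the prior is chosen by the user; the function name `posterior_from_llr` is hypothetical.

```python
import math


def _logistic(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def posterior_from_llr(llr: float, prior_target: float) -> float:
    """Convert an LLR into P(claimed speaker | signal).

    llr = ln p(X | target) - ln p(X | non-target). By Bayes' rule the
    posterior log-odds equal the LLR plus the prior log-odds.
    """
    if not 0.0 < prior_target < 1.0:
        raise ValueError("prior_target must lie strictly between 0 and 1")
    prior_log_odds = math.log(prior_target / (1.0 - prior_target))
    return _logistic(llr + prior_log_odds)


if __name__ == "__main__":
    print(posterior_from_llr(0.0, 0.5))   # 0.5: neutral evidence, equal priors
    print(posterior_from_llr(2.0, 0.01))  # same evidence, much lower prior
```

The separation into score and prior reflects a common design choice: the recognizer only measures how well the signal matches the speaker, while the application decides how plausible the claim was beforehand.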
VerA
• VerA – SprecherVerifikation militärisch relevanter Audiodaten (speaker verification on militarily relevant audio data)
• Baseline: MFCC features with a GMM-UBM (universal background model) back end
  • Energy-based VAD (voice activity detection)
  • MFCC (mel-frequency cepstral coefficients)
    • developed for speech recognition applications
    • acoustic features
    • calculated on short segments of the signal (about 20 ms)
  • GMM (Gaussian mixture models)
    • statistical modeling of the features extracted from the signal (e.g. MFCCs)
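The slide names the baseline components but not their configuration. The following is a minimal sketch of a standard GMM-UBM recipe under stated assumptions: a frame-energy VAD with a hypothetical 30 dB margin, diagonal-covariance GMMs, mean-only relevance MAP adaptation of the UBM with an assumed relevance factor of 16, and scoring by the average per-frame log-likelihood ratio. MFCC extraction is omitted; features are assumed to be given as vectors. None of these names or values are taken from VerA.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def energy_vad(frames: Sequence[Vector], margin_db: float = 30.0) -> List[bool]:
    """Mark frames whose log energy is within margin_db of the loudest frame (assumed margin)."""
    energies = [10.0 * math.log10(sum(s * s for s in f) + 1e-12) for f in frames]
    peak = max(energies)
    return [e >= peak - margin_db for e in energies]


def _logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiagGMM:
    """Gaussian mixture model with diagonal covariance matrices."""

    def __init__(self, weights, means, variances):
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def component_log_probs(self, x: Vector) -> List[float]:
        # log(w_i) + log N(x; mu_i, diag(var_i)) for every component i
        out = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            lp = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                lp -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            out.append(lp)
        return out

    def log_likelihood(self, x: Vector) -> float:
        return _logsumexp(self.component_log_probs(x))

    def posteriors(self, x: Vector) -> List[float]:
        lps = self.component_log_probs(x)
        total = _logsumexp(lps)
        return [math.exp(lp - total) for lp in lps]


def map_adapt_means(ubm: DiagGMM, features: Sequence[Vector], relevance: float = 16.0) -> DiagGMM:
    """Mean-only relevance MAP adaptation of the UBM towards one speaker's features."""
    k, dim = len(ubm.weights), len(ubm.means[0])
    counts = [0.0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x in features:
        for i, p in enumerate(ubm.posteriors(x)):
            counts[i] += p
            for d in range(dim):
                sums[i][d] += p * x[d]
    means = []
    for i in range(k):
        alpha = counts[i] / (counts[i] + relevance)  # data-dependent adaptation weight
        if counts[i] > 0.0:
            means.append([alpha * sums[i][d] / counts[i] + (1.0 - alpha) * ubm.means[i][d]
                          for d in range(dim)])
        else:
            means.append(list(ubm.means[i]))
    # Weights and variances are copied unchanged from the UBM.
    return DiagGMM(ubm.weights, means, ubm.variances)


def llr_score(speaker: DiagGMM, ubm: DiagGMM, features: Sequence[Vector]) -> float:
    """Average per-frame log-likelihood ratio of speaker model vs. UBM."""
    return sum(speaker.log_likelihood(x) - ubm.log_likelihood(x) for x in features) / len(features)


if __name__ == "__main__":
    # Toy 1-D features; a real system would use MFCC vectors of the frames kept by the VAD.
    ubm = DiagGMM([0.5, 0.5], [[-2.0], [2.0]], [[1.0], [1.0]])
    enrol = [[2.5 + 0.1 * ((i % 5) - 2)] for i in range(20)]
    speaker = map_adapt_means(ubm, enrol)
    print(llr_score(speaker, ubm, [[2.5]] * 10))   # speaker-like data: positive
    print(llr_score(speaker, ubm, [[-2.0]] * 10))  # other data: close to zero
```

Adapting from the UBM rather than training each speaker model from scratch lets components with little enrollment data stay close to the background model, which is why the relevance factor controls how far each mean moves.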
High-Level Features I
• … are features that rely on linguistic content, or features calculated over parts of the signal longer than the roughly 20 ms frames normally used in frame-based approaches
• … may, for example, use prosodic, phonetic, or idiolectal information
• … provide information complementary to acoustically motivated features such as MFCCs
• … should therefore be used in addition to acoustic features, not exclusively
• … are relatively robust against distortions
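Using high-level features "in addition to" acoustic ones usually means combining the subsystems' scores. The following is a minimal sketch, assuming Z-normalization of each subsystem score against a cohort of impostor scores followed by a weighted linear sum; the cohorts, weights, and function names are hypothetical, and the slides do not say which fusion method is used.

```python
import statistics
from typing import Sequence


def z_norm(score: float, impostor_scores: Sequence[float]) -> float:
    """Z-normalization: express a score in units of the impostor score distribution."""
    mu = statistics.mean(impostor_scores)
    sigma = statistics.stdev(impostor_scores)
    return (score - mu) / sigma


def fuse_scores(scores: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Weighted linear combination of normalized subsystem scores."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per subsystem")
    return bias + sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Hypothetical raw scores and impostor cohorts for two subsystems.
    acoustic = z_norm(1.8, [0.1, -0.3, 0.4, 0.0, -0.2])
    f0_stats = z_norm(0.9, [0.5, 0.2, 0.7, 0.4, 0.6])
    # Made-up weights; in practice they would be trained on development data.
    print(fuse_scores([acoustic, f0_stats], [0.8, 0.2]))
```

Normalizing first matters because the raw scores of an MFCC system and of a prosodic system live on different scales, so fixed weights would otherwise be meaningless.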
High-Level Features II
• Goal: pick a high-level feature that does not require a speech recognizer
• High-level features under consideration:
  • F0 statistics as proposed by Reynolds et al. (2002) and Rose (2002)
  • Formant statistics (Becker et al. 2008)
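F0 statistics need no speech recognizer, only a pitch estimate per frame. The slides do not say how F0 is extracted. The following sketch assumes a plain autocorrelation estimator with a 60–400 Hz search range and a voicing threshold of 0.3 (all illustrative choices). It summarizes a recording by the mean and standard deviation of F0 over voiced frames; the cited work may use other statistics, such as log-F0. Formant statistics would additionally need a formant tracker and are not shown.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def estimate_f0(frame: Sequence[float], sample_rate: int,
                f0_min: float = 60.0, f0_max: float = 400.0,
                voicing_threshold: float = 0.3) -> Optional[float]:
    """Estimate F0 of one frame by autocorrelation; return None if unvoiced."""
    n = len(frame)
    energy = sum(s * s for s in frame)  # autocorrelation at lag 0
    if energy == 0.0:
        return None  # silent frame
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # Weak periodicity relative to the frame energy: treat the frame as unvoiced.
    if best_lag is None or best_r / energy < voicing_threshold:
        return None
    return sample_rate / best_lag


def f0_statistics(frames: Sequence[Sequence[float]], sample_rate: int) -> Tuple[float, float]:
    """Mean and standard deviation of F0 (Hz) over the voiced frames."""
    voiced = [f0 for f0 in (estimate_f0(f, sample_rate) for f in frames) if f0 is not None]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return statistics.mean(voiced), statistics.stdev(voiced)


def sine(freq: float, sample_rate: int, n: int):
    return [math.sin(2.0 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    sr = 8000  # 40 ms frames of 320 samples at 8 kHz
    frames = [sine(200.0, sr, 320), sine(100.0, sr, 320), [0.0] * 320]
    mean_f0, sd_f0 = f0_statistics(frames, sr)
    print(mean_f0, sd_f0)
```

The statistics are computed over many frames, so the resulting feature describes the whole utterance rather than a single 20 ms frame, which is what makes it "high-level" in the sense of the previous slide.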
Different degrees of vocal effort
• Problem: for several speech processing tasks, recognition performance degrades if speech produced with high vocal effort is used without additional training (Becker et al. 2008).
• The goal is either:
  • to find features for speaker recognition that are robust to both normal and high vocal effort, or
  • to find a method to predict how acoustic features change under raised vocal effort.
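The slides do not say how such a prediction would be made. As a purely illustrative sketch, assume paired measurements of one feature (for instance mean F0) from the same speakers at normal and at raised vocal effort, and fit a per-feature linear mapping by least squares. The linear model, the example values, and the function names are assumptions, not the thesis's method.

```python
from typing import Sequence, Tuple


def fit_effort_mapping(normal: Sequence[float], raised: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of raised = slope * normal + intercept for one feature.

    normal[i] and raised[i] are the same feature measured for the same speaker
    (or utterance) at normal and at raised vocal effort.
    """
    if len(normal) != len(raised) or len(normal) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(normal)
    mean_x = sum(normal) / n
    mean_y = sum(raised) / n
    sxx = sum((x - mean_x) ** 2 for x in normal)
    if sxx == 0.0:
        raise ValueError("normal-effort values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(normal, raised))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_raised(value: float, mapping: Tuple[float, float]) -> float:
    """Predict the raised-effort value of a feature from its normal-effort value."""
    slope, intercept = mapping
    return slope * value + intercept


if __name__ == "__main__":
    # Made-up paired mean-F0 values (Hz); not measurements from the thesis.
    normal_f0 = [105.0, 118.0, 131.0, 210.0]
    raised_f0 = [132.0, 147.0, 160.0, 251.0]
    mapping = fit_effort_mapping(normal_f0, raised_f0)
    print(mapping, predict_raised(150.0, mapping))
```

Such a mapping could, for example, be applied to normal-effort enrollment features to approximate high-effort training data; whether a linear model is adequate would have to be checked on real recordings.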
References
• D. Reynolds et al., "SuperSID Project Final Report – Exploiting High-Level Information for High-Performance Speaker Recognition," Department of Defense; National Science Foundation, 2002.
• P. Rose, Forensic Speaker Identification, Taylor & Francis, 2002.
• T. Becker, M. Jessen, and C. Grigoras, "Forensic Speaker Verification Using Formant Features and Gaussian Mixture Models," 9th Annual Conference of the International Speech Communication Association, 2008.