Automatic Speaker Recognition in Military Environment

Corinna Harwardt, 28.10.2009

Overview
• Basics of automatic speaker recognition
• The VerA system
  • GMM-UBM based system
  • Results on militarily relevant audio data
• High-level features for improved speaker recognition
• The problem addressed in the PhD thesis: different degrees of vocal effort in training and test data

Speaker recognition
• The goal of speaker recognition is to determine how likely it is that a given speech signal was uttered by a certain speaker.

Speaker identification

Speaker verification
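Identification and verification differ mainly in the decision made after scoring. Identification asks which enrolled speaker best matches the test utterance; verification asks whether the utterance matches one claimed speaker well enough. A minimal sketch, assuming some back end has already produced one score per enrolled speaker (higher means a better match); the function names and threshold values are hypothetical:

```python
from typing import Dict, Optional


def identify(scores: Dict[str, float]) -> str:
    """Closed-set identification: return the enrolled speaker whose model
    gives the highest score for the test utterance."""
    return max(scores, key=scores.get)


def verify(score: float, threshold: float) -> bool:
    """Verification: accept the claimed identity if the score obtained with
    the claimed speaker's model reaches the decision threshold."""
    return score >= threshold


def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open-set identification: report the best-scoring speaker only if that
    score also passes the threshold; otherwise None (unknown speaker)."""
    best = identify(scores)
    return best if verify(scores[best], threshold) else None


scores = {"speaker_a": -1.2, "speaker_b": 0.8, "speaker_c": 0.1}
print(identify(scores))                        # best match among enrolled speakers
print(verify(scores["speaker_c"], threshold=0.5))
print(identify_open_set(scores, threshold=1.0))
```

Verification performance then depends on where the threshold is set, which trades false acceptances against false rejections.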

Typical configuration of a speaker recognition system
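A typical system first discards non-speech portions of the signal and only then extracts features and scores them against speaker models. The VerA baseline uses an energy-based VAD, but its exact decision rule is not given here, so the following is only a minimal sketch under assumed settings: non-overlapping 20 ms frames, with a frame marked as speech when its energy lies within 30 dB of the loudest frame. Both parameter values are hypothetical.

```python
import numpy as np


def energy_vad(signal: np.ndarray, sample_rate: int,
               frame_ms: float = 20.0, threshold_db: float = 30.0) -> np.ndarray:
    """Return one boolean per non-overlapping frame: True (speech) where the
    frame energy lies within `threshold_db` dB of the loudest frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return np.zeros(0, dtype=bool)
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    # Frame energy in dB; the small constant avoids log10(0) on digital silence.
    energy_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    return energy_db > energy_db.max() - threshold_db


# Usage: 0.5 s of silence followed by 0.5 s of a 200 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr // 2) / sr
sig = np.concatenate([np.zeros(sr // 2), 0.5 * np.sin(2 * np.pi * 200 * t)])
speech_mask = energy_vad(sig, sr)
```

Practical VADs usually work on overlapping frames and smooth the frame decisions over time; the sketch omits both for brevity.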

VerA
• VerA – SprecherVerifikation militärisch relevanter Audiodaten (speaker verification on militarily relevant audio data)
• Baseline: MFCC features with a GMM-UBM (Gaussian mixture model, universal background model) based system
• Energy-based VAD (voice activity detection)
• MFCC (mel-frequency cepstral coefficients)
  • Originally developed for speech recognition applications
  • Acoustic features
  • Calculated on short segments of the signal (20 ms)
• GMM (Gaussian mixture models)
  • Statistical modeling of the features extracted from the signal (e.g. MFCCs)
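The GMM-UBM approach can be illustrated with a short numpy sketch. It assumes diagonal covariances, a UBM trained with plain EM, MAP adaptation of the means only, and the average per-frame log-likelihood ratio (speaker model vs. UBM) as the verification score. The component count, the relevance factor of 16 and all function names are illustrative assumptions, not VerA's documented configuration. In a real system the input rows would be MFCC frames that passed the VAD.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """Log density of each frame under each diagonal Gaussian.
    X: (T, D); means, variances: (K, D). Returns (T, K)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                     # (T, K, D)
    return -0.5 * (D * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def responsibilities(X, weights, means, variances):
    """Component posteriors (T, K) and per-frame log-likelihoods (T,)."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_norm = np.logaddexp.reduce(log_p, axis=1)
    return np.exp(log_p - log_norm[:, None]), log_norm


def train_ubm(X, n_components=8, n_iter=20, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM (the UBM) with plain EM."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        gamma, _ = responsibilities(X, weights, means, variances)
        n_k = gamma.sum(axis=0) + 1e-10                          # soft counts
        weights = n_k / n_k.sum()
        means = gamma.T @ X / n_k[:, None]
        variances = gamma.T @ (X ** 2) / n_k[:, None] - means ** 2
        variances = np.maximum(variances, var_floor)             # avoid collapse
    return weights, means, variances


def map_adapt_means(X, ubm, relevance=16.0):
    """Derive a speaker model by MAP-adapting only the UBM means."""
    weights, means, variances = ubm
    gamma, _ = responsibilities(X, weights, means, variances)
    n_k = gamma.sum(axis=0)
    ex_k = gamma.T @ X / np.maximum(n_k, 1e-10)[:, None]         # data means
    alpha = (n_k / (n_k + relevance))[:, None]                   # adaptation weight
    return weights, alpha * ex_k + (1.0 - alpha) * means, variances


def llr_score(X, speaker, ubm):
    """Average per-frame log-likelihood ratio of speaker model vs. UBM."""
    _, ll_spk = responsibilities(X, *speaker)
    _, ll_ubm = responsibilities(X, *ubm)
    return float(np.mean(ll_spk - ll_ubm))
```

A real UBM has many more components (often several hundred to a few thousand) and is trained on speech from many speakers; scikit-learn's `GaussianMixture` with `covariance_type="diag"` could take the place of `train_ubm`.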

Preliminary results on the Kiel corpus

Preliminary results on militarily relevant audio data

Comparison to other systems

High-level features I
High-level features …
• … rely on linguistic content, or are calculated on segments of the signal longer than the roughly 20 ms normally used in frame-based approaches
• … may, for example, use prosodic, phonetic or idiolectal information
• … provide information complementary to acoustically motivated features such as MFCCs
• … should therefore be used in addition to acoustic features, not instead of them
• … are relatively robust to distortions
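One common way to use high-level features in addition to acoustic ones is score-level fusion: each subsystem scores a trial separately, and the normalised scores are then combined. A minimal sketch, assuming a linear combination with a hand-set weight; the weight of 0.7 and the function names are hypothetical, and in practice such weights are tuned on development data (for example with logistic regression).

```python
import numpy as np


def normalise(scores, dev_scores):
    """Map raw scores of one subsystem to zero mean and unit variance,
    using statistics estimated on a separate development set."""
    dev = np.asarray(dev_scores, dtype=float)
    return (np.asarray(scores, dtype=float) - dev.mean()) / dev.std()


def fuse(acoustic, high_level, acoustic_dev, high_level_dev, weight=0.7):
    """Linear score-level fusion of an acoustic system (e.g. MFCC/GMM-UBM)
    and a high-level system (e.g. F0 statistics). `weight` is the share
    given to the acoustic system; 0.7 is an arbitrary placeholder."""
    a = normalise(acoustic, acoustic_dev)
    h = normalise(high_level, high_level_dev)
    return weight * a + (1.0 - weight) * h
```

Normalising first matters because the two subsystems usually produce scores on very different scales; without it, the weight would mostly compensate for scale rather than express how much each system is trusted.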

High-level features II
• Goal: pick a high-level feature that does not require a speech recognizer
• High-level features under consideration:
  • F0 statistics as proposed by Reynolds et al. (2002) and Rose (2002)
  • Formant statistics (Becker et al. 2008)
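F0 statistics can be computed directly from the signal, without a speech recognizer. The sketch below uses a simple autocorrelation pitch estimate on 40 ms frames (50 % overlap) and summarises the voiced frames by mean, standard deviation and median. The pitch range, voicing threshold, frame length and function names are assumptions, and this is not the specific feature set of Reynolds et al. or Rose; a robust pitch tracker such as RAPT or YIN would normally replace `frame_f0`.

```python
import numpy as np


def frame_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0, voicing=0.3):
    """Estimate F0 of one frame from the autocorrelation peak in the lag
    range [sample_rate / f0_max, sample_rate / f0_min]. Returns None when
    the normalised peak is below `voicing` (frame treated as unvoiced)."""
    x = frame - frame.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]        # lags 0 .. N-1
    if ac[0] <= 0:                                           # silent frame
        return None
    lo = int(sample_rate / f0_max)
    hi = min(int(sample_rate / f0_min), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    if ac[lag] / ac[0] < voicing:
        return None
    return sample_rate / lag


def f0_statistics(signal, sample_rate, frame_ms=40.0):
    """Utterance-level F0 statistics over voiced frames, or None if no
    frame is voiced."""
    frame_len = int(sample_rate * frame_ms / 1000)
    estimates = [frame_f0(signal[i:i + frame_len], sample_rate)
                 for i in range(0, len(signal) - frame_len + 1, frame_len // 2)]
    f0 = np.array([f for f in estimates if f is not None])
    if f0.size == 0:
        return None
    return {"mean_hz": float(f0.mean()), "std_hz": float(f0.std()),
            "median_hz": float(np.median(f0)), "voiced_frames": int(f0.size)}
```

Because these statistics summarise a whole utterance rather than a single 20 ms frame, they are an example of the longer-span information described under high-level features.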

Different degrees of vocal effort
• Problem: for several speech processing tasks, recognition performance degrades when speech produced with high vocal effort is used without additional training (Becker et al. 2008).
• The goal is either
  • to find features for speaker recognition that are robust to both normal and high vocal effort, or
  • to find a method that predicts how acoustic features change under raised vocal effort.

References
• D. Reynolds et al., “SuperSID Project Final Report: Exploiting High-Level Information for High-Performance Speaker Recognition,” Department of Defense; National Science Foundation, 2002.
• P. Rose, Forensic Speaker Identification. Taylor & Francis, 2002.
• T. Becker, M. Jessen, and C. Grigoras, “Forensic Speaker Verification Using Formant Features and Gaussian Mixture Models,” 9th Annual Conference of the International Speech Communication Association, 2008.
