Robust HMM classification schemes for speaker recognition using integral decode

Marie Roch, Florida International University

Who am I?

Speaker Recognition • Types of speaker recognition 

Speaker Recognition • Why is it hard? • Minimal training data • Background noise • Transducer mismatch • Channel distortions • People’s voices change over time and under stress • Performance

Feature Extraction • Extract speech • Spectral analysis • Cepstrum: • Cepstral mean removal
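A minimal sketch of the cepstrum and cepstral mean removal steps, assuming NumPy, a Hamming window, 256-sample frames, and 12 retained coefficients (none of these values come from the slides). A stationary channel is roughly an additive constant in the cepstral domain, which is why subtracting the per-utterance mean helps.

```python
import numpy as np

def real_cepstrum(frame, n_coeffs=12):
    """Real cepstrum of one windowed frame: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # small floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    return cepstrum[1:n_coeffs + 1]  # dropping c0 (overall energy) is an assumption

def cepstral_mean_removal(cepstra):
    """Subtract the per-coefficient mean taken over all frames (rows) of an utterance."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Illustrative input: random noise frames standing in for speech.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 256)) * np.hamming(256)
cepstra = np.array([real_cepstrum(f) for f in frames])
normalized = cepstral_mean_removal(cepstra)
```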

Hidden Markov Models • Statistical pattern recognition • State dependent modeling • Distribution/state • Radial basis functions common • State sequence unobservable

HMM • Efficient decoders: • Training • EM algorithm • Convergence to local maxima guaranteed
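The slide's list of decoders did not survive in the transcript. As an illustration only, the forward algorithm is one standard efficient way to score an observation sequence against an HMM. The sketch below works in the log domain and assumes the per-frame state log densities have already been computed; all names are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(log_init, log_trans, log_emit):
    """Return log P(observations | model) using the forward recursion.

    log_init[j]     : log initial probability of state j
    log_trans[i][j] : log transition probability from state i to state j
    log_emit[t][j]  : log density of observation t under state j
    """
    n_states = len(log_init)
    alpha = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    for t in range(1, len(log_emit)):
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + log_emit[t][j]
            for j in range(n_states)
        ]
    return logsumexp(alpha)
```

The cost is proportional to the number of frames times the square of the number of states, instead of enumerating every state sequence.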

Recognition • Model for each speaker • Maximum a posteriori (MAP) decision rule • [Diagram: features are scored against each speaker model; the arg max of the scores gives the decision]
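The decision rule can be sketched as an arg max over per-speaker scores. The function name, the dictionary layout, and the uniform default prior are assumptions made for illustration; the scores are made-up numbers, not results from the talk.

```python
import math

def map_decision(log_likelihoods, priors=None):
    """Pick the speaker with the largest posterior.

    log_likelihoods : dict mapping speaker -> log P(features | speaker model)
    priors          : dict mapping speaker -> P(speaker); uniform if omitted
    """
    speakers = list(log_likelihoods)
    if priors is None:
        priors = {s: 1.0 / len(speakers) for s in speakers}
    # The evidence term P(features) is the same for every speaker, so it is dropped.
    return max(speakers, key=lambda s: log_likelihoods[s] + math.log(priors[s]))

scores = {"alice": -120.5, "bob": -118.2, "carol": -125.0}
winner = map_decision(scores)
```

With uniform priors this reduces to picking the model with the highest likelihood.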

The MAP decision rule • Optimal decision rule provided we have accurate distribution parameters & observations. • Problem: • Corruption of feature vectors. • Distribution known to be inaccurate.

A case of mistaken identity

Integral decode • Goal: Include the uncorrupted observation ô_t. • Problem: ô_t is unobservable. • Determine a local neighborhood about o_t and use a priori information to weight the likelihood:
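The slide's equation is not preserved in the transcript. One form consistent with the bullets, written with assumed symbols (Ω_t for the neighborhood of o_t, w for the a priori weight, b_j for the output density of state j), would replace the point likelihood with a weighted integral. This is a sketch of the idea, not necessarily the author's exact formulation.

```latex
\tilde{b}_j(o_t) \;=\; \int_{\Omega_t} w(\hat{o} \mid o_t)\, b_j(\hat{o})\, d\hat{o}
```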

Integral decode issues • Problems approximating the integral • High frame rate × number of models • Non-trivial dimensionality • Selection of the neighborhood

Approximating the integral • Monte Carlo impractical • Use simplified cubature technique:
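The cubature rule itself is missing from the transcript, so the sketch below substitutes a simple stand-in: evaluate the state density at the observation and at one point on either side along each axis (2d + 1 points), weighting each point by an assumed zero-mean Gaussian error model. It shows why a small fixed point set is far cheaper than Monte Carlo sampling; the rule, names, and parameters are all assumptions.

```python
import math

def gaussian_density(x, mean, var):
    """Diagonal-covariance Gaussian density evaluated at point x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def star_points(center, radius):
    """The center plus one point at +/- radius along each axis: 2d + 1 points."""
    points = [list(center)]
    for axis in range(len(center)):
        for sign in (-1.0, 1.0):
            p = list(center)
            p[axis] += sign * radius
            points.append(p)
    return points

def integral_score(obs, radius, noise_var, state_mean, state_var):
    """Weighted average of the state density over the star points.

    Weights come from an assumed zero-mean Gaussian error model centered on obs
    and are normalized to sum to one.
    """
    points = star_points(obs, radius)
    weights = [gaussian_density(p, obs, [noise_var] * len(obs)) for p in points]
    total = sum(weights)
    return sum(w / total * gaussian_density(p, state_mean, state_var)
               for w, p in zip(weights, points))
```

When the observation lies far from a state's mean, the neighborhood points nearer the mean raise the score relative to the single-point density, which is the intended softening effect.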

Neighborhood choice • Choosing an appropriate neighborhood: • Upper bound difference neighborhoods [Merhav and Lee 93] • Error source modeling

Upper bound difference neighborhoods • Arbitrary signal pairs with a few general conditions. • PSD • Cepstra

Taking the upper bound • Asymptotic difference between cepstral parameters:

Error source modeling • Multiple error sources • Simplifying assumption of one normal distribution with zero mean • Use time series analysis to estimate the noise • Trend

Error Source Modeling • Estimate variance from detrended signal
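A minimal sketch of estimating the noise variance from a detrended signal, assuming NumPy and a polynomial (here linear) trend fitted to a single feature track. The slides do not say how the trend is estimated, so the polynomial fit is an assumption.

```python
import numpy as np

def detrended_variance(track, degree=1):
    """Fit a polynomial trend to one feature track and return the residual variance."""
    t = np.arange(len(track))
    trend = np.polyval(np.polyfit(t, track, degree), t)
    residual = track - trend
    # ddof accounts for the degree + 1 fitted trend coefficients.
    return residual.var(ddof=degree + 1)

# Synthetic example: a linear trend plus zero-mean noise with variance 0.25.
rng = np.random.default_rng(1)
t = np.arange(2000)
track = 0.01 * t + 3.0 + rng.normal(0.0, 0.5, size=t.size)
noise_var = detrended_variance(track)
```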

Error source modeling • Problem: • The neighborhood implied by a normal error distribution is infinite • Solution: • Most points in that region are outliers • Set a percentage of the distribution beyond which points are culled.
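The culling step can be sketched with the inverse normal CDF from Python's standard library. The 95% coverage value and the function name are illustrative assumptions, not values from the slides.

```python
from statistics import NormalDist

def neighborhood_half_width(sigma, coverage=0.95):
    """Half-width r such that a zero-mean normal with standard deviation sigma
    has `coverage` of its mass inside [-r, r]; points beyond r are culled."""
    tail = (1.0 - coverage) / 2.0
    return NormalDist(mu=0.0, sigma=sigma).inv_cdf(1.0 - tail)

r = neighborhood_half_width(1.0, coverage=0.95)
```

Raising the coverage widens the neighborhood and increases the integration cost, so the percentage acts as a tunable hyperparameter.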

Complexity of integration • Expensive • Ways to reduce/cope • Implemented • Top K processing • Principal Component Analysis • Possible • Gaussian Selection • Sub-band Models • SIMD or MIMD parallelism

Top K Processing • [Figure: results for 1-second, 3-second, and 5-second test segments]
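The slides do not spell out how Top K processing is applied. One plausible reading, sketched here as an assumption, is to rank all speaker models with a cheap conventional score and spend the expensive integral decode only on the K best. The function names and scores are hypothetical.

```python
def top_k_prune(cheap_scores, k):
    """Return the k speakers with the highest cheap (first-pass) scores."""
    ranked = sorted(cheap_scores, key=cheap_scores.get, reverse=True)
    return ranked[:k]

def top_k_then_rescore(cheap_scores, expensive_score, k):
    """Rescore only the top-k survivors with the expensive scorer and pick the best."""
    survivors = top_k_prune(cheap_scores, k)
    rescored = {speaker: expensive_score(speaker) for speaker in survivors}
    return max(rescored, key=rescored.get)

# Made-up first-pass log scores for four speakers.
cheap = {"a": -10.0, "b": -12.0, "c": -30.0, "d": -11.0}
survivors = top_k_prune(cheap, 2)
```

With K much smaller than the number of enrolled speakers, the expensive scorer runs on only a fraction of the models.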

Principal Component Analysis • Choose P most important directions

Principal Component Analysis • Integrate using new basis set for step function
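A minimal sketch of choosing the P most important directions, assuming NumPy and an eigendecomposition of the feature covariance. Placing integration points along the retained directions only is one way to read "integrate using new basis set"; that reading, and every name below, is an assumption.

```python
import numpy as np

def principal_directions(features, p):
    """Return the p covariance eigenvectors with the largest eigenvalues, as columns."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:p]
    return eigvecs[:, order]

def pca_star_points(center, basis, radius):
    """Center plus +/- radius along each retained direction: 2p + 1 points."""
    points = [center]
    for k in range(basis.shape[1]):
        points.append(center + radius * basis[:, k])
        points.append(center - radius * basis[:, k])
    return np.array(points)

# Synthetic features whose variance is concentrated in the first dimension.
rng = np.random.default_rng(2)
features = rng.standard_normal((500, 4)) * np.array([10.0, 1.0, 0.5, 0.1])
basis = principal_directions(features, p=2)
points = pca_star_points(np.zeros(4), basis, radius=1.0)
```

Reducing from d to P directions cuts the point count from 2d + 1 to 2P + 1 per frame in this scheme.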

Speech Corpus • King-92 • Used San Diego subset • 26 male speakers • Long distance telephone speech • Quiet room environment • 5 sessions recorded one week apart • Sessions 1-3 used for training • Sessions 4-5 partitioned into test segments

Baseline performance

Integral decode performance • [Figure: results for 1-second, 3-second, and 5-second test segments]

Integral decode with other conditions • Performance on • high quality speech • transducer mismatch

Future work • Extensions to the integral decode • Automatic parameter selection • Gaussian selection • Distributed computation • Efficient multiple class preclassifiers

Optimal/utterance hyperparameters, 5 seconds • [Figure: results for KingNB26, KingWB51, SpidreF18XDR, and SpidreM27XDR]

95% Confidence Intervals • Caveat: • Per speaker means • Large granularity
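A minimal sketch of a confidence interval computed over per-speaker means, using a normal approximation from Python's standard library. With a small number of speakers a Student t interval would be somewhat wider, and the error rates below are made-up illustrative values, not results from the talk.

```python
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(per_speaker_values, level=0.95):
    """Normal-approximation confidence interval for the mean of per-speaker values."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    center = mean(per_speaker_values)
    half_width = z * stdev(per_speaker_values) / len(per_speaker_values) ** 0.5
    return center - half_width, center + half_width

# Hypothetical per-speaker error rates.
errors = [0.1, 0.2, 0.3, 0.2, 0.2]
low, high = mean_confidence_interval(errors)
```

Because each speaker contributes a single mean, the interval is driven by the number of speakers rather than the number of test segments, which is one way to read the "large granularity" caveat.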

Pattern Recognition • Long term statistics [Bricker et al 71, Markel et al 77] • Vector Quantization [Soong et al 87] • HMM [Rosenberg et al 90, Tishby 91, Matsui & Furui 92, Reynolds et al 95] • Connectionist frameworks • Feed forward [Oglesby & Mason 90] • Learning vector quantization [He et al 99]

Pattern Recognition Contd. • Hybrid/Modified HMMs • Min Classification Error discriminant [Liu et al 95] • Tree structured neural classifiers [Liou & Mammone 95] • Trajectory modeling [Russell et al 85, Liu et al 95, Ostendorf et al 96, He et al 99] • Sub-band recognition [Besacier & Bonastre 97]
