SPEAKER RECOGNITION A PRESENTATION BY SHAMALEE DESHPANDE
INTRODUCTION • Speaker Recognition * Automatically recognizing who is speaking * Uses speaker-specific information contained in the speaker’s speech waves
INTRODUCTION • Two Approaches • Text-Dependent Recognition * Uses keywords or sentences with the same text for both the templates and the recognition • Text-Independent Recognition * Does not rely on a specific text being spoken
INTRODUCTION • Classes of Sound: Voiced, unvoiced, Plosive • Production of Pitch Frequency and Formants Glottal Waveform
DESIRABLE ATTRIBUTES OF A SPEAKER RECOGNITION SYSTEM • Features should occur naturally and frequently in speech • Be easily measurable • Not change over time or be affected by the speaker’s health • Not be affected by background noise • Not be subject to mimicry
SOURCES OF VARIABILITY IN SPEECH • Phonetic identity: two samples may correspond to different phonetic segments, e.g. a vowel and a fricative • Pitch: pitch and other features such as breathiness and amplitude can be varied • Speaker: differences due to source physiology and emotions • Microphone • Environment
Possible Acoustic Parameters * Formant Frequencies * LPC * Pitch * Nasal Coarticulation * Gain
COMMON SPEAKER RECOGNITION TECHNIQUES • DISCRETE FOURIER TRANSFORM • LINEAR PREDICTIVE CODING • CEPSTRAL ANALYSIS • DYNAMIC TIME WARPING • HIDDEN MARKOV MODELS
DISCRETE / FAST FOURIER TRANSFORM • Changes time-domain signals into frequency-domain signal representations • The FFT computes the DFT with reduced computational complexity for the processor • Processing steps (flowchart): read N speech samples from input; apply windowing; append N-L zeroes to the input data; calculate the DFT
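The flowchart steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the presenter's implementation: the function name `windowed_spectrum`, the parameter `n_fft`, the Hamming window, and the synthetic 1 kHz test tone are all assumptions, since the slide names neither a window type nor concrete sizes.

```python
import numpy as np

def windowed_spectrum(frame, n_fft):
    """Magnitude spectrum of one speech frame: window, zero-pad, then DFT."""
    frame = np.asarray(frame, dtype=float)
    if n_fft < len(frame):
        raise ValueError("n_fft must be at least the frame length")
    # Taper the frame edges (Hamming is an assumed choice of window).
    windowed = frame * np.hamming(len(frame))
    # Append zeros so the frame length matches the transform size.
    padded = np.pad(windowed, (0, n_fft - len(frame)))
    # The FFT computes the same DFT in O(N log N) instead of O(N^2).
    return np.abs(np.fft.rfft(padded))

# Hypothetical input: a 200-sample frame of a 1 kHz tone sampled at 8 kHz.
fs = 8000
n_fft = 256
t = np.arange(200) / fs
spectrum = windowed_spectrum(np.sin(2 * np.pi * 1000 * t), n_fft)
peak_hz = np.argmax(spectrum) * fs / n_fft  # frequency of the strongest bin
```

The strongest bin should fall at or next to 1 kHz; the bin spacing `fs / n_fft` sets how finely the frequency axis is sampled.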
LINEAR PREDICTIVE CODING • LPC model of the speech-producing organs of the body • BUZZER: glottal excitation, characterized by intensity and pitch • TUBE: vocal tract, characterized by formants
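One common way to estimate the "tube" part of this model is the autocorrelation method with the Levinson-Durbin recursion. The slide does not say which estimation method is meant, so the sketch below is an assumption, as are the function name `lpc` and the synthetic second-order test signal.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients by the autocorrelation method (Levinson-Durbin).

    Returns (a, err) with a[0] == 1. The predictor is
    s_hat[n] = -sum(a[k] * s[n - k] for k in 1..order),
    and err is the remaining prediction-error energy.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Hypothetical test signal: white noise (the "buzzer" stand-in) passed through
# a known two-pole filter (the "tube"): s[n] = 1.3 s[n-1] - 0.6 s[n-2] + e[n].
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
s = np.zeros(5000)
for n in range(2, 5000):
    s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2] + e[n]
a, err = lpc(s, order=2)
```

With enough samples, `a[1]` and `a[2]` should come out close to -1.3 and 0.6, the coefficients of the filter that shaped the noise.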
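CEPSTRAL ANALYSIS • A disadvantage of DFT/FFT analysis is that the formant structure and the pitch harmonics overlap in the spectrum • Cepstral analysis separates the formant (vocal tract) contribution from the pitch (excitation) contribution • Defined as the inverse Fourier transform of the log of the power spectrum • s(n) = p(n) * v(n), where * denotes convolution of the excitation p(n) with the vocal tract response v(n) • x(n) = w(n) · s(n), the windowed signal • S’(ω) = p’(ω) · v’(ω) after the Fourier transform • log |S’(ω)| = log |p’(ω)| + log |v’(ω)| • C(q) = IDFT{log |S’(ω)|} = IDFT{log |p’(ω)|} + IDFT{log |v’(ω)|} • q – quefrency, C(q) – cepstrum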
CEPSTRAL ANALYSIS • Block diagram: Speech → Window → DFT → LOG → IDFT → Cepstrum
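The Window, DFT, LOG, IDFT chain can be sketched as follows. This is an illustrative version that computes the real cepstrum from the log magnitude spectrum; the function name `real_cepstrum`, the Hamming window, the transform size, and the synthetic "voiced" signal (an impulse train with an 80-sample period smoothed by a simple decaying response) are all assumptions.

```python
import numpy as np

def real_cepstrum(frame, n_fft=2048):
    """Real cepstrum of one frame: window -> DFT -> log magnitude -> inverse DFT."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.fft(windowed, n_fft)      # zero-pads the frame to n_fft
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small floor avoids log(0)
    return np.fft.ifft(log_mag).real

# Hypothetical voiced frame: excitation p(n) is an impulse train with a period
# of 80 samples (100 Hz at 8 kHz); v(n) is a stand-in vocal tract response.
n_samples = 800
p = np.zeros(n_samples)
p[::80] = 1.0
v = 0.9 ** np.arange(50)
s = np.convolve(p, v)[:n_samples]

c = real_cepstrum(s)
# Search quefrencies 20..159, i.e. pitch above 50 Hz and up to 400 Hz at 8 kHz.
lo, hi = 20, 160
pitch_period = lo + int(np.argmax(c[lo:hi]))
```

Because the log turns the product of the two spectra into a sum, the slowly varying vocal tract part stays at low quefrencies while the excitation shows up as a peak near the pitch period, which is what the search over `c[lo:hi]` looks for.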
DYNAMIC TIME WARPING • Incoming speech is usually compared frame by frame with a stored template • Achieved via a pairwise comparison of feature vectors from each sequence • Disadvantage of frame-by-frame comparison: corresponding phonemes vary in length • DTW takes into account the non-linear relation between the lengths of the two signals • Used as a matching algorithm • [Figure: example DTW grid]
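A minimal sketch of the DTW grid computation, assuming Euclidean distance between feature vectors and the basic step pattern (match, insertion, deletion); the slide does not specify either, and the function name `dtw_distance` and the toy sequences are hypothetical.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best non-linear alignment of two sequences."""
    a = np.asarray(seq_a, dtype=float)
    b = np.asarray(seq_b, dtype=float)
    # Treat 1-D input as a sequence of one-dimensional feature vectors.
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # cost[i, j] = best accumulated cost aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # stretch sequence b
                                 cost[i, j - 1],      # stretch sequence a
                                 cost[i - 1, j - 1])  # advance both
    return cost[n, m]

# Hypothetical template and a slower utterance of the same "word".
template = [1, 2, 3, 4]
slow = [1, 1, 2, 2, 3, 3, 4, 4]
same_word = dtw_distance(template, slow)
other_word = dtw_distance(template, [4, 3, 2, 1])
```

The slow utterance has twice as many frames as the template, so a frame-by-frame comparison is not even defined, yet DTW aligns the two at zero cost, while the reversed sequence scores a clearly larger distance.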
HIDDEN MARKOV MODELS • Speech signal is identified during the search process rather than explicitly • Comprises: * a hidden Markov chain representing temporal variability * an observable process representing spectral variability • Portrayed as a stochastic pair (X, Y) • An HMM is a finite state machine in which a probability density function p(x|s) is associated with each state s
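To make the pairing of a hidden chain with per-state densities p(x|s) concrete, here is a small sketch of the forward algorithm, which scores an observation sequence against a model. It assumes one-dimensional Gaussian densities per state; the function names, the two toy "speaker" models, and all parameter values are hypothetical and not taken from the slides.

```python
import numpy as np

def log_gaussian(x, mean, var):
    """Log of a 1-D Gaussian density p(x|s), evaluated for every state at once."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def hmm_log_likelihood(obs, log_pi, log_A, means, variances):
    """Forward algorithm in the log domain: log p(obs | model)."""
    # Initialisation: start probability times emission density per state.
    alpha = log_pi + log_gaussian(obs[0], means, variances)
    for x in obs[1:]:
        # alpha_new[j] = logsum_i(alpha[i] + log_A[i, j]) + log p(x | j)
        alpha = (np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
                 + log_gaussian(x, means, variances))
    # Sum over all possible final states.
    return np.logaddexp.reduce(alpha)

# Two hypothetical two-state speaker models sharing the same Markov chain.
log_pi = np.log(np.array([0.5, 0.5]))
log_A = np.log(np.array([[0.9, 0.1],
                         [0.1, 0.9]]))
variances = np.array([1.0, 1.0])
means_a = np.array([0.0, 1.0])   # state means for speaker A
means_b = np.array([5.0, 6.0])   # state means for speaker B

obs = np.array([0.1, -0.2, 0.9, 1.1])  # toy feature sequence
score_a = hmm_log_likelihood(obs, log_pi, log_A, means_a, variances)
score_b = hmm_log_likelihood(obs, log_pi, log_A, means_b, variances)
```

In an identification setting, the speaker whose model yields the higher score would be chosen; here the observations lie near speaker A's state means, so `score_a` exceeds `score_b`.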
FUTURE RESEARCH • To extract and apply all levels of information in the speech signal that convey speaker identity • Acoustic – use spectral features conveying vocal tract information • Prosodic – use features derived from pitch and energy tracks to classify information • Phonetic – use phone sequences to characterize speaker-specific pronunciations • Idiolect – use words to characterize speaker-specific word patterns • Linguistic – use linguistic patterns to characterize speaker-specific conversation style
APPLICATIONS • Access Control – physical facilities, computer networks and websites • PC Login and Password Reset • Secured Transactions – remote banking and online credit card purchase authentication • Time and Attendance – workplaces • Law Enforcement – forensics, parole