SPEAKER RECOGNITION

Presentation Transcript


  1. SPEAKER RECOGNITION A PRESENTATION BY SHAMALEE DESHPANDE

  2. INTRODUCTION • Speaker Recognition * Automatically recognizing who is speaking * Uses individual information contained in the speaker’s speech waves

  3. INTRODUCTION • Two Approaches Text-Dependent Recognition Text-Independent Recognition

  4. INTRODUCTION • Two Approaches Text-Dependent Recognition *Uses keywords or sentences with the same text for both the templates and recognition Text-Independent Recognition

  5. INTRODUCTION • Two Approaches Text-Dependent Recognition Text-Independent Recognition *Does not rely on a specific text being spoken.

  6. INTRODUCTION • Classes of sound: voiced, unvoiced, plosive • Production of pitch frequency and formants • Glottal waveform

  7. BLOCK DIAGRAM OF A SPEAKER RECOGNITION SYSTEM

  8. DESIRABLE ATTRIBUTES OF A SPEAKER RECOGNITION SYSTEM • The feature should occur naturally and frequently in speech • Is easily measurable • Does not change over time and is not affected by the speaker’s health • Is not affected by background noise • Is not subject to mimicry

  9. SOURCES OF VARIABILITY IN SPEECH • Phonetic identity: two samples may correspond to different phonetic segments, e.g. a vowel and a fricative • Pitch: pitch and other features such as breathiness and amplitude can be varied by the speaker • Speaker: differences due to source physiology and emotions • Microphone • Environment

  10. Possible Acoustic Parameters * Formant Frequencies * LPC * Pitch * Nasal Coarticulation * Gain
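
Slide 10 lists pitch as a candidate parameter. One common way to estimate it, shown here as a sketch and not necessarily the method the presentation has in mind, is to pick the lag that maximizes the short-time autocorrelation within a plausible range of pitch periods. The 50 to 400 Hz search range and the names `autocorr` and `estimate_pitch` are assumptions for illustration.

```python
import math
from typing import Sequence


def autocorr(x: Sequence[float], lag: int) -> float:
    """Short-time autocorrelation r[lag] = sum_n x[n] * x[n + lag]."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def estimate_pitch(frame: Sequence[float], sample_rate: float,
                   f0_min: float = 50.0, f0_max: float = 400.0) -> float:
    """Return an F0 estimate in Hz from the best-correlating lag (assumed search range)."""
    min_lag = int(sample_rate / f0_max)  # shortest period considered
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)  # longest period considered
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: autocorr(frame, k))
    return sample_rate / best_lag
```

For example, a 200 Hz sine sampled at 8 kHz has a 40-sample period, so the estimator returns 8000 / 40 = 200.0. Real voiced speech is not a pure tone, so practical pitch trackers add voicing decisions and guards against picking a multiple of the true period.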

  11. COMMON SPEAKER RECOGNITION TECHNIQUES • DISCRETE FOURIER TRANSFORM • LINEAR PREDICTIVE CODING • CEPSTRAL ANALYSIS • DYNAMIC TIME WARPING • HIDDEN MARKOV MODELS

  12. DISCRETE / FAST FOURIER TRANSFORM • Converts time-domain signals into frequency-domain representations • The FFT computes the DFT in O(N log N) instead of O(N²) operations, reducing the load on the processor • Flowchart blocks: Read N speech samples from input; Append N-L zeroes to the input data; Calculation of DFT; Windowing
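
The flowchart blocks on this slide can be sketched in Python as below. This is a minimal sketch, not the presenter's implementation: the Hamming window, the choice to window the L real samples before zero-padding to N, and the helper names (`hamming`, `dft`, `fft`, `frame_spectrum`) are assumptions for illustration. The naive DFT is included only to check the radix-2 FFT against it.

```python
import cmath
import math
from typing import List, Sequence


def hamming(length: int) -> List[float]:
    """Hamming window w(n) = 0.54 - 0.46 cos(2*pi*n / (L - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(N^2) DFT: X[k] = sum_n x[n] exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT, O(N log N); len(x) must be a power of two."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("FFT length must be a power of two")
    if N == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out


def frame_spectrum(samples: Sequence[float], n_fft: int) -> List[complex]:
    """Window L samples, append n_fft - L zeros, then take the FFT."""
    L = len(samples)
    if L > n_fft:
        raise ValueError("frame is longer than the FFT size")
    windowed = [s * w for s, w in zip(samples, hamming(L))]
    padded = windowed + [0.0] * (n_fft - L)  # zero-padding to length N
    return fft(padded)
```

Zero-padding does not add information; it samples the same spectrum on a finer frequency grid and lets the FFT run on a power-of-two length.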

  13. LINEAR PREDICTIVE CODING • LPC model of the speech-producing organs of the body: a BUZZER (glottal excitation), characterized by intensity and pitch, drives a TUBE (vocal tract), characterized by formants
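
LPC represents the tube as an all-pole filter whose coefficients predict each sample from the previous p samples. A minimal sketch of the autocorrelation method with the Levinson-Durbin recursion follows. It assumes the prediction convention s_hat(n) = a_1 s(n-1) + ... + a_p s(n-p), skips the windowing a real front end would apply, and uses hypothetical function names.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations for a_1..a_p.

    Assumed convention: s_hat(n) = sum_{k=1..p} a_k * s(n - k).
    Returns (coefficients, final prediction-error energy).
    """
    a: List[float] = []  # a[j] holds a_{j+1} for the current model order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1.0 - k * k
    return a, err


def lpc(frame: Sequence[float], order: int) -> Tuple[List[float], float]:
    return levinson_durbin(autocorrelation(frame, order), order)
```

The recursion solves the Toeplitz system in O(p²) operations instead of the O(p³) of a general solver, and the reflection coefficients it produces relate to the tube's cross-sectional areas in the lossless-tube interpretation.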

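  14. CEPSTRAL ANALYSIS • Disadvantage of the DFT/FFT: formant frequencies and pitch harmonics overlap in the spectrum, so they are hard to separate • In cepstral analysis, the pitch (excitation) and formant (vocal tract) contributions are separated into different quefrency regions • Defined as the Fourier transform of the log of the power spectrum • $s(n) = p(n) * v(n)$ (excitation $p$ convolved with vocal-tract response $v$) • $x(n) = w(n)\, s(n)$ (windowed frame) • Fourier transform: $S(\omega) = P(\omega)\, V(\omega)$ • $\log S(\omega) = \log P(\omega) + \log V(\omega)$ • $c(q) = \mathcal{F}^{-1}\{\log S(\omega)\} = \mathcal{F}^{-1}\{\log P(\omega)\} + \mathcal{F}^{-1}\{\log V(\omega)\}$ • $q$: quefrency, $c(q)$: complex cepstrum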
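
A small worked example shows why taking the log separates the two components. It assumes, purely for illustration, a two-tap vocal-tract response $v(n) = \delta(n) + a\,\delta(n-1)$ with $|a| < 1$ and a decaying pulse-train excitation $p(n) = \sum_{m \ge 0} \beta^m \delta(n - m N_p)$ with $0 < \beta < 1$ and pitch period $N_p$; real speech is not this simple.

```latex
V(\omega) = 1 + a e^{-j\omega}, \qquad
\log V(\omega) = \sum_{k \ge 1} \frac{(-1)^{k+1} a^k}{k}\, e^{-j\omega k}
\;\Rightarrow\; c_v(q) = \frac{(-1)^{q+1} a^q}{q} \ (q \ge 1), \quad c_v(q) = 0 \ (q \le 0)

P(\omega) = \frac{1}{1 - \beta e^{-j\omega N_p}}, \qquad
\log P(\omega) = \sum_{k \ge 1} \frac{\beta^k}{k}\, e^{-j\omega k N_p}
\;\Rightarrow\; c_p(k N_p) = \frac{\beta^k}{k} \ (k \ge 1), \quad c_p(q) = 0 \ \text{otherwise}

c(q) = c_p(q) + c_v(q)
```

The vocal-tract term decays like $a^q/q$ and is confined to low quefrencies, while the excitation contributes only at multiples of the pitch period $N_p$. Keeping the low-quefrency coefficients (liftering) therefore retains formant information, and the first peak at $q = N_p$ gives the pitch period.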

  15. CEPSTRAL ANALYSIS • Block diagram: Speech → Window → DFT → LOG → IDFT → Cepstrum
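
The Window → DFT → LOG → IDFT chain on this slide can be sketched as follows. One assumption to flag: the sketch takes the log of the magnitude spectrum, which yields the real cepstrum rather than the complex cepstrum named on slide 14 (the complex version also needs the unwrapped phase). The function names and the `floor` guard against log(0) are hypothetical.

```python
import cmath
import math
from typing import List, Optional, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Direct DFT (or inverse DFT with 1/N scaling)."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out


def real_cepstrum(frame: Sequence[float], window: Optional[Sequence[float]] = None,
                  floor: float = 1e-12) -> List[float]:
    """Window -> DFT -> log|.| -> IDFT; returns c(q) for q = 0..N-1."""
    if window is not None:
        frame = [s * w for s, w in zip(frame, window)]
    spectrum = dft(frame)
    log_mag = [math.log(max(abs(X), floor)) for X in spectrum]  # floor avoids log(0)
    return [c.real for c in dft(log_mag, inverse=True)]
```

For the two-tap signal [1.0, 0.5] padded to 64 samples, the coefficients fall off quickly with quefrency (about 0.25 at q = 1 and -0.0625 at q = 2), which is why low-quefrency cepstral coefficients are used as vocal-tract features while a voiced frame's pitch period shows up as a peak at higher quefrency.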

  16. DYNAMIC TIME WARPING • Incoming speech is usually compared frame by frame with a stored template • Achieved via a pairwise comparison of feature vectors from each sequence • Disadvantage: corresponding phonemes vary in length, so a rigid frame-by-frame comparison misaligns them • DTW takes into account the nonlinear relation between the time axes of the two signals • Used as a matching algorithm • Example DTW grid
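
A minimal dynamic-programming sketch of DTW follows. It assumes each frame is a feature vector (a tuple of floats) and uses Euclidean distance as the local cost; the slide does not specify a distance or step pattern, so the symmetric three-way step (diagonal match, stretch either sequence) is an assumption. `dtw_distance` and `closest_template` are hypothetical names.

```python
import math
from typing import Dict, Sequence, Tuple

Frame = Tuple[float, ...]


def dtw_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Cumulative cost of the best nonlinear alignment between two frame sequences."""
    n, m = len(x), len(y)
    # D[i][j] = minimal cost of aligning x[:i] with y[:j]; row/column 0 are boundary cells.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])  # pairwise frame comparison
            # Extend the cheapest of: diagonal match, stretch y, stretch x.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


def closest_template(query: Sequence[Frame], templates: Dict[str, Sequence[Frame]]) -> str:
    """Return the name of the stored template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))
```

Because a frame may be matched to several frames of the other sequence, a slowly spoken [1, 1, 2, 3] still aligns at zero cost with a template [1, 2, 3].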

  17. HIDDEN MARKOV MODELS • The speech signal is identified (segmented) implicitly during the search process rather than explicitly • Comprises: a hidden Markov chain representing temporal variability and an observable process representing spectral variability • Represented as a stochastic pair (X, Y) • An HMM is a finite state machine in which a probability density function p(x|s) is associated with each state s
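
The forward algorithm is one standard way to score an utterance against an HMM, which is how a per-speaker HMM can be used for recognition. The sketch below assumes one-dimensional observations and a single Gaussian as each state's p(x|s); real systems use multidimensional features and usually Gaussian mixtures. The function names are hypothetical.

```python
import math
from typing import Sequence


def safe_log(p: float) -> float:
    """log(p), returning -inf for p == 0 (e.g. forbidden transitions)."""
    return math.log(p) if p > 0 else -math.inf


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log of a 1-D Gaussian density, used here as the per-state pdf p(x|s)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs: Sequence[float], init: Sequence[float],
                           trans: Sequence[Sequence[float]],
                           means: Sequence[float], variances: Sequence[float]) -> float:
    """Forward algorithm: log p(x_1..x_T | model), summed over all hidden state paths."""
    states = range(len(init))
    # alpha[s] = log p(x_1..x_t, state at time t = s)
    alpha = [safe_log(init[s]) + gaussian_logpdf(obs[0], means[s], variances[s])
             for s in states]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[r] + safe_log(trans[r][s]) for r in states])
                 + gaussian_logpdf(x, means[s], variances[s])
                 for s in states]
    return logsumexp(alpha)
```

To identify a speaker, one could evaluate `forward_log_likelihood` under each enrolled speaker's model and pick the highest score. Working in the log domain avoids the underflow that multiplying many small probabilities would cause.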

  18. FUTURE RESEARCH • To extract and apply all levels of information in the speech signal that convey speaker identity • Acoustic – use spectral features conveying vocal tract information • Prosodic – use features derived from pitch and energy tracks for classification • Phonetic – use phone sequences to characterize speaker-specific pronunciations • Idiolect – use words to characterize speaker-specific word patterns • Linguistic – use linguistic patterns to characterize speaker-specific conversational style

  19. APPLICATIONS • Access control – physical facilities, computer networks and websites • PC login and password reset • Secure transactions – remote banking and online credit card purchase authentication • Time and attendance – workplaces • Law enforcement – forensics, parole
