Digital Signal Processing (Term Project) • by • Habib ur Rehman • Abdul Basit • CENTER FOR ADVANCED STUDIES IN ENGINEERING • Speaker Recognition System
Introduction: What is Speaker Recognition? • A process that automatically recognizes who is speaking on the basis of individual information contained in the speech waveform. • [Figure labels: Speech Signal, Words, Speaker Recognition, “Who are you?”]
Speaker Recognition System: Goals • The goal of this project is to build a simple, yet complete and representative, speaker recognition system. • The system should be able to identify speakers based on the different voice characteristics of each of the known speakers. • This identification should be accomplished regardless of the sentence spoken (text-independent).
Basic Structure of Speaker Recognition System: Speaker Identification / Speaker Verification
Principle of Speaker Recognition System: Introduction • All speaker recognition systems operate in two distinct phases: • Enrollment (training) phase • Testing phase • In the training phase, each registered speaker provides samples of their speech so that the system can build a reference model for that speaker. • In the testing phase, the input speech is matched against the stored reference model(s) and a recognition decision is made.
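The two phases can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the class name `SpeakerRegistry`, the use of a mean feature vector as the "reference model", and the toy two-dimensional features are all assumptions made for the example.

```python
import math


class SpeakerRegistry:
    """Toy two-phase recognizer (hypothetical): enroll speakers, then test."""

    def __init__(self):
        # speaker name -> reference model (assumed here: a mean feature vector)
        self.models = {}

    @staticmethod
    def _mean_vector(feature_vectors):
        dim = len(feature_vectors[0])
        count = len(feature_vectors)
        return [sum(vec[i] for vec in feature_vectors) / count for i in range(dim)]

    def enroll(self, name, feature_vectors):
        # Training phase: build and store a reference model for this speaker.
        self.models[name] = self._mean_vector(feature_vectors)

    def identify(self, feature_vectors):
        # Testing phase: match the input against every stored reference model
        # and return the name of the closest one.
        probe = self._mean_vector(feature_vectors)
        return min(self.models, key=lambda name: math.dist(probe, self.models[name]))


registry = SpeakerRegistry()
registry.enroll("alice", [[1.0, 2.0], [1.2, 1.8]])
registry.enroll("bob", [[4.0, 0.5], [3.8, 0.7]])
best_match = registry.identify([[1.1, 2.1], [0.9, 1.9]])
```

A real system would replace the mean vector with a richer model, but the enroll-then-match structure stays the same.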
Basic Structure of Speaker Recognition System: Feature Extraction / Feature Matching
MFCC Processor: Block diagram • [Block diagram: Frame Blocking → Windowing → Fourier Transform → (spectrum) → Mel-freq. Wrapping → (mel spectrum) → Cepstrum → (mel cepstrum)] • Frame blocking: the continuous signal is blocked into frames of N samples. • The 1st frame consists of the first N samples. • The 2nd frame begins M samples after the 1st and overlaps it by N-M samples, and so on. • Typically N = 256 (a power of two, suited to a radix-2 FFT) and M = 100. • Windowing: each frame is windowed to minimize signal discontinuities at its beginning and end. • Windowing minimizes spectral distortion by tapering the signal to zero at the beginning and end of each frame: y[n] = x[n]w[n]. • Typically a Hamming window is used. • FFT • Cosine Transform (Mel Cepstrum)
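The frame blocking and windowing steps can be sketched as follows, using the slide's typical values N = 256 and M = 100. The function names are hypothetical, the Hamming window uses its standard textbook definition (the slide does not spell it out), and dropping trailing samples that do not fill a whole frame is an assumption of this sketch.

```python
import math


def frame_signal(samples, frame_len=256, hop=100):
    """Split samples into frames of frame_len that start every hop samples.

    Consecutive frames overlap by frame_len - hop samples. Trailing samples
    that do not fill a complete frame are dropped (an assumption).
    """
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


def hamming(length):
    """Standard Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(length-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame, window):
    # y[n] = x[n] * w[n]
    return [x * w for x, w in zip(frame, window)]


# Toy input: 1000 samples of a sine wave standing in for speech.
signal = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
frames = frame_signal(signal)
window = hamming(256)
windowed = [apply_window(frame, window) for frame in frames]
```

Each windowed frame would then be passed to the FFT stage of the block diagram.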
Speech Production: A Convolution Process • Speech can be modeled as the convolution of • a glottal excitation source g[n] • and • a vocal tract impulse response v[n]: • y[n] = g[n] * v[n], where * denotes convolution.
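As a small illustration of this source-filter model, the sketch below convolves a toy excitation with a toy impulse response. The sequences are made-up values chosen for readability, not measured speech data.

```python
def convolve(g, v):
    """Linear convolution: y[n] = sum over k of g[k] * v[n - k]."""
    y = [0.0] * (len(g) + len(v) - 1)
    for i, g_val in enumerate(g):
        for j, v_val in enumerate(v):
            y[i + j] += g_val * v_val
    return y


excitation = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # toy impulse train standing in for g[n]
vocal_tract = [1.0, 0.5, 0.25]               # toy decaying response standing in for v[n]
speech = convolve(excitation, vocal_tract)
```

Each impulse in the excitation produces a copy of the vocal tract response in the output, which is what the convolution model expresses.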
Cepstrum: A transformation • Vocal tract characteristics are believed to be important for speech and speaker recognition. • We would like to separate out the vocal tract (filter) response. • The cepstrum does this by converting the multiplication in the frequency domain (convolution in time) • Y(ω) = G(ω)V(ω) • into a sum • Ỹ(ω) = log[G(ω)] + log[V(ω)] • where G(ω) and V(ω) are the spectra of g[n] and v[n].
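The product-to-sum property can be checked numerically. The sketch below is a minimal demonstration under stated assumptions: it uses a naive zero-padded DFT written for clarity, toy sequences chosen so that no spectrum value is zero, and log magnitudes (the phase is ignored).

```python
import cmath
import math


def dft(x, size):
    """Naive zero-padded DFT of x at `size` frequency bins."""
    return [
        sum(val * cmath.exp(-2j * cmath.pi * k * n / size) for n, val in enumerate(x))
        for k in range(size)
    ]


def convolve(a, b):
    """Linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_val in enumerate(a):
        for j, b_val in enumerate(b):
            out[i + j] += a_val * b_val
    return out


g = [1.0, 0.0, 0.0, 0.5]  # toy excitation
v = [1.0, 0.5, 0.25]      # toy vocal tract response
y = convolve(g, v)        # time-domain convolution

size = 8                  # at least len(y), so the DFT product matches linear convolution
G, V, Y = dft(g, size), dft(v, size), dft(y, size)

log_G = [math.log(abs(c)) for c in G]
log_V = [math.log(abs(c)) for c in V]
log_Y = [math.log(abs(c)) for c in Y]

# Largest deviation between log|Y| and log|G| + log|V| over all bins.
max_gap = max(abs(ly - (lg + lv)) for ly, lg, lv in zip(log_Y, log_G, log_V))
```

`max_gap` stays at floating-point rounding level, confirming that the logarithm turns the product of spectra into a sum whose terms can then be separated.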
Mel filter bank: linear spacing below 1 kHz, logarithmic scale above 1 kHz • Triangular filters emphasize their center frequency and span to the next center frequency. • Thus, for each tone with an actual frequency f in Hz, a subjective pitch is measured on the mel scale: • mel(f) = 2595*log10(1 + f/700) • (Fant’s expression)
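The mel formula and the resulting filter spacing can be sketched as below. The inverse mapping is obtained by solving the slide's formula for f. The filter count (20) and the 0 to 4000 Hz range are assumptions picked for the example, and `filter_centers` is a hypothetical helper, not part of any library.

```python
import math


def hz_to_mel(freq_hz):
    """mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel, solved algebraically from the same formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_centers(num_filters, low_hz, high_hz):
    """Center frequencies in Hz, equally spaced on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]


centers = filter_centers(20, 0.0, 4000.0)
```

Because the centers are uniform in mel, their spacing in Hz widens as frequency increases, which gives the filter bank its finer resolution at low frequencies.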
Part 2 Speaker Verification
Speaker Verification: Feature Matching • Classification of objects of interest (patterns), here the acoustic vectors extracted from the input speech. • Since the classification is applied to extracted features, the process can also be referred to as feature matching. • Various feature matching techniques exist, e.g. Dynamic Time Warping (DTW), Hidden Markov Models (HMM), and Vector Quantization (VQ). • Vector Quantization is the process of mapping vectors from a large vector space to a small number of regions in that space. • Each region is called a cluster and is represented by its center, called a ‘codeword’. • The collection of all the ‘codewords’ is called a codebook.
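A minimal sketch of VQ-based matching follows. The codebooks are hand-written toy values; how a codebook is trained from enrollment data is not covered here. Picking the speaker whose codebook gives the lowest average distortion is a common way to use VQ for this task, but it is an assumption of this example rather than something stated on the slide.

```python
import math


def nearest_codeword(vector, codebook):
    """Index of the codeword closest to vector (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))


def average_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    total = sum(
        math.dist(vec, codebook[nearest_codeword(vec, codebook)]) for vec in vectors
    )
    return total / len(vectors)


# Hypothetical codebooks, one per enrolled speaker (two codewords each).
codebooks = {
    "speaker_a": [[0.0, 0.0], [1.0, 1.0]],
    "speaker_b": [[5.0, 5.0], [6.0, 4.0]],
}

# Toy acoustic vectors extracted from an unknown utterance.
test_vectors = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.1]]

scores = {name: average_distortion(test_vectors, cb) for name, cb in codebooks.items()}
identified = min(scores, key=scores.get)
```

For verification rather than identification, the distortion against the claimed speaker's codebook would instead be compared with a threshold.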