Speaker Recognition

Speaker Recognition
S. Arun Nair, Vaibhav Singh, Dheeraj Mehra, Rohan Paul

Speaker Identification System
• Enrollment Phase
• Identification Phase

Methodology
• Database Creation
  - 30-speaker database
  - Random text: 1 min samples
  - Telephone quality: 8-bit samples at 8 kHz sampling rate
• Pre-Processing
  - Noise removal: Wavelet Transform
  - Silence removal (envelope detection)
  - Framing
• Feature Extraction
  - Mel-cepstral coefficients
  - Singular Value Decomposition (dimensionality reduction)
• Learning Problem
  - Gaussian Mixture Modeling, EM
  - Bayesian classification

Feature Extraction: Hamming Window Function

Hamming Window: Hamming Window Function

Application of Hamming Window
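The windowing step can be sketched as follows. This is a minimal illustration using the standard Hamming definition w[n] = 0.54 - 0.46 cos(2πn / (N - 1)); the frame length of 256 samples is an assumption, since the slides do not state the frame size that was used.

```python
import numpy as np

def hamming_window(n_samples: int) -> np.ndarray:
    """Standard Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    n = np.arange(n_samples)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (n_samples - 1))

def apply_window(frame: np.ndarray) -> np.ndarray:
    """Taper one frame so that its edges are attenuated before the DFT."""
    return frame * hamming_window(len(frame))

# Hypothetical frame: 256 samples, i.e. 32 ms at the 8 kHz rate of the database.
frame = np.ones(256)
windowed = apply_window(frame)
```

The taper reduces the spectral leakage that an abrupt frame boundary would otherwise introduce into the DFT.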

Mel-Frequency Cepstrum Coefficients
Cepstrum(frame) = IDFT(log(|DFT(frame)|))
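The cepstrum formula on this slide translates almost directly into NumPy. The sketch below computes the plain (non-mel) real cepstrum of one frame; the small constant `eps` is an added assumption to avoid taking the logarithm of zero, and the random test frame is purely illustrative.

```python
import numpy as np

def real_cepstrum(frame: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Cepstrum(frame) = IDFT(log(|DFT(frame)|))."""
    spectrum = np.fft.fft(frame)
    log_magnitude = np.log(np.abs(spectrum) + eps)  # eps guards against log(0)
    # For a real frame the log-magnitude spectrum is symmetric, so the
    # inverse DFT is real up to rounding error.
    return np.fft.ifft(log_magnitude).real

# Illustrative frame: windowed random noise standing in for speech.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256) * np.hamming(256)
cepstrum = real_cepstrum(frame)
```

Scaling the frame by a constant gain changes only the zeroth cepstral coefficient, which is one reason cepstral features are fairly robust to recording level.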

Feature Extraction: Mel Filters
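A mel filter bank is commonly built from triangular filters whose edges are equally spaced on the mel scale. The sketch below uses the widely quoted mapping mel = 2595 log10(1 + f / 700); the number of filters (20) and the FFT size (256) are assumptions, as the slides give neither, and the project's actual filter shapes may have differed.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> np.ndarray:
    """Return an (n_filters, n_fft // 2 + 1) matrix of triangular filters."""
    nyquist = sample_rate / 2.0
    # Filter edges: equally spaced in mel between 0 Hz and the Nyquist frequency.
    edges_hz = mel_to_hz(np.linspace(0.0, float(hz_to_mel(nyquist)), n_filters + 2))
    bin_hz = np.linspace(0.0, nyquist, n_fft // 2 + 1)  # centre frequency of each DFT bin
    bank = np.zeros((n_filters, len(bin_hz)))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

# Assumed settings: 20 filters, 256-point FFT, 8 kHz telephone-quality audio.
bank = mel_filterbank(n_filters=20, n_fft=256, sample_rate=8000)
```

Multiplying `bank` by the magnitude (or power) spectrum of a frame gives one energy per filter; taking the logarithm and a final transform of those energies yields mel-cepstral coefficients.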

Singular Value Decomposition
Plot of Sigma Values
The number of dimensions was selected as 13
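One way to carry out this reduction is to project the feature matrix onto its leading right singular vectors. The sketch below assumes a feature matrix with one row per frame and keeps 13 dimensions as on the slide; the 20 input coefficients and the random data are illustrative, and whether the features were mean-centred first is not stated, so no centring is applied here.

```python
import numpy as np

def reduce_dimensions(features: np.ndarray, k: int = 13):
    """Project (n_frames, n_coeffs) features onto the top-k right singular vectors."""
    # NumPy returns the singular values sorted in descending order.
    _, sigma, vt = np.linalg.svd(features, full_matrices=False)
    basis = vt[:k].T                      # (n_coeffs, k) projection matrix
    return features @ basis, sigma, basis

# Illustrative data: 200 frames with 20 coefficients each.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 20))
reduced, sigma, basis = reduce_dimensions(features, k=13)
```

Plotting `sigma` gives the kind of curve the slide refers to. The same `basis` would need to be reused at test time so that enrollment and test features live in the same space.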

Gaussian Mixture Modeling
• Linear combination of Gaussians
• Speaker-dependent vocal tract configurations
• Vocal classes
  - vowels
  - nasals
  - fricatives
• Modeling noise
• Smooth approximations to arbitrarily shaped distributions
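Written out, the "linear combination of Gaussians" is the usual mixture density. The symbols are assumptions, since the slides do not fix a notation: M is the number of components, w_i the mixture weights, and λ = {w_i, μ_i, Σ_i} the parameters of one speaker's model.

```latex
p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(x;\, \mu_i, \Sigma_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0
```

Each component can be read as one broad vocal class (for example vowels, nasals, or fricatives), with the weights giving how often that class occurs for the speaker.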

Gaussian Mixture Modeling

Maximum Likelihood Parameter Estimation
Goal: Find the model parameters that best match the distribution of the training feature vectors
• Training set
• Given the present parameters, the likelihood of obtaining this set
• Iterate to improve the estimate
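The iterate-and-improve loop is the EM algorithm named in the methodology. Below is a minimal sketch for a diagonal-covariance mixture; the diagonal covariances, the random initialisation, the variance floor, and the fixed iteration count are all assumptions, as the slides do not describe these details.

```python
import numpy as np

def log_gaussians(x, means, variances):
    """log N(x_t; mu_i, diag(var_i)) for every frame t and component i -> (T, M)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def em_step(x, weights, means, variances, var_floor=1e-6):
    """One EM iteration. Also returns the log-likelihood under the input parameters."""
    log_joint = np.log(weights) + log_gaussians(x, means, variances)  # (T, M)
    log_norm = np.logaddexp.reduce(log_joint, axis=1)                 # log p(x_t)
    resp = np.exp(log_joint - log_norm[:, None])                      # E-step: responsibilities
    counts = resp.sum(axis=0) + 1e-12                                 # guard against empty components
    new_weights = counts / counts.sum()                               # M-step
    new_means = (resp.T @ x) / counts[:, None]
    new_vars = (resp.T @ (x ** 2)) / counts[:, None] - new_means ** 2
    return new_weights, new_means, np.maximum(new_vars, var_floor), log_norm.sum()

def fit_gmm(x, n_components, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.full(n_components, 1.0 / n_components)
    means = x[rng.choice(len(x), n_components, replace=False)]   # random frames as initial means
    variances = np.tile(x.var(axis=0), (n_components, 1))
    history = []
    for _ in range(n_iter):
        weights, means, variances, ll = em_step(x, weights, means, variances)
        history.append(ll)
    return weights, means, variances, history

# Illustrative training set: two synthetic clusters in two dimensions.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, (150, 2)), rng.normal(6.0, 1.0, (150, 2))])
weights, means, variances, history = fit_gmm(x, n_components=2)
```

The `history` list records the training log-likelihood at each iteration; it should never decrease, which is a convenient sanity check on an EM implementation.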

Recognition Phase
• Assume that each class is equally likely
• Choose the class with the highest log likelihood
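With equal priors, Bayesian classification reduces to picking the speaker model with the largest total log-likelihood over the test frames. The sketch below assumes diagonal-covariance mixtures stored as (weights, means, variances) tuples; the speaker labels and the one-component models are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Sum over frames of log p(x_t | model) for a diagonal-covariance GMM."""
    diff = x[:, None, :] - means[None, :, :]
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    return np.logaddexp.reduce(np.log(weights) + log_comp, axis=1).sum()

def identify(x, models):
    """models maps a speaker label to (weights, means, variances)."""
    scores = {name: gmm_log_likelihood(x, *params) for name, params in models.items()}
    # Equal priors: the maximum-posterior speaker is the maximum-likelihood one.
    return max(scores, key=scores.get), scores

# Hypothetical one-component models for two enrolled speakers.
models = {
    "speaker_a": (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])),
    "speaker_b": (np.array([1.0]), np.array([[5.0, 5.0]]), np.array([[1.0, 1.0]])),
}
rng = np.random.default_rng(0)
test_frames = 5.0 + 0.1 * rng.standard_normal((50, 2))
best, scores = identify(test_frames, models)
```

Working in the log domain avoids the numerical underflow that multiplying many small per-frame likelihoods would cause.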

Demonstration

Multiple Speaker Recognition
• Run K-means on the test data
• Choose B-best samples from each domain
• Calculate posterior probabilities and discover the class
Problem: We did not get separate clusters
  - Have to weight the distance with the variance
Basic Assumption: Speakers are separable in a higher dimensional space
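The first two steps can be sketched with a plain Lloyd-style K-means. Two assumptions are made here that the slides do not confirm: clusters are formed with unweighted Euclidean distance, and the "B-best" samples of a cluster are taken to be the B frames closest to its centre.

```python
import numpy as np

def assign(x, centers):
    """Label each frame with the index of its nearest centre (Euclidean distance)."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(x, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(x, centers)
        # Keep the old centre if a cluster loses all of its points.
        new_centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(x, centers)

def b_best(x, centers, labels, b):
    """Indices of the b frames closest to each cluster centre (assumed meaning of B-best)."""
    picks = []
    for j, center in enumerate(centers):
        idx = np.flatnonzero(labels == j)
        order = np.argsort(np.linalg.norm(x[idx] - center, axis=1))
        picks.append(idx[order[:b]])
    return picks

# Illustrative test data: two synthetic "speakers" that are well separated.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(10.0, 0.5, (100, 2))])
centers, labels = kmeans(x, k=2)
picks = b_best(x, centers, labels, b=10)
```

On real speech features the clusters were reported not to separate this cleanly, which is what motivates weighting the distance by the variance.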

Multiple Speaker Recognition
Approach II: Gaussian Mixture Modeling
• Number of Gaussians equal to the number of speakers in the input

Multiple Speaker Recognition
Approach II: Gaussian Mixture Modeling
Recognition of speakers
• KL-Divergence
• Distance between means
• Distance between means, incorporating variances
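The candidate distance measures can be written down for a pair of single Gaussians with diagonal covariance, as in the sketch below. This is an illustration rather than the project's exact procedure: the KL divergence between two full mixtures has no closed form, so comparing a fitted cluster with an enrolled model component by component is an assumption.

```python
import numpy as np

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for Gaussians with diagonal covariance."""
    return 0.5 * np.sum(np.log(var_q / var_p)
                        + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """KL is asymmetric; summing both directions gives a symmetric score."""
    return (kl_diag_gaussians(mu_p, var_p, mu_q, var_q)
            + kl_diag_gaussians(mu_q, var_q, mu_p, var_p))

def mean_distance(mu_p, mu_q):
    """Plain Euclidean distance between the two means."""
    return np.linalg.norm(mu_p - mu_q)

def variance_weighted_distance(mu_p, mu_q, var_q):
    """Distance between means with each dimension scaled by the model variance."""
    return np.sqrt(np.sum((mu_p - mu_q) ** 2 / var_q))

# Illustrative values: a fitted cluster compared with one enrolled component.
mu_cluster, var_cluster = np.array([3.0, 4.0]), np.array([1.0, 1.0])
mu_model, var_model = np.array([0.0, 0.0]), np.array([1.0, 1.0])
d_kl = symmetric_kl(mu_cluster, var_cluster, mu_model, var_model)
d_mean = mean_distance(mu_cluster, mu_model)
d_weighted = variance_weighted_distance(mu_cluster, mu_model, var_model)
```

The variance-weighted form penalises a mismatch less along dimensions where the enrolled model is itself spread out.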

Multiple Speaker Recognition
6-Speaker Database

Cluster Sizes
Input: Ankit, Advait
  - Two sizeable clusters
Input: Nilay, Priyanka
  - One cluster dominates

Open Issues
• Standardization of mics
  - Signal threshold
• Clustering
  - Separability
  - Overgrowing
  - Three Gaussians (clusters) for two speakers
• Appropriate distance metric for recognition phase
• Intuition about nearness of samples is difficult

Work Distribution
Phase I
• Pre-processing, Feature extraction, SVD: Vaibhav, Rohan
• Database construction, GMM, EM: Dheeraj, Arun, Rohan
Phase II
• K-means, Clustering using GMM: Dheeraj, Arun
• KL-divergence, Other techniques, Experimentation: Vaibhav, Arun, Rohan
• Front End: Dheeraj, Vaibhav

Thank You
