Speaker Recognition. S. Arun Nair, Vaibhav Singh, Dheeraj Mehra, Rohan Paul. Speaker Identification System. Enrollment Phase. Identification Phase. Methodology. Database Creation: 30-speaker database. Random text: 1 min samples. Telephone quality: 8-bit samples at 8 kHz.
Speaker Recognition S. Arun Nair, Vaibhav Singh, Dheeraj Mehra, Rohan Paul
Speaker Identification System Enrollment Phase Identification Phase
Methodology
• Database Creation
  - 30-speaker database
  - Random text: 1 min samples
  - Telephone quality: 8-bit samples at 8 kHz
• Pre-Processing
  - Noise Removal: Wavelet Transform
  - Silence Removal (Envelope Detection)
  - Framing
• Feature Extraction
  - Mel-Cepstral coefficients
  - Singular Value Decomposition (Dimensionality Reduction)
• Learning Problem
  - Gaussian Mixture Modeling, EM
  - Bayesian Classification
Feature Extraction: Hamming Window Function
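The framing and windowing step can be sketched as follows. This is a minimal illustration, not the authors' code: the frame length (256 samples, 32 ms at 8 kHz), the hop size (128 samples), and the test tone are assumptions chosen only for the example. The window itself is the standard Hamming definition, w[n] = 0.54 - 0.46 cos(2πn / (N - 1)).

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def hamming(n):
    """Standard Hamming window of length n."""
    k = np.arange(n)
    return 0.54 - 0.46 * np.cos(2 * np.pi * k / (n - 1))

# Assumed values for illustration: 8 kHz sampling, 1 s of a 440 Hz tone.
fs = 8000
signal = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)

frames = frame_signal(signal, frame_len=256, hop=128)
windowed = frames * hamming(256)  # taper each frame before the DFT
```

Tapering each frame toward zero at its edges reduces the spectral leakage that a plain rectangular cut would introduce.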
Mel-Frequency Cepstrum Coefficients
Cepstrum(frame) = IDFT(log(|DFT(frame)|))
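The cepstrum formula on this slide translates almost directly into NumPy. The sketch below computes the plain real cepstrum only. A full MFCC front end additionally applies a mel-spaced filter bank before the logarithm and commonly uses a DCT in place of the IDFT, and those steps are not shown here. The small `eps` guard against log(0) is an assumption added for numerical safety.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-10):
    """Cepstrum(frame) = IDFT(log(|DFT(frame)|))."""
    spectrum = np.fft.fft(frame)
    log_magnitude = np.log(np.abs(spectrum) + eps)  # eps avoids log(0)
    # The log-magnitude of a real frame is real and even, so the
    # inverse DFT is real up to rounding; keep the real part.
    return np.fft.ifft(log_magnitude).real

# Illustrative input: one random frame of 256 samples.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
cepstrum = real_cepstrum(frame)
```

The low-order cepstral coefficients describe the smooth spectral envelope, which is the part most closely tied to the vocal tract.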
Singular Value Decomposition
Plot of sigma values
The number of dimensions was selected as 13
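One way to carry out SVD-based dimensionality reduction is sketched below. The slides do not say whether the features were mean-centered or how large the original feature vectors were, so the centering step and the 20-dimensional random input are assumptions made for illustration. Only the target of 13 dimensions comes from the slide.

```python
import numpy as np

def svd_reduce(features, k):
    """Project feature vectors (one per row) onto the top-k right singular vectors."""
    mean = features.mean(axis=0)
    centered = features - mean  # assumption: center before the SVD
    _, sigma, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T            # shape (original_dim, k)
    return centered @ basis, sigma, basis, mean

# Illustrative data: 200 frames of 20-dimensional features.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 20))

reduced, sigma, basis, mean = svd_reduce(features, k=13)
```

Plotting `sigma` gives a sigma-value plot of the kind the slide refers to, and the point where the values level off suggests how many dimensions to keep.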
Gaussian Mixture Modeling
• Linear Combination of Gaussians
• Speaker dependent vocal tract configurations
• Vocal Classes
  - vowels
  - nasals
  - fricatives
• Modeling noise
• Smooth approximations to arbitrarily shaped distributions
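The "linear combination of Gaussians" idea can be made concrete with a one-dimensional toy mixture. The weights, means, and variances below are invented for illustration and have no connection to the speaker data. The sketch evaluates the mixture density on a grid and checks numerically that it still integrates to roughly one.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def mixture_pdf(x, weights, means, variances):
    """p(x) = sum_i w_i * N(x | mu_i, var_i), with the weights summing to 1."""
    x = np.asarray(x, dtype=float)[:, None]
    components = gaussian_pdf(x, means[None, :], variances[None, :])
    return components @ weights

# Hypothetical three-component mixture.
weights = np.array([0.5, 0.3, 0.2])
means = np.array([-2.0, 0.0, 3.0])
variances = np.array([0.5, 1.0, 2.0])

grid = np.linspace(-15.0, 15.0, 6001)
density = mixture_pdf(grid, weights, means, variances)
area = density.sum() * (grid[1] - grid[0])  # Riemann-sum estimate of the integral
```

Because each component can sit on a different region of feature space, the sum can follow multi-modal or skewed shapes that a single Gaussian cannot.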
Maximum Likelihood Parameter Estimation
Goal: Find the model parameters that best match the distribution of the training feature vectors
• Training set
• Given the present parameters, the likelihood of obtaining this set
• Iterate to improve the estimate
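The iterate-to-improve loop is the EM algorithm named in the methodology. A compact sketch for a GMM with diagonal covariances follows. The diagonal-covariance choice, the random initialization, the variance floor, and the fixed iteration count are all assumptions for this example, and the two-cluster data is synthetic.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """log N(x_t | mu_k, diag(var_k)) for every frame t and component k."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.log(variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))

def fit_gmm(X, K, n_iter=50, seed=0, var_floor=1e-6):
    """Fit a diagonal-covariance GMM with EM; returns parameters and log-likelihood history."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, K, replace=False)].copy()
    variances = np.tile(X.var(axis=0), (K, 1)) + var_floor
    weights = np.full(K, 1.0 / K)
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability.
        log_joint = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        peak = log_joint.max(axis=1, keepdims=True)
        log_px = peak + np.log(np.exp(log_joint - peak).sum(axis=1, keepdims=True))
        resp = np.exp(log_joint - log_px)
        history.append(float(log_px.sum()))  # likelihood under the present parameters
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ X ** 2) / nk[:, None] - means ** 2 + var_floor
    return weights, means, variances, history

# Synthetic two-cluster data standing in for one speaker's feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 1, (300, 2)), rng.normal(3, 1, (300, 2))])
weights, means, variances, history = fit_gmm(X, K=2)
```

Each entry of `history` is the log-likelihood of the training set under the parameters at the start of that iteration, so the sequence should not decrease.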
Recognition Phase
Assume that each class (speaker) is equally likely, so Bayesian classification reduces to choosing the speaker model with the highest log-likelihood
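With equal priors, the decision rule amounts to summing the per-frame log-likelihoods under each speaker's GMM and taking the largest total. The sketch below assumes diagonal-covariance models. The two speaker models (`speaker_a`, `speaker_b`) and the test frames are hypothetical values made up for the example.

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Sum over frames of log p(x_t | model) for a diagonal-covariance GMM."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    log_comp = -0.5 * (d * np.log(2 * np.pi)
                       + np.log(variances).sum(axis=1)[None, :]
                       + (diff ** 2 / variances[None, :, :]).sum(axis=2))
    log_joint = np.log(weights)[None, :] + log_comp
    peak = log_joint.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return float(per_frame.sum())

def identify(X, models):
    """Return the name of the best-scoring model and all scores (equal priors assumed)."""
    scores = {name: gmm_log_likelihood(X, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical enrolled models: (weights, means, variances) per speaker.
models = {
    "speaker_a": (np.array([0.5, 0.5]),
                  np.array([[-2.0, -2.0], [-3.0, 0.0]]),
                  np.ones((2, 2))),
    "speaker_b": (np.array([0.5, 0.5]),
                  np.array([[2.0, 2.0], [3.0, 0.0]]),
                  np.ones((2, 2))),
}

# Hypothetical test utterance: 100 frames drawn near one of speaker_b's components.
rng = np.random.default_rng(0)
test_frames = rng.normal(2.0, 0.5, (100, 2))
best, scores = identify(test_frames, models)
```

Working with sums of logs rather than products of probabilities avoids numerical underflow on long utterances.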
Multiple Speaker Recognition
• Run K-means on the test data
• Choose the B best samples from each domain
• Calculate posterior probabilities and discover the class
Problem: We did not get separate clusters; the distance has to be weighted by the variance
Basic Assumption: Speakers are separable in a higher-dimensional space
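The first step, K-means on the test data, can be sketched with Lloyd's algorithm as below. It uses plain Euclidean distance, which is the unweighted distance the slide reports as problematic, and it does not implement the B-best selection or the posterior step. The two synthetic blobs stand in for frames from two speakers and are an assumption for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with Euclidean distance; returns centers and labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members (empty clusters stay put).
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so that labels match the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

# Synthetic, well-separated stand-in for frames from two speakers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-5, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])
centers, labels = kmeans(X, k=2)
```

On real speech features the clusters overlap far more than in this toy case, which is consistent with the problem noted on the slide.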
Multiple Speaker Recognition
Approach II: Gaussian Mixture Modeling
Number of Gaussians equal to the number of speakers in the input
Multiple Speaker Recognition
Approach II: Gaussian Mixture Modeling
Recognition of speakers
• KL-Divergence
• Distance between means
• Distance between means, incorporating variances
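For two single Gaussians with diagonal covariances, the KL divergence has a closed form, sketched below next to a plain and a variance-weighted distance between means. The slides do not specify which variant of each measure was used, so the diagonal-covariance assumption, the symmetrized KL, and the particular variance weighting (dividing by the average of the two variances) are illustrative choices only.

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariances."""
    return 0.5 * float(np.sum(np.log(var_q / var_p)
                              + (var_p + (mu_p - mu_q) ** 2) / var_q
                              - 1.0))

def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """KL is not symmetric; summing both directions gives a symmetric score."""
    return (kl_diag_gauss(mu_p, var_p, mu_q, var_q)
            + kl_diag_gauss(mu_q, var_q, mu_p, var_p))

def mean_distance(mu_p, mu_q):
    """Euclidean distance between the two means."""
    return float(np.linalg.norm(mu_p - mu_q))

def variance_weighted_distance(mu_p, var_p, mu_q, var_q):
    """Distance between means, scaled per dimension by the average variance (assumed form)."""
    pooled = 0.5 * (var_p + var_q)
    return float(np.sqrt(np.sum((mu_p - mu_q) ** 2 / pooled)))

# Toy one-dimensional example.
mu_p, var_p = np.array([0.0]), np.array([1.0])
mu_q, var_q = np.array([1.0]), np.array([1.0])
kl_pq = kl_diag_gauss(mu_p, var_p, mu_q, var_q)
```

Unlike the plain distance between means, both KL and the variance-weighted distance treat a given offset as smaller when the clusters are broad and larger when they are tight.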
Multiple Speaker Recognition: 6-speaker database
Cluster Sizes
• Input: Ankit, Advait. Two sizeable clusters
• Input: Nilay, Priyanka. One cluster dominates
Open Issues
• Standardization of mics
  - Signal threshold
• Clustering
  - Separability
  - Overgrowing
  - Three Gaussians (clusters) for two speakers
• Appropriate distance metric for recognition phase
• Intuition about nearness of samples is difficult
Work Distribution
Phase I
• Pre-processing, feature extraction, SVD: Vaibhav, Rohan
• Database construction, GMM, EM: Dheeraj, Arun, Rohan
Phase II
• K-means, clustering using GMM: Dheeraj, Arun
• KL-divergence, other techniques, experimentation: Vaibhav, Arun, Rohan
• Front end: Dheeraj, Vaibhav