Language and Speaker Identification using Gaussian Mixture Model

Language and Speaker Identification using Gaussian Mixture Model Prepare by Jacky Chau The Chinese University of Hong Kong 18th September, 2002

Outline • Motivation of having Speaker Identification and Language Identification System • Reasons of using Gaussian Mixture Model • Experiment • Results • Conclusion

Motivation of having Language Identification • To detect multi-lingual speech for further speech recognition system such as IBM ViaVoice and Microsoft SAPI • Demo

Motivation of having Speaker Identification • Speaker Identification Systems using speech recognition technique: • Security Lock System • Speaker Tracking System to locating the specified person in the video clips (Demo) • Voice Mail System

Reasons of using Gaussian Mixture Model • Gaussian Mixture Model is a type of density model which includes a number of component functions • These component functions are combined to provide a multi-model density

Reasons of using Gaussian Mixture Model (cont’) • Because speakers and languages have their own statistical density

Reasons of using Gaussian Mixture Model (cont’) • Expectation-Maximization (EM) is a well established maximum likelihood algorithm for fitting a mixture model to a set of training data. • By EM, a set of GMMs is trained and used for recognition.

Steps of the Identification Systems • Pre-processing Stage: • 1. Collect the specific speakers or languages sound clips • 2. Train the GMM using the collected sound clips • Testing Stage: • 1. Calculate the scores of each GMM • 2. Select the maxium score as the result

System Diagram

Probability Score Calculation • After EM training, the GMM is include means and variances for each mixture component functions. • Using the equation shown below, the score is calculated:

Experiment Setup • In Common: • Testing and Training Data • Score: News Video Clips • Audio Data: 22kHz • Feature: 24 MFCC • No. of Mixture: 256 • Duration: >6 seconds

Language Identification Experiment Setup • For Language Identification System: • Three Languages are tested: 1. English, 2. Cantonese and 3. Putonghua • 14 sound clips are tested for each case, i.e. 42 sound clips in total

Result of Language Identification • Overall it can get 36 out of 42 correct. (~85% accurate) • Thus, the language is identified and the appropriated speech recognition engine is selected for “speech to text” process.

Speaker Identification Experiment Setup • For Speaker Identification System: • Testing speakers: five males and five females, totally 50 sound clips • Can be close-set or open-set experiment • For close-set experiment, we select the maximum score achieved as our result. • For open-set experiment, the result score is normalized with the score of silence model

Result of Speaker Identification • Close-set • i.e. select the best speaker within the group • correct: 49 out of 50 (~98%) • Open-set • correct: 45 out of 50 (~90%) • false alarm (accepted wrong speaker): 1.5% • false reject (rejected correct speaker): 6%

Conclusion • GMM is suitable for statistical analysis • Language Identification ==> 85% • Speaker Identification ==> ~90%

Language and Speaker Identification using Gaussian Mixture Model

Language and Speaker Identification using Gaussian Mixture Model

Presentation Transcript

Gaussian Mixture-Sound Field Landmark Model for Robot Localization

Speaker Identification and Verification

Improved Adaptive Gaussian Mixture Model for Background

Text Independent Speaker Identification Using Gaussian Mixture Model

Gaussian Mixture Models and Expectation Maximization

A NONLINEAR MIXTURE AUTOREGRESSIVE MODEL FOR SPEAKER VERIFICATION

Gaussian Mixture Models and Acoustic Modeling

A NONLINEAR MIXTURE AUTOREGRESSIVE MODEL FOR SPEAKER VERIFICATION

Speaker Identification using Gaussian Mixture Model

Gaussian Mixture Model

Gaussian Mixture Models

Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor

Speaker Identification Using Wavelet Analysis and ANN

Univariate Gaussian Mixture Model

Gaussian Mixture-Sound Field Landmark Model for Robot Localization

Gaussian Mixture Models and Expectation Maximization

A Quantum-Statistical-Mechanical Extension of Gaussian Mixture Model

Gaussian Mixture Models

Speaker Identification and Verification

Speaker Identification of Customer and Agent using AWS