Optimal Adaptation for Statistical Classifiers. Xiao Li.
Motivation • Problem • A statistical classifier works well if the test set matches the data distribution of the train set • It is difficult to get a large amount of matched training data • A case study – vowel classification • Target test set – pure vowel articulation for specific speakers • Available train set – conversational speech with a great number of speakers
Adaptation Methodology • Extract vowel segments from conversational speech to form a train set • Feature extraction and class labeling • Train speaker-independent models on this train set • Ask a speaker to articulate a few seconds of vowels for each class • Adapt the classifier on this small amount of speaker-dependent, pure vowel data
Two Classifiers • Gaussian mixture models (GMM) • Generative models • Training objective: maximum likelihood via EM • Neural networks (NN) • Multilayer perceptrons • Training objective: • Least squares error • Minimum relative entropy
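The three training objectives named on this slide can be written out as below. The notation is an assumption introduced here for illustration and does not come from the slides: x_t is a feature frame, c_t its class label, \lambda_c the GMM parameters of class c, y_k(x_t) the k-th network output, and d_{tk} the k-th target value for frame t.

```latex
% Maximum likelihood for the GMM of class c (optimized with EM)
\hat{\lambda}_c = \arg\max_{\lambda_c} \sum_{t:\, c_t = c} \log p(x_t \mid \lambda_c)

% Least squares error for the network
E_{\mathrm{LS}} = \sum_{t} \sum_{k} \bigl( y_k(x_t) - d_{tk} \bigr)^2

% Relative entropy between targets and network outputs
E_{\mathrm{RE}} = \sum_{t} \sum_{k} d_{tk} \log \frac{d_{tk}}{y_k(x_t)}
```

With one-hot targets, E_RE reduces to the usual cross-entropy, -\sum_t \log y_{c_t}(x_t).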
MLLR for GMM Adaptation • Maximum Likelihood Linear Regression • Apply a linear transformation to the Gaussian means • Same transformation for all Gaussians in the mixture of a given class • Adaptation Objective • Find the transformation matrices that maximize the likelihood via EM
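A minimal sketch of the idea for one class, under simplifying assumptions that are not in the slides: the transformation is restricted to a per-dimension scale and offset (a diagonal matrix plus a bias), every Gaussian shares the same spherical covariance, and only the means are adapted. Under those assumptions the M-step reduces to a weighted least-squares fit. The function name `mllr_diag` and the toy data are hypothetical.

```python
import math

def mllr_diag(X, means, weights, var=1.0, n_iter=5):
    """Estimate scale a[k] and offset b[k] shared by all Gaussians of one
    class, so that the adapted mean is a[k] * means[m][k] + b[k]."""
    d = len(means[0])
    a, b = [1.0] * d, [0.0] * d          # start from the identity transform
    for _ in range(n_iter):
        adapted = [[a[k] * mu[k] + b[k] for k in range(d)] for mu in means]
        # E-step: posterior of each Gaussian for each frame
        gammas = []
        for x in X:
            logp = [math.log(w)
                    - 0.5 * sum((x[k] - mu[k]) ** 2 for k in range(d)) / var
                    for w, mu in zip(weights, adapted)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            gammas.append([v / total for v in p])
        # M-step: weighted least squares of x[k] on the ORIGINAL means
        for k in range(d):
            s0 = s1 = s2 = sx = sxm = 0.0
            for x, g in zip(X, gammas):
                for gm, mu in zip(g, means):
                    s0 += gm
                    s1 += gm * mu[k]
                    s2 += gm * mu[k] ** 2
                    sx += gm * x[k]
                    sxm += gm * x[k] * mu[k]
            det = s0 * s2 - s1 * s1
            if abs(det) > 1e-12:          # skip degenerate dimensions
                a[k] = (s0 * sxm - s1 * sx) / det
                b[k] = (s2 * sx - s1 * sxm) / det
    return a, b

# Toy adaptation data: frames clustered around shifted versions of the means.
means = [[0.0, 0.0], [4.0, 2.0]]
weights = [0.5, 0.5]
true_a, true_b = [1.0, 1.0], [1.0, -0.5]
X = []
for mu in means:
    c = [true_a[k] * mu[k] + true_b[k] for k in range(2)]
    for dx, dy in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
        X.append([c[0] + dx, c[1] + dy])

a, b = mllr_diag(X, means, weights)
```

Full MLLR estimates a full matrix and uses each Gaussian's own covariance; the restricted form here only keeps the example short.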
NN Adaptation • Idea -- Fix the nonlinear mapping (the hidden layers) and update only the last layer, which acts as a linear classifier • Two alternative methods with different objectives • Minimum relative entropy • Optimization method – gradient descent • Optimal hyper-plane • Optimization method – support vector machine
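The first of the two methods can be sketched as follows: the hidden layer is evaluated once and never changed, and only the output-layer weights are updated by gradient descent. With one-hot targets, minimizing relative entropy is the same as minimizing cross-entropy, which is what the loop below does. The network size, learning rate, function names, and toy data are all assumptions made for illustration.

```python
import math

def hidden(x, W1, b1):
    """Fixed nonlinear mapping: a single tanh hidden layer."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

def output(h, W2, b2):
    """Softmax output layer applied to hidden activations h."""
    z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]

def adapt_last_layer(X, y, W1, b1, W2, b2, lr=0.5, epochs=200):
    """Gradient descent on cross-entropy, updating only W2 and b2."""
    H = [hidden(x, W1, b1) for x in X]   # computed once; W1 and b1 stay fixed
    W2 = [row[:] for row in W2]          # copy so the original model is kept
    b2 = list(b2)
    n = len(X)
    for _ in range(epochs):
        grad_W = [[0.0] * len(row) for row in W2]
        grad_b = [0.0] * len(b2)
        for h, label in zip(H, y):
            p = output(h, W2, b2)
            for c in range(len(W2)):
                err = p[c] - (1.0 if c == label else 0.0)
                grad_b[c] += err / n
                for j, hj in enumerate(h):
                    grad_W[c][j] += err * hj / n
        for c in range(len(W2)):
            b2[c] -= lr * grad_b[c]
            for j in range(len(W2[c])):
                W2[c][j] -= lr * grad_W[c][j]
    return W2, b2

# Toy example: 1-D input, 2 hidden units, 2 classes.
W1, b1 = [[1.0], [-1.0]], [0.0, 0.0]                    # fixed hidden layer
W2_si, b2_si = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]     # stand-in SI output layer
X = [[-1.0], [-0.5], [0.5], [1.0]]
y = [0, 0, 1, 1]

W2_sd, b2_sd = adapt_last_layer(X, y, W1, b1, W2_si, b2_si)
preds = [max(range(2), key=lambda c: output(hidden(x, W1, b1), W2_sd, b2_sd)[c])
         for x in X]
```

The second method would keep the same fixed hidden activations but fit the last layer as a maximum-margin hyperplane with a support vector machine instead of by gradient descent.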
Vowel Classification Experiments • Databases • Database A – speaker-independent conversational speech • Database B – sustained vowel recordings from 6 speakers, with different energy and pitch • Method • Train speaker-independent classifiers on Database A • Adapt classifiers on a small set of Database B, 300-500 samples per speaker • Test on the rest of Database B
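The per-speaker split in the method above might look like the sketch below. The slides do not say how the adaptation samples were chosen, so the random shuffle is an assumption, as are the function name `split_adapt_test`, the toy data layout, and the sample and class counts in the stand-in database.

```python
import random

def split_adapt_test(samples_by_speaker, n_adapt=300, seed=0):
    """For each speaker, hold out n_adapt samples for adaptation and keep
    the remainder for testing."""
    rng = random.Random(seed)
    splits = {}
    for speaker, samples in samples_by_speaker.items():
        shuffled = list(samples)         # copy so the input is not reordered
        rng.shuffle(shuffled)
        splits[speaker] = (shuffled[:n_adapt], shuffled[n_adapt:])
    return splits

# Toy stand-in for Database B: 6 speakers, 1000 (feature, label) pairs each.
database_b = {
    f"spk{i}": [((float(j),), j % 5) for j in range(1000)]
    for i in range(6)
}
splits = split_adapt_test(database_b, n_adapt=300)
```

Each speaker's classifier would then be adapted on the first element of its pair and scored on the second.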