Regularized Adaptation for Discriminative Classifiers
Xiao Li and Jeff Bilmes
University of Washington, Seattle
This work …
• Investigates links between a number of discriminative classifiers
• Presents a general adaptation strategy – "regularized adaptation"
Adaptation for generative models
• The target sample distribution differs from that of the training data
• Adaptation has long been studied in speech recognition for generative models:
  • Maximum likelihood linear regression (MLLR)
  • Maximum a posteriori (MAP)
  • Eigenvoice
Discriminative classifiers
• Directly model the conditional relation of a label given features
• Often yield more robust classification performance than generative models
• Popular examples:
  • Support vector machines (SVMs)
  • Multi-layer perceptrons (MLPs)
  • Conditional maximum entropy (MaxEnt) models
Existing Discriminative Adaptation Strategies
• SVMs:
  • Combine SVs with selected adaptation data (Matic 93)
  • Combine selected SVs with adaptation data (Li 05)
• MLPs:
  • Linear input network (Neto 95, Abrash 97)
  • Retrain both layers from the unadapted model (Neto 95)
  • Retrain part of the last layer (Stadermann 05)
  • Retrain the first layer
• Conditional MaxEnt:
  • Gaussian prior (Chelba 04)
SVMs and MLPs – Links
• Binary classification: samples $(x_t, y_t)$ with labels $y_t \in \{-1, +1\}$
• Discriminant function: $f(x_t) = w^\top \phi(x_t) + b$, where $\phi(\cdot)$ is a nonlinear transform
• Accuracy-regularization objective – empirical risk plus a regularizer:
  $$\min_w \; \sum_{t=1}^{T} L\big(y_t f(x_t)\big) + \lambda \|w\|^2$$
• The regularizer is what distinguishes the families: SVM – maximum margin; MLP – weight decay; MaxEnt – Gaussian smoothing
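To make the shared objective concrete, here is a minimal sketch in Python/NumPy (all names are illustrative, not from the slides): swapping the surrogate loss L recovers the different families, while the L2 term plays the role of the maximum margin (SVM), weight decay (MLP), or Gaussian smoothing (MaxEnt).

```python
import numpy as np

def hinge(margins):                 # SVM surrogate: L(z) = max(0, 1 - z)
    return np.maximum(0.0, 1.0 - margins)

def log_loss(margins):              # smooth surrogate: L(z) = log(1 + e^(-z))
    return np.log1p(np.exp(-margins))

def objective(w, b, X, y, loss, lam):
    """Empirical risk + L2 regularizer for a linear discriminant
    f(x) = w.x + b with labels y in {-1, +1}."""
    margins = y * (X @ w + b)       # y_t * f(x_t) for every sample
    return loss(margins).sum() + lam * (w @ w)
```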
SVMs and MLPs – Differences
Adaptation
• Adaptation data
  • may be available only in small amounts
  • may be class-unbalanced
• We intend to utilize both
  • the unadapted model $w^0$
  • the adaptation data $(x_t, y_t)$, $t = 1{:}T$
Regularized Adaptation
• Generalized objective w.r.t. the adaptation data – margin error plus a penalty on deviation from the unadapted model:
  $$\min_w \; \sum_{t=1}^{T} L\big(y_t f(x_t)\big) + \lambda \|w - w^0\|^2$$
• Relations with existing SVM adaptation algorithms:
  • hinge loss (retrain SVM)
  • hard boosting (Matic 93)
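A minimal sketch of the generalized adaptation objective as reconstructed above, assuming the penalty is the squared distance to the unadapted weights w0: λ → ∞ keeps the unadapted model unchanged, while λ = 0 with the hinge loss reduces to retraining the SVM on the adaptation data alone.

```python
import numpy as np

def adaptation_objective(w, b, X_adapt, y_adapt, w0, loss, lam):
    """Regularized adaptation sketch: empirical risk on the adaptation
    data plus a penalty for drifting away from the unadapted model w0.
    `loss` is a margin surrogate, e.g. the hinge from the earlier sketch."""
    margins = y_adapt * (X_adapt @ w + b)
    return loss(margins).sum() + lam * np.sum((w - w0) ** 2)
```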
Regularized Adaptation for SVMs
• Soft boosting – combine margin errors measured under the unadapted decision function $d_0$ and under the decision function trained using adaptation data only
[Figure: adaptation data with the unadapted boundary $d_0$ and the new boundary trained on adaptation data only]
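One plausible reading of soft boosting, sketched under assumptions (the blending weight `beta` is a hypothetical name, not from the slides): hinge margin errors are accumulated under both the new decision function and the unadapted $d_0$, so the two sources of evidence are combined softly rather than by hard selection of data points.

```python
import numpy as np

def soft_boost_risk(margins_new, margins_d0, beta):
    """One plausible reading of 'soft boosting': blend hinge margin
    errors measured under the new decision function with those measured
    under the unadapted decision d0. beta is a hypothetical weight."""
    hinge = lambda z: np.maximum(0.0, 1.0 - z)
    return beta * hinge(margins_new).sum() + (1.0 - beta) * hinge(margins_d0).sum()
```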
Regularized Adaptation for SVMs (Cont.)
• Theorem, for linear SVMs
• In practice, we use α = 1
Regularized Adaptation for MLPs
• Extend this to a two-layer MLP (see the sketch after this list)
• Relations with existing MLP adaptation algorithms:
  • Linear input network: μ → ∞
  • Retrain from the SI model: μ = 0, ν = 0
  • Retrain last layer: μ = 0, ν → ∞
  • Retrain first layer: μ → ∞, ν = 0
  • Regularized: choose μ, ν on a dev set
• This also relates to MaxEnt adaptation using Gaussian priors
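A sketch of the two-layer objective, assuming a squared-error risk and the μ/ν-to-layer mapping implied by the slide's limiting cases (μ penalizing the last layer, ν the first); all names are illustrative.

```python
import numpy as np

def mlp_adapt_objective(V, W, X, Y, V_si, W_si, mu, nu):
    """Sketch of regularized MLP adaptation. V: first-layer weights,
    W: last-layer weights; V_si, W_si: unadapted (speaker-independent)
    weights. The mapping is inferred from the slide's limiting cases:
      mu -> inf, nu = 0 : last layer pinned to SI, retrain first layer
      mu = 0, nu -> inf : first layer pinned to SI, retrain last layer
      mu = nu = 0       : retrain both layers from the SI model
    """
    H = np.tanh(X @ V)                       # hidden activations
    P = 1.0 / (1.0 + np.exp(-(H @ W)))       # output layer (sigmoid)
    risk = np.sum((Y - P) ** 2)              # placeholder empirical risk
    return (risk
            + mu * np.sum((W - W_si) ** 2)   # last-layer deviation penalty
            + nu * np.sum((V - V_si) ** 2))  # first-layer deviation penalty
```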
Experiments – Vowel Classification
• Application: the Vocal Joystick
  • A voice-based computer interface for individuals with motor impairments
  • Vowel quality → angle
• Data set (extended)
  • Train/dev/eval: 21/4/10 speakers
  • 6-fold cross-validation
• MLP configuration (see the sketch after this list)
  • 7 frames of MFCC + deltas
  • 50 hidden nodes
• Metric: frame-level classification error rate
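For concreteness, a sketch of the stated MLP configuration; the per-frame feature dimension and the number of vowel classes are assumptions (13 MFCCs + 13 deltas per frame, 4 vowels), not given on the slide.

```python
import numpy as np

N_FRAMES, N_COEFF = 7, 26      # 7-frame context window; 26 = 13 MFCCs + 13 deltas (assumed)
N_HIDDEN = 50                  # hidden nodes, as stated on the slide
N_VOWELS = 4                   # number of vowel classes (assumption)

rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(N_FRAMES * N_COEFF, N_HIDDEN))
W = rng.normal(scale=0.1, size=(N_HIDDEN, N_VOWELS))

def classify_frame(window):
    """Frame-level vowel decision from a stacked 7-frame feature window."""
    h = np.tanh(window @ V)
    return int(np.argmax(h @ W))
```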
Varying Adaptation Time
Varying # Vowels in Adaptation (3 s each)
[Figure: error rate vs. number of vowels; SI baseline: 32%]
Summary
• Drew links between discriminative classifiers
• Presented a general notion of "regularized adaptation" for discriminative classifiers
  • Natural adaptation strategies for SVMs and MLPs, justified using a maximum-margin argument
  • A unified view of different adaptation algorithms
• MLP experiments show superior performance, especially for class-skewed data