Discriminative Training and Machine Learning Approaches Chih-Pin Liao Machine Learning Lab, Dept. of CSIE, NCKU
Our Concerns • Feature extraction and HMM modeling should be jointly performed. • A common objective function should be considered. • To alleviate model confusion and improve recognition performance, the HMM should be estimated with a discriminative criterion built on statistical theory. • Model parameters should be calculated rapidly, without applying a descent algorithm.
Minimum Classification Error (MCE) • MCE is a popular discriminative training algorithm developed for speech recognition and extended to other pattern recognition applications. • Rather than maximizing the likelihood of the observed data, MCE aims to directly minimize classification errors. • A gradient descent algorithm is used to estimate the HMM parameters.
MCE Training Procedure • Procedure for training discriminative models using observations X • Discriminant function • Anti-discriminant function • Misclassification measure
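The slide's equations are not reproduced in this transcript. In the standard MCE formulation the three quantities are usually written as follows (Λ for the parameter set, M for the number of classes, and η for a smoothing constant are assumed notation, not taken from the slide):

```latex
g_j(X;\Lambda) = \log p(X \mid \lambda_j)
\qquad \text{(discriminant function)}

G_j(X;\Lambda) = \frac{1}{\eta}\log\Bigl[\frac{1}{M-1}\sum_{k\neq j}
  \exp\bigl(\eta\, g_k(X;\Lambda)\bigr)\Bigr]
\qquad \text{(anti-discriminant function)}

d_j(X) = G_j(X;\Lambda) - g_j(X;\Lambda)
\qquad \text{(misclassification measure)}
```

A positive d_j(X) indicates that some competing class scores higher than the target class j, i.e., a classification error.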
Expected Loss • The loss function is calculated by mapping the misclassification measure into a range between zero and one through a sigmoid function. • Minimizing the expected loss, or classification error, yields the discriminative model.
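As a minimal sketch of the mapping described above (function and parameter names are illustrative, not from the slides), the sigmoid loss and its empirical average might look like:

```python
import math

def sigmoid_loss(d, gamma=1.0, theta=0.0):
    """Map a misclassification measure d into the range (0, 1).

    d > 0 (misclassification) pushes the loss toward 1;
    d < 0 (correct classification) pushes it toward 0.
    gamma controls the slope; theta shifts the decision threshold.
    """
    return 1.0 / (1.0 + math.exp(-gamma * (d - theta)))

def expected_loss(measures, gamma=1.0, theta=0.0):
    """Average the smoothed 0-1 loss over a set of training samples."""
    return sum(sigmoid_loss(d, gamma, theta) for d in measures) / len(measures)
```

Because the sigmoid is differentiable, this smoothed error count can be minimized by gradient descent, which is exactly why MCE uses it in place of the raw 0-1 loss.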
Likelihood Ratio Test • A new training criterion is derived from hypothesis testing theory. • We test a null hypothesis against an alternative hypothesis. • The optimal solution is obtained by a likelihood ratio test, according to the Neyman–Pearson lemma. • A higher likelihood ratio implies stronger confidence in accepting the null hypothesis.
Hypotheses in HMM Training • Null and alternative hypotheses: H0: observations X are from target HMM state j; H1: observations X are not from target HMM state j. • We develop discriminative HMM parameters for the target state against the non-target states. • The problem becomes one of verifying the goodness of the data alignment to the corresponding HMM states.
Maximum Confidence HMM • The MCHMM is estimated by maximizing the log likelihood ratio, or confidence measure, where the parameter set consists of the HMM parameters and a transformation matrix.
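The slide's formula is missing from the transcript; one rendering consistent with the H0/H1 hypotheses of the previous slide (with Λ = {λ, W}, where λ denotes the HMM parameters and W the transformation matrix, as assumed notation) is:

```latex
\hat{\Lambda} = \arg\max_{\Lambda}\, \log\frac{p(X \mid H_0, \Lambda)}
                                              {p(X \mid H_1, \Lambda)},
\qquad \Lambda = \{\lambda, W\}
```

Maximizing this ratio raises the likelihood under the target-state hypothesis while suppressing it under the competing hypothesis, which is the "confidence" being maximized.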
Hybrid Parameter Estimation • The expectation-maximization (EM) algorithm is applied to tackle the missing data problem in maximum confidence estimation. • E-step
MC Classification Rule • Let Y denote an input test image. We apply the same criterion to identify the most likely category corresponding to Y.
Summary • A new maximum confidence HMM framework was proposed. • The hypothesis testing principle was used to build the training criterion. • Discriminative feature extraction and HMM modeling were performed under the same criterion. • Chien, Jen-Tzung; Liao, Chih-Pin, "Maximum Confidence Hidden Markov Modeling for Face Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 4, April 2008, pp. 606–616.
Introduction • Conditional Random Fields (CRFs) • relax the usual conditional independence assumption of the likelihood model • enforce the homogeneity of the labeling variables conditioned on the observation • Due to the weak assumptions of the CRF model and its discriminative nature, it • allows arbitrary relationships among the data • may require fewer resources to train its parameters
CRF models outperform Hidden Markov Models (HMMs) and Maximum Entropy Markov Models (MEMMs) in • language and text processing problems • object recognition problems • image and video segmentation • tracking problems in video sequences
Two Classes of Models • Generative model (HMM): models the joint distribution of states and observations • Direct model (MEMM and CRF): models the posterior probability directly
Comparisons of the Two Kinds of Model • Generative model: HMM • Uses the Bayes rule approximation • Assumes that observations are independent • Multiple overlapping features are not modeled • The state sequence is estimated through the recursive Viterbi algorithm
Direct model: MEMM and CRF • The posterior probability is modeled directly • Dependencies among observations are flexibly modeled • The state sequence is estimated through the recursive Viterbi algorithm
Hidden Markov Model & Maximum Entropy Markov Model
HMM for Human Motion Recognition • An HMM is defined by • transition probabilities • observation probabilities
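A minimal sketch of how those two probability tables determine the data likelihood via the forward algorithm (function and argument names are illustrative, not from the slides):

```python
def hmm_forward(pi, A, B, obs):
    """Likelihood P(obs | HMM) by the forward algorithm (discrete emissions).

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability that state i emits symbol k
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]    # initialization
    for o in obs[1:]:                                   # induction over time
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
                 for j in range(n)]
    return sum(alpha)                                   # termination
```

For recognition, one such model is trained per motion class and the class whose model gives the highest likelihood is chosen.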
Maximum Entropy Markov Model • An MEMM is defined by a single conditional state distribution, which replaces the transition and observation probabilities of the HMM.
Maximum Entropy Criterion • Definition of the feature functions • Constrained optimization problem: the empirical expectation must equal the model expectation
Solution of MEMM • Lagrange multipliers are used for the constrained optimization; the multipliers are the model parameters • The solution is obtained in exponential (log-linear) form
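In generic maximum entropy notation (f_a for a feature function, λ_a for its Lagrange multiplier, and \tilde{P} for the empirical distribution; these symbols are assumed, since the slide's formulas are not reproduced), the constrained problem and its log-linear solution read:

```latex
\max_{P}\; H(P)
\quad \text{s.t.} \quad
E_{\tilde{P}}[f_a] = E_{P}[f_a] \;\; \forall a
\qquad\Longrightarrow\qquad
P(s \mid s', o) = \frac{1}{Z(o, s')}
  \exp\Bigl(\sum_a \lambda_a f_a(o, s)\Bigr)
```

The normalizer Z(o, s') sums the exponential over all states s, so each conditional distribution is properly normalized per source state and observation.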
GIS Algorithm • Optimizes the Maximum Mutual Information (MMI) criterion • Step 1: Calculate the empirical expectation • Step 2: Start from an initial parameter value • Step 3: Calculate the model expectation • Step 4: Update the model parameters • Repeat steps 3 and 4 until convergence
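The four steps above can be sketched as a toy implementation for a conditional maxent classifier (names and structure are illustrative; strict GIS also adds a correction feature when feature counts vary per example, omitted here):

```python
import math

def gis(samples, labels, features, num_labels, iters=50):
    """Generalized Iterative Scaling for a conditional maxent model.

    samples : list of observations x
    labels  : gold label y for each observation
    features: list of binary feature functions f(x, y) -> 0 or 1
    Assumes every (x, y) pair fires the same number C of features
    (the GIS constant); C is taken as the maximum feature sum here.
    """
    lam = [0.0] * len(features)
    C = max(sum(f(x, y) for f in features)
            for x in samples for y in range(num_labels))
    # Step 1: empirical expectations from the labeled data
    emp = [sum(f(x, y) for x, y in zip(samples, labels)) for f in features]
    # Step 2: start from lam = 0; Steps 3-4: iterate to convergence
    for _ in range(iters):
        # Step 3: model expectations under the current parameters
        mod = [0.0] * len(features)
        for x in samples:
            scores = [math.exp(sum(l * f(x, y) for l, f in zip(lam, features)))
                      for y in range(num_labels)]
            z = sum(scores)
            for a, f in enumerate(features):
                for y in range(num_labels):
                    mod[a] += (scores[y] / z) * f(x, y)
        # Step 4: update lambda_a += (1/C) * log(empirical / model)
        lam = [l + math.log(e / m) / C if e > 0 and m > 0 else l
               for l, e, m in zip(lam, emp, mod)]
    return lam
```

At the fixed point the model expectations match the empirical ones, which is exactly the maximum entropy constraint from the previous slide.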
Conditional Random Field • Definition: Let G = (V, E) be a graph such that Y = (Y_v), v ∈ V. When conditioned on X, the variables Y_v obey the Markov property with respect to the graph. Then (X, Y) is a conditional random field.
CRF Model Parameters • The undirected graphical structure can be used to factorize the posterior probability into a normalized product of potential functions • Consider the graph as a linear-chain structure • Model parameter set • Feature function set
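A brute-force sketch of that normalized product of potentials for a linear chain (function names and the exhaustive enumeration are illustrative; a real implementation would use forward-backward recursions instead):

```python
import math
from itertools import product

def crf_posterior(y_seq, x_seq, feature_fns, weights, num_labels):
    """P(y | x) for a linear-chain CRF, normalized by brute force.

    Each label sequence y is scored as a product of potentials:
        exp( sum_t sum_a  weights[a] * feature_fns[a](y_prev, y_t, x, t) )
    feature_fns: functions f(y_prev, y_cur, x_seq, t); y_prev is None at t=0.
    Enumerating all num_labels**T sequences yields the partition function Z,
    so this is only practical for tiny examples.
    """
    def score(y):
        s = 0.0
        for t in range(len(x_seq)):
            y_prev = y[t - 1] if t > 0 else None
            s += sum(w * f(y_prev, y[t], x_seq, t)
                     for w, f in zip(weights, feature_fns))
        return math.exp(s)

    z = sum(score(y) for y in product(range(num_labels), repeat=len(x_seq)))
    return score(tuple(y_seq)) / z
```

Because every feature function may inspect the whole observation sequence x, this form captures the arbitrary observation dependencies that HMMs cannot model.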
CRF Parameter Estimation • We rewrite and maximize the posterior probability • The log posterior probability is given by
Parameter Updating by the GIS Algorithm • Differentiate the log posterior probability with respect to each parameter • Setting this derivative to zero yields the constraint in the maximum entropy model • This estimation has no closed-form solution, so the GIS algorithm can be used.
Summary and Future Works • We construct a complex CRF with cycles for better modeling of contextual dependency; a graphical model algorithm is applied. • In the future, a variational inference algorithm will be developed to improve the calculation of the conditional probability. • The posterior probability can then be calculated directly by an approximate approach. • Liao, Chih-Pin; Chien, Jen-Tzung, "Graphical Modeling of Conditional Random Fields for Human Motion Recognition," Proc. ICASSP 2008, pp. 1969–1972.
Thanks for your attention and Discussion