MSRC Summer School - 30/06/2009
Hybrids of generative and discriminative methods for machine learning
Cambridge – UK
Motivation • Generative models • prior knowledge • handle missing data such as labels • Discriminative models • perform well at classification • However, there is no straightforward way to combine them
Content • Generative and discriminative methods • A principled hybrid framework • Study of the properties on a toy example • Influence of the amount of labelled data
Content • Generative and discriminative methods • A principled hybrid framework • Study of the properties on a toy example • Influence of the amount of labelled data
Generative methods • Answer: “what does a cat look like? and a dog?” => model the joint distribution of data and labels • x : data, c : label, θ : parameters
Generative methods • Objective function:
G(θ) = p(θ) p(X, C|θ)
G(θ) = p(θ) Πn p(xn, cn|θ)
• 1 reusable model per class, can deal with incomplete data • Example: GMMs
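As a rough illustration of the objective above, here is a minimal sketch (not from the talk) of computing log G(θ) for a model with one spherical Gaussian per class plus class priors; the function names, the flat prior on θ, and the toy numbers are all assumptions of this sketch.

```python
# Minimal sketch (illustrative): log G(theta) = log p(theta) + sum_n log p(x_n, c_n | theta)
# for a model with one spherical Gaussian per class.
import numpy as np
from scipy.stats import multivariate_normal

def log_generative_objective(X, c, means, var, class_priors):
    """X: (N, D) data, c: (N,) integer labels, means: (K, D), var: shared spherical variance."""
    D = X.shape[1]
    log_g = 0.0                          # flat prior p(theta) assumed, so its log is dropped
    for x_n, c_n in zip(X, c):
        log_joint = (np.log(class_priors[c_n])                       # log p(c_n | theta)
                     + multivariate_normal.logpdf(x_n, mean=means[c_n], cov=var * np.eye(D)))
        log_g += log_joint               # accumulate log p(x_n, c_n | theta)
    return log_g

# Tiny usage example: two well-separated classes in 2-D
X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
c = np.array([0, 0, 1, 1])
means = np.array([[0.1, 0.05], [3.05, 2.95]])
print(log_generative_objective(X, c, means, var=0.1, class_priors=np.array([0.5, 0.5])))
```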
Discriminative methods • Answer: “is it a cat or a dog?” => model the posterior distribution of the labels • x : data, c : label, θ : parameters
Discriminative methods • The objective function is:
D(θ) = p(θ) p(C|X, θ)
D(θ) = p(θ) Πn p(cn|xn, θ)
• Focus on regions of ambiguity, make faster predictions • Examples: neural networks, SVMs
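For comparison, a minimal sketch (again not from the talk) of log D(θ) using a linear softmax classifier as the conditional model p(c|x, θ); the parameterisation and names are illustrative assumptions.

```python
# Minimal sketch (illustrative): log D(theta) = log p(theta) + sum_n log p(c_n | x_n, theta)
# with a linear softmax classifier as the conditional model.
import numpy as np
from scipy.special import logsumexp

def log_discriminative_objective(X, c, W, b):
    """X: (N, D) data, c: (N,) integer labels, W: (K, D) weights, b: (K,) biases."""
    scores = X @ W.T + b                                            # (N, K) class scores
    log_post = scores - logsumexp(scores, axis=1, keepdims=True)    # log p(k | x_n, theta)
    return log_post[np.arange(len(c)), c].sum()                     # flat prior on theta dropped

# Tiny usage example on the same toy data as above
X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
c = np.array([0, 0, 1, 1])
W = np.array([[-1.0, -1.0], [1.0, 1.0]])
b = np.zeros(2)
print(log_discriminative_objective(X, c, W, b))
```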
Example of discriminative model SVMs / NNs
Generative versus discriminative • A second mode in the data distribution has no effect on the decision boundary
Content • Generative and discriminative methods • A principled hybrid framework • Study of the properties on a toy example • Influence of the amount of labelled data
Semi-supervised learning • Few labelled data / lots of unlabelled data • Discriminative methods overfit; generative models only help classification if they are “good” • Need the modelling power of generative models combined with discriminative classification performance => hybrid models
Discriminative training (Bach et al., ICASSP 05) • Discriminative objective function:
D(θ) = p(θ) Πn p(cn|xn, θ)
• Using a generative model:
D(θ) = p(θ) Πn [ p(xn, cn|θ) / p(xn|θ) ]
D(θ) = p(θ) Πn [ p(xn, cn|θ) / Σc p(xn, c|θ) ]
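A minimal sketch of this idea, under the same spherical-Gaussian-per-class assumption as earlier: the conditional p(c|x, θ) is obtained by normalising the generative joint over all classes. Function names are illustrative.

```python
# Sketch (illustrative) of discriminatively training a generative model: the conditional
# p(c_n | x_n, theta) is the class joint normalised over all classes.
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def log_joint_all_classes(X, means, var, class_priors):
    """Return an (N, K) matrix of log p(x_n, k | theta), one spherical Gaussian per class."""
    D, K = X.shape[1], len(class_priors)
    return np.stack([np.log(class_priors[k])
                     + multivariate_normal.logpdf(X, mean=means[k], cov=var * np.eye(D))
                     for k in range(K)], axis=1)

def log_discriminative_of_generative(X, c, means, var, class_priors):
    """sum_n [ log p(x_n, c_n | theta) - log sum_c p(x_n, c | theta) ]."""
    lj = log_joint_all_classes(X, means, var, class_priors)
    return (lj[np.arange(len(c)), c] - logsumexp(lj, axis=1)).sum()
```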
Convex combination (Bouchard et al., COMPSTAT 04) • Generative objective function:
G(θ) = p(θ) Πn p(xn, cn|θ)
• Discriminative objective function:
D(θ) = p(θ) Πn p(cn|xn, θ)
• Convex combination:
log L(θ) = α log D(θ) + (1 - α) log G(θ),  α ∈ [0, 1]
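Under the same toy assumptions, a sketch of the blended objective, reusing `log_joint_all_classes` from the previous snippet; α = 0 recovers the generative objective and α = 1 the discriminative one.

```python
# Sketch (illustrative): log L(theta) = alpha * log D(theta) + (1 - alpha) * log G(theta),
# reusing log_joint_all_classes from the snippet above.
import numpy as np
from scipy.special import logsumexp

def log_convex_combination(alpha, X, c, means, var, class_priors):
    lj = log_joint_all_classes(X, means, var, class_priors)           # (N, K) log joints
    log_G = lj[np.arange(len(c)), c].sum()                            # sum_n log p(x_n, c_n | theta)
    log_D = (lj[np.arange(len(c)), c] - logsumexp(lj, axis=1)).sum()  # sum_n log p(c_n | x_n, theta)
    return alpha * log_D + (1.0 - alpha) * log_G                      # alpha in [0, 1]
```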
A principled hybrid model • θ - models the posterior distribution of the labels • θ’ - models the marginal distribution of the data • θ and θ’ communicate through a prior • Hybrid objective function:
L(θ, θ’) = p(θ, θ’) Πn p(cn|xn, θ) Πn p(xn|θ’)
A principled hybrid model • θ = θ’ => p(θ, θ’) = p(θ) δ(θ - θ’)
L(θ, θ’) = p(θ) δ(θ - θ’) Πn p(cn|xn, θ) Πn p(xn|θ’)
L(θ) = G(θ): the generative case
• θ ⊥ θ’ => p(θ, θ’) = p(θ) p(θ’)
L(θ, θ’) = [ p(θ) Πn p(cn|xn, θ) ] [ p(θ’) Πn p(xn|θ’) ]
L(θ, θ’) = D(θ) f(θ’): the discriminative case
A principled hybrid model • Anything in between – hybrid case • Choice of prior:
p(θ, θ’) = p(θ) N(θ’|θ, σ(α))
α → 0 => σ → 0 => θ = θ’ (generative case)
α → 1 => σ → ∞ => θ ⊥ θ’ (discriminative case)
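A minimal sketch of the hybrid objective with this Gaussian coupling prior. Taking θ and θ’ to be two copies of the class means of the toy spherical-Gaussian model is a simplifying assumption of this sketch, as are all the names.

```python
# Sketch (illustrative) of the hybrid objective
# log L(theta, theta') = log p(theta, theta') + sum_n log p(c_n | x_n, theta) + sum_n log p(x_n | theta'),
# with the coupling prior p(theta, theta') = p(theta) N(theta' | theta, sigma(alpha) I).
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.special import logsumexp

def log_hybrid_objective(theta, theta_prime, sigma, X, c, var, class_priors):
    """theta, theta_prime: (K, D) class means; sigma: coupling scale, growing with alpha."""
    D, K = X.shape[1], len(class_priors)

    def log_joint(means):                # (N, K) matrix of log p(x_n, k | means)
        return np.stack([np.log(class_priors[k])
                         + multivariate_normal.logpdf(X, mean=means[k], cov=var * np.eye(D))
                         for k in range(K)], axis=1)

    lj = log_joint(theta)
    log_disc = (lj[np.arange(len(c)), c] - logsumexp(lj, axis=1)).sum()  # sum_n log p(c_n | x_n, theta)
    log_marg = logsumexp(log_joint(theta_prime), axis=1).sum()           # sum_n log p(x_n | theta')
    # Coupling prior: sigma -> 0 forces theta' = theta (generative case),
    # sigma -> infinity decouples them (discriminative case).
    log_prior = norm.logpdf(theta_prime.ravel(), loc=theta.ravel(), scale=sigma).sum()
    return log_disc + log_marg + log_prior
```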
Why principled? • Consistent with the likelihood of graphical models => one way to train a system • Everything can now be modelled => potential to be Bayesian • Potential to learn α
Learning • EM / Laplace approximation / MCMC: either intractable or too slow • Conjugate gradients: flexible, easy to check, BUT sensitive to initialisation and slow • Variational inference
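As a rough illustration of the conjugate-gradient option, a sketch that maximises the hybrid objective with scipy's CG optimiser, reusing `log_hybrid_objective` from the previous snippet; flattening (θ, θ’) into one vector is an assumption of this sketch, and, as the slide warns, the result depends on the initialisation.

```python
# Sketch (illustrative) of fitting the hybrid parameters with conjugate gradients,
# reusing log_hybrid_objective from the previous snippet.
import numpy as np
from scipy.optimize import minimize

def fit_hybrid_cg(X, c, var, class_priors, sigma, theta_init, theta_prime_init):
    K, D = theta_init.shape

    def negative_log_L(flat):
        theta = flat[:K * D].reshape(K, D)
        theta_prime = flat[K * D:].reshape(K, D)
        return -log_hybrid_objective(theta, theta_prime, sigma, X, c, var, class_priors)

    x0 = np.concatenate([theta_init.ravel(), theta_prime_init.ravel()])
    result = minimize(negative_log_L, x0, method="CG")   # gradients approximated numerically here
    flat = result.x
    return flat[:K * D].reshape(K, D), flat[K * D:].reshape(K, D)
```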
Content • Generative and discriminative methods • A principled hybrid framework • Study of the properties on a toy example • Influence of the amount of labelled data
Toy example • 2 elongated distributions • Only spherical Gaussians allowed => wrong model • 2 labelled points per class => strong risk of overfitting
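A sketch that reproduces the flavour of this set-up; the covariances, means, and sample sizes are purely illustrative assumptions, not the talk's actual data.

```python
# Sketch (illustrative numbers) of the toy set-up: two elongated class distributions,
# a deliberately misspecified spherical-Gaussian model, and only 2 labelled points per class.
import numpy as np

rng = np.random.default_rng(0)
cov_elongated = np.array([[5.0, 0.0], [0.0, 0.1]])                   # long along the first axis
X0 = rng.multivariate_normal([0.0, -1.0], cov_elongated, size=200)   # class 0
X1 = rng.multivariate_normal([0.0, 1.0], cov_elongated, size=200)    # class 1
X_labelled = np.vstack([X0[:2], X1[:2]])                             # 2 labelled points per class
c_labelled = np.array([0, 0, 1, 1])
X_unlabelled = np.vstack([X0[2:], X1[2:]])                           # the rest is unlabelled
# A spherical Gaussian cannot capture the elongation (wrong generative model), and with only
# 4 labels a purely discriminative fit is prone to overfitting; the hybrid trades off the two.
```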
Content • Generative and discriminative methods • A principled hybrid framework • Study of the properties on a toy example • Influence of the amount of labelled data
A real example • Images are a special case, as they contain several features each • 2 levels of supervision: at the image level, and at the feature level • Image label only => weakly labelled • Image label + segmentation => fully labelled
The underlying generative model [graphical-model diagram with multinomial, multinomial and Gaussian nodes]
The underlying generative model [diagram contrasting the weakly labelled and fully labelled cases]
Experimental set-up • 3 classes: bikes, cows, sheep • 1 Gaussian per class => poor generative model • 75 training images for each category
Results • When increasing the proportion of fully labelled data, the best-performing model shifts: generative → hybrid → discriminative • Weakly labelled data has little influence on this trend • With sufficient fully labelled data, the hybrid framework (HF) tends to perform better than the convex combination (CC)
Experimental set-up • 3 classes: lions, tigers and cheetahs • 1 Gaussian per class => poor generative model • 75 training images for each category
Results • Hybrid models consistently perform better • However, generative and discriminative models haven’t reached saturation • No clear difference between HF and CC
Conclusion • Principled hybrid framework • Possibility to learn the best trade-off • Helps on ambiguous datasets when labelled data is scarce • Optimisation remains an open problem
Future avenues • Bayesian version (posterior distribution of α) under study • Replace σ by a diagonal matrix to allow flexibility => needs the Bayesian version • Choice of priors