Hierarchical Mixture of Experts • Presented by Qi An • Machine Learning Reading Group, Duke University • 07/15/2005
Outline • Background • Hierarchical tree structure • Gating networks • Expert networks • E-M algorithm • Experimental results • Conclusions
Background • The idea of mixture of experts • First presented by Jacobs and Hinton in 1988 • Hierarchical mixture of experts • Proposed by Jordan and Jacobs in 1994 • Difference from previous mixture models • Mixing weights depend on both the input and the output
One-layer structure • [Figure: a single-layer mixture of experts — the input x feeds three expert networks (outputs μ1, μ2, μ3) and a gating network with an ellipsoidal gating function (weights g1, g2, g3); the expert outputs are blended into the overall output μ]
Hierarchical tree structure • [Figure: a two-level tree of gating networks with linear gating functions, with expert networks at the leaves]
Expert network • At the leaves of the tree, each expert forms a linear predictor Uij x • The output of the expert is μij = f(Uij x), where f is a link function • For example: the identity function for regression, or the logistic function for binary classification
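To make the leaf computation concrete, here is a minimal Python sketch of one expert, assuming a logistic link for the binary-classification case and an identity link for regression; the function name expert_output and the weight matrix U are illustrative, not from the paper.

```python
import numpy as np

def expert_output(U, x, link="logistic"):
    """One expert at a leaf: linear predictor U @ x passed through a link function.

    U    : (d_out, d_in) expert weight matrix (illustrative name)
    x    : (d_in,) input vector
    link : "identity" for regression, "logistic" for binary classification
    """
    xi = U @ x                               # linear predictor
    if link == "logistic":
        return 1.0 / (1.0 + np.exp(-xi))     # logistic link
    return xi                                # identity link
```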
Gating network • At the nonterminal nodes of the tree, the mixing weights are softmax (multinomial logit) functions of the input • Top layer: gi = exp(ξi) / Σk exp(ξk), with ξi = vi·x • Other layers: gij = exp(ξij) / Σk exp(ξik), with ξij = vij·x
Output • At the non-leaf nodes, the outputs of the children are blended by the gating weights • Top node: μ = Σi gi μi • Other nodes: μi = Σj gij μij
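Putting the gating and expert slides together, below is a hedged sketch of the forward pass for a two-level tree with softmax gating and identity-link (regression) experts; the parameter names V_top, V_low, U and their shapes are assumptions made for the example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def hme_predict(x, V_top, V_low, U):
    """Forward pass of a two-level HME with identity-link experts.

    V_top : (I, d) weights of the top gating network
    V_low : (I, J, d) weights of the lower gating networks
    U     : (I, J, d_out, d) weights of the experts
    Returns mu = sum_i gi * sum_j gij * mu_ij.
    """
    g_top = softmax(V_top @ x)                       # gi: top-level mixing weights
    mu = 0.0
    for i in range(V_top.shape[0]):
        g_low = softmax(V_low[i] @ x)                # gij: nested mixing weights
        mu_i = sum(g_low[j] * (U[i, j] @ x)          # mu_i = sum_j gij * mu_ij
                   for j in range(V_low.shape[1]))
        mu = mu + g_top[i] * mu_i                    # mu = sum_i gi * mu_i
    return mu
```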
Probability model • For each expert, assume the true output y is chosen from a distribution Pij with mean μij • Therefore, the total probability of generating y from x is given by P(y|x) = Σi gi Σj gij Pij(y)
Posterior probabilities • Since gi and gij are computed based only on the input x, we refer to them as prior probabilities • With knowledge of both the input x and the output y, Bayes' rule gives the posterior probabilities: hi = gi Σj gij Pij(y) / Σi gi Σj gij Pij(y) and hij = gij Pij(y) / Σj gij Pij(y)
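A minimal sketch of these prior and posterior computations, assuming Gaussian expert distributions Pij with mean μij (the slide only says "a distribution P with mean μij", so the Gaussian choice is an assumption); parameter names and shapes follow the forward-pass sketch above.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def posteriors(x, y, V_top, V_low, U, sigma=1.0):
    """Posterior branch probabilities hi and hij from Bayes' rule,
    assuming Gaussian experts Pij(y) with mean U[i, j] @ x and std sigma."""
    I, J = U.shape[0], U.shape[1]
    g_top = softmax(V_top @ x)                                     # prior gi
    g_low = np.stack([softmax(V_low[i] @ x) for i in range(I)])    # prior gij

    lik = np.empty((I, J))                 # expert likelihoods Pij(y)
    for i in range(I):                     # (Gaussian normalizer omitted; it
        for j in range(J):                 #  cancels in the ratios below)
            r = y - U[i, j] @ x
            lik[i, j] = np.exp(-0.5 * np.dot(r, r) / sigma**2)

    joint = g_top[:, None] * g_low * lik                    # gi * gij * Pij(y)
    h_top = joint.sum(axis=1) / joint.sum()                 # hi
    h_low = joint / joint.sum(axis=1, keepdims=True)        # hij, given branch i
    return h_top, h_low
```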
E-M algorithm • Introduce auxiliary indicator variables zij, which have an interpretation as the labels that correspond to the experts • With knowledge of the auxiliary variables the probability model can be simplified, since the indicators select a single expert for each data point
E-M algorithm • Complete-data log-likelihood: lc(θ) = Σt Σi Σj zij(t) [ln gi(t) + ln gij(t) + ln Pij(y(t))] • The E-step replaces the unobserved indicators zij by their posterior expectations, the joint posteriors hi(t) hij(t)
E-M algorithm • The M-step maximizes the expected complete-data log-likelihood • It decouples into separate posterior-weighted maximum-likelihood problems, one for each expert network and one for each gating network, each solvable by IRLS
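As an illustration of the decoupled M-step, here is a sketch of the update for a single linear (Gaussian) expert, where the posterior-weighted least-squares problem has a closed form; the gating networks would instead be fit by IRLS (next slide). The function name, the scalar-output restriction, and the shapes are assumptions for the example.

```python
import numpy as np

def m_step_expert(X, Y, h_joint):
    """M-step for one linear expert: posterior-weighted least squares.

    X       : (T, d) inputs
    Y       : (T,) scalar targets (scalar output assumed for simplicity)
    h_joint : (T,) joint posterior hi * hij for this expert at each data point
    Solves argmin_u  sum_t h_t (y_t - u . x_t)^2 in closed form.
    """
    W = h_joint[:, None]                 # per-sample weights
    A = X.T @ (W * X)                    # weighted normal equations
    b = X.T @ (h_joint * Y)
    return np.linalg.solve(A, b)         # new expert weights u_ij
```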
IRLS • Iteratively reweighted least squares algorithm • An iterative algorithm for computing the maximum likelihood estimates of the parameters of a generalized linear model • A special case of the Fisher scoring method
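A minimal IRLS sketch for weighted binary logistic regression, the kind of weighted GLM fit the M-step requires for logistic experts and (with a multinomial extension) for the gating networks; the ridge term and iteration count are implementation choices, not from the slides.

```python
import numpy as np

def irls_logistic(X, y, sample_w=None, n_iter=20, ridge=1e-6):
    """IRLS for weighted logistic regression.

    X        : (T, d) inputs
    y        : (T,) binary targets in {0, 1}
    sample_w : (T,) optional per-sample weights (e.g. posteriors from the E-step)
    """
    T, d = X.shape
    if sample_w is None:
        sample_w = np.ones(T)
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))            # current predictions
        var = np.clip(p * (1.0 - p), 1e-12, None)
        w = sample_w * var                               # IRLS working weights
        z = X @ beta + (y - p) / var                     # working response
        A = X.T @ (w[:, None] * X) + ridge * np.eye(d)   # weighted normal equations
        beta = np.linalg.solve(A, X.T @ (w * z))
    return beta
```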
Algorithm • E-step: compute the posterior probabilities hi and hij for every data point using the current parameters • M-step: update the expert and gating network parameters by solving the resulting posterior-weighted IRLS problems
Online algorithm • This algorithm can be used for on-line regression • For the expert networks, the weights are updated with a posterior-weighted recursive least-squares step, where Rij is the inverse covariance matrix for expert network (i,j)
Online algorithm • For the gating networks, analogous recursive updates are used, where Si is the inverse covariance matrix for the top-level gating network and Sij is the inverse covariance matrix for the gating network at node (i,j)
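The exact on-line update formulas did not survive extraction, so the following is only a hedged sketch of a standard posterior-weighted recursive least-squares step for one expert, maintaining an inverse covariance matrix as the slides describe; the forgetting factor, function name, and exact form are assumptions, and the update in the original paper may differ in detail.

```python
import numpy as np

def online_expert_update(u, R, x, y, h, decay=1.0):
    """One posterior-weighted recursive least-squares step for a linear expert.

    u     : (d,) current expert weights
    R     : (d, d) current inverse covariance matrix for this expert
    x, y  : new input (d,) and scalar target
    h     : posterior weight of this expert for the new sample
    decay : forgetting factor (1.0 = none); a modelling choice, not from the slides
    """
    Rx = R @ x
    denom = decay / max(h, 1e-12) + x @ Rx
    R_new = (R - np.outer(Rx, Rx) / denom) / decay   # matrix inversion lemma
    err = y - u @ x                                  # prediction error
    u_new = u + h * err * (R_new @ x)                # posterior-weighted gain step
    return u_new, R_new
```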
Results • Simulated data from a four-joint robot arm moving in three-dimensional space
Conclusions • Introduces a tree-structured mixture architecture for supervised learning • Trains much faster than the traditional back-propagation algorithm • Can be used for on-line learning
Thank you Questions?