Modular and hierarchical learning systems Michael I. Jordan and Robert A. Jacobs Presented by Danke Xie Cognitive Science, UCSD CSE 291s Lawrence Saul 4/26/2007
Outline • Decision Tree • Mixture of Experts Architecture • The Mixture of Experts Model • Learning algorithm • Hierarchical Mixture of Experts architecture • Demo
Introduction • Why modular and hierarchical systems? • Divide a complex problem into less complex subproblems • Ex: supervised learning [Figure: input x is mapped to output y through modules f(x) and g(x)]
Decision Tree • Classification problem: map input x to a label y in {0, 1} • [Figure: a decision tree that first tests x5 > 3, then tests x2 < 4 or x6 > 7 on the two branches, with leaves labeled 0 or 1]
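To make the hard-decision behaviour concrete, here is a minimal Python sketch of a tree like the one in the figure; the thresholds and leaf labels are illustrative, not taken from the slides.

```python
# A hard decision tree routes each input down exactly one path using
# axis-aligned threshold tests; every point gets a single, non-probabilistic label.
def tree_predict(x):
    """x is a feature vector; x[4] plays the role of x5, x[1] of x2, x[5] of x6."""
    if x[4] > 3:                      # root test: x5 > 3 ?
        return 0 if x[1] < 4 else 1   # left subtree: x2 < 4 ?
    else:
        return 0 if x[5] > 7 else 1   # right subtree: x6 > 7 ?

print(tree_predict([0, 2, 0, 0, 5, 0]))  # follows the x5 > 3, x2 < 4 path -> 0
```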
Decision Tree • What's missing? • Living in a 10,000-dimensional space? • Learning is greedy rather than optimizing a global likelihood • No soft decisions / soft assignment of tasks to experts • [Figure: four classes (1-4) in a high-dimensional space]
Mixture of experts (ME) architecture • Gating network: generates the mixing weights g_i(x), e.g. a softmax over linear scores • Expert network i: produces an output μ_i(x), interpreted probabilistically as P(y | x, θ_i) • Overall model: P(y | x) = Σ_i g_i(x) P(y | x, θ_i)
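A minimal numpy sketch of this forward pass, assuming linear gating and linear experts (the array shapes and function names are my own, not from the slides):

```python
import numpy as np

def softmax(xi):
    e = np.exp(xi - xi.max())
    return e / e.sum()

def moe_forward(x, V, U):
    """V: (k, d) gating weights; U: (k, p, d) expert weights; x: (d,) input."""
    g = softmax(V @ x)                   # gating probabilities g_i(x)
    mu = np.einsum('kpd,d->kp', U, x)    # expert outputs mu_i(x)
    y_mean = g @ mu                      # mixture mean: sum_i g_i(x) mu_i(x)
    return g, mu, y_mean
```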
Generating data • Data set {(x, y)} of input-output pairs • Given x, randomly choose a label i with probability g_i(x), where the true gating and expert parameters define the data-generating model • Generate y according to the chosen expert's density P(y | x, θ_i) • Learn to estimate the gating and expert parameters from the data
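A minimal numpy sketch of this generative process; the parameter names and the Gaussian noise model are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_y(x, V0, U0, sigma=0.1):
    """Sample y for input x from the ME generative model.
    V0: (k, d) true gating weights, U0: (k, p, d) true expert weights (hypothetical)."""
    xi = V0 @ x
    g = np.exp(xi - xi.max())
    g /= g.sum()                                  # gating probabilities g_i(x)
    i = rng.choice(len(g), p=g)                   # latent label: which expert generates y
    return U0[i] @ x + sigma * rng.standard_normal(U0[i].shape[0])
```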
A Gradient-based Learning algorithm • Maximize the log-likelihood of the training data • Optimize with respect to the expert parameters θ_i and the gating parameters, using the posterior probability h_i that expert i generated the observed output (see below)
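In the usual mixture-of-experts notation, the log-likelihood and the posterior weight of expert i can be written as follows (a sketch of the standard formulation, not copied verbatim from the slides):

```latex
\begin{aligned}
l(\theta) &= \sum_t \ln \sum_i g_i\!\left(x^{(t)}\right) P\!\left(y^{(t)} \mid x^{(t)}, \theta_i\right) \\
h_i^{(t)} &= \frac{g_i\!\left(x^{(t)}\right) P\!\left(y^{(t)} \mid x^{(t)}, \theta_i\right)}
                 {\sum_j g_j\!\left(x^{(t)}\right) P\!\left(y^{(t)} \mid x^{(t)}, \theta_j\right)}
\end{aligned}
```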
Analogy with Mixture of Gaussians • The learning algorithm can also be derived using the EM algorithm • EM finds maximum likelihood estimates of the parameters when the likelihood is hard to maximize directly because it is not known how data points are assigned to clusters / experts • The probabilities of these assignments can be treated as latent variables; this view is shared by the Mixture of Gaussians and the (Hierarchical) Mixture of Experts
EM algorithm • Mixture of Gaussians (unsupervised)
EM algorithm • Mixture of Experts (supervised)
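As a concrete illustration of the E-step / M-step structure in the supervised case, here is a minimal numpy sketch of one iteration for a linear-Gaussian mixture of experts; the array shapes, the gradient-based M-step, and all names are my own assumptions rather than the slides' exact algorithm.

```python
import numpy as np

def em_step(X, Y, V, U, sigma=1.0, lr=0.1):
    """One EM-style pass for a linear-Gaussian mixture of experts.
    X: (T, d) inputs, Y: (T, p) targets, V: (k, d) gating, U: (k, p, d) experts."""
    # E-step: posterior responsibility h_i of each expert for each data point.
    Xi = X @ V.T                                    # (T, k) gating logits
    G = np.exp(Xi - Xi.max(axis=1, keepdims=True))
    G /= G.sum(axis=1, keepdims=True)               # prior gating probs g_i(x)
    Mu = np.einsum('kpd,td->tkp', U, X)             # expert means mu_i(x)
    sq = ((Y[:, None, :] - Mu) ** 2).sum(-1)        # squared error of each expert
    H = G * np.exp(-0.5 * sq / sigma**2)            # g_i times Gaussian likelihood
    H /= H.sum(axis=1, keepdims=True)               # posteriors h_i

    # M-step, approximated here by one gradient step on the expected log-likelihood.
    V_new = V + lr * (H - G).T @ X                  # gating update
    U_new = U.copy()
    for i in range(U.shape[0]):
        U_new[i] += lr * (H[:, i, None] * (Y - Mu[:, i, :])).T @ X / sigma**2
    return V_new, U_new
```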
A Gradient-based Learning algorithm • Maximize log-likelihood • We derive learning rules for a special case: • Expert networks and gating networks are linear • A simple probabilistic density (e.g., Gaussian) for the expert networks
A Gradient-based Learning algorithm • Take the derivative of l with respect to θ_i
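A sketch of how these derivatives typically come out for a softmax-gated mixture (the standard result, not verbatim from the slides), writing ξ_i for the gating logits:

```latex
\begin{aligned}
\frac{\partial l}{\partial \theta_i} &= \sum_t h_i^{(t)}\,
    \frac{\partial}{\partial \theta_i} \ln P\!\left(y^{(t)} \mid x^{(t)}, \theta_i\right) \\
\frac{\partial l}{\partial \xi_i} &= \sum_t \left( h_i^{(t)} - g_i^{(t)} \right)
\end{aligned}
```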
Learning rule for ME • Experts are linear models
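With linear experts μ_i = U_i x and linear gating ξ_i = v_i^T x, the gradient updates take an LMS-like form; this is a sketch of the standard rules, assuming a unit-variance Gaussian expert density:

```latex
\begin{aligned}
\Delta U_i &= \eta \sum_t h_i^{(t)} \left( y^{(t)} - \mu_i^{(t)} \right) x^{(t)\top} \\
\Delta v_i &= \eta \sum_t \left( h_i^{(t)} - g_i^{(t)} \right) x^{(t)}
\end{aligned}
```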
Learning rule for HME • LMS-like learning algorithm
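For the hierarchical case, the model nests the gating; sketched here in the standard two-level HME notation (not copied from the slides), expert (i, j) sits under top-level gate i and second-level gate j, and the updates again have the LMS-like form, weighted by the corresponding posteriors:

```latex
\begin{aligned}
P(y \mid x) &= \sum_i g_i(x) \sum_j g_{j \mid i}(x)\, P\!\left(y \mid x, \theta_{ij}\right) \\
h_{j \mid i} &= \frac{g_{j \mid i}\, P(y \mid x, \theta_{ij})}{\sum_l g_{l \mid i}\, P(y \mid x, \theta_{il})}, \qquad
h_i = \frac{g_i \sum_j g_{j \mid i}\, P(y \mid x, \theta_{ij})}{\sum_k g_k \sum_l g_{l \mid k}\, P(y \mid x, \theta_{kl})} \\
\Delta U_{ij} &\propto h_i\, h_{j \mid i} \left( y - \mu_{ij} \right) x^{\top}, \qquad
\Delta v_i \propto \left( h_i - g_i \right) x, \qquad
\Delta v_{j \mid i} \propto h_i \left( h_{j \mid i} - g_{j \mid i} \right) x
\end{aligned}
```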