Deep Learning Bing-Chen Tsai 1/21
outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference
Neural networks • Supervised learning • The training data consists of input information with its corresponding output information. • Unsupervised learning • The training data consists of input information without its corresponding output information.
Neural networks • Generative model • Models the distribution of input as well as output, P(x, y) • Discriminative model • Models the posterior probabilities, P(y | x)
Neural networks • What is a neuron? A unit with inputs x1, x2, … weighted by w1, w2, … plus a bias b, producing an output y from z = Σ wi xi + b. • Linear neurons: y = z • Binary threshold neurons: y = 1 if z ≥ 0, 0 otherwise • Sigmoid neurons: y = 1 / (1 + e^(-z)) • Stochastic binary neurons: fire (y = 1) with probability 1 / (1 + e^(-z))
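A minimal sketch (in NumPy; the inputs, weights, and bias below are made-up illustrations) of the four neuron types listed above, each mapping the weighted input z to an output:

```python
import numpy as np

def linear(z):
    # Linear neuron: output equals the weighted input.
    return z

def binary_threshold(z):
    # Binary threshold neuron: 1 if the weighted input is non-negative, 0 otherwise.
    return (z >= 0).astype(float)

def sigmoid(z):
    # Sigmoid neuron: smooth, real-valued output in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_binary(z, rng=np.random.default_rng(0)):
    # Stochastic binary neuron: treat the sigmoid output as the probability of firing.
    p = sigmoid(z)
    return (rng.random(p.shape) < p).astype(float)

x = np.array([0.5, -1.0])   # inputs x1, x2
w = np.array([2.0, 1.0])    # weights w1, w2
b = -0.5                    # bias
z = np.atleast_1d(x @ w + b)
print(linear(z), binary_threshold(z), sigmoid(z), stochastic_binary(z))
```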
Neural networks • Back-propagation for a two-layer neural network (sigmoid neurons) • Step 1: Randomly initialize the weights and determine the output vector • Step 2: Evaluate the gradient of an error function • Step 3: Adjust the weights • Repeat steps 1, 2, 3 until the error is low enough
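A minimal sketch of these steps for a two-layer network of sigmoid neurons trained with squared error; the toy data, layer sizes, and learning rate are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))    # toy inputs
T = rng.random((100, 1))    # toy targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: randomly initialize the weights.
W1, W2 = rng.normal(0, 0.1, (3, 5)), rng.normal(0, 0.1, (5, 1))

lr = 0.5
for epoch in range(1000):
    # Forward pass: determine the output vector.
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)

    # Step 2: evaluate the gradient of the squared-error function.
    dY = (Y - T) * Y * (1 - Y)        # delta at the output layer
    dH = (dY @ W2.T) * H * (1 - H)    # delta back-propagated to the hidden layer

    # Step 3: adjust the weights; repeat until the error is low enough.
    W2 -= lr * H.T @ dY / len(X)
    W1 -= lr * X.T @ dH / len(X)

print("mean squared error:", np.mean((Y - T) ** 2))
```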
Neural networks • Back-propagation is not good for deep learning • It requires labeled training data, but almost all data is unlabeled. • The learning time is very slow in networks with multiple hidden layers. • It can get stuck in poor local optima, which for deep nets are far from optimal. • Instead, learn P(input), not P(output | input) • What kind of generative model should we learn?
outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference
Graphical model • A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables. • In this example: D depends on A, D depends on B, D depends on C, C depends on B, and C depends on D.
Graphical model • Directed graphical model (arrows between nodes A, B, C, D) • Undirected graphical model (undirected edges between nodes A, B, C, D)
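A small illustrative sketch of how a directed graphical model factorizes the joint distribution as a product of each variable's conditional given its parents. The binary variables, the probability tables, and the particular (acyclic) reading of the example edges are assumptions for illustration:

```python
import numpy as np

# One directed reading of the example above: C depends on B, and D depends on A, B, C.
# All variables are binary; the probability tables below are made up.
P_A = np.array([0.6, 0.4])                    # P(A)
P_B = np.array([0.7, 0.3])                    # P(B)
P_C_given_B = np.array([[0.9, 0.1],           # P(C | B=0)
                        [0.2, 0.8]])          # P(C | B=1)
P_D_given_ABC = np.random.default_rng(0).dirichlet([1, 1], size=(2, 2, 2))  # P(D | A, B, C)

def joint(a, b, c, d):
    # The joint factorizes into one conditional per node given its parents.
    return P_A[a] * P_B[b] * P_C_given_B[b, c] * P_D_given_ABC[a, b, c, d]

# The factors define a proper distribution: the joint sums to 1.
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(total)   # ~1.0
```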
outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference
Belief nets • A belief net is a directed acyclic graph composed of stochastic variables (stochastic hidden causes feeding visible units). • If the units are stochastic binary neurons, it is a sigmoid belief net.
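A minimal sketch of top-down (ancestral) sampling in a sigmoid belief net with one layer of stochastic hidden causes; the weights, biases, and layer sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_bernoulli(p):
    return (rng.random(p.shape) < p).astype(float)

n_hidden, n_visible = 4, 6
W = rng.normal(0, 1, (n_hidden, n_visible))   # hidden-to-visible weights
b_h = np.zeros(n_hidden)                      # biases of the hidden causes
b_v = np.zeros(n_visible)                     # biases of the visible units

# Generating a sample is easy: sample the stochastic hidden causes first,
# then sample each visible unit from its sigmoid conditional P(v | h).
h = sample_bernoulli(sigmoid(b_h))
v = sample_bernoulli(sigmoid(h @ W + b_v))
print("hidden causes:", h)
print("visible sample:", v)
```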
Belief nets • We would like to solve two problems • The inference problem: infer the states of the unobserved (hidden) variables. • The learning problem: adjust the interactions between variables to make the network more likely to generate the training data.
Belief nets • It is easy to generate a sample from P(v | h). • It is hard to infer P(h | v), because of explaining away.
Belief nets • Explaining away: H1 and H2 are independent a priori, but they become dependent when we observe an effect V that they can both influence.
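A small numeric sketch of explaining away (the priors and the noisy-OR-style conditional below are made up): observing the common effect V makes the two causes dependent, and learning that H2 is on lowers the posterior probability of H1.

```python
# Made-up model: each hidden cause is on with prior probability 0.1, and the
# visible effect V is very likely on if either cause is on.
p_h1, p_h2 = 0.1, 0.1

def p_v_given(h1, h2):
    return 0.9 if (h1 or h2) else 0.01

def joint(h1, h2, v):
    p = (p_h1 if h1 else 1 - p_h1) * (p_h2 if h2 else 1 - p_h2)
    pv = p_v_given(h1, h2)
    return p * (pv if v else 1 - pv)

def posterior_h1(v, h2=None):
    # P(H1 = 1 | V = v), or P(H1 = 1 | V = v, H2 = h2) if h2 is given.
    states = [(a, b) for a in (0, 1) for b in (0, 1) if h2 is None or b == h2]
    num = sum(joint(a, b, v) for a, b in states if a == 1)
    den = sum(joint(a, b, v) for a, b in states)
    return num / den

print("P(H1=1 | V=1)       =", round(posterior_h1(v=1), 3))         # ~0.50
print("P(H1=1 | V=1, H2=1) =", round(posterior_h1(v=1, h2=1), 3))   # ~0.10, explained away
```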
Belief nets • Some methods for learning deep belief nets • Monte Carlo methods: but they are painfully slow for large, deep belief nets • Learning with samples from the wrong distribution • Use Restricted Boltzmann Machines
outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference
Boltzmann Machine • It is an undirected graphical model with visible units i and hidden units j. • The energy of a joint configuration (v, h): −E(v, h) = Σi vi bi + Σk hk bk + Σi&lt;j vi vj wij + Σi,k vi hk wik + Σk&lt;l hk hl wkl
Boltzmann Machine • An example of how weights define a distribution: a small network with hidden units h1, h2 and visible units v1, v2, with connection weights +2, +1 and −1. Each joint configuration has an energy, and p(v, h) ∝ e^(−E(v, h)).
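A minimal sketch of how the energies of all joint configurations define a Boltzmann distribution. The particular connections and weights below (v1–h1 = +2, v2–h2 = +1, h1–h2 = −1, zero biases) are assumptions read loosely off the example:

```python
import numpy as np
import itertools

# Assumed connections and weights for illustration.
def energy(v1, v2, h1, h2):
    return -(2 * v1 * h1 + 1 * v2 * h2 + (-1) * h1 * h2)

states = list(itertools.product([0, 1], repeat=4))

# p(v, h) ∝ exp(-E(v, h)); the partition function Z normalizes over all joint configurations.
unnorm = np.array([np.exp(-energy(*s)) for s in states])
Z = unnorm.sum()
for s, p in zip(states, unnorm / Z):
    print(s, round(p, 3))
```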
Boltzmann Machine • A very surprising fact: the derivative of the log probability of one training vector v under the model is ∂ log p(v) / ∂wij = &lt;si sj&gt;data − &lt;si sj&gt;model • &lt;si sj&gt;data: expected value of the product of states at thermal equilibrium when v is clamped on the visible units • &lt;si sj&gt;model: expected value of the product of states at thermal equilibrium with no clamping
Boltzmann Machines • Restricted Boltzmann Machine • We restrict the connectivity to make learning easier. • Only one layer of hidden units (we will deal with more layers later). • No connections between hidden units, making the updates more parallel.
Boltzmann Machines • The Boltzmann machine learning algorithm for an RBM: run alternating Gibbs sampling between the visible units i and hidden units j, from t = 0 (the data) through t = 1, t = 2, … to t = infinity (thermal equilibrium), and update the weights with &lt;vi hj&gt;^0 − &lt;vi hj&gt;^infinity.
Boltzmann Machines • Contrastive divergence: a very surprising short-cut • Run the chain for only one full step, from t = 0 (data) to t = 1 (reconstruction), and update with &lt;vi hj&gt;^0 − &lt;vi hj&gt;^1. • This is not following the gradient of the log likelihood, but it works well.
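A minimal sketch of CD-1 learning for an RBM on toy binary data; the layer sizes, learning rate, number of epochs, and data are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(0, 0.01, (n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

data = sample(np.full((100, n_visible), 0.5))   # toy binary training vectors

for epoch in range(50):
    # t = 0: drive the hidden units from the data (all hidden units update in parallel).
    p_h0 = sigmoid(data @ W + b_h)
    h0 = sample(p_h0)
    # t = 1: reconstruct the visible units, then re-infer the hidden units.
    v1 = sample(sigmoid(h0 @ W.T + b_v))
    p_h1 = sigmoid(v1 @ W + b_h)

    # CD-1 update: <vi hj>^0 - <vi hj>^1 (not the true log-likelihood gradient, but it works well).
    W += lr * (data.T @ p_h0 - v1.T @ p_h1) / len(data)
    b_v += lr * (data - v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

print("reconstruction error:", np.mean((data - sigmoid(h0 @ W.T + b_v)) ** 2))
```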
outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference
DBN • In a belief net it is easy to generate a sample from P(v | h), but hard to infer P(h | v) because of explaining away. • Using RBMs to initialize the weights lets us reach a good optimum.
DBN • Combining two RBMs to make a DBN • Train the first RBM on the data. • Copy the binary hidden states for each v. • Then train the second RBM on those hidden states. • Compose the two RBM models to make a single DBN model. It's a deep belief net! (See the sketch below.)
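A minimal sketch of this greedy, layer-by-layer procedure; the `train_rbm` helper, layer sizes, and toy data are assumptions (the RBM training loop is the same CD-1 sketch as above):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    # One CD-1 RBM; returns its weights, hidden biases, and binary hidden states for the data.
    W = rng.normal(0, 0.01, (data.shape[1], n_hidden))
    b_v, b_h = np.zeros(data.shape[1]), np.zeros(n_hidden)
    for _ in range(epochs):
        p_h0 = sigmoid(data @ W + b_h); h0 = sample(p_h0)
        v1 = sample(sigmoid(h0 @ W.T + b_v))
        p_h1 = sigmoid(v1 @ W + b_h)
        W += lr * (data.T @ p_h0 - v1.T @ p_h1) / len(data)
        b_v += lr * (data - v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_h, sample(sigmoid(data @ W + b_h))

v0 = sample(np.full((100, 8), 0.5))   # toy training vectors

# Train the first RBM on the data and copy its binary hidden states for each v...
W1, b1, h1_states = train_rbm(v0, n_hidden=6)
# ...then train the second RBM on those states. Composed, the two RBMs form a DBN.
W2, b2, _ = train_rbm(h1_states, n_hidden=4)
print("DBN weight shapes:", W1.shape, W2.shape)
```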
DBN • Why can we use an RBM to initialize the belief net weights? • An infinite sigmoid belief net with replicated weights (layers … h2, v2, h1, v1, h0, v0) is equivalent to an RBM. • Inference in a directed net with replicated weights is trivial: we just multiply v0 by W transpose. • The model above h0 implements a complementary prior. • Multiplying v0 by W transpose gives the product of the likelihood term and the prior term.
DBN • Complementary prior • A Markov chain is a sequence of variables X1, X2, … with the Markov property: P(Xt | X1, …, Xt−1) = P(Xt | Xt−1). • A Markov chain is stationary if the transition probabilities do not depend on time; T(x' ← x) = P(Xt = x' | Xt−1 = x) is called the transition matrix. • If a Markov chain is ergodic it has a unique equilibrium distribution.
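A small sketch (with a made-up 3-state transition matrix) showing that an ergodic, stationary Markov chain settles into the same equilibrium distribution regardless of where it starts:

```python
import numpy as np

# Made-up transition matrix T, with T[i, j] = P(X_t = j | X_{t-1} = i); rows sum to 1.
T = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Run the chain from two different starting distributions.
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 0.0, 1.0])
for _ in range(100):
    p, q = p @ T, q @ T

print(p)   # both converge to the same equilibrium distribution
print(q)
```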
DBN • Most Markov chains used in practice satisfy detailed balance: p(x) T(x' ← x) = p(x') T(x ← x'), e.g. Gibbs sampling, Metropolis-Hastings, slice sampling, … • Such Markov chains are reversible.
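A small sketch (with a made-up two-state chain) checking detailed balance, p(x) T(x' ← x) = p(x') T(x ← x'), which is what makes the chain reversible:

```python
import numpy as np

# A made-up two-state chain and its equilibrium distribution.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
p = np.array([2/3, 1/3])   # solves p T = p for this chain

print(np.allclose(p @ T, p))                       # p is the equilibrium distribution
print(np.isclose(p[0] * T[0, 1], p[1] * T[1, 0]))  # detailed balance: flow 0->1 equals flow 1->0
```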
DBN • Combining two RBMs to make a DBN: train the first RBM on the data, copy the binary hidden states for each v, train the second RBM on those states, then compose the two RBM models into a single DBN model. It's a deep belief net!
Reference • Deep Belief Nets, 2007 NIPS tutorial, G. Hinton • https://class.coursera.org/neuralnets-2012-001/class/index • Machine learning course lecture notes • http://en.wikipedia.org/wiki/Graphical_model