
Deep Learning



Presentation Transcript


  1. Deep Learning Bing-Chen Tsai 1/21

  2. outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference

  3. Neural networks • Supervised learning • The training data consists of input information with their corresponding output information. • Unsupervised learning • The training data consists of input information without their corresponding output information.

  4. Neural networks • Generative model • Models the distribution of the input as well as the output, P(x, y) • Discriminative model • Models the posterior probabilities, P(y | x) [Figure: joint densities P(x, y1), P(x, y2) versus posteriors P(y1 | x), P(y2 | x)]

  5. Neural networks • What is a neuron? • Linear neurons • Binary threshold neurons • Sigmoid neurons • Stochastic binary neurons [Figure: a binary threshold unit with inputs x1, x2, weights w1, w2, and bias b; output y = 1 if the total input is at least 0, and 0 otherwise]
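The four neuron types listed above can be sketched as small activation functions. This is an illustrative sketch only; the input, weight, and bias values below are made up:

```python
import numpy as np

def linear(z):
    # Linear neuron: the output is just the weighted input plus bias.
    return z

def binary_threshold(z):
    # Binary threshold neuron: fires 1 if the total input is >= 0.
    return (z >= 0).astype(float)

def sigmoid(z):
    # Sigmoid neuron: smooth, differentiable squashing into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_binary(z, rng=np.random.default_rng(0)):
    # Stochastic binary neuron: treat the sigmoid output as the
    # probability of emitting a 1.
    return (rng.random(np.shape(z)) < sigmoid(z)).astype(float)

x = np.array([0.5, -1.0])   # inputs x1, x2
w = np.array([2.0, 1.0])    # weights w1, w2
b = -0.5                    # bias
z = w @ x + b               # total input: 2*0.5 + 1*(-1) - 0.5 = -0.5
print(binary_threshold(z), float(sigmoid(z)))
```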

  6. Neural networks • Two-layer neural networks (sigmoid neurons) • Back-propagation • Step 1: Randomly initialize the weights • Step 2: Compute the output vector and evaluate the gradient of an error function • Step 3: Adjust the weights • Repeat steps 2 and 3 until the error is low enough
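A minimal numpy sketch of these steps, using a hypothetical two-layer sigmoid network on made-up XOR data; the layer sizes, learning rate, and epoch count are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Made-up training data (XOR), purely illustrative.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)

# Step 1: randomly initialize the weights.
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)

# Error before training, for comparison.
loss0 = np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - T) ** 2)

lr = 1.0
for epoch in range(5000):
    # Step 2: compute the output vector and the gradient of the error.
    H = sigmoid(X @ W1 + b1)          # hidden layer
    Y = sigmoid(H @ W2 + b2)          # output layer
    dY = (Y - T) * Y * (1 - Y)        # backprop through the output sigmoid
    dH = (dY @ W2.T) * H * (1 - H)    # backprop through the hidden sigmoid
    # Step 3: adjust the weights down the gradient; repeat until the error is low.
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(0)

loss = np.mean((Y - T) ** 2)
print(loss0, loss)
```

The squared error after training should be well below its initial value, which is all this sketch is meant to show.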

  7. Neural networks • Back-propagation is not good for deep learning • It requires labeled training data, but almost all data is unlabeled. • The learning time is very slow in networks with multiple hidden layers. • It can get stuck in poor local optima, and for deep nets these are far from optimal. • Instead, learn P(input), not P(output | input) • What kind of generative model should we learn?

  8. outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference

  9. Graphical model • A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables. • In this example: D depends on A, D depends on B, D depends on C, C depends on B, and C depends on D.

  10. Graphical model [Figure: two graphs over nodes A, B, C, D: a directed graphical model and an undirected graphical model]

  11. outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference

  12. Belief nets • A belief net is a directed acyclic graph composed of stochastic variables • With stochastic binary neurons it is a sigmoid belief net [Figure: stochastic hidden causes above a visible layer]

  13. Belief nets • We would like to solve two problems • The inference problem: Infer the states of the unobserved variables. • The learning problem: Adjust the interactions between variables to make the network more likely to generate the training data. [Figure: stochastic hidden causes above a visible layer]

  14. Belief nets • It is easy to generate samples from P(v | h) • It is hard to infer P(h | v) • Explaining away [Figure: stochastic hidden causes above a visible layer]

  15. Belief nets • Explaining away • H1 and H2 are independent, but they can become dependent when we observe an effect that they can both influence [Figure: hidden causes H1 and H2 both pointing to a visible effect V]
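A tiny numeric illustration of explaining away, under an assumed model: the priors of 0.1 and the deterministic OR for the effect are made-up choices to keep the arithmetic simple, not numbers from the slides.

```python
# Two independent hidden causes H1, H2, each on with prior probability 0.1,
# and a visible effect V that is on iff at least one cause is on
# (a deterministic OR).
pH = 0.1
pV = 1 - (1 - pH) ** 2        # P(V=1) = 1 - 0.9^2 = 0.19
pH1_given_V = pH / pV         # P(H1=1 | V=1): H1=1 forces V=1, about 0.526
pH1_given_V_H2 = pH           # P(H1=1 | V=1, H2=1): H2=1 already explains V,
                              # so H1 falls back to its prior, 0.1
print(pH1_given_V, pH1_given_V_H2)
```

Observing H2 drops the probability of H1 from about 0.53 back to its prior 0.1: the two causes became dependent once their common effect was observed.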

  16. Belief nets • Some methods for learning deep belief nets • Monte Carlo methods • But they are painfully slow for large, deep belief nets • Learning with samples from the wrong distribution • Use Restricted Boltzmann Machines

  17. outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference

  18. Boltzmann Machine • It is an undirected graphical model • The energy of a joint configuration [Figure: hidden units j connected to visible units i]
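The energy function on this slide was an image; a standard form for the energy of a joint configuration (v, h) of a binary Boltzmann machine, reconstructed from the usual definition rather than recovered from the slide, is:

```latex
E(v, h) = -\sum_{i \in \mathrm{vis}} v_i b_i
          -\sum_{k \in \mathrm{hid}} h_k b_k
          -\sum_{i<j} v_i v_j w_{ij}
          -\sum_{i,k} v_i h_k w_{ik}
          -\sum_{k<l} h_k h_l w_{kl}
```

with biases b and symmetric weights w; in the restricted machine introduced later, the visible-visible and hidden-hidden terms are removed.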

  19. Boltzmann Machine • An example of how weights define a distribution [Figure: hidden units h1, h2 and visible units v1, v2 with weights +2, +1, and -1]

  20. Boltzmann Machine • A very surprising fact • The derivative of the log probability of one training vector v under the model equals the expected value of the product of states at thermal equilibrium when v is clamped on the visible units, minus the expected value of the product of states at thermal equilibrium with no clamping.
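Written out, the fact described above is (with the clamped expectation on the left of the difference and the unclamped, free-running expectation on the right):

```latex
\frac{\partial \log p(v)}{\partial w_{ij}}
  = \langle s_i s_j \rangle_{v} - \langle s_i s_j \rangle_{\mathrm{model}},
\qquad
\Delta w_{ij} \propto \langle s_i s_j \rangle_{\mathrm{data}} - \langle s_i s_j \rangle_{\mathrm{model}}
```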

  21. Boltzmann Machines • Restricted Boltzmann Machine • We restrict the connectivity to make learning easier. • Only one layer of hidden units (we will deal with more layers later) • No connections between hidden units, making the updates more parallel [Figure: one hidden layer fully connected to the visible layer]

  22. Boltzmann Machines • The Boltzmann machine learning algorithm for an RBM [Figure: alternating Gibbs sampling between hidden units j and visible units i at t = 0, t = 1, t = 2, …, t = infinity]

  23. Boltzmann Machines • Contrastive divergence: a very surprising short-cut • This is not following the gradient of the log likelihood, but it works well. [Figure: a single Gibbs step from the data (t = 0) to the reconstruction (t = 1)]
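A minimal numpy sketch of one contrastive-divergence (CD-1) update for a binary RBM. The function name, layer sizes, and learning rate are illustrative assumptions, not code from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, a, b, v0, lr=0.1):
    """One CD-1 step. W: (n_visible, n_hidden) weights;
    a, b: visible/hidden biases; v0: batch of binary vectors."""
    # t = 0: sample hidden states driven by the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # t = 1: reconstruct the visibles, then recompute hidden probabilities.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Approximate gradient: <v h>_data - <v h>_reconstruction.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    a += lr * (v0 - v1).mean(0)
    b += lr * (ph0 - ph1).mean(0)
    return W, a, b

# Tiny usage example with made-up sizes and random binary data.
W = rng.normal(0, 0.01, (6, 3))
a = np.zeros(6); b = np.zeros(3)
v = (rng.random((8, 6)) < 0.5).astype(float)
W, a, b = cd1_update(W, a, b, v)
print(W.shape)
```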

  24. outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference

  25. DBN • It is easy to generate samples from P(v | h) • It is hard to infer P(h | v) • Explaining away • Using RBMs to initialize the weights leads to a good optimum [Figure: stochastic hidden causes above a visible layer]

  26. DBN • Combining two RBMs to make a DBN • Train this RBM first • Copy the binary state for each v • Then train this RBM • Compose the two RBM models to make a single DBN model • It's a deep belief net!
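The stacking procedure above can be sketched as greedy layer-wise training: train the first RBM on the data, copy its hidden activities for each v, and train the next RBM on those copies. This is a toy sketch with made-up sizes and a simplified mean-field CD-1 inner loop:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    """Train one binary RBM with CD-1; return (weights, hidden biases)."""
    n_vis = data.shape[1]
    W = rng.normal(0, 0.01, (n_vis, n_hidden))
    a = np.zeros(n_vis); b = np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + b)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + a)          # reconstruction
        ph1 = sigmoid(pv1 @ W + b)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        a += lr * (data - pv1).mean(0)
        b += lr * (ph0 - ph1).mean(0)
    return W, b

v = (rng.random((20, 8)) < 0.5).astype(float)  # made-up binary data
# Train this RBM first ...
W1, b1 = train_rbm(v, 5)
# ... copy the (binarized) hidden state for each v ...
h1 = (sigmoid(v @ W1 + b1) > 0.5).astype(float)
# ... then train the next RBM on those hidden states.
W2, b2 = train_rbm(h1, 3)
print(W1.shape, W2.shape)
```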

  27. DBN • Why can we use an RBM to initialize the belief net weights? • An infinite sigmoid belief net is equivalent to an RBM • Inference in a directed net with replicated weights is trivial: we just multiply v0 by W transpose • The model above h0 implements a complementary prior • Multiplying v0 by W transpose gives the product of the likelihood term and the prior term [Figure: an infinite stack of layers v0, h0, v1, h1, v2, h2, etc.]

  28. DBN • Complementary prior • A Markov chain is a sequence of variables X1, X2, … with the Markov property • A Markov chain is stationary if the transition probabilities do not depend on time; the matrix of transition probabilities is called the transition matrix • If a Markov chain is ergodic, it has a unique equilibrium distribution
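A quick illustration of an ergodic chain converging to its unique equilibrium distribution; the 3-state transition matrix below is made up for the example:

```python
import numpy as np

# Hypothetical transition matrix: T[i, j] = P(X_{t+1} = j | X_t = i).
T = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

# Because the chain is ergodic, repeated transitions converge to the
# same distribution no matter where we start.
p = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    p = p @ T
print(np.round(p, 3))
```

At equilibrium, p is a fixed point of the transition matrix: p = pT.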

  29. DBN • Most Markov chains used in practice satisfy detailed balance, e.g. Gibbs, Metropolis-Hastings, slice sampling, etc. • Such Markov chains are reversible [Figure: a chain X1, X2, X3, X4 and its time reversal]
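Detailed balance says that at equilibrium the probability flow from state i to state j equals the flow from j to i: pi_i T[i, j] = pi_j T[j, i]. A symmetric transition matrix gives a simple made-up reversible example, since its equilibrium distribution is uniform:

```python
import numpy as np

T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
pi = np.full(3, 1.0 / 3.0)          # uniform equilibrium distribution

flow = pi[:, None] * T              # flow[i, j] = pi_i * T[i, j]
print(np.allclose(flow, flow.T))    # detailed balance: the flow is symmetric
print(np.allclose(pi @ T, pi))      # and pi is indeed stationary
```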

  30. DBN

  31. DBN • Combining two RBMs to make a DBN • Train this RBM first • Copy the binary state for each v • Then train this RBM • Compose the two RBM models to make a single DBN model • It's a deep belief net!

  32. Reference • Deep Belief Nets, 2007 NIPS tutorial, G. Hinton • https://class.coursera.org/neuralnets-2012-001/class/index • Machine Learning course lecture notes • http://en.wikipedia.org/wiki/Graphical_model
