Deep Learning Bing-Chen Tsai 1/21
outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference
Neural networks • Supervised learning • The training data consists of input information with its corresponding output information. • Unsupervised learning • The training data consists of input information without its corresponding output information.
Neural networks • Generative model • Models the distribution of input as well as output, P(x, y) • Discriminative model • Models the posterior probabilities, P(y | x)
Neural networks • What is a neuron? A unit with inputs x1, x2, … weighted by w1, w2, … plus a bias b, producing an output y from z = Σ wi xi + b. • Linear neurons: y = z • Binary threshold neurons: y = 1 if z ≥ 0, 0 otherwise • Sigmoid neurons: y = 1 / (1 + e^(-z)) • Stochastic binary neurons: fire (y = 1) with probability 1 / (1 + e^(-z))
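A minimal sketch (in NumPy; the inputs, weights, and bias below are made-up illustrations) of the four neuron types listed above, each mapping the weighted input z to an output:

```python
import numpy as np

def linear(z):
    # Linear neuron: output equals the weighted input.
    return z

def binary_threshold(z):
    # Binary threshold neuron: 1 if the weighted input is non-negative, 0 otherwise.
    return (z >= 0).astype(float)

def sigmoid(z):
    # Sigmoid neuron: smooth, real-valued output in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_binary(z, rng=np.random.default_rng(0)):
    # Stochastic binary neuron: treat the sigmoid output as the probability of firing.
    p = sigmoid(z)
    return (rng.random(p.shape) < p).astype(float)

x = np.array([0.5, -1.0])   # inputs x1, x2
w = np.array([2.0, 1.0])    # weights w1, w2
b = -0.5                    # bias
z = np.atleast_1d(x @ w + b)
print(linear(z), binary_threshold(z), sigmoid(z), stochastic_binary(z))
```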
Neural networks • Back-propagation for a two-layer neural network (sigmoid neurons) • Step 1: Randomly initialize the weights and determine the output vector • Step 2: Evaluate the gradient of an error function • Step 3: Adjust the weights • Repeat steps 1, 2, 3 until the error is low enough
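A minimal sketch of these steps for a two-layer network of sigmoid neurons trained with squared error; the toy data, layer sizes, and learning rate are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))    # toy inputs
T = rng.random((100, 1))    # toy targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: randomly initialize the weights.
W1, W2 = rng.normal(0, 0.1, (3, 5)), rng.normal(0, 0.1, (5, 1))

lr = 0.5
for epoch in range(1000):
    # Forward pass: determine the output vector.
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)

    # Step 2: evaluate the gradient of the squared-error function.
    dY = (Y - T) * Y * (1 - Y)        # delta at the output layer
    dH = (dY @ W2.T) * H * (1 - H)    # delta back-propagated to the hidden layer

    # Step 3: adjust the weights; repeat until the error is low enough.
    W2 -= lr * H.T @ dY / len(X)
    W1 -= lr * X.T @ dH / len(X)

print("mean squared error:", np.mean((Y - T) ** 2))
```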
Neural networks • Back-propagation is not good for deep learning • It requires labeled training data, but almost all data is unlabeled. • The learning time is very slow in networks with multiple hidden layers. • It can get stuck in poor local optima, which for deep nets are far from optimal. • Instead, learn P(input), not P(output | input) • What kind of generative model should we learn?
outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference
Graphical model • A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables. • In this example: D depends on A, D depends on B, D depends on C, C depends on B, and C depends on D.
Graphical model • Directed graphical model (arrows between nodes A, B, C, D) • Undirected graphical model (undirected edges between nodes A, B, C, D)
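A small illustrative sketch of how a directed graphical model factorizes the joint distribution as a product of each variable's conditional given its parents. The binary variables, the probability tables, and the particular (acyclic) reading of the example edges are assumptions for illustration:

```python
import numpy as np

# One directed reading of the example above: C depends on B, and D depends on A, B, C.
# All variables are binary; the probability tables below are made up.
P_A = np.array([0.6, 0.4])                    # P(A)
P_B = np.array([0.7, 0.3])                    # P(B)
P_C_given_B = np.array([[0.9, 0.1],           # P(C | B=0)
                        [0.2, 0.8]])          # P(C | B=1)
P_D_given_ABC = np.random.default_rng(0).dirichlet([1, 1], size=(2, 2, 2))  # P(D | A, B, C)

def joint(a, b, c, d):
    # The joint factorizes into one conditional per node given its parents.
    return P_A[a] * P_B[b] * P_C_given_B[b, c] * P_D_given_ABC[a, b, c, d]

# The factors define a proper distribution: the joint sums to 1.
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(total)   # ~1.0
```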
outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference
Belief nets • A belief net is a directed acyclic graph composed of stochastic variables (stochastic hidden causes feeding visible units). • If the units are stochastic binary neurons, it is a sigmoid belief net.
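A minimal sketch of top-down (ancestral) sampling in a sigmoid belief net with one layer of stochastic hidden causes; the weights, biases, and layer sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_bernoulli(p):
    return (rng.random(p.shape) < p).astype(float)

n_hidden, n_visible = 4, 6
W = rng.normal(0, 1, (n_hidden, n_visible))   # hidden-to-visible weights
b_h = np.zeros(n_hidden)                      # biases of the hidden causes
b_v = np.zeros(n_visible)                     # biases of the visible units

# Generating a sample is easy: sample the stochastic hidden causes first,
# then sample each visible unit from its sigmoid conditional P(v | h).
h = sample_bernoulli(sigmoid(b_h))
v = sample_bernoulli(sigmoid(h @ W + b_v))
print("hidden causes:", h)
print("visible sample:", v)
```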
Belief nets • We would like to solve two problems • The inference problem: infer the states of the unobserved (hidden) variables. • The learning problem: adjust the interactions between variables to make the network more likely to generate the training data.
Belief nets • It is easy to generate a sample from P(v | h). • It is hard to infer P(h | v), because of explaining away.
Belief nets • Explaining away: H1 and H2 are independent a priori, but they become dependent when we observe an effect V that they can both influence.
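A small numeric sketch of explaining away (the priors and the noisy-OR-style conditional below are made up): observing the common effect V makes the two causes dependent, and learning that H2 is on lowers the posterior probability of H1.

```python
# Made-up model: each hidden cause is on with prior probability 0.1, and the
# visible effect V is very likely on if either cause is on.
p_h1, p_h2 = 0.1, 0.1

def p_v_given(h1, h2):
    return 0.9 if (h1 or h2) else 0.01

def joint(h1, h2, v):
    p = (p_h1 if h1 else 1 - p_h1) * (p_h2 if h2 else 1 - p_h2)
    pv = p_v_given(h1, h2)
    return p * (pv if v else 1 - pv)

def posterior_h1(v, h2=None):
    # P(H1 = 1 | V = v), or P(H1 = 1 | V = v, H2 = h2) if h2 is given.
    states = [(a, b) for a in (0, 1) for b in (0, 1) if h2 is None or b == h2]
    num = sum(joint(a, b, v) for a, b in states if a == 1)
    den = sum(joint(a, b, v) for a, b in states)
    return num / den

print("P(H1=1 | V=1)       =", round(posterior_h1(v=1), 3))         # ~0.50
print("P(H1=1 | V=1, H2=1) =", round(posterior_h1(v=1, h2=1), 3))   # ~0.10, explained away
```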
Belief nets • Some methods for learning deep belief nets • Monte Carlo methods: but they are painfully slow for large, deep belief nets • Learning with samples from the wrong distribution • Use Restricted Boltzmann Machines
outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference
Boltzmann Machine • It is an undirected graphical model with visible units i and hidden units j. • The energy of a joint configuration (v, h): −E(v, h) = Σi vi bi + Σk hk bk + Σi&lt;j vi vj wij + Σi,k vi hk wik + Σk&lt;l hk hl wkl
Boltzmann Machine • An example of how weights define a distribution: a small network with hidden units h1, h2 and visible units v1, v2, with connection weights +2, +1 and −1. Each joint configuration has an energy, and p(v, h) ∝ e^(−E(v, h)).
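A minimal sketch of how the energies of all joint configurations define a Boltzmann distribution. The particular connections and weights below (v1–h1 = +2, v2–h2 = +1, h1–h2 = −1, zero biases) are assumptions read loosely off the example:

```python
import numpy as np
import itertools

# Assumed connections and weights for illustration.
def energy(v1, v2, h1, h2):
    return -(2 * v1 * h1 + 1 * v2 * h2 + (-1) * h1 * h2)

states = list(itertools.product([0, 1], repeat=4))

# p(v, h) ∝ exp(-E(v, h)); the partition function Z normalizes over all joint configurations.
unnorm = np.array([np.exp(-energy(*s)) for s in states])
Z = unnorm.sum()
for s, p in zip(states, unnorm / Z):
    print(s, round(p, 3))
```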
Boltzmann Machine • A very surprising fact: the derivative of the log probability of one training vector v under the model is ∂ log p(v) / ∂wij = &lt;si sj&gt;data − &lt;si sj&gt;model • &lt;si sj&gt;data: expected value of the product of states at thermal equilibrium when v is clamped on the visible units • &lt;si sj&gt;model: expected value of the product of states at thermal equilibrium with no clamping
Boltzmann Machines • Restricted Boltzmann Machine • We restrict the connectivity to make learning easier. • Only one layer of hidden units (we will deal with more layers later). • No connections between hidden units, making the updates more parallel.
Boltzmann Machines • The Boltzmann machine learning algorithm for an RBM: run alternating Gibbs sampling between the visible units i and hidden units j, from t = 0 (the data) through t = 1, t = 2, … to t = infinity (thermal equilibrium), and update the weights with &lt;vi hj&gt;^0 − &lt;vi hj&gt;^infinity.
Boltzmann Machines • Contrastive divergence: a very surprising short-cut • Run the chain for only one full step, from t = 0 (data) to t = 1 (reconstruction), and update with &lt;vi hj&gt;^0 − &lt;vi hj&gt;^1. • This is not following the gradient of the log likelihood, but it works well.
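A minimal sketch of CD-1 learning for an RBM on toy binary data; the layer sizes, learning rate, number of epochs, and data are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(0, 0.01, (n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

data = sample(np.full((100, n_visible), 0.5))   # toy binary training vectors

for epoch in range(50):
    # t = 0: drive the hidden units from the data (all hidden units update in parallel).
    p_h0 = sigmoid(data @ W + b_h)
    h0 = sample(p_h0)
    # t = 1: reconstruct the visible units, then re-infer the hidden units.
    v1 = sample(sigmoid(h0 @ W.T + b_v))
    p_h1 = sigmoid(v1 @ W + b_h)

    # CD-1 update: <vi hj>^0 - <vi hj>^1 (not the true log-likelihood gradient, but it works well).
    W += lr * (data.T @ p_h0 - v1.T @ p_h1) / len(data)
    b_v += lr * (data - v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

print("reconstruction error:", np.mean((data - sigmoid(h0 @ W.T + b_v)) ** 2))
```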
outline Neural networks Graphical model Belief nets Boltzmann machine DBN Reference
DBN • In a belief net it is easy to generate a sample from P(v | h), but hard to infer P(h | v) because of explaining away. • Using RBMs to initialize the weights lets us reach a good optimum.
DBN • Combining two RBMs to make a DBN • Train the first RBM on the data. • Copy the binary hidden states for each v. • Then train the second RBM on those hidden states. • Compose the two RBM models to make a single DBN model. It's a deep belief net! (See the sketch below.)
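A minimal sketch of this greedy, layer-by-layer procedure; the `train_rbm` helper, layer sizes, and toy data are assumptions (the RBM training loop is the same CD-1 sketch as above):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    # One CD-1 RBM; returns its weights, hidden biases, and binary hidden states for the data.
    W = rng.normal(0, 0.01, (data.shape[1], n_hidden))
    b_v, b_h = np.zeros(data.shape[1]), np.zeros(n_hidden)
    for _ in range(epochs):
        p_h0 = sigmoid(data @ W + b_h); h0 = sample(p_h0)
        v1 = sample(sigmoid(h0 @ W.T + b_v))
        p_h1 = sigmoid(v1 @ W + b_h)
        W += lr * (data.T @ p_h0 - v1.T @ p_h1) / len(data)
        b_v += lr * (data - v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_h, sample(sigmoid(data @ W + b_h))

v0 = sample(np.full((100, 8), 0.5))   # toy training vectors

# Train the first RBM on the data and copy its binary hidden states for each v...
W1, b1, h1_states = train_rbm(v0, n_hidden=6)
# ...then train the second RBM on those states. Composed, the two RBMs form a DBN.
W2, b2, _ = train_rbm(h1_states, n_hidden=4)
print("DBN weight shapes:", W1.shape, W2.shape)
```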
DBN • Why can we use an RBM to initialize the belief net weights? • An infinite sigmoid belief net with replicated weights (layers … h2, v2, h1, v1, h0, v0) is equivalent to an RBM. • Inference in a directed net with replicated weights is trivial: we just multiply v0 by W transpose. • The model above h0 implements a complementary prior. • Multiplying v0 by W transpose gives the product of the likelihood term and the prior term.
DBN • Complementary prior • A Markov chain is a sequence of variables X1, X2, … with the Markov property: P(Xt | X1, …, Xt−1) = P(Xt | Xt−1). • A Markov chain is stationary if the transition probabilities do not depend on time; T(x' ← x) = P(Xt = x' | Xt−1 = x) is called the transition matrix. • If a Markov chain is ergodic it has a unique equilibrium distribution.
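A small sketch (with a made-up 3-state transition matrix) showing that an ergodic, stationary Markov chain settles into the same equilibrium distribution regardless of where it starts:

```python
import numpy as np

# Made-up transition matrix T, with T[i, j] = P(X_t = j | X_{t-1} = i); rows sum to 1.
T = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Run the chain from two different starting distributions.
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 0.0, 1.0])
for _ in range(100):
    p, q = p @ T, q @ T

print(p)   # both converge to the same equilibrium distribution
print(q)
```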
DBN • Most Markov chains used in practice satisfy detailed balance: p(x) T(x' ← x) = p(x') T(x ← x'), e.g. Gibbs sampling, Metropolis-Hastings, slice sampling, … • Such Markov chains are reversible.
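A small sketch (with a made-up two-state chain) checking detailed balance, p(x) T(x' ← x) = p(x') T(x ← x'), which is what makes the chain reversible:

```python
import numpy as np

# A made-up two-state chain and its equilibrium distribution.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
p = np.array([2/3, 1/3])   # solves p T = p for this chain

print(np.allclose(p @ T, p))                       # p is the equilibrium distribution
print(np.isclose(p[0] * T[0, 1], p[1] * T[1, 0]))  # detailed balance: flow 0->1 equals flow 1->0
```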
DBN • Combining two RBMs to make a DBN: train the first RBM on the data, copy the binary hidden states for each v, train the second RBM on those states, then compose the two RBM models into a single DBN model. It's a deep belief net!
Reference • Deep Belief Nets, 2007 NIPS tutorial, G. Hinton • https://class.coursera.org/neuralnets-2012-001/class/index • Machine learning course lecture notes • http://en.wikipedia.org/wiki/Graphical_model