Boltzmann Machines Stochastic Hopfield Machines Lecture 11e https://class.coursera.org/neuralnets-2012-001/lecture/131
Example applications: document classification given binary feature vectors; monitoring a nuclear power station, where you don't want positive examples (of the dangerous states)!
Two ways a model can generate data: • Causal model: first generate the latent variables (hidden units), then use them to generate the visible variables. • Boltzmann machines (energy-based models): define the probability of a visible vector via the energies of joint configurations of visible and hidden units, as written out below.
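In symbols (standard notation, mine rather than the slides'): a causal model defines

p(v) = \sum_h p(h)\, p(v \mid h)

while a Boltzmann machine is energy-based; the probability of a joint configuration is determined by its energy, and the visible probability is obtained by summing out the hidden units:

p(v, h) = \frac{e^{-E(v,h)}}{\sum_{u,g} e^{-E(u,g)}}, \qquad p(v) = \sum_h p(v, h)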
Learning in Boltzmann Machines Lecture 12a
Modelling the input vectors: there are no labels; we want to build a model of a set of input vectors.
Given that each weight needs to know about all the other weights in order to change in the right way, it is very surprising that there is a simple learning rule built from two statistics:
Positive statistic: how often units i and j are on together when a data vector v is clamped on the visible units. Negative statistic: how often i and j are on together when nothing is clamped and the network is sampling from its own model distribution.
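Written out, this is the classic Boltzmann machine learning rule (standard notation, not copied from the slides):

\Delta w_{ij} = \varepsilon \left( \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}} \right)

where the first expectation is measured with data clamped on the visible units and the second with the network running free.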
The first term in the rule says: raise the weights in proportion to the product of the activities of the units (Hebbian learning). But if we used only this rule, the weights would all become positive and the whole system would blow up. So the second term says: decrease the weights in proportion to how often the units are on together when you are sampling from the model's distribution. An alternative view: the first term is like the storage term for a Hopfield net, and the second term is for getting rid of spurious minima; it tells you exactly how much unlearning to do.
You expect the energy landscape to have many different minima that are fairly separated and have about the same energy. • The model should give the set of images being modelled roughly the same (low) energy, and unreasonable images very high energy. • Sampling how often two units are on together = measuring the correlation between those two units (a minimal sketch follows this list). • Repeat over all the data vectors and average.
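A minimal sketch of that measurement, assuming the sampled global configurations are stored as rows of a binary matrix (the function name and layout are my own, for illustration):

import numpy as np

def pairwise_on_stats(states):
    # Estimate <s_i s_j> from sampled global configurations:
    # the average over samples of the product s_i * s_j.
    # For binary units this is just the fraction of samples
    # in which units i and j are both on.
    states = np.asarray(states, dtype=float)
    return states.T @ states / len(states)

Running this on samples collected with data clamped, and again on samples from the freely running network, gives the two statistics in the learning rule above.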
Restricted Boltzmann Machines Lecture 12c
Much simplified architecture: no connections between hidden units. • If the visible units are given, the equilibrium distribution of the hidden units can be computed in one step, because the hidden units are all independent of one another given the states of the visible units. • The proper Boltzmann machine learning algorithm is still slow for a restricted Boltzmann machine. • In 1998, Hinton found a shortcut for Boltzmann machines: • approximate, but works well in practice • caused a resurgence in this area
Note that the probability of a hidden unit turning on does not depend on what the other hidden units are doing, so all the hidden probabilities can be computed in parallel (see the sketch below).
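A minimal NumPy sketch of that one-step parallel computation; the names (W, b_h) and shapes are my own assumptions, not from the lecture:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, W, b_h):
    # p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * w_ij).
    # One matrix product computes every hidden unit at once,
    # since the hidden units are independent given v.
    # v: (batch, n_vis), W: (n_vis, n_hid), b_h: (n_hid,)
    return sigmoid(b_h + v @ W)

def sample_hidden(v, W, b_h, rng):
    # Draw binary hidden states in a single parallel step.
    p = hidden_probs(v, W, b_h)
    return (rng.random(p.shape) < p).astype(float)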
Fantasy particles = persistent global configurations. After each weight update, you update each fantasy particle a little, which should bring it back close to equilibrium under the new weights. This algorithm works very well at building density models.
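A sketch of that fantasy-particle refresh under the same assumptions, reusing sigmoid and sample_hidden from the sketch above; one alternating Gibbs step per weight update corresponds to "updating the particles a little":

def visible_probs(h, W, b_v):
    # p(v_i = 1 | h) = sigmoid(a_i + sum_j h_j * w_ij),
    # by symmetry with the hidden units.
    return sigmoid(b_v + h @ W.T)

def refresh_fantasy_particles(particles, W, b_v, b_h, rng):
    # One alternating Gibbs step per particle: sample hiddens
    # given visibles, then visibles given hiddens. Run after
    # every weight update so the particles stay close to
    # equilibrium under the current weights.
    h = sample_hidden(particles, W, b_h, rng)
    p_v = visible_probs(h, W, b_v)
    return (rng.random(p_v.shape) < p_v).astype(float)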
Example of Contrastive Divergence Lecture 12d
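As a hedged illustration of the shortcut this lecture covers (a minimal CD-1 update, reusing the helpers above; not necessarily the exact variant shown in the lecture, and details such as sampling versus using probabilities for the reconstruction vary):

def cd1_update(v_data, W, b_v, b_h, lr, rng):
    # One CD-1 weight update on a batch of binary visible vectors.
    # Positive phase: statistics with the data clamped.
    p_h0 = hidden_probs(v_data, W, b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: a single reconstruction step instead of
    # running the chain all the way to equilibrium.
    v1 = visible_probs(h0, W, b_v)   # probabilities, a common choice
    p_h1 = hidden_probs(v1, W, b_h)
    pos = v_data.T @ p_h0            # ~ <v_i h_j> with data clamped
    neg = v1.T @ p_h1                # ~ <v_i h_j> after one step
    return W + lr * (pos - neg) / len(v_data)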
RBMs for Collaborative Filtering Lecture 12e