Boltzmann Machines Stochastic Hopfield Machines Lecture 11e https://class.coursera.org/neuralnets-2012-001/lecture/131
Example applications: document classification given binary feature vectors; monitoring a nuclear power station, where you don't want positive examples (of the dangerous states)!
Two ways a model can generate data: • Causal model: first generate the latent variables (hidden units), then use them to generate the visible variables. • Boltzmann machines (energy-based models): define the probability of a visible vector via the energies of joint configurations of visible and hidden units, as written out below.
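In symbols (standard notation, mine rather than the slides'): a causal model defines

p(v) = \sum_h p(h)\, p(v \mid h)

while a Boltzmann machine is energy-based; the probability of a joint configuration is determined by its energy, and the visible probability is obtained by summing out the hidden units:

p(v, h) = \frac{e^{-E(v,h)}}{\sum_{u,g} e^{-E(u,g)}}, \qquad p(v) = \sum_h p(v, h)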
Learning in Boltzmann Machines Lecture 12a
Modelling the input vectors: there are no labels; we want to build a model of a set of input vectors.
Given that each weight needs to know about all the other weights in order to change in the right way, it is very surprising that there is a simple learning rule built from two statistics:
Positive statistic: how often units i and j are on together when a data vector v is clamped on the visible units. Negative statistic: how often i and j are on together when nothing is clamped and the network is sampling from its own model distribution.
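Written out, this is the classic Boltzmann machine learning rule (standard notation, not copied from the slides):

\Delta w_{ij} = \varepsilon \left( \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}} \right)

where the first expectation is measured with data clamped on the visible units and the second with the network running free.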
The first term in the rule says: raise the weights in proportion to the product of the activities of the units (Hebbian learning). But if we used only this rule, the weights would all become positive and the whole system would blow up. So the second term says: decrease the weights in proportion to how often the units are on together when you are sampling from the model's distribution. An alternative view: the first term is like the storage term for a Hopfield net, and the second term is for getting rid of spurious minima; it tells you exactly how much unlearning to do.
You expect the energy landscape to have many different minima that are fairly separated and have about the same energy. • The model should give the set of images being modelled roughly the same (low) energy, and unreasonable images very high energy. • Sampling how often two units are on together = measuring the correlation between those two units (a minimal sketch follows this list). • Repeat over all the data vectors and average.
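A minimal sketch of that measurement, assuming the sampled global configurations are stored as rows of a binary matrix (the function name and layout are my own, for illustration):

import numpy as np

def pairwise_on_stats(states):
    # Estimate <s_i s_j> from sampled global configurations:
    # the average over samples of the product s_i * s_j.
    # For binary units this is just the fraction of samples
    # in which units i and j are both on.
    states = np.asarray(states, dtype=float)
    return states.T @ states / len(states)

Running this on samples collected with data clamped, and again on samples from the freely running network, gives the two statistics in the learning rule above.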
Restricted Boltzmann Machines Lecture 12c
Much simplified architecture: no connections between hidden units. • If the visible units are given, the equilibrium distribution of the hidden units can be computed in one step, because the hidden units are all independent of one another given the states of the visible units. • The proper Boltzmann machine learning algorithm is still slow for a restricted Boltzmann machine. • In 1998, Hinton found a shortcut for Boltzmann machines: • approximate, but works well in practice • caused a resurgence in this area
Note that the probability of a hidden unit turning on does not depend on what the other hidden units are doing, so all the hidden probabilities can be computed in parallel (see the sketch below).
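A minimal NumPy sketch of that one-step parallel computation; the names (W, b_h) and shapes are my own assumptions, not from the lecture:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, W, b_h):
    # p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * w_ij).
    # One matrix product computes every hidden unit at once,
    # since the hidden units are independent given v.
    # v: (batch, n_vis), W: (n_vis, n_hid), b_h: (n_hid,)
    return sigmoid(b_h + v @ W)

def sample_hidden(v, W, b_h, rng):
    # Draw binary hidden states in a single parallel step.
    p = hidden_probs(v, W, b_h)
    return (rng.random(p.shape) < p).astype(float)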
Fantasy particles = persistent global configurations. After each weight update, you update each fantasy particle a little, which should bring it back close to equilibrium under the new weights. This algorithm works very well at building density models.
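A sketch of that fantasy-particle refresh under the same assumptions, reusing sigmoid and sample_hidden from the sketch above; one alternating Gibbs step per weight update corresponds to "updating the particles a little":

def visible_probs(h, W, b_v):
    # p(v_i = 1 | h) = sigmoid(a_i + sum_j h_j * w_ij),
    # by symmetry with the hidden units.
    return sigmoid(b_v + h @ W.T)

def refresh_fantasy_particles(particles, W, b_v, b_h, rng):
    # One alternating Gibbs step per particle: sample hiddens
    # given visibles, then visibles given hiddens. Run after
    # every weight update so the particles stay close to
    # equilibrium under the current weights.
    h = sample_hidden(particles, W, b_h, rng)
    p_v = visible_probs(h, W, b_v)
    return (rng.random(p_v.shape) < p_v).astype(float)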
Example of Contrastive Divergence Lecture 12d
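As a hedged illustration of the shortcut this lecture covers (a minimal CD-1 update, reusing the helpers above; not necessarily the exact variant shown in the lecture, and details such as sampling versus using probabilities for the reconstruction vary):

def cd1_update(v_data, W, b_v, b_h, lr, rng):
    # One CD-1 weight update on a batch of binary visible vectors.
    # Positive phase: statistics with the data clamped.
    p_h0 = hidden_probs(v_data, W, b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: a single reconstruction step instead of
    # running the chain all the way to equilibrium.
    v1 = visible_probs(h0, W, b_v)   # probabilities, a common choice
    p_h1 = hidden_probs(v1, W, b_h)
    pos = v_data.T @ p_h0            # ~ <v_i h_j> with data clamped
    neg = v1.T @ p_h1                # ~ <v_i h_j> after one step
    return W + lr * (pos - neg) / len(v_data)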
RBMs for Collaborative Filtering Lecture 12e