The Boltzmann Machine Psych 419/719 March 1, 2001
Recall Constraint Satisfaction • We have a network of units and connections • Finding an optimal state involves relaxation: letting the network settle into a configuration that maximizes a goodness function • This is done by annealing
Simulated Annealing • Update unit states according to a probability distribution, which is based on: • The input to the unit. Higher input = greater odds of being on • The temperature. High temperature = more random. Low temperature = deterministic function of input • Start with high temperature, and gradually reduce it
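The stochastic update rule above can be sketched in a few lines. This is a minimal illustration, not the exact code from the course: a unit turns on with probability given by the logistic of its net input divided by the temperature, and the temperature schedule (the specific values below are arbitrary) cools over time.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_unit(net_input, temperature):
    """Stochastically set a unit's state: on with probability
    sigmoid(net_input / T). Higher input -> greater odds of being on;
    high T -> close to a coin flip; low T -> nearly deterministic."""
    p_on = 1.0 / (1.0 + np.exp(-net_input / temperature))
    return 1 if rng.random() < p_on else 0

# Annealing schedule (illustrative values): start hot, cool gradually.
for T in [10.0, 5.0, 2.0, 1.0, 0.5]:
    state = update_unit(net_input=2.0, temperature=T)
```

At very low temperature the update is effectively deterministic: a positive net input yields 1, a negative one yields 0.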
Constraint Satisfaction Networks Have Nice Properties • Can settle into stable configurations based on partial or noisy information • Can do pattern completion • Have well-formed attractors corresponding to stable states • BUT: How can we make such a network learn?
What about Backprop? • Two problems: • Tends to split the probability distributions • If the input is ambiguous (say, the word LEAD), the output reflects a blend of both interpretations rather than settling on one, unlike the Necker cube network • Also: not very biologically plausible • Error gradients travel backwards along connections. Neurons don't seem to do this.
We Need Hidden Units • Hidden units are needed to solve XOR-style problems • In these networks, we have a set of symmetric connections between units • Some units are visible and others are hidden
The Boltzmann Machine:Memorizing Patterns • Here, we want to train the network on a set of patterns. • We want the network to learn about the statistics and relationships between the parts of the patterns. • Not really performing an explicit mapping (like backprop is good for)
How it Works • Step 1. Pick an example • Step 2. Run network in positive phase • Step 3. Run network in negative phase • Step 4. Compare the statistics of the two phases • Step 5. Update the weights based on statistics • Step 6. Go to step 1 and repeat.
Step 1: Pick Example • Pretty simple. Just select an example at random.
Step 2. The Positive Phase • Clamp our visible units with the pattern specified by our current example • Let network settle using the simulated annealing method • Record the outputs of the units • Start again with our example, settling again and recording units again.
Step 3. The Negative Phase • Here, we don’t clamp the network units. We just let it settle to some state as before. • Do this several times, again recording the unit outputs.
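Steps 2 and 3 differ only in whether the visible units are clamped during settling. A minimal sketch of such a settling routine, assuming the annealing update described earlier (the function name `settle`, the schedule, and the step counts are illustrative, not from the course):

```python
import numpy as np

rng = np.random.default_rng(1)

def settle(weights, clamp=None, n_steps=50, T_start=10.0, T_end=0.5):
    """Anneal the network toward a low-energy state.
    `clamp` is an optional dict {unit_index: value}: clamped (visible)
    units are held fixed, as in the positive phase. With clamp=None
    the network free-runs, as in the negative phase."""
    n = weights.shape[0]
    state = rng.integers(0, 2, size=n)
    for T in np.geomspace(T_start, T_end, n_steps):
        for i in rng.permutation(n):       # update units in random order
            if clamp is not None and i in clamp:
                state[i] = clamp[i]        # clamped unit: hold its value
                continue
            net = weights[i] @ state
            p_on = 1.0 / (1.0 + np.exp(-net / T))
            state[i] = 1 if rng.random() < p_on else 0
    return state
```

Running `settle(w, clamp={0: 1})` several times and recording the final states gives the positive-phase samples; `settle(w)` gives the negative-phase samples.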
Step 4. Compare Statistics • For each pair of units, we compute the probability that both units are coactive (both on) during the positive phase. Do the same for the negative phase. • If we have n units, this gives us two n x n matrices of probabilities • pi,j is the probability that units i and j are both on.
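The co-activation matrix is easy to compute from the recorded states. A sketch, assuming each settling's final unit outputs are stacked as rows of a 0/1 array (the function name is hypothetical):

```python
import numpy as np

def coactivation_probs(states):
    """states: (num_settlings, num_units) array of 0/1 unit outputs,
    one row per recorded settling. Returns the n x n matrix whose
    (i, j) entry is the fraction of settlings in which units i and j
    were both on."""
    s = np.asarray(states, dtype=float)
    return (s.T @ s) / s.shape[0]

# Two recorded settlings of a 3-unit network:
states = [[1, 1, 0],
          [1, 0, 0]]
p = coactivation_probs(states)
# p[0, 1] = 0.5: units 0 and 1 were coactive in half the settlings.
```

Computing this once from positive-phase samples and once from negative-phase samples gives the two matrices the learning rule compares.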
Step 5: Update Weights • Change each weight according to the difference of the probabilities for the positive and negative phases: Δwi,j = k (p+i,j − p−i,j) • Here, k is like a learning rate
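The update rule, sketched directly from the description above (the function name and the choice to zero the diagonal, i.e. no self-connections, are assumptions):

```python
import numpy as np

def boltzmann_update(weights, p_plus, p_minus, k=0.1):
    """Boltzmann machine weight update: change each symmetric weight
    w[i, j] by k * (p+[i, j] - p-[i, j]), the difference between the
    co-activation probabilities in the clamped (positive) and
    free-running (negative) phases. k acts as a learning rate."""
    dw = k * (np.asarray(p_plus) - np.asarray(p_minus))
    np.fill_diagonal(dw, 0.0)   # assume no self-connections
    return weights + dw
```

A pair of units that is coactive more often when clamped than when free-running gets its connection strengthened, pulling the free-running statistics toward the clamped ones.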
Why it Works • This reduces the difference between what the network settles to when the inputs are clamped and what it settles to when it's allowed to free-run. • So, the weights learn about what kinds of visible units go together. • Recruits hidden units to help learn higher-order relationships
Can Be Used For Mappings Too • Here, the positive phase involves clamping both the input and output units and letting the network settle. • The negative phase involves clamping just the input units • Network learns that given the input, it should settle to a state where the output units are what they should be
Contrastive Hebbian Learning • Very similar to a normal Boltzmann machine, except we can have units whose outputs are a deterministic function of their input (like the logistic). • As before, we have two phases: positive and negative.
Contrastive Hebbian Learning Rule • Weight updates based on actual unit outputs, not probabilities that they’re both on.
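A sketch of the contrastive Hebbian rule: the same two-phase difference as before, but computed from the units' actual (possibly real-valued) outputs rather than co-activation probabilities. The function name and the zeroed diagonal are assumptions.

```python
import numpy as np

def chl_update(weights, a_plus, a_minus, k=0.1):
    """Contrastive Hebbian update: a Hebbian term from the positive
    (clamped) phase minus an anti-Hebbian term from the negative
    (free-running) phase, using actual unit outputs a_i, which may be
    real-valued (e.g. logistic) rather than stochastic 0/1 states."""
    a_plus = np.asarray(a_plus, dtype=float)
    a_minus = np.asarray(a_minus, dtype=float)
    dw = k * (np.outer(a_plus, a_plus) - np.outer(a_minus, a_minus))
    np.fill_diagonal(dw, 0.0)   # assume no self-connections
    return weights + dw
```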
Problems • Weight explosion: if weights get too big too early, the network will get stuck in one goodness optimum. • Can be alleviated with weight decay • Settling time: time to process an example is long, due to the settling process. • Learning time: takes a lot of presentations to learn. • Biological plausibility: symmetric weights? Two distinct phases?
Sleep? • It has been suggested that something like the negative (minus) phase might be happening during sleep: • Spontaneous correlations between hidden units (not those driven by external input) get subtracted off. They will vanish unless driven by external input while awake. • Not a lot of evidence to support this conjecture. • We can learn while awake!
For Next Time • Optional reading handed out. • Ends section on learning internal representations. Next: biologically plausible learning. • Remember: • No class next Thursday • Homework 3 due March 13 • Project proposal due March 15. See web page.