The Boltzmann Machine Psych 419/719 March 1, 2001
Recall Constraint Satisfaction • We have a network of units and connections • Finding an optimal state involves relaxation: letting the network settle into a configuration that maximizes a goodness function • This is done by annealing
Simulated Annealing • Update unit states according to a probability distribution, which is based on: • The input to the unit. Higher input = greater odds of being on • The temperature. High temperature = more random. Low temperature = deterministic function of input • Start with high temperature, and gradually reduce it
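The stochastic update rule above can be sketched in a few lines. This is a minimal illustration, not the exact code from the course: a unit turns on with probability given by the logistic of its net input divided by the temperature, and the temperature schedule (the specific values below are arbitrary) cools over time.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_unit(net_input, temperature):
    """Stochastically set a unit's state: on with probability
    sigmoid(net_input / T). Higher input -> greater odds of being on;
    high T -> close to a coin flip; low T -> nearly deterministic."""
    p_on = 1.0 / (1.0 + np.exp(-net_input / temperature))
    return 1 if rng.random() < p_on else 0

# Annealing schedule (illustrative values): start hot, cool gradually.
for T in [10.0, 5.0, 2.0, 1.0, 0.5]:
    state = update_unit(net_input=2.0, temperature=T)
```

At very low temperature the update is effectively deterministic: a positive net input yields 1, a negative one yields 0.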
Constraint Satisfaction Networks Have Nice Properties • Can settle into stable configurations based on partial or noisy information • Can do pattern completion • Have well-formed attractors corresponding to stable states • BUT: How can we make such a network learn?
What about Backprop? • Two problems: • Tends to split the probability distributions • If the input is ambiguous (say, the word LEAD), the output reflects a blend of both interpretations rather than settling on one, unlike the Necker cube network • Also: not very biologically plausible • Error gradients travel backwards along connections. Neurons don't seem to do this.
We Need Hidden Units • Hidden units are needed to solve XOR-style problems • In these networks, we have a set of symmetric connections between units • Some units are visible and others are hidden
The Boltzmann Machine:Memorizing Patterns • Here, we want to train the network on a set of patterns. • We want the network to learn about the statistics and relationships between the parts of the patterns. • Not really performing an explicit mapping (like backprop is good for)
How it Works • Step 1. Pick an example • Step 2. Run network in positive phase • Step 3. Run network in negative phase • Step 4. Compare the statistics of the two phases • Step 5. Update the weights based on statistics • Step 6. Go to step 1 and repeat.
Step 1: Pick Example • Pretty simple. Just select an example at random.
Step 2. The Positive Phase • Clamp our visible units with the pattern specified by our current example • Let network settle using the simulated annealing method • Record the outputs of the units • Start again with our example, settling again and recording units again.
Step 3. The Negative Phase • Here, we don’t clamp the network units. We just let it settle to some state as before. • Do this several times, again recording the unit outputs.
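Steps 2 and 3 differ only in whether the visible units are clamped during settling. A minimal sketch of such a settling routine, assuming the annealing update described earlier (the function name `settle`, the schedule, and the step counts are illustrative, not from the course):

```python
import numpy as np

rng = np.random.default_rng(1)

def settle(weights, clamp=None, n_steps=50, T_start=10.0, T_end=0.5):
    """Anneal the network toward a low-energy state.
    `clamp` is an optional dict {unit_index: value}: clamped (visible)
    units are held fixed, as in the positive phase. With clamp=None
    the network free-runs, as in the negative phase."""
    n = weights.shape[0]
    state = rng.integers(0, 2, size=n)
    for T in np.geomspace(T_start, T_end, n_steps):
        for i in rng.permutation(n):       # update units in random order
            if clamp is not None and i in clamp:
                state[i] = clamp[i]        # clamped unit: hold its value
                continue
            net = weights[i] @ state
            p_on = 1.0 / (1.0 + np.exp(-net / T))
            state[i] = 1 if rng.random() < p_on else 0
    return state
```

Running `settle(w, clamp={0: 1})` several times and recording the final states gives the positive-phase samples; `settle(w)` gives the negative-phase samples.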
Step 4. Compare Statistics • For each pair of units, we compute the probability that both units are coactive (both on) during the positive phase. Do the same for the negative phase. • If we have n units, this gives us two n x n matrices of probabilities • pi,j is the probability that units i and j are both on.
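The co-activation matrix is easy to compute from the recorded states. A sketch, assuming each settling's final unit outputs are stacked as rows of a 0/1 array (the function name is hypothetical):

```python
import numpy as np

def coactivation_probs(states):
    """states: (num_settlings, num_units) array of 0/1 unit outputs,
    one row per recorded settling. Returns the n x n matrix whose
    (i, j) entry is the fraction of settlings in which units i and j
    were both on."""
    s = np.asarray(states, dtype=float)
    return (s.T @ s) / s.shape[0]

# Two recorded settlings of a 3-unit network:
states = [[1, 1, 0],
          [1, 0, 0]]
p = coactivation_probs(states)
# p[0, 1] = 0.5: units 0 and 1 were coactive in half the settlings.
```

Computing this once from positive-phase samples and once from negative-phase samples gives the two matrices the learning rule compares.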
Step 5: Update Weights • Change each weight according to the difference of the probabilities for the positive and negative phases: Δwi,j = k (p+i,j − p−i,j) • Here, k is like a learning rate
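The update rule, sketched directly from the description above (the function name and the choice to zero the diagonal, i.e. no self-connections, are assumptions):

```python
import numpy as np

def boltzmann_update(weights, p_plus, p_minus, k=0.1):
    """Boltzmann machine weight update: change each symmetric weight
    w[i, j] by k * (p+[i, j] - p-[i, j]), the difference between the
    co-activation probabilities in the clamped (positive) and
    free-running (negative) phases. k acts as a learning rate."""
    dw = k * (np.asarray(p_plus) - np.asarray(p_minus))
    np.fill_diagonal(dw, 0.0)   # assume no self-connections
    return weights + dw
```

A pair of units that is coactive more often when clamped than when free-running gets its connection strengthened, pulling the free-running statistics toward the clamped ones.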
Why it Works • This reduces the difference between what the network settles to when the inputs are clamped and what it settles to when it's allowed to free-run. • So, the weights learn about what kinds of visible units go together. • Recruits hidden units to help learn higher-order relationships
Can Be Used For Mappings Too • Here, the positive phase involves clamping both the input and output units and letting the network settle. • The negative phase involves clamping just the input units • Network learns that given the input, it should settle to a state where the output units are what they should be
Contrastive Hebbian Learning • Very similar to a normal Boltzmann machine, except we can have units whose outputs are a deterministic function of their input (like the logistic). • As before, we have two phases: positive and negative.
Contrastive Hebbian Learning Rule • Weight updates based on actual unit outputs, not probabilities that they’re both on.
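A sketch of the contrastive Hebbian rule: the same two-phase difference as before, but computed from the units' actual (possibly real-valued) outputs rather than co-activation probabilities. The function name and the zeroed diagonal are assumptions.

```python
import numpy as np

def chl_update(weights, a_plus, a_minus, k=0.1):
    """Contrastive Hebbian update: a Hebbian term from the positive
    (clamped) phase minus an anti-Hebbian term from the negative
    (free-running) phase, using actual unit outputs a_i, which may be
    real-valued (e.g. logistic) rather than stochastic 0/1 states."""
    a_plus = np.asarray(a_plus, dtype=float)
    a_minus = np.asarray(a_minus, dtype=float)
    dw = k * (np.outer(a_plus, a_plus) - np.outer(a_minus, a_minus))
    np.fill_diagonal(dw, 0.0)   # assume no self-connections
    return weights + dw
```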
Problems • Weight explosion: if weights get too big too early, the network will get stuck in one goodness optimum. • Can be alleviated with weight decay • Settling time: time to process an example is long, due to the settling process. • Learning time: takes a lot of presentations to learn. • Biological plausibility: symmetric weights? Two distinct phases?
Sleep? • It has been suggested that something like the negative (minus) phase might be happening during sleep: • Spontaneous correlations between hidden units (not those driven by external input) get subtracted off. They will vanish unless driven by external input while awake. • Not a lot of evidence to support this conjecture. • We can learn while awake!
For Next Time • Optional reading handed out. • Ends section on learning internal representations. Next: biologically plausible learning. • Remember: • No class next Thursday • Homework 3 due March 13 • Project proposal due March 15. See web page.