Neural Networks Ellen Walker Hiram College
Connectionist Architectures • Characterized by (Rich & Knight) • Large number of very simple neuron-like processing elements • Large number of weighted connections between these elements • Highly parallel, distributed control • Emphasis on automatic learning of internal representations (weights)
Classes of Connectionist Architectures • Constraint networks • Positive and negative connections denote constraints between the values of nodes • Weights set by programmer • Layered networks • Weights represent contribution from one intermediate value to the next • Weights are learned using feedback
Hopfield Network • A constraint network • Every node is connected to every other node • If the weight is 0, the connection doesn’t matter • To use the network, set the values of the nodes and let the nodes adjust their values according to the weights. • The “result” is the set of all values in the stabilized network.
Hopfield Network as CAM • Nodes represent features of objects • Compatible features support each other (weights > 0) • Stable states (local minima) are “valid” interpretations • Noise features (incompatible) will be suppressed (network will fall into nearest stable state)
Relaxation • Algorithm to find a stable state for a Hopfield network (serial or parallel) • Pick a node • Compute the sum of [incoming weight] × [neighbor value] over its neighbors • If the sum > 0, set the node to 1; otherwise set it to -1 • When no values are changing, the network is stable • The result can depend on the order in which nodes are chosen
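A minimal Python sketch of this relaxation loop (not from the slides; the function name, the random node order, and the sweep limit are illustrative choices), assuming a symmetric weight matrix with a zero diagonal and node values in {-1, +1}:

```python
import numpy as np

def relax(W, state, max_sweeps=100, rng=None):
    """Serial relaxation of a Hopfield network.

    W     : symmetric (n x n) weight matrix with zero diagonal
    state : length-n vector of node values in {-1, +1}
    """
    rng = rng or np.random.default_rng()
    state = state.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(state)):   # pick nodes in (random) order
            activation = W[i] @ state            # sum of incoming weight * neighbor value
            new_value = 1 if activation > 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:                          # no node changed: network is stable
            break
    return state
```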
Line Labeling and Relaxation • Given an object, each vertex constrains the labels of its connected lines
Hopfield Network for Labeling • Lines denote positive links between compatible labels • Each gray box contains 4 mutually exclusive nodes (with negative links between them)
Boltzmann Machine • Alternative training method for a Hopfield network, based on simulated annealing • Goal: to find the most stable state (rather than the nearest) • Boltzmann rule is probabilistic, based on the “temperature” of the system
Deterministic vs. Boltzmann • Deterministic update rule • Probabilistic update rule • As temperature decreases, the probabilistic rule approaches the deterministic one
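The slides showed the two update rules as figures; as an assumption, here are their standard forms, where s_i is node i's value, ΔE_i = Σ_j w_ij s_j is the energy gap for node i, and T is the temperature:

```latex
s_i \leftarrow
\begin{cases}
  +1 & \text{if } \sum_j w_{ij}\, s_j > 0 \\
  -1 & \text{otherwise}
\end{cases}
\quad \text{(deterministic)}
\qquad
P(s_i = +1) = \frac{1}{1 + e^{-\Delta E_i / T}}
\quad \text{(Boltzmann)}
```

As T → 0 the sigmoid sharpens into a step function and the Boltzmann rule recovers the deterministic one; at high T the update is nearly a coin flip, which is what lets annealing escape shallow local minima.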
Networks and Function Fitting • We earlier talked about function fitting • Finding a function that approximates a set of data so that • The function fits the data well • The function generalizes to fit additional data
What Can a Neuron Compute? • n inputs (i0=1, i1…in) • n+1 weights (w0…wn) • 1 output: • 1 if g(i) > 0 • 0 otherwise • g(i) = w0i0 + w1i1 + … + wnin • g(i) = 0 defines a linear surface, and the output is 1 if the input point lies above this surface
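A small Python sketch of this computation (illustrative, not from the slides); the bias input i0 = 1 is prepended so that w0 acts as the threshold:

```python
import numpy as np

def neuron_output(weights, inputs):
    """Single neuron: weighted sum followed by a hard threshold.

    weights : length-(n+1) vector (w0 ... wn); w0 multiplies the bias input i0 = 1
    inputs  : length-n vector (i1 ... in)
    """
    i = np.concatenate(([1.0], inputs))    # prepend the constant bias input i0 = 1
    g = weights @ i                        # g(i) = w0*i0 + w1*i1 + ... + wn*in
    return 1 if g > 0 else 0               # fire only if the point lies above the surface g = 0
```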
Training a Neuron • Initialize weights randomly • Collect all misclassified examples • If there are none, we’re done • Else compute the gradient & update the weights • Add all points that should have fired, subtract all points that should not have fired • Add a constant (0<C<1) * gradient back to the weights • Repeat steps 2-5 until done (guaranteed to converge -- the loop will end -- provided the data are linearly separable)
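A hedged Python sketch of this batch perceptron loop; the function name, the 0/1 target encoding, and the iteration cap are my own assumptions:

```python
import numpy as np

def train_perceptron(X, y, c=0.5, max_iters=1000, rng=None):
    """Batch perceptron training following the steps above.

    X : (m, n) array of examples; y : length-m array of 0/1 targets
    c : step size with 0 < c < 1
    Convergence is guaranteed only for linearly separable data.
    """
    rng = rng or np.random.default_rng()
    y = np.asarray(y)
    Xb = np.hstack([np.ones((len(X), 1)), np.asarray(X)])    # prepend the bias input i0 = 1
    w = rng.normal(size=Xb.shape[1])                          # 1. initialize weights randomly
    for _ in range(max_iters):
        fired = (Xb @ w > 0).astype(int)
        missed = fired != y                                   # 2. collect misclassified examples
        if not missed.any():                                  # 3. none left: done
            return w
        # 4. gradient: add points that should have fired, subtract points that should not have
        gradient = Xb[missed & (y == 1)].sum(axis=0) - Xb[missed & (y == 0)].sum(axis=0)
        w = w + c * gradient                                  # 5. add C * gradient to the weights
    return w
```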
Perceptron Problem • We have a model and a training algorithm, but we can only compute linearly separable functions! • Most interesting functions are not linearly separable. • Solution: use more than one line (multiple perceptrons)
Multilayered Network • Three layers: input, hidden, output • Layered, fully-connected (between layers), feed-forward
Backpropagation Training • Compute a result: • input->hidden->output • Compute the error at each output node, based on the desired result • Propagate errors back: • Output->hidden, hidden->input • Weights are adjusted using the gradient
Backpropagation Training (cont’d) • Repeat above for every example in the training set (one epoch) • Repeat above until stopping criterion is reached • Good enough average performance on training set • Little enough change in network • Hundreds of epochs…
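A compact Python sketch of the whole procedure for a single hidden layer of sigmoid units (my simplifications: no bias weights, a fixed number of epochs instead of the stopping criteria above, and plain per-example gradient descent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, Y, n_hidden=5, lr=0.1, epochs=500, rng=None):
    """One-hidden-layer feed-forward network trained by backpropagation.

    X : (m, n_in) inputs;  Y : (m, n_out) target outputs in [0, 1]
    """
    rng = rng or np.random.default_rng()
    W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))  # input -> hidden weights
    W2 = rng.normal(scale=0.1, size=(n_hidden, Y.shape[1]))  # hidden -> output weights
    for _ in range(epochs):                                   # one pass over the data = one epoch
        for x, y in zip(X, Y):
            h = sigmoid(x @ W1)                               # forward: input -> hidden
            o = sigmoid(h @ W2)                               # forward: hidden -> output
            delta_o = (o - y) * o * (1 - o)                   # error at output nodes vs. desired result
            delta_h = (delta_o @ W2.T) * h * (1 - h)          # propagate error back: output -> hidden
            W2 -= lr * np.outer(h, delta_o)                   # adjust weights using the gradient
            W1 -= lr * np.outer(x, delta_h)
    return W1, W2
```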
Generalization • If the network is trained correctly, results will generalize to unseen data • If overtrained, the network will “memorize” the training data and give essentially random outputs on other inputs • Tricks to avoid memorization • Limit the number of hidden nodes • Insert noise into the training data
Unsupervised Network Learning • Kohonen network for classification
Training Kohonen Network • Create inhibitory links among nodes of output layer (“winner take all”) • For each item in training data: • Determine an input vector • Run network - find max output node • Reinforce (increase) weights to maximum node • Normalize weights so they sum to 1
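A minimal Python sketch of this winner-take-all training loop (illustrative names and learning rate; the inhibitory links among output nodes are modeled simply by taking the argmax):

```python
import numpy as np

def train_kohonen(X, n_outputs, lr=0.2, epochs=10, rng=None):
    """Winner-take-all competitive learning, one weight row per output node.

    X : (m, n) array of input vectors
    """
    rng = rng or np.random.default_rng()
    W = rng.random((n_outputs, X.shape[1]))
    W /= W.sum(axis=1, keepdims=True)             # normalize each node's weights to sum to 1
    for _ in range(epochs):
        for x in X:                               # for each item in the training data
            outputs = W @ x                       # run the network
            winner = np.argmax(outputs)           # winner take all: find the max output node
            W[winner] += lr * x                   # reinforce (increase) weights to that node
            W[winner] /= W[winner].sum()          # re-normalize so the weights sum to 1
    return W
```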
Representations in Networks • Distributed representation • Concept = pattern • Examples: Hopfield, backpropagation • Localist representation • Concept = single node • Example: Kohonen • Distributed representations can be more robust, and also more efficient