
Neural Networks



Presentation Transcript


  1. Neural Networks Tuomas Sandholm Carnegie Mellon University Computer Science Department

  2. How the brain works Synaptic connections exhibit long-term changes in the connection strengths based on patterns seen

  3. Comparing brains with digital computers Parallelism Graceful degradation Inductive learning

  4. ANN (software/hardware, synchronous/asynchronous) Notation

  5. Single unit (neuron) of an artificial neural network

  6. Activation Functions Where W_0,i = t and a_0 = -1 (fixed)

  7. Boolean gates can be simulated by units with a step function g: AND: W=1, W=1, t=1.5. OR: W=1, W=1, t=0.5. NOT: W=-1, t=-0.5.
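The three gates above can be checked directly with a single threshold unit. A minimal Python sketch; the function names are illustrative, but the weights and thresholds are the ones from the slide:

```python
def unit(weights, inputs, t):
    """A single threshold unit: fires (1) if the weighted sum reaches threshold t."""
    s = sum(w * a for w, a in zip(weights, inputs))
    return 1 if s >= t else 0

def AND(a, b):   # W=1, W=1, t=1.5
    return unit([1, 1], [a, b], 1.5)

def OR(a, b):    # W=1, W=1, t=0.5
    return unit([1, 1], [a, b], 0.5)

def NOT(a):      # W=-1, t=-0.5
    return unit([-1], [a], -0.5)
```

With a step function g, each gate is exactly one unit; enumerating the truth tables confirms the simulation.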

  8. Topologies Feed-forward vs. recurrent Recurrent networks have state (activations from previous time steps have to be remembered): Short-term memory.

  9. Hopfield network • Bidirectional symmetric (W_i,j = W_j,i) connections • g is the sign function • All units are both input and output units • Activations are ±1 • “Associative memory” • After training on a set of examples, a new stimulus will cause the network to settle into the activation pattern corresponding to the training example that most closely resembles the new stimulus. • E.g. parts of a photograph • Thrm. Can reliably store 0.138 × #units training examples
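A small sketch of the associative-memory behavior. The slide does not spell out the training rule, so the standard Hebbian outer-product rule is assumed here; function names are illustrative:

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian rule (assumed): W = sum of outer products of the ±1 patterns."""
    n = len(patterns[0])
    W = np.zeros((n, n))
    for p in patterns:
        p = np.asarray(p, dtype=float)
        W += np.outer(p, p)          # keeps W symmetric: W_i,j = W_j,i
    np.fill_diagonal(W, 0.0)         # no self-connections
    return W

def recall(W, x, steps=10):
    """Apply the sign activation repeatedly until the state settles."""
    x = np.asarray(x, dtype=float)
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1.0              # break ties toward +1
    return x

# Store one ±1 pattern, then present a corrupted version of it.
pattern = [1, -1, 1, -1, 1, -1]
W = train_hopfield([pattern])
noisy = [-1, -1, 1, -1, 1, -1]       # first unit flipped
restored = recall(W, noisy)
```

The corrupted stimulus settles back into the stored activation pattern, which is the "parts of a photograph" effect in miniature.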

  10. Boltzmann machine • Symmetric weights • Each output is 0 or 1 • Includes units that are neither input units nor output units • Stochastic g, i.e. some probability (as a fn of in_i) that g=1 • State transitions resemble simulated annealing. Approximates the configuration that best meets the training set.
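The stochastic g can be sketched as a unit that fires with some probability depending on in_i. The slide leaves that probability unspecified; the sigmoid-with-temperature below is an assumption (it is the conventional Boltzmann-machine choice):

```python
import math
import random

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def stochastic_unit(weights, activations, temperature=1.0):
    """Fire (return 1) with probability sigmoid(in_i / T); otherwise return 0."""
    in_i = sum(w * a for w, a in zip(weights, activations))
    return 1 if random.random() < sigmoid(in_i / temperature) else 0
```

A high temperature makes transitions nearly random, a low one nearly deterministic; lowering the temperature over time is what makes the state transitions resemble simulated annealing.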

  11. Learning in ANNs is the process of tuning the weights: a form of nonlinear regression.

  12. ANN topology Representation capability vs. overfitting risk. A feed-forward net with one hidden layer can approximate any continuous fn of the inputs. With 2 hidden layers it can approximate any fn at all. The #units needed in each layer may grow exponentially Learning the topology Hill-climbing vs. genetic algorithms vs. … Removing vs. adding (nodes/connections). Compare candidates via cross-validation.

  13. Perceptrons Majority fn implementable with one output unit; a decision tree requires O(2^n) nodes

  14. Representation capability of a perceptron Every input can only affect the output in one direction independent of other inputs. E.g. unable to represent WillWait in the restaurant example. Perceptrons can only represent linearly separable fns. For a given problem, does one know in advance whether it is linearly separable?

  15. Linear separability in 3D Minority Function

  16. Learning linearly separable functions Training examples are used over and over: each full pass is an epoch. Err = T - O Variant of perceptron learning rule. Thrm. Will learn the linearly separable target fn (if the learning rate α is not too high). Intuition: gradient descent in a search space with no local optima
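A runnable sketch of the rule, W_j ← W_j + α · Err · I_j with Err = T − O and the threshold folded in as W_0 with fixed input a_0 = −1. The 3-input majority fn used below is linearly separable, so by the theorem the rule converges; function names and the value of α are illustrative:

```python
from itertools import product

def predict(W, inputs):
    """Threshold unit output, with the threshold folded into W[0] (input a_0 = -1)."""
    x = [-1] + list(inputs)
    return 1 if sum(w * xi for w, xi in zip(W, x)) >= 0 else 0

def train_perceptron(examples, n_inputs, alpha=0.1, epochs=100):
    """Perceptron learning rule: W_j <- W_j + alpha * (T - O) * I_j."""
    W = [0.0] * (n_inputs + 1)
    for _ in range(epochs):                        # examples used over and over
        for inputs, target in examples:
            err = target - predict(W, inputs)      # Err = T - O
            x = [-1] + list(inputs)
            W = [w + alpha * err * xi for w, xi in zip(W, x)]
    return W

# 3-input majority fn: linearly separable, so the rule converges.
examples = [(bits, int(sum(bits) >= 2)) for bits in product([0, 1], repeat=3)]
W = train_perceptron(examples, n_inputs=3)
```

After training, the learned weights classify all eight input combinations correctly.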

  17. Encoding for ANNs E.g. #patrons can be None, Some or Full. Local encoding: None=0.0, Some=0.5, Full=1.0. Distributed encoding: None = 1 0 0, Some = 0 1 0, Full = 0 0 1
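The two encodings of the #patrons attribute as a small sketch (function names are illustrative):

```python
def local_encoding(value):
    """Local encoding: a single unit whose activation level codes the value."""
    return {"None": 0.0, "Some": 0.5, "Full": 1.0}[value]

def distributed_encoding(value):
    """Distributed encoding: one unit per value (one-hot)."""
    order = ["None", "Some", "Full"]
    return [1 if v == value else 0 for v in order]
```

The local encoding imposes an ordering and spacing on the values; the distributed one does not, at the cost of more input units.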

  18. Majority Function

  19. WillWait

  20. Multilayer feedforward networks Structural credit assignment problem Back propagation algorithm (again, Err_i = T_i - O_i) Updating between hidden & output units. Updating between input & hidden units: back propagation of the error

  21. Back propagation (BP) as gradient descent search A way of localizing the computation of the gradient to units.

  22. Observations on BP as gradient descent • Minimize error → move in the opposite direction of the gradient • g needs to be differentiable • Cannot use the sign fn or step fn • Use e.g. the sigmoid: g' = g(1 - g) • Gradient taken w.r.t. one training example at a time
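A minimal sketch of BP as per-example gradient descent with the sigmoid, so that g' = g(1 − g) and each delta is Err · g'. XOR is used as the target fn because no perceptron can represent it. The network size, learning rate, and epoch count are illustrative, and with an unlucky initialization training can stall in a local optimum, so the sketch only checks that the error decreases:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])                    # XOR targets

W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(0.0, 1.0, 4);      b2 = 0.0           # hidden -> output
alpha = 0.5

def mse():
    outs = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return float(np.mean((T - outs) ** 2))

mse_before = mse()
for _ in range(5000):
    for x, t in zip(X, T):                        # one training example at a time
        h = sigmoid(x @ W1 + b1)                  # forward pass
        o = sigmoid(h @ W2 + b2)
        delta_o = (t - o) * o * (1.0 - o)         # Err * g', with g' = g(1 - g)
        delta_h = delta_o * W2 * h * (1.0 - h)    # error propagated back to hidden units
        W2 += alpha * delta_o * h                 # move against the gradient
        b2 += alpha * delta_o
        W1 += alpha * np.outer(x, delta_h)
        b1 += alpha * delta_h
mse_after = mse()
```

Each delta is local to its unit, which is the point of the previous slide: the gradient computation is localized.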

  23. ANN learning curve WillWait problem

  24. WillWait Problem

  25. Expressiveness of BP 2^n/n hidden units are needed to represent arbitrary Boolean fns of n inputs (such a network has O(2^n) weights, and we need at least 2^n bits to represent a Boolean fn). Thrm. Any continuous fn f: [0,1]^n → R^m can be implemented in a 3-layer network with 2n+1 hidden units (activation fns take a special form) [Kolmogorov]

  26. Efficiency of BP Using is fast. Training is slow: may need exponentially many epochs in #inputs.

  27. More on BP… Generalization: good on fns where the output varies smoothly with the input. Sensitivity to noise: very tolerant of noise, but does not give a degree of certainty in the output. Transparency: black box. Prior knowledge: hard to “prime”. No convergence guarantees.

  28. Summary of representation capabilities (model class) of different supervised learning methods: 3-layer feedforward ANN, decision tree, perceptron, k-nearest neighbor, version space
