Neural Networks 10701/15781 Recitation, February 12, 2008. Parts of the slides are from previous years’ 10701 recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.
Recall Linear Regression • Prediction of continuous variables • Learn the mapping f: X → Y • Model is linear in the parameters w (+ some noise) • Assume Gaussian noise • Learn the MLE weights w = (X^T X)^(-1) X^T y
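A minimal sketch (not from the slides) of the closed-form MLE solution above, assuming a design matrix X and target vector y; the toy data and numpy usage are illustrative assumptions.

```python
# Sketch: closed-form MLE weights for linear regression with Gaussian noise,
# w = (X^T X)^{-1} X^T y.
import numpy as np

def fit_linear_regression(X, y):
    """X: (n_samples, n_features) design matrix; y: (n_samples,) targets."""
    # Solve the normal equations X^T X w = X^T y (more stable than an explicit inverse).
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy usage: recover weights close to [2, -3] from noisy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=100)
print(fit_linear_regression(X, y))
```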
Neural Network • Neural nets are also models with parameters w in them. They are now called weights. • As before, we want to compute the weights that minimize the sum of squared residuals • Which, under the “Gaussian i.i.d. noise” assumption, turns out to be maximum likelihood. • Instead of explicitly solving for the max. likelihood weights, we use gradient descent
Perceptrons • Input x=(x1,…, xn) and target value t: output o = sgn(w·x) (step activation) or o = σ(w·x) = 1/(1+exp(−w·x)) (sigmoid activation) • Given training data {(x(l),t(l))}, find w which minimizes E(w) = ½ Σl (t(l) − o(l))^2
Gradient descent • General framework for finding a minimum of a continuous (differentiable) function f(w) • Start with some initial value w(1) and compute the gradient vector ∇f(w(1)) • The next value w(2) is obtained by moving some distance from w(1) in the direction of steepest descent, i.e., along the negative of the gradient: w(2) = w(1) − η ∇f(w(1))
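A minimal sketch of the generic gradient-descent loop described above; the step size eta, stopping tolerance, and the toy objective are assumptions, not values from the slides.

```python
# Sketch: repeatedly step along the negative gradient until the iterates stop moving.
import numpy as np

def gradient_descent(grad_f, w0, eta=0.1, tol=1e-6, max_iter=10000):
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(w)                # gradient at the current point
        w_new = w - eta * g          # move in the direction of steepest descent
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Usage: minimize f(w) = ||w - [1, 2]||^2, whose gradient is 2(w - [1, 2]).
print(gradient_descent(lambda w: 2 * (w - np.array([1.0, 2.0])), w0=[0.0, 0.0]))
```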
Gradient Descent on a Perceptron • The sigmoid perceptron update rule: wi ← wi + η Σl (t(l) − o(l)) o(l) (1 − o(l)) xi(l), where o(l) = σ(w·x(l))
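A minimal sketch of batch gradient descent with the sigmoid perceptron update rule above, minimizing the sum of squared residuals; the learning rate, iteration count, and OR example are illustrative assumptions.

```python
# Sketch: train a sigmoid perceptron by gradient descent on E(w) = 1/2 * sum_l (t_l - o_l)^2.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_perceptron(X, t, eta=0.5, n_iter=2000):
    """X: (n_samples, n_features); t: (n_samples,) targets in [0, 1]."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        o = sigmoid(X @ w)
        # dE/dw = -sum_l (t_l - o_l) * o_l * (1 - o_l) * x_l
        grad = -X.T @ ((t - o) * o * (1 - o))
        w -= eta * grad
    return w

# Usage: learn X1 OR X2, with a constant bias input appended to each example.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
w = train_sigmoid_perceptron(X, t)
print(sigmoid(X @ w).round(2))
```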
Boolean Functions e.g., using a step activation function with threshold 0, can we learn the function • X1 AND X2? • X1 OR X2? • X1 AND NOT X2? • X1 XOR X2? (see the sketch below)
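A small sketch answering the question above with a threshold-0 step unit. The weights are hand-picked for illustration, not learned, and the bias b stands in for a weight on a constant-1 input.

```python
# Sketch: a threshold-0 step perceptron computing Boolean functions of two inputs.
import numpy as np

def step_perceptron(w, b, x):
    return 1 if w @ x + b > 0 else 0

x_vals = [np.array(p) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]

# X1 AND X2: w = (1, 1), b = -1.5
print([step_perceptron(np.array([1, 1]), -1.5, x) for x in x_vals])   # [0, 0, 0, 1]
# X1 OR X2:  w = (1, 1), b = -0.5
print([step_perceptron(np.array([1, 1]), -0.5, x) for x in x_vals])   # [0, 1, 1, 1]
# X1 AND NOT X2: w = (1, -1), b = -0.5
print([step_perceptron(np.array([1, -1]), -0.5, x) for x in x_vals])  # [0, 0, 1, 0]
# X1 XOR X2 is not linearly separable, so no single such perceptron computes it.
```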
Multilayer Networks • The class of functions representable by a single perceptron is limited • Think of nonlinear functions such as XOR, which is not linearly separable
A 1-Hidden-Layer Net • N_input = 2, N_hidden = 3, N_output = 1
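A minimal sketch of the forward pass for the 1-hidden-layer net above (2 inputs, 3 hidden units, 1 output), assuming sigmoid units; the random weight initialization is an illustrative assumption.

```python
# Sketch: forward pass through one hidden layer of sigmoid units.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 2))   # input-to-hidden weights w_ji
W_out = rng.normal(size=(1, 3))      # hidden-to-output weights w_kj

x = np.array([0.5, -1.0])
h = sigmoid(W_hidden @ x)            # hidden-unit activations
o = sigmoid(W_out @ h)               # network output o_k
print(h, o)
```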
Backpropagation • HW2 – Problem 2 • Output of the k-th output unit from input x: ok = σ(Σj wkj σ(Σi wji xi)) • With bias: add a constant term for every non-input unit • Learn w to minimize E(w) = ½ Σl Σk (tk(l) − ok(l))^2
Backpropagation Initialize all weights. Do until convergence: 1. Input a training example to the network and compute the outputs ok 2. Update each hidden-to-output weight wkj by wkj ← wkj + η δk hj, where δk = ok (1 − ok)(tk − ok) 3. Update each input-to-hidden weight wji by wji ← wji + η δj xi, where δj = hj (1 − hj) Σk wkj δk
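A minimal sketch of the backpropagation loop above for a 1-hidden-layer sigmoid network with squared-error loss. Bias is handled as a constant-1 input to every non-input unit, as on the previous slide; the learning rate, fixed iteration count (standing in for the convergence test), and XOR example are assumptions.

```python
# Sketch: backpropagation with the hidden-to-output and input-to-hidden updates above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, T, n_hidden=3, eta=0.5, n_iter=5000, seed=0):
    """X: (n_samples, n_in); T: (n_samples, n_out) with targets in [0, 1]."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_hidden, n_in + 1))   # input-to-hidden w_ji (+ bias)
    W2 = rng.normal(scale=0.5, size=(n_out, n_hidden + 1))  # hidden-to-output w_kj (+ bias)
    for _ in range(n_iter):
        for x, t in zip(X, T):
            # 1. Forward pass: hidden activations and outputs o_k.
            x1 = np.append(x, 1.0)          # input plus constant bias term
            h = sigmoid(W1 @ x1)
            h1 = np.append(h, 1.0)          # hidden activations plus bias term
            o = sigmoid(W2 @ h1)
            # 2. Hidden-to-output: delta_k = o_k (1 - o_k)(t_k - o_k).
            delta_k = o * (1 - o) * (t - o)
            # 3. Input-to-hidden: delta_j = h_j (1 - h_j) sum_k w_kj delta_k.
            delta_j = h * (1 - h) * (W2[:, :-1].T @ delta_k)
            W2 += eta * np.outer(delta_k, h1)
            W1 += eta * np.outer(delta_j, x1)
    return W1, W2

# Usage: the XOR function, which no single perceptron can represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train_backprop(X, T)
for x in X:
    h = sigmoid(W1 @ np.append(x, 1.0))
    print(x, sigmoid(W2 @ np.append(h, 1.0)))
```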