Neural Networks
Slides from: Doug Gray, David Poole
What is a Neural Network?
• Information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information
• A method of computing, based on the interaction of multiple connected processing elements
What can a Neural Net do?
• Compute a known function
• Approximate an unknown function
• Pattern recognition
• Signal processing
• Learn to do any of the above
Basic Concepts
• A neural network generally maps a set of inputs to a set of outputs
• The number of inputs/outputs is variable
• The network itself is composed of an arbitrary number of nodes with an arbitrary topology
Basic Concepts
Definition of a node:
• A node is an element which performs the function y = fH(∑ wi·xi + Wb)
[Diagram: weighted connections feeding into a node]
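As an illustration, here is a minimal Python sketch of that node function; the names `node_output`, `inputs`, `weights`, and `bias`, and passing the activation in as `f_h`, are my choices, not from the slides:

```python
def node_output(inputs, weights, bias, f_h):
    """One node: y = f_H(sum(w_i * x_i) + W_b)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return f_h(total)

# Example with an identity activation:
y = node_output([1.0, 0.5], [0.3, -0.2], 0.1, lambda v: v)
print(y)  # 0.3*1.0 + (-0.2)*0.5 + 0.1 = 0.3
```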
Properties
• Inputs are flexible: any real values, highly correlated or independent
• Target function may be discrete-valued, real-valued, or vectors of discrete or real values
• Outputs are real numbers between 0 and 1
• Resistant to errors in the training data
• Long training time
• Fast evaluation
• The function produced can be difficult for humans to interpret
Perceptrons
• Basic unit in a neural network
• Linear separator
• Parts:
  • N inputs, x1 … xn
  • Weights for each input, w1 … wn
  • A bias input x0 (constant) and associated weight w0
  • Weighted sum of inputs, y = w0x0 + w1x1 + … + wnxn
  • A threshold function (activation function), e.g. 1 if y > 0, −1 if y ≤ 0
Diagram
[Figure: inputs x0, x1, …, xn with weights w0, w1, …, wn feed a summation unit, y = Σ wixi, followed by a threshold: output 1 if y > 0, −1 otherwise]
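A sketch of this perceptron in Python, assuming the bias is handled as the constant input x0 described above (function and variable names are illustrative):

```python
def perceptron(x, w):
    """x: inputs, with x[0] the constant bias input x0;
    w: the matching weights w0 ... wn.
    Returns 1 if the weighted sum is positive, -1 otherwise."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if y > 0 else -1

# Example: two real inputs plus the constant bias input x0 = 1
print(perceptron([1, 0.7, -0.4], [-0.5, 1.0, 2.0]))  # sum = -0.6 -> -1
```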
Typical Activation Functions
• F(x) = 1 / (1 + e^(−x))
• Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions
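A one-function sketch of this sigmoid (the name `sigmoid` is mine); its smoothness is what makes the gradient-based training described later possible:

```python
import math

def sigmoid(x):
    """F(x) = 1 / (1 + e^(-x)): a smooth approximation to a
    hard threshold, saturating at 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(-5), sigmoid(0), sigmoid(5))  # ~0.007, 0.5, ~0.993
```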
Simple Perceptron
• Binary logic application
• fH(x) = u(x) [linear threshold]
• Wi = random(−1, 1)
• Y = u(W0X0 + W1X1 + Wb)
• Now how do we train it?
Basic Training
• Perceptron learning rule: ΔWi = η · (D − Y) · Xi
• η = learning rate
• D = desired output, Y = actual output
• Adjust weights based on how well the current weights match an objective
Logic Training
• Expose the network to the logical OR operation
• Update the weights after each epoch
• As the output approaches the desired output for all cases, ΔWi will approach 0
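Putting the last three slides together, here is a runnable sketch that trains the simple perceptron on the OR truth table, with per-epoch (batch) weight updates as described above; the 0/1 unit step u, the learning rate value, and the epoch cap are assumptions:

```python
import random

def u(v):
    """Linear threshold from the Simple Perceptron slide."""
    return 1 if v > 0 else 0

# OR truth table: ((X0, X1), desired output D)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

eta = 0.1                                # learning rate
w0 = random.uniform(-1, 1)               # Wi = random(-1, 1)
w1 = random.uniform(-1, 1)
wb = random.uniform(-1, 1)

for epoch in range(1000):
    dw0 = dw1 = dwb = 0.0
    for (x0, x1), d in data:
        y = u(w0 * x0 + w1 * x1 + wb)
        # Perceptron learning rule: dW_i = eta * (D - Y) * X_i
        dw0 += eta * (d - y) * x0
        dw1 += eta * (d - y) * x1
        dwb += eta * (d - y)             # the bias input is constant 1
    w0, w1, wb = w0 + dw0, w1 + dw1, wb + dwb
    if dw0 == dw1 == dwb == 0.0:         # all outputs correct: converged
        break

print(w0, w1, wb)
```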
Results
[Plot: the weights W0, W1, and Wb over the course of training]
Details
• The network converges on a hyperplane decision surface
• Setting the weighted sum to zero, W0X0 + W1X1 + Wb = 0, gives the boundary X1 = −(W0/W1)X0 − (Wb/W1)
[Plot: decision boundary in the (X0, X1) plane]
Feed-forward neural networks
Feed-forward neural networks are the most common models. These are directed acyclic graphs.
Axiomatizing the Network
• The values of the attributes are real numbers.
• Thirteen parameters w0, …, w12 are real numbers.
• The attributes h1 and h2 correspond to the values of hidden units.
• There are 13 real numbers to be learned. The hypothesis space is thus a 13-dimensional real space.
• Each point in this 13-dimensional space corresponds to a particular logic program that predicts a value for reads given known, new, short, and home.
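A sketch of one way to lay out those 13 parameters for the reads network, with sigmoid units for h1, h2, and the output; the function name and the parameter ordering are illustrative, not from the slides:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def predict_reads(x, w):
    """x = [known, new, short, home] as real numbers;
    w = 13 parameters w0 ... w12:
      w[0:4]   input weights of hidden unit h1, w[4] its bias
      w[5:9]   input weights of hidden unit h2, w[9] its bias
      w[10:12] weights from h1, h2 to the output, w[12] its bias."""
    h1 = sigmoid(sum(wi * xi for wi, xi in zip(w[0:4], x)) + w[4])
    h2 = sigmoid(sum(wi * xi for wi, xi in zip(w[5:9], x)) + w[9])
    return sigmoid(w[10] * h1 + w[11] * h2 + w[12])
```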
Neural Network Learning
• Aim of neural network learning: given a set of examples, find parameter settings that minimize the error.
• Back-propagation learning is gradient descent search through the parameter space to minimize the sum-of-squares error.
Backpropagation Learning
• Inputs:
  • A network, including all units and their connections
  • Stopping criteria
  • Learning rate (constant of proportionality of gradient descent search)
  • Initial values for the parameters
  • A set of classified training data
• Output: updated values for the parameters
Backpropagation Learning Algorithm
• Repeat:
  • evaluate the network on each example given the current parameter settings
  • determine the derivative of the error for each parameter
  • change each parameter in proportion to its derivative
• until the stopping criterion is met
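A sketch of that loop in Python. To stay short, the derivative of the error for each parameter is estimated here by finite differences; actual back-propagation computes the same derivatives analytically with the chain rule. The names and the stopping test are assumptions:

```python
def train(predict, w, examples, eta=0.5, tolerance=1e-4, eps=1e-6):
    """predict(x, w): network output for input x with parameters w.
    examples: list of (x, target) pairs of classified training data."""
    def error(w):
        # sum-of-squares error over the training examples
        return sum((predict(x, w) - t) ** 2 for x, t in examples)
    while True:
        # determine the derivative of the error for each parameter
        grads = []
        for i in range(len(w)):
            w[i] += eps
            e_hi = error(w)
            w[i] -= 2 * eps
            e_lo = error(w)
            w[i] += eps                       # restore w[i]
            grads.append((e_hi - e_lo) / (2 * eps))
        # change each parameter in proportion to its derivative
        for i in range(len(w)):
            w[i] -= eta * grads[i]
        if max(abs(g) for g in grads) < tolerance:  # stopping criterion
            return w
```

With the `predict_reads` sketch above, `train(predict_reads, w, examples)` would fit the 13 parameters; seeding w with small random values (as in the perceptron sketch) avoids the symmetry that would otherwise keep h1 and h2 identical.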
Bias in neural networks and decision trees
• It's easy for a neural network to represent "at least two of I1, …, Ik are true": a single threshold unit with bias weight w0 = −15 and weight wi = 10 on each input does it. This concept forms a large decision tree.
• Consider representing a conditional: "if c then a else b":
  • Simple in a decision tree.
  • Needs a complicated neural network to represent (c ∧ a) ∨ (¬c ∧ b).
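A quick check of the "at least two" unit described above, with bias weight −15 and a weight of 10 per input (the 0/1 step output and the function name are my choices):

```python
def at_least_two(inputs):
    """Threshold unit with bias -15 and weight 10 per input:
    fires exactly when two or more inputs are 1."""
    return 1 if -15 + sum(10 * i for i in inputs) > 0 else 0

assert at_least_two([1, 0, 0, 0]) == 0   # one true: 10 - 15 = -5
assert at_least_two([1, 1, 0, 0]) == 1   # two true: 20 - 15 = +5
assert at_least_two([1, 1, 1, 1]) == 1
```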
Neural Networks and Logic
• Meaning is attached to the input and output units.
• There is no a priori meaning associated with the hidden units.
• What the hidden units actually represent is something that's learned.