Feed-Forward Neural Networks Presenter: 虞台文
Content • Introduction • Single-Layer Perceptron Networks • Learning Rules for Single-Layer Perceptron Networks • Perceptron Learning Rule • Adaline Learning Rule • δ-Learning Rule • Multilayer Perceptron • Back-Propagation Learning Algorithm
Feed-Forward Neural Networks Introduction
Historical Background • 1943: McCulloch and Pitts proposed the first computational model of a neuron. • 1949: Hebb proposed the first learning rule. • 1958: Rosenblatt’s work on perceptrons. • 1969: Minsky and Papert exposed the limitations of the theory. • 1970s: Decade of dormancy for neural networks. • 1980s-90s: Neural networks return (self-organization, back-propagation algorithms, etc.)
Nervous Systems • The human brain contains ~10^11 neurons. • Each neuron is connected to ~10^4 others. • Some scientists have compared the brain to a “complex, nonlinear, parallel computer”. • The largest modern neural networks achieve a complexity comparable to the nervous system of a fly.
Neurons • The main purpose of a neuron is to receive, analyze, and pass on information in the form of signals (electric pulses). • When a neuron sends information, we say that the neuron “fires”.
Neurons • Acting through specialized projections known as dendrites and axons, neurons carry information throughout the neural network. • [Animation: the firing of a synapse between the pre-synaptic terminal of one neuron and the soma (cell body) of another neuron.]
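A Model of Artificial Neuron • [Figure: neuron i receives the inputs x1, x2, ..., xm through the weights wi1, wi2, ..., wim; an integration function f(·) combines the weighted inputs and an activation function a(·) produces the output yi. The last input is the bias input xm = −1, and its weight wim = θi acts as the threshold.]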
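The neuron model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the integration function f is taken to be a weighted sum, the bias input is taken to be x_m = −1 with the threshold stored as the last weight, and the names `neuron_output` and `step` as well as all numeric values are hypothetical, not taken from the slides.

```python
import math

def neuron_output(x, w, activation):
    """Compute y = a(f(x)) for one neuron, with f assumed to be a weighted sum."""
    net = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return activation(net)

def step(net):
    """A simple hard-limiting activation (illustrative choice)."""
    return 1 if net >= 0 else 0

# Two real inputs plus the constant bias input x_m = -1 (assumed convention).
x = [1.0, 0.5, -1.0]
# The last weight plays the role of the threshold theta = 0.5 (made-up value).
w = [0.4, 0.6, 0.5]

y_step = neuron_output(x, w, step)
y_sigmoid = neuron_output(x, w, lambda net: 1.0 / (1.0 + math.exp(-net)))
```

Swapping the `activation` argument is all it takes to move between a thresholding unit and a graded (sigmoid) unit, which is the distinction the later learning rules rely on.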
Feed-Forward Neural Networks • Graph representation: • nodes: neurons • arrows: signal flow directions • A neural network that does not contain cycles (feedback loops) is called a feed-forward network (or perceptron). • [Figure: a network mapping the inputs x1, x2, ..., xm to the outputs y1, y2, ..., yn.]
Layered Structure • [Figure: the input layer (x1, x2, ..., xm) feeds one or more hidden layers, which feed the output layer (y1, y2, ..., yn).]
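A layered feed-forward pass can be sketched as follows. The sketch assumes sigmoid units and omits bias inputs for brevity; the 2-3-1 layout, the weight values, and the function names `layer_forward` and `feed_forward` are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer_forward(weights, inputs):
    """One layer: each row of `weights` holds the incoming weights of one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def feed_forward(layers, x):
    """Propagate x through the layers in order; signals never flow backwards."""
    signal = x
    for weights in layers:
        signal = layer_forward(weights, signal)
    return signal

# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output neuron.
hidden = [[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]]
output = [[1.0, -1.0, 0.5]]
y = feed_forward([hidden, output], [1.0, 2.0])
```

Because the layers are visited strictly in order, the computation graph has no cycles, which is exactly the feed-forward property.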
Knowledge and Memory • The output behavior of a network is determined by the weights. • Weights are the memory of an NN. • Knowledge is distributed across the network. • A large number of nodes • increases the storage “capacity”; • ensures that the knowledge is robust; • provides fault tolerance. • New information is stored by changing the weights.
Pattern Classification • Function: x → y (input pattern x, output pattern y). • The NN’s output is used to distinguish between and recognize different input patterns. • Different output patterns correspond to particular classes of input patterns. • Networks with hidden layers can be used for solving more complex problems than just linear pattern classification.
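Training • Training set: input patterns (xi1, xi2, ..., xim), each paired with desired outputs (di1, di2, ..., din). • Goal: adjust the weights so that the network outputs (yi1, yi2, ..., yin) match the desired outputs for every training pattern.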
Generalization • A properly trained neural network may produce reasonable answers for input patterns not seen during training (generalization). • Generalization is particularly useful for the analysis of “noisy” data (e.g. time series). • [Figure: network response with noise and without noise.]
Applications • Pattern classification • Object recognition • Function approximation • Data compression • Time series analysis and forecast • . . .
Feed-Forward Neural Networks Single-Layer Perceptron Networks
The Single-Layered Perceptron • [Figure: the inputs x1, x2, ..., xm-1 and the bias input xm = −1 are fully connected to the outputs y1, y2, ..., yn; wij is the weight from input j to output i.]
Training a Single-Layered Perceptron • Training set: input patterns paired with desired outputs d1, d2, ..., dn. • Goal: choose the weights so that each output yi equals its desired output di for every training pattern.
Learning Rules • Linear Threshold Units (LTUs): Perceptron Learning Rule • Linearly Graded Units (LGUs): Widrow-Hoff Learning Rule
Feed-Forward Neural Networks Learning Rules for Single-Layered Perceptron Networks • Perceptron Learning Rule • Adaline Learning Rule • δ-Learning Rule
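Perceptron: Linear Threshold Unit • [Figure: inputs x1, x2, ..., xm with weights wi1, wi2, ..., wim, where xm = −1 and wim = θi; the weighted sum passes through sgn(·), giving an output of +1 or −1.] • Goal: yi = sgn(wi^T x) = di for every training pattern.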
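Example • Goal: output y = +1 for Class 1 and y = −1 for Class 2. • [Figure: Class 1 and Class 2 points in the (x1, x2) plane, separated by the decision line g(x) = 0; the unit takes the inputs x1, x2 and a constant bias input x3.]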
Augmented input vector • The input (x1, x2) is extended with the constant bias input x3 to form the augmented input vector (x1, x2, x3). • In the augmented input space, the decision boundary is a plane that passes through the origin. • [Figure: Class 1 (+1) and Class 2 (−1) points plotted in the augmented space (x1, x2, x3).]
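The augmentation step and the resulting threshold unit can be sketched as below. The boundary x1 + x2 − 1 = 0, the sample points, and the helper names `augment` and `ltu` are hypothetical; the sketch also assumes the bias input is −1 and that sgn(0) is treated as −1.

```python
def augment(x, bias_input=-1.0):
    """Append the constant bias input to a pattern (assumed convention: x_m = -1)."""
    return list(x) + [bias_input]

def ltu(w, x_aug):
    """Linear threshold unit: sign of the weighted sum (assumption: sgn(0) = -1)."""
    net = sum(w_j * x_j for w_j, x_j in zip(w, x_aug))
    return 1 if net > 0 else -1

# Hypothetical boundary x1 + x2 - 1 = 0, written as w = (1, 1, theta) with theta = 1.
w = [1.0, 1.0, 1.0]
label_a = ltu(w, augment([2.0, 2.0]))  # net = 2 + 2 - 1 = 3
label_b = ltu(w, augment([0.0, 0.0]))  # net = 0 + 0 - 1 = -1

# The origin of the augmented space always lies on the plane w^T x = 0.
origin_net = sum(w_j * 0.0 for w_j in w)
```

Folding the threshold into the weight vector is what lets the boundary, which need not pass through the origin in the (x1, x2) plane, become a plane through the origin in the augmented space.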
Linearly Separable vs. Linearly Non-Separable • AND: linearly separable • OR: linearly separable • XOR: linearly non-separable
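The difference between the three functions can be illustrated with a small brute-force search for a separating line. This is a sketch under assumptions: inputs are coded as 0/1, targets as ±1, the net input is w1·x1 + w2·x2 − θ, and only integer weights in a small grid are tried. Finding no separator in the grid is an illustration, not a proof, although XOR is in fact not linearly separable for any real weights.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = {
    "AND": [-1, -1, -1, +1],
    "OR": [-1, +1, +1, +1],
    "XOR": [-1, +1, +1, -1],
}

def separates(w, targets):
    """True if sign(w1*x1 + w2*x2 - theta) matches the target on all four inputs."""
    w1, w2, theta = w
    for (x1, x2), d in zip(INPUTS, targets):
        net = w1 * x1 + w2 * x2 - theta
        if net * d <= 0:  # wrong side of the line, or exactly on it
            return False
    return True

def find_separator(targets, grid=range(-3, 4)):
    """Return the first (w1, w2, theta) in the integer grid that separates, else None."""
    for w in product(grid, repeat=3):
        if separates(w, targets):
            return w
    return None

results = {name: find_separator(t) for name, t in TARGETS.items()}
```

For AND and OR the search returns a weight triple, while for XOR it returns `None`, matching the classification on the slide.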
Goal • Given training sets T1 ⊂ C1 and T2 ⊂ C2 with elements of the form x = (x1, x2, ..., xm-1, xm)^T, where x1, x2, ..., xm-1 ∈ R and xm = −1. • Assume T1 and T2 are linearly separable. • Find w = (w1, w2, ..., wm)^T such that w^T x > 0 for every x ∈ T1 and w^T x < 0 for every x ∈ T2. • w^T x = 0 is a hyperplane that passes through the origin of the augmented input space.
Observation • [Figure: a training point x with d = +1 in the (x1, x2) plane, drawn with candidate weight vectors w1, ..., w6; the line w1x1 + w2x2 = 0 splits the plane into a side where w^T x > 0 and a side where w^T x < 0.] • Which w’s correctly classify x? What trick can be used? • Is this w ok? If not, how should w be adjusted: Δw = ? • Is Δw = ηx reasonable? • In general, is Δw = +ηx or −ηx?
Perceptron Learning Rule • Upon misclassification of x: Δw = +ηx if d = +1, and Δw = −ηx if d = −1. • Define the error r = d − y: r = +2 or r = −2 on a misclassification, and r = 0 when there is no error.
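Perceptron Learning Rule • Δw = η (d − y) x, where η is the learning rate, (d − y) is the error, and x is the input. On a misclassification d − y = ±2, so the constant factor 2 is absorbed into η.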
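Summary: Perceptron Learning Rule • Based on the general weight learning rule: Δwi = η (di − yi) x. • If x is classified correctly (yi = di): Δwi = 0. • If x is classified incorrectly (yi ≠ di): Δwi = 2η di x.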
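A single application of the rule Δw = η (d − y) x can be sketched as follows. The function names, the learning rate η = 0.5, and the sample vectors are hypothetical; the sketch also assumes sgn(0) = −1.

```python
def sgn(net):
    """Sign function with the assumed convention sgn(0) = -1."""
    return 1 if net > 0 else -1

def perceptron_update(w, x, d, eta=0.5):
    """Return the weights after one step of delta_w = eta * (d - y) * x."""
    y = sgn(sum(w_j * x_j for w_j, x_j in zip(w, x)))
    return [w_j + eta * (d - y) * x_j for w_j, x_j in zip(w, x)]

x = [1.0, 2.0]

# Misclassified pattern: net = 1 - 2 = -1, so y = -1 while d = +1.
w_wrong = [1.0, -1.0]
w_after_error = perceptron_update(w_wrong, x, d=+1)

# Correctly classified pattern: net = 2 + 2 = 4, so y = +1 = d and nothing changes.
w_right = [2.0, 1.0]
w_after_correct = perceptron_update(w_right, x, d=+1)
```

With η = 0.5 the misclassified case moves w by exactly +x (since d − y = 2), while the correct case leaves w untouched, which mirrors the two branches of the summary.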
Perceptron Convergence Theorem • Does the perceptron learning rule converge? • Theorem: if the given training set is linearly separable, the learning process will converge in a finite number of steps. • Exercise: consult papers or textbooks to prove the theorem. • [Figure: block diagram of the perceptron with input x, output y, and desired output d.]
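The theorem can be observed empirically with a small training loop. This sketch is not a proof; it assumes 0/1 inputs augmented with a bias input of −1, targets of ±1, zero initial weights, η = 1, and sgn(0) = −1. The function name `train_perceptron` and the epoch cap are hypothetical choices.

```python
def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Cycle through the samples until an epoch has no mistakes or the cap is hit."""
    w = [0.0] * len(samples[0][0])
    updates = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in samples:
            net = sum(w_j * x_j for w_j, x_j in zip(w, x))
            y = 1 if net > 0 else -1
            if y != d:
                w = [w_j + eta * (d - y) * x_j for w_j, x_j in zip(w, x)]
                mistakes += 1
                updates += 1
        if mistakes == 0:
            return w, updates, True
    return w, updates, False

def classify(w, x):
    return 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) > 0 else -1

# Augmented patterns (x1, x2, -1) for OR (separable) and XOR (not separable).
or_samples = [([0, 0, -1], -1), ([0, 1, -1], 1), ([1, 0, -1], 1), ([1, 1, -1], 1)]
xor_samples = [([0, 0, -1], -1), ([0, 1, -1], 1), ([1, 0, -1], 1), ([1, 1, -1], -1)]

w_or, updates_or, converged_or = train_perceptron(or_samples)
w_xor, updates_xor, converged_xor = train_perceptron(xor_samples)
```

On the separable OR data the loop stops after a finite number of updates, while on XOR it keeps making mistakes until the epoch cap, which is consistent with the linear-separability condition in the theorem.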
The Learning Scenario • [Figure sequence: four linearly separable training points x(1), x(2), x(3), x(4) in the (x1, x2) plane, with x(1) and x(2) marked +. Starting from w0, the weight vector is updated step by step to w1, w2, and w3; the next presentation leaves it unchanged, so w4 = w3.]
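A learning scenario of this kind can be traced in code by recording the weight vector after each pattern is presented. The coordinates, the initial weights, and η = 0.5 below are made up for illustration and do not reproduce the figure; the sketch assumes sgn(0) = −1 and no bias input, so the boundary passes through the origin of the (x1, x2) plane.

```python
def trace_learning(samples, w0, eta=0.5):
    """Present each sample once, in order, and record the weights after every step."""
    trace = [list(w0)]
    w = list(w0)
    for x, d in samples:
        net = sum(w_j * x_j for w_j, x_j in zip(w, x))
        y = 1 if net > 0 else -1
        # The update is zero whenever y == d, so correct patterns leave w unchanged.
        w = [w_j + eta * (d - y) * x_j for w_j, x_j in zip(w, x)]
        trace.append(list(w))
    return trace

# Hypothetical points: x(1), x(2) belong to the + class, x(3), x(4) to the - class.
samples = [
    ([1.0, 2.0], 1),
    ([2.0, 1.0], 1),
    ([-1.0, -1.5], -1),
    ([-2.0, -0.5], -1),
]
trace = trace_learning(samples, w0=[-1.0, 0.0])
```

Each entry of `trace` plays the role of one of the vectors w0, w1, ... in the figure; consecutive equal entries correspond to presentations where the pattern was already classified correctly.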