Feed-Forward Neural Networks Lecturer: 虞台文
Content • Introduction • Single-Layer Perceptron Networks • Learning Rules for Single-Layer Perceptron Networks • Perceptron Learning Rule • Adaline Learning Rule • δ-Learning Rule • Multilayer Perceptron • Back-Propagation Learning Algorithm
Feed-Forward Neural Networks Introduction
Historical Background • 1943 McCulloch and Pitts proposed the first computational model of the neuron. • 1949 Hebb proposed the first learning rule. • 1958 Rosenblatt published his work on perceptrons. • 1969 Minsky and Papert exposed the limitations of the theory. • 1970s Decade of dormancy for neural networks. • 1980-90s Neural networks return (self-organization, back-propagation algorithms, etc.)
Nervous Systems • The human brain contains ~10^11 neurons. • Each neuron is connected to ~10^4 others. • Some scientists have compared the brain to a "complex, nonlinear, parallel computer". • The largest modern neural networks achieve a complexity comparable to the nervous system of a fly.
Neurons • The main purpose of neurons is to receive, analyze, and transmit information in the form of signals (electric pulses). • When a neuron sends information, we say that it "fires".
Neurons Acting through specialized projections known as dendrites and axons, neurons carry information throughout the neural network. The accompanying animation demonstrates the firing of a synapse from the pre-synaptic terminal of one neuron to the soma (cell body) of another.
A Model of an Artificial Neuron [Figure: inputs x1, x2, ..., xm with weights wi1, wi2, ..., wim; the last input xm = 1 carries the bias, with wim = θi acting as the threshold weight; a net-input function f(·) followed by an activation function a(·) produces the output yi.]
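A minimal sketch of this neuron model in Python (not from the slides): the weighted sum plays the role of the net-input function f(·), and a generic activation a(·) is applied to it; the function and variable names are illustrative.

```python
import numpy as np

def neuron_output(x, w, activation=np.tanh):
    """One artificial neuron: an activation applied to the weighted sum of the inputs.

    x : input vector whose last component is fixed to 1 (the bias input x_m).
    w : weight vector of the same length; its last entry acts as the bias/threshold weight.
    """
    net = np.dot(w, x)       # net input: sum_j w_ij * x_j
    return activation(net)   # output y_i = a(net)

# Example with two real inputs plus the constant bias input
x = np.array([0.5, -1.2, 1.0])
w = np.array([0.8, 0.3, -0.5])
print(neuron_output(x, w))
```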
Feed-Forward Neural Networks • Graph representation: nodes are neurons, arrows are signal-flow directions. • A neural network that does not contain cycles (feedback loops) is called a feed-forward network (or perceptron). [Figure: inputs x1, x2, ..., xm at the bottom, outputs y1, y2, ..., yn at the top.]
Layered Structure [Figure: input layer x1, x2, ..., xm; hidden layer(s); output layer y1, y2, ..., yn.]
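To make the layered structure concrete, here is a small sketch of one forward pass through a single hidden layer; the layer sizes, the tanh activation, and all names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def feed_forward(x, W_hidden, W_output, activation=np.tanh):
    """Propagate an input pattern through one hidden layer to the output layer.

    Signals flow strictly forward: input -> hidden -> output (no cycles).
    """
    h = activation(W_hidden @ x)   # hidden-layer activations
    y = activation(W_output @ h)   # output-layer activations
    return y

# Illustrative sizes: m = 3 inputs, 4 hidden units, n = 2 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W_hidden = rng.normal(size=(4, 3))
W_output = rng.normal(size=(2, 4))
print(feed_forward(x, W_hidden, W_output))
```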
Knowledge and Memory • The output behavior of a network is determined by its weights. • Weights are the memory of an NN. • Knowledge is distributed across the network. • A large number of nodes increases the storage "capacity", ensures that the knowledge is robust, and provides fault tolerance. • New information is stored by changing the weights.
Pattern Classification • Function: input pattern x → output pattern y. • The NN's output is used to distinguish between and recognize different input patterns. • Different output patterns correspond to particular classes of input patterns. • Networks with hidden layers can be used to solve more complex problems than just linear pattern classification.
Training Training set: pairs of input patterns and desired outputs, T = {(x(i), d(i)) : i = 1, ..., p}, with x(i) = (xi1, xi2, ..., xim)^T and d(i) = (di1, di2, ..., din)^T. Goal: after training, the network outputs satisfy y(i) ≈ d(i) for every training pair.
Generalization • A properly trained neural network may produce reasonable answers for input patterns not seen during training (generalization). • Generalization is particularly useful for the analysis of noisy data (e.g., time series).
Generalization [Figure: the network's response to the same input pattern with noise and without noise, illustrating generalization from noisy data.]
Applications • Pattern classification • Object recognition • Function approximation • Data compression • Time series analysis and forecasting • . . .
Feed-Forward Neural Networks Single-Layer Perceptron Networks
The Single-Layer Perceptron [Figure: inputs x1, x2, ..., xm-1 and bias input xm = 1, fully connected through weights wij (from input j to output unit i) to outputs y1, y2, ..., yn.]
Training a Single-Layer Perceptron [Figure: the same network, with desired outputs d1, d2, ..., dn shown alongside the actual outputs y1, y2, ..., yn.] Training set: T = {(x(k), d(k)) : k = 1, ..., p}. Goal: find weights wij such that yi(k) = di(k) for every output unit i and every training pattern k.
Learning Rules [Figure: the same single-layer network with desired outputs d1, d2, ..., dn.] • Linear Threshold Units (LTUs): Perceptron Learning Rule • Linearly Graded Units (LGUs): Widrow-Hoff Learning Rule
Feed-Forward Neural Networks Learning Rules for Single-Layer Perceptron Networks Perceptron Learning Rule Adaline Learning Rule δ-Learning Rule
Perceptron (Linear Threshold Unit) [Figure: inputs x1, x2, ..., xm (xm = 1, bias) with weights wi1, wi2, ..., wim feed the unit; the activation is the sign function, mapping the weighted sum to +1 or -1.] The unit computes yi = sgn(wi^T x) = sgn(Σj wij xj). Goal: yi(k) = di(k) for every training pattern k.
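A minimal sketch of such a linear threshold unit, assuming the convention sgn(0) = +1 and an input vector already augmented with the constant bias component:

```python
import numpy as np

def ltu(x, w):
    """Linear Threshold Unit: y = sgn(w^T x), with outputs in {+1, -1}."""
    return 1 if np.dot(w, x) >= 0 else -1

# Example: augmented input (x1, x2, 1) and an illustrative weight vector
print(ltu(np.array([1.0, 2.0, 1.0]), np.array([0.5, -0.3, 0.1])))
```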
Example [Figure: training points for Class 1 (label +1) and Class 2 (label -1) in the (x1, x2) plane, separated by the decision line g(x) = 2x1 + x2 + 2 = 0; the coefficients of g are the weights of the unit acting on the augmented input (x1, x2, x3 = 1).] Goal: yi(k) = di(k) for every training pattern.
Augmented Input Vector Append a constant component x3 = 1 to each input, so x = (x1, x2, x3)^T, with class labels d = +1 (Class 1) and d = -1 (Class 2). [Figure: the training points of both classes plotted in the augmented three-dimensional input space (x1, x2, x3).] In the augmented input space, the decision boundary w^T x = 0 is a plane passing through the origin.
Linearly Separable vs. Linearly Non-Separable [Figure: the Boolean functions AND, OR, and XOR plotted on the unit square, with outputs 0/1 at the corners.] AND: linearly separable. OR: linearly separable. XOR: linearly non-separable.
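As an illustration of the separability claims above, a single LTU can realize AND and OR on the unit square; the weight vectors below are one hand-picked choice (not taken from the slides), while no choice of weights lets a single LTU compute XOR.

```python
import numpy as np

def ltu(x, w):
    # y = sgn(w^T x) with outputs in {+1, -1}; x is augmented with a bias component of 1
    return 1 if np.dot(w, x) >= 0 else -1

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]

# One illustrative choice of weights (last entry is the bias weight)
w_and = np.array([1.0, 1.0, -1.5])   # fires only when both inputs are 1
w_or  = np.array([1.0, 1.0, -0.5])   # fires when at least one input is 1

for x1, x2 in patterns:
    x = np.array([x1, x2, 1.0])
    print((x1, x2), "AND:", ltu(x, w_and), "OR:", ltu(x, w_or))

# XOR is not realizable: no line separates {(0,1), (1,0)} from {(0,0), (1,1)}.
```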
Goal • Given training sets T1 ⊂ C1 and T2 ⊂ C2 with elements of the form x = (x1, x2, ..., xm-1, xm)^T, where x1, x2, ..., xm-1 ∈ R and xm = 1. • Assume T1 and T2 are linearly separable. • Find w = (w1, w2, ..., wm)^T such that sgn(w^T x) = +1 for x ∈ T1 and sgn(w^T x) = -1 for x ∈ T2. Note that w^T x = 0 is a hyperplane passing through the origin of the augmented input space.
Observation [Figure sequence: a sample x in the (x1, x2) plane together with several candidate weight vectors w1, ..., w6 and the decision line w1x1 + w2x2 = 0.] • Which w's correctly classify x? A sample with desired output d = +1 is classified correctly when w^T x > 0; a sample with d = -1 is classified correctly when w^T x < 0. • If a given w misclassifies x, how should w be adjusted? • A reasonable adjustment: move w toward x when d = +1 but w^T x < 0 (Δw = +ηx), and move w away from x when d = -1 but w^T x > 0 (Δw = -ηx). • In short, Δw = +ηx or Δw = -ηx, depending on the desired output.
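A small worked example of this adjustment (the numbers are chosen only for illustration): let w = (1, -1), x = (1, 2) with d = +1, and η = 0.5. Then w^T x = 1 - 2 = -1 < 0, so x is misclassified. Applying Δw = +ηx = (0.5, 1) gives the new weight vector w = (1.5, 0), for which w^T x = 1.5 > 0, so x now lies on the correct side of the decision line.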
Perceptron Learning Rule Upon misclassification of a training sample x: • if d = +1 and y = -1, increase w in the direction of x; • if d = -1 and y = +1, decrease w in the direction of x; • if y = d, there is no error and no change is made. Define the error r = d - y, which takes the values +2, -2, or 0 (no error).
Perceptron Learning Rule Δw = η (d - y) x, where η is the learning rate, (d - y) is the error, and x is the input.
Summary: Perceptron Learning Rule Based on the general weight-learning rule: w(new) = w(old) + η (d - y) x. The correction is zero when the sample is classified correctly (y = d) and nonzero when it is classified incorrectly.
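The rule above can be written as a short training loop. The sketch below is illustrative (the names, learning rate, and AND data set are assumptions, not from the slides); because the data are linearly separable, the loop stops after a finite number of corrections, in line with the convergence theorem stated later.

```python
import numpy as np

def sgn(v):
    return 1 if v >= 0 else -1

def train_perceptron(X, d, eta=0.5, max_epochs=100):
    """Perceptron learning rule: w <- w + eta * (d - y) * x.

    X : augmented input patterns, one per row (last column is the bias input).
    d : desired outputs in {+1, -1}.
    Repeats until an entire epoch produces no corrections (or max_epochs is reached).
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, d):
            y = sgn(np.dot(w, x))
            if y != target:
                w += eta * (target - y) * x   # nonzero update only on misclassification
                errors += 1
        if errors == 0:                       # every pattern classified correctly
            break
    return w

# Linearly separable example: the AND function with bipolar targets
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
d = np.array([-1, -1, -1, 1])
print(train_perceptron(X, d))
```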
Summary: Perceptron Learning Rule [Figure: the network receives input x, produces output y, compares it with the desired output d, and feeds the error back to update the weights.] Does the learning process converge?
Perceptron Convergence Theorem If the given training set is linearly separable, the learning process converges in a finite number of steps. • Exercise: consult papers or textbooks for a proof of the theorem.
The Learning Scenario [Figure: a linearly separable training set of four points in the (x1, x2) plane, x(1) and x(2) in one class (+) and x(3) and x(4) in the other.]
The Learning Scenario [Figure sequence: starting from an initial weight vector w0, each presentation of a misclassified point produces a correction, yielding w1, w2, and w3 in turn; finally w4 = w3, so no further corrections are needed and all four points are classified correctly.]