CS 621 Artificial Intelligence Lecture 25 – 14/10/05 Prof. Pushpak Bhattacharyya Training The Feedforward Network; Backpropagation Algorithm
Multilayer Feedforward Network - Needed for solving problems which are not linearly separable. - Hidden layer neurons: assist computation.
[Figure: layered network with an input layer, a hidden layer, and an output layer; forward connections only, no feedback connections.]
Gradient Descent Rule
Weight Wji connects feeding neuron i to fed neuron j.
ΔWji ∝ −δE/δWji
E = error = ½ Σp=1..P Σm=1..M (tm − om)²
This is the TOTAL SUM SQUARE ERROR (TSS), summed over P training patterns and M output neurons.
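A minimal Python sketch of the TSS computation (array names and shapes are illustrative, not from the lecture):

import numpy as np

def total_sum_square_error(targets, outputs):
    """TSS: E = 1/2 * sum over patterns p and output units m of (t_m - o_m)^2."""
    targets = np.asarray(targets, dtype=float)   # shape (P, M): P patterns, M output neurons
    outputs = np.asarray(outputs, dtype=float)   # shape (P, M)
    return 0.5 * np.sum((targets - outputs) ** 2)

# Example: 2 patterns, 2 output neurons
print(total_sum_square_error([[1, 0], [0, 1]], [[0.8, 0.2], [0.3, 0.6]]))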
Gradient Descent for a Single Neuron
[Figure: a single neuron with output y, inputs X0, X1, …, Xn and weights W0, W1, …, Wn; X0 = −1 is the bias input, whose weight W0 plays the role of the threshold.]
Net input: net = Σi=0..n Wi·Xi
y = f(net)
Characteristic function f = sigmoid: f(net) = 1 / (1 + e^(−net))
df/dnet = f(1 − f)
[Figure: sigmoid curve of y vs. net.]
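A minimal sketch of the characteristic function and its derivative (function names are illustrative):

import numpy as np

def sigmoid(net):
    """Characteristic function: f(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative(net):
    """df/dnet = f(net) * (1 - f(net)) -- the property used throughout the derivation."""
    f = sigmoid(net)
    return f * (1.0 - f)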
[Figure: a single neuron with inputs X0, …, Xn and weights W0, …, Wn; its output is the observed value o, compared against the target t.]
ΔWi ∝ −δE/δWi
E = ½(t − o)²
W = <Wn, …, W0> is randomly initialized.
ΔWi ∝ −δE/δWi, i.e. ΔWi = −η·δE/δWi, where η is the learning rate, 0 <= η <= 1.
ΔWi = −η·δE/δWi
δE/δWi = δ(½(t − o)²)/δWi
       = (δE/δo)·(δo/δWi)                ; chain rule
       = −(t − o)·(δo/δnet)·(δnet/δWi)
δo/δnet = δf(net)/δnet = f′(net) = f(1 − f) = o(1 − o)
[Figure: sigmoid curve of o vs. net.]
net = Σi=0..n Wi·Xi, so δnet/δWi = Xi
[Figure: single neuron with weights Wn, …, Wi, …, W0 and inputs Xn, …, Xi, …, X0.]
Putting the pieces together, with −δE/δo = (t − o), δf/δnet = o(1 − o), and δnet/δWi = Xi:
E = ½(t − o)²
ΔWi = η(t − o)·o(1 − o)·Xi
E = ½(t − o)²
ΔWi = η(t − o)·o(1 − o)·Xi
Observation: if Xi = 0, then ΔWi = 0; the larger Xi is, the larger ΔWi is. This is BLAME/CREDIT ASSIGNMENT.
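A minimal sketch of this per-weight update for a single sigmoid neuron (function and variable names are illustrative):

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def delta_rule_step(w, x, t, eta=0.1):
    """One gradient-descent step for a single sigmoid neuron.

    w : weight vector (W0..Wn); x : input vector (X0..Xn, with X0 the bias input)
    t : target; eta : learning rate.
    Implements  dW_i = eta * (t - o) * o * (1 - o) * X_i.
    """
    o = sigmoid(np.dot(w, x))                      # observed output
    delta_w = eta * (t - o) * o * (1 - o) * x      # zero wherever X_i = 0
    return w + delta_w, o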
The larger the difference (t − o), the larger Δw.
If (t − o) is positive, so is Δw; if (t − o) is negative, so is Δw.
If o is 0 or 1, Δw = 0. o reaches 0 or 1 when net = −∞ or +∞, so Δw → 0 because o → 0 or 1. This is called "saturation" or "paralysis" of the network, and it happens because of the sigmoid.
[Figure: sigmoid curve of o vs. net, flattening towards 0 and 1.]
Solutions to network saturation:
1. y = k / (1 + e^(−x)), a sigmoid that saturates at k instead of 1
2. y = tanh(x)
[Figure: the two characteristic functions, with levels k and −k marked on the y-axis.]
Solutions to network saturation (contd.)
3. Scale the inputs: reducing the input values, however, runs into the problem of floating/fixed-point number representation error. (See the sketch below.)
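A short sketch of these options in Python (the value of k and the min-max scaling are illustrative assumptions, not prescriptions from the lecture):

import numpy as np

def scaled_sigmoid(x, k=2.0):
    """Option 1: y = k / (1 + e^(-x)), saturating at 0 and k instead of 0 and 1."""
    return k / (1.0 + np.exp(-x))

def tanh_activation(x):
    """Option 2: y = tanh(x), saturating at -1 and +1."""
    return np.tanh(x)

def scale_inputs(x):
    """Option 3: min-max scale inputs to [0, 1]; very small ranges risk representation error."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())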
ΔWi = η(t − o)·o(1 − o)·Xi
The smaller η is, the smaller ΔW is.
[Figure: error E vs. weight Wi, showing the operating point and the global minimum.]
Start with a large η and gradually decrease it.
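One simple way to realize "start large, then decrease" is a decaying schedule; the functional form below is only an illustrative assumption, not the lecture's prescription:

def decayed_learning_rate(eta0, iteration, decay=0.01):
    """Start with a large eta0 and shrink it gradually: eta_n = eta0 / (1 + decay * n)."""
    return eta0 / (1.0 + decay * iteration)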
Gradient descent training is typically slow. Two parameters govern it:
First parameter: η, the learning rate.
Second parameter: β, the momentum factor, 0 <= β <= 1.
Momentum Factor
Use a part of the previous weight change in the current weight change:
(ΔWi)n = η(t − o)·o(1 − o)·Xi + β(ΔWi)n-1, where n is the iteration index.
Effect of β
If (ΔWi)n and (ΔWi)n-1 are of the same sign, then (ΔWi)n is enhanced.
If (ΔWi)n and (ΔWi)n-1 are of opposite signs, then the effective (ΔWi)n is reduced.
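A minimal sketch of the momentum-augmented update (names are illustrative, and the values of eta and beta are placeholders):

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def momentum_step(w, x, t, prev_delta_w, eta=0.1, beta=0.01):
    """One update with momentum:
    (dW_i)_n = eta*(t - o)*o*(1 - o)*X_i + beta*(dW_i)_{n-1}
    """
    o = sigmoid(np.dot(w, x))
    delta_w = eta * (t - o) * o * (1 - o) * x + beta * prev_delta_w
    return w + delta_w, delta_w    # keep delta_w for the next iteration's momentum term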
[Figure: error E vs. weight W, with points A, P, Q, R, S and the operating point marked.]
Momentum: 1) accelerates movement at A; 2) dampens oscillation near the global minimum.
(ΔWi)n = η(t − o)·o(1 − o)·Xi + β(ΔWi)n-1
The first term is pure gradient descent; the second is the momentum term.
What is the relation between η and β?
Relation between η and β
What if η >> β? What if η << β?
(ΔWi)n = η(t − o)·o(1 − o)·Xi + β(ΔWi)n-1
Relation between η and β (contd.)
If η << β, then (ΔWi)n ≈ β(ΔWi)n-1, a recurrence relation:
(ΔWi)n = β(ΔWi)n-1 = β[β(ΔWi)n-2] = β²(ΔWi)n-2 = … = βⁿ(ΔWi)0
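A small numeric check of this recurrence (β and the initial change are illustrative values):

beta = 0.5
delta_w = 0.2                      # (dW_i)_0, an illustrative initial change
for n in range(1, 11):
    delta_w = beta * delta_w       # (dW_i)_n = beta * (dW_i)_{n-1} = beta**n * (dW_i)_0
print(delta_w)                     # 0.5**10 * 0.2 ≈ 0.000195 -- the update dies out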
Relation between η and β (contd.)
Empirical practice: β is typically 1/10th of η.
If β is very large compared to η, no effect of the output error, the input, or the neuron characteristics is felt. Also, (ΔW) keeps decreasing, since β is a fraction.