CS344: Introduction to Artificial Intelligence (associated lab: CS386)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 34: Backpropagation; need for multiple layers and non-linearity
5th April, 2011
Backpropagation algorithm
[Figure: layered network — input layer (n i/p neurons), hidden layers, output layer (m o/p neurons); weight wji connects neuron i to neuron j in the next layer]
• Fully connected feed-forward network
• Pure FF network (no jumping of connections over layers)
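A minimal sketch of the forward pass through such a pure feed-forward network (the layer sizes, sigmoid activation, and variable names here are illustrative assumptions, not values from the slides):

```python
import numpy as np

def sigmoid(net):
    # Logistic activation used in these slides: o = 1 / (1 + e^(-net))
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, weights):
    """Forward pass through a pure feed-forward network.

    weights[l] has shape (units in layer l, units in previous layer);
    connections never jump over a layer, matching the 'pure FF' assumption.
    """
    activations = [x]
    for W in weights:
        net = W @ activations[-1]          # weighted sum into layer l
        activations.append(sigmoid(net))   # o_j = sigmoid(net_j)
    return activations                     # input, hidden(s), output

# Illustrative network: n = 2 inputs, one hidden layer of 3, m = 1 output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
print(forward(np.array([0.0, 1.0]), weights)[-1])
```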
Backpropagation for hidden layers
[Figure: layered network — input layer (n i/p neurons), hidden layers containing neurons i and j, output layer (m o/p neurons) containing neuron k]
δk is propagated backwards to find the value of δj
General Backpropagation Rule
• General weight updating rule: ∆wji = η δj oi
• Where
δj = (tj − oj) oj (1 − oj) for the outermost layer
δj = oj (1 − oj) Σk (δk wkj) for hidden layers
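A sketch of one update under this rule for a single-hidden-layer network (the shapes, learning rate, and the absence of bias units are assumptions made for illustration):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(x, t, W_hidden, W_out, eta=0.5):
    """One application of the general rule: delta_w_ji = eta * delta_j * o_i."""
    # Forward pass
    o_hid = sigmoid(W_hidden @ x)
    o_out = sigmoid(W_out @ o_hid)

    # Outermost layer: delta_j = (t_j - o_j) * o_j * (1 - o_j)
    delta_out = (t - o_out) * o_out * (1 - o_out)

    # Hidden layer: delta_j = o_j * (1 - o_j) * sum_k delta_k * w_kj
    delta_hid = o_hid * (1 - o_hid) * (W_out.T @ delta_out)

    # Weight change: delta_w_ji = eta * delta_j * o_i
    W_out    += eta * np.outer(delta_out, o_hid)
    W_hidden += eta * np.outer(delta_hid, x)
    return o_out

# Illustrative call with assumed shapes: 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(1)
W_h, W_o = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
print(backprop_step(np.array([0.0, 1.0]), np.array([1.0]), W_h, W_o))
```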
Observations on weight change rules
Does the training technique support our intuition?
• The larger the input xi, the larger is ∆wi
• The error burden is borne by the weights corresponding to large input values
Observations contd.
• ∆wi is proportional to the departure from the target
• Saturation behaviour when o is 0 or 1
• If o < t, ∆wi > 0 and if o > t, ∆wi < 0, which is consistent with Hebb's law
Hebb's law
[Figure: neuron ni connected to neuron nj by weight wji]
• If nj and ni are both in the excitatory state (+1), then the change in weight must be such that it enhances the excitation
• The change is proportional to both levels of excitation: ∆wji ∝ e(nj) e(ni)
• If ni and nj are in a mutual state of inhibition (one is +1 and the other is -1), then the change in weight is such that the inhibition is enhanced (the change in weight is negative)
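A tiny sketch of this proportionality (the constant alpha and the function name are assumptions for illustration):

```python
def hebbian_delta_w(e_nj, e_ni, alpha=0.1):
    """Hebbian change: delta_w_ji proportional to e(n_j) * e(n_i).

    Both neurons excitatory (+1, +1): positive change, excitation enhanced.
    One +1 and the other -1: negative change, inhibition enhanced.
    """
    return alpha * e_nj * e_ni

print(hebbian_delta_w(+1, +1))  #  0.1 -> excitation enhanced
print(hebbian_delta_w(+1, -1))  # -0.1 -> inhibition enhanced
```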
Saturation behaviour
• The algorithm is iterative and incremental
• If the weight values or the number of inputs is very large, the net input becomes large and the output lies in the saturation region
• The weight values hardly change in the saturation region
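A short sketch of why the weights hardly change in saturation: the factor o(1 − o) that appears in δ goes to zero as the sigmoid output approaches 0 or 1 (the specific net values below are arbitrary illustrations):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

for net in [0.0, 2.0, 10.0, 50.0]:
    o = sigmoid(net)
    # The o*(1-o) factor in delta shrinks towards 0 as o saturates at 1,
    # so delta_w = eta * delta * o_i becomes negligible.
    print(f"net={net:5.1f}  o={o:.6f}  o*(1-o)={o*(1-o):.2e}")
```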
How does it work?
• Input propagation forward and error propagation backward (e.g. XOR)
[Figure: two-layer threshold network for XOR — output neuron with threshold 0.5 and weights w1 = 1, w2 = 1 from hidden neurons computing x1x̄2 and x̄1x2; hidden-layer weights 1.5, -1 and -1, 1.5 from inputs x1, x2]
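A sketch of the XOR network in the figure. The weights 1.5/-1, -1/1.5 and the output threshold 0.5 are taken from the figure; the hidden thresholds (1.0 here) are an assumed value that makes each hidden unit compute x1·x̄2 and x̄1·x2 respectively:

```python
def step(net):
    # Threshold (perceptron) activation: fires (1) when net >= 0, else 0
    return 1 if net >= 0 else 0

def xor_network(x1, x2):
    h1 = step(1.5 * x1 - 1.0 * x2 - 1.0)    # x1 AND NOT x2
    h2 = step(-1.0 * x1 + 1.5 * x2 - 1.0)   # NOT x1 AND x2
    return step(1.0 * h1 + 1.0 * h2 - 0.5)  # OR of the two, threshold 0.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))   # reproduces the XOR truth table
```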
If Sigmoid Neurons Are Used, Do We Need an MLP?
Does the sigmoid have the power of separating non-linearly separable data? Can a sigmoid neuron solve the X-OR problem?
[Figure: sigmoid curve O versus net, with thresholds yu and yl marked on the O axis between 0 and 1]
O = 1/(1 + e^(-net))
Interpret O = 1 if O > yu and O = 0 if O < yl
Typically yl << 0.5, yu >> 0.5
Inequalities
[Figure: single sigmoid neuron with inputs x0 = -1, x1, x2 and weights w0, w1, w2]
O = 1/(1 + e^(-net)), where net = w1x1 + w2x2 - w0
⟨0, 0⟩: O = 0, i.e. O < yl
1/(1 + e^(-w1x1 - w2x2 + w0)) < yl, i.e. 1/(1 + e^(w0)) < yl … (1)
⟨0, 1⟩: O = 1, i.e. O > yu
1/(1 + e^(-w1x1 - w2x2 + w0)) > yu, i.e. 1/(1 + e^(-w2 + w0)) > yu … (2)
⟨1, 0⟩: O = 1, i.e. 1/(1 + e^(-w1 + w0)) > yu … (3)
⟨1, 1⟩: O = 0, i.e. 1/(1 + e^(-w1 - w2 + w0)) < yl … (4)
Rearranging, (1) gives:
1/(1 + e^(w0)) < yl
i.e. 1 + e^(w0) > 1/yl
i.e. w0 > ln((1 - yl)/yl) … (5)
(2) gives:
1/(1 + e^(-w2 + w0)) > yu
i.e. 1 + e^(-w2 + w0) < 1/yu
i.e. e^(-w2 + w0) < (1 - yu)/yu
i.e. -w2 + w0 < ln((1 - yu)/yu)
i.e. w2 - w0 > ln(yu/(1 - yu)) … (6)
(3) gives:
w1 - w0 > ln(yu/(1 - yu)) … (7)
(4) gives:
-w1 - w2 + w0 > ln((1 - yl)/yl) … (8)
Adding (5), (6), (7) and (8) gives:
0 > 2 ln((1 - yl)/yl) + 2 ln(yu/(1 - yu))
i.e. 0 > ln[((1 - yl)/yl) · (yu/(1 - yu))]
i.e. ((1 - yl)/yl) · (yu/(1 - yu)) < 1
i.e. [(1 - yl)/(1 - yu)] · [yu/yl] < 1 … (i)
But yu >> 0.5 … (ii) and yl << 0.5 … (iii), so both factors in (i) exceed 1.
From (i), (ii) and (iii): contradiction, hence a single sigmoid neuron cannot compute X-OR.
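A brute-force sanity check of the contradiction (the thresholds yl = 0.1, yu = 0.9 and the weight grid are assumed illustrative values, not from the slides): no choice of (w0, w1, w2) in the grid satisfies inequalities (1)-(4) simultaneously.

```python
import itertools
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

y_l, y_u = 0.1, 0.9                 # assumed: yl << 0.5 << yu
grid = np.linspace(-20, 20, 41)     # coarse grid over w0, w1, w2

found = False
for w0, w1, w2 in itertools.product(grid, repeat=3):
    o = lambda x1, x2: sigmoid(w1 * x1 + w2 * x2 - w0)
    # Inequalities (1)-(4): XOR with a single sigmoid neuron
    if (o(0, 0) < y_l and o(0, 1) > y_u and
            o(1, 0) > y_u and o(1, 1) < y_l):
        found = True
        break

print("satisfying weights found:", found)   # prints False
```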
Can Linear Neurons Work?
[Figure: two-layer network of linear neurons — inputs x1, x2 feeding hidden neurons h1, h2, which feed the output neuron]
Note: the whole structure shown in the earlier slide is reducible to a single neuron with the given (linear) behaviour.
Claim: a neuron with linear I-O behaviour, o = m·net + c, cannot compute X-OR.
Proof: consider all possible cases [assuming 0.1 and 0.9 as the lower and upper thresholds]:
For (0,0), Zero class: m·(w1·0 + w2·0) + c < 0.1, i.e. c < 0.1
For (0,1), One class: m·w2 + c > 0.9
For (1,0), One class: m·w1 + c > 0.9
For (1,1), Zero class: m·(w1 + w2) + c < 0.1
Adding the two One-class inequalities gives m·(w1 + w2) + 2c > 1.8, while adding the two Zero-class inequalities gives m·(w1 + w2) + 2c < 0.2. These equations are inconsistent; hence X-OR cannot be computed.
Observations:
• A linear neuron cannot compute X-OR.
• A multilayer FFN with linear neurons is collapsible to a single linear neuron, hence there is no additional power due to the hidden layer (see the sketch below).
• Non-linearity is essential for power.
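A minimal numpy sketch of the collapsibility observation (the layer sizes and random weights are illustrative assumptions): composing two linear layers is exactly the single linear map with weight matrix W2·W1, so the hidden layer adds no representational power.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 2))   # input -> hidden (linear, no activation)
W2 = rng.normal(size=(1, 4))   # hidden -> output (linear)

x = rng.normal(size=2)

two_layer = W2 @ (W1 @ x)      # linear hidden layer, then linear output
single    = (W2 @ W1) @ x      # equivalent single linear neuron

print(np.allclose(two_layer, single))   # True: the network collapses
```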