1 / 36

CS623: Introduction to Computing with Neural Nets (lecture-2)

Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay. CS623: Introduction to Computing with Neural Nets (lecture-2). Recapituation. The human brain. Insula: Intuition & empathy. Pre-frontal cortex: Self control. Amygdala: Aggresion. Hippocampus:

ghazi
Download Presentation

CS623: Introduction to Computing with Neural Nets (lecture-2)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay CS623: Introduction to Computing with Neural Nets(lecture-2)

  2. Recapituation

  3. The human brain Insula: Intuition & empathy Pre-frontal cortex: Self control Amygdala: Aggresion Hippocampus: Emotional memory Anterior Cingulate Cortex: Anxiety Hypothalamus: Control of Hormones Pituitary gland: Mother instincts

  4. Functional Map of the Brain

  5. The Perceptron Model A perceptron is a computing element with input lines having associated weights and the cell having a threshold value. The perceptron model is motivated by the biological neuron. Output = y Threshold = θ w1 wn Wn-1 x1 Xn-1

  6. y 1 θ Σwixi Step function / Threshold function y = 1 for Σwixi >=θ =0 otherwise

  7. θ, ≤ θ, < w1 w2 wn w1 w2 wn . . . w3 . . . x1 x2 x3 xn x1 x2 x3 xn Perceptron Training Algorithm (PTA) Preprocessing: • The computation law is modified to y = 1 if ∑wixi > θ y = o if ∑wixi < θ  w3

  8. θ θ w1 w2 w3 wn wn w0=θ w1 w2 w3 . . . . . . x2 x3 xn x0= -1 x1 x2 x3 xn PTA – preprocessing cont… 2. Absorb θ as a weight  3. Negate all the zero-class examples x1

  9. Example to demonstrate preprocessing • OR perceptron 1-class <1,1> , <1,0> , <0,1> 0-class <0,0> Augmented x vectors:- 1-class <-1,1,1> , <-1,1,0> , <-1,0,1> 0-class <-1,0,0> Negate 0-class:- <1,0,0>

  10. Example to demonstrate preprocessing cont.. Now the vectors are x0 x1 x2 X1 -1 0 1 X2 -1 1 0 X3 -1 1 1 X4 1 0 0

  11. Perceptron Training Algorithm • Start with a random value of w ex: <0,0,0…> • Test for wxi > 0 If the test succeeds for i=1,2,…n then return w 3. Modify w, wnext = wprev + xfail

  12. Tracing PTA on OR-example w=<0,0,0> wx1 fails w=<-1,0,1> wx4 fails w=<0,0 ,1> wx2 fails w=<-1,1,1> wx1 fails w=<0,1,2> wx4 fails w=<1,1,2> wx2 fails w=<0,2,2> wx4 fails w=<1,2,2> success

  13. Theorems on PTA • The process will terminate • The order of selection of xi for testing and wnext does not matter.

  14. End of recap

  15. Proof of Convergence of PTA • Perceptron Training Algorithm (PTA) • Statement: Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

  16. Proof of Convergence of PTA • Suppose wn is the weight vector at the nth step of the algorithm. • At the beginning, the weight vector is w0 • Go from wi to wi+1 when a vector Xj fails the test wiXj > 0 and update wi as wi+1 = wi + Xj • Since Xjs form a linearly separable function,  w* s.t. w*Xj > 0 j

  17. Proof of Convergence of PTA • Consider the expression G(wn) = wn . w* | wn| where wn = weight at nth iteration • G(wn) = |wn| . |w*| . cos  |wn| where  = angle between wn and w* • G(wn) = |w*| . cos  • G(wn) ≤ |w*| ( as -1 ≤ cos ≤ 1)

  18. Behavior of Numerator of G wn . w* = (wn-1 + Xn-1fail ) . w* • wn-1 . w* + Xn-1fail . w* • (wn-2 + Xn-2fail ) . w* + Xn-1fail . w* ….. • w0 . w*+ ( X0fail + X1fail +.... + Xn-1fail ). w* w*.Xifail is always positive: note carefully • Suppose |Xj| ≥  , where  is the minimum magnitude. • Num of G ≥ |w0 . w*| + n  . |w*| • So, numerator of G grows with n.

  19. Behavior of Denominator of G • |wn| =  wn . wn •  (wn-1 + Xn-1fail )2 •  (wn-1)2+ 2. wn-1. Xn-1fail + (Xn-1fail )2 •  (wn-1)2+ (Xn-1fail )2 (as wn-1. Xn-1fail ≤ 0 ) •  (w0)2+ (X0fail )2 + (X1fail )2 +…. + (Xn-1fail )2 • |Xj| ≤  (max magnitude) • So, Denom ≤ (w0)2+ n2

  20. Some Observations • Numerator of G grows as n • Denominator of G grows as  n => Numerator grows faster than denominator • If PTA does not terminate, G(wn) values will become unbounded.

  21. Some Observations contd. • But, as |G(wn)| ≤ |w*| which is finite, this is impossible! • Hence, PTA has to converge. • Proof is due to Marvin Minsky.

  22. Convergence of PTA • Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

  23. θ w1 w2 w3 wn . . . x2 x3 xn Study of Linear Separability • W. Xj = 0 defines a hyperplane in the (n+1) dimension. => W vector and Xj vectors are perpendicular to each other. y x1

  24. Linear Separability X1+ + X2 + Xk+1 - Xk+ Positive set : w. Xj > 0 j≤k Xk+2 - - Xm - Negative set : w. Xj < 0 j>k Separating hyperplane

  25. Test for Linear Separability (LS) • Theorem: A function is linearly separable iff the vectors corresponding to the function do not have a Positive Linear Combination (PLC) • PLC – Both a necessary and sufficient condition. • X1, X2, … , Xm - Vectors of the function • Y1, Y2, … , Ym - Augmented negated set • Prepending -1 to the 0-class vector Xi and negating it, gives Yi

  26. X1 <0,0> + Y1 <-1,0,0> X2 <0,1> - Y2 <1,0,-1> X3 <1,0> - Y3 <1,-1,0> X4 <1,1> + Y4 <-1,1,1> Example (1) - XNOR • The set {Yi} has a PLC if ∑ Pi Yi = 0 , 1 ≤ i ≤ m • where each Pi is a non-negative scalar and • atleast one Pi > 0 • Example : 2 bit even-parity (X-NOR function)

  27. Example (1) - XNOR • P1 [ -1 0 0 ]T + P2 [ 1 0 -1 ]T + P3 [ 1 -1 0 ]T + P4 [ -1 1 1 ]T = [ 0 0 0 ]T • All Pi = 1 gives the result. • For Parity function, PLC exists => Not linearly separable.

  28. Xi Yi <0 0 0> - <1 0 0 0> <0 0 1> - <1 0 0 -1> <0 1 0> - <1 0 -1 0> <0 1 1> + <-1 0 1 1> <1 0 0> - <1 -1 0 0> <1 0 1> + <-1 1 0 1> <1 1 0> + <-1 1 1 0> <1 1 1> + <-1 1 1 1> Example (2) – Majority function • 3-bit majority function • Suppose PLC exists. Equations obtained are: • P1 + P2+ P3- P4 + P5- P6- P7 – P8 = 0 • -P5+ P6 + P7+ P8 = 0 • -P3+ P4 + P7+ P8 = 0 • -P2+ P4 + P6+ P8 = 0 • On solving, all Pi will be forced to 0 • 3 bit majority function • => No PLC => LS

  29. Limitations of perceptron • Non-linear separability is all pervading • Single perceptron does not have enough computing power • Eg: XOR cannot be computed by perceptron

  30. Solutions • Tolerate error (Ex: pocket algorithm used by connectionist expert systems). • Try to get the best possible hyperplane using only perceptrons • Use higher dimension surfaces Ex: Degree - 2 surfaces like parabola • Use layered network

  31. x1 x2 x1x2 0 0 0 0 1 1 1 0 0 1 1 0 Example - XOR • = 0.5 w1=1 w2=1  Calculation of XOR x1x2 x1x2 Calculation of x1x2 • = 1 w1=-1 w2=1.5 x2 x1

  32. Example - XOR • = 0.5 w1=1 w2=1 x1x2 1 1 x1x2 1.5 -1 -1 1.5 x2 x1

  33. Multi Layer Perceptron (MLP) • Question:- How to find weights for the hidden layers when no target output is available? • Credit assignment problem – to be solved by “Gradient Descent”

  34. Assignment • Find by solving linear inequalities the perceptron to solve the majority Boolean Function. • Then feed it to your implementation of the perceptron training algorithm and study its behaviour. • Download SNNS and WEKA packages and try your hand at the perceptron algorithm built in the packages.

More Related