CS621/CS449 Artificial Intelligence Lecture Notes
Set 4: 13/08/2004, 17/08/2004, 18/08/2004, 20/08/2004
Outline
• Proof of Convergence of PTA
• Some Observations
• Study of Linear Separability
• Test for Linear Separability
• Asummability
Proof of Convergence of PTA
• Perceptron Training Algorithm (PTA)
• Statement: Whatever the initial choice of weights and whatever the order in which vectors are chosen for testing, the PTA converges if the vectors come from a linearly separable function.
Proof of Convergence of PTA
• Suppose wn is the weight vector at the nth step of the algorithm; at the beginning, the weight vector is w0.
• Go from wi to wi+1 when a vector Xj fails the test wi · Xj > 0, and update wi as wi+1 = wi + Xj.
• Since the Xj come from a linearly separable function, there exists a w* such that w* · Xj > 0 for all j.
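The update rule above is easy to turn into code. Below is a minimal Python sketch (not part of the original notes): the training vectors are assumed to be given in augmented/negated form (a -1 component prepended to absorb the threshold, 0-class vectors negated, as introduced on the later slides), so training only has to enforce w · Yj > 0 for every Yj.

```python
import numpy as np

def pta(vectors, w0=None, max_iters=10_000):
    """Perceptron Training Algorithm on augmented/negated vectors Yj.
    Repeats until w . Yj > 0 for every j; guaranteed to terminate
    if the Yj come from a linearly separable function."""
    Y = np.asarray(vectors, dtype=float)
    w = np.zeros(Y.shape[1]) if w0 is None else np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        failed = [y for y in Y if w @ y <= 0]   # vectors failing the test
        if not failed:
            return w                            # converged
        w = w + failed[0]                       # wi+1 = wi + Xj
    raise RuntimeError("no convergence -- possibly not linearly separable")

# 2-input AND in augmented/negated form: Y = (-1, x1, x2), negated for the 0-class
Y_and = [(-1, 1, 1),    # (1,1) -> 1
         (1, 0, 0),     # (0,0) -> 0, negated
         (1, 0, -1),    # (0,1) -> 0, negated
         (1, -1, 0)]    # (1,0) -> 0, negated
print(pta(Y_and))       # learned (theta, w1, w2)
```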
Proof of Convergence of PTA
• Consider the expression G(wn) = (wn · w*) / |wn|, where wn = weight vector at the nth iteration.
• G(wn) = (|wn| · |w*| · cos θ) / |wn|, where θ = angle between wn and w*
• G(wn) = |w*| · cos θ
• G(wn) ≤ |w*| (as -1 ≤ cos θ ≤ 1)
Behavior of Numerator of G
• Write Xfail(k) for the vector that failed the test at step k. Then
  wn · w* = (wn-1 + Xfail(n-1)) · w*
          = wn-1 · w* + Xfail(n-1) · w*
          = (wn-2 + Xfail(n-2)) · w* + Xfail(n-1) · w*
          ...
          = w0 · w* + (Xfail(0) + Xfail(1) + ... + Xfail(n-1)) · w*
• Let δ = min over j of (Xj · w*); δ > 0, since w* · Xj > 0 for every j and the set of vectors is finite.
• Numerator of G ≥ w0 · w* + n · δ
• So the numerator of G grows at least linearly with n.
Behavior of Denominator of G
• |wn|² = wn · wn
        = (wn-1 + Xfail(n-1)) · (wn-1 + Xfail(n-1))
        = |wn-1|² + 2 · wn-1 · Xfail(n-1) + |Xfail(n-1)|²
        ≤ |wn-1|² + |Xfail(n-1)|²   (as wn-1 · Xfail(n-1) ≤ 0 for a failed vector)
        ≤ |w0|² + |Xfail(0)|² + |Xfail(1)|² + ... + |Xfail(n-1)|²
• Let ρ = max over j of |Xj| (the maximum magnitude).
• So |wn|² ≤ |w0|² + n · ρ², i.e. the denominator |wn| grows at most as √n.
Some Observations
• Numerator of G grows at least as n.
• Denominator of G grows at most as √n.
  => Numerator grows faster than denominator.
• If the PTA did not terminate, the G(wn) values would become unbounded.
• But, as |G(wn)| ≤ |w*| which is finite, this is impossible!
• Hence, the PTA has to converge.
• Proof is due to Marvin Minsky.
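A quick numerical illustration of the argument (an added sketch, not part of the original notes): run the PTA on the augmented AND vectors, hand-pick a valid w* (here w* = (1.5, 1, 1), an assumed choice), and print G(wn) = wn · w* / |wn| after every update. It stays below |w*| ≈ 2.06 throughout, as the proof requires.

```python
import numpy as np

# Augmented/negated vectors for 2-input AND (see the PTA sketch above)
Y = np.array([(-1, 1, 1), (1, 0, 0), (1, 0, -1), (1, -1, 0)], dtype=float)
w_star = np.array([1.5, 1.0, 1.0])        # hand-picked: w_star @ y > 0 for all y
assert all(w_star @ y > 0 for y in Y)

w = np.zeros(3)
step = 0
while True:
    failed = [y for y in Y if w @ y <= 0]
    if not failed:
        break
    w = w + failed[0]
    step += 1
    G = (w @ w_star) / np.linalg.norm(w)   # G(wn) = wn . w* / |wn|
    print(f"step {step}: G = {G:.3f}  (bound |w*| = {np.linalg.norm(w_star):.3f})")
```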
Study of Linear Separability
[Figure: a perceptron with inputs x1, x2, x3, ..., xn, weights w1, w2, w3, ..., wn, threshold θ and output y]
• W · Xj = 0 defines a hyperplane in (n+1)-dimensional space.
  => The W vector and the Xj vectors lying on it are perpendicular to each other.
Linear Separability
[Figure: points X1+, X2+, ..., Xk+ on one side of a separating hyperplane and Xk+1-, Xk+2-, ..., Xm- on the other]
• Positive set: w · Xj > 0 for all j ≤ k
• Negative set: w · Xj < 0 for all j > k
Linear Separability
• w · Xj = 0 => w is normal to the hyperplane which separates the +ve points from the -ve points.
• In this computing paradigm, computation means "placing hyperplanes".
• Functions computable by the perceptron are called
  • "threshold functions", because ∑ wi·xi is compared with θ (the threshold)
  • "linearly separable", because linear surfaces are set up to separate +ve and -ve points
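A tiny illustration of "computation means placing hyperplanes" (an added sketch with an assumed weight/threshold choice): for 2-input AND, w = (1, 1) and θ = 1.5 place the line x1 + x2 = 1.5 between (1,1) and the other three points.

```python
# 2-input AND as a threshold function: y = 1 iff w1*x1 + w2*x2 > theta.
# These weights are one assumed choice; any hyperplane separating
# (1,1) from the other three points works equally well.
w1, w2, theta = 1.0, 1.0, 1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        y = int(w1 * x1 + w2 * x2 > theta)
        print(f"x = ({x1}, {x2})  ->  y = {y}")   # fires only for (1, 1)
```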
Linear Separability
• Decision problems may have +ve and -ve points that need to be separated.
• The separating hyperplane should generalize; it is obtained through learning.
• PAC learning, SVMs, etc. are frameworks for such learning.
Test for Linear Separability (LS)
• Theorem: A function is linearly separable iff the vectors corresponding to the function do not have a Positive Linear Combination (PLC).
• The absence of a PLC is both a necessary and sufficient condition for LS.
• X1, X2, ..., Xm : vectors of the function
• Y1, Y2, ..., Ym : the augmented, negated set
• Each Yi is obtained by prepending -1 to Xi (to absorb the threshold); for a 0-class vector the augmented vector is additionally negated.
Example (1) - XNOR
• The set {Yi} has a PLC if ∑ Pi·Yi = 0, 1 ≤ i ≤ m,
  where each Pi is a non-negative scalar and at least one Pi > 0.
• Example: 2-bit even parity (the XNOR function)
Example (1) - XNOR
• P1·[-1 0 0]T + P2·[1 0 -1]T + P3·[1 -1 0]T + P4·[-1 1 1]T = [0 0 0]T
• All Pi = 1 gives the result.
• For the parity (XNOR) function a PLC exists => not linearly separable.
Example (2) – Majority function
• 3-bit majority function
• Suppose a PLC exists. The equations obtained are:
  P1 + P2 + P3 - P4 + P5 - P6 - P7 - P8 = 0
  -P5 + P6 + P7 + P8 = 0
  -P3 + P4 + P7 + P8 = 0
  -P2 + P4 + P6 + P8 = 0
• On solving, all Pi are forced to 0.
• 3-bit majority function => no PLC => LS.
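The PLC test can also be mechanised. Below is a sketch (not from the notes) that phrases it as a linear-programming feasibility problem with scipy.optimize.linprog: look for P ≥ 0 with ∑ Pi·Yi = 0 and ∑ Pi = 1, the normalisation ruling out the all-zero solution. The helper names and the use of scipy are assumptions, not part of the original material.

```python
import numpy as np
from scipy.optimize import linprog

def augmented_negated_set(n_bits, f):
    """Build the Yi: prepend -1 to each input vector; negate 0-class vectors."""
    Y = []
    for v in range(2 ** n_bits):
        x = [(v >> i) & 1 for i in range(n_bits)]
        y = np.array([-1] + x, dtype=float)
        Y.append(y if f(x) == 1 else -y)
    return np.array(Y)

def has_plc(Y):
    """Feasible iff some P >= 0, not all zero, satisfies sum_i Pi*Yi = 0."""
    m, d = Y.shape
    A_eq = np.vstack([Y.T, np.ones((1, m))])   # sum Pi*Yi = 0 and sum Pi = 1
    b_eq = np.zeros(d + 1); b_eq[-1] = 1.0
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    return res.success

xnor = lambda x: int(x[0] == x[1])             # 2-bit even parity
maj3 = lambda x: int(sum(x) >= 2)              # 3-bit majority

print(has_plc(augmented_negated_set(2, xnor))) # True  -> not linearly separable
print(has_plc(augmented_negated_set(3, maj3))) # False -> linearly separable
```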
Example (3) – n-bit comparator
• n-bit comparator: z = 1 if dec(y) > dec(x), z = 0 otherwise.
• Is a perceptron realization possible?
[Figure: a perceptron with inputs x0, ..., xn-1 and yn, ..., y2n-1, weights w0, ..., w2n-1, threshold θ and output z]
• The w vector is such that
  wi = -2^i for all i, 0 ≤ i ≤ n-1
  wj = 2^(j-n) for all j, n ≤ j ≤ 2n-1
  and θ = 0.
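A brute-force check of this realization (an added sketch; the slide's indexing, with the y bits occupying positions n to 2n-1, is folded into the helper): the weighted sum equals dec(y) - dec(x), so with θ = 0 the unit fires exactly when dec(y) > dec(x).

```python
def comparator_perceptron(x_bits, y_bits):
    """x_bits, y_bits: lists of n bits, least significant bit first.
    Weighted sum = sum_j 2^(j-n)*y_j - sum_i 2^i*x_i = dec(y) - dec(x); theta = 0."""
    n = len(x_bits)
    s = sum(-(2 ** i) * x_bits[i] for i in range(n)) \
        + sum((2 ** i) * y_bits[i] for i in range(n))
    return int(s > 0)                     # fires iff dec(y) > dec(x)

n = 3
for x in range(2 ** n):
    for y in range(2 ** n):
        x_bits = [(x >> i) & 1 for i in range(n)]
        y_bits = [(y >> i) & 1 for i in range(n)]
        assert comparator_perceptron(x_bits, y_bits) == int(y > x)
print("perceptron matches the n-bit comparator for n =", n)
```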
Example (3) – n-bit comparator
• 2-bit comparator: inputs (y0, y1, x0, x1), output z.
• +ve class: 0100, 1000 and so on.
• -ve class: 0000, 0001 and so on.
• After augmentation and negation, 16 non-negative numbers Pi (one per input vector), not all zero, are to be found.
• Suppose a PLC exists: ∑ Pi·Yi = 0 with Pi ≥ 0 for all i and not all Pi = 0.
• Solving the equations, we eventually end up with all Pi = 0.
• So no PLC exists => the comparator is linearly separable, consistent with the realization above.
Observations
• If the vectors are from a linearly separable function, then during the course of training there can never be a repetition of weight vectors.
• The quantity wn · w* strictly increases with every update (each failed vector contributes Xfail · w* > 0), hence the weight vector cannot repeat.
Asummability
• Another form of the PLC test.
• Helps in quick testing of LS.
• Definition of k-summability: given the +ve and -ve classes, a function is called k-summable if we can find k vectors X1+, ..., Xk+ from the +ve class and k vectors X1-, ..., Xk- from the -ve class (repetition allowed) such that
  X1+ + X2+ + ... + Xk+ = X1- + X2- + ... + Xk-.
Asummability
• A function is called asummable if it is not k-summable for any k.
• Asummability is equivalent to LS.
• Example: XOR
  +ve class: 01, 10      -ve class: 00, 11
  01 + 10 = 11 and 00 + 11 = 11, so the two sums are equal.
• XOR is 2-summable => not LS.
Asummability – Examples
• AND is not k-summable for any k => asummable => LS.
  +ve class: 11      -ve class: 00, 01, 10
• A single vector in one class and all the others in the other class => linearly separable.
• Another example: the majority function.
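A brute-force k-summability check (an added sketch, practical only for small functions): enumerate all multisets of k vectors from each class and compare component-wise sums. It reports the XOR example above as 2-summable, while AND shows no summability up to the tested bound.

```python
from itertools import combinations_with_replacement

def k_summable(pos, neg, k):
    """True iff some k +ve vectors and k -ve vectors (repetition allowed)
    have equal component-wise sums."""
    def sums(cls):
        return {tuple(sum(v[i] for v in combo) for i in range(len(cls[0])))
                for combo in combinations_with_replacement(cls, k)}
    return bool(sums(pos) & sums(neg))

def summable_up_to(pos, neg, k_max=4):
    """Smallest k <= k_max for which the function is k-summable, else None."""
    return next((k for k in range(1, k_max + 1) if k_summable(pos, neg, k)), None)

xor_pos, xor_neg = [(0, 1), (1, 0)], [(0, 0), (1, 1)]
and_pos, and_neg = [(1, 1)], [(0, 0), (0, 1), (1, 0)]

print(summable_up_to(xor_pos, xor_neg))   # 2    -> 2-summable, not LS
print(summable_up_to(and_pos, and_neg))   # None -> no summability found (LS)
```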