CS621/CS449 Artificial Intelligence Lecture Notes
Set 4: 13/08/2004, 17/08/2004, 18/08/2004, 20/08/2004
Outline
• Proof of Convergence of PTA
• Some Observations
• Study of Linear Separability
• Test for Linear Separability
• Asummability
Proof of Convergence of PTA
• Perceptron Training Algorithm (PTA)
• Statement: Whatever the initial choice of weights and whatever the order in which vectors are chosen for testing, the PTA converges if the vectors come from a linearly separable function.
Proof of Convergence of PTA
• Suppose wn is the weight vector at the nth step of the algorithm; at the beginning, the weight vector is w0.
• Go from wi to wi+1 when a vector Xj fails the test wi · Xj > 0, and update wi as wi+1 = wi + Xj.
• Since the Xj come from a linearly separable function, there exists a w* such that w* · Xj > 0 for all j.
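The update rule above is easy to turn into code. Below is a minimal Python sketch (not part of the original notes): the training vectors are assumed to be given in augmented/negated form (a -1 component prepended to absorb the threshold, 0-class vectors negated, as introduced on the later slides), so training only has to enforce w · Yj > 0 for every Yj.

```python
import numpy as np

def pta(vectors, w0=None, max_iters=10_000):
    """Perceptron Training Algorithm on augmented/negated vectors Yj.
    Repeats until w . Yj > 0 for every j; guaranteed to terminate
    if the Yj come from a linearly separable function."""
    Y = np.asarray(vectors, dtype=float)
    w = np.zeros(Y.shape[1]) if w0 is None else np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        failed = [y for y in Y if w @ y <= 0]   # vectors failing the test
        if not failed:
            return w                            # converged
        w = w + failed[0]                       # wi+1 = wi + Xj
    raise RuntimeError("no convergence -- possibly not linearly separable")

# 2-input AND in augmented/negated form: Y = (-1, x1, x2), negated for the 0-class
Y_and = [(-1, 1, 1),    # (1,1) -> 1
         (1, 0, 0),     # (0,0) -> 0, negated
         (1, 0, -1),    # (0,1) -> 0, negated
         (1, -1, 0)]    # (1,0) -> 0, negated
print(pta(Y_and))       # learned (theta, w1, w2)
```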
Proof of Convergence of PTA
• Consider the expression G(wn) = (wn · w*) / |wn|, where wn = weight vector at the nth iteration.
• G(wn) = (|wn| · |w*| · cos θ) / |wn|, where θ = angle between wn and w*
• G(wn) = |w*| · cos θ
• G(wn) ≤ |w*| (as -1 ≤ cos θ ≤ 1)
Behavior of Numerator of G
• Write Xfail(k) for the vector that failed the test at step k. Then
  wn · w* = (wn-1 + Xfail(n-1)) · w*
          = wn-1 · w* + Xfail(n-1) · w*
          = (wn-2 + Xfail(n-2)) · w* + Xfail(n-1) · w*
          ...
          = w0 · w* + (Xfail(0) + Xfail(1) + ... + Xfail(n-1)) · w*
• Let δ = min over j of (Xj · w*); δ > 0, since w* · Xj > 0 for every j and the set of vectors is finite.
• Numerator of G ≥ w0 · w* + n · δ
• So the numerator of G grows at least linearly with n.
Behavior of Denominator of G
• |wn|² = wn · wn
        = (wn-1 + Xfail(n-1)) · (wn-1 + Xfail(n-1))
        = |wn-1|² + 2 · wn-1 · Xfail(n-1) + |Xfail(n-1)|²
        ≤ |wn-1|² + |Xfail(n-1)|²   (as wn-1 · Xfail(n-1) ≤ 0 for a failed vector)
        ≤ |w0|² + |Xfail(0)|² + |Xfail(1)|² + ... + |Xfail(n-1)|²
• Let ρ = max over j of |Xj| (the maximum magnitude).
• So |wn|² ≤ |w0|² + n · ρ², i.e. the denominator |wn| grows at most as √n.
Some Observations
• Numerator of G grows at least as n.
• Denominator of G grows at most as √n.
  => Numerator grows faster than denominator.
• If the PTA did not terminate, the G(wn) values would become unbounded.
• But, as |G(wn)| ≤ |w*| which is finite, this is impossible!
• Hence, the PTA has to converge.
• Proof is due to Marvin Minsky.
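A quick numerical illustration of the argument (an added sketch, not part of the original notes): run the PTA on the augmented AND vectors, hand-pick a valid w* (here w* = (1.5, 1, 1), an assumed choice), and print G(wn) = wn · w* / |wn| after every update. It stays below |w*| ≈ 2.06 throughout, as the proof requires.

```python
import numpy as np

# Augmented/negated vectors for 2-input AND (see the PTA sketch above)
Y = np.array([(-1, 1, 1), (1, 0, 0), (1, 0, -1), (1, -1, 0)], dtype=float)
w_star = np.array([1.5, 1.0, 1.0])        # hand-picked: w_star @ y > 0 for all y
assert all(w_star @ y > 0 for y in Y)

w = np.zeros(3)
step = 0
while True:
    failed = [y for y in Y if w @ y <= 0]
    if not failed:
        break
    w = w + failed[0]
    step += 1
    G = (w @ w_star) / np.linalg.norm(w)   # G(wn) = wn . w* / |wn|
    print(f"step {step}: G = {G:.3f}  (bound |w*| = {np.linalg.norm(w_star):.3f})")
```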
Study of Linear Separability
[Figure: a perceptron with inputs x1, x2, x3, ..., xn, weights w1, w2, w3, ..., wn, threshold θ and output y]
• W · Xj = 0 defines a hyperplane in (n+1)-dimensional space.
  => The W vector and the Xj vectors lying on it are perpendicular to each other.
Linear Separability
[Figure: points X1+, X2+, ..., Xk+ on one side of a separating hyperplane and Xk+1-, Xk+2-, ..., Xm- on the other]
• Positive set: w · Xj > 0 for all j ≤ k
• Negative set: w · Xj < 0 for all j > k
Linear Separability
• w · Xj = 0 => w is normal to the hyperplane which separates the +ve points from the -ve points.
• In this computing paradigm, computation means "placing hyperplanes".
• Functions computable by the perceptron are called
  • "threshold functions", because ∑ wi·xi is compared with θ (the threshold)
  • "linearly separable", because linear surfaces are set up to separate +ve and -ve points
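A tiny illustration of "computation means placing hyperplanes" (an added sketch with an assumed weight/threshold choice): for 2-input AND, w = (1, 1) and θ = 1.5 place the line x1 + x2 = 1.5 between (1,1) and the other three points.

```python
# 2-input AND as a threshold function: y = 1 iff w1*x1 + w2*x2 > theta.
# These weights are one assumed choice; any hyperplane separating
# (1,1) from the other three points works equally well.
w1, w2, theta = 1.0, 1.0, 1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        y = int(w1 * x1 + w2 * x2 > theta)
        print(f"x = ({x1}, {x2})  ->  y = {y}")   # fires only for (1, 1)
```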
Linear Separability
• Decision problems may have +ve and -ve points that need to be separated.
• The separating hyperplane should generalize; it is obtained through learning.
• PAC learning, SVMs, etc. are frameworks for such learning.
Test for Linear Separability (LS)
• Theorem: A function is linearly separable iff the vectors corresponding to the function do not have a Positive Linear Combination (PLC).
• The absence of a PLC is both a necessary and sufficient condition for LS.
• X1, X2, ..., Xm : vectors of the function
• Y1, Y2, ..., Ym : the augmented, negated set
• Each Yi is obtained by prepending -1 to Xi (to absorb the threshold); for a 0-class vector the augmented vector is additionally negated.
Example (1) - XNOR
• The set {Yi} has a PLC if ∑ Pi·Yi = 0, 1 ≤ i ≤ m,
  where each Pi is a non-negative scalar and at least one Pi > 0.
• Example: 2-bit even parity (the XNOR function)
Example (1) - XNOR
• P1·[-1 0 0]T + P2·[1 0 -1]T + P3·[1 -1 0]T + P4·[-1 1 1]T = [0 0 0]T
• All Pi = 1 gives the result.
• For the parity (XNOR) function a PLC exists => not linearly separable.
Example (2) – Majority function
• 3-bit majority function
• Suppose a PLC exists. The equations obtained are:
  P1 + P2 + P3 - P4 + P5 - P6 - P7 - P8 = 0
  -P5 + P6 + P7 + P8 = 0
  -P3 + P4 + P7 + P8 = 0
  -P2 + P4 + P6 + P8 = 0
• On solving, all Pi are forced to 0.
• 3-bit majority function => no PLC => LS.
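The PLC test can also be mechanised. Below is a sketch (not from the notes) that phrases it as a linear-programming feasibility problem with scipy.optimize.linprog: look for P ≥ 0 with ∑ Pi·Yi = 0 and ∑ Pi = 1, the normalisation ruling out the all-zero solution. The helper names and the use of scipy are assumptions, not part of the original material.

```python
import numpy as np
from scipy.optimize import linprog

def augmented_negated_set(n_bits, f):
    """Build the Yi: prepend -1 to each input vector; negate 0-class vectors."""
    Y = []
    for v in range(2 ** n_bits):
        x = [(v >> i) & 1 for i in range(n_bits)]
        y = np.array([-1] + x, dtype=float)
        Y.append(y if f(x) == 1 else -y)
    return np.array(Y)

def has_plc(Y):
    """Feasible iff some P >= 0, not all zero, satisfies sum_i Pi*Yi = 0."""
    m, d = Y.shape
    A_eq = np.vstack([Y.T, np.ones((1, m))])   # sum Pi*Yi = 0 and sum Pi = 1
    b_eq = np.zeros(d + 1); b_eq[-1] = 1.0
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    return res.success

xnor = lambda x: int(x[0] == x[1])             # 2-bit even parity
maj3 = lambda x: int(sum(x) >= 2)              # 3-bit majority

print(has_plc(augmented_negated_set(2, xnor))) # True  -> not linearly separable
print(has_plc(augmented_negated_set(3, maj3))) # False -> linearly separable
```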
Example (3) – n-bit comparator
• n-bit comparator: z = 1 if dec(y) > dec(x), z = 0 otherwise.
• Is a perceptron realization possible?
[Figure: a perceptron with inputs x0, ..., xn-1 and yn, ..., y2n-1, weights w0, ..., w2n-1, threshold θ and output z]
• The w vector is such that
  wi = -2^i for all i, 0 ≤ i ≤ n-1
  wj = 2^(j-n) for all j, n ≤ j ≤ 2n-1
  and θ = 0.
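A brute-force check of this realization (an added sketch; the slide's indexing, with the y bits occupying positions n to 2n-1, is folded into the helper): the weighted sum equals dec(y) - dec(x), so with θ = 0 the unit fires exactly when dec(y) > dec(x).

```python
def comparator_perceptron(x_bits, y_bits):
    """x_bits, y_bits: lists of n bits, least significant bit first.
    Weighted sum = sum_j 2^(j-n)*y_j - sum_i 2^i*x_i = dec(y) - dec(x); theta = 0."""
    n = len(x_bits)
    s = sum(-(2 ** i) * x_bits[i] for i in range(n)) \
        + sum((2 ** i) * y_bits[i] for i in range(n))
    return int(s > 0)                     # fires iff dec(y) > dec(x)

n = 3
for x in range(2 ** n):
    for y in range(2 ** n):
        x_bits = [(x >> i) & 1 for i in range(n)]
        y_bits = [(y >> i) & 1 for i in range(n)]
        assert comparator_perceptron(x_bits, y_bits) == int(y > x)
print("perceptron matches the n-bit comparator for n =", n)
```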
Example (3) – n-bit comparator
• 2-bit comparator: inputs (y0, y1, x0, x1), output z.
• +ve class: 0100, 1000 and so on.
• -ve class: 0000, 0001 and so on.
• After augmentation and negation, 16 non-negative numbers Pi (one per input vector), not all zero, are to be found.
• Suppose a PLC exists: ∑ Pi·Yi = 0 with Pi ≥ 0 for all i and not all Pi = 0.
• Solving the equations, we eventually end up with all Pi = 0.
• So no PLC exists => the comparator is linearly separable, consistent with the realization above.
Observations
• If the vectors are from a linearly separable function, then during the course of training there can never be a repetition of weight vectors.
• The quantity wn · w* strictly increases with every update (each failed vector contributes Xfail · w* > 0), hence the weight vector cannot repeat.
Asummability
• Another form of the PLC test.
• Helps in quick testing of LS.
• Definition of k-summability: given the +ve and -ve classes, a function is called k-summable if we can find k vectors X1+, ..., Xk+ from the +ve class and k vectors X1-, ..., Xk- from the -ve class (repetition allowed) such that
  X1+ + X2+ + ... + Xk+ = X1- + X2- + ... + Xk-.
Asummability
• A function is called asummable if it is not k-summable for any k.
• Asummability is equivalent to LS.
• Example: XOR
  +ve class: 01, 10      -ve class: 00, 11
  01 + 10 = 11 and 00 + 11 = 11, so the two sums are equal.
• XOR is 2-summable => not LS.
Asummability – Examples
• AND is not k-summable for any k => asummable => LS.
  +ve class: 11      -ve class: 00, 01, 10
• A single vector in one class and all the others in the other class => linearly separable.
• Another example: the majority function.
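A brute-force k-summability check (an added sketch, practical only for small functions): enumerate all multisets of k vectors from each class and compare component-wise sums. It reports the XOR example above as 2-summable, while AND shows no summability up to the tested bound.

```python
from itertools import combinations_with_replacement

def k_summable(pos, neg, k):
    """True iff some k +ve vectors and k -ve vectors (repetition allowed)
    have equal component-wise sums."""
    def sums(cls):
        return {tuple(sum(v[i] for v in combo) for i in range(len(cls[0])))
                for combo in combinations_with_replacement(cls, k)}
    return bool(sums(pos) & sums(neg))

def summable_up_to(pos, neg, k_max=4):
    """Smallest k <= k_max for which the function is k-summable, else None."""
    return next((k for k in range(1, k_max + 1) if k_summable(pos, neg, k)), None)

xor_pos, xor_neg = [(0, 1), (1, 0)], [(0, 0), (1, 1)]
and_pos, and_neg = [(1, 1)], [(0, 0), (0, 1), (1, 0)]

print(summable_up_to(xor_pos, xor_neg))   # 2    -> 2-summable, not LS
print(summable_up_to(and_pos, and_neg))   # None -> no summability found (LS)
```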