Nanjing University of Science & Technology
Pattern Recognition: Statistical and Neural
Lonnie C. Ludeman
Lecture 14, Oct 14, 2005
Lecture 14 Topics
1. Review structures of the optimal classifier
2. Define linear functions, hyperplanes, boundaries, unit normals, and various distances
3. Use linear discriminant functions to define classifiers, with examples
Motivation! Motivation! Motivation!
Optimum Decision Rules: 2-Class Gaussian

Case 1: $K_1 \neq K_2$ (quadratic processing)
Decide $C_1$ if
$-(x - M_1)^T K_1^{-1}(x - M_1) + (x - M_2)^T K_2^{-1}(x - M_2) > T_1$,
otherwise decide $C_2$.

Case 2: $K_1 = K_2 = K$ (linear processing)
Decide $C_1$ if
$(M_1 - M_2)^T K^{-1} x > T_2$,
otherwise decide $C_2$.

(Review 1)
Optimum Decision Rules: 2-Class Gaussian (continued)

Case 3: $K_1 = K_2 = K = \sigma^2 I$ (linear processing)
Decide $C_1$ if
$(M_1 - M_2)^T x > T_3$,
otherwise decide $C_2$.

(Review 2)
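To make the structure of these rules concrete, here is a minimal numerical sketch in Python/NumPy. The function names, the thresholds T1 and T2 being passed in as arguments, and the use of NumPy are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def quadratic_rule(x, M1, M2, K1, K2, T1):
    # Case 1 (K1 != K2): decide C1 if
    # -(x - M1)^T K1^{-1} (x - M1) + (x - M2)^T K2^{-1} (x - M2) > T1
    q1 = (x - M1) @ np.linalg.solve(K1, x - M1)
    q2 = (x - M2) @ np.linalg.solve(K2, x - M2)
    return "C1" if (-q1 + q2) > T1 else "C2"

def linear_rule(x, M1, M2, K, T2):
    # Case 2 (K1 = K2 = K): decide C1 if (M1 - M2)^T K^{-1} x > T2
    w = np.linalg.solve(K, M1 - M2)
    return "C1" if w @ x > T2 else "C2"
```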
M-Class General Gaussian: MPE and MAP

Case 1: $K_i$ not all equal
Select class $C_j$ if $Q_j(x)$ is MINIMUM, where
$Q_i(x) = (x - M_i)^T K_i^{-1}(x - M_i) + \ln |K_i| - 2 \ln P(C_i)$.

Case 2: $K_1 = K_2 = \dots = K_M = K$
Select class $C_j$ if $L_j(x)$ is MAXIMUM, where
$L_i(x) = M_i^T K^{-1} x - \tfrac{1}{2} M_i^T K^{-1} M_i + \ln P(C_i)$.

(Review 3)
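A hedged sketch of the minimum-$Q_j$ rule for the general-covariance case; the function name and the lists-of-arrays calling convention are assumptions made here for illustration.

```python
import numpy as np

def mclass_gaussian_rule(x, means, covs, priors):
    """Select the class j whose Q_j(x) is minimum, with
    Q_j(x) = (x - M_j)^T K_j^{-1} (x - M_j) + ln|K_j| - 2 ln P(C_j)."""
    scores = []
    for M, K, P in zip(means, covs, priors):
        d = x - M
        q = d @ np.linalg.solve(K, d) + np.log(np.linalg.det(K)) - 2.0 * np.log(P)
        scores.append(q)
    return int(np.argmin(scores))  # index of the selected class
```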
M-Class General Gaussian: Bayes

$C_k : X \sim N(M_k, K_k)$, with prior $P(C_k)$ and density
$p(x \mid C_k) = \dfrac{1}{(2\pi)^{N/2} |K_k|^{1/2}} \exp\!\big(-\tfrac{1}{2}(x - M_k)^T K_k^{-1}(x - M_k)\big)$.

The Bayes decision rule is determined from a set of $y_i(x)$ defined below.

(Review 4)
$y_i(x) = \sum_{j=1}^{M} C_{ij}\, \dfrac{P(C_j)}{(2\pi)^{N/2} |K_j|^{1/2}} \exp\!\big(-\tfrac{1}{2}(x - M_j)^T K_j^{-1}(x - M_j)\big)$

Taking the ln of the $y_i(x)$ in this case does not simplify to a linear or quadratic processor. The structure of the optimum classifier uses a sum of exponentials of quadratic forms, and is thus a special form of nonlinear processing built from quadratic forms.

(Review 5)
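For completeness, a small sketch of evaluating the $y_i(x)$ and selecting the class with the smallest value (minimum conditional risk); the use of scipy.stats for the Gaussian densities and the explicit minimum-$y_i$ selection are assumptions spelled out here, not taken verbatim from the slide.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bayes_rule(x, means, covs, priors, costs):
    """Evaluate y_i(x) = sum_j C_ij p(x|C_j) P(C_j) for every class i and
    choose the class whose y_i is smallest (minimum conditional risk)."""
    p = np.array([multivariate_normal.pdf(x, mean=M, cov=K)
                  for M, K in zip(means, covs)])
    y = costs @ (p * np.asarray(priors))   # y_i, one entry per class
    return int(np.argmin(y))
```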
Reasons for studying linear, quadratic, and other special forms of nonlinear processing:
Under Gaussian assumptions (linear or quadratic processing), we can find or learn a usable decision rule, and the rule is optimum.
In the non-Gaussian case, we can still find or learn a usable decision rule; however, the rule is NOT necessarily optimum.
Linear Functions

One variable: $f(x_1) = w_1 x_1 + w_2$
Two variables: $f(x_1, x_2) = w_1 x_1 + w_2 x_2 + w_3$
Three variables: $f(x_1, x_2, x_3) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4$
$w_1 x_1 + w_2 = 0$ : Point (a constant value of $x_1$)
$w_1 x_1 + w_2 x_2 + w_3 = 0$ : Line
$w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 = 0$ : Plane
$w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 = 0$ : ?  Answer: Hyperplane
Hyperplanes

An n-dimensional hyperplane:
$w_1 x_1 + w_2 x_2 + \dots + w_n x_n + w_{n+1} = 0$

Define $x = [x_1, x_2, \dots, x_n]^T$ and $w_0 = [w_1, w_2, \dots, w_n]^T$.
An alternative representation of the hyperplane is
$w_0^T x + w_{n+1} = 0$.
Hyperplanes as Boundaries for Regions

Positive side of the hyperplane boundary: $R^+ = \{x : w_0^T x + w_{n+1} > 0\}$
Negative side of the hyperplane boundary: $R^- = \{x : w_0^T x + w_{n+1} < 0\}$
Hyperplane boundary: $w_0^T x + w_{n+1} = 0$
Definitions

(1) Unit normal to the hyperplane: $u = \dfrac{w_0}{\| w_0 \|}$
(2) Distance from a point $y$ to the hyperplane: $D(y) = \dfrac{|w_0^T y + w_{n+1}|}{\| w_0 \|}$

(3) Distance from the origin to the hyperplane: $D(0) = \dfrac{|w_{n+1}|}{\| w_0 \|}$
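These standard vector formulas translate directly into a few lines of NumPy; a minimal sketch (function and variable names assumed for illustration):

```python
import numpy as np

def hyperplane_geometry(w0, w_last, y):
    """For the hyperplane w0^T x + w_{n+1} = 0: unit normal,
    distance from a point y, and distance from the origin."""
    norm = np.linalg.norm(w0)
    u = w0 / norm                          # unit normal
    d_y = abs(w0 @ y + w_last) / norm      # distance from y to the hyperplane
    d_0 = abs(w_last) / norm               # distance from the origin
    return u, d_y, d_0
```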
(4) Linear discriminant function:
$d(x) = w_0^T x + w_{n+1} = w^T z$
where $z = [x_1, x_2, \dots, x_n, 1]^T$ is the augmented pattern vector and $w = [w_1, w_2, \dots, w_n, w_{n+1}]^T$ is the weight vector.
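A one-line illustration of the augmented-vector form; the helper name is an assumption made here:

```python
import numpy as np

def augmented_discriminant(x, w):
    """d(x) = w^T z, with z = [x^T, 1]^T the augmented pattern vector,
    so the offset w_{n+1} is carried as the last entry of w."""
    z = np.append(x, 1.0)   # augmented pattern vector
    return w @ z
```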
Linear Decision Rule: 2-Class Case Using a Single Linear Discriminant Function

Given $d(x) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + w_{n+1}$, for a vector $x$:
decide $C_1$ if $d(x) > 0$, decide $C_2$ if $d(x) < 0$, and decide randomly between $C_1$ and $C_2$ if $d(x) = 0$.

No claim of optimality !!!
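A sketch of this rule, including the random tie-break on the boundary; as the slide warns, no optimality is claimed. The function name and the convention of storing the offset as the last weight are assumptions.

```python
import numpy as np

def two_class_rule(x, w):
    """Decide C1 if d(x) > 0, C2 if d(x) < 0, and randomly if d(x) = 0,
    where d(x) = w_1 x_1 + ... + w_n x_n + w_{n+1}."""
    d = w[:-1] @ x + w[-1]
    if d > 0:
        return "C1"
    if d < 0:
        return "C2"
    return np.random.choice(["C1", "C2"])   # on the boundary, decide randomly
```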
Linear Decision Rule: 2-Class Case Using Two Linear Discriminant Functions

Given two discriminant functions $d_1(x)$ and $d_2(x)$, define the decision rule by the signs of $d_1(x)$ and $d_2(x)$, except on the boundaries $d_1(x) = 0$ and $d_2(x) = 0$, where we decide randomly between $C_1$ and $C_2$.
Decision regions (2-class case) using two linear discriminant functions and AND logic
Decision regions (2-class case) using two linear discriminant functions (continued)
Decision regions (2-class case) alternative formulation using two linear discriminant functions
Decision regions (2-class case) using the alternative form of two linear discriminant functions (equivalent to the preceding rule)
Decision regions (3-class case) using two linear discriminant functions
Decision regions (4-class case) using two linear discriminant functions
Decision region R1 (M-class case) using K linear discriminant functions
Example: Piecewise Linear Boundaries

Given six discriminant functions $d_1(x), d_2(x), \dots, d_6(x)$:
Example (continued). Define the following decision rule:
If ($d_1(x) > 0$ AND $d_2(x) > 0$) OR ($d_3(x) > 0$ AND $d_4(x) > 0$ AND $d_5(x) > 0$ AND $d_6(x) > 0$), then decide that $x$ comes from class $C_1$; on the boundaries decide randomly; otherwise decide $C_2$.
Show the decision regions in the two-dimensional pattern space.
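Since the six discriminant functions themselves appear only in the slide figure (not reproduced here), the sketch below takes them as placeholder callables; only the AND/OR combination logic follows the stated rule, read with the usual AND-before-OR grouping.

```python
import numpy as np

def piecewise_rule(x, ds):
    """ds: list of six callables d1..d6 (placeholders; coefficients not given here).
    Decide C1 if (d1>0 and d2>0) or (d3>0 and d4>0 and d5>0 and d6>0);
    decide randomly on a boundary; otherwise decide C2."""
    v = [d(x) for d in ds]
    if any(val == 0 for val in v):                     # a boundary case
        return np.random.choice(["C1", "C2"])
    if (v[0] > 0 and v[1] > 0) or all(val > 0 for val in v[2:6]):
        return "C1"
    return "C2"
```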
Lecture 14 Summary
1. Reviewed structures of the optimal classifier
2. Defined linear functions, hyperplanes, boundaries, unit normals, and various distances
3. Used linear discriminant functions to define classifiers, with examples