Chapter 2 Decision Functions • Contents 2.1 Basic concepts 2.2 Linear decision functions 2.3 Generalized decision functions 2.4 Geometric discussions 2.5 Orthogonal functions
2.1 Basic concepts • A simple example • Two classes C1 and C2 • Two-dimensional feature vector • x = (x1, x2)T • Figure 2.1.1 • Clearly separable by a straight line • d(x) = w1x1 + w2x2 + w3 = 0 • Decision rule • d(x) > 0 ⇒ x ∈ C1 • d(x) < 0 ⇒ x ∈ C2 • d(x) is called a linear decision function
2.1 Basic concepts • n-dimensional Euclidean vector space (Rn) • Decision function represented by a hyperplane • n-dimensional feature vector • x = (x1, x2, …, xn)T • hyperplane • d(x) = w1x1 + w2x2 + … + wnxn + wn+1 = 0 • Decision rule • d(x) > 0 ⇒ x ∈ C1 • d(x) < 0 ⇒ x ∈ C2 • vector notation (augmented) • x = (x1, x2, …, xn, 1)T, w = (w1, w2, …, wn, wn+1)T • d(x) = wTx
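The augmented-vector rule above can be sketched in a few lines. This is a minimal illustration, not code from the text; the weight values are chosen for the example only.

```python
# Minimal sketch of a linear decision function using the augmented
# pattern vector x = (x1, ..., xn, 1)^T, so that d(x) = w^T x.
# The weights below are illustrative, not taken from the chapter.

def d(w, x):
    """Evaluate d(x) = w^T x for an augmented pattern x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def classify(w, x):
    """Return 'C1' if d(x) > 0, 'C2' if d(x) < 0, None on the boundary."""
    value = d(w, x)
    if value > 0:
        return "C1"
    if value < 0:
        return "C2"
    return None  # x lies exactly on the hyperplane

# Example: d(x) = x1 + x2 - 2 in R^2 (illustrative weights)
w = [1.0, 1.0, -2.0]
print(classify(w, [2.0, 1.0, 1.0]))  # d = 1 > 0, so C1
print(classify(w, [0.0, 1.0, 1.0]))  # d = -1 < 0, so C2
```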
2.1 Basic concepts • Nonlinear decision functions • Figure 2.1.2 • circumference • d(x) = 1 − x1² − x2² = 0 • Decision rule (note it is the same as the previous ones) • d(x) > 0 ⇒ x ∈ C1 • d(x) < 0 ⇒ x ∈ C2
2.1 Basic concepts • More than two classes • m pattern classes {C1, C2, …, Cm} in Rn • Definition 2.1.1 • If a surface d(x) = 0, x ∈ Rn, separates Ci from the remaining Cj, j ≠ i • i.e. • d(x) > 0 ⇒ x ∈ Ci • d(x) < 0 ⇒ x ∈ Cj, j ≠ i • then d(x) is called a decision function of Ci • Example 2.1.2 • Figure 2.1.4
2.2 Linear decision functions • Two cases • Absolute separation • Pairwise separation • Absolute separation • If each class Ci has a linear decision function di(x), 1 ≤ i ≤ m • i.e. • di(x) = wiTx > 0, x ∈ Ci • di(x) = wiTx < 0, otherwise • then absolute separation exists between C1, …, Cm (the classes are absolutely separable) • Example 2.2.1 • Figure 2.2.1
2.2 Linear decision functions • Absolute separation (continued) • How do we classify an incoming pattern x? • Classify x into C1 if • d1(x) > 0 • d2(x) < 0 • d3(x) < 0 • Definition 2.2.1 (decision region) • Di = {x | di(x) > 0; dj(x) < 0, j ≠ i}, 1 ≤ i ≤ m • Example 2.2.2 • Figure 2.2.2 • A case of no absolute separation • Figure 2.2.3
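Definition 2.2.1 can be sketched directly: a pattern belongs to the decision region Di only when its own function is positive and all others are negative; otherwise the classification is indeterminate. The three linear functions below are illustrative, not from the text.

```python
# Sketch of Definition 2.2.1: x is in D_i iff d_i(x) > 0 and
# d_j(x) < 0 for all j != i. The example functions are illustrative.

def decision_region(ds, x):
    """Return the index i whose region contains x, else None."""
    values = [di(x) for di in ds]
    positive = [i for i, v in enumerate(values) if v > 0]
    if len(positive) == 1 and all(
            v < 0 for i, v in enumerate(values) if i != positive[0]):
        return positive[0]
    return None  # no decision region contains x

ds = [lambda x: -x[0] + 1,   # d1: positive left of x1 = 1
      lambda x: x[0] - 3,    # d2: positive right of x1 = 3
      lambda x: x[1] - 2]    # d3: positive above x2 = 2

print(decision_region(ds, (0.0, 0.0)))  # d1 > 0, d2 < 0, d3 < 0 -> 0
print(decision_region(ds, (0.0, 3.0)))  # d1 > 0 and d3 > 0 -> None
```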
2.2 Linear decision functions • Pairwise separation • Each pair of classes separable by a linear function • The pair Ci and Cj is separable by dij if • dij(x) > 0 for all x ∈ Ci • dij(x) < 0 for all x ∈ Cj • Consequently (taking dji = −dij), for all x ∈ Ci • dij(x) > 0 for all j ≠ i • Decision rule • classify x into Ci if • dij(x) > 0 for all j ≠ i • Example 2.2.4 • Figure 2.2.4
2.2 Linear decision functions • Pairwise separation (continued) • Definition 2.2.2 (decision region) • Di = {x | dij(x) > 0, j ≠ i}, 1 ≤ i ≤ m • Example 2.2.5 • Figure 2.2.5 • The union of the decision regions • need not be the whole space • the remainder is the rejection region
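The pairwise rule, including the rejection region, can be sketched as follows. The pairwise functions below (1-D vertical boundaries) are illustrative assumptions, not from the text; the convention dji = −dij is used.

```python
# Sketch of Definition 2.2.2: classify x into C_i iff d_ij(x) > 0
# for all j != i; if no class qualifies, x is rejected.

def pairwise_classify(dij, m, x):
    """dij(i, j, x) evaluates d_ij at x; returns class index or None."""
    for i in range(m):
        if all(dij(i, j, x) > 0 for j in range(m) if j != i):
            return i
    return None  # rejection region

# Illustrative pairwise functions for three classes on the x1 axis
def dij(i, j, x):
    boundaries = {(0, 1): 1.0 - x[0],   # d_01: C0 left of x1 = 1
                  (0, 2): 1.5 - x[0],   # d_02 (illustrative boundary)
                  (1, 2): 2.0 - x[0]}   # d_12: C1 left of x1 = 2
    if (i, j) in boundaries:
        return boundaries[(i, j)]
    return -boundaries[(j, i)]          # convention: d_ji = -d_ij

print(pairwise_classify(dij, 3, (0.5,)))  # d_01 > 0 and d_02 > 0 -> 0
print(pairwise_classify(dij, 3, (1.5,)))  # d_10 > 0 and d_12 > 0 -> 1
print(pairwise_classify(dij, 3, (3.0,)))  # d_20 > 0 and d_21 > 0 -> 2
```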
2.3 Generalized Decision Functions • Generalized decision functions • high complexity of boundaries ⇒ nonlinear surfaces needed • d(x) = w1f1(x) + w2f2(x) + … + wnfn(x) + wn+1 • fi(x), 1 ≤ i ≤ n: scalar functions of the pattern x, x ∈ Rn • vector notation • d(x) = Σi=1..n+1 wifi(x) = wTx*, where x* = (f1(x), f2(x), …, fn(x), fn+1(x))T with fn+1(x) ≡ 1, and wT = (w1, w2, …, wn, wn+1) • the polynomial classifier is widely used • the fi(x) are polynomials • e.g. f1(x) = x1, f2(x) = x1², f3(x) = x1x2, …
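The mapping x → x* above can be sketched concretely. The particular choice of scalar functions fi and the weights below are illustrative assumptions, matching the polynomial examples just given.

```python
# Sketch of a generalized decision function d(x) = w^T x*, where
# x* = (f1(x), ..., fn(x), 1)^T maps the pattern through scalar
# functions. Here f1(x) = x1, f2(x) = x1^2, f3(x) = x1*x2, as in
# the text's example; the weights are illustrative.

def feature_map(x):
    """Map a 2-D pattern to x* = (x1, x1^2, x1*x2, 1)."""
    x1, x2 = x
    return [x1, x1 * x1, x1 * x2, 1.0]

def d_general(w, x):
    """Evaluate d(x) = w^T x* for the feature map above."""
    return sum(wi * fi for wi, fi in zip(w, feature_map(x)))

w = [1.0, -0.5, 2.0, -1.0]        # illustrative weights
print(d_general(w, (2.0, 1.0)))   # 2 - 2 + 4 - 1 = 3.0
```

The decision rule is unchanged: d(x) > 0 assigns x to C1, even though the boundary in the original pattern space is now nonlinear.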
2.3 Generalized Decision Functions • Quadratic decision functions • 2nd-order polynomial classifier • e.g. 2-D patterns (n = 2), x = (x1, x2)T • d(x) = w1x1² + w2x1x2 + w3x2² + w4x1 + w5x2 + w6 • for patterns x ∈ Rn • d(x) = Σi=1..n wiixi² + Σi=1..n−1 Σj=i+1..n wijxixj + Σi=1..n wixi + wn+1 • number of terms = (n+1)(n+2)/2 • e.g. n = 2 ⇒ 6 terms, n = 3 ⇒ 10 terms, …, n = 10 ⇒ 66 terms, …
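The term count (n+1)(n+2)/2 follows by adding the four groups of terms in the sum above, which a short check confirms:

```python
# Verify the quadratic term count: n squared terms x_i^2,
# n(n-1)/2 cross terms x_i x_j (i < j), n linear terms, 1 constant.

def quadratic_terms(n):
    return n + n * (n - 1) // 2 + n + 1

for n in (2, 3, 10):
    assert quadratic_terms(n) == (n + 1) * (n + 2) // 2
    print(n, quadratic_terms(n))  # 2 -> 6, 3 -> 10, 10 -> 66
```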
2.3 Generalized Decision Functions • Quadratic decision functions (continued) • in the case of order m • fi(x) = xi1^e1 xi2^e2 ⋯ xim^em • Theorem 2.3.1 • dm(x) = Σi1=1..n Σi2=i1..n ⋯ Σim=im−1..n wi1i2…im xi1xi2⋯xim + dm−1(x), where d0(x) = wn+1 • proof by mathematical induction • Example 2.3.1 • Example 2.3.2 • number of terms = (n+m)!/(n!m!) • matrix notation (for m = 2) • d(x) = xTAx + xTb + c
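The general count (n+m)!/(n!m!) is the binomial coefficient C(n+m, m); for m = 2 it reduces to the quadratic count (n+1)(n+2)/2 from the previous slide. A quick numeric check:

```python
# Number of terms of a degree-m polynomial decision function in
# n variables: (n+m)! / (n! m!) = C(n+m, m).
import math

def num_terms(n, m):
    return math.comb(n + m, m)

print(num_terms(2, 2))  # 6  (matches the quadratic case, n = 2)
print(num_terms(3, 2))  # 10 (matches the quadratic case, n = 3)
print(num_terms(2, 3))  # 10 (cubic in two variables)
```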
2.4 Geometric Discussion • Importance of a geometric interpretation of the decision function's properties • hyperplanes • dichotomies • Hyperplanes • linear decision functions • in 2-D, a straight line • in 3-D, a plane • in n-D with n > 3, a hyperplane • Figure 2.4.1 • hyperplane H • unit normal vector n • points on the hyperplane P, Q • vectors associated with P and Q: y, x • normal vector • n = w0/|w0| (Eq. 2.4.7) • distance of an arbitrary point z from H • Dz = |(w0T/|w0|)(z − y)| = |(w0Tz + wn+1)/|w0|| (Eq. 2.4.11)
2.4 Geometric Discussion • Hyperplanes (continued) • Example 2.4.1 • 3x1 + 4x2 − 5 = 0 in R2 • |w0| = 5 • n = (3/5, 4/5)T • D(1,2) = |3·1 + 4·2 − 5|/5 = 6/5 = 1.2 • Example 2.4.2 • 2x1 − x2 + 2x3 − 7 = 0 in R3, |w0| = 3 • exclude the patterns y whose distance from the hyperplane is less than 0.01 • i.e. |(2y1 − y2 + 2y3 − 7)/|w0|| = |(2y1 − y2 + 2y3 − 7)/3| < 0.01
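Both examples reduce to evaluating Eq. 2.4.11. A minimal sketch, with a sample point for Example 2.4.2 chosen for illustration:

```python
# Sketch of Eq. 2.4.11: distance of a point z from the hyperplane
# w0^T x + w_{n+1} = 0 is D_z = |w0^T z + w_{n+1}| / |w0|.
import math

def distance(w0, wn1, z):
    norm = math.sqrt(sum(wi * wi for wi in w0))
    return abs(sum(wi * zi for wi, zi in zip(w0, z)) + wn1) / norm

# Example 2.4.1: 3x1 + 4x2 - 5 = 0, point (1, 2)
print(distance([3.0, 4.0], -5.0, [1.0, 2.0]))  # 1.2

# Example 2.4.2: 2x1 - x2 + 2x3 - 7 = 0, |w0| = 3; keep only patterns
# at distance >= 0.01 from the hyperplane. Sample point y is illustrative.
y = [1.0, 1.0, 4.0]
print(distance([2.0, -1.0, 2.0], -7.0, y) >= 0.01)  # True: y is kept
```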