Discriminant Functions Alexandros Potamianos School of ECE, NTUA Fall 2013-2014
Discriminant Functions • Main Idea: Describe the decision boundary parametrically (instead of describing the properties of each class), e.g., the two classes are separated by a straight line a x1 + b x2 + c = 0 with parameters (a, b, c), instead of, say, modeling the feature PDFs as 2-D Gaussians
Figure: Example with two classes (w1, w2) and two features (x1, x2). Left: model the class boundary directly with the line a x1 + b x2 + c = 0. Right: model the class characteristics with Gaussian feature PDFs N(μ1, Σ1) and N(μ2, Σ2).
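A minimal sketch of the boundary-modeling view (the coefficient values below are made up for illustration, not taken from the slides): a sample is assigned to a class according to which side of the line a x1 + b x2 + c = 0 it falls on.

    # Classify 2-D samples directly from the boundary parameters (a, b, c)
    a, b, c = 1.0, -2.0, 0.5      # hypothetical boundary a*x1 + b*x2 + c = 0

    def classify(x1, x2):
        # Samples with a*x1 + b*x2 + c > 0 go to class w1, the rest to class w2
        return "w1" if a * x1 + b * x2 + c > 0 else "w2"

    print(classify(1.0, 0.0))     # w1
    print(classify(0.0, 1.0))     # w2

Note that nothing about the class-conditional densities needs to be estimated; only the three boundary parameters are.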
Duality • Dualism: parametric class description (Bayes classifier) <-> decision boundary (parametric discriminant functions) • For example, modeling the class features by Gaussians with the same (across-class) covariance results in hyper-plane discriminant functions
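As an illustration of this duality (a sketch with made-up class statistics, not from the slides): fitting two Gaussian classes with a shared covariance matrix yields a boundary that is exactly a hyper-plane wT x + w0 = 0, whose parameters follow directly from the means and the common covariance.

    import numpy as np

    # Assumed toy class statistics: means mu1, mu2 and shared covariance Sigma
    mu1 = np.array([0.0, 0.0])
    mu2 = np.array([2.0, 1.0])
    Sigma = np.array([[1.0, 0.2],
                      [0.2, 1.0]])
    P1, P2 = 0.5, 0.5                       # equal class priors

    # Equal-covariance Gaussians give a linear boundary w.x + w0 = 0
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu2)
    w0 = -0.5 * (mu1 + mu2) @ Sigma_inv @ (mu1 - mu2) + np.log(P1 / P2)

    def g(x):
        # g(x) > 0 -> class w1, g(x) < 0 -> class w2
        return w @ x + w0

    print(g(np.array([1.0, 0.5])))          # ~0: the midpoint of the means lies on the boundary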
Discriminant Functions • A discriminant function gi(x) of the features x is defined for each class i • A sample x is classified to the class c for which gi(x) is maximized, i.e., c = argmaxi{gi(x)} • The equation gi(x) = gj(x) defines the class boundary for each pair of (different) classes i and j
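A minimal sketch of the argmax classification rule, assuming three hypothetical linear discriminants gi(x) = Wi x + wi0 (the weight values are invented for illustration):

    import numpy as np

    # Hypothetical linear discriminants for three classes: gi(x) = W[i].x + w0[i]
    W = np.array([[ 1.0,  0.0],
                  [-1.0,  1.0],
                  [ 0.0, -1.0]])
    w0 = np.array([0.0, 0.5, -0.5])

    def classify(x):
        # Evaluate all gi(x) and pick the class with the largest value
        g = W @ x + w0
        return int(np.argmax(g))

    print(classify(np.array([2.0, 0.0])))   # class 0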
Linear Discriminant Functions • Two-class problem: a single discriminant function is defined as g(x) = g1(x) – g2(x), and x is assigned to class 1 if g(x) > 0, to class 2 otherwise • If g(x) is a linear function g(x) = wT x + w0, then the boundary is a hyper-plane (a point, line, or plane for 1-D, 2-D, 3-D features respectively)
Linear Discriminant Functions Figure: the line a x1 + b x2 + c = 0 in the (x1, x2) plane; the weight vector w = (a, b) is normal to the boundary, which crosses the axes at x1 = -c/a and x2 = -c/b.
Non Linear Discriminant Functions • Quadratic discriminant functions: g(x) = w0 + Σi wi xi + Σi,j wij xi xj; for example, for a two-class 2-D problem, g(x) = a + b x1 + c x2 + d x1^2 • Any non-linear discriminant function can become linear by increasing the dimensionality, e.g., y1 = x1, y2 = x2, y3 = x1^2 (2-D non-linear -> 3-D linear): g(y) = a + b y1 + c y2 + d y3
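A minimal sketch of this linearization (the coefficient values are invented for illustration): the quadratic discriminant in the original 2-D space and the linear discriminant in the mapped 3-D space give identical values.

    import numpy as np

    # Hypothetical quadratic discriminant in 2-D: g(x) = a + b*x1 + c*x2 + d*x1^2
    a, b, c, d = -1.0, 0.0, 1.0, 1.0

    def g_quadratic(x1, x2):
        return a + b * x1 + c * x2 + d * x1 ** 2

    def phi(x1, x2):
        # Map to 3-D: y1 = x1, y2 = x2, y3 = x1^2
        return np.array([x1, x2, x1 ** 2])

    def g_linear(y):
        # The same discriminant, now linear in y with weights (b, c, d)
        return a + np.array([b, c, d]) @ y

    x = (2.0, -1.0)
    print(g_quadratic(*x), g_linear(phi(*x)))   # identical values: 2.0 2.0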
Parameter Estimation • The parameters w are estimated by minimizing a criterion function J • The function J models the average distance of the training samples from the decision boundary, computed over either • the misclassified training samples only, or • all training samples • The function J is minimized using gradient descent
Gradient Descent • Iterative procedure towards a local minimum: a(k+1) = a(k) – η(k) ∇J(a(k)), where k is the iteration number, η(k) is the learning rate and ∇J(a(k)) is the gradient of the function to be minimized, evaluated at a(k) • Newton descent is gradient descent with the learning rate replaced by the inverse Hessian matrix
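A minimal sketch of the update rule (with an assumed constant learning rate and a toy quadratic criterion, both invented for illustration):

    import numpy as np

    def gradient_descent(grad_J, a0, eta=0.1, n_iters=100):
        # a(k+1) = a(k) - eta * grad_J(a(k)); eta is fixed here, but it may
        # also be a function of the iteration number k
        a = a0.copy()
        for _ in range(n_iters):
            a = a - eta * grad_J(a)
        return a

    # Toy criterion J(a) = ||a - [1, 2]||^2, whose gradient is 2*(a - [1, 2])
    target = np.array([1.0, 2.0])
    print(gradient_descent(lambda a: 2 * (a - target), np.zeros(2)))   # -> [1. 2.]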
Distance Functions • Perceptron criterion function: Jp(a) = Σmisclassified (– aT y) • Relaxation with margin b: Jr(a) = Σmisclassified (aT y – b)^2 / ||y||^2 • Least mean square (LMS): Js(a) = Σall samples (aT yi – bi)^2 • Ho-Kashyap rule: Js(a,b) = Σall samples (aT yi – bi)^2, minimized jointly over a and the margins bi
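A minimal sketch of batch perceptron training, which minimizes the perceptron criterion Jp by gradient descent (the toy data and learning rate are invented for illustration; samples are in augmented form, with class-2 samples negated so that correct classification means aT y > 0):

    import numpy as np

    def batch_perceptron(Y, eta=0.5, n_iters=100):
        # Y holds augmented samples y = (1, x), with class-2 rows multiplied by -1
        a = np.zeros(Y.shape[1])
        for _ in range(n_iters):
            miscl = Y[Y @ a <= 0]             # currently misclassified samples
            if len(miscl) == 0:
                break                         # converged (separable training set)
            # grad Jp(a) = sum of (-y) over misclassified y, so the update
            # adds the sum of the misclassified samples
            a = a + eta * miscl.sum(axis=0)
        return a

    # Toy 1-D two-class problem: class 1 at x = 2, 1.5; class 2 at x = -0.5, 0.2
    Y = np.array([[1.0, 2.0], [1.0, 1.5],      # class 1, augmented
                  [-1.0, 0.5], [-1.0, -0.2]])  # class 2, augmented and negated
    print(batch_perceptron(Y))                 # a weight vector separating the classes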
Discriminant Functions • Working on the misclassified samples only (perceptron, relaxation with margin) • provides better results • but converges only for linearly separable training sets
High Dimensionality • Using non-linear discriminant functions and linearizing them in a high-dimensional space • can make ANY training set separable • but introduces a large number of parameters (curse of dimensionality) • A smart way to select the appropriate terms (dimensions) is needed: support vector machines (see the sketch below)
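As a pointer to how support vector machines address this (an illustration using scikit-learn, not code from the slides): a kernel implicitly performs the high-dimensional mapping, so a separating hyper-plane can be found without ever enumerating the extra dimensions or their parameters.

    import numpy as np
    from sklearn.svm import SVC

    # XOR-like data: not linearly separable in the original 2-D feature space
    X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
    y = np.array([0, 0, 1, 1])

    # The RBF kernel corresponds to an implicit high-dimensional mapping in which
    # a separating hyper-plane exists; the mapping is never built explicitly
    clf = SVC(kernel="rbf", C=10.0, gamma=1.0)
    clf.fit(X, y)
    print(clf.predict(X))                      # recovers the labels: [0 0 1 1]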