
Discriminant Functions


Presentation Transcript


  1. Discriminant Functions Alexandros Potamianos School of ECE, NTUA Fall 2013-2014

  2. Discriminant Functions • Main Idea: Describe the decision boundary parametrically (instead of the properties of each class), e.g., the two classes are separated by a straight line a x1 + b x2 + c = 0 with parameters (a, b, c) (instead of, say, modeling the feature PDFs as 2-D Gaussians)

  3. [Figure] Example: two classes (w1, w2), two features (x1, x2). Left panel, "Model Class Boundary": the boundary a x1 + b x2 + c = 0. Right panel, "Model Class Characteristics": each class described by a Gaussian N(μi, σi) over the features.

  4. Duality • Dualism: parametric class description + Bayes classifier ↔ decision boundary + parametric discriminant functions • For example, modeling the class features by Gaussians with the same (across-class) covariance results in hyper-plane discriminant functions
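
As a minimal sketch of this duality (not part of the slides), the snippet below computes the hyper-plane weights implied by two Gaussian class models that share a covariance matrix; the means, the covariance and the equal-priors assumption are all illustrative.

```python
import numpy as np

# Sketch: two Gaussian classes with a shared covariance matrix Sigma reduce,
# via the Bayes classifier, to a linear discriminant g(x) = w.T @ x + w0.
# Equal class priors are assumed, so the prior term drops out of w0.
mu1 = np.array([1.0, 1.0])
mu2 = np.array([3.0, 2.0])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 1.0]])   # same covariance for both classes

Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)                         # normal of the hyper-plane
w0 = -0.5 * (mu1 + mu2) @ Sigma_inv @ (mu1 - mu2)   # offset (equal priors)

x = np.array([1.5, 1.0])
print("class 1" if w @ x + w0 > 0 else "class 2")   # -> class 1
```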

  5. Discriminant Functions • Discriminant functions gi(x) are functions of the feature vector x, one per class i • A sample x is classified to the class c for which gi(x) is maximal, i.e., c = argmaxi{gi(x)} • The equation gi(x) = gj(x) defines the class boundary for each pair of (different) classes i and j
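
A small sketch of the argmax rule above; the three linear discriminant functions and their weights are made up purely for illustration.

```python
import numpy as np

def g(x):
    """Return the vector of discriminant values g_i(x), one per class.
    The linear form and the weights are illustrative, not from the slides."""
    W = np.array([[ 1.0,  0.5],    # one row of weights per class
                  [-0.3,  1.2],
                  [ 0.8, -0.7]])
    w0 = np.array([0.1, -0.2, 0.0])
    return W @ x + w0

x = np.array([0.4, 1.1])
c = np.argmax(g(x))                # c = argmax_i g_i(x)
print("assigned class:", c)        # 0-based class index
```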

  6. Linear Discriminant Functions • Two-class problem: a single discriminant function is defined as g(x) = g1(x) – g2(x) • If g(x) is a linear function g(x) = wT x + w0, then the boundary is a hyper-plane (a point, line, or plane for 1-D, 2-D, 3-D features respectively)
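
A minimal sketch of the two-class rule, assuming made-up weights: the sign of g(x) = wT x + w0 decides the class, and g(x) = 0 is the boundary.

```python
import numpy as np

# Two-class linear discriminant; w corresponds to (a, b) and w0 to c in
# a x1 + b x2 + c = 0.  The numbers are illustrative only.
w = np.array([2.0, -1.0])
w0 = 0.5

def classify(x):
    return 1 if w @ x + w0 > 0 else 2   # class 1 if g(x) > 0, else class 2

print(classify(np.array([1.0, 1.0])))   # g = 2 - 1 + 0.5 = 1.5 > 0 -> 1
```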

  7. [Figure] Linear Discriminant Functions: the line a x1 + b x2 + c = 0 in the (x1, x2) plane, with the weight vector w = (a, b) normal to it; the line intersects the x1 axis at -c/a and the x2 axis at -c/b.

  8. Non-Linear Discriminant Functions • Quadratic discriminant functions g(x) = w0 + Σi wi xi + Σi Σj wij xi xj, for example for a two-class 2-D problem g(x) = a + b x1 + c x2 + d x1² • Any non-linear discriminant function can become linear by increasing the dimensionality, e.g., y1 = x1, y2 = x2, y3 = x1² (2-D non-linear → 3-D linear), giving g(y) = a + b y1 + c y2 + d y3
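
The snippet below sketches this linearization for the 2-D quadratic example on the slide; the coefficients a, b, c, d are arbitrary placeholders.

```python
import numpy as np

# Quadratic discriminant g(x) = a + b*x1 + c*x2 + d*x1**2 rewritten as a
# linear function of the lifted feature y = (x1, x2, x1**2).
a, b, c, d = -1.0, 0.5, 1.0, 2.0   # illustrative coefficients

def g_quadratic(x):
    return a + b * x[0] + c * x[1] + d * x[0] ** 2

def lift(x):
    return np.array([x[0], x[1], x[0] ** 2])   # 2-D non-linear -> 3-D linear

w = np.array([b, c, d])            # linear weights in the lifted space
x = np.array([0.7, -0.3])
assert np.isclose(g_quadratic(x), w @ lift(x) + a)   # same value either way
```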

  9. Parameter Estimation • The parameters w are estimated by functional minimization • The function J to be minimized models the average distance from the decision boundary of either • the misclassified training samples only, or • all training samples • The function J is minimized using gradient descent

  10. Gradient Descent • Iterative procedure towards a local minimum a(k+1) = a(k) − η(k) ∇J(a(k)), where k is the iteration number, η(k) is the learning rate and ∇J(a(k)) is the gradient of the function to be minimized, evaluated at a(k) • Newton descent is gradient descent with the learning rate replaced by the inverse Hessian matrix
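
A generic sketch of the update a(k+1) = a(k) − η(k) ∇J(a(k)); the quadratic objective and the 1/(1+k) learning-rate schedule are illustrative choices, not prescribed by the slides.

```python
import numpy as np

target = np.array([1.0, -2.0])     # minimizer of the toy objective below

def J(a):
    return 0.5 * np.sum((a - target) ** 2)   # toy convex objective

def grad_J(a):
    return a - target                         # its gradient

a = np.zeros(2)                    # a(0)
for k in range(100):
    eta = 1.0 / (1.0 + k)          # decreasing learning rate eta(k)
    a = a - eta * grad_J(a)        # gradient-descent step
print(a, J(a))                     # converges toward (1, -2)
```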

  11. Distance Functions • Perceptron criterion function Jp(a) = Σmisclassified (−aT y) • Relaxation with margin b: Jr(a) = Σmisclassified (aT y − b)² / ||y||² • Least mean square (LMS): Js(a) = Σall samples (aT yi − bi)² • Ho-Kashyap rule: Js(a, b) = Σall samples (aT yi − bi)², minimized over both a and the margin vector b
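
As an illustration of minimizing the perceptron criterion by gradient descent, the sketch below runs the resulting batch perceptron update on a tiny made-up training set, assuming the usual augmented-and-normalized sample convention (a leading 1 appended to each sample, class-2 samples negated).

```python
import numpy as np

# Batch perceptron: Jp(a) = sum over misclassified y of (-a.T @ y), so the
# gradient step a <- a - eta * grad Jp(a) adds eta times the sum of the
# misclassified samples.  With the convention above, y is misclassified
# iff a.T @ y <= 0.  The data is a toy, linearly separable example.
Y = np.array([[ 1.0, 2.0,  1.0],   # class 1 samples, augmented
              [ 1.0, 1.5,  2.0],
              [-1.0, 1.0, -1.0],   # class 2 samples, augmented and negated
              [-1.0, 0.5, -2.0]])

a = np.zeros(3)
eta = 1.0
for k in range(100):
    mis = Y[Y @ a <= 0]            # currently misclassified samples
    if len(mis) == 0:              # converged: training set separated
        break
    a = a + eta * mis.sum(axis=0)  # perceptron update
print(a)
```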

  12. Discriminant Functions • Working on misclassified samples only (Perceptron, Relaxation with Margin) • provides better results • but converges only for linearly separable training sets

  13. High Dimensionality • Using non-linear discriminant functions and linearizing them in a high-dimensional space • can make ANY training set separable • but requires a large number of parameters (curse of dimensionality) • Support vector machines: a smart way to select the appropriate terms (dimensions) is needed
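
A toy sketch of the separability claim: the 1-D set below cannot be split by any single threshold, but adding the quadratic term x² as a second dimension makes it linearly separable; the data and the boundary are made up.

```python
import numpy as np

x   = np.array([-2.0, -1.5, 1.5, 2.0, -0.3, 0.0, 0.4])   # 1-D samples
cls = np.array([   1,    1,   1,   1,    2,   2,   2])    # class 1 surrounds class 2

Y = np.column_stack([x, x ** 2])        # lift: 1-D -> 2-D with y2 = x**2
w, w0 = np.array([0.0, 1.0]), -1.0      # linear boundary y2 = 1, i.e. x**2 = 1

pred = np.where(Y @ w + w0 > 0, 1, 2)
print((pred == cls).all())              # True: the lifted set is separable
```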
