Basic Classification Which is that?
The Classification Problem • On the basis of some examples, determine the class to which a previously unobserved instance belongs • Can be viewed as a form of learning • Supervised: a teacher prescribes class composition • Unsupervised: class memberships are formed autonomously
Common Classification Methods • Template Matching • Correlation • Bayesian Classifier • Neural Networks • Fuzzy Clustering • Support Vector Machines • Principal Component Analysis • Independent Component Analysis
Template Matching • Identify or create class templates • For a given entity x • Find the distances from x to each of the class templates • Associate x with the class whose template is minimally distant • Optionally, update the class template
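As a concrete sketch of these steps (NumPy assumed; the function names and toy exemplars are illustrative, chosen so the class means come out to the m1 and m2 used in Example 2 below):

```python
import numpy as np

def make_templates(examples_by_class):
    """Create one template per class: the mean vector of its exemplars."""
    return {label: np.mean(vecs, axis=0) for label, vecs in examples_by_class.items()}

def classify(x, templates):
    """Associate x with the class whose template is minimally distant."""
    return min(templates, key=lambda label: np.linalg.norm(x - templates[label]))

# Toy exemplars in the (x1, x2) plane; means come out to <1.5, 1.8> and <4.5, 5.1>
examples = {
    1: np.array([[1.0, 1.5], [2.0, 2.1]]),
    2: np.array([[4.0, 5.0], [5.0, 5.2]]),
}
templates = make_templates(examples)
print(classify(np.array([1.2, 1.9]), templates))  # -> 1
```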
Example 1 [Scatter plot: two clusters of class exemplars in the (x1, x2) plane, with class means m1 and m2]
Example 1: Create Class Templates • Class exemplars are ordered pairs <x1, x2>, which may be written as vectors xT = <x1, x2> • The mean vectors mi are obtained by averaging the component values of the class exemplars for each class i
Example 1: Find Minimum Distance • Distance from a vector x to each class mean mi, Distancei(x) = ||x-mi|| = [(x-mi)T(x-mi)]½ • Note: [(x-mi)T(x-mi)] = xTx-xTmi-miTx+miTmi = ||x||2-2xTmi+||mi||2 = ||x||2 – 2 (xTmi – ½ ||mi||2) • xTx =||x||2 is fixed for all i • Thus, Distancei(x) is minimized when the quantity (xTmi – ½ ||mi||2) is maximized
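A quick numeric check of this equivalence (a sketch; NumPy assumed, names illustrative): minimizing Distancei(x) and maximizing the quantity (xTmi – ½ ||mi||2) select the same class.

```python
import numpy as np

m = {1: np.array([1.5, 1.8]), 2: np.array([4.5, 5.1])}  # class means (Example 2's values)
x = np.array([1.2, 1.9])

def distance(x, mi):
    return np.linalg.norm(x - mi)        # Distance_i(x) = ||x - m_i||

def g(x, mi):
    return x @ mi - 0.5 * (mi @ mi)      # the quantity x^T m_i - ½ ||m_i||^2

by_distance = min(m, key=lambda i: distance(x, m[i]))
by_discriminant = max(m, key=lambda i: g(x, m[i]))
assert by_distance == by_discriminant    # both rules choose class 1 here
```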
Example 1: The Decision Boundary • The decision boundary d with respect to classes i and j, dij(x) = Distancei(x)-Distancej(x) = 0 → • (||x||2 – 2 (xTmi – ½ ||mi||2)) – (||x||2 – 2 (xTmj – ½ ||mj||2)) = 0 → • (xTmi – ½ ||mi||2) - (xTmj – ½ ||mj||2) = 0 → • dij(x) = xT (mi-mj) - ½ (mi-mj)T (mi+mj) = 0 • Note: This is not the same as Eq. 12.2-6
Example 2: Decision Boundary • dij(x) = xT (mi-mj) - ½ (mi-mj)T (mi+mj) = 0 • (m1-m2)T = <1.5, 1.8> - <4.5, 5.1> = <-3, -3.3> • (m1+m2)T = <1.5, 1.8> + <4.5, 5.1> = <6, 6.9> • -½ (m1-m2)T (m1+m2) = 20.385 • d12(x) = <x1, x2><-3, -3.3>T + 20.385 = -3x1 - 3.3x2 + 20.385 = 0, or equivalently 3x1 + 3.3x2 - 20.385 = 0
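The arithmetic above is easy to verify numerically; a brief sketch (NumPy assumed):

```python
import numpy as np

m1, m2 = np.array([1.5, 1.8]), np.array([4.5, 5.1])
w = m1 - m2                               # <-3, -3.3>
b = -0.5 * (m1 - m2) @ (m1 + m2)          # ≈ 20.385
print(w, round(b, 3))                     # [-3.  -3.3] 20.385

def d12(x):
    return x @ w + b                      # d12(x) > 0 on class 1's side of the boundary

print(d12(m1) > 0, d12(m2) < 0)           # True True: the means fall on opposite sides
```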
Correlation • Commonly used to locate similar patterns in 1- or 2-dimensional domains (e.g., signals or images) • Identify the pattern x to which to correlate • For x • Find the correlation of x to the candidate samples • Associate x with the samples whose correlation to x is largest • Report the locations of the highly correlated samples
Computational Matters • Normalized correlation is typically computed using Pearson’s r: r = Σ(xi – x̄)(yi – ȳ) / [Σ(xi – x̄)2 Σ(yi – ȳ)2]½, where x̄ and ȳ are the means of the x and y values and the sums run over the n pairs
Notation and Interpretation • n is the number of pairs of values x and y for which the degree of correlation is to be determined • |r| ≤ 1 • r = 0, if x and y are uncorrelated • r > 0, if y increases (decreases) as x increases (decreases), i.e., x and y are positively correlated (to some degree) • r < 0, if y decreases (increases) as x increases (decreases), i.e., x and y are negatively correlated (to some degree) • To assess the relative strengths of two values r1 and r2, compare their squares: if r1 = 0.2 and r2 = 0.4, r2 indicates 4 times as strong a correlation (0.16 vs. 0.04)
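A minimal implementation of the formula above (a sketch; NumPy assumed, pearson_r is an illustrative name):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r for n paired values x and y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   #  1.0: perfectly positively correlated
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0: perfectly negatively correlated
```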
Example 4: 5x5 Grid Patterns [three pairs of 5x5 grid patterns with correlations r = 0.343, r = 0.514, and r = -1.0]
Bayesian Classifier • Optimal (minimum average classification error) when the pattern classes are Gaussian with known parameters (see the sketch below)
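A minimal sketch under the Gaussian assumption: estimate each class's mean and covariance from labeled samples, then pick the class maximizing the log posterior. All names and the synthetic data are illustrative.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate one class's mean vector and covariance matrix."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_gaussian(x, mu, cov):
    """Log of the multivariate Gaussian density at x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.inv(cov) @ d + logdet + len(x) * np.log(2 * np.pi))

def bayes_classify(x, params, priors):
    """Choose the class maximizing log p(x | class) + log P(class)."""
    return max(params, key=lambda c: log_gaussian(x, *params[c]) + np.log(priors[c]))

rng = np.random.default_rng(0)
X1 = rng.normal([1.5, 1.8], 0.5, size=(50, 2))   # synthetic Gaussian class samples
X2 = rng.normal([4.5, 5.1], 0.5, size=(50, 2))
params = {1: fit_gaussian(X1), 2: fit_gaussian(X2)}
priors = {1: 0.5, 2: 0.5}
print(bayes_classify(np.array([2.0, 2.0]), params, priors))  # -> 1
```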
Fuzzy Classifiers • Jang, Sun, and Mizutani, Neuro-Fuzzy and Soft Computing • Fuzzy C-Means (FCM)
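A compact sketch of the standard FCM iteration (centers cj = Σi uijm xi / Σi uijm, with memberships uij recomputed from inverse distances, as described in Jang, Sun, and Mizutani); parameter names and the toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fcm(X, c, m=2.0, iters=100, eps=1e-6):
    """Fuzzy C-Means: returns cluster centers and soft memberships u[i, j]."""
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)              # each sample's memberships sum to 1
    for _ in range(iters):
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = np.fmax(dist, eps) ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(u_new - u).max() < eps:          # stop when memberships stabilize
            break
        u = u_new
    return centers, u

# Two loose clusters; each row of u gives graded memberships, not a hard label
X = np.vstack([rng.normal([1.5, 1.8], 0.4, (30, 2)),
               rng.normal([4.5, 5.1], 0.4, (30, 2))])
centers, u = fcm(X, c=2)
print(np.round(centers, 2))
```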
Neural Networks • Feedforward networks and the backpropagation training algorithm • Adaptive resonance theory • Kohonen networks