CS26110 AI Toolbox: k-NN classification
Classification problem • Decide which class a particular pattern belongs to on the basis of a set of measurements • Typically, the data tells us something about the patterns in each class • We need to construct a classifier from the training data that will label a previously unseen pattern • This often requires calculating distances in pattern space (Euclidean distance, etc.)
Classification problem • Input features: word frequencies, e.g. {(campaigning, 1), (democrats, 2), (basketball, 0), …} • Class label: 'Politics' or 'Sport' • Example document: "Months of campaigning and weeks of round-the-clock efforts in Iowa all came down to a final push Sunday, …" • Topic: Politics or Sport?
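As a small illustration of how a word-frequency feature vector like the one above could be built, here is a sketch in Python. The vocabulary and the quoted sentence are taken from this slide; the counts produced for this short snippet need not match the example numbers above.

```python
from collections import Counter

doc = ("Months of campaigning and weeks of round-the-clock efforts "
       "in Iowa all came down to a final push Sunday")

# Count how often each vocabulary word appears in the document
vocabulary = ['campaigning', 'democrats', 'basketball']
counts = Counter(doc.lower().split())
features = {word: counts[word] for word in vocabulary}
print(features)   # {'campaigning': 1, 'democrats': 0, 'basketball': 0}
```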
Training/test data • [Figure: table of training examples (x2, …, x6) with attributes such as Headache and their class labels f(x)] • The class attribute = decision/diagnosis = target function • For a new test example x7, we must predict f(x7) = ?
Applications... • Disease diagnosis • x: Properties of the patient (symptoms, lab tests) • f: Disease (or maybe: recommended therapy) • Part-of-speech tagging • x: An English sentence (e.g. The can will rust) • f: The part of speech of a word in the sentence • Face recognition • x: Bitmap picture of a person's face • f: The name/identity of the person
Instance-based learning • One way of approximating discrete- or real-valued target functions • We have training data points (xn, f(xn)), n = 1…N • Key idea: just store the training examples • When a test example is given, find the closest match or matches
Nearest neighbour algorithm • 1-NN: given a test instance xm, first locate the nearest training example xn, then set f(xm) := f(xn) • k-NN: given a test instance xm, first locate the k nearest training examples • If the target function is discrete, take a vote among the k nearest neighbours; otherwise take the mean of their f values (prediction)
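A minimal sketch of the 1-NN/k-NN procedure just described (not code from the course): the feature vectors, class names, and query points below are made up purely for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(training, query, k=1):
    """training: list of (feature_vector, label); query: feature vector."""
    # Locate the k training examples closest to the query point
    neighbours = sorted(training, key=lambda ex: euclidean(ex[0], query))[:k]
    labels = [label for _, label in neighbours]
    # Discrete target: majority vote among the k nearest neighbours
    return Counter(labels).most_common(1)[0][0]

# Illustrative two-feature data set with two classes
train = [((1.0, 2.0), 'square'), ((2.0, 1.0), 'square'),
         ((5.0, 5.0), 'triangle'), ((6.0, 5.0), 'triangle')]
print(knn_predict(train, (1.5, 1.5), k=1))   # -> 'square'
print(knn_predict(train, (4.0, 4.0), k=3))   # -> 'triangle'
```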
k-NN • Choosing the value of k: • If k is too small, the classifier is sensitive to noisy points • If k is too large, the neighbourhood may include points from other classes • For two-class problems, choose an odd value of k to eliminate ties • Example [figure: a query point '?' among squares and triangles]: k = 1 → square class; k = 3 → triangle class; k = 7 → square class
When to consider k-NN? • Not more than ~20 dimensions (features) per instance • Lots of training data • Advantages: • Training is very fast • Can learn complex target functions • No information is lost (all training examples are stored) • Can handle outliers if k is sufficiently high • Disadvantages: • ? (we will see them shortly…)
Geometric interpretation of 1-NN • [Figure: training points of classes 1 and 2 plotted against Feature 1 and Feature 2]
Regions for 1-NN • Each data point defines a "cell" of space that is closer to it than to any other data point; all test points within that cell are assigned that point's class • [Figure: the same data with the cell around each training point]
1-NN decision boundary • Overall decision boundary = union of the cell boundaries where the class decision differs on each side • [Figure: the resulting decision boundary between the class 1 and class 2 regions]
k-NN for speech synthesis • http://www.youtube.com/watch?v=PB4qATziTlQ
Mondrian paintings • [Figure: five Mondrian paintings labelled one to five, plus a sixth painting whose class is to be predicted]
Perform 3-NN • Euclidean distance: d(xm, xi) = √( Σa (xm(a) − xi(a))² )
Data 1&6: √( 4 + 0 + 1 + 0) = √5 (No) 2&6: √( 0 + 1 + 9 + 1) = √11 (No) 3&6: √( 1 + 1 + 16 + 0) = √18 (Yes) 4&6: √( 1 + 0 + 9 + 0) = √10 (Yes) 5&6: √( 1 + 0 + 1 + 1) = √3 (No) 5&6: √3 (No) 1&6: √5 (No) 4&6: √10 (Yes) 2&6: √11 (No) 3&6: √18 (Yes) 1-NN: No 2-NN: No 3-NN: No
Real-valued classes • In the Mondrian case we were trying to predict a yes/no answer for each new test instance • There are many problems where we want a real number as the answer • For real-valued targets, the k-NN algorithm predicts the mean of the f values of the k nearest neighbours
Real-valued class example • Given a set of CPU configurations, predict the performance of a new configuration • (The performance could be in some arbitrary units)
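A sketch of real-valued (regression) k-NN, under stated assumptions: the CPU features (clock speed, cache size, core count) and performance numbers below are hypothetical, since the original table is not reproduced here. In practice the features would normally be rescaled before computing distances.

```python
import math

def knn_regress(training, query, k=3):
    """training: list of (feature_vector, value). Returns the mean of the
    k nearest neighbours' values (the real-valued k-NN prediction)."""
    neighbours = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    return sum(value for _, value in neighbours) / k

# Hypothetical CPU configurations: (clock GHz, cache MB, cores) -> performance
cpus = [((2.0, 4, 2), 55.0), ((2.6, 8, 4), 78.0),
        ((3.2, 8, 4), 92.0), ((3.6, 16, 8), 120.0)]
print(knn_regress(cpus, (3.0, 8, 4), k=3))   # mean of the 3 closest performances
```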
k-NN algorithm • Theoretical considerations: • As k increases, we are averaging over more neighbours, so the effective decision boundary becomes smoother • As n increases, the optimal k value tends to increase in proportion to log n • E.g. 100 data points, k = 2
A problem... • [Figure: a query point '?' among the training data] • What is the most likely class?
Distance-weighted k-NN • We might want to weight nearer neighbours more heavily • Then it makes sense to use all training examples instead of just k
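A sketch of one common distance-weighted variant, weighting each neighbour's vote by the inverse square of its distance; this particular weighting scheme is an assumption for illustration, not necessarily the one the slides intend.

```python
import math
from collections import defaultdict

def weighted_knn_predict(training, query, k=None):
    """Distance-weighted vote; with k=None, all training examples are used."""
    scored = sorted(training, key=lambda ex: math.dist(ex[0], query))
    if k is not None:
        scored = scored[:k]
    votes = defaultdict(float)
    for features, label in scored:
        d = math.dist(features, query)
        if d == 0:
            return label               # exact match: return its class outright
        votes[label] += 1.0 / d ** 2   # nearer neighbours get larger weights
    return max(votes, key=votes.get)

train = [((1.0, 1.0), 'square'), ((1.5, 1.0), 'square'),
         ((6.0, 6.0), 'triangle'), ((7.0, 6.0), 'triangle'), ((6.5, 7.0), 'triangle')]
print(weighted_knn_predict(train, (2.0, 1.5)))   # near the squares -> 'square'
```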
Difficulties with k-NN algorithms • We have to calculate the distance of the test case from every training case • Some of the attributes may be irrelevant, so we may have to employ attribute (feature) selection beforehand
What to take away • What classification involves • Be able to apply k-NN to data and interpret the results
Resources • k-NN applet • http://www.theparticle.com/applets/ml/nearest_neighbor/ • More information on the algorithm • http://saravananthirumuruganathan.wordpress.com/2010/05/17/a-detailed-introduction-to-k-nearest-neighbor-knn-algorithm/