Instance Based Learning
Nearest Neighbor • Remember all your data • When someone asks a question • Find the nearest old data point • Return the answer associated with it • In order to say what point is nearest, we have to define what we mean by "near". • Typically, we use Euclidean distance between two points. Nominal attributes: distance is set to 1 if values are different, 0 if they are equal
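A minimal Python sketch of this rule (the names `distance` and `nearest_neighbor_predict` are illustrative, not from the slides): numeric attributes contribute squared differences, nominal attributes contribute 0 or 1, and the prediction is the answer stored with the closest old data point.

```python
from math import sqrt

def distance(x, y):
    """Mixed-attribute distance: squared difference for numeric attributes,
    0/1 mismatch for nominal (string) attributes."""
    total = 0.0
    for a, b in zip(x, y):
        if isinstance(a, str) or isinstance(b, str):
            total += 0.0 if a == b else 1.0   # nominal: 0 if equal, 1 if different
        else:
            total += (a - b) ** 2             # numeric: squared difference
    return sqrt(total)

def nearest_neighbor_predict(training, query):
    """Return the answer stored with the closest old data point.
    `training` is a list of (attributes, label) pairs."""
    _, label = min(training, key=lambda pair: distance(pair[0], query))
    return label
```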
Predicting Bankruptcy • Now, let's say we have a new person with R equal to 0.3 and L equal to 2. • What y value should we predict? • The nearest old data point is a "no", and so our answer would be "no".
Scaling • The naïve Euclidean distance isn't always appropriate. • Consider the case where we have two features describing a car. • f1 = weight in pounds • f2 = number of cylinders. • Any effect of f2 will be completely lost because of the relative scales. • So, rescale the inputs to put all of the features on about equal footing, as in the sketch below:
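One common way to do this is min-max scaling, sketched below; the `rescale` helper and the example car data are illustrative assumptions, not taken from the slides.

```python
def rescale(instances):
    """Min-max scale each column to [0, 1] so that, e.g., weight in pounds
    does not drown out number of cylinders in the distance computation."""
    columns = list(zip(*instances))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, mins, maxs))
        for row in instances
    ]

# Hypothetical cars described by (f1 = weight in pounds, f2 = number of cylinders)
cars = [(3200, 4), (4100, 8), (2800, 4), (3600, 6)]
print(rescale(cars))   # all features now lie in [0, 1]
```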
Time and Space • Learning is fast: we just have to remember the training data. • Space is O(n). • What takes longer is answering a query. • If we do it naively, we have to compute, for each of the n points in our training set, the distance to the query point (which takes about m computations, since there are m features to compare). • So, overall, each query takes about m * n time, as in the sketch below.
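The cost is easy to read off a naive implementation sketch (illustrative code, not from the slides): the outer loop runs over the n stored instances, and the inner sum runs over the m features.

```python
def naive_query(training, query):
    """Naive nearest-neighbor query: for each of the n stored instances (outer
    loop) compare all m features (inner sum), so one query costs about m * n."""
    best_label, best_dist = None, float("inf")
    for attributes, label in training:                            # n stored instances
        d = sum((a - q) ** 2 for a, q in zip(attributes, query))  # m comparisons
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label
```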
Noise • Example: someone with an apparently healthy financial record goes bankrupt. • A query that happens to fall near this point would get the wrong answer from a single nearest neighbor.
Remedy: K-Nearest Neighbors • k-nearest neighbor algorithm: • Just like the old algorithm, except that when we get a query, we search for the k closest points to the query point. • Output what the majority says. • In this case, we've chosen k to be 3. • The three closest points consist of two "no"s and a "yes", so our answer would be "no". • Find the optimal k using cross-validation, as in the sketch below.
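A sketch of k-NN with majority voting, plus a simple leave-one-out cross-validation loop for choosing k. It reuses the `distance` helper sketched earlier; all names are illustrative.

```python
from collections import Counter

def knn_predict(training, query, k=3):
    """k-nearest-neighbor prediction: take the k closest stored points and
    output the majority label (reuses the `distance` helper sketched earlier)."""
    neighbors = sorted(training, key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def choose_k(training, candidate_ks=(1, 3, 5, 7)):
    """Pick k by leave-one-out cross-validation: classify each training point
    using the remaining points and keep the k with the fewest errors."""
    def errors(k):
        return sum(
            knn_predict(training[:i] + training[i + 1:], x, k) != y
            for i, (x, y) in enumerate(training)
        )
    return min(candidate_ks, key=errors)
```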
Other Variants • IB2: save memory, speed up classification • Work incrementally • Only incorporate misclassified instances • Problem: noisy data gets incorporated • IB3: deal with noise • Discard instances that don't perform well • Keep a record of the number of correct and incorrect classification decisions that each exemplar makes. • Two predetermined thresholds are set on the success ratio. • If an exemplar's performance falls below the lower threshold, it is deleted. • If its performance exceeds the upper threshold, it is used for prediction.
Instance-based learning: IB2 • IB2: save memory, speed up classification • Work incrementally • Only incorporate misclassified instances • Problem: noisy data gets incorporated • Data: "Who buys gold jewelry" • (25,60,no) • (45,60,no) • (50,75,no) • (50,100,no) • (50,120,no) • (70,110,yes) • (85,140,yes) • (30,260,yes) • (25,400,yes) • (45,350,yes) • (50,275,yes) • (60,260,yes)
Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • (50,75,no) • (50,120,no) • (70,110,yes) • (25,400,yes) • (50,100,no) • (45,350,yes) • (50,275,yes) • (60,260,yes) • The colored points in the figure are the final answer, i.e. we memorize only these 5 points. • Below, we build up the classifier incrementally, one instance at a time.
Instance-based learning: IB2 • Data: • (25,60,no) • The memory starts empty, so the first instance is always memorized.
Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) Since so far the model has only the first instance memorized, this second instance gets wrongly classified. So, we memorize it as well.
Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • So far the model has the first two instances memorized. The third instance gets properly classified, since it happens to be closer to the first, a "no". So, we don't memorize it.
Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • So far the model has the first two instances memorized. The fourth instance gets properly classified, since it happens to be closer to the second, a "yes". So, we don't memorize it.
Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • (50,75,no) • So far the model has the first two instances memorized. The fifth instance gets properly classified, since it happens to be closer to the first, a "no". So, we don't memorize it.
Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • (50,75,no) • (50,120,no) • So far the model has the first two instances memorized. The sixth instance gets wrongly classified, since it happens to be closer to the second, a "yes", while its true label is "no". So, we memorize it.
Instance-based learning: IB2 • Continuing in the same way, we finally get the figure on the right. • The colored points are the ones that get memorized. This is the final answer, i.e. we memorize only these 5 points.
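A compact sketch of the IB2 loop on the "who buys gold jewelry" data, reusing `nearest_neighbor_predict` from the earlier sketch. Exactly which instances end up memorized depends on the distance metric and the order in which instances are presented, so the printed set may differ slightly from the figure.

```python
def ib2(stream):
    """IB2 sketch: keep an instance only if the current memory misclassifies it
    (reuses `nearest_neighbor_predict` from the earlier sketch)."""
    memory = []
    for attributes, label in stream:
        if not memory:
            memory.append((attributes, label))       # first instance is always kept
        elif nearest_neighbor_predict(memory, attributes) != label:
            memory.append((attributes, label))       # misclassified -> memorize
    return memory

# The "who buys gold jewelry" data, in the processing order used above
gold = [(25, 60, "no"), (85, 140, "yes"), (45, 60, "no"), (30, 260, "yes"),
        (50, 75, "no"), (50, 120, "no"), (70, 110, "yes"), (25, 400, "yes"),
        (50, 100, "no"), (45, 350, "yes"), (50, 275, "yes"), (60, 260, "yes")]
print(ib2([((x, y), c) for x, y, c in gold]))
```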
Instance-based learning: IB3 • IB3: deal with noise • Discard instances that don't perform well • Keep a record of the number of correct and incorrect classification decisions that each exemplar makes. • Two predetermined thresholds are set on the success ratio. • An instance is used for prediction: • if its number of incorrect classifications is at most the first (lower) threshold, and • if its number of correct classifications is at least the second (upper) threshold.
Instance-based learning: IB3 • Suppose the lower threshold is 0 and the upper threshold is 1. • Shuffle the data first • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • (50,75,no) • (50,120,no) • (70,110,yes) • (25,400,yes) • (50,100,no) • (45,350,yes) • (50,275,yes) • (60,260,yes)
Instance-based learning: IB3 • Suppose the lower threshold is 0 and the upper threshold is 1. • Shuffle the data first • The bracketed pairs record the [incorrect, correct] classification decisions made by each instance: • (25,60,no) [1,1] • (85,140,yes) [1,1] • (45,60,no) [0,1] • (30,260,yes) [0,2] • (50,75,no) [0,1] • (50,120,no) [0,1] • (70,110,yes) [0,0] • (25,400,yes) [0,1] • (50,100,no) [0,0] • (45,350,yes) [0,0] • (50,275,yes) [0,1] • (60,260,yes) [0,0]
Instance-based learning: IB3 • The points that will be used in classification are: • (45,60,no) [0,1] • (30,260,yes) [0,2] • (50,75,no) [0,1] • (50,120,no) [0,1] • (25,400,yes) [0,1] • (50,275,yes) [0,1]
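A sketch of the IB3 filtering step, assuming the bracketed pairs are read as [incorrect, correct] counts and that an exemplar is kept when its incorrect count is at most the lower threshold and its correct count is at least the upper threshold; this reading reproduces the selection above. The function name and data layout are illustrative.

```python
def ib3_filter(scored_instances, lower=0, upper=1):
    """IB3 filtering sketch: keep an exemplar for prediction only if its number
    of incorrect decisions is at most `lower` and its number of correct
    decisions is at least `upper` (assumed reading of the [incorrect, correct]
    counts above)."""
    return [(attributes, label)
            for (attributes, label, incorrect, correct) in scored_instances
            if incorrect <= lower and correct >= upper]

# e.g. (45,60,no) [0,1] is kept, while (25,60,no) [1,1] and (70,110,yes) [0,0] are not
scored = [((25, 60), "no", 1, 1), ((45, 60), "no", 0, 1), ((70, 110), "yes", 0, 0)]
print(ib3_filter(scored))   # -> [((45, 60), 'no')]
```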
Rectangular generalizations • When a new exemplar is classified correctly, it is generalized by simply merging it with the nearest exemplar. • The nearest exemplar may be either a single instance or a hyper-rectangle.
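A sketch of the merge step, assuming "merging" means growing the exemplar's bounding hyper-rectangle just enough to contain the correctly classified point; a single stored instance is treated as a degenerate rectangle. The representation and names are illustrative.

```python
def merge(exemplar, point):
    """Generalization sketch: grow the exemplar's hyper-rectangle, given as a
    (mins, maxs) pair, just enough to contain the correctly classified point.
    A single stored instance is the degenerate rectangle (point, point)."""
    mins, maxs = exemplar
    new_mins = tuple(min(lo, v) for lo, v in zip(mins, point))
    new_maxs = tuple(max(hi, v) for hi, v in zip(maxs, point))
    return (new_mins, new_maxs)

# e.g. merging the point (50, 120) into the exemplar spanning (25, 60)-(45, 60)
print(merge(((25, 60), (45, 60)), (50, 120)))   # ((25, 60), (50, 120))
```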
Rectangular generalizations • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • (50,75,no) • (50,120,no) • (70,110,yes) • (25,400,yes) • (50,100,no) • (45,350,yes) • (50,275,yes) • (60,260,yes)
Classification • If the new instance lies within a rectangle, then output that rectangle's class. • If the new instance lies in the overlap of several rectangles, then output the class of the rectangle whose center is closest to the new data instance. • If the new instance lies outside all of the rectangles, output the class of the rectangle that is closest to the data instance. • The distance of a point from a rectangle is: • if the instance lies within the rectangle, d = 0; • if outside, d = the distance from the closest part of the rectangle, i.e. from some point on the rectangle boundary. • [Figure: Class 1 and Class 2 regions with a separation line]
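A sketch of the three classification cases and of the point-to-rectangle distance, for the 2-D case. Rectangles are represented as ((x1, y1), (x2, y2)) corner pairs; all names are illustrative assumptions.

```python
def rect_distance(point, rect):
    """Distance of a 2-D point from an axis-aligned rectangle ((x1, y1), (x2, y2)):
    0 if the point lies inside, otherwise the distance to the closest boundary point."""
    (x, y) = point
    (x1, y1), (x2, y2) = rect
    dx = max(x1 - x, 0, x - x2)
    dy = max(y1 - y, 0, y - y2)
    return (dx ** 2 + dy ** 2) ** 0.5

def classify(point, rectangles):
    """`rectangles` is a list of (rect, label) pairs.
    Inside one rectangle -> its class; inside several overlapping rectangles ->
    the class of the one whose center is closest; outside all -> the class of
    the closest rectangle."""
    inside = [(rect, label) for rect, label in rectangles
              if rect_distance(point, rect) == 0]
    if len(inside) == 1:
        return inside[0][1]
    if len(inside) > 1:
        def center_dist(item):
            (x1, y1), (x2, y2) = item[0]
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            return ((point[0] - cx) ** 2 + (point[1] - cy) ** 2) ** 0.5
        return min(inside, key=center_dist)[1]
    return min(rectangles, key=lambda item: rect_distance(point, item[0]))[1]
```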