Instance-Based Learning (IBL) involves storing training data and using similar instances for classification. Explore k-Nearest Neighbor, Distance-Weighted Nearest Neighbor, Lazy vs Eager Learning, and other IBL methods. Understand advantages, disadvantages, and applications of IBL in this informative presentation.

Presentation Transcript

  1. KDD Group Presentation Instance-Based Learning Wednesday, November 15, 2000 Cecil P. Schmidt Department of Computer Information Sciences, KSU http://www.cis.ksu.edu/~cps4444

  2. Presentation Outline • What is Instance Based Learning? • k-Nearest Neighbor Learning • Distance-Weighted Nearest Neighbor Algorithm • Other IBL Methods • Locally Weighted Regression • Radial Basis Functions • Case-Based Reasoning • Lazy Versus Eager Learning • Research Opportunities in IBL • Summary • Bibliography

  3. What is Instance Based Learning? • Description • Instance-Based Learning (IBL) methods initially store the presented training data. • Upon encountering a new query, a set of similar, related instances is retrieved and used for classification. • Differences between IBL methods and others • local versus global approximation of the target function • a distinct approximation to the target function can be built for each query instance • Advantages • a complex global approximation becomes a set of simpler local approximations • instance points can be complex, symbolic representations • Disadvantages • Nearly all computation takes place at classification time • Classification typically considers all attributes of the query instance, even though only some may be relevant, which can cause misclassification.

  4. k-Nearest Neighbor Learning • Most basic of IBL Methods • Assumes all instances correspond to points in the n-dimensional space $\mathbb{R}^n$ • Nearest neighbors of a query instance are defined in terms of standard Euclidean distance. • Distance d definition • let an arbitrary instance x be described by the feature vector $\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle$, where $a_r(x)$ denotes the value of the rth attribute of instance x. The distance between $x_i$ and $x_j$ is $d(x_i, x_j)$, where $d(x_i, x_j) \equiv \sqrt{\sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2}$ • Distance Calculation Example • find distance between vector $\langle 2,2,4 \rangle$ and vector $\langle 2,2,2 \rangle$ • $\sqrt{(2-2)^2 + (2-2)^2 + (4-2)^2} = 2$
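
The distance definition on this slide can be sketched in a few lines of Python. The function name `euclidean_distance` is chosen here for illustration and does not come from the presentation; instances are assumed to be equal-length tuples of numbers.

```python
import math

def euclidean_distance(xi, xj):
    """Euclidean distance between two equal-length attribute vectors."""
    if len(xi) != len(xj):
        raise ValueError("instances must have the same number of attributes")
    # Sum the squared attribute differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# The slide's example: vectors <2,2,4> and <2,2,2>.
print(euclidean_distance((2, 2, 4), (2, 2, 2)))  # 2.0
```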

  5. k-Nearest Neighbor Learning Algorithm: Discrete-Valued Target Function • The algorithm consists of two parts: a training part and a classification part. • Training Algorithm • For each training example $\langle x, f(x) \rangle$, add the example to the list training_examples • Classification Algorithm • Given a query instance $x_q$ to be classified • Let $x_1 \ldots x_k$ denote the k instances from training_examples that are nearest to $x_q$ • Return $\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$, where V is the set of possible target values, $\delta(a,b) = 1$ if $a = b$ and $\delta(a,b) = 0$ otherwise
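
A minimal sketch of the two-part algorithm, assuming instances are numeric tuples and labels are strings. Training amounts to keeping the list, and classification is a majority vote over the k nearest stored examples. The function names and the sample data are hypothetical and only illustrate the idea.

```python
import math
from collections import Counter

def euclidean_distance(xi, xj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def knn_classify(training_examples, xq, k=1):
    """Return the most common label among the k stored examples nearest xq.

    training_examples is a list of (attribute_vector, label) pairs.
    """
    if not 1 <= k <= len(training_examples):
        raise ValueError("k must be between 1 and the number of training examples")
    # Sort stored examples by distance to the query and keep the k closest.
    nearest = sorted(training_examples,
                     key=lambda ex: euclidean_distance(ex[0], xq))[:k]
    # Majority vote; on a tie, Counter keeps first-seen order, so the label
    # of the nearer neighbor wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# "Training": just store the examples (hypothetical data).
training_examples = [
    ((1.0, 1.0), "+"), ((1.5, 2.0), "+"),
    ((5.0, 5.0), "-"), ((6.0, 5.5), "-"), ((5.5, 6.5), "-"),
]

print(knn_classify(training_examples, (2.0, 2.0), k=3))  # +
print(knn_classify(training_examples, (5.0, 6.0), k=3))  # -
```

Sorting every stored example on each query keeps the sketch short; it also makes the cost at classification time visible, which is the disadvantage noted on slide 3.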

  6. k-Nearest Neighbor Learning Algorithm: Discrete-Valued Target Function Classification Example • Classification Example • Assume the following training_examples • $x_1 = \langle 1,2,7,8,+ \rangle$, $x_2 = \langle 1,3,5,6,- \rangle$ • Classify $x_3 = \langle 1,2,7,6 \rangle$ • (Step 1) Compute the distance from $x_3$ to each point in training_examples: $d(x_1, x_3) = \sqrt{0+0+0+4} = 2$ and $d(x_2, x_3) = \sqrt{0+1+4+0} = \sqrt{5} \approx 2.24$ • (Step 2) Classify $x_3$ based on the nearest k points; with k = 1 we classify $x_3$ as + since it is nearest to $x_1$
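
The arithmetic in this example is easy to check by machine. Using the Euclidean distance defined on slide 4, the sketch below gives 2 for x1 and sqrt(5), about 2.24, for x2, so a 1-nearest-neighbor vote returns the label of x1. The variable names are illustrative only.

```python
import math

# Training examples from the slide, as (attribute vector, label) pairs.
x1 = ((1, 2, 7, 8), "+")
x2 = ((1, 3, 5, 6), "-")
x3 = (1, 2, 7, 6)  # query instance

def euclidean_distance(xi, xj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

d1 = euclidean_distance(x1[0], x3)  # sqrt(0 + 0 + 0 + 4)
d2 = euclidean_distance(x2[0], x3)  # sqrt(0 + 1 + 4 + 0)

# With k = 1, the query takes the label of the single nearest example.
nearest_label = min([x1, x2], key=lambda ex: euclidean_distance(ex[0], x3))[1]
print(d1, round(d2, 2), nearest_label)  # 2.0 2.24 +
```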

  7. k-Nearest Neighbor: Real-Valued Target Function • Used to approximate a continuous-valued target function • Calculates the mean value of the k nearest training examples rather than their most common value. • Replaces the final line of the discrete-valued algorithm with the following: • $\hat{f}(x_q) \leftarrow \frac{1}{k} \sum_{i=1}^{k} f(x_i)$ • Example • Given training_examples $(x_1, 1), (x_2, 1), (x_3, 0)$ • We wish to estimate the target value for a query $x_4$ using 2-nearest neighbors • $d(x_1,x_4) = 5$, $d(x_2,x_4) = 2$, $d(x_3,x_4) = 4$ • therefore $x_4$ is nearest $x_2$ and $x_3$ • taking the mean we get $(1+0)/2 = 0.5$
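
A short sketch of the real-valued variant. The slide gives only distances, not coordinates, so the one-dimensional points below are hypothetical values picked to reproduce those distances (5, 2 and 4 from the query). `math.dist` requires Python 3.8 or later.

```python
import math

def knn_regress(training_examples, xq, k):
    """Return the mean target value of the k stored examples nearest xq."""
    if not 1 <= k <= len(training_examples):
        raise ValueError("k must be between 1 and the number of training examples")
    nearest = sorted(training_examples,
                     key=lambda ex: math.dist(ex[0], xq))[:k]
    return sum(value for _, value in nearest) / k

# Hypothetical 1-D coordinates chosen so that the distances to x4 match
# the slide: d(x1,x4) = 5, d(x2,x4) = 2, d(x3,x4) = 4.
x4 = (0.0,)
training_examples = [
    ((5.0,), 1.0),   # x1
    ((2.0,), 1.0),   # x2
    ((-4.0,), 0.0),  # x3
]

print(knn_regress(training_examples, x4, k=2))  # 0.5
```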

  8. Distance-weighted Nearest Neighbor Algorithm • Weights the contribution of each of the k neighbors according to its distance to the query point $x_q$, giving greater weight to closer neighbors • Example • weight each neighbor according to the inverse square of its distance from $x_q$: $\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$ where $w_i \equiv 1 / d(x_q, x_i)^2$ • if $x_q$ exactly matches some $x_i$, the denominator is zero, so assign $\hat{f}(x_q) = f(x_i)$ • The real-valued version normalizes by the sum of the weights: $\hat{f}(x_q) \leftarrow \sum_{i=1}^{k} w_i f(x_i) \, / \, \sum_{i=1}^{k} w_i$ • With distance weighting, all training examples can be used to influence the classification of $x_q$, because distant examples have very little effect • Classifier will run more slowly • Referred to as a global method when all training points are used • Referred to as a local method when only the k nearest training examples are used
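
The weighted vote and the normalized weighted mean can be sketched as follows. The function names and data are hypothetical. One assumption deserves attention: when the query coincides with a stored instance the inverse-square weight is undefined, and this sketch returns that instance's stored target directly, which is how Mitchell's text handles a zero distance. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import defaultdict

def _nearest(training_examples, xq, k):
    """Return (distance, target) pairs for the k stored examples nearest xq."""
    scored = [(math.dist(x, xq), fx) for x, fx in training_examples]
    return sorted(scored, key=lambda pair: pair[0])[:k]

def weighted_knn_classify(training_examples, xq, k):
    votes = defaultdict(float)
    for d, label in _nearest(training_examples, xq, k):
        if d == 0.0:
            return label           # exact match: use the stored label
        votes[label] += 1.0 / d ** 2
    return max(votes, key=votes.get)

def weighted_knn_regress(training_examples, xq, k):
    numerator = denominator = 0.0
    for d, value in _nearest(training_examples, xq, k):
        if d == 0.0:
            return value           # exact match: use the stored value
        w = 1.0 / d ** 2
        numerator += w * value
        denominator += w
    return numerator / denominator

# Hypothetical 1-D data: one close "+" example and two farther "-" examples.
labeled = [((1.0,), "+"), ((4.0,), "-"), ((5.0,), "-")]
valued = [((1.0,), 1.0), ((4.0,), 0.0), ((5.0,), 0.0)]
xq = (2.0,)

# Distances are 1, 2, 3, so the weights are 1, 1/4, 1/9.
print(weighted_knn_classify(labeled, xq, k=3))  # +
print(weighted_knn_regress(valued, xq, k=3))    # 36/49, about 0.735
```

With these points an unweighted 3-nearest-neighbor vote would return "-" (two votes to one), so the example shows how weighting lets a single close neighbor outvote two distant ones. Setting k to the number of stored examples gives the global variant described on the slide.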

  9. Other IBL Methods • Locally Weighted Regression • Constructs an explicit approximation of f over a local region surrounding xq. • Uses nearby or distance-weighted training examples to form the local approximation • "local" refers to the fact that only data near the query point is used • "weighted" refers to the fact that the contribution of each training example is weighted by its distance from the query point • "regression" is the term widely used in the statistical learning community for approximating real-valued functions. • Radial Basis Functions • Closely related to distance-weighted regression and ANNs • Provide a global approximation to the target function, represented by a linear combination of many local kernel functions (a kernel function K is a function of distance used to determine the weight of each training example) • Case-Based Reasoning • Uses rich symbolic descriptions to represent instances, rather than real-valued points in an n-dimensional Euclidean space. • Requires more elaborate retrieval of similar instances (nearest neighbors) • Applied to conceptual design problems by retrieving similar designs
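
One possible reading of locally weighted regression, sketched for a single input attribute: for each query, fit a straight line by weighted least squares, with weights from a kernel function of distance to the query, and evaluate that line at the query. The Gaussian kernel, the `width` parameter, the closed-form fit, and all names below are assumptions of this sketch rather than details from the presentation.

```python
import math

def gaussian_kernel(d, width):
    """Weight that decays smoothly as the distance d grows."""
    return math.exp(-(d ** 2) / (2 * width ** 2))

def locally_weighted_regression(xs, ys, xq, width=1.0):
    """Predict f(xq) from a line fitted to kernel-weighted training data."""
    weights = [gaussian_kernel(abs(x - xq), width) for x in xs]
    total = sum(weights)
    # Weighted means of the inputs and the targets.
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted variance and covariance give the slope of the local line.
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(weights, xs, ys))
    if sxx == 0.0:
        return y_mean  # all weight on one input value: fall back to the mean
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * xq

# Hypothetical data from a nonlinear target, f(x) = x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]

# A new line is fitted for every query, so each query gets its own
# local approximation of the target function.
for xq in (0.5, 2.0, 3.5):
    print(xq, locally_weighted_regression(xs, ys, xq, width=1.0))
```

A smaller `width` makes the fit more local, and a very large one approaches an ordinary global least-squares line.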

  10. Lazy Versus Eager Learning • Lazy Learning • Defers the decision of how to generalize beyond the training data until each new query instance is encountered • examples include k-nearest neighbor, locally weighted regression, and case-based reasoning • Eager Learning • Generalizes beyond the training data before observing the new query • examples include decision tree learning algorithms such as ID3, and ANNs • Differences • Computation Time • Lazy methods generally require less computation during training • Lazy methods require more computation time during classification • Classification Differences • Lazy methods may consider the query instance xq when deciding how to generalize beyond the training data D • Eager methods have already chosen their (global) approximation to the target function by the time the query arrives. • A lazy learner has the option of representing the target function by a combination of many local approximations, whereas eager methods must commit to a single approximation at training time.

  11. Research Opportunities In IBL • Improved methods for indexing instances which may be rich relational descriptions (CBR) • Development of eager methods which employ multiple local approximations to achieve similar effects as lazy learning methods but reduce computation at classification time. (RBF learning attempts this) • Applications (Can you think of some others?) • Finding similar software patterns in currently implemented software or in analysis or design models. • Matching database query to most likely instances

  12. Summary • Nearest neighbor algorithms are examples of IBL methods which delay much of the classification computation until classification time. • These algorithms use local approximation versus global approximation which can have the effect of having unique target functions for each query instance. • Ability to model complex target functions by a collection of less complex local approximations • Information present in training examples is never lost • Distance metric can become misleading as all attributes are considered • Employ lazy learning versus eager learning techniques

  13. Bibliography • Primary material for this presentation came from: • Mitchell, T. (1997). Machine Learning. MIT Press and The McGraw-Hill Companies, Inc., Boston, Mass.

