Discover the concept and methods of Instance-Based Learning (IBL) for classification tasks, including k-Nearest Neighbor method, distance-weighted techniques, and other IBL approaches like Radial Basis Functions. Learn about the advantages, disadvantages, and practical examples of IBL algorithms. Explore how IBL differs from traditional learning methods and the research opportunities in this field.
KDD Group Presentation Instance-Based Learning Wednesday, November 15, 2000 Cecil P. Schmidt Department of Computer Information Sciences, KSU http://www.cis.ksu.edu/~cps4444
Presentation Outline • What is Instance-Based Learning? • k-Nearest Neighbor Learning • Distance-Weighted Nearest Neighbor Algorithm • Other IBL Methods • Locally Weighted Regression • Radial Basis Functions • Case-Based Reasoning • Lazy Versus Eager Learning • Research Opportunities in IBL • Summary • Bibliography
What is Instance-Based Learning? • Description • Instance-Based Learning (IBL) methods initially store the presented training data. • Upon encountering a new query, a set of similar, related instances is retrieved and used for classification. • Differences between IBL methods and others • local versus global approximation of the target function • a distinct approximation to the target function for each distinct query instance • Advantages • a complex global approximation is replaced by simpler local approximations • instance points can be complex, symbolic representations • Disadvantages • Nearly all computation takes place at classification time • Classification typically considers all attributes of the query instance, even though some may be irrelevant, which can cause misclassification.
k-Nearest Neighbor Learning • Most basic of IBL methods • Assumes all instances correspond to points in the n-dimensional space $\mathbb{R}^n$ • Nearest neighbors of a query instance are defined in terms of standard Euclidean distance. • Distance d definition • Let an arbitrary instance x be described by the feature vector $\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle$, where $a_r(x)$ denotes the value of the rth attribute of instance x. The distance between $x_i$ and $x_j$ is $d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2}$ • Distance Calculation Example • Find the distance between vector $\langle 2,2,4 \rangle$ and vector $\langle 2,2,2 \rangle$ • $\sqrt{(2-2)^2 + (2-2)^2 + (4-2)^2} = 2$
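The distance definition on this slide can be sketched in a few lines of Python. The function name `euclidean_distance` is an illustrative choice, not something from the presentation, and instances are assumed to be plain tuples of numbers.

```python
import math

def euclidean_distance(xi, xj):
    """Standard Euclidean distance between two equal-length feature vectors."""
    if len(xi) != len(xj):
        raise ValueError("feature vectors must have the same length")
    # Sum the squared attribute differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# The slide's example: distance between <2,2,4> and <2,2,2>.
print(euclidean_distance((2, 2, 4), (2, 2, 2)))  # 2.0
```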
k-Nearest Neighbor Learning Algorithm: Discrete-Valued Target Function • The algorithm consists of two parts: a training part and a classification part. • Training Algorithm • For each training example $\langle x, f(x) \rangle$, add the example to the list training_examples • Classification Algorithm • Given a query instance xq to be classified • Let x1 … xk denote the k instances from training_examples that are nearest to xq • Return $\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$, where V is the finite set of target values, $\delta(a,b) = 1$ if $a = b$, and $\delta(a,b) = 0$ otherwise
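The two parts of the algorithm can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function names, the tuple-based data layout, and the small two-dimensional data set are all hypothetical. Ties in the vote are broken here in favor of the label seen first among the nearest neighbors, which is one possible convention.

```python
import math
from collections import Counter

def euclidean_distance(xi, xj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def train(examples):
    """Training part: just store every (x, f(x)) pair."""
    return list(examples)

def knn_classify(training_examples, xq, k):
    """Classification part: majority vote among the k stored instances nearest to xq."""
    nearest = sorted(training_examples,
                     key=lambda ex: euclidean_distance(ex[0], xq))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: two "+" points near the origin, three "-" points farther away.
training_examples = train([
    ((1.0, 1.0), "+"), ((1.5, 2.0), "+"),
    ((5.0, 5.0), "-"), ((6.0, 5.5), "-"), ((5.5, 6.5), "-"),
])
print(knn_classify(training_examples, (1.2, 1.4), k=3))  # +
print(knn_classify(training_examples, (5.4, 5.6), k=3))  # -
```

Note that `train` does no generalization at all; every distance is computed inside `knn_classify`, at query time.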
k-Nearest Neighbor Learning Algorithm: Discrete-Valued Target Function Classification Example • Assume the following training_examples • $x_1 = \langle 1,2,7,8,+ \rangle$, $x_2 = \langle 1,3,5,6,- \rangle$ • Classify $x_3 = \langle 1,2,7,6 \rangle$ (Step 1) Compute the distance from x3 to each training example: $d(x_1, x_3) = \sqrt{0+0+0+4} = 2$ and $d(x_2, x_3) = \sqrt{0+1+4+0} = \sqrt{5} \approx 2.24$ (Step 2) Classify x3 based on the nearest k points; with k = 1 we classify x3 as + since it is nearest x1
k-Nearest Neighbor: Real-Valued Target Function • Used to approximate a continuous-valued target function • Calculates the mean value of the k nearest training examples rather than their most common value. • Replaces the final line of the discrete-valued algorithm with the following: • $\hat{f}(x_q) \leftarrow \frac{1}{k} \sum_{i=1}^{k} f(x_i)$ • Example • Given training_examples (x1,1), (x2,1), (x3,0) • We wish to estimate the value at x4 using its 2 nearest neighbors • d(x1,x4) = 5, d(x2,x4) = 2, d(x3,x4) = 4 • Therefore x4 is nearest x2 and x3 • Taking the mean of their values we get (1+0)/2 = 0.5
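The real-valued variant can be sketched directly from the example on this slide. As a simplifying assumption, the function below takes precomputed (distance, value) pairs rather than raw feature vectors; the name `knn_regress` is illustrative.

```python
def knn_regress(neighbors, k):
    """Mean target value of the k nearest neighbors.

    neighbors: list of (distance_to_query, f(x)) pairs.
    """
    if not neighbors or k < 1:
        raise ValueError("need at least one neighbor and k >= 1")
    nearest = sorted(neighbors, key=lambda pair: pair[0])[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Slide example: d(x1,x4)=5 with f=1, d(x2,x4)=2 with f=1, d(x3,x4)=4 with f=0.
neighbors = [(5, 1), (2, 1), (4, 0)]
print(knn_regress(neighbors, k=2))  # 0.5
```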
Distance-Weighted Nearest Neighbor Algorithm • Weights the contribution of each of the k neighbors according to its distance to the query point xq, giving greater weight to closer neighbors • Example • Weight each neighbor according to the inverse square of its distance from xq: $\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$, where $w_i = 1 / d(x_q, x_i)^2$ • If xq exactly matches some xi, the denominator is zero, so $\hat{f}(x_q)$ is assigned $f(x_i)$ directly • The real-valued version normalizes the contributions of the weights: $\hat{f}(x_q) \leftarrow \sum_{i=1}^{k} w_i f(x_i) \, / \, \sum_{i=1}^{k} w_i$ • With distance weighting, all training examples can be used to influence the classification of xq, since very distant examples have little effect • Using all examples makes the classifier run more slowly • Referred to as a global method when all training points are used • Referred to as a local method when only the k nearest training examples are used
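Both distance-weighted forms can be sketched as below, using inverse-square weights. The function names and the (distance, value) input layout are illustrative assumptions. An exact match (distance zero) is handled here by returning that neighbor's stored value, which is one way to avoid dividing by zero.

```python
from collections import defaultdict

def _k_nearest(neighbors, k):
    """neighbors: list of (distance_to_query, f(x)) pairs."""
    return sorted(neighbors, key=lambda pair: pair[0])[:k]

def weighted_vote(neighbors, k):
    """Discrete-valued case: each neighbor votes with weight 1/d^2."""
    nearest = _k_nearest(neighbors, k)
    scores = defaultdict(float)
    for d, label in nearest:
        if d == 0:
            return label  # exact match: use its label directly
        scores[label] += 1.0 / d ** 2
    return max(scores, key=scores.get)

def weighted_mean(neighbors, k):
    """Real-valued case: weighted average, normalized by the sum of weights."""
    nearest = _k_nearest(neighbors, k)
    for d, value in nearest:
        if d == 0:
            return value  # exact match: use its value directly
    weights = [1.0 / d ** 2 for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

# One close "+" outweighs two farther "-" neighbors (weights 1.0 vs 0.25 + 0.25).
print(weighted_vote([(1.0, "+"), (2.0, "-"), (2.0, "-")], k=3))  # +
```

In the usage line, an unweighted majority vote over the same three neighbors would return "-", which shows how weighting changes the outcome.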
Other IBL Methods • Locally Weighted Regression • Constructs an explicit approximation of f over a local region surrounding xq. • Uses nearby or distance-weighted training examples to form the local approximation • "local" refers to the fact that only data near the query point is used • "weighted" refers to the fact that the contribution of each training example is weighted by its distance from the query point • "regression" is the term used widely in the statistical learning community for approximating real-valued functions. • Radial Basis Functions • Closely related to distance-weighted regression and to artificial neural networks (ANNs) • Provide a global approximation to the target function, represented by a linear combination of many local kernel functions (a kernel function K is a function of distance used to determine the weight of each training example) • Case-Based Reasoning • Uses rich symbolic descriptions to represent instances, rather than real-valued points in an n-dimensional Euclidean space. • Requires more elaborate retrieval of similar instances (nearest neighbors) • Applied to conceptual design problems, e.g. retrieving similar designs
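Locally weighted regression can be illustrated with a one-dimensional sketch. The choices below are assumptions made for the example, not details from the presentation: a Gaussian kernel of the distance with a bandwidth parameter `tau`, a local linear model, and the closed-form weighted least-squares fit. A new fit is computed for every query point.

```python
import math

def locally_weighted_prediction(data, xq, tau=1.0):
    """Fit a line around xq by weighted least squares and evaluate it at xq.

    data: list of (x, y) pairs with scalar x; tau: assumed kernel bandwidth.
    """
    # Kernel weight: training points near xq count more than distant ones.
    w = [math.exp(-((x - xq) ** 2) / (2.0 * tau ** 2)) for x, _ in data]
    sw = sum(w)
    if sw == 0:
        raise ValueError("all kernel weights underflowed; increase tau")
    sx = sum(wi * x for wi, (x, _) in zip(w, data))
    sy = sum(wi * y for wi, (_, y) in zip(w, data))
    sxx = sum(wi * x * x for wi, (x, _) in zip(w, data))
    sxy = sum(wi * x * y for wi, (x, y) in zip(w, data))
    denom = sw * sxx - sx ** 2
    if denom == 0:
        return sy / sw  # degenerate case: fall back to the weighted mean
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw
    return intercept + slope * xq

# On exactly linear data (y = 2x + 1) the local fit recovers the line.
line = [(x, 2 * x + 1) for x in range(6)]
print(round(locally_weighted_prediction(line, 2.5), 6))  # 6.0
```

A smaller `tau` makes the fit more local; a very large `tau` approaches an ordinary global linear regression.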
Lazy Versus Eager Learning • Lazy Learning • Defers the decision of how to generalize beyond the training data until each new query instance is encountered • Examples include k-nearest neighbor, locally weighted regression, and case-based reasoning • Eager Learning • Generalizes beyond the training data before observing the new query • Examples include decision tree learning algorithms such as ID3, and ANNs • Differences • Computation Time • Lazy methods generally require less computation during training • Lazy methods require more computation time during classification • Classification Differences • Lazy methods may consider the query instance xq when deciding how to generalize beyond the training data D • Eager methods have already chosen their (global) approximation to the target function. • A lazy learner has the option of representing the target function by a combination of many local approximations, whereas an eager method must commit to a single approximation at training time.
Research Opportunities in IBL • Improved methods for indexing instances, which may be rich relational descriptions (CBR) • Development of eager methods that employ multiple local approximations to achieve effects similar to those of lazy learning methods while reducing computation at classification time (RBF learning attempts this) • Applications (Can you think of some others?) • Finding similar software patterns in currently implemented software or in analysis or design models. • Matching a database query to the most likely instances
Summary • Nearest neighbor algorithms are examples of IBL methods, which delay much of the computation until classification time. • These algorithms use local rather than global approximation, which in effect yields a distinct approximation of the target function for each query instance. • Able to model complex target functions by a collection of less complex local approximations • Information present in the training examples is never lost • The distance metric can become misleading because all attributes are considered, including irrelevant ones • Employ lazy rather than eager learning techniques
Bibliography • Primary material for this presentation came from: • Mitchell, T. (1997). Machine Learning. MIT Press and The McGraw-Hill Companies, Inc., Boston, Mass.