Chapter Eight: Instance-Based Learning (Machine Learning, Tom M. Mitchell)
Outline • Introduction • K-Nearest Neighbor • Locally Weighted Regression • Radial Basis Functions • Case-Based Reasoning • Lazy and Eager Learning
What Is Instance-Based Learning? • Contrast with previously covered learning algorithms • Decision trees, neural networks, … • Key characteristic • Simply store the training examples and delay all processing until a new instance must be classified • "Lazy" learning • Advantages and disadvantages
Instance-Based Learning • Introduction • K-Nearest Neighbor • Locally Weighted Regression • Radial Basis Functions • Case-Based Reasoning • Lazy and Eager Learning
K-Nearest Neighbor • Instance representation: x = <a1(x), a2(x), …, an(x)> • Euclidean distance: d(xi, xj) = sqrt(Σr=1..n (ar(xi) − ar(xj))²) • The target function may be either discrete-valued or real-valued
KNN Algorithm for Discrete-Valued Target Function • Training algorithm: • Store each training example <x, f(x)> • Classification algorithm: • Given a query instance xq to be classified • Find the k nearest neighbors x1…xk of xq • Return f̂(xq) ← argmax v∈V Σi=1..k δ(v, f(xi)), where δ(a, b) = 1 if a = b and 0 otherwise
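A minimal Python sketch of this classification rule (an illustration of ours, not from the text; function and variable names are assumptions):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_q, k=5):
    """Majority vote among the k training examples nearest to x_q."""
    dists = np.linalg.norm(X_train - x_q, axis=1)   # Euclidean distances to x_q
    nearest = np.argsort(dists)[:k]                 # indices of the k nearest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]               # argmax_v sum_i delta(v, f(x_i))
```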
An Example • The 1-nearest-neighbor and 5-nearest-neighbor algorithms can produce different classifications for the same query instance
Voronoi Diagram • The decision surface induced by 1-nearest neighbor over the entire instance space • Each convex polygon is the region of instance space closest to one training point
KNN Algorithm for Real-Valued Target Function • For a real-valued target function, replace the voting formula above with the mean value of the k nearest neighbors: f̂(xq) ← (1/k) Σi=1..k f(xi)
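The real-valued case in the same sketch style (hypothetical names, assuming NumPy arrays as above):

```python
import numpy as np

def knn_regress(X_train, y_train, x_q, k=5):
    """Estimate f(x_q) as the mean f-value of the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_q, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(np.mean(y_train[nearest]))        # (1/k) * sum_i f(x_i)
```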
Distance-Weighted KNN • We might want to give nearer neighbors heavier weight, e.g. wi = 1 / d(xq, xi)², where d(xq, xi) is the distance between xq and xi • For discrete-valued target functions: f̂(xq) ← argmax v∈V Σi=1..k wi δ(v, f(xi)) • For real-valued target functions: f̂(xq) ← Σi=1..k wi f(xi) / Σi=1..k wi • Note that it now makes sense to use all training examples instead of just k (Shepard's method)
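A sketch of the distance-weighted estimate for the real-valued case (our illustration; eps guards against division by zero when x_q coincides with a training point):

```python
import numpy as np

def weighted_knn_regress(X_train, y_train, x_q, k=None, eps=1e-12):
    """Distance-weighted estimate; k=None uses all training examples
    (Shepard's method)."""
    dists = np.linalg.norm(X_train - x_q, axis=1)
    idx = np.argsort(dists)[:k]                    # slicing with k=None keeps all
    w = 1.0 / (dists[idx] ** 2 + eps)              # w_i = 1 / d(x_q, x_i)^2
    return float(np.sum(w * y_train[idx]) / np.sum(w))
```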
Remarks on KNN Algorithm • Inductive bias • Nearby instances have similar classifications • Curse of dimensionality • The similarity metric can be misled by irrelevant attributes • Solutions (see the sketch after this list): • Weight each attribute differently, i.e., stretch the j-th axis by some factor zj • Use cross-validation to choose the weights automatically • Set zj to 0 to eliminate the most irrelevant attributes • Leave-one-out cross-validation (Moore & Lee, 1994) • Efficient memory indexing • kd-trees
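A sketch of the axis-stretching idea (names are our own; the stretch factors z would be chosen by cross-validation):

```python
import numpy as np

def stretched_distance(x1, x2, z):
    """Euclidean distance after stretching axis j by the factor z[j];
    setting z[j] = 0 eliminates attribute j entirely."""
    return float(np.linalg.norm(z * (x1 - x2)))
```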
Kd-tree (k-dimensional tree) • A binary search tree over points in k-dimensional space • Each split partitions the current set of points into two equal halves • Splits cycle through the dimensions: first on d1, then d2, …, dk, then back to d1
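A usage sketch with SciPy's kd-tree implementation (assuming SciPy is available; building the tree once makes each neighbor query much cheaper than a linear scan of the stored examples):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((1000, 3))              # 1000 stored instances in 3-d space

tree = cKDTree(X_train)                      # index the training examples once
dists, idx = tree.query(rng.random(3), k=5)  # k nearest neighbors of a query point
```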
Instance-Based Learning • Introduction • K-Nearest Neighbor • Locally Weighted Regression • Radial Basis Functions • Case-Based Reasoning • Lazy and Eager Learning
Some Terminology • Regression: approximating a real-valued target function • Residual: the error f̂(x) − f(x) in approximating the target function • Kernel function: the function of distance that determines the weight of each training example, i.e., wi = K(d(xi, xq))
Locally Weighted Regression • Form an explicit approximation f̂ for the region surrounding xq • Fit a linear function to the k nearest neighbors • Or fit a quadratic, … • General approach: • Construct an approximation f̂ that fits the training examples in the neighborhood surrounding xq • Use f̂ to calculate f̂(xq), the estimated target value for the query instance • A different local approximation is constructed for each query instance
Locally Weighted Linear Regression • Use a linear function to approximate f: f̂(x) = w0 + w1 a1(x) + … + wn an(x) • Recall chapter 4: gradient descent minimizes the squared error E = ½ Σx∈D (f(x) − f̂(x))², giving the training rule Δwj = η Σx∈D (f(x) − f̂(x)) aj(x)
Locally Weighted Linear Regression (cont.) • Three possible error criteria: • E1(xq) ≡ ½ Σx∈kNN(xq) (f(x) − f̂(x))², the error over just the k nearest neighbors • E2(xq) ≡ ½ Σx∈D (f(x) − f̂(x))² K(d(xq, x)), the error over all of D, weighted by distance • E3(xq) ≡ ½ Σx∈kNN(xq) (f(x) − f̂(x))² K(d(xq, x)), a combination of both • Choosing criterion 3 gives the gradient descent training rule: Δwj = η Σx∈kNN(xq) K(d(xq, x)) (f(x) − f̂(x)) aj(x) • Other methods solve for w0…wn directly: Atkeson et al. (1997), Bishop (1995)
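A sketch of gradient descent on criterion 3 (our illustration; the learning rate and step count are assumptions that would need tuning in practice):

```python
import numpy as np

def lwlr_predict(X, y, x_q, k=10, eta=0.01, steps=500):
    """Fit w on the k nearest neighbors of x_q using Gaussian kernel
    weights, then return the local linear prediction for x_q."""
    dists = np.linalg.norm(X - x_q, axis=1)
    idx = np.argsort(dists)[:k]
    Xk = np.hstack([np.ones((k, 1)), X[idx]])   # a_0(x) = 1 carries the bias w_0
    yk = y[idx]
    Kk = np.exp(-dists[idx] ** 2)               # kernel weights K(d(x_q, x))
    w = np.zeros(Xk.shape[1])
    for _ in range(steps):
        err = yk - Xk @ w                       # f(x) - f_hat(x)
        w += eta * (Kk * err) @ Xk              # dw_j = eta * sum K (f - f_hat) a_j(x)
    return float(w @ np.concatenate(([1.0], x_q)))
```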
Remarks on Locally Weighted Regression • In most cases, the target function is approximated locally by a constant, linear, or quadratic function • More complex functional forms are rarely used • Fitting them is costly • The simple approximations model the target function quite well over the small neighborhood of the query instance
Instance-Based Learning • Introduction • K-Nearest Neighbor • Locally Weighted Regression • Radial Basis Functions • Case-Based Reasoning • Lazy and Eager Learning
Radial Basis Function • Function to be learned: f̂(x) = w0 + Σu=1..k wu Ku(d(xu, x)) • One common choice for Ku is the Gaussian: Ku(d(xu, x)) = e^(−d²(xu, x) / 2σu²) • A global approximation to the target function, formed as a linear combination of local approximations • "Eager" instead of "lazy"
Radial Basis Function Networks • The ai(x) are the attributes describing instance x • The first layer computes the various Ku(d(xu, x)) • The second layer computes a linear combination of the first-layer unit values • A hidden unit's activation is close to 0 unless x is near its center xu
Training RBF Networks • Stage one: define the hidden units by choosing k, the centers xu, and the widths σu² • Allocate one Gaussian kernel function for each training example <xi, f(xi)>, or • Choose a set of kernel functions smaller than the number of training examples • Scatter the centers uniformly over the instance space • Or nonuniformly • Stage two: train the weights wu • Gradient descent on a global error criterion (see the sketch below)
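A sketch of the two stages (our illustration; the text trains the wu by gradient descent, while this version solves the equivalent least-squares problem directly):

```python
import numpy as np

def train_rbf(X, y, centers, sigma=1.0):
    """Stage two: fit the weights w, given stage-one centers and widths."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-d ** 2 / (2 * sigma ** 2))        # Gaussian K_u(d(x_u, x))
    Phi = np.hstack([np.ones((len(X), 1)), Phi])    # constant column for w_0
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def rbf_predict(x, centers, w, sigma=1.0):
    d = np.linalg.norm(centers - x, axis=1)
    phi = np.concatenate(([1.0], np.exp(-d ** 2 / (2 * sigma ** 2))))
    return float(phi @ w)                           # w_0 + sum_u w_u K_u(d(x_u, x))
```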
Instance-Based Learning • Introduction • K-Nearest Neighbor • Locally Weighted Regression • Radial Basis Functions • Case-Based Reasoning • Lazy and Eager Learning
Case-Based Reasoning • Key properties shared with KNN and locally weighted regression: • Lazy learning • New queries are answered by analyzing similar stored instances • But those methods represent instances as points in a Euclidean space • In CBR: • Instances use richer symbolic descriptions • So a different "distance" metric is needed • Applications: mechanical device design, reasoning about legal cases, …
The CADET System • What is CADET? • A system that employs CBR to design simple mechanical devices • A library of 75 stored examples of mechanical devices • Each training example: <qualitative function, mechanical structure> • New query: the desired function • Target value: a mechanical structure for this function
Case-Based Reasoning in CADET • Given the function specification for a new design, CADET searches its library for an exact match • If one is found, return that case • If not, find cases matching subgraphs of the specification (a subgraph-isomorphism search), then piece them together • CADET can also elaborate the original function graph so that it matches more cases • e.g., rewrite A →+ B as A →+ x, x →+ B, introducing an intermediate quantity x
Correspondence Between CADET and Instance-Based Methods • Instance space X: the space of all function graphs • Target function f: maps a function graph to a mechanical structure • Training example <x, f(x)>: describes a function graph x and its structure f(x)
Several Properties of CBR • Instances are represented by rich structural descriptions • e.g., the function graphs in CADET • Multiple cases may be retrieved and combined to form the solution to a new problem • as in KNN's combination of the k nearest neighbors • Tight coupling between case retrieval, knowledge-based reasoning, and problem solving
Instance-Based Learning • Introduction • K-Nearest Neighbor • Locally Weighted Regression • Radial Basis Functions • Case-Based Reasoning • Lazy and Eager Learning
Lazy and Eager Learning • Lazy: wait for the query before generalizing • KNN, locally weighted regression, CBR • Eager: generalize before seeing the query • RBF networks • Differences: • Computation time: lazy methods defer almost all work to query time • Lazy methods form a local approximation per query; eager methods must commit to a single global approximation • Given the same hypothesis space H, a lazy learner can effectively represent more complex target functions (e.g., with H = linear functions, it fits a different linear function for each query)
Summary • Differences and relative advantages of the methods covered: • KNN: the most basic instance-based method • Locally weighted regression: a generalization of KNN • RBF networks: a blend of instance-based methods and neural network methods • Case-based reasoning: instance-based learning over rich symbolic descriptions