4. Instance-Based Learning
1st Red ProTIC School, Tandil, April 18-28, 2006

4.1 Introduction

Instance-based learning builds a local approximation to the target function that applies in the neighborhood of the query instance.
• The cost of classifying new instances can be high: nearly all computation takes place at classification time.
• Example: k-Nearest Neighbors.
• Radial Basis Functions: a bridge between instance-based learning and artificial neural networks.
K-Nearest Neighbors

[Figure sequence omitted: "Most plausible hypothesis", "Now? Or maybe…", "Is the simplest hypothesis always the best one?"]
4.2 k-Nearest Neighbor Learning

Instance: x = [a1(x), a2(x), ..., an(x)] ∈ ℝⁿ

d(xi, xj) = [(xi − xj) · (xi − xj)]^½ = Euclidean distance

• Discrete-valued target functions ƒ : ℝⁿ → V = {v1, v2, ..., vs}
Prediction for a new query x, where x1, ..., xk are the k nearest neighbors of x:

ƒ̂(x) = argmax_{v ∈ V} Σ_{i=1..k} δ(v, ƒ(xi))

δ(v, ƒ(xi)) = 1 if v = ƒ(xi), δ(v, ƒ(xi)) = 0 otherwise

• Continuous-valued target functions:

ƒ̂(x) = (1/k) Σ_{i=1..k} ƒ(xi)
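Both prediction rules in a minimal, dependency-free sketch (the helper names and the toy dataset are hypothetical, not from the slides):

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(query, examples, k):
    # examples: list of (instance, label) pairs; vote among the k nearest
    neighbors = sorted(examples, key=lambda e: euclidean(query, e[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]  # argmax over v in V

def knn_regress(query, examples, k):
    # examples: list of (instance, target) pairs; average the k nearest targets
    neighbors = sorted(examples, key=lambda e: euclidean(query, e[0]))[:k]
    return sum(y for _, y in neighbors) / k

# Toy data: 2-D instances with discrete labels
data = [((0.0, 0.0), 'A'), ((0.1, 0.2), 'A'), ((0.9, 1.0), 'B'), ((1.0, 0.8), 'B')]
print(knn_classify((0.2, 0.1), data, k=3))  # -> 'A'
```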
• Distance-Weighted k-NN

Discrete: ƒ̂(x) = argmax_{v ∈ V} Σ_{i=1..k} wi δ(v, ƒ(xi))

Continuous: ƒ̂(x) = Σ_{i=1..k} wi ƒ(xi) / Σ_{i=1..k} wi

wi = [d(xi, x)]⁻²

This weights the closest neighbors more heavily.
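The same sketch extended with inverse-square weights, reusing the euclidean helper above (the small epsilon is my addition to guard against a query that coincides with a stored instance):

```python
def weighted_knn_regress(query, examples, k, eps=1e-12):
    neighbors = sorted(examples, key=lambda e: euclidean(query, e[0]))[:k]
    # w_i = [d(x_i, x)]^-2; eps avoids division by zero on exact matches
    weights = [1.0 / (euclidean(query, x) ** 2 + eps) for x, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)
```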
• Remarks on k-NN
• Robust to noise
• Quite effective for large training sets
• Inductive bias: the classification of an instance will be most similar to the classification of instances that are nearby in Euclidean distance
• Especially sensitive to the curse of dimensionality
• Irrelevant attributes can be suppressed by a suitably chosen metric (see the sketch below):

d(xi, xj) = [(xi − xj) · G · (xi − xj)]^½
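As an illustration of the metric trick: a diagonal G simply rescales each attribute, and a zero entry removes that attribute from the distance entirely (a minimal sketch; the matrix below is hypothetical):

```python
import numpy as np

def metric_distance(xi, xj, G):
    # d(xi, xj) = [(xi - xj) . G . (xi - xj)]^1/2, G symmetric positive semi-definite
    diff = np.asarray(xi, float) - np.asarray(xj, float)
    return float(np.sqrt(diff @ G @ diff))

# Hypothetical diagonal G: attribute 2 is treated as irrelevant (weight 0)
G = np.diag([1.0, 0.0, 1.0])
print(metric_distance([1, 5, 2], [1, -3, 2], G))  # -> 0.0, attribute 2 ignored
```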
4.3 Locally Weighted Regression

Builds an explicit approximation to ƒ over a local region surrounding the query point x, usually a linear or quadratic fit to the training examples nearest to x.

Locally Weighted Linear Regression:

ƒL(x) = w0 + w1 x1 + ... + wn xn

E(x) = Σ_{i=1..k} [ƒL(xi) − ƒ(xi)]²   (xi = the k nearest neighbors of x)
Generalization:

ƒL(x) = w0 + w1 x1 + ... + wn xn

E(x) = Σ_{i=1..N} K[d(xi, x)] [ƒL(xi) − ƒ(xi)]²

K[d(xi, x)] = kernel function, decreasing with distance

Other possibility: ƒQ(x) = a quadratic function of the xj.
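Minimizing the kernel-weighted error E(x) has a closed-form solution via weighted least squares. A minimal NumPy sketch for the linear case (the Gaussian kernel and the width tau are assumptions; the slides only require some distance-decreasing K):

```python
import numpy as np

def lwr_predict(query, X, y, tau=0.5):
    # X: (N, n) training instances, y: (N,) targets, query: (n,)
    X = np.asarray(X, float); y = np.asarray(y, float); q = np.asarray(query, float)
    d2 = np.sum((X - q) ** 2, axis=1)
    K = np.exp(-d2 / (2 * tau ** 2))            # kernel weights K[d(x_i, x)] (assumed Gaussian)
    A = np.hstack([np.ones((len(X), 1)), X])    # leading 1 gives the intercept w0
    W = np.diag(K)
    # w minimizes sum_i K_i * (A_i . w - y_i)^2
    w = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return float(w[0] + w[1:] @ q)
```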
4.4 Radial Basis Functions

An approach closely related to distance-weighted regression and to artificial neural network learning:

ƒRBF(x) = w0 + Σ_{µ=1..k} wµ K[d(xµ, x)]

K[d(xµ, x)] = exp[ −d²(xµ, x) / (2σµ²) ] = Gaussian kernel function
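Evaluating ƒRBF for given centers, widths, and weights is direct (a minimal sketch; the function name and calling convention are my own):

```python
import numpy as np

def rbf_predict(query, centers, sigmas, w0, weights):
    # f(x) = w0 + sum_u w_u * exp(-d^2(x_u, x) / (2 * sigma_u^2))
    q = np.asarray(query, float)
    d2 = np.sum((np.asarray(centers, float) - q) ** 2, axis=1)
    K = np.exp(-d2 / (2 * np.asarray(sigmas, float) ** 2))
    return float(w0 + np.asarray(weights, float) @ K)
```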
Training RBF Networks

1st stage:
• Determine k, the number of basis functions
• Determine the xµ and σµ (kernel parameters), e.g. with the Expectation-Maximization (EM) algorithm

2nd stage:
• Determine the weights wµ: a linear problem

A sketch of both stages follows.
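A compact sketch of the two stages. Stage one here places the centers with plain k-means, a common stand-in for the EM fit named above (not the slides' exact procedure), and uses one shared width sigma; stage two solves the linear problem for the wµ by least squares:

```python
import numpy as np

def train_rbf(X, y, k, sigma, iters=20, seed=0):
    X = np.asarray(X, float); y = np.asarray(y, float)
    rng = np.random.default_rng(seed)
    # Stage 1: choose the k centers x_mu with k-means (stand-in for EM)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    # Stage 2: with centers fixed, fitting w0 and the w_mu is linear least squares
    d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)            # (N, k) squared distances
    Phi = np.hstack([np.ones((len(X), 1)), np.exp(-d2 / (2 * sigma ** 2))])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, w  # w[0] is w0, w[1:] are the w_mu
```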
4.6 Remarks on Lazy and Eager Learning

Lazy learning: stores the data and postpones decisions until a new query is presented.

Eager learning: generalizes beyond the training data before a new query is presented.

• Lazy methods may consider the query instance x when deciding how to generalize beyond the training data D (local approximation).
• Eager methods cannot: they have already chosen their global approximation to the target function.