Locally Weighted Learning Machine Learning Dr. Barbara Hammer
Locally Weighted Learning • Instance-based Learning (“Lazy Learning”) • Local Models • k-Nearest Neighbor • Weighted Average • Locally weighted regression • Case-based reasoning
When to consider the Nearest Neighbor algorithm? • Instances map to points in $\mathbb{R}^n$ • Fewer than 20 attributes per instance • Lots of training data • Advantages: • Training is very fast • Can learn complex target functions • No information is lost (all training examples are retained) • Disadvantages: • Slow at query time • Easily fooled by irrelevant attributes
k-Nearest Neighbor Algorithm (Classification) • Let an arbitrary instance x be described by the feature vector $x = \langle a_1(x), a_2(x), \ldots, a_n(x)\rangle$ • The distance between two instances $x_i$ and $x_j$ is defined as the Euclidean distance $d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$ (sketched below)
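As a quick illustration of the metric (a minimal NumPy sketch; the function name is mine, not from the slides):

```python
import numpy as np

def euclidean_distance(xi, xj):
    """Compute d(xi, xj) = sqrt(sum_r (a_r(xi) - a_r(xj))^2)."""
    xi, xj = np.asarray(xi, dtype=float), np.asarray(xj, dtype=float)
    return np.sqrt(np.sum((xi - xj) ** 2))
```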
k-Nearest Neighbor Algorithm • Training Algorithm: • Store all training examples $\langle x, f(x)\rangle$ • Classification Algorithm: • Given a query instance $x_q$ to be classified, • Let $x_1, \ldots, x_k$ denote the k instances from the training examples that are nearest to $x_q$ • Return (for a discrete-valued target function) $\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$, i.e. the majority label among the k neighbors • where $\delta(a,b)=1$ if $a=b$ and $\delta(a,b)=0$ otherwise (see the sketch below)
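A minimal sketch of the classification step, assuming the training set is held as NumPy arrays (function and variable names are hypothetical, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_q, k=5):
    """Return the majority label among the k training points nearest to x_q."""
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))  # Euclidean distance to every stored example
    nearest = np.argsort(dists)[:k]                      # indices of the k nearest neighbors
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Toy usage: two clusters, query near cluster "A"
X = np.array([[0.0, 0.0], [1.0, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array(["A", "A", "B", "B"])
print(knn_classify(X, y, np.array([0.2, 0.1]), k=3))     # -> A
```

Note the "lazy" flavor: training is just storing (X, y); all the work happens at query time.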
k-Nearest Neighbor Examples (discrete-valued target function) • [Figure: classification of a query point for k=1 vs. k=5]
k-Nearest Neighbor Examples (real-valued target function) • [Figure: k-NN approximation of a real-valued target function]
Distance-Weighted Nearest Neighbor Algorithm • Idea: • We might want to weight nearer neighbors more heavily • Rationale: instances closer to $x_q$ tend to have target values closer to $f(x_q)$
Distance-Weighted Nearest Neighbor Algorithm Distance-weighted vote (discrete-valued target function): $\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$ where • the weights are inversely proportional to the squared distance, $w_i = 1 / d(x_q, x_i)^2$; • $d(x_q, x_i)$ is the Euclidean distance. In the special case $x_q = x_i$, set $\hat{f}(x_q) := f(x_i)$. NOTE: with distance weighting it now makes sense to use all training data instead of just the k nearest neighbors (see the sketch below)
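A sketch of the distance-weighted vote under the $1/d^2$ weighting above (names are mine; an exact match returns the stored label directly, as the special case requires):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_classify(X_train, y_train, x_q, k=5, eps=1e-12):
    """Each of the k nearest neighbors votes for its label with weight 1 / d(x_q, x_i)^2."""
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    votes = defaultdict(float)
    for i in np.argsort(dists)[:k]:
        if dists[i] < eps:                     # x_q coincides with a training point
            return y_train[i]
        votes[y_train[i]] += 1.0 / dists[i] ** 2
    return max(votes, key=votes.get)           # label with the largest total weight
```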
Distance-Weighted Nearest Neighbor Algorithm For a real-valued target function: $\hat{f}(x_q) \leftarrow \dfrac{\sum_{i=1}^{k} w_i \, f(x_i)}{\sum_{i=1}^{k} w_i}$ where $w_i = K(d(x_q, x_i))$ • Weighting (kernel) function: the Gaussian kernel $K(d) = \exp\!\left(-\dfrac{d^2}{2\sigma^2}\right)$
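A sketch of this kernel-weighted average; following the NOTE on the previous slide, it averages over all training examples rather than only the k nearest (the bandwidth σ is a parameter I introduce for the Gaussian kernel):

```python
import numpy as np

def kernel_regression(X_train, y_train, x_q, sigma=1.0):
    """f^(x_q) = sum_i K(d_i) * y_i / sum_i K(d_i) with the Gaussian kernel K."""
    d2 = ((X_train - x_q) ** 2).sum(axis=1)   # squared Euclidean distances d_i^2
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # K(d) = exp(-d^2 / (2 sigma^2))
    return np.dot(w, y_train) / w.sum()
```

Small σ makes the estimate very local (close to plain k-NN with small k); large σ approaches the global mean.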
Locally Weighted Linear Regression • Idea: • k-NN forms a local approximation to f separately for each query point $x_q$ • Why not form an explicit approximation $\hat{f}(x)$ for the region surrounding $x_q$? • Fit a linear function to the k nearest neighbors • Or fit a quadratic, ... • Thus producing a "piecewise approximation" to f • Possible error criteria: • Minimize the error over the k nearest neighbors of $x_q$ • Minimize the error over the entire set of examples, weighted by distance • Combine the two above
Locally Weighted Linear Regression • Local linear function: $\hat{f}(x) = \beta_0 + \beta_1 a_1(x) + \cdots + \beta_n a_n(x)$ • Error criteria: • $E_1(x_q) = \frac{1}{2} \sum_{x \in k\text{-NN}(x_q)} (f(x) - \hat{f}(x))^2$ (k nearest neighbors only) • $E_2(x_q) = \frac{1}{2} \sum_{x \in D} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$ (all examples, distance-weighted) • Combining $E_1(x_q)$ and $E_2(x_q)$: $E_3(x_q) = \frac{1}{2} \sum_{x \in k\text{-NN}(x_q)} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$
Locally Weighted Linear Regression How it works • For each training point $(x_k, y_k)$ compute the weight $w_k = K(d(x_q, x_k))$ • Let $WX = \mathrm{Diag}(w_1, w_2, \ldots, w_n)\,X$ • Let $WY = \mathrm{Diag}(w_1, w_2, \ldots, w_n)\,Y$ • $\beta = \left((WX)^T WX\right)^{-1} (WX)^T WY$ (sketched below)
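A sketch of the full pipeline via weighted least squares (NumPy; all names are mine). One caveat: I scale rows by $\sqrt{w_i}$ so that the solved problem minimizes $\sum_i w_i (f(x_i) - \hat{f}(x_i))^2$, matching $E_2$ above; scaling rows by $w_i$ directly, as the Diag(w) notation suggests, would minimize $\sum_i w_i^2 (\cdot)^2$, the same idea with redefined weights:

```python
import numpy as np

def lwlr_predict(X_train, y_train, x_q, sigma=1.0):
    """Fit beta around x_q by kernel-weighted least squares, then predict beta . [1, x_q]."""
    d2 = ((X_train - x_q) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))            # Gaussian kernel weight per training example
    sw = np.sqrt(w)[:, None]                        # row scaling (see caveat above)
    A = np.column_stack([np.ones(len(X_train)), X_train]) * sw   # weighted design matrix "WX"
    b = y_train * np.sqrt(w)                                     # weighted targets "WY"
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)    # solves (WX)^T WX beta = (WX)^T WY stably
    return np.concatenate(([1.0], np.atleast_1d(x_q))) @ beta

# Toy usage: noisy line y = 2x + 1, query at x = 0.5
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))
y = 2 * X[:, 0] + 1 + 0.05 * rng.standard_normal(50)
print(lwlr_predict(X, y, np.array([0.5]), sigma=0.2))  # approximately 2.0
```

A fresh local fit is solved for every query; this per-query cost is exactly why lazy methods are slow at prediction time.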
LWR Example • [Figure: training data with four fits; f1: predicted values using simple (global) regression; f2–f4: predicted values using locally weighted (piecewise) regression] • Source: Yike Guo, Advanced Knowledge Management, 2000
References • Tom Mitchell, Machine Learning, McGraw-Hill, 1997 • Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification, John Wiley, 2001 • Christopher G. Atkeson, Andrew W. Moore, Stefan Schaal, Locally Weighted Learning, 1996