Locally Weighted Learning Machine Learning Dr. Barbara Hammer
Locally Weighted Learning • Instance-based Learning (“Lazy Learning”) • Local Models • k-Nearest Neighbor • Weighted Average • Locally weighted regression • Case-based reasoning
When to consider the Nearest Neighbor algorithm? • Instances map to points in $\mathbb{R}^n$ • Fewer than 20 attributes per instance • Lots of training data • Advantages: • Training is very fast • Can learn complex target functions • No information is lost (all training examples are retained) • Disadvantages: • Slow at query time • Easily fooled by irrelevant attributes
k-Nearest Neighbor Algorithm (Classification) • Let an arbitrary instance x be described by the feature vector $x = \langle a_1(x), a_2(x), \ldots, a_n(x)\rangle$ • The distance between two instances $x_i$ and $x_j$ is defined as the Euclidean distance $d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$ (sketched below)
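As a quick illustration of the metric (a minimal NumPy sketch; the function name is mine, not from the slides):

```python
import numpy as np

def euclidean_distance(xi, xj):
    """Compute d(xi, xj) = sqrt(sum_r (a_r(xi) - a_r(xj))^2)."""
    xi, xj = np.asarray(xi, dtype=float), np.asarray(xj, dtype=float)
    return np.sqrt(np.sum((xi - xj) ** 2))
```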
k-Nearest Neighbor Algorithm • Training Algorithm: • Store all training examples $\langle x, f(x)\rangle$ • Classification Algorithm: • Given a query instance $x_q$ to be classified, • Let $x_1, \ldots, x_k$ denote the k instances from the training examples that are nearest to $x_q$ • Return (for a discrete-valued target function) $\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$, i.e. the majority label among the k neighbors • where $\delta(a,b)=1$ if $a=b$ and $\delta(a,b)=0$ otherwise (see the sketch below)
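A minimal sketch of the classification step, assuming the training set is held as NumPy arrays (function and variable names are hypothetical, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_q, k=5):
    """Return the majority label among the k training points nearest to x_q."""
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))  # Euclidean distance to every stored example
    nearest = np.argsort(dists)[:k]                      # indices of the k nearest neighbors
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Toy usage: two clusters, query near cluster "A"
X = np.array([[0.0, 0.0], [1.0, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array(["A", "A", "B", "B"])
print(knn_classify(X, y, np.array([0.2, 0.1]), k=3))     # -> A
```

Note the "lazy" flavor: training is just storing (X, y); all the work happens at query time.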
k-Nearest Neighbor Examples (discrete-valued target function) • [Figure: classification of a query point for k=1 vs. k=5]
k-Nearest Neighbor Examples (real-valued target function) • [Figure: k-NN approximation of a real-valued target function]
Distance-Weighted Nearest Neighbor Algorithm • Idea: • We might want to weight nearer neighbors more heavily • Rationale: instances closer to $x_q$ tend to have target values closer to $f(x_q)$
Distance-Weighted Nearest Neighbor Algorithm Distance-weighted vote (discrete-valued target function): $\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$ where • the weights are inversely proportional to the squared distance, $w_i = 1 / d(x_q, x_i)^2$; • $d(x_q, x_i)$ is the Euclidean distance. In the special case $x_q = x_i$, set $\hat{f}(x_q) := f(x_i)$. NOTE: with distance weighting it now makes sense to use all training data instead of just the k nearest neighbors (see the sketch below)
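A sketch of the distance-weighted vote under the $1/d^2$ weighting above (names are mine; an exact match returns the stored label directly, as the special case requires):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_classify(X_train, y_train, x_q, k=5, eps=1e-12):
    """Each of the k nearest neighbors votes for its label with weight 1 / d(x_q, x_i)^2."""
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    votes = defaultdict(float)
    for i in np.argsort(dists)[:k]:
        if dists[i] < eps:                     # x_q coincides with a training point
            return y_train[i]
        votes[y_train[i]] += 1.0 / dists[i] ** 2
    return max(votes, key=votes.get)           # label with the largest total weight
```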
Distance-Weighted Nearest Neighbor Algorithm For a real-valued target function: $\hat{f}(x_q) \leftarrow \dfrac{\sum_{i=1}^{k} w_i \, f(x_i)}{\sum_{i=1}^{k} w_i}$ where $w_i = K(d(x_q, x_i))$ • Weighting (kernel) function: the Gaussian kernel $K(d) = \exp\!\left(-\dfrac{d^2}{2\sigma^2}\right)$
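A sketch of this kernel-weighted average; following the NOTE on the previous slide, it averages over all training examples rather than only the k nearest (the bandwidth σ is a parameter I introduce for the Gaussian kernel):

```python
import numpy as np

def kernel_regression(X_train, y_train, x_q, sigma=1.0):
    """f^(x_q) = sum_i K(d_i) * y_i / sum_i K(d_i) with the Gaussian kernel K."""
    d2 = ((X_train - x_q) ** 2).sum(axis=1)   # squared Euclidean distances d_i^2
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # K(d) = exp(-d^2 / (2 sigma^2))
    return np.dot(w, y_train) / w.sum()
```

Small σ makes the estimate very local (close to plain k-NN with small k); large σ approaches the global mean.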
Locally Weighted Linear Regression • Idea: • k-NN forms a local approximation to f separately for each query point $x_q$ • Why not form an explicit approximation $\hat{f}(x)$ for the region surrounding $x_q$? • Fit a linear function to the k nearest neighbors • Or fit a quadratic, ... • Thus producing a "piecewise approximation" to f • Possible error criteria: • Minimize the error over the k nearest neighbors of $x_q$ • Minimize the error over the entire set of examples, weighted by distance • Combine the two above
Locally Weighted Linear Regression • Local linear function: $\hat{f}(x) = \beta_0 + \beta_1 a_1(x) + \cdots + \beta_n a_n(x)$ • Error criteria: • $E_1(x_q) = \frac{1}{2} \sum_{x \in k\text{-NN}(x_q)} (f(x) - \hat{f}(x))^2$ (k nearest neighbors only) • $E_2(x_q) = \frac{1}{2} \sum_{x \in D} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$ (all examples, distance-weighted) • Combining $E_1(x_q)$ and $E_2(x_q)$: $E_3(x_q) = \frac{1}{2} \sum_{x \in k\text{-NN}(x_q)} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$
Locally Weighted Linear Regression How it works • For each training point $(x_k, y_k)$ compute the weight $w_k = K(d(x_q, x_k))$ • Let $WX = \mathrm{Diag}(w_1, w_2, \ldots, w_n)\,X$ • Let $WY = \mathrm{Diag}(w_1, w_2, \ldots, w_n)\,Y$ • $\beta = \left((WX)^T WX\right)^{-1} (WX)^T WY$ (sketched below)
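A sketch of the full pipeline via weighted least squares (NumPy; all names are mine). One caveat: I scale rows by $\sqrt{w_i}$ so that the solved problem minimizes $\sum_i w_i (f(x_i) - \hat{f}(x_i))^2$, matching $E_2$ above; scaling rows by $w_i$ directly, as the Diag(w) notation suggests, would minimize $\sum_i w_i^2 (\cdot)^2$, the same idea with redefined weights:

```python
import numpy as np

def lwlr_predict(X_train, y_train, x_q, sigma=1.0):
    """Fit beta around x_q by kernel-weighted least squares, then predict beta . [1, x_q]."""
    d2 = ((X_train - x_q) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))            # Gaussian kernel weight per training example
    sw = np.sqrt(w)[:, None]                        # row scaling (see caveat above)
    A = np.column_stack([np.ones(len(X_train)), X_train]) * sw   # weighted design matrix "WX"
    b = y_train * np.sqrt(w)                                     # weighted targets "WY"
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)    # solves (WX)^T WX beta = (WX)^T WY stably
    return np.concatenate(([1.0], np.atleast_1d(x_q))) @ beta

# Toy usage: noisy line y = 2x + 1, query at x = 0.5
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))
y = 2 * X[:, 0] + 1 + 0.05 * rng.standard_normal(50)
print(lwlr_predict(X, y, np.array([0.5]), sigma=0.2))  # approximately 2.0
```

A fresh local fit is solved for every query; this per-query cost is exactly why lazy methods are slow at prediction time.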
LWR Example • [Figure: training data with four fits; f1: predicted values using simple (global) regression; f2–f4: predicted values using locally weighted (piecewise) regression] • Source: Yike Guo, Advanced Knowledge Management, 2000
References • Tom Mitchell, Machine Learning, McGraw-Hill, 1997 • Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification, John Wiley, 2001 • Christopher G. Atkeson, Andrew W. Moore, Stefan Schaal, Locally Weighted Learning, 1996