Kernel Methods Arie Nakhmani
Outline • Kernel Smoothers • Kernel Density Estimators • Kernel Density Classifiers
Kernel Smoothers – The Goal • Estimating a function from noisy observations when no parametric model for the function is known • The resulting function should be smooth • The level of "smoothness" should be set by a single parameter
Example • N=100 sample points • What does "smooth enough" mean?
Example N=100 sample points
Exponential Smoother • $s_t = a\,y_t + (1-a)\,s_{t-1}$ • Smaller a gives a smoother line, but more delay
Exponential Smoother • Simple • Sequential • Single parameter a • Single value memory • Too rough • Delayed
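A minimal sketch of this smoother in Python (the recursion s_t = a*y_t + (1-a)*s_{t-1} is the standard exponential smoother; the noisy test signal below is an illustrative assumption, not taken from the slides):

import numpy as np

def exponential_smoother(y, a=0.1):
    """Sequential exponential smoothing: s[t] = a*y[t] + (1 - a)*s[t-1]."""
    s = np.empty(len(y))
    s[0] = y[0]                              # initialize with the first observation
    for t in range(1, len(y)):
        s[t] = a * y[t] + (1.0 - a) * s[t - 1]
    return s

# N=100 noisy samples of a smooth function (illustrative test signal)
x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x) + 0.3 * np.random.randn(100)
s = exponential_smoother(y, a=0.1)           # smaller a: smoother but more delayed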
Moving Average Smoother • m=11 • Larger m gives a smoother, but flatter (straightened) line
Moving Average Smoother • Sequential • Single parameter: the window size m • Memory for m values • Irregularly smooth • What if the problem is p-dimensional, with p > 1?
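A one-dimensional sketch using the window size m = 11 from the example above; the test signal is again an illustrative assumption:

import numpy as np

def moving_average_smoother(y, m=11):
    """Centered moving average over a window of m samples."""
    window = np.ones(m) / m
    # mode="same" keeps the output length; end-points average over fewer real samples
    return np.convolve(y, window, mode="same")

y = np.sin(np.linspace(0.0, 2.0 * np.pi, 100)) + 0.3 * np.random.randn(100)
y_smooth = moving_average_smoother(y, m=11)   # larger m: smoother but flatter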
Nearest Neighbors Smoother • m=16 • Larger m gives a smoother, but more biased line
Nearest Neighbors Smoother • Not sequential • Single parameter: the number of neighbors m • Trivially extended to any number of dimensions • Memory for m values • Depends on the metric definition • Not smooth enough • Biased at the end-points
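A sketch of the one-dimensional case (m = 16 as in the example; the data and helper names are illustrative):

import numpy as np

def knn_smoother(x, y, x0, m=16):
    """Average the y-values of the m sample points nearest to the query x0."""
    idx = np.argsort(np.abs(x - x0))[:m]   # indices of the m nearest neighbors
    return y[idx].mean()

x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x) + 0.3 * np.random.randn(100)
y_hat = np.array([knn_smoother(x, y, x0, m=16) for x0 in x])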
Low Pass Filter • 2nd-order Butterworth filter • So why do we need kernel smoothers?
Low Pass Filter • The same filter applied to a log function
Low Pass Filter • Smooth • Easily extended to any number of dimensions • Effectively, 3 parameters: type, order, and bandwidth • Biased at the end-points • Inappropriate for some functions (depends on the bandwidth)
Kernel Average Smoother • Nadaraya-Watson kernel-weighted average: $\hat f(x_0) = \frac{\sum_{i=1}^N K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^N K_\lambda(x_0, x_i)}$ with the kernel $K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{h_\lambda(x_0)}\right)$ • $h_\lambda(x_0) = |x_0 - x_{[m]}|$ (distance to the m-th nearest point) for the Nearest Neighbor Smoother • $h_\lambda(x_0) = \lambda$ for the Locally Weighted Average
Popular Kernels • Epanechnikov kernel: $D(t) = \frac{3}{4}(1 - t^2)$ for $|t| \le 1$, 0 otherwise • Tri-cube kernel: $D(t) = (1 - |t|^3)^3$ for $|t| \le 1$, 0 otherwise • Gaussian kernel: $D(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$
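A sketch combining these kernels with the Nadaraya-Watson average from the previous slide; the bandwidth λ = 0.5 and the test signal are arbitrary choices for illustration:

import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def tricube(t):
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def gaussian(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x, y, x0, lam=0.5, D=epanechnikov):
    """Kernel-weighted average: sum(K * y) / sum(K) with K = D(|x - x0| / lam)."""
    w = D(np.abs(x - x0) / lam)
    return np.sum(w * y) / np.sum(w)      # assumes at least one non-zero weight

x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x) + 0.3 * np.random.randn(100)
y_hat = np.array([nadaraya_watson(x, y, x0) for x0 in x])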
Non-Symmetric Kernel • Kernel example: which kernel is this?
Kernel Average Smoother • Single parameter: the window width λ • Smooth • Trivially extended to any number of dimensions • Memory-based method: little or no training is required • Depends on the metric definition • Biased at the end-points
Local Linear Regression • The kernel-weighted average minimizes: $\min_{\alpha(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\,[y_i - \alpha(x_0)]^2$ • Local linear regression minimizes: $\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\,[y_i - \alpha(x_0) - \beta(x_0)\,x_i]^2$, with the estimate $\hat f(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\,x_0$
Local Linear Regression • Solution: $\hat f(x_0) = b(x_0)^T \left(B^T W(x_0) B\right)^{-1} B^T W(x_0)\, y$, where $b(x)^T = (1, x)$, $B$ is the $N \times 2$ matrix with i-th row $b(x_i)^T$, and $W(x_0)$ is the $N \times N$ diagonal matrix with i-th diagonal element $K_\lambda(x_0, x_i)$ • Other representation: $\hat f(x_0) = \sum_{i=1}^N l_i(x_0)\, y_i$, where the weights $l_i(x_0)$ form the equivalent kernel
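A sketch of the weighted least-squares solution above for a single query point x0 (the Epanechnikov kernel, λ = 0.5, and the test signal are illustrative choices):

import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def local_linear(x, y, x0, lam=0.5, D=epanechnikov):
    """Weighted least-squares line fit around x0; returns b(x0)^T theta_hat."""
    w = D(np.abs(x - x0) / lam)                    # K_lambda(x0, x_i)
    B = np.column_stack([np.ones_like(x), x])      # i-th row is b(x_i)^T = (1, x_i)
    W = np.diag(w)
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)   # (alpha_hat, beta_hat)
    return theta[0] + theta[1] * x0

x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x) + 0.3 * np.random.randn(100)
y_hat = np.array([local_linear(x, y, x0) for x0 in x])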
Local Polynomial Regression • Why stop at local linear fits? • Let's minimize: $\min_{\alpha(x_0),\,\beta_j(x_0),\ j=1,\dots,d} \sum_{i=1}^N K_\lambda(x_0, x_i)\Big[y_i - \alpha(x_0) - \sum_{j=1}^d \beta_j(x_0)\, x_i^j\Big]^2$
Conclusions • Local linear fits can reduce bias dramatically at the boundaries, at a modest cost in variance; they are also more reliable for extrapolation • Local quadratic fits do little for the bias at the boundaries, but increase the variance a lot • Local quadratic fits tend to be most helpful in reducing bias due to curvature in the interior of the domain • λ controls the trade-off between bias and variance • Larger λ gives lower variance but higher bias
Local Regression in $\mathbb{R}^p$ • Radial kernel: $K_\lambda(x_0, x) = D\!\left(\frac{\|x - x_0\|}{\lambda}\right)$
Popular Kernels • Epanechnikov kernel • Tri-cube kernel • Gaussian kernel
Higher Dimensions • Boundary estimation is problematic • Many sample points are needed to reduce the bias • Local regression is less useful for p > 3 • It is impossible to maintain both localness (low bias) and a sizeable sample (low variance) at the same time
Structured Kernels • Non-radial kernel: $K_{\lambda,A}(x_0, x) = D\!\left(\frac{(x - x_0)^T A\,(x - x_0)}{\lambda}\right)$ • Coordinates or directions can be downgraded or omitted by imposing restrictions on A • The covariance can be used to adapt the metric A (related to the Mahalanobis distance) • Projection-pursuit model
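A sketch of such a non-radial kernel; taking a Gaussian profile for D and a diagonal A that downweights one coordinate are illustrative assumptions:

import numpy as np

def structured_kernel(x0, x, A, lam=1.0):
    """Non-radial kernel D((x - x0)^T A (x - x0) / lam) with a Gaussian profile D."""
    d = x - x0
    q = d @ A @ d                      # quadratic form weights coordinates/directions via A
    return np.exp(-0.5 * q / lam)

# Example: downweight the second coordinate by restricting A (illustrative choice)
A = np.diag([1.0, 0.1])
w = structured_kernel(np.zeros(2), np.array([0.3, 0.3]), A)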
Structured Regression • Divide the predictors into a set $(X_1, X_2, \dots, X_q)$ with $q < p$, and collect the remaining variables in a vector $Z$ • Conditionally linear model: $f(X) = \alpha(Z) + \beta_1(Z) X_1 + \dots + \beta_q(Z) X_q$ • For a given $Z = z_0$, fit the model by locally weighted least squares: $\min_{\alpha(z_0),\,\beta(z_0)} \sum_{i=1}^N K_\lambda(z_0, z_i)\,\big(y_i - \alpha(z_0) - \beta_1(z_0) x_{1i} - \dots - \beta_q(z_0) x_{qi}\big)^2$
Density Estimation • Example: mixture of two normal distributions (sample set, original distribution, and constant-window estimate)
Kernel Density Estimation • Smooth Parzen estimate: $\hat f_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^N K_\lambda(x_0, x_i)$
Comparison • Mixture of two normal distributions • Usually, bandwidth selection is more important than kernel function selection
Kernel Density Estimation • Gaussian kernel density estimate: $\hat f_X(x) = \frac{1}{N} \sum_{i=1}^N \phi_\lambda(x - x_i)$, where $\phi_\lambda$ denotes the Gaussian density with mean zero and standard deviation $\lambda$ • Generalization to $\mathbb{R}^p$: $\hat f_X(x) = \frac{1}{N\,(2\lambda^2 \pi)^{p/2}} \sum_{i=1}^N e^{-\frac{1}{2}\left(\|x - x_i\|/\lambda\right)^2}$ • The estimate is the empirical distribution smoothed by a Gaussian, i.e., a low-pass filter (LPF)
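A sketch of the one-dimensional Gaussian Parzen estimate on a mixture of two normals (the bandwidth λ = 0.3 and the mixture parameters are illustrative):

import numpy as np

def parzen_gaussian(samples, x0, lam=0.3):
    """Gaussian kernel density estimate at x0 with bandwidth lam."""
    z = (x0 - samples) / lam
    phi = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * lam)   # phi_lam(x0 - x_i)
    return phi.mean()                                          # (1/N) * sum of phi_lam

# Mixture of two normal distributions, as in the density-estimation example
samples = np.concatenate([np.random.normal(-2.0, 1.0, 500),
                          np.random.normal(3.0, 0.5, 500)])
grid = np.linspace(-6.0, 6.0, 200)
density = np.array([parzen_gaussian(samples, g) for g in grid])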
Kernel Density Classification • For a J-class problem: $\hat{\Pr}(G = j \mid X = x_0) = \frac{\hat\pi_j\, \hat f_j(x_0)}{\sum_{k=1}^J \hat\pi_k\, \hat f_k(x_0)}$, where $\hat f_j$ is the kernel density estimate for class j and $\hat\pi_j$ is the class prior (sample proportion)
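A sketch of this class-posterior estimate, reusing the parzen_gaussian helper from the previous sketch; the two-class data and equal priors are made-up illustration values:

import numpy as np

def kde_posterior(x0, class_samples, priors, lam=0.3):
    """P(G=j | x0) proportional to prior_j * f_j_hat(x0), normalized over the J classes."""
    f = np.array([parzen_gaussian(s, x0, lam) for s in class_samples])
    post = np.array(priors) * f
    return post / post.sum()

# Two-class toy problem with equal priors
posterior = kde_posterior(
    x0=0.5,
    class_samples=[np.random.normal(-1.0, 1.0, 300),
                   np.random.normal(2.0, 1.0, 300)],
    priors=[0.5, 0.5],
)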
Radial Basis Functions • The function f(x) is represented as an expansion in basis functions: $f(x) = \sum_{j=1}^M \beta_j\, h_j(x)$ • Radial basis function (RBF) expansion: $f(x) = \sum_{j=1}^M D\!\left(\frac{\|x - \xi_j\|}{\lambda_j}\right) \beta_j$ • The sum of squares is minimized with respect to all the parameters (for a Gaussian kernel): $\min_{\{\lambda_j, \xi_j, \beta_j\}} \sum_{i=1}^N \left(y_i - \beta_0 - \sum_{j=1}^M \beta_j \exp\!\left(-\frac{(x_i - \xi_j)^T (x_i - \xi_j)}{2\lambda_j^2}\right)\right)^2$
Radial Basis Functions • Assuming a constant width $\lambda_j = \lambda$ leads to the problem of "holes": regions where none of the kernels has appreciable support • The solution: renormalized RBFs, $h_j(x) = \frac{D(\|x - \xi_j\| / \lambda)}{\sum_{k=1}^M D(\|x - \xi_k\| / \lambda)}$
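A sketch of renormalized Gaussian RBFs with fixed centers ξ_j and constant λ, fitting only the weights β_j by least squares (the centers, λ, and the test signal are illustrative assumptions):

import numpy as np

def renormalized_rbf(x, centers, lam=0.5):
    """Renormalized Gaussian RBF features h_j(x): each row sums to 1, so no 'holes'."""
    d = np.abs(x[:, None] - centers[None, :]) / lam   # |x_i - xi_j| / lambda
    D = np.exp(-0.5 * d**2)                           # Gaussian profile
    return D / D.sum(axis=1, keepdims=True)           # renormalization

x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x) + 0.3 * np.random.randn(100)
centers = np.linspace(x.min(), x.max(), 10)           # fixed prototypes xi_j
H = renormalized_rbf(x, centers)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)          # least-squares weights beta_j
y_hat = H @ beta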
Additional Applications • Local likelihood • Mixture models for density estimation and classification • Mean-shift
Conclusions • Memory-based methods: the model is the entire training data set • Infeasible for many real-time applications • Provide good smoothing results for arbitrarily sampled functions • Appropriate for interpolation and extrapolation • When a parametric model is known, other fitting methods are preferable