Kernel methods - overview

• Kernel smoothers
• Local regression
• Kernel density estimation
• Radial basis functions
Introduction

Kernel methods are regression techniques used to estimate a response function from noisy data.

Properties:
• Different models are fitted at each query point, and only those observations close to that point are used to fit the model
• The resulting function is smooth
• The models require only a minimum of training
A simple one-dimensional kernel smoother

$$\hat f(x_0) = \frac{\sum_{i=1}^N K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^N K_\lambda(x_0, x_i)}$$

where

$$K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right)$$
Kernel methods, splines and ordinary least squares regression (OLS)

• OLS: a single model is fitted to all data
• Splines: different models are fitted to different subintervals (cuboids) of the input domain
• Kernel methods: different models are fitted at each query point
Kernel-weighted averages and moving averages

The Nadaraya-Watson kernel-weighted average

$$\hat f(x_0) = \frac{\sum_{i=1}^N K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^N K_\lambda(x_0, x_i)}, \qquad K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right)$$

where λ indicates the window size and the function D shows how the weights change with distance within this window.

The estimated function is smooth!

K-nearest neighbours

$$\hat f(x) = \operatorname{Ave}\left(y_i \mid x_i \in N_k(x)\right)$$

The estimated function is piecewise constant!
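A minimal NumPy sketch of the Nadaraya-Watson average above, using the Epanechnikov kernel as D; the function names and the simulated data are illustrative, not from the slides:

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average of y at the query point x0, window size lam."""
    w = epanechnikov(np.abs(x - x0) / lam)
    return np.sum(w * y) / np.sum(w)

# Illustrative noisy data and a grid of query points
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(0, 0.3, 100)
grid = np.linspace(0.05, 0.95, 10)
fit = [nadaraya_watson(x0, x, y, lam=0.2) for x0 in grid]
```

Because a separate weighted average is computed at every query point, this is exactly the "different model at each query point" idea from the introduction.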
Examples of one-dimensional kernel smoothers

Epanechnikov kernel:

$$D(t) = \begin{cases} \tfrac{3}{4}(1 - t^2) & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Tri-cube kernel:

$$D(t) = \begin{cases} (1 - |t|^3)^3 & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$$
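A sketch of the two compact-support kernels side by side; both vanish outside |t| ≤ 1, so only observations inside the window contribute, while the tri-cube is flatter on top and differentiable at the boundary:

```python
import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) on |t| <= 1, zero outside the window."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)

def tricube(t):
    """D(t) = (1 - |t|^3)^3 on |t| <= 1, zero outside the window."""
    return np.where(np.abs(t) <= 1, (1 - np.abs(t) ** 3) ** 3, 0.0)
```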
Issues in kernel smoothing

• The smoothing parameter λ has to be defined
• When there are ties at $x_i$: compute an average y value and introduce weights representing the number of points
• Boundary issues
• Varying density of observations: the bias is constant, but the variance is inversely proportional to the density
Boundary effects of one-dimensional kernel smoothers

Locally-weighted averages can be badly biased on the boundaries if the response function has a significant slope there. The remedy is to apply local linear regression.
Local linear regression

Find the intercept and slope parameters by solving

$$\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\,\big[y_i - \alpha(x_0) - \beta(x_0)\, x_i\big]^2$$

The solution is a linear combination of the $y_i$:

$$\hat f(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\, x_0 = \sum_{i=1}^N l_i(x_0)\, y_i$$
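A minimal sketch of this weighted least-squares fit at a single query point, assuming the Epanechnikov kernel from earlier; the function name is illustrative:

```python
import numpy as np

def local_linear(x0, x, y, lam):
    """Fit y ~ alpha + beta*x with weights K_lam(x0, x_i); return the fit at x0.
    Assumes at least two observations fall inside the window."""
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t ** 2), 0.0)  # Epanechnikov weights
    B = np.column_stack([np.ones_like(x), x])       # design matrix rows (1, x_i)
    W = np.diag(w)
    # Weighted normal equations: (B^T W B) theta = B^T W y
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return theta[0] + theta[1] * x0                 # alpha + beta * x0
```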
Kernel smoothing vs local linear regression

Kernel smoothing solves the minimization problem

$$\min_{\theta} \sum_{i=1}^N K_\lambda(x_0, x_i)\,[y_i - \theta]^2$$

(a locally constant fit), whereas local linear regression solves

$$\min_{\alpha,\, \beta} \sum_{i=1}^N K_\lambda(x_0, x_i)\,[y_i - \alpha - \beta x_i]^2$$
Properties of local linear regression

• Automatically modifies the kernel weights to correct for bias
• The bias depends only on the terms of order higher than one in the expansion of f
Local polynomial regression

Fitting polynomials of degree d instead of straight lines (see the sketch below):

$$\min_{\alpha(x_0),\, \beta_j(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\Big[y_i - \alpha(x_0) - \sum_{j=1}^d \beta_j(x_0)\, x_i^j\Big]^2$$

The behavior of the estimated response function is compared with the local linear fit on the next slide.
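A sketch generalizing the local fit to degree d; d = 1 recovers local linear regression and d = 0 the Nadaraya-Watson average. Centering the powers at the query point makes the fitted value simply the intercept:

```python
import numpy as np

def local_poly(x0, x, y, lam, d=2):
    """Degree-d local polynomial fit at x0 with Epanechnikov weights."""
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t ** 2), 0.0)
    # Columns [1, (x - x0), (x - x0)^2, ...]; at x0 the fit equals theta[0]
    B = np.vander(x - x0, d + 1, increasing=True)
    theta = np.linalg.solve(B.T @ (w[:, None] * B), B.T @ (w * y))
    return theta[0]
```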
Local polynomial vs local linear regression

Advantages:
• Reduces the "trimming of hills and filling of valleys"

Disadvantages:
• Higher variance (the tails are more wiggly)
Selecting the width of the kernel

Bias-variance tradeoff: selecting a narrow window leads to high variance and low bias, while selecting a wide window leads to high bias and low variance.
Selecting the width of the kernel

• Automatic selection (cross-validation), as in the sketch below
• Fixing the degrees of freedom
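A leave-one-out cross-validation sketch for choosing λ; it takes any point smoother with the signature of the earlier `nadaraya_watson` sketch, which is an assumption of this illustration rather than anything prescribed in the slides:

```python
import numpy as np

def loocv_score(x, y, lam, smoother):
    """Mean squared leave-one-out prediction error for window size lam."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i            # hold out observation i
        errs.append((y[i] - smoother(x[i], x[mask], y[mask], lam)) ** 2)
    return np.mean(errs)

# e.g. with the nadaraya_watson sketch from above:
# best = min([0.05, 0.1, 0.2, 0.4],
#            key=lambda lam: loocv_score(x, y, lam, nadaraya_watson))
```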
Local regression in $R^p$

The one-dimensional approach is easily extended to p dimensions by
• using the Euclidean norm as a measure of distance in the kernel:

$$K_\lambda(x_0, x) = D\!\left(\frac{\lVert x - x_0 \rVert}{\lambda}\right)$$

• modifying the polynomial to a function of all p coordinates
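A sketch of the p-dimensional local linear fit: the kernel uses the Euclidean norm and the local polynomial becomes a local hyperplane. Names and the centering trick are illustrative choices:

```python
import numpy as np

def local_linear_p(x0, X, y, lam):
    """Local linear fit at a length-p query point x0; X is N x p."""
    t = np.linalg.norm(X - x0, axis=1) / lam        # Euclidean distances
    w = np.where(t <= 1, 0.75 * (1 - t ** 2), 0.0)  # Epanechnikov weights
    B = np.column_stack([np.ones(len(X)), X - x0])  # rows (1, x_i - x0)
    theta = np.linalg.solve(B.T @ (w[:, None] * B), B.T @ (w * y))
    return theta[0]                                 # fitted value at x0
```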
Local regression in $R^p$

"The curse of dimensionality":
• The fraction of points close to the boundary of the input domain increases with its dimension
• Observed data do not cover the whole input domain
Structured local regression models

Structured kernels (e.g., standardizing each variable):

$$K_{\lambda, A}(x_0, x) = D\!\left(\frac{\sqrt{(x - x_0)^T A\, (x - x_0)}}{\lambda}\right)$$

Note: A is positive semidefinite.
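A sketch of such a structured kernel; taking A as the diagonal matrix of inverse variances standardizes each coordinate, and A must be positive semidefinite for the distance to be well defined:

```python
import numpy as np

def structured_kernel(x0, X, A, lam):
    """Weights D(sqrt((x - x0)^T A (x - x0)) / lam) for each row of X."""
    diff = X - x0
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, A, diff))  # quadratic forms
    t = d / lam
    return np.where(t <= 1, 0.75 * (1 - t ** 2), 0.0)

# Standardizing choice of A for an N x p data matrix X:
# A = np.diag(1.0 / X.var(axis=0))
```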
Structured local regression models

Structured regression functions:

• ANOVA decompositions (e.g., additive models); backfitting algorithms can be used
• Varying coefficient models: partition X into $(X_1, \ldots, X_q)$, whose coefficients depend on the remaining variables Z:

$$f(X) = \alpha(Z) + \beta_1(Z)\, X_1 + \ldots + \beta_q(Z)\, X_q$$
Structured local regression models

Varying coefficient models (example)
Local methods

• Assumption: the model is locally linear, so we maximize the log-likelihood locally at $x_0$:

$$\max_{\beta(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\, l\big(y_i, x_i^T \beta(x_0)\big)$$

• Autoregressive time series: $y_t = \beta_0 + \beta_1 y_{t-1} + \ldots + \beta_k y_{t-k} + e_t$, which with the lag vector $z_t = (1, y_{t-1}, \ldots, y_{t-k})^T$ becomes $y_t = z_t^T \beta + e_t$. Fit by local least squares with kernel $K(z_0, z_t)$, as sketched below.
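A sketch of the locally weighted autoregression idea for a one-step forecast: build the lag vectors $z_t$, weight each observation by its kernel distance to the current lag vector $z_0$, and solve weighted least squares. The Gaussian weight function stands in for $K(z_0, z_t)$ and is an assumption of this illustration:

```python
import numpy as np

def local_ar_forecast(y, k, lam):
    """One-step forecast of the series y from its last k values."""
    N = len(y)
    targets = y[k:]                                          # y_k ... y_{N-1}
    lags = np.column_stack([y[k - j: N - j] for j in range(1, k + 1)])
    Z = np.column_stack([np.ones(N - k), lags])              # rows z_t^T
    z0 = np.concatenate([[1.0], y[-1: -k - 1: -1]])          # current lags
    d = np.linalg.norm(Z[:, 1:] - z0[1:], axis=1)            # distance to z0
    w = np.exp(-0.5 * (d / lam) ** 2)                        # Gaussian K(z0, z_t)
    beta = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * targets))
    return z0 @ beta
```

Because the weights depend on $z_0$, the AR coefficients adapt to the current state of the series instead of being fitted globally.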
Kernel density estimation

• Straightforward estimates of the density are bumpy
• Instead, Parzen's smooth estimate is preferred:

$$\hat f_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^N K_\lambda(x_0, x_i)$$

Normally, Gaussian kernels are used:

$$K_\lambda(x_0, x) = \phi\!\left(\frac{|x - x_0|}{\lambda}\right)$$
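A sketch of Parzen's estimate with a Gaussian kernel, following the formula above; it averages a Gaussian bump of width λ centred at each observation:

```python
import numpy as np

def parzen_density(x0, x, lam):
    """Parzen density estimate at the grid points x0 from observations x."""
    z = (x0 - x[:, None]) / lam                       # (N, G) standardized gaps
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel values
    return phi.mean(axis=0) / lam                     # (1/(N*lam)) * sum_i K
```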
Radial basis functions and kernels

Using the idea of basis expansions, we treat kernel functions as basis functions:

$$f(x) = \sum_{j=1}^M K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^M D\!\left(\frac{\lVert x - \xi_j \rVert}{\lambda_j}\right) \beta_j$$

where $\xi_j$ is a prototype (location) parameter and $\lambda_j$ a scale parameter.
Radial basis functions and kernels

Choosing the parameters:
• Estimate $\{\lambda_j, \xi_j\}$ separately from the $\beta_j$ (often by using the distribution of X alone) and then solve a least-squares problem for the $\beta_j$.
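A sketch of this two-stage fit in one dimension: the prototypes and a common scale are chosen from the distribution of x alone (here, via quantiles, an illustrative choice), after which the coefficients follow by ordinary least squares:

```python
import numpy as np

def fit_rbf(x, y, M=10):
    """Gaussian RBF fit: prototypes from quantiles, beta by least squares."""
    xi = np.quantile(x, np.linspace(0.05, 0.95, M))       # prototype centres
    lam = (x.max() - x.min()) / M                         # common scale
    Phi = np.exp(-0.5 * ((x[:, None] - xi) / lam) ** 2)   # N x M basis matrix
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return xi, lam, beta
```

Predictions at new points x0 are then the basis expansion itself: evaluate the M Gaussian bumps at x0 and take their β-weighted sum.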