Lecture 7. Kernel Smoothing Methods Instructed by Jinzhu Jia
Outline • Description of Kernels • One-dimensional Kernel Smoothing • Selecting the Width of the Kernel • Local Regression in R^p • Kernel Density Estimation and Classification
Nearest Neighbor Smoothing • Average the responses with uniform weights over the k nearest neighbors of the target point: f̂(x0) = Ave(y_i | x_i ∈ N_k(x0))
Nearest Neighbor Smoothing • Properties: • Approximates E(Y|X) • Not continuous • To overcome discontinuity: assign weights that die off smoothly with distance from the target point.
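A minimal sketch of the k-nearest-neighbor running mean described above (the toy data and the helper name knn_smooth are illustrative, not from the lecture); note how the fit jumps as the neighbor set changes:

```python
import numpy as np

def knn_smooth(x_train, y_train, x0, k=15):
    """k-NN running mean: average y over the k training points closest to x0 (uniform weights)."""
    neighbors = np.argsort(np.abs(x_train - x0))[:k]   # indices of the k nearest x_i
    return y_train[neighbors].mean()                    # uniform weights -> discontinuous fit

# toy data: noisy sine curve
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(0, 0.3, 100)

grid = np.linspace(0, 1, 200)
fit = np.array([knn_smooth(x, y, x0) for x0 in grid])   # jumps wherever the neighbor set changes
```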
Kernel • K_λ(x0, x) assigns a weight to x according to its similarity (distance) to the target point x0 • A larger λ implies lower variance but higher bias
Examples • Epanechnikov kernel: K_λ(x0, x) = D(|x − x0| / λ), with D(t) = (3/4)(1 − t²) for |t| ≤ 1 and 0 otherwise • K-NN: the neighborhood size k replaces λ; the effective width h_k(x0) is the distance from x0 to its k-th nearest neighbor • Tri-cube: D(t) = (1 − |t|³)³ for |t| ≤ 1 and 0 otherwise
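A sketch of these kernels and of the resulting kernel-weighted (Nadaraya-Watson) average f̂(x0) = Σ K_λ(x0, x_i) y_i / Σ K_λ(x0, x_i); all function names are illustrative:

```python
import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) on |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)

def tricube(t):
    """D(t) = (1 - |t|^3)^3 on |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, (1 - np.abs(t) ** 3) ** 3, 0.0)

def kernel_smooth(x_train, y_train, x0, lam=0.2, D=epanechnikov):
    """Nadaraya-Watson average with weights K_lambda(x0, x_i) = D(|x_i - x0| / lambda)."""
    w = D(np.abs(x_train - x0) / lam)
    return np.sum(w * y_train) / np.sum(w)

def knn_bandwidth(x_train, x0, k=15):
    """Adaptive width for the k-NN variant: distance from x0 to its k-th nearest neighbor."""
    return np.sort(np.abs(x_train - x0))[k - 1]
```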
Local Linear Regression • Solve a weighted least squares problem at each target point x0: minimize Σ_i K_λ(x0, x_i) [y_i − α(x0) − β(x0) x_i]², and set f̂(x0) = α̂(x0) + β̂(x0) x0 • This makes a first-order bias correction, especially near the boundaries
Local Polynomial Regression • Fitting local polynomials of higher degree reduces the bias further • The price of this bias reduction? Variance!
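A sketch of local polynomial regression by weighted least squares, assuming an Epanechnikov kernel; degree=1 reproduces local linear regression (the helper name is illustrative):

```python
import numpy as np

def local_poly_fit(x_train, y_train, x0, lam=0.2, degree=1):
    """Weighted least squares fit of a degree-d polynomial in (x - x0);
    the intercept is the fitted value at x0. degree=1 is local linear regression."""
    w = np.maximum(0.0, 0.75 * (1 - ((x_train - x0) / lam) ** 2))   # Epanechnikov weights
    B = np.vander(x_train - x0, N=degree + 1, increasing=True)      # columns 1, (x-x0), (x-x0)^2, ...
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * B, sw * y_train, rcond=None)
    return beta[0]                                                   # fitted value f_hat(x0)
```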
Selecting the Width of the Kernel • λ controls the width of the local region • Epanechnikov or tri-cube kernel: λ is the radius of the support region • KNN: λ is the number of neighbors k • Gaussian kernel: λ is the standard deviation
Selecting the Width of the Kernel • There is a natural bias-variance tradeoff as we change the width of the averaging window (take the local average as an example) • If the window is narrow, the variance is larger and the bias is smaller • If the window is wide, the variance is smaller and the bias is larger, because some x_i in the average are far from x0 • CV, AIC, or BIC can all be used to select a good λ
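A sketch of selecting λ by leave-one-out cross-validation with a Nadaraya-Watson smoother (the toy data, grid of widths, and helper names are illustrative):

```python
import numpy as np

def nw_smooth(x_tr, y_tr, x0, lam):
    """Nadaraya-Watson average with an Epanechnikov kernel of width lam."""
    w = np.maximum(0.0, 0.75 * (1 - ((x_tr - x0) / lam) ** 2))
    return np.sum(w * y_tr) / np.sum(w) if w.sum() > 0 else y_tr.mean()

def loocv_score(x, y, lam):
    """Leave-one-out CV error: predict each point from all the others."""
    idx = np.arange(len(x))
    errs = [(y[i] - nw_smooth(x[idx != i], y[idx != i], x[i], lam)) ** 2 for i in idx]
    return np.mean(errs)

# scan a grid of candidate widths and keep the minimizer
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(0, 0.3, 100)
lams = np.linspace(0.05, 0.5, 10)
best_lam = lams[np.argmin([loocv_score(x, y, lam) for lam in lams])]
```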
Local Regression in R^p • Boundary effects are a much bigger problem in higher dimensions • Less useful in high dimensions, since neighborhoods are no longer local • Difficult to visualize the fitted surface
Structured Local Regression Models in R^p • When p/n is large, local regression is not helpful unless we impose some structural assumptions
Structured Local Regression Models in R^p • Structured regression functions (e.g., additive or low-order interaction decompositions) • One-dimensional local regression can be used at each stage of the fit
Varying Coefficient Models • Divide the p predictors into two sets: (X1, …, Xq) with q < p, and the remaining predictors Z • We assume the conditionally linear model: f(X) = α(Z) + β1(Z) X1 + … + βq(Z) Xq • The coefficient functions are fit by locally weighted least squares in Z
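A sketch of fitting a varying coefficient model with a single X and a single Z by locally weighted least squares in Z (an assumed minimal setting; the helper name is illustrative):

```python
import numpy as np

def varying_coef_fit(x, z, y, z0, lam=0.3):
    """At a target z0, weight observations by an Epanechnikov kernel in Z and fit
    y ~ alpha + beta * x by weighted least squares; returns (alpha(z0), beta(z0))."""
    w = np.maximum(0.0, 0.75 * (1 - ((z - z0) / lam) ** 2))   # kernel weights in Z
    B = np.column_stack([np.ones_like(x), x])                  # design: intercept and X
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(sw[:, None] * B, sw * y, rcond=None)
    return coef                                                # [alpha(z0), beta(z0)]
```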
Local Likelihood and Other Models • From global to local: replace the global log-likelihood Σ_i l(y_i, x_i'β) with the kernel-weighted version l(β(x0)) = Σ_i K_λ(x0, x_i) l(y_i, x_i'β(x0)) • Example: the local logistic regression model
Kernel Density Estimation and Classification • Kernel density estimation: a rough local estimate is the histogram-style count f̂(x0) = #{x_i ∈ N(x0)} / (N λ) • Smooth (Parzen) version: f̂(x0) = (1/(N λ)) Σ_i K_λ(x0, x_i) • A common choice for K_λ is the Gaussian kernel; per-class density estimates can be plugged into Bayes' theorem for classification
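A sketch of local logistic regression: maximize the kernel-weighted log-likelihood around each x0, here by passing the kernel weights through scikit-learn's sample_weight argument (the helper name and the choice of library are assumptions, not from the lecture):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_logistic_prob(x, y, x0, lam=0.3):
    """Kernel-weighted logistic fit around x0; returns the local estimate of P(Y=1 | x0)."""
    w = np.maximum(0.0, 0.75 * (1 - ((x - x0) / lam) ** 2))   # Epanechnikov weights
    keep = w > 0                                                # use only points inside the window
    if np.unique(y[keep]).size < 2:                             # window holds a single class
        return float(y[keep].mean())
    model = LogisticRegression().fit(x[keep].reshape(-1, 1), y[keep], sample_weight=w[keep])
    return model.predict_proba([[x0]])[0, 1]
```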
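A sketch of the smooth Parzen estimate with a Gaussian kernel, and of plugging per-class density estimates into Bayes' theorem for classification (the helper names are illustrative):

```python
import numpy as np

def parzen_density(x_train, x0, lam=0.2):
    """Smooth Parzen estimate: average of Gaussian bumps of width lam centered at the data."""
    k = np.exp(-0.5 * ((x0 - x_train) / lam) ** 2) / (lam * np.sqrt(2 * np.pi))
    return k.mean()

def kde_classify(x_by_class, priors, x0, lam=0.2):
    """Bayes rule with kernel density estimates: P(G=j | x0) is proportional to prior_j * fhat_j(x0)."""
    post = np.array([p * parzen_density(xj, x0, lam) for xj, p in zip(x_by_class, priors)])
    return post / post.sum()
```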
Naïve Bayes Classifier • Assumption: given the class G = j, the features are independent, so the class density factors as f_j(X) = ∏_k f_jk(X_k) • Each one-dimensional marginal f_jk can be estimated separately, e.g. by kernel density estimation
Mixture Models for Density Estimation and Classification • Gaussian mixture model: f(x) = Σ_m α_m φ(x; μ_m, Σ_m), with mixing proportions α_m that sum to one • The EM algorithm is used for parameter estimation
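A sketch of the naive Bayes idea: with features independent within each class, the class density factors into a product of one-dimensional kernel density estimates (the helper names are illustrative):

```python
import numpy as np

def naive_bayes_posterior(X_by_class, priors, x0, lam=0.5):
    """P(G=j | x0) proportional to prior_j * prod_k fhat_jk(x0[k]),
    each fhat_jk a one-dimensional Gaussian kernel density estimate."""
    def kde1d(samples, v):
        return np.mean(np.exp(-0.5 * ((v - samples) / lam) ** 2) / (lam * np.sqrt(2 * np.pi)))

    post = []
    for Xj, pj in zip(X_by_class, priors):          # Xj: (n_j, p) array of class-j samples
        dens = np.prod([kde1d(Xj[:, k], x0[k]) for k in range(Xj.shape[1])])
        post.append(pj * dens)
    post = np.array(post)
    return post / post.sum()
```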
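A sketch of fitting a two-component Gaussian mixture with scikit-learn's GaussianMixture, which estimates the parameters by EM (the toy data and component count are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# toy one-dimensional data drawn from two Gaussians
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 0.5, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)   # parameters estimated by EM
print(gmm.weights_, gmm.means_.ravel())                        # mixing proportions and component means
resp = gmm.predict_proba(X)                                     # soft responsibilities, usable for classification
```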
Homework • Due May 9 • ESLII_print 5, p. 216: Exercises 6.2 and 6.12