Presentation Transcript


  1. Outline • Motivations • Smoothing • Degrees of Freedom

  2. Motivation: Why nonparametric? Simple linear regression: E(Y|X)=α+βX • assumes that the mean of Y is a linear function of X • (+) easy to compute, describe, and interpret • (−) limited range of uses

  3. Note that the hat matrix S = X(X'X)^(-1)X' in least-squares regression is • symmetric and idempotent • constant preserving, i.e. S1 = 1 • tr(S) = # of linearly independent predictors in the model = # of parameters in the model

  4. If the dependence of E(Y) on X is far from linear, • one can extend straight-line regression by adding terms like X² to the model • but it is difficult to guess the most appropriate functional form just from looking at the data.

  5. Example: Diabetes data (Sockett et al., 1987): a study of the factors affecting patterns of insulin-dependent diabetes mellitus in children. • Response: logarithm of the C-peptide concentration at diagnosis • Predictors: age and base deficit.

  6. What is a smoother? A tool for summarizing the trend of a response Y as a function of one or more predictor measurements X1, X2, …, Xp.

  7. Idea of smoothers The simplest smoothers occur in the case of a categorical predictor, e.g. sex (male, female) or color (red, blue, green). To smooth Y, simply average the values of Y within each category.

  8. How about a non-categorical predictor? • we usually lack replicates at each predictor value • mimic category averaging through "local averaging", i.e. average the Y values in neighborhoods around each target value

  9. Two main uses of smoothers • Description: to enhance the visual appearance of the scatterplot of Y vs. X • Estimation: to estimate the dependence of the mean of Y on the predictor(s)

  10. Two main decisions to be made in scatterplot smoothing • how to average the response values in each neighborhood (which brand of smoother?) • how big to make the neighborhoods (what value of the smoothing parameter?)

  11. Scatterplot smoothing Notation: • y = (y1, y2, …, yn)T • x = (x1, x2, …, xn)T with x1 < x2 < … < xn • Def: s(x0) = S(y | x = x0), the smooth of y at the target value x0

  12. Some scatterplot smoothers: 1. Bin smoothers • choose cut points c0 < c1 < … < cK • Def: let Rk be the set of indices of the data points falling in region [ck−1, ck), and set s(x0) = ave{ yi : i ∈ Rk } for x0 in region k • (−): the estimate is not smooth (it jumps at each cut point)

  13. 2. Running-mean smoothers (moving average) • choose a symmetric nearest neighborhood N(xi) around each xi • define the running mean s(xi) = ave{ yj : j ∈ N(xi) } (see the sketch below) • (+): simple • (−): doesn't work well in practice (wiggly), severely biased near the end points
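
A minimal Python sketch (not part of the original slides) of the running-mean smoother just described: average the y-values whose indices fall in a window around each target point, with the window simply truncated at the ends, which is where the bias problem noted above shows up.

    import numpy as np

    def running_mean(x, y, k=5):
        """Running-mean smoother: average the roughly k nearest y-values
        (by index, assuming x is sorted) around each target point.
        A sketch only; the window is truncated at the boundaries."""
        y = np.asarray(y, float)
        n = len(y)
        half = k // 2
        fitted = np.empty(n)
        for i in range(n):
            lo, hi = max(0, i - half), min(n, i + half + 1)
            fitted[i] = y[lo:hi].mean()
        return fitted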

  14. 3. Running-line smoothers Def: s(x0) is the value at x0 of the least-squares line fitted to the data points in N(x0) • (−): the fit can be jagged => use weighted LS, with weights that decrease smoothly with distance from x0

  15. 4. Kernel smoothers Def: s(x0) = Σi wi yi with weights wi = (c0/λ)·d((x0 − xi)/λ), where d(t) is a smooth even function decreasing in |t|, λ = bandwidth, and c0 is chosen so that the weights sum to 1. Example: the Gaussian kernel. Example: the Epanechnikov kernel. Example: the minimum-variance kernel.
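
A hedged Python sketch of a kernel smoother evaluated at a single target point x0, using two of the kernels named on the slide; the function name and interface are illustrative, not from the slides.

    import numpy as np

    def kernel_smooth(x, y, x0, bandwidth=1.0, kernel="gaussian"):
        """Kernel smoother at x0: weights decrease smoothly in
        |x_i - x0| / bandwidth and are normalized to sum to 1."""
        x = np.asarray(x, float); y = np.asarray(y, float)
        t = (x - x0) / bandwidth
        if kernel == "gaussian":
            w = np.exp(-0.5 * t ** 2)
        elif kernel == "epanechnikov":
            w = np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)
        else:
            raise ValueError("unknown kernel")
        return np.sum(w * y) / np.sum(w)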

  16. 5. Running-median smoothers Def: s(xi) = median{ yj : j ∈ N(xi) } • makes the smoother resistant to outliers in the data • a nonlinear smoother

  17. 6. Regression splines • the range of X is separated into regions by a sequence of knots • fit a piecewise polynomial, e.g. piecewise-cubic polynomials joined smoothly at these knots • note: more knots => more flexibility

  18. 6a. Piecewise-cubic spline (1) s is a cubic polynomial in each subinterval (2) s has two continuous derivatives (3) s has a third derivative that is a step function with jumps at the knots

  19. (continued) Its parametric expression is s(x) = θ0 + θ1·x + θ2·x² + θ3·x³ + Σk θ3+k·(x − ξk)+³, where a+ denotes the positive part of a and ξ1, …, ξK are the knots • i.e. it can be written as a linear combination of K + 4 basis functions • de Boor (1978): B-spline basis functions are a numerically more stable equivalent basis
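
The parametric expression above can be illustrated with a short Python sketch that builds the K + 4 truncated-power basis functions of a cubic regression spline; as the slide notes, de Boor's B-splines are an equivalent but better-conditioned basis in practice.

    import numpy as np

    def truncated_power_basis(x, knots):
        """Columns 1, x, x^2, x^3 and (x - knot)_+^3 for each knot:
        the K + 4 basis functions of a cubic regression spline."""
        x = np.asarray(x, float)
        cols = [np.ones_like(x), x, x ** 2, x ** 3]
        cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
        return np.column_stack(cols)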

  20. 6b. Natural spline Def: a regression spline (see 6a) that is additionally constrained to be linear in the two boundary regions beyond the extreme knots

  21. 7. Cubic smoothing splines Find the f that minimizes the penalized residual sum of squares Σi (yi − f(xi))² + λ ∫ (f''(t))² dt • first term: closeness to the data • second term: penalizes curvature in the function • λ: (1) large values produce a smoother curve (2) small values produce a wiggly curve (a discrete sketch of this criterion follows below)
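
As a sketch of the penalized criterion only, the Python code below solves a discrete analogue in which the curvature integral is replaced by a second-difference penalty. This assumes equally spaced x and is not the exact cubic smoothing spline, but it shows the role of λ and produces an explicit smoother matrix S, which is used again in the GCV slide later.

    import numpy as np

    def penalized_smoother(y, lam=10.0):
        """Minimize ||y - f||^2 + lam * ||D2 f||^2, where D2 is the
        second-difference matrix (a discrete stand-in for the curvature
        penalty; assumes equally spaced x).
        Closed form: f = (I + lam * D2'D2)^(-1) y = S y."""
        y = np.asarray(y, float)
        n = len(y)
        D2 = np.diff(np.eye(n), n=2, axis=0)                 # (n-2) x n second differences
        S = np.linalg.solve(np.eye(n) + lam * D2.T @ D2, np.eye(n))  # smoother matrix
        return S @ y, S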

  22. 8. Locally-weighted running-line smoothers (loess) Cleveland (1979) • define N(x0) = the k nearest neighbors of x0 • fit a line by weighted LS, using the tri-cube function of the scaled distances as weights (see the sketch below)
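
A minimal Python sketch of this locally weighted running-line fit at a single target point, following the recipe on the slide (k nearest neighbors, tri-cube weights, weighted least-squares line); names and defaults are illustrative.

    import numpy as np

    def loess_fit(x, y, x0, k=20):
        """Loess-style fit at x0: tri-cube weights on the k nearest
        neighbors, then a weighted LS line evaluated at x0."""
        x = np.asarray(x, float); y = np.asarray(y, float)
        d = np.abs(x - x0)
        idx = np.argsort(d)[:k]                    # k nearest neighbors N(x0)
        u = d[idx] / d[idx].max()                  # distances scaled to [0, 1]
        w = (1 - u ** 3) ** 3                      # tri-cube weights
        X = np.column_stack([np.ones(k), x[idx]])  # design for a local line
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
        return beta[0] + beta[1] * x0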

  23. Smoothers for multiple predictors • full multiple-predictor smoothers, e.g. a p-dimensional kernel (see figure); (−): difficult to interpret and compute • additive models • semi-parametric models

  24. "Curse of dimensionality" Neighborhoods with a fixed number of points become less local as the dimension increases (Bellman, 1961) • for p = 1, an interval containing a fraction span = .1 of uniform data has side length .1 • for p = 10, the side length of the corresponding cube needs to be about .8 (see the check below)
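
The side lengths quoted on the slide follow from span^(1/p), the edge of a cube capturing a fixed fraction of uniformly distributed data in p dimensions; a quick Python check:

    # Side length of a cube holding a fraction `span` of uniform data in p dimensions.
    span = 0.1
    for p in (1, 2, 5, 10):
        print(p, round(span ** (1 / p), 2))   # p=1 -> 0.1, p=10 -> 0.79, i.e. about .8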

  25. Additive model Additive: Y = f1(X1) + … + fp(Xp) + ε • selection and estimation are usually based on smoothing and backfitting: BRUTO, ACE, Projector, etc. (Hastie & Tibshirani, 1990)

  26. Backfitting (see HT 90; a sketch of the backfitting loop follows below) • the BRUTO algorithm (see HT 90) is a forward model-selection procedure that uses a modified GCV, defined later, to choose the significant variables and their smoothing parameters.
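
A hedged Python sketch of the backfitting loop for an additive model, in the spirit of HT 90; `smoother` stands for any scatterplot smoother from the earlier slides, and its interface here is an assumption made for the illustration.

    import numpy as np

    def backfit(X, y, smoother, n_iter=20):
        """Backfitting for y = alpha + sum_j f_j(X[:, j]) + error.
        `smoother(x, r)` must return fitted values of the partial
        residuals r smoothed against x."""
        n, p = X.shape
        alpha = y.mean()
        f = np.zeros((n, p))
        for _ in range(n_iter):
            for j in range(p):
                others = [k for k in range(p) if k != j]
                r = y - alpha - f[:, others].sum(axis=1)   # partial residuals
                f[:, j] = smoother(X[:, j], r)
                f[:, j] -= f[:, j].mean()                  # center each component
        return alpha, f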

  27. (Smoothing in detail) Assume Y = f(X) + ε, where E(ε) = 0 and Var(ε) = σ², with ε and X independent.

  28. The bias-variance trade-off Example: the running-mean smoother; a larger neighborhood lowers the variance of the estimate but increases its bias.

  29. Expand f in a Taylor series, assuming the data are equally spaced with spacing Δ, and ignore the remainder R; this gives approximate expressions for the bias and variance of the running mean in terms of the neighborhood size k.

  30. The optimal k is then chosen by minimizing the resulting mean squared error (squared bias + variance) as a function of k.

  31. Automatic selection of smoothing parameters (1) average mean-squared error: MSE = (1/n) Σi E[ s(xi) − f(xi) ]² (2) average predictive squared error: PSE = (1/n) Σi E[ Yi* − s(xi) ]², where Yi* is a new observation at xi.

  32. Some estimates of PSE: 1. Cross validation (CV) CV = (1/n) Σi ( yi − s^(−i)(xi) )², where s^(−i)(xi) indicates the fit at xi, computed by leaving out the i-th data point.
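
A short Python sketch of leave-one-out cross validation for choosing a smoothing parameter; `fit_predict` is a hypothetical interface standing for whichever smoother is being tuned (e.g. the kernel or loess sketches above).

    import numpy as np

    def loo_cv(x, y, fit_predict, params):
        """For each candidate parameter, drop each point in turn, refit,
        predict the left-out point, and average the squared errors."""
        x = np.asarray(x, float); y = np.asarray(y, float)
        n = len(y)
        scores = []
        for param in params:
            errs = []
            for i in range(n):
                keep = np.arange(n) != i
                pred = fit_predict(x[keep], y[keep], x[i], param)
                errs.append((y[i] - pred) ** 2)
            scores.append(np.mean(errs))
        return params[int(np.argmin(scores))], scores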

  33. Fact: E(CV) ≈ PSE. Since yi is independent of s^(−i)(xi), E( yi − s^(−i)(xi) )² = σ² + E( f(xi) − s^(−i)(xi) )², which is approximately the i-th term of PSE.

  34. 2. Average squared residual (ASR) ASR = (1/n) Σi ( yi − s(xi) )² • ASR is not a good estimate of PSE, because the same data are used both to fit and to evaluate the smoother.

  35. Linear smoothers Def 1: s(x0) = Σi Si(x0) yi, a weighted sum of the yi Def 2: the vector of fitted values is s = S y, where the n×n matrix S (free of y) is called the smoother matrix e.g. running-mean, running-line, smoothing-spline, kernel, loess and regression-spline smoothers

  36. The bias-variance trade-off for linear smoothers For s = S y: E(s) = S f, so bias = (I − S) f, and Cov(s) = σ² S S'.

  37. Cross validation (CV) For a constant-preserving linear smoother, leaving out point i and renormalizing the weights, Sij => Sij / (1 − Sii), gives yi − s^(−i)(xi) = ( yi − s(xi) ) / (1 − Sii) => CV = (1/n) Σi [ ( yi − s(xi) ) / (1 − Sii) ]².

  38. Generalized cross validation (GCV) Replace each Sii in the CV formula by the average value tr(S)/n: GCV = (1/n) Σi [ ( yi − s(xi) ) / (1 − tr(S)/n) ]² (see the one-line implementation below).
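
For a linear smoother with an explicit smoother matrix S (such as the discrete penalized smoother sketched earlier), GCV is essentially one line of Python:

    import numpy as np

    def gcv(y, S):
        """GCV for a linear smoother f = S y: the CV formula with each
        leverage S_ii replaced by the average tr(S)/n."""
        y = np.asarray(y, float)
        n = len(y)
        fitted = S @ y
        return np.mean(((y - fitted) / (1 - np.trace(S) / n)) ** 2)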

  39. Degrees of freedom of a smoother Why do we need df? The same data set, together with the computational power of modern computers, is routinely used in the formulation, selection, estimation, diagnosis and prediction of a statistical model.

  40. EDF (Ye, 1998) Idea: a modeling/forecasting procedure is said to be stable if small changes in Y produce small changes in the fitted values.

  41. More precisely (EDF): for a small perturbation δ of Y, we would like the fitted values to satisfy fit(Y + δ) ≈ fit(Y) + Hδ, where the remainder is a small matrix term => hii can be viewed as the slope of the straight line relating the i-th fitted value to a perturbation of Yi, and EDF = Σi hii.

  42. Data perturbation procedure For an integer m > 1 (the Monte Carlo sample size), generate δ1, ..., δm as i.i.d. N(0, t²In), where t > 0 and In is the n×n identity matrix. • Use the "perturbed" data Y + δj to refit the model • For i = 1, 2, ..., n, the slope of the LS line fitted to the pairs (i-th fitted value from Y + δj, δij), j = 1, ..., m, gives an estimate of hii (see the sketch below).
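
A hedged Python sketch of this perturbation procedure; `refit` stands for whatever modeling/forecast procedure is being assessed and is assumed to return the n fitted values for a given response vector.

    import numpy as np

    def edf_perturbation(y, refit, t=0.5, m=200, rng=None):
        """Monte Carlo estimate of Ye's (1998) EDF: draw m perturbations
        delta_j ~ N(0, t^2 I_n), refit on y + delta_j, and for each i take
        the LS slope of the i-th fitted value against delta_ij; summing
        the slopes estimates sum_i h_ii = EDF.  `t` is a small perturbation SD."""
        rng = np.random.default_rng(rng)
        y = np.asarray(y, float)
        n = len(y)
        deltas = rng.normal(scale=t, size=(m, n))
        fits = np.array([refit(y + d) for d in deltas])   # m x n fitted values
        slopes = np.empty(n)
        for i in range(n):
            d = deltas[:, i]
            slopes[i] = np.cov(fits[:, i], d)[0, 1] / np.var(d, ddof=1)  # LS slope
        return slopes.sum()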

  43. An application Table 1: MSE and SD of five models fitted to the lynx data. About SD: fit the same class of models to the first 100 observations, keeping the remaining observations for out-of-sample prediction; SD is the standard deviation of the multi-step-ahead prediction errors.
