
Non/Semiparametric Regression and Clustered/Longitudinal Data

This presentation explores the use of non- or semiparametric regression methods for analyzing clustered or longitudinal data, with applications including panel data, matched studies, family studies, and finance.


Presentation Transcript


  1. Non/Semiparametric Regression and Clustered/Longitudinal Data Raymond J. Carroll Texas A&M University http://stat.tamu.edu/~carroll carroll@stat.tamu.edu

  2. Outline • Series of Semiparametric Problems: • Panel data • Matched studies • Family studies • Finance applications

  3. Outline • General Framework: • Likelihood-criterion functions • Algorithms: kernel-based • General results: • Semiparametric efficiency • Backfitting and profiling • Splines and kernels: Summary and conjectures

  4. Acknowledgments Xihong Lin Harvard University

  5. Basic Problems • Semiparametric problems • A parameter of interest, called β • An unknown function, called θ(·) • The key is that the unknown function is evaluated multiple times in computing the likelihood for an individual

  6. Example 1: Panel Data • i = 1,…,n clusters/individuals • j = 1,…,m observations per cluster

  7. Example 1: Marginal Parametric Model • Y = Response • X,Z = time-varying covariates • General Result: We can improve efficiency for (β, θ) by accounting for correlation: Generalized Least Squares (GLS)
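The GLS improvement can be sketched numerically. Everything below (the random-intercept data-generating model, the compound-symmetry working covariance, all numbers) is an illustrative assumption, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 4                    # n clusters, m observations per cluster

# Hypothetical marginal linear model with a cluster random intercept,
# which induces within-cluster correlation:
beta_true = 2.0
X = rng.normal(size=(n, m))
b = 1.5 * rng.normal(size=(n, 1))
Y = beta_true * X + b + rng.normal(size=(n, m))

# Working covariance of one cluster: compound symmetry sigma_b^2 * J + I
V = 1.5 ** 2 * np.ones((m, m)) + np.eye(m)
Vinv = np.linalg.inv(V)

# GLS: beta_hat = (sum_i X_i' V^{-1} X_i)^{-1} * (sum_i X_i' V^{-1} Y_i)
den = sum(X[i] @ Vinv @ X[i] for i in range(n))
num = sum(X[i] @ Vinv @ Y[i] for i in range(n))
beta_gls = num / den
```

Weighting by V⁻¹ downweights redundant within-cluster information; with V = I this reduces to ordinary least squares.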

  8. Example 1: Marginal Semiparametric Model • Y = Response • X,Z = varying covariates • Question: can we improve efficiency for β by accounting for correlation?

  9. Example 1: Marginal Nonparametric Model • Y = Response • X = varying covariate • Question: can we improve efficiency by accounting for correlation? (GLS)

  10. Example 2: Matched Studies • Prospective logistic model: i = person, S = stratum • The usual idea is that the stratum-dependent random variables may have been chosen by an extremely weird process, hence impossible to model.

  11. Example 2: Matched Studies • The usual likelihood is determined by conditioning on the per-stratum response totals • Note how the conditioning removes the stratum-specific effects • Also note: the function is evaluated twice per stratum

  12. Example 3: Model in Finance • Note how the function is evaluated m times for each subject

  13. Example 3: Model in Finance • Previous literature used an integration estimator, first solved via backfitting • Computation was pretty horrible • For us: exact computation, general theory

  14. Example 4: Twin Studies • Family consists of twins, followed longitudinally • The baseline for each twin is modeled nonparametrically • The longitudinal component is modeled parametrically

  15. General Formulation • These examples all have common features: • They have a parameter • They have an unknown function • The function is evaluated multiple times for each unit (individual, matched pair, family) • This distinguishes them from standard semiparametric models

  16. General Formulation • Yij = Response • Xij,Zij = possibly varying covariates • Loglikelihood (or criterion function) • All my examples have a criterion function of this form

  17. General Formulation: Examples • Loglikelihood (or criterion function) • As stated previously, this is not a standard semiparametric problem, because of the multiple function evaluations

  18. General Formulation: Overview • Loglikelihood (or criterion function) • For these problems, I will give constructive methods of estimation with • Asymptotic expansions and inference available • If the criterion function is a likelihood function, then the methods are semiparametric efficient. • Methods avoid solving integral equations

  19. The Semiparametric Model • Y = Response • X,Z = time-varying covariates • Question: can we improve efficiency for β by accounting for correlation, i.e., what method is semiparametric efficient?

  20. Semiparametric Efficiency • The semiparametric efficient score is readily worked out. • Involves a Fredholm equation of the 2nd kind • Effectively impossible to solve directly: • Involves densities of each X conditional on the others • The usual device of solving integral equations does not work here (or at least is not worth trying)

  21. The Efficient Score (Yuck!)

  22. My Approach • First pretend that if you knew β, then you could solve for θ(·) • I am going to suggest an algorithm for then estimating θ(·) • I am then going to turn to the question of estimating β

  23. Profiling in Gaussian Problems • Profile methods work like this. • Fix β • Apply your smoother • Call the result θ(·, β) • Maximize the Gaussian loglikelihood function in β • Explicit solution for most smoothers in Gaussian cases
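A minimal sketch of the profiling recipe, assuming (my choices for illustration) a partially linear Gaussian model Y = Xβ + θ(Z) + ε and a Nadaraya-Watson smoother:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
Z = rng.uniform(-1, 1, n)
X = Z + rng.normal(size=n)                  # covariate correlated with Z
Y = 1.0 * X + np.sin(np.pi * Z) + 0.3 * rng.normal(size=n)  # true beta = 1

# A simple linear smoother matrix (Nadaraya-Watson, Gaussian kernel):
h = 0.1
K = np.exp(-0.5 * ((Z[:, None] - Z[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)

def profiled_rss(beta):
    theta_b = S @ (Y - beta * X)            # fix beta, apply the smoother
    resid = Y - beta * X - theta_b
    return resid @ resid                    # Gaussian loglik = -RSS/(2 sigma^2) + const

# Maximize the profiled loglikelihood in beta (grid search for clarity;
# in the Gaussian case the maximizer is available in closed form):
betas = np.linspace(0.0, 2.0, 201)
beta_profile = betas[np.argmin([profiled_rss(b) for b in betas])]
```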

  24. Profiling • Profile methods maximize the profiled loglikelihood in β • This can be difficult numerically in nonlinear problems • A type of backfitting is often much easier numerically

  25. Backfitting Methods • Backfitting methods work like this. • Fix β • Apply your smoother • Call the result θ(·) • Maximize the loglikelihood function in β, holding θ(·) fixed • Iterate until convergence (explicit solution for most smoothers, but different from profiling)
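A sketch of the backfitting iteration on a toy partially linear Gaussian model; the model, the smoother, and the deliberately small bandwidth (backfitting needs undersmoothing, as slide 28 notes) are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
Z = rng.uniform(-1, 1, n)
X = Z + rng.normal(size=n)
Y = 1.0 * X + np.sin(np.pi * Z) + 0.3 * rng.normal(size=n)  # true beta = 1

h = 0.08                                    # undersmoothed on purpose
K = np.exp(-0.5 * ((Z[:, None] - Z[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)        # Nadaraya-Watson smoother matrix

beta = 0.0
for _ in range(200):
    theta_hat = S @ (Y - beta * X)          # fix beta, apply your smoother
    beta, beta_old = X @ (Y - theta_hat) / (X @ X), beta  # maximize in beta, theta fixed
    if abs(beta - beta_old) < 1e-10:        # iterate until convergence
        break
beta_backfit = beta
```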

  26. Backfitting/Profiling Example • Partially linear model, one function • Define the conditional expectations of Y and of X given Z • Fit the expectations by local linear kernel regression (or whatever)

  27. Backfitting/Profiling Example • The two estimators are numerically different, but asymptotically equivalent • The equivalence is a subtle calculation, even in this simple context
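For a linear smoother matrix S, the two estimators in the partially linear model have familiar closed forms: profiling regresses (I - S)Y on (I - S)X, while the backfitting fixed point regresses (I - S)Y on X itself. The data-generating model and smoother below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
Z = rng.uniform(-1, 1, n)
X = Z + rng.normal(size=n)
Y = 1.0 * X + np.sin(np.pi * Z) + 0.3 * rng.normal(size=n)  # true beta = 1

h = 0.08
K = np.exp(-0.5 * ((Z[:, None] - Z[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)        # local-constant smoother matrix

Xt, Yt = X - S @ X, Y - S @ Y               # partial the smoother out
beta_profile = (Xt @ Yt) / (Xt @ Xt)        # profiling estimator
beta_backfit = (X @ Yt) / (X @ Xt)          # backfitting fixed point
```

The two numbers differ (as the slide says) but agree to first order for a reasonable bandwidth.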

  28. Backfitting/Profiling Example • The asymptotic equivalence of profiling and backfitting in this partially linear model has one subtlety • Profiling: off-the-shelf smoothers are OK • Backfitting: off-the-shelf smoothers need to be undersmoothed to get rid of asymptotic bias

  29. Backfitting/Profiling • Hu et al. (2004, Biometrika) showed that in general problems: • Backfitting is generally more variable than profiling for linear-type problems • Backfitting and profiling need not have the same limit distributions

  30. General Formulation: Revisited • Yij = Response • Xij,Zij = varying covariates • Loglikelihood (or criterion function) • The key is that the function is evaluated multiple times for each individual • The goal is to estimate β and θ(·) efficiently

  31. General Formulation: Revisited • What I want to show you is a constructive solution, i.e., one that can be computed • Different from solving integral equations • Completely general • Theoretically sound • The methodology is based on kernel methods, i.e., local methods. • First a little background

  32. Simple Local Likelihood • Consider a nonparametric regression with iid data • The loglikelihood function is a sum over the independent observations

  33. Simple Local Likelihood • Let K be a density function, and h a bandwidth • Your target is the function at x • The kernel weights for local likelihood are • If K is the uniform density, only observations within h of x get any weight

  34. Simple Local Likelihood Only observations within h = 0.25 of x = -1.0 get any weight
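The uniform-kernel weights can be sketched directly, using the slide's values h = 0.25 and x = -1.0 on a made-up design:

```python
import numpy as np

def kernel_weights(Xobs, x, h):
    """Uniform-kernel weights K((X_i - x)/h): positive only within h of x."""
    u = (Xobs - x) / h
    return 0.5 * (np.abs(u) <= 1.0)         # uniform density on [-1, 1]

Xobs = np.linspace(-2.0, 2.0, 9)            # toy design points, spacing 0.5
w = kernel_weights(Xobs, x=-1.0, h=0.25)
# only the design point at -1.0 lies within 0.25 of the target,
# so it is the only observation receiving positive weight
```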

  35. Simple Local Likelihood • Near x, the function should be nearly linear • The idea then is to do a likelihood estimate local to x via weighting, i.e., maximize the kernel-weighted loglikelihood • Then announce the fitted local intercept as the estimate of the function at x

  36. Simple Local Likelihood • In the linear model, local likelihood is local linear regression • It is essentially equivalent to loess, splines, etc. • I’ll now use local likelihood ideas to solve the general problem
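In the Gaussian case, local likelihood at x is just weighted least squares on (1, X - x), i.e., local linear regression; the test function, kernel, and bandwidth below are illustrative assumptions:

```python
import numpy as np

def local_linear(Xobs, Y, x, h):
    """Gaussian local likelihood at target x = weighted least squares on
    (1, Xobs - x); the fitted intercept estimates the function at x."""
    w = np.exp(-0.5 * ((Xobs - x) / h) ** 2)           # Gaussian kernel weights
    D = np.column_stack([np.ones_like(Xobs), Xobs - x])
    A = D.T @ (w[:, None] * D)                         # weighted normal equations
    b = D.T @ (w * Y)
    return np.linalg.solve(A, b)[0]                    # local intercept

rng = np.random.default_rng(4)
Xobs = rng.uniform(-1, 1, 400)
Y = np.sin(np.pi * Xobs) + 0.2 * rng.normal(size=400)
theta_hat = local_linear(Xobs, Y, x=0.25, h=0.1)       # truth: sin(pi/4) ~ 0.707
```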

  37. General Formulation: Revisited • Likelihood (or criterion function) • The goal is to estimate the function at a target value t • Fix β. Pretend that the formulation involves different functions, one per evaluation

  38. General Formulation: Revisited • Pretend that the formulation involves different functions • Pretend that all but the first are known • Fit a local linear regression via local likelihood • Get the local score function for the first function

  39. General Formulation: Revisited • Repeat: pretend the other functions are known • Fit a local linear regression • Get the local score function • Finally, solve the combined local score equations • Explicit solution in the Gaussian cases

  40. Main Results • Semiparametric efficient for β • Backfitting (under-smoothed) = profiling • The equivalence of backfitting and profiling is not obvious in the general case.

  41. Main Results • Explicit variance formulae • High-order expansions for parameters and functions • Used for estimating population quantities such as population means, etc.

  42. Marginal Approaches • The most standard approach is a marginal one • Often, we can write the marginal mean in terms of a known link function G • A similar approach is to write the likelihood function for single observations

  43. Marginal Approaches • The marginal approaches ignore the correlation structure • Lots, and lots, and lots of papers • Methods tend to be very inefficient if the correlation structure is important

  44. Econometric Example • In panel data, interest can be in random- vs. fixed-effects models • In our usual variance-components model, the random effect is independent of everything else • If so, this is a version of our partially linear model, hence already solved by us

  45. Econometric Example • Econometricians, though, worry that the individual effect is correlated with Z or X • This says that the individual effect represents unmeasured variables. This is the fixed-effects model • They want to know the effects of (X, Z), controlling for individual factors

  46. Econometric Example • Starting model includes an individual effect • Get rid of the individual-effect terms, e.g., by differencing within clusters • A special case of our model!
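The differencing idea can be sketched on a toy linear fixed-effects panel (no nonparametric part; the data-generating model and numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 500, 3
alpha = 2.0 * rng.normal(size=(n, 1))       # individual effect
X = alpha + rng.normal(size=(n, m))         # correlated with alpha: fixed effects
beta_true = 1.5
Y = beta_true * X + alpha + rng.normal(size=(n, m))

# Difference within clusters against j = 1 to remove alpha_i entirely;
# the differenced errors share e_i1, so they are correlated over j = 2,...,m
dY = Y[:, 1:] - Y[:, :1]
dX = X[:, 1:] - X[:, :1]
beta_fd = (dX * dY).sum() / (dX ** 2).sum()

# Pooled OLS ignoring alpha is inconsistent here, because alpha
# is correlated with X:
beta_pooled = (X * Y).sum() / (X ** 2).sum()
```

The pooled fit is badly biased upward, while the differenced fit recovers β; the remaining within-cluster correlation of the differenced errors is exactly what the next slide's efficiency discussion is about.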

  47. Econometric Example • The differenced error terms are correlated over j = 2, …, m • The variance efficiency loss of ignoring these correlations is (2+m)/4

  48. Econometric Example • Example: China Health and Nutrition Survey • No parametric part • Response Y = caloric intake (log scale) • Predictor X = income • An initial random-effects fit suggests that for very low incomes, an increase in income is NOT associated with an increase in calories

  49. Econometric Example • The random-effects model suggests that for very low incomes, an increase in income is NOT associated with an increase in calories • The fixed-effects model fits with economic theory and common sense • A specification test confirms this

  50. Econometric Example • The fixed-effects cubic regression fit is far too steep at either end. • The nonparametric fit makes much more sense