This presentation explores the use of non- or semiparametric regression methods for analyzing clustered or longitudinal data, with applications to panel data, matched studies, family studies, and finance.
Non/Semiparametric Regression and Clustered/Longitudinal Data
Raymond J. Carroll
Texas A&M University
http://stat.tamu.edu/~carroll
carroll@stat.tamu.edu
Outline • Series of Semiparametric Problems: • Panel data • Matched studies • Family studies • Finance applications
Outline • General Framework: • Likelihood-criterion functions • Algorithms: kernel-based • General results: • Semiparametric efficiency • Backfitting and profiling • Splines and kernels: Summary and conjectures
Acknowledgments Xihong Lin Harvard University
Basic Problems • Semiparametric problems • Parameter of interest, called β • Unknown function, called θ(·) • The key is that the unknown function is evaluated multiple times in computing the likelihood for an individual
Example 1: Panel Data • i = 1,…,n clusters/individuals • j = 1,…,m observations per cluster
Example 1: Marginal Parametric Model • Y = Response • X,Z = time-varying covariates • General Result: We can improve efficiency for (β, θ) by accounting for correlation: Generalized Least Squares (GLS)
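The GLS step can be illustrated concretely. This is a minimal numpy sketch, not the talk's actual code: the function name is mine, and the within-cluster working covariance Σ is assumed known and common across clusters.

```python
import numpy as np

def gls_beta(X_list, Y_list, Sigma):
    """Generalized least squares for clustered data.

    X_list[i]: (m, p) covariate matrix for cluster i
    Y_list[i]: (m,) responses for cluster i
    Sigma:     (m, m) within-cluster covariance (assumed known)
    """
    Sinv = np.linalg.inv(Sigma)
    # Accumulate the normal equations, weighting each cluster by Sigma^{-1}
    A = sum(X.T @ Sinv @ X for X in X_list)
    b = sum(X.T @ Sinv @ Y for X, Y in zip(X_list, Y_list))
    return np.linalg.solve(A, b)
```

Weighting by Σ⁻¹ rather than the identity (working independence) is exactly what produces the efficiency gain when the within-cluster correlation is real.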
Example 1: Marginal Semiparametric Model • Y = Response • X,Z = varying covariates • Question: can we improve efficiency for β by accounting for correlation?
Example 1: Marginal Nonparametric Model • Y = Response • X = varying covariate • Question: can we improve efficiency by accounting for correlation? (GLS)
Example 2: Matched Studies • Prospective logistic model: i = person, S = stratum • The usual idea is that the stratum-dependent random variables may have been chosen by an extremely weird process, hence are impossible to model.
Example 2: Matched Studies • The usual likelihood is determined by conditioning within each stratum • Note how the conditioning removes the stratum effect • Also note: the function is evaluated twice per stratum
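For a matched-pair design (one case and one control per stratum), the conditioning can be sketched as follows. The scalar linear predictor η = βx + θ(z) and the helper name are illustrative assumptions, not the talk's notation; note that θ enters once for each member of the pair.

```python
import numpy as np

def cond_loglik_pair(beta, x1, x2, theta1, theta2, y1):
    """Conditional log-likelihood contribution of one matched pair.

    eta_j = beta * x_j + theta(z_j); theta1, theta2 are the (assumed
    known here) function values for the two pair members.  The stratum
    intercept cancels after conditioning on y1 + y2 = 1.
    """
    eta1 = beta * x1 + theta1
    eta2 = beta * x2 + theta2
    # P(Y1 = 1 | Y1 + Y2 = 1) = exp(eta1) / (exp(eta1) + exp(eta2))
    logp1 = eta1 - np.logaddexp(eta1, eta2)
    logp2 = eta2 - np.logaddexp(eta1, eta2)
    return logp1 if y1 == 1 else logp2
```

The two conditional probabilities sum to one, and no stratum-specific term appears: whatever weird process generated the strata has been conditioned away.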
Example 3: Model in Finance • Model in finance • Note how the function is evaluated m times for each subject
Example 3: Model in Finance • Model in finance • Previous literature used an integration estimator, namely first solved via backfitting: • Computation was pretty horrible • For us, exact computation, general theory
Example 4: Twin Studies • Family consists of twins, followed longitudinally • Baseline for each twin is modeled nonparametrically • The longitudinal component is modeled parametrically
General Formulation • These examples all have common features: • They have a parameter • They have an unknown function • The function is evaluated multiple times for each unit (individual, matched pair, family) • This distinguishes it from standard semiparametric models
General Formulation • Yij = Response • Xij,Zij = possibly varying covariates • Loglikelihood (or criterion function) • All my examples share this criterion-function form
General Formulation: Examples • Loglikelihood (or criterion function) • As stated previously, this is not a standard semiparametric problem, because of the multiple function evaluations
General Formulation: Overview • Loglikelihood (or criterion function) • For these problems, I will give constructive methods of estimation with • Asymptotic expansions and inference available • If the criterion function is a likelihood function, then the methods are semiparametric efficient. • Methods avoid solving integral equations
The Semiparametric Model • Y = Response • X,Z = time-varying covariates • Question: can we improve efficiency for β by accounting for correlation, i.e., what method is semiparametric efficient?
Semiparametric Efficiency • The semiparametric efficient score is readily worked out. • Involves a Fredholm equation of the 2nd kind • Effectively impossible to solve directly: • Involves densities of each X conditional on the others • The usual device of solving integral equations does not work here (or at least is not worth trying)
My Approach • First pretend that if you knew β, then you could solve for θ(·) • I am going to suggest an algorithm for then estimating the function • I am then going to turn to the question of estimating β
Profiling in Gaussian Problems • Profile methods work like this. • Fix β • Apply your smoother • Call the result θ(·;β) • Maximize the Gaussian loglikelihood function in β • Explicit solution for most smoothers in Gaussian cases
Profiling • Profile methods maximize • This can be difficult numerically in nonlinear problems • A type of backfitting is often much easier numerically
Backfitting Methods • Backfitting methods work like this. • Fix β • Apply your smoother • Call the result θ(·;β) • Maximize the loglikelihood function in β, holding the function fixed • Iterate until convergence (explicit solution for most smoothers, but different from profiling)
Backfitting/Profiling Example • Partially linear model, one function • Define the conditional expectations E(Y|Z) and E(X|Z) • Fit the expectations by local linear kernel regression (or whatever)
Backfitting/Profiling Example • The profiling estimator regresses Y − E(Y|Z) on X − E(X|Z); backfitting iterates between the smoother and the parametric fit • These are numerically different, but asymptotically equivalent • The equivalence is a subtle calculation, even in this simple context
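The two schemes in the Gaussian partially linear model Y = βX + θ(Z) + ε can be sketched in a few lines. This uses a Nadaraya-Watson smoother for simplicity rather than the local linear fit from the talk; all function names and the bandwidth default are mine.

```python
import numpy as np

def nw_smooth(z0, z, v, h=0.2):
    """Nadaraya-Watson estimate of E[v | Z = z0], Gaussian kernel."""
    w = np.exp(-0.5 * ((z0[:, None] - z[None, :]) / h) ** 2)
    return (w @ v) / w.sum(axis=1)

def profile_beta(y, x, z, h=0.2):
    """Profiling: regress Y - E(Y|Z) on X - E(X|Z)."""
    ry = y - nw_smooth(z, z, y, h)
    rx = x - nw_smooth(z, z, x, h)
    return (rx @ ry) / (rx @ rx)

def backfit_beta(y, x, z, h=0.2, iters=50):
    """Backfitting: alternate the smoother for theta and OLS for beta."""
    beta = 0.0
    for _ in range(iters):
        theta = nw_smooth(z, z, y - beta * x, h)  # smoothing step
        beta = (x @ (y - theta)) / (x @ x)        # parametric step
    return beta
```

On simulated data the two estimates agree closely but not exactly, which is the "numerically different, asymptotically equivalent" point above.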
Backfitting/Profiling Example • The asymptotic equivalence of profiling and backfitting in this partially linear model has one subtlety • Profiling: off-the-shelf smoothers are OK • Backfitting: off-the-shelf smoothers need to be undersmoothed to get rid of asymptotic bias
Backfitting/Profiling • Hu et al. (2004, Biometrika) showed that in general problems: • Backfitting is generally more variable than profiling, for linear-type problems • Backfitting and profiling need not have the same limit distributions
General Formulation: Revisited • Yij = Response • Xij,Zij = varying covariates • Loglikelihood (or criterion function) • The key is that the function is evaluated multiple times for each individual • The goal is to estimate β and θ(·) efficiently
General Formulation: Revisited • What I want to show you is a constructive solution, i.e., one that can be computed • Different from solving integral equations • Completely general • Theoretically sound • The methodology is based on kernel methods, i.e., local methods. • First a little background
Simple Local Likelihood • Consider a nonparametric regression with iid data • The Loglikelihood function is
Simple Local Likelihood • Let K be a density function, and h a bandwidth • Your target is the function at x • The kernel weights for local likelihood are K{(Xi − x)/h} • If K is the uniform density, only observations within h of x get any weight
Simple Local Likelihood Only observations within h = 0.25 of x = -1.0 get any weight
Simple Local Likelihood • Near x, the function should be nearly linear • The idea then is to do a likelihood estimate local to x via weighting, i.e., maximize the kernel-weighted loglikelihood in a local intercept and slope • Then announce the fitted local intercept as the estimate of the function at x
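In the Gaussian case the local likelihood maximization is just kernel-weighted linear least squares. A minimal sketch with the uniform kernel from the weighting slide (the function name and defaults are illustrative):

```python
import numpy as np

def local_linear_fit(x0, x, y, h=0.25):
    """Local linear (Gaussian) likelihood estimate of the function at x0.

    Uniform kernel: only observations within h of x0 get any weight.
    """
    w = (np.abs(x - x0) <= h).astype(float)         # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])  # local intercept + slope
    XtW = X.T * w                                   # weight each observation
    a, b = np.linalg.solve(XtW @ X, XtW @ y)
    return a  # announce the local intercept as the estimate at x0
```

Sweeping x0 over a grid traces out the whole function estimate; with squared-error loss this is exactly local linear regression, as the next slide notes.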
Simple Local Likelihood • In the linear model, local likelihood is local linear regression • It is essentially equivalent to loess, splines, etc. • I’ll now use local likelihood ideas to solve the general problem
General Formulation: Revisited • Likelihood (or criterion function) • The goal is to estimate the function at a target value t • Fix β. Pretend that the formulation involves m different functions θ1,…,θm
General Formulation: Revisited • Pretend that the formulation involves m different functions θ1,…,θm • Pretend that θ2,…,θm are known • Fit a local linear regression for θ1 via local likelihood • Get the local score function for θ1
General Formulation: Revisited • Repeat: pretend that all the functions except θj are known • Fit a local linear regression for θj • Get the local score function for θj • Finally, solve the system of local score equations with all the functions set equal at the target • Explicit solution in the Gaussian cases
Main Results • Semiparametric efficient for β • Backfitting (under-smoothed) = profiling • The equivalence of backfitting and profiling is not obvious in the general case.
Main Results • Explicit variance formulae • High-order expansions for parameters and functions • Used for estimating population quantities such as population means, etc.
Marginal Approaches • The most standard approach is a marginal one • Often, we can write the marginal mean in terms of a known link function G • A similar tactic is to write the likelihood function for single observations
Marginal Approaches • The marginal approaches ignore the correlation structure • Lots, and lots, and lots of papers • Methods tend to be very inefficient if the correlation structure is important
Econometric Example • In panel data, interest can be in random/fixed effects models • Our usual variance components model: the random effect αi is independent of everything else • If so, this is a version of our partially linear model, hence already solved by us
Econometric Example • Econometricians, though, worry that αi is correlated with Z or X • This says that αi represents unmeasured variables: the fixed-effects model • They want to know the effects of (X,Z), controlling for individual factors
Econometric Example • Starting model: • Get rid of the αi terms by differencing, e.g., Yij − Yi1 • A special case of our model!
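The differencing step can be sketched in a few lines, assuming the additive structure Yij = αi + θ(Xij) + εij (the function name and the exact model form are my illustration, not the talk's):

```python
import numpy as np

def difference_out_fixed_effects(Y, X):
    """Remove individual effects alpha_i by within-cluster differencing.

    Y, X: (n, m) arrays with one row per individual.  Returns the
    differences Y_ij - Y_i1 for j = 2..m (whose distribution no longer
    involves alpha_i), together with the paired covariate columns.
    """
    dY = Y[:, 1:] - Y[:, :1]                         # alpha_i cancels here
    Xj = X[:, 1:]
    X1 = np.broadcast_to(X[:, :1], Xj.shape)         # baseline covariate
    return dY, Xj, X1
```

Each difference involves the unknown function at two covariate values, θ(Xij) − θ(Xi1), which is why this lands back in the multiple-evaluation framework above; the differenced errors also share εi1, producing the correlation over j = 2,…,m discussed next.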
Econometric Example • Model: • The differenced error terms are correlated over j = 2,…,m • The variance efficiency loss from ignoring these correlations is (2+m)/4
Econometric Example • Example: China Health and Nutrition Survey • No parametric part • Response Y = caloric intake (log scale) • Predictor X = income • Initial random effects model result suggests that for very low incomes, an increase in income is NOT associated with an increase in calories
Econometric Example • The random effects model suggests that for very low incomes, an increase in income is NOT associated with an increase in calories • The fixed effects model fits with economic theory and common sense • A specification test confirms this
Econometric Example • The fixed-effects cubic regression fit is far too steep at either end. • The nonparametric fit makes much more sense