
Presentation Transcript


  1. Non/Semiparametric Regression and Clustered/Longitudinal Data Raymond J. Carroll Texas A&M University http://stat.tamu.edu/~carroll carroll@stat.tamu.edu Postdoctoral Training Program: http://stat.tamu.edu/B3NC

  2. Where am I From? [Map of Texas: Wichita Falls, my hometown; College Station, home of Texas A&M; Big Bend National Park; interstates I-35 and I-45.]

  3. Acknowledgments • Raymond Carroll • Oliver Linton • Alan Welsh • Xihong Lin • Naisyin Wang • Enno Mammen • A series of papers is on my web site • Lin, Wang and Welsh: longitudinal data • Linton and Mammen: time series data

  4. Outline • Longitudinal models: panel data • Background: splines = kernels for independent data • Nonparametric case: do splines = kernels? • Semiparametric case: the partially linear model; does it matter which nonparametric method is used?

  5. Panel Data (for simplicity) • i = 1,…,n clusters/individuals • j = 1,…,m observations per cluster

  6. Panel Data (for simplicity) • i = 1,…,n clusters/individuals • j = 1,…,m observations per cluster • Important point: • The cluster size m is meant to be fixed • This is not a time series problem where the cluster size increases to infinity • We have equivalent time series results

  7. The Nonparametric Model • Y = response • X = time-varying covariate • Model: Y_ij = θ(X_ij) + ε_ij • Question: can we improve efficiency by accounting for correlation?

  8. Independent Data • Two major methods • Splines (smoothing, P-splines, etc.), with penalty parameter λ • Kernels (local averages, local linear, etc.), with kernel function K and bandwidth h

  9. Independent Data • Two major methods • Splines (smoothing, P-spline, etc.) • Kernels (local averages, etc.) • Both are linear in the responses • Both give similar answers in data • Silverman showed that the weight functions are asymptotically equivalent • In this sense, splines = kernels
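Both families are linear smoothers: the fitted values can be written as a smoother matrix S times the response vector, which is what makes Silverman's equivalence of weight functions meaningful. A minimal sketch of the kernel side, using Nadaraya-Watson local averaging with a Gaussian kernel (the helper name and simulated data are illustrative, not from the talk):

```python
import numpy as np

def nw_smoother_matrix(x, t, h):
    """Rows of S give the kernel weights, so that fhat = S @ y:
    the estimator is linear in the responses."""
    K = np.exp(-0.5 * ((t[:, None] - x[None, :]) / h) ** 2)
    return K / K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(100)
S = nw_smoother_matrix(x, x, h=0.05)
fhat = S @ y  # linear in y: smoothing a*y1 + b*y2 gives a*S@y1 + b*S@y2
```

A smoothing spline has the same form with a different S; the slide's point is that for independent data the two S matrices have asymptotically equivalent rows.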

  10. [Figure: the weight functions Gn(t = 0.25, x) in a specific case for independent data; panels: kernel and smoothing spline.] Note the similarity of shape and the locality: only X’s near t = 0.25 get any weight

  11. Working Independence • Working independence: • Ignore all correlations • Posit some reasonable marginal variances • Splines and kernels have obvious weighted versions • Weighting important for efficiency • Splines and kernels are linear in the responses • The Silverman result still holds • In this sense, splines = kernels

  12. Accounting for Correlation • Splines have an obvious analogue for non-independent data • Let Σ be a working covariance matrix • Penalized generalized least squares (GLS): minimize (Y − θ)'Σ^{-1}(Y − θ) + λ∫{θ''(t)}² dt • Because splines are based on likelihood ideas, they generalize quickly to new problems • Kernels have no such obvious analogue

  13. Accounting for Correlation • Kernels are not so obvious • Local likelihood kernel ideas are standard in independent data problems • Most attempts at kernels for correlated data have tried to use local likelihood kernel methods

  14. Kernels and Correlation • Problem: how to define locality for kernels? • Goal: estimate the function at t • Let K_h be a diagonal matrix of standard kernel weights K_h(X_ij − t) • Standard kernel method: GLS pretending the inverse covariance matrix is K_h^{1/2} Σ^{-1} K_h^{1/2} • The estimate is inherently local
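A sketch of this standard kernel construction in local-constant form: within each cluster the GLS weight matrix is taken to be K_h^{1/2} Σ^{-1} K_h^{1/2}. Function name and data layout are assumptions for illustration; with Σ = I it collapses to ordinary Nadaraya-Watson averaging:

```python
import numpy as np

def standard_kernel_fit(y, x, t, h, Sigma):
    """Local-constant GLS estimate of the function at t, pretending the
    within-cluster inverse covariance is Kh^{1/2} Sinv Kh^{1/2}.
    y, x are (n, m) arrays: n clusters, m observations per cluster."""
    Sinv = np.linalg.inv(Sigma)
    num = den = 0.0
    for yi, xi in zip(y, x):
        Kh_half = np.diag(np.exp(-0.25 * ((xi - t) / h) ** 2))  # Kh^{1/2}
        W = Kh_half @ Sinv @ Kh_half
        ones = np.ones_like(yi)
        num += ones @ W @ yi
        den += ones @ W @ ones
    return num / den

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, (20, 3))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal((20, 3))
est = standard_kernel_fit(y, x, t=0.5, h=0.1, Sigma=np.eye(3))
```

The kernel weights K_h die off away from t, which is why this estimate stays local no matter what Σ is plugged in.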

  15. Kernels and Correlation [Figure: the weight functions Gn(t = 0.25, x) in a specific case; m = 3, n = 35; exchangeable correlation structure; red: ρ = 0.0, green: ρ = 0.4, blue: ρ = 0.8.] Note the locality of the kernel method

  16. Splines and Correlation [Figure: the weight functions Gn(t = 0.25, x) in a specific case; m = 3, n = 35; exchangeable correlation structure; red: ρ = 0.0, green: ρ = 0.4, blue: ρ = 0.8.] Note the lack of locality of the spline method

  17. Splines and Correlation [Figure: the weight functions Gn(t = 0.25, x) in a specific case; m = 3, n = 35; complex correlation structure; red: nearly singular, green: ρ = 0.0, blue: AR(0.8).] Note the lack of locality of the spline method

  18. Splines and Standard Kernels • Accounting for correlation: • Standard kernels remain local • Splines are not local • Numerical results can be confirmed theoretically • Don’t we want our nonparametric regression estimates to be local?

  19. Results on Kernels and Correlation • For GLS with kernel weights K_h^{1/2} Σ^{-1} K_h^{1/2}, the optimal working covariance matrix is working independence! • Using the correct covariance matrix increases the variance and increases the MSE • Splines ≠ kernels (or at least not these kernels)

  20. Better Kernel Methods: SUR • Iterative, due to Naisyin Wang • Consider the current state of the iteration • For every j, assume the function is fixed and known at the other positions k ≠ j • Use the seemingly unrelated regression (SUR) idea • For each j, form the local-average/local-linear estimating equation for the jth component only, using GLS weights • Sum the estimating equations together, and solve
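The iteration above can be sketched for the local-constant case: at each grid point t and within-cluster position j, the estimating equation uses the jth row of Σ^{-1} as GLS weights, plugging the current estimate of the curve into the other positions. This is a hedged reconstruction of the idea, not Wang's exact published algorithm; with Σ = I it reduces to working-independence kernel averaging:

```python
import numpy as np

def sur_kernel(y, x, t_grid, h, Sigma, n_iter=10):
    """Iterative SUR-style local-constant kernel fit for clustered data.
    y, x are (n, m); t_grid is an increasing grid on which the curve is
    tracked (linear interpolation in between)."""
    n, m = y.shape
    Sinv = np.linalg.inv(Sigma)
    theta = np.zeros_like(t_grid)              # crude starting curve
    for _ in range(n_iter):
        new = np.empty_like(t_grid)
        for g, t in enumerate(t_grid):
            num = den = 0.0
            for i in range(n):
                fitted = np.interp(x[i], t_grid, theta)  # current curve
                for j in range(m):
                    k = np.exp(-0.5 * ((x[i, j] - t) / h) ** 2)
                    # GLS row for position j; other positions held fixed
                    others = Sinv[j] @ fitted - Sinv[j, j] * fitted[j]
                    num += k * (Sinv[j] @ y[i] - others)
                    den += k * Sinv[j, j]
            new[g] = num / den
        theta = new
    return theta

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, (30, 3))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal((30, 3))
t_grid = np.linspace(0.0, 1.0, 11)
theta_hat = sur_kernel(y, x, t_grid, h=0.1, Sigma=np.eye(3))
```

Summing the per-position equations and iterating to a fixed point is the SUR step; each single equation is still a local kernel fit.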

  21. SUR Kernel Methods • It is well known that the GLS spline has an exact, analytic expression • We have shown that the SUR kernel method has an exact, analytic expression • Both methods are linear in the responses • Relatively nontrivial calculations show that Silverman’s result still holds • Splines = SUR Kernels

  22. Nonlocality • The lack of locality of GLS splines and SUR kernels is surprising • Suppose we want to estimate the function at t • All observations in a cluster contribute to the fit, not just those with covariates near t • Somewhat as in GLIMs, there is a residual-adjusted pseudo-response that has the same expectation as the response, and the estimate is local in the pseudo-response

  23. Nonlocality • Wang’s SUR kernels = pseudo-response kernels with a clever linear transformation • SUR kernels are working-independence kernels applied to the pseudo-response

  24. The Semiparametric Model • Y = response • X, Z = time-varying covariates • Model: Y_ij = Z_ij'β + θ(X_ij) + ε_ij (partially linear) • Question: can we improve efficiency for β by accounting for correlation?

  25. The Semiparametric Model • General method: profile likelihood • Given β, solve for θ, say θ̂(·, β) • Your favorite nonparametric method applied to Y − Z'β: • working independence • standard kernel • SUR kernel

  26. Profile Methods • Given β, solve for θ, say θ̂(·, β) • Then fit WI/GLS to the model with mean Z'β + θ̂(X, β) • Standard kernel methods have awkward, nasty properties • SUR kernel methods have nice properties: semiparametric asymptotically efficient
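A sketch of the profile idea in the simplest working-independence case: with a linear smoother matrix S, profiling out θ reduces to regressing the smoothed-out residuals of Y on those of Z (a Speckman-type closed form). Names and simulated data are illustrative; the talk's SUR-kernel version replaces the smoother, not the profiling logic:

```python
import numpy as np

def profile_fit(y, z, x, h):
    """Profile estimate of (beta, theta) in Y = Z beta + theta(X) + error
    under working independence: smooth Y and Z on X, regress residuals."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    S = K / K.sum(axis=1, keepdims=True)       # kernel smoother matrix
    z_res, y_res = z - S @ z, y - S @ y
    beta, *_ = np.linalg.lstsq(z_res, y_res, rcond=None)
    theta_hat = S @ (y - z @ beta)             # nonparametric part, given beta
    return beta, theta_hat

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(0.0, 1.0, n)
z = rng.standard_normal((n, 1))
y = z[:, 0] * 1.5 + np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(n)
beta, theta_hat = profile_fit(y, z, x, h=0.05)
```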

  27. [Figure: ARE for β of working independence; cluster size: black m = 3, red m = 4, green m = 5, blue m = 6; scenario: X’s common correlation 0.3, Z’s common correlation 0.6, X & Z common correlation 0.6, ε common correlation ρ.] Note: efficiency depends on the cluster size

  28. Profile Methods • Given β, solve for θ, say θ̂(·, β) • Then fit GLS to the model with mean Z'β + θ̂(X, β) • If you fit working independence for your estimate of θ, there is not a great loss of efficiency

  29. [Figure: ARE for β of the working-independence/profile method; cluster size: black m = 3, red m = 4, green m = 5, blue m = 6; scenario: X’s common correlation 0.3, Z’s common correlation 0.6, X & Z common correlation 0.6, ε common correlation ρ.] Note: efficiency depends on the cluster size

  30. Conclusions • In nonparametric regression • Kernels = splines for working independence • Working independence is inefficient • Standard kernels ≠ splines for correlated data • SUR kernels = splines for correlated data • In semiparametric regression • Profiling SUR kernels is efficient • Profiling with GLS for β and working independence for θ is nearly efficient
