Learn about using splines and kernels in nonparametric regression for clustered and longitudinal data, with a focus on efficiency and correlation. Methods covered include GLS, working independence, and local likelihood kernels.
Non/Semiparametric Regression and Clustered/Longitudinal Data
Raymond J. Carroll, Texas A&M University
http://stat.tamu.edu/~carroll, carroll@stat.tamu.edu
Postdoctoral Training Program: http://stat.tamu.edu/B3NC
(Map slide: Texas, marking Wichita Falls, my hometown; College Station, home of Texas A&M University; Palo Duro Canyon, the Grand Canyon of Texas; Guadalupe Mountains and Big Bend National Parks.)
Acknowledgments: Xihong Lin, Naisyin Wang, Oliver Linton, Enno Mammen, Alan Welsh. A series of papers is on my web site. Lin, Wang and Welsh: longitudinal data (Mammen and Linton for pseudo-observation methods); Linton and Mammen: time series data.
Outline • Longitudinal models: • panel data • Background: • splines = kernels for independent data • Correlated data: • do splines = kernels?
Panel Data (for simplicity) • i = 1,…,n clusters/individuals • j = 1,…,m observations per cluster • Important points: • The cluster size m is meant to be fixed • This is not a multiple time series problem where the cluster size increases to infinity • Some comments on the single time series problem are given near the end of the talk
The Marginal Nonparametric Model • Y_ij = Θ(X_ij) + ε_ij • Y = response, X = time-varying covariate • Question: can we improve efficiency by accounting for correlation?
Nonstandard Example: colon carcinogenesis experiments. Cell-based measures in single rats of DNA damage, differentiation, proliferation, apoptosis, p27, etc.
The Marginal Nonparametric Model • Important assumption • Covariates at other waves are not conditionally predictive, i.e., they are surrogates • This assumption is required for any GLS fit, including parametric GLS
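In symbols, a minimal statement of this assumption (with Θ the marginal mean function and Σ the within-cluster covariance):

```latex
E(Y_{ij} \mid X_{i1},\dots,X_{im}) = E(Y_{ij} \mid X_{ij}) = \Theta(X_{ij}),
\qquad
\mathrm{Cov}\{(Y_{i1},\dots,Y_{im})^{\mathrm{T}}\} = \Sigma .
```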
Independent Data • Splines (smoothing, P-splines, etc.) with penalty parameter λ • The fit is a ridge regression: some bias, smaller variance • λ = 0 is over-parameterized least squares; λ = ∞ is a polynomial regression
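To make the ridge-regression view concrete, here is a minimal sketch of a P-spline fit in Python; the truncated-line basis, Gaussian noise, and knot placement are illustrative choices, not the talk's specific implementation:

```python
import numpy as np

def pspline_fit(x, y, knots, lam):
    """Penalized (P-)spline as ridge regression: truncated-line basis with
    a penalty lam on the jump coefficients. lam = 0 gives over-parameterized
    least squares; lam -> infinity gives (for this basis) a linear fit."""
    # design matrix: intercept, slope, and one truncated line per knot
    B = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(x - k, 0.0) for k in knots])
    # ridge penalty on the truncated-basis coefficients only
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    coef = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
    return B @ coef

# usage: smooth noisy data on [0, 1]
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)
yhat = pspline_fit(x, y, knots=np.linspace(0.05, 0.95, 20), lam=1.0)
```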
Independent Data • Kernels (local averages, local linear, etc.), with kernel density function K and bandwidth h • As the bandwidth h → 0, only observations with X near t get any weight in the fit
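A matching sketch of a local-linear kernel fit at a point t (the Gaussian kernel is an arbitrary illustrative choice):

```python
import numpy as np

def local_linear(x, y, t, h):
    """Local-linear kernel estimate of Theta(t): weighted least squares
    with weights K((x - t)/h); as h -> 0 only x near t get any weight."""
    u = (x - t) / h
    w = np.exp(-0.5 * u**2)                  # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - t])
    sw = np.sqrt(w)
    coef = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return coef[0]                           # intercept = fit at t
```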
Independent Data • Major methods: splines and kernels • Smoothing parameters required for both • Fits: similar in many (most?) datasets • Expectation: for some combination of bandwidth and kernel function, kernel fits look like spline fits
Independent Data • Splines and kernels are linear in the responses • Silverman showed that there is a kernel function and a bandwidth so that the weight functions are asymptotically equivalent • In this sense, splines = kernels
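Concretely, Silverman's (1984) equivalent-kernel result says the smoothing spline behaves asymptotically like a kernel smoother with the fixed kernel below and a design-adaptive local bandwidth (the exact constant in h(x) depends on how λ is normalized):

```latex
K(u) = \tfrac12 \exp\!\left(-\frac{|u|}{\sqrt{2}}\right)
       \sin\!\left(\frac{|u|}{\sqrt{2}} + \frac{\pi}{4}\right),
\qquad
h(x) \propto \left\{\frac{\lambda}{f(x)}\right\}^{1/4},
```

where f is the density of the design points X.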
(Figure: the weight functions G_n(t = 0.25, x) in a specific case for independent data, kernel vs. smoothing spline. Note the similarity of shape and the locality: only X's near t = 0.25 get any weight.)
Working Independence • Working independence: ignore all correlations, then fix up the standard errors at the end • Advantage: the conditional-surrogacy assumption is not required • Disadvantage: possible severe loss of efficiency if carried too far
Working Independence • Working independence: • Ignore all correlations • Should posit some reasonable marginal variances • Weighting important for efficiency • Weighted versions: Splines and kernels have obvious analogues • Standard method: Zeger & Diggle, Hoover, Rice, Wu & Yang, Lin & Ying, etc.
Working Independence • Working independence: • Weighted splines and weighted kernels are linear in the responses • The Silverman result still holds • In this sense, splines = kernels
Accounting for Correlation • Splines have an obvious analogue for non-independent data • Let Σ be a working covariance matrix • Penalized generalized least squares (GLS): a GLS ridge regression • Because splines are based on likelihood ideas, they generalize quickly to new problems
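In symbols, the GLS spline minimizes a penalized GLS criterion of the form (writing Θ(X_i) for the vector {Θ(X_i1), …, Θ(X_im)}ᵀ):

```latex
\sum_{i=1}^{n} \{Y_i - \Theta(X_i)\}^{\mathrm{T}} \Sigma^{-1}
\{Y_i - \Theta(X_i)\} + \lambda \int \{\Theta''(t)\}^{2}\, dt .
```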
Accounting for Correlation • Splines have an obvious analogue for non-independent data • Kernels are not so obvious • One can do theory with kernels • Local likelihood kernel ideas are standard in independent data problems • Most attempts at kernels for correlated data have tried to use local likelihood kernel methods
Kernels and Correlation • Problem: how to define locality for kernels? • Goal: estimate the function at t • Let K_h(t) be the diagonal matrix of standard kernel weights K_h(X_ij − t) • Standard kernel method: GLS pretending the inverse covariance matrix is K_h^{1/2}(t) Σ^{-1} K_h^{1/2}(t) • The estimate is inherently local
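For local-linear fitting this amounts to solving, at each t, an estimating equation of roughly the following form (my transcription of the weighting described above; details of kernel placement vary across papers):

```latex
0 = \sum_{i=1}^{n} G_i^{\mathrm{T}}(t)\,
    K_{ih}^{1/2}(t)\, \Sigma^{-1} K_{ih}^{1/2}(t)\,
    \{Y_i - G_i(t)\,\alpha\},
```

where G_i(t) has rows {1, X_ij − t}, K_{ih}(t) = diag{K_h(X_ij − t)}, and α = {Θ(t), Θ′(t)}ᵀ.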
Kernels and Correlation (Figure: the weight functions G_n(t = 0.25, x) in a specific case; m = 3, n = 35, exchangeable correlation structure. Red: ρ = 0.0; green: ρ = 0.4; blue: ρ = 0.8. Note the locality of the kernel method.)
Splines and Correlation (Figure: the weight functions G_n(t = 0.25, x) in the same case; m = 3, n = 35, exchangeable correlation structure. Red: ρ = 0.0; green: ρ = 0.4; blue: ρ = 0.8. Note the lack of locality of the spline method.)
Splines and Correlation (Figure: the weight functions G_n(t = 0.25, x); m = 3, n = 35, complex correlation structures. Red: nearly singular; green: ρ = 0.0; blue: AR(0.8). Note the lack of locality of the spline method.)
Splines and Standard Kernels • Accounting for correlation: • Standard kernels remain local • Splines are not local • Numerical results can be confirmed theoretically
Results on Kernels and Correlation • For GLS with weights K_h^{1/2}(t) Σ^{-1} K_h^{1/2}(t), the optimal working covariance matrix is working independence! • Using the correct covariance matrix increases the variance and increases the MSE • Splines ≠ kernels (or at least not these kernels)
Pseudo-Observation Kernel Methods • Better kernel methods are possible • Pseudo-observation: the original method • Construction: a specific linear transformation of Y with mean Θ(X) and a diagonal covariance matrix • This adjusts the original responses without affecting the mean
Pseudo-Observation Kernel Methods • Construction: a specific linear transformation of Y with mean Θ(X) and diagonal covariance • Iterative: re-form the pseudo-observations from the current fit, smooth, and repeat until convergence (a sketch follows below) • Efficiency: more efficient than working independence • Proof of principle: kernel methods can be constructed to take advantage of correlation
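A minimal sketch of the iteration, using A = Σ^{-1/2} as one concrete transformation that leaves the mean at Θ(X) and makes the covariance diagonal; the published method uses its own specific transformation, so this choice is purely illustrative:

```python
import numpy as np

def pseudo_obs_smooth(Y, X, Sigma, grid, h, n_iter=10):
    """Pseudo-observation kernel smoothing for clustered data.
    Y, X: (n, m) arrays; Sigma: (m, m) working covariance;
    grid: increasing evaluation points."""
    evals, evecs = np.linalg.eigh(Sigma)
    A = evecs @ np.diag(evals ** -0.5) @ evecs.T     # Sigma^{-1/2}
    theta = np.zeros_like(grid)
    for _ in range(n_iter):
        fit = np.array([np.interp(xi, grid, theta) for xi in X])
        # rows: Y*_i = fit_i + A (Y_i - fit_i); Cov(Y*) = A Sigma A' = I
        Ystar = fit + (Y - fit) @ A.T
        xs, ys = X.ravel(), Ystar.ravel()
        # working-independence local-constant smooth of the pseudo-obs
        w = np.exp(-0.5 * ((xs[None, :] - grid[:, None]) / h) ** 2)
        theta = (w @ ys) / w.sum(axis=1)
    return theta
```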
Efficiency of Splines and Pseudo-Observation Kernels (Figure: Exchng = exchangeable with correlation 0.6; AR = autoregressive with correlation 0.6; Near Sing = a nearly singular covariance matrix.)
What Do GLS Splines Do? • GLS splines are really working independence splines applied to pseudo-observations • Form pseudo-observations from the current fit by a linear transformation of Y, as above • Then GLS splines are working independence splines in these pseudo-observations
GLS Splines and SUR Kernels • GLS splines are working independence splines in pseudo-observations • Algorithm: iterate until convergence • Idea: for kernels, do the same thing • This is Naisyin Wang's SUR method
Better Kernel Methods: SUR • Another view: consider the current state in the iteration • For every j, assume the function is fixed and known for the components k ≠ j • Use the seemingly unrelated regression (SUR) idea • For component j, form the local average/local linear estimating equation using GLS weights, for the jth component only • Sum the estimating equations over j, and solve (a sketch follows below)
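A rough sketch of one SUR update, written directly from the estimating-equation description above (local-linear, Gaussian kernel; the function and variable names are mine, not Wang's, and this is not her exact implementation):

```python
import numpy as np

def sur_update(Y, X, Sigma_inv, theta_prev, grid, h):
    """One SUR kernel update: at each target point t, sum the j-specific
    GLS estimating equations (components k != j held at the current fit)
    and solve the resulting 2x2 local-linear system."""
    n, m = Y.shape
    theta_new = np.empty_like(theta_prev)
    for g, t in enumerate(grid):
        A = np.zeros((2, 2))
        b = np.zeros(2)
        for i in range(n):
            fit = np.interp(X[i], grid, theta_prev)   # current fit at X_i
            resid = Y[i] - fit
            for j in range(m):
                w = np.exp(-0.5 * ((X[i, j] - t) / h) ** 2)
                if w < 1e-10:
                    continue
                d = np.array([1.0, X[i, j] - t])
                sj = Sigma_inv[j]                     # j-th row of Sigma^{-1}
                # e_j' Sigma^{-1} (Y_i - mu_i), with the j-th mean component
                # replaced by local-linear parameters a0 + a1 (X_ij - t)
                A += w * sj[j] * np.outer(d, d)
                b += w * d * (sj @ resid + sj[j] * fit[j])
        theta_new[g] = np.linalg.solve(A, b)[0]
    return theta_new
```

Iterating this update to convergence is the analogue, for kernels, of the GLS spline iteration on the previous slides.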
SUR Kernel Methods • It is well known that the GLS spline has an exact, analytic expression • We have shown that the SUR kernel method has an exact, analytic expression • Both methods are linear in the responses • Relatively nontrivial calculations show that Silverman’s result still holds • Splines = SUR Kernels
Nonlocality • The lack of locality of GLS splines and SUR kernels is surprising • Suppose we want to estimate the function at t • Result: All observations in a cluster contribute to the fit, not just those with covariates near t • Locality: Defined at the cluster level
Nonlocality • Wang's SUR kernels = pseudo-observation kernels with a clever linear transformation • Form the transformed (pseudo-) observations from the current fit • SUR kernels are working independence kernels in these pseudo-observations
Locality of Splines • Splines = SUR kernels (Silverman-type result) • GLS spline: iterative standard independent spline smoothing, with SUR pseudo-observations formed at each iteration • GLS splines are not local in the original data • GLS splines are local in (the same!) pseudo-observations
Time Series Problems • Time series problems: many of the same issues arise • Original pseudo-observation method, in two stages: a linear transformation with mean Θ(X) and independent errors, then a single standard kernel applied to the transformed data • Potential for great gains in efficiency (even infinite for AR problems with large correlation)
Time Series: AR(1) Illustration, First Pseudo-Observation Method • AR(1) errors with correlation ρ: form Y_{t,0} = Y_t − ρ{Y_{t−1} − Θ(X_{t−1})} = Θ(X_t) + u_t, where u_t is the white noise • Regress Y_{t,0} on X_t
Time Series Problems • More efficient methods can be constructed • A series of regression problems: for each lag j, form pseudo-observations with mean Θ(X_{t−j}) and white-noise errors • Regress separately for each j: the fits are asymptotically independent • Then take a weighted average • A time series version of SUR kernels for longitudinal data?
Time Series: AR(1) Illustration, New Pseudo-Observation Method • AR(1) errors with correlation ρ: regress Y_{t,0} on X_t and Y_{t,1} = Y_{t−1} − ρ^{-1}{Y_t − Θ(X_t)} on X_{t−1} • Weights: 1 and ρ²
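The two pseudo-observations and the weights come from a direct variance calculation (my reconstruction from the weights quoted on this slide, writing ε_t = ρε_{t−1} + u_t for the AR(1) errors):

```latex
Y_{t,0} = Y_t - \rho\{Y_{t-1} - \Theta(X_{t-1})\} = \Theta(X_t) + u_t,
\qquad \mathrm{var} = \sigma_u^2;
```

```latex
Y_{t,1} = Y_{t-1} - \rho^{-1}\{Y_t - \Theta(X_t)\}
        = \Theta(X_{t-1}) - u_t/\rho,
\qquad \mathrm{var} = \sigma_u^2/\rho^2.
```

Inverse-variance weighting of the two asymptotically independent fits then gives the relative weights 1 and ρ².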
Time Series Problems • AR(1) errors with correlation ρ • Efficiency of the original pseudo-observation method relative to working independence: 1/(1 − ρ²) • Efficiency of the new (SUR?) pseudo-observation method relative to the original method: 1 + ρ² • (Both follow from the variance calculation above.)
The Semiparametric Model • Y_ij = X_ij^T β + Θ(Z_ij) + ε_ij • Y = response; X, Z = time-varying covariates • Question: can we improve efficiency for β by accounting for correlation?
Profile Methods • Given β, solve for Θ, yielding the profiled estimate Θ(·, β) • Basic idea: regress Y − X^T β on Z, using one of: working independence, standard kernels, pseudo-observation kernels, SUR kernels
Profile Methods • Given β, solve for Θ, yielding Θ(·, β) • Then fit GLS or W.I. to the model with mean X^T β + Θ(Z, β) • Question: does it matter what kernel method is used? • Question: how bad is using W.I. everywhere? • Question: are there efficient choices? (A sketch of the iteration follows below.)
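A minimal sketch of the profile iteration under working independence with a scalar X (the smoother, update rule, and names are illustrative; the point of the following slides is that the smoothing step can instead use standard, pseudo-observation, or SUR kernels):

```python
import numpy as np

def profile_fit(Y, X, Z, grid, h, n_iter=20):
    """Profile estimation for Y_ij = X_ij * beta + Theta(Z_ij) + error,
    under working independence. Step 1 (given beta): kernel-smooth
    Y - X*beta on Z. Step 2 (given Theta-hat): regress Y - Theta-hat(Z)
    on X. grid must be increasing."""
    y, x, z = Y.ravel(), X.ravel(), Z.ravel()
    beta, theta = 0.0, np.zeros_like(grid)
    for _ in range(n_iter):
        # step 1: local-constant smooth of the partial residuals
        r = y - x * beta
        w = np.exp(-0.5 * ((z[None, :] - grid[:, None]) / h) ** 2)
        theta = (w @ r) / w.sum(axis=1)
        # step 2: update beta by least squares on the other residual
        r2 = y - np.interp(z, grid, theta)
        beta = float(np.sum(x * r2) / np.sum(x * x))
    return beta, theta
```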
The Semiparametric Model: Special Case • If X does not vary with time (X_ij = X_i), a simple semiparametric efficient method is available • The basic point is that the vector Y_i − X_i^T β has mean {Θ(Z_i1), …, Θ(Z_im)} and a common covariance matrix • If Θ were a polynomial, GLS likelihood methods would be natural
The Semiparametric Model: Special Case • Method: replace the polynomial GLS likelihood with a GLS local likelihood with kernel weights • Then do GLS for β on the derived variable Y_ij − Θ(Z_ij) • Semiparametric efficient
Profile Method: General Case • Given β, solve for Θ, yielding Θ(·, β) • Then fit GLS or W.I. to the model with mean X^T β + Θ(Z, β) • In this general case, how you estimate Θ matters: working independence, standard kernel, pseudo-observation kernel, or SUR kernel
Profile Methods • In this general case, how you estimate Θ matters: working independence, standard kernel, pseudo-observation kernel, or SUR kernel • The SUR method leads to the semiparametric efficient estimate • So too does the GLS spline
Longitudinal CD4 Count Data (Zeger and Diggle) (Table: estimates and standard errors of β from three fits: semiparametric GLS refit, semiparametric GLS, and Z-D working independence.)
Conclusions (1/3): Nonparametric Regression • In nonparametric regression • Kernels = splines for working independence (W.I.) • Working independence is inefficient • Standard kernels ≠ splines for correlated data
Conclusions (2/3): Nonparametric Regression • In nonparametric regression • Pseudo-observation methods improve upon working independence • SUR kernels = splines for correlated data • Splines and SUR kernels are not local • Splines and SUR kernels are local in pseudo-observations