350 likes | 365 Views
This article presents a semiparametric framework for modeling correlations among functions in a colon carcinogenesis experiment. It discusses nonparametric methods, asymptotic summaries, and analysis techniques. The study aims to understand the structure of p27 in the colon when animals are exposed to a carcinogen.
E N D
Semiparametric Methods for Colonic Crypt Signaling Raymond J. Carroll Department of Statistics Faculty of Nutrition and Toxicology Texas A&M University http://stat.tamu.edu/~carroll
Outline • Problem: Modeling correlations among functions and apply to a colon carcinogenesis experiment • Biological background • Semiparametric framework • Nonparametric Methods • Asymptotic summary • Analysis • Summary
Biologist Who Did The Work Meeyoung Hong Postdoc in the lab of Joanne Lupton and Nancy Turner Data collection represents a year of work
Basic Background • Apoptosis: Programmed cell death • Cell Proliferation: Effectively the opposite • p27: Differences in this marker are thought to stimulate and be predictive of apoptosis and cell proliferation • Our experiment: understand some of the structure of p27 in the colon when animals are exposed to a carcinogen
Data Collection • Structure of Colon • Note the finger-like projections • These are colonic crypts • We measure expression of cells within colonic crypts
Data Collection • p27 expression: Measured by staining techniques • Brighter intensity = higher expression • Done on a cell by cell basis within selected colonic crypts • Very time intensive
Data Collection • Animals sacrificed at 4 times: 0 = control, 12hr, 24hr and 48hr after exposure • Rats: 12 at each time period • Crypts: 20 are selected • Cells: all cells collected, about 30 per crypt • p27: measured on each cell, with logarithmic transformation
Nominal Cell Position • X = nominal cell position • Differentiated cells: at top, X = 1.0 • Proliferating cells: in middle, X=0.5 • Stem cells: at bottom, X=0
Standard Model • Hierarchical structure: cells within crypts within rats within times
Standard Model • Hierarchical structure: cell locations of cells within crypts within rats within times • In our experiment, the residuals from fits at the crypt level are essentially white noise • However, we also measured the location of the colonic crypts
Crypt Locations at 24 Hours, Nominal zero Scale: 1000’s of microns Our interest: relationships at between 25-200 microns
Standard Model • Hypothesis: it is biologically plausible that the nearer the crypts to one another, the greater the relationship of overall p27 expression. • Expectation: The effect of the carcinogen might well alter the relationship over time • Technically: Functional data where the functions are themselves correlated
Basic Model • Two-Levels: Rat and crypt level functions • Rat-Level: Modeled either nonparametrically or semiparametrically • Semiparametrically: low-order regression spline • Nonparametrically: some version of a kernel fit
Basic Model • Crypt-Level: A regression spline, with few knots, in a parametric mixed-model formulation Linear spline: In practice, we used a quadratic spline. The covariance matrix of the quadratic part used the Pourahmadi-Daniels construction
Advertisement David Ruppert Matt Wand http://stat.tamu.edu/~carroll/semiregbook/ Please buy our book! Or steal it
Basic Model • Crypt-Level: We modeled the functions across crypts semiparametrically as regression splines • The covariance matrix of the parameters of the spline modeled as separable • Matern family used to model correlations
Basic Model • Crypt-Level: regression spline, few knots • Separable covariance structure
Theory for Parametric Version • The semiparametric approach we used fits formally into a new general theory • Recall that we fit the marginal function at the rat level “nonparametrically”. • At the crypt level, we used a parametric mixed-model representation of low-order regression splines Xihong Lin
General Formulation • Yij = Response • Xij, Zij = cell and crypt locations • Likelihood (or criterion function) • The key is that the function is evaluated multiple times for each rat • This distinguishes it from standard semiparametric models • The goal is to estimate and efficiently
General Formulation: Overview • Likelihood (or criterion function) • For iid versions of these problems, we have developed constructive kernel-based methods of estimation with • Asymptotic expansions and inference available • If the criterion function is a likelihood function, then the methods are semiparametric efficient. • Methods avoid solving integral equations
General Formulation: Overview • We also show • The equivalence of profiling and backfitting • Pseudo-likelihood calculations • The results were submitted 7 months ago, and we confidently expect 1st reviews within the next year or two , depending on when the referees open their mail
General Formulation: Overview • In the application, modifications necessary both for theoretical and practical purposes • Techniques described in the talk of Tanya Apanasovich can be used here as well to cut down on the computational burden due to the correlated functions.
Nonparametric Fits • Equal Spacing: Assume cell locations are equally spaced • Define = covariance between crypt-level functions that are D apart, one at location x1 and the other at location x2 • Assume separable covariance structure
Nonparametric Fits • Discrete Version: Pretend D, x1 and x2 take on a small discrete set of values (we actually use a kernel-version of this idea) • Form the sample covariance matrix per rat at D, x1 and x2 , then average across rats. • Call this estimate
Nonparametric Fits • Separability: Now use the separability to get a rough estimate of the correlation surface.
Nonparametric Fits • The estimate is not a proper correlation function • We fixed it up using a trick due to Peter Hall (1994, Annals), thus forming , a real correlation function • Basic idea is to do a Fourier transform, force it to be non-negative, then invert • Slower rates of convergence than the parametric fit, more variability, etc. • Asymptotics worked out (non-trivial)
Semiparametric Method Details • In the example, a cubic polynomial actually suffices for the rat-level functions • Maximize in the covariance structure of the crypt-level splines, including Matern-order • The covariance structure includes • Matern order m (gridded) • Matern parameter a • Spline smoothing parameter • Quadratic part’s covariance matrix (Pourahmadi-Daniels) • AR(1) for residuals
Results: Part I • Matern Order: Matern order of 0.5 is the classic autoregressive model • Our Finding: For all times, an order of about 0.15 was the maximizer • Simulations: Can distinguish from autocorrelation • Different Matern orders lead to different interpretations about the extent of the correlations
Marginal over time Spline Fits Note • time effects • Location effects
Comparison of Fits • The semiparametric and nonparametric fits are roughly similar • At 200 microns, quite far apart, the correlations in the functions are approximately 0.40 for both • Surprising degree of correlation
Summary • We have studied the problem of crypt-signaling in colon carcinogenesis experiments • Technically, this is a problem of hierarchical functional data where the functions are not independent in the standard manner • We developed (efficient, constructive) semiparametric and nonparametric methods, with asymptotic theory • The correlations we see in the functions are surprisingly large.
Statistical Collaborators Yehua Li Naisyin Wang
Summary Insiration for this work: Water goanna, Kimberley Region, Australia