
Raymond J. Carroll Department of Statistics Member, Center for Statistical Bioinformatics

The Interface of Functional and Longitudinal Data. Raymond J. Carroll, Department of Statistics; Member, Center for Statistical Bioinformatics; Director, Institute for Applied Mathematics and Computational Science; Texas A&M University. http://stat.tamu.edu/~carroll






  1. The Interface of Functional and Longitudinal Data Raymond J. Carroll Department of Statistics Member, Center for Statistical Bioinformatics Director, Institute for Applied Mathematics and Computational Science Texas A&M University http://stat.tamu.edu/~carroll

  2. My Charge • “Please feel free to talk about anything you wish” (Dangerous) • “Your thinking about longitudinal data and perhaps functional data from a wider perspective” • “Goals of the workshop are to inspire new researchers, and to take stock of where the interface of longitudinal-functional data and dynamics is headed”

  3. What I Want to Talk about Mother and joey, Tidbinbilla (outside Canberra), September 2010

  4. What I Want to Talk about Namadgi National Park, July 2005

  5. What I Will Talk About • I will talk about some of the problems I have worked on • No technical solutions, the other speakers look to be providing them • Investigators think marginally, statisticians think of random effects

  6. Some Observations • In my work, there is a tension between • Providing answers to my collaborators that they can understand • Developing new general methodology that is publishable in statistics and can solve more general problems • Thinking about parts of the actual problem that my collaborators would not have thought about • It’s easy to get caught up in either of the first two

  7. Some Observations • When I am simply providing answers to stated questions, I find themes similar to the distinction between marginal models such as GEE and nonlinear mixed effects models for longitudinal data • GEE is simply easier • Most scientists think marginally because they are uncomfortable with the idea of variability

  8. What I Will Talk About • Think about what the typical smart biologist knows about statistics. • t-tests, ANOVA, simple linear regression • All the focus is on the mean, none on the variability

  9. Some Observations • What we have to do is to deliver the analysis the data collectors can understand, and teach them about variability • Pictures work wonders: functions are no harder to understand than histograms, and understanding variability can help investigators tell stories

  10. Some Observations • We need to advance the field of statistics • Deeper understanding of the underlying process, through random effects modeling, often helps inform future studies and helps investigators tell their story

  11. An Old Colon Carcinogenesis Project • Experiment with 2 lipids (fish oil and corn oil) with and without butyrate (a fatty acid) supplementation, with p27 or MGMT repair measured as the response • Longitudinal, maybe even dynamic, hierarchical and functional. • Hierarchical because each of the treatment groups has multiple samples, and each of them has multiple functions • Functional because of the biology

  12. Colon Cancer Data Jeff Morris Ciprian Crainiceanu Ana-Maria Staicu Naisyin Wang Veera B Yehua Li

  13. Functional • The colonic crypts contain cells: near the bottom (x=0) are the stem cells, and near the top (x=1) are the differentiated cells

  14. MGMT Repair Enzyme, 1 crypt • MGMT curve in one crypt. • Original analysis found large diet effects

  15. MGMT Repair Enzyme, 1 crypt • The large diet effects on the MGMT repair enzyme are real. • There are also large diet effects on apoptosis

  16. MGMT Repair Enzyme, 1 crypt • What do biologists do (define original analysis)? • They simplify the data so that they can do ANOVA, duh! • They average all the responses (p27 or MGMT, about 200 observations in each analysis) in the bottom third, middle third and top third of the crypt. Then they run 3 ANOVAs.
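A minimal sketch of the thirds-then-ANOVA reduction described on this slide, using only the Python standard library. The function names `third_means` and `one_way_f`, and the convention that cell position is a relative depth in [0, 1], are assumptions made for illustration; this is not the collaborators' actual code.

```python
from statistics import mean


def third_means(positions, responses):
    """Average the responses in the bottom, middle and top third of one crypt.

    positions: relative cell positions in [0, 1] (0 = crypt bottom, 1 = top).
    responses: the measured response (for example p27 or MGMT) at each position.
    """
    bins = ([], [], [])
    for x, y in zip(positions, responses):
        bins[min(int(x * 3), 2)].append(y)  # x = 1.0 is placed in the top third
    return tuple(mean(b) for b in bins)     # raises StatisticsError if a third is empty


def one_way_f(groups):
    """F statistic of a one-way ANOVA; groups is a list of lists of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

In use, one would call `third_means` once per crypt (or per animal), collect, say, the top-third averages by diet group, and pass those groups to `one_way_f`, repeating for each third. All information about the shape of the curve within a third is discarded, which is the limitation the functional analysis addresses.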

  17. MGMT Repair Enzyme, 1 crypt • They then tell a story about all the ANOVAs they have done. • We all smile about this, but my collaborator (Joanne Lupton) just got elected into the U.S. National Academy of Sciences.

  18. MGMT Repair Enzyme, 1 crypt • I like to think that our more nuanced analyses help her tell her stories, which is hopefully not wishful thinking!

  19. MGMT Repair Enzyme, 1 crypt • Wavelet functional coefficients for apoptotic index in the top 1/3 of the crypt, for fish oil and for corn oil. From Morris and Carroll (2006): “fish-oil-fed animals who had a large amount of apoptosis near their lumenal surface also had high levels of the DNA repair enzyme MGMT near their lumenal surface, meaning that the two major mechanisms for dealing with DNA damage were correlated. This relationship was not so strong for corn-oil-fed animals”.

  20. MGMT Repair Enzyme, the story • We did a full-blown wavelet-based functional mixed model analysis to get these conclusions. Could it have been done marginally? • Probably yes, but then that’s dull. • However, we (a) know much more about the pattern of variability and (b) built up methods and software that can be used in a wide variety of settings

  21. Longitudinal • Colon carcinogenesis is a localized phenomenon. The crypts closest to one another are highly correlated

  22. Colon Cancer Data • The locality hypothesis says that colon cancer starts because of highly localized damage. • Longitudinal and hierarchical FDA can tell us many things about this hypothesis, e.g., where is localized damage more likely to occur? • While most research focuses on the proximal and distal portions of the colon, FDA reveals that there is as much or more in the middle

  23. Basic Model for p27

  24. Colon Cancer Data • Lots of fun fitting this longitudinal, hierarchical functional data set • What did the investigators want to know? • They were interested in how correlated neighboring crypts are, consistent with the locality hypothesis.

  25. Colon Cancer Data • The Bayesian analysis gives them strong point-wise evidence (can supplement with FDR) • Allows summary measures

  26. Colon Cancer Data • Acknowledging the longitudinal nature led to much more precise inferences. This is the interaction function between diet and treatment: guess which one allows for locality?

  27. Cell Signaling Data • Myometrial cells meant to mimic what goes on near birth were either exposed to dioxin (TCDD) or not exposed. • They were then exposed to a hormone, oxytocin, that stimulates calcium ion (Ca2+) signaling • The Ca2+ signal was observed at many pixels of each cell for 512 time points (85 minutes)

  28. Cell Signaling Data Josue Martinez Jianhua Huang

  29. Cell Signaling Data • The cells were segmented, and the signal intensities were obtained for each pixel, each cell and all time points. • Roughly 25 cells in each treatment group (control and TCDD) • Hierarchical because of pixels within cells within treatments

  30. Cell Signaling Data • Functional because pixels are measured over time • Possibly different levels of spatial because the cells are in spatial alignment • Lots of preprocessing: cell segmentation, adjustment for saturation, and more

  31. Cell Signaling Data First two minutes of the experiment for the TCDD-treated plate. Next come two movies of the data

  32. Cell Signaling Data All cells (Control and TCDD), at a basal state in which the cells were cultured, 0-4 minutes and 40-80 minutes after oxytocin exposure

  33. Cell Signaling Data All cells (Control and TCDD), at a low estrogen state, just before pregnancy (note the delayed response due to TCDD)

  34. Cell Signaling Data All cells (Control and TCDD), at a high estrogen state, near full-term in pregnancy

  35. Cell Signaling Data All cells (Control and TCDD), at a high estrogen state, near full-term in pregnancy, after normalization and registration

  36. Cell Signaling Data All cells (Control and TCDD), at a high estrogen state, near full-term in pregnancy, after normalization and registration. Areas under the curve (p < 0.001)

  37. Cell Signaling Data • You should see that in this analysis, we have not made use of the structure of the data. • We have thought like GEE people, and indeed reduced the comparison of control and TCDD to single numbers, e.g., peak time and area under the curve. • We did lots of dimension reduction (4 weighted SVDs) to get here
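To make the "single numbers" on this slide concrete, here is a small sketch of how a sampled signal can be reduced to a peak time and an area under the curve. The trapezoidal rule and the function names are assumptions for illustration; the slides do not say how these summaries were actually computed.

```python
def peak_time(times, signal):
    """Time at which the sampled signal is largest (first such time on ties)."""
    peak_index = max(range(len(signal)), key=lambda i: signal[i])
    return times[peak_index]


def area_under_curve(times, signal):
    """Area under the sampled signal by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (signal[i] + signal[i - 1]) / 2
    return area


# One toy curve: each whole curve collapses to two numbers.
times = [0, 1, 2, 3]
signal = [0, 2, 1, 0]
summary = (peak_time(times, signal), area_under_curve(times, signal))
```

Each cell's curve becomes one pair of numbers, which can then be compared between control and TCDD with ordinary two-sample methods. The hierarchy of pixels within cells and the shape of the response are no longer visible after this step.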

  38. Cell Signaling Data • There was a lot of work to get the data into a format for analysis • Question: what can hierarchical, possibly spatial FDA do for us here, and given the structure, how should an analysis proceed? • I feel that there is a lot more that we can learn about the process by thinking more deeply about the modeling

  39. Bat Chirp Data • Bats of the same species, residing in Austin (city bats) and College Station (Aggie bats)

  40. Bat Chirp Data Josue Martinez Jeff Morris

  41. Bat Chirp Data

  42. Bat Chirp Data • Bat chirps were recorded, some multiple times for each bat. • The hierarchy is species, bat, replicate • I believe this analysis is a poster child for why to think functionally and hierarchically

  43. A Representative Bat Chirp

  44. Bat Chirp Spectrogram

  45. Bat Chirp Data • The chirp is mainly composed of frequencies that start at about 40 kilohertz (kHz) and slowly decrease to 20 kHz from 0 to 8 milliseconds into the chirp. • The bat then transitions to predominant frequencies at 60 kHz that slowly decrease back down to 40 kHz and then rise up to 60 kHz towards the end of the chirp. • Frequencies above ∼ 80 kHz are harmonics of the fundamental signal.

  46. One Chirp per Bat

  47. Bat Chirp Data • It seems clear to me that this is an inherently functional problem. • Trying to reduce it to a single number to do a t-test seems difficult to contemplate, but it is not impossible. • People have tried t-tests and classification based on measures such as duration, start frequency, end frequency, etc.

  48. Bat Chirp Data • One could simply take each pixel of the spectrogram and do t-tests, with FDR control • This would ignore the replicate data, would ignore the correlated nature of the data, would do no dimension reduction, etc. • What did the biologist want to know? Kisi Bohn
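The pixel-by-pixel approach needs a multiplicity correction once the per-pixel p-values are in hand. The slide says only "FDR control"; the sketch below assumes the Benjamini-Hochberg step-up procedure, which is one common choice, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Benjamini-Hochberg step-up rule: find the largest rank k such that the
    k-th smallest p-value is <= k * q / m, then reject the k smallest.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected
```

With one p-value per spectrogram pixel, `benjamini_hochberg` flags the pixels declared different between the two groups. As the slide notes, this treats pixels and replicates as if they were unrelated, so it uses none of the correlation or hierarchy in the data.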

  49. Bat Chirp Data • She wanted to know whether bats of the same species (City Bats and Aggie Bats) have evolved different vocalizations • What did we want to do? • Answer her question precisely, and let her tell a story (the marginal question, imprecisely framed) • Use all the data • Understand the variability

  50. Bat Chirp Data • We wavelet-transformed the spectrograms, fit a 2-D hierarchical WFMM (wavelet-based functional mixed model), transformed back, and analyzed the results (see next)
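The transform, model, transform-back pipeline can be illustrated in one dimension with a Haar wavelet. This is only a sketch under strong simplifying assumptions: the talk's analysis is 2-D and fits a hierarchical mixed model to the coefficients, whereas here the "model" is a plain average of coefficients across curves, and the Haar basis and all function names are illustrative choices rather than the authors' method.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Orthonormal multi-level Haar transform; len(signal) must be a power of 2.

    Output order: [overall approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / SQRT2 for a, b in pairs] + details
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx + details


def haar_inverse(coeffs):
    """Invert haar_forward."""
    approx = [coeffs[0]]
    position = 1
    while len(approx) < len(coeffs):
        detail = coeffs[position:position + len(approx)]
        position += len(approx)
        rebuilt = []
        for s, d in zip(approx, detail):
            rebuilt.append((s + d) / SQRT2)
            rebuilt.append((s - d) / SQRT2)
        approx = rebuilt
    return approx


def mean_curve_via_wavelets(curves):
    """Transform each curve, average coefficient by coefficient, transform back."""
    coeffs = [haar_forward(c) for c in curves]
    averaged = [sum(column) / len(column) for column in zip(*coeffs)]
    return haar_inverse(averaged)
```

Because the transform is linear and invertible, averaging in the wavelet domain and transforming back returns the pointwise mean curve. The practical benefit of working in the wavelet domain is that the coefficients can be modeled more simply than the raw function values, which is where a mixed model would replace the plain average used here.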
