My first 100 Tb of data
STATISTICAL METHODS FOR NEW TECHNOLOGY WORKING GROUP
Ciprian M. Crainiceanu
Johns Hopkins University
http://www.biostat.jhsph.edu/smnt
Members of the group
• Key personnel
  • C.M. Crainiceanu, B.S. Caffo, A.-M. Staicu, S. Greven, D. Ruppert, C.-Z. Di
• Senior Students
  • V. Zipunnikov, J.-A. Goldsmith
• Other statisticians (>20)
• Scientific collaborators
  • Direct collaboration
  • Solving important scientific problems
  • Diverse scientific applications
Scientific Collaborators
• Susan Bassett – fMRI, Alzheimer’s
• Danny Reich – DTI, DCE-MRI, MS
• Brian Schwartz – lead exposure, VBM, DTI, white matter imaging
• Stewart Mostofsky – fMRI, rsfcMRI, Autism, ADHD, Tourette’s
• Naresh Punjabi – EEG, sleep, sleep diseases
• Dzung Pham / Pilou Bazin – Cortical shape, thickness, lesion detection, MS
• Dean Wong – PET, fMRI, substance abuse
• Susan Resnick – BLSA
• Jerry Prince – BLSA, ADNI
• Jim Pekar, Peter Van Zijl – 7T MRI, fMRI, rsfcMRI preprocessing, scanner physics
• Christos Davatzikos – RAVENS
• Susumu Mori – DTI, tractography
• Dana Boatman – ECOG, EEG, epilepsy
• Graham Redgrave – fMRI, DTI, Huntington’s, anorexia/bulimia
• Tudor Badea, Bruno Jedynak – Neuron classification, morphometry, 3D structure and shape
• Tom Glass – Gizmos
• Merck – EEG, neuroimaging
• Pfizer – imaging biomarkers?
Longitudinal Functional Principal Component Analysis (LFPCA)
• I=1000, J=4, D=100: 15’
• I=1000, J=8, D=200: 70’
Greven, Crainiceanu, Caffo, Reich, 2010. LFPCA, EJS, to appear
A simple regression formula
• Data compression via longitudinal PCA
• Method-of-moments (MoM) estimators of covariance matrices, smoothing
• Need: all covariance operators
• Solution: regress Y_ij(d) Y_ik(d’) on 1, T_ik, T_ij, T_ik T_ij, δ_jk
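The regression on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the curves have already been demeaned, that every subject has the same number of visits, and that the five regressors correspond to a random intercept, a random slope in time, their two cross terms, and a visit-specific term (the slide does not state the model). The function name `mom_covariances` and the labels `K0`, `K01`, `K10`, `K1`, `KU_plus_noise` are hypothetical.

```python
import numpy as np

def mom_covariances(Y, T):
    """Method-of-moments sketch: regress Y_ij(d) * Y_ik(d') on
    1, T_ik, T_ij, T_ij * T_ik, delta_jk, for every pair (d, d') at once.

    Y : array of shape (I, J, D), demeaned curves (assumption)
    T : array of shape (I, J), visit times
    Returns a dict of D x D coefficient surfaces (labels are hypothetical).
    """
    I, J, D = Y.shape
    design, response = [], []
    for i in range(I):
        for j in range(J):
            for k in range(J):
                # One row of regressors per (subject, visit, visit) triple.
                design.append([1.0, T[i, k], T[i, j],
                               T[i, j] * T[i, k], float(j == k)])
                # All D*D products Y_ij(d) * Y_ik(d') for this triple.
                response.append(np.outer(Y[i, j], Y[i, k]).ravel())
    X = np.asarray(design)      # shape (I*J*J, 5)
    Z = np.asarray(response)    # shape (I*J*J, D*D)
    # Ordinary least squares, solved jointly for all D*D responses.
    beta, *_ = np.linalg.lstsq(X, Z, rcond=None)
    labels = ["K0", "K01", "K10", "K1", "KU_plus_noise"]
    return {name: b.reshape(D, D) for name, b in zip(labels, beta)}
```

The sketch stores all I·J²·D² products in memory, so it is only practical for small D; a version that scales would accumulate X'X and X'Z instead of materializing Z. The smoothing step mentioned on the slide is not included.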
Functional regression
• No paper on longitudinal functional regression
• No paper published with this data structure
• Longitudinal extensions are not “simple”
• Technical details are hard without the correct “recipe” for known and published “ingredients”
• No available method that scales up
Goldsmith, Feder, Crainiceanu, Caffo, Reich, 2010. PFR, JCGS, to appear
Goldsmith, Crainiceanu, Caffo, Reich, 2010. LPFR, to appear?
PVD
Y_i = P V_i D + E_i
• P is T × A
• D is B × F
• V_i is A × B
• A << T, B << F
Singular Value Decomposition (SVD) summarizes variance
[Figure: SVD of one subject's time-by-frequency data matrix into eigenvariates, a diagonal matrix, and eigenfrequencies.]
Default PVD (Start here)
[Figure: flow of the default PVD. Each subject's data matrix is decomposed by SVD into eigenvariates and eigenfrequencies (low rank approximation); these are stacked across subjects and decomposed by a second SVD (population decomposition); the original subject-specific data are then projected onto the population bases.]
Caffo BS, Crainiceanu CM, Verduzco G, Joel SE, Mostofsky SH, Bassett SS, Pekar JJ. Two-Stage decompositions for the analysis of functional connectivity for fMRI with application to Alzheimer’s disease risk. NeuroImage (In Press).
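The two-stage flow on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: every subject's matrix is taken to have the same T × F shape, the same number of components (`n_keep`, a hypothetical parameter) is kept for every subject, and no centering or scaling is done. The function name `default_pvd` is also hypothetical.

```python
import numpy as np

def default_pvd(Y, n_keep, A, B):
    """Sketch of a two-stage population value decomposition.

    Y      : list of T x F arrays, one per subject
    n_keep : components kept per subject in stage 1 (assumption: same for all)
    A, B   : number of population-level left / right components
    Returns P (T x A), D (B x F), and a list of A x B matrices V_i,
    so that Y_i is approximated by P @ V_i @ D.
    """
    left, right = [], []
    for Yi in Y:
        # Stage 1: subject-level SVD, keep the leading singular vectors.
        U, _, Vt = np.linalg.svd(Yi, full_matrices=False)
        left.append(U[:, :n_keep])        # eigenvariates, T x n_keep
        right.append(Vt[:n_keep, :].T)    # eigenfrequencies, F x n_keep
    # Stage 2: stack across subjects and decompose again.
    U_stack = np.hstack(left)             # T x (n_subjects * n_keep)
    V_stack = np.hstack(right)            # F x (n_subjects * n_keep)
    P = np.linalg.svd(U_stack, full_matrices=False)[0][:, :A]
    D = np.linalg.svd(V_stack, full_matrices=False)[0][:, :B].T
    # Project each subject's data onto the population bases.
    V = [P.T @ Yi @ D.T for Yi in Y]
    return P, D, V
```

Because P has orthonormal columns and D has orthonormal rows, each V_i is the least-squares coefficient matrix for its subject given the population bases, and only the small A × B matrices differ between subjects.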
Currently:
• Deploying PVD to the 1000 Functional Connectomes Project
  • http://www.nitrc.org/projects/fcon_1000/
• Comparing rsfcMRI in stroke versus normal subjects
Main message, backed by 100 Tb of data
• Eventually, good tech makes it into observational studies and clinical trials
• Longitudinal/Multilevel FDA is the natural next step in FDA
• Data is changing the way we do business: availability, size, complexity
• Likely: funding will be based much more on relevance than on technical ability