This study presents a new statistical method for analyzing longitudinal multifactor expression data, specifically applied to time course burn data. The method identifies genes that respond differently to burn based on gender and age factors. The analysis involves data preprocessing, gene classification, and GO enrichment analysis.
A New Statistical Method for Analyzing Longitudinal Multifactor Expression Data and Its Application to Time Course Burn Data Baiyu Zhou Department of Statistics Stanford University 10/06/2008
Outline • Data description • Brief review: current statistical methods • Proposed statistical method • Application on Burn data
Data Description
• Two data sets: (1) burn + gender, for the gender effect on burn patients; (2) burn + age, for the age effect on burn patients
• Gene expression from each patient (blood) was measured at different time points after burn.
• The data sets are longitudinal (time course) and involve multiple factors (burn/control; gender or age).
Brief Review: Current Methods (1)
• Time course microarray data analysis
• Time course clustering to identify co-expressed genes: Ma et al., Nucleic Acids Res. 2006 Mar 1;34(4):1261-9
• Fit a smooth function and use a gene-specific summary statistic to characterize the significance of change over time or between biological conditions: Storey et al., Proc Natl Acad Sci U S A. 2005 Sep 6; 102(36):12837-42.
• Empirical Bayes method to rank differentially expressed genes between biological conditions: Tai et al. Annals of Statistics 34(5), 2387–2412.
Brief Review: Current Methods (2)
• Multifactor microarray data analysis
• ANOVA for gene selection: Pavlidis et al., Methods. 2003;31:282–289.
• Nonparametric ANOVA, but with restrictions on the number of replicates and on the noise distribution: Gao et al., Bioinformatics 2006 22(12):1486-1494
• We have developed a nonparametric ANOVA (NANOVA) method and a gene classification algorithm for microarray data analysis (Zhou et al., in manuscript). It easily handles balanced and unbalanced experimental designs; is free of distributional assumptions; estimates the FDR; and is robust to outliers.
• There is no existing method for analyzing longitudinal multifactor expression data!
Methodology
• Consider the expression of a gene in one individual, measured over p time points (t1, t2, ..., tp). Each individual is associated with two factors (e.g. gender; burn).
• We want to identify genes that: (1) respond differently in male and female burn patients; (2) respond to burn; ...
• Some genes might respond to burn at an early stage, others at a late stage. Which time point should be used (t1, t2, ..., tp, or their average)?
• We call (1), (2), ... ANOVA structures (interaction effect, main effect).
• In p-dimensional space, there is a direction along which the ANOVA structure of interest is most prominent. We first estimate this direction, then project the data onto it, and then apply NANOVA and the gene classification algorithm.
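The projection step can be pictured with a minimal sketch. This is not the estimator used in this work, which the slides do not spell out. For a two-by-two design, it takes the interaction contrast of the four group mean profiles and normalizes it; among unit vectors, that is the direction maximizing the squared interaction contrast of the projected group means. It ignores within-group covariance, which a real estimator would likely account for. The function names and the toy data are hypothetical.

```python
from math import sqrt

def mean_vector(profiles):
    """Average a list of p-dimensional expression profiles, time point by time point."""
    p = len(profiles[0])
    return [sum(x[t] for x in profiles) / len(profiles) for t in range(p)]

def interaction_direction(groups):
    """Unit vector along the interaction contrast of the four group means.

    `groups` maps (factor_a_level, factor_b_level), each level 0 or 1,
    to a list of p-dimensional profiles. Assumption: plain Euclidean
    geometry, with no adjustment for within-group covariance.
    """
    m = {key: mean_vector(profiles) for key, profiles in groups.items()}
    p = len(m[(0, 0)])
    contrast = [m[(1, 1)][t] - m[(1, 0)][t] - m[(0, 1)][t] + m[(0, 0)][t]
                for t in range(p)]
    norm = sqrt(sum(v * v for v in contrast))
    if norm == 0:
        raise ValueError("no interaction contrast in the group means")
    return [v / norm for v in contrast]

def project(profile, direction):
    """Reduce a p-dimensional profile to one number along `direction`."""
    return sum(x * w for x, w in zip(profile, direction))

# Toy data with p = 2 time points (early, middle); factor A = burn, factor B = age group.
groups = {
    (0, 0): [[1.0, 1.0], [1.0, 1.0]],
    (0, 1): [[1.0, 1.0], [1.0, 1.0]],
    (1, 0): [[2.0, 1.0], [2.0, 1.0]],
    (1, 1): [[5.0, 1.0], [5.0, 1.0]],
}
direction = interaction_direction(groups)   # interaction lives at the early time point
scores = {key: [project(x, direction) for x in profiles]
          for key, profiles in groups.items()}
```

The projected scores are one-dimensional, so a univariate two-factor analysis can then be run on them gene by gene.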
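Gene Classification
We use NANOVA to classify genes into 5 classes by factor effects:
• C1 (interaction): both factors have effects, and the effects are dependent
• C2 (additive): both factors have effects, but the effects are independent
• C3 (single-factor effect): only one factor has an effect (burn, in the burn data analysis)
• C4 (single-factor effect): only the other factor has an effect (age or gender, in the burn data analysis)
• C5: no factor effects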
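The five classes can be read as a decision rule over three significance calls per gene. The sketch below is an illustration only: the function name and the order of the checks are assumptions, and the actual NANOVA classification algorithm is not described in these slides.

```python
def classify_gene(interaction, factor_a, factor_b):
    """Map three boolean significance calls to classes C1..C5.

    Hypothetical rule: a significant interaction takes precedence,
    since the factor effects then depend on each other.
    """
    if interaction:
        return "C1"          # factor effects are dependent
    if factor_a and factor_b:
        return "C2"          # both effects present, additive
    if factor_a:
        return "C3"          # only the first factor (e.g. burn)
    if factor_b:
        return "C4"          # only the second factor (e.g. age or gender)
    return "C5"              # no factor effects

calls = {
    "gene_1": (True, True, True),
    "gene_2": (False, True, True),
    "gene_3": (False, True, False),
    "gene_4": (False, False, True),
    "gene_5": (False, False, False),
}
classes = {gene: classify_gene(*flags) for gene, flags in calls.items()}
```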
Burn Data Analysis
• Data preprocessing
• We used two time points in our analysis: the early and middle stages. Only patients with data at both time points were included.
• Probe set filtering: CV (coefficient of variation) > 0.5; median expression > 50
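The probe set filter can be sketched as follows. Two points are assumptions rather than statements from the slides: that a probe set must pass both thresholds, and that the CV is the sample standard deviation divided by the mean across arrays. The probe names and values are made up.

```python
from statistics import mean, median, stdev

def passes_filter(values, cv_min=0.5, median_min=50.0):
    """Keep a probe set only if it is both variable and well expressed."""
    m = mean(values)
    if m <= 0:
        return False                      # CV is not meaningful here
    cv = stdev(values) / m                # coefficient of variation
    return cv > cv_min and median(values) > median_min

probes = {
    "probe_a": [40.0, 120.0, 300.0, 80.0],    # variable and well expressed
    "probe_b": [100.0, 105.0, 98.0, 102.0],   # well expressed but nearly flat
    "probe_c": [5.0, 30.0, 2.0, 45.0],        # variable but low expression
}
kept = [name for name, values in probes.items() if passes_filter(values)]
```

Only `probe_a` survives: `probe_b` fails the CV threshold and `probe_c` fails the median threshold.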
Burn Data Analysis
• After applying the proposed method, we classified genes (probes) into different gene sets (FDR = 0.05).
• The burn effect dominates.
• The burn effect depends on age for a large set of genes.
• Gender has a smaller effect than age in burn patients.
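To make the FDR = 0.05 cutoff concrete, here is a sketch of the Benjamini-Hochberg step-up procedure. This is a stand-in technique chosen for illustration: the slides say NANOVA estimates the FDR but do not say how, so this should not be read as the method used here. The p-values are invented.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank whose p-value is under its step-up threshold.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected

pvalues = [0.001, 0.045, 0.02, 0.2, 0.008]
selected = benjamini_hochberg(pvalues, alpha=0.05)
```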
C1 Genes Have both burn and age/gender effects. The burn effect depends on age/gender. Red: burn; green: control; circle: adult; triangle: children. Each point is a group mean (e.g. burn children).
C2 Genes Have both burn and age/gender effects. The burn effect is independent of age/gender. Red: burn; green: control; circle: adult; triangle: children.
C3 Genes Have only a burn effect. No age/gender effect. Red: burn; green: control; circle: adult; triangle: children.
C4 Genes Have only an age/gender effect. No burn effect. Red: burn; green: control; circle: adult; triangle: children.
GO Enrichment Analysis Top ranking pathways in C3 ( Burn + gender) http://david.abcc.ncifcrf.gov/
GO Enrichment Analysis Top ranking pathways in C3 ( Burn + Age) http://david.abcc.ncifcrf.gov/
GO Enrichment Analysis Top ranking pathways in C2 ( Burn + Gender) Top ranking pathways in C2 ( Burn + Age) http://david.abcc.ncifcrf.gov/
GO Enrichment Analysis Top ranking pathways in C1 ( Burn + Age) http://david.abcc.ncifcrf.gov/
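For readers unfamiliar with enrichment analysis, the idea can be sketched as a one-sided hypergeometric test: how surprising is it to see k genes of a pathway in a gene set of size n? DAVID itself reports a modified Fisher exact p-value (the EASE score), so the plain version below is a simplification, and the counts are invented for illustration.

```python
from math import comb

def enrichment_pvalue(population, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` belong to the pathway."""
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    tail = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(hits, upper + 1))
    return tail / total

# Toy numbers: 10 genes on the array, 5 in the pathway, 5 in the gene set, all 5 are hits.
p_all_hits = enrichment_pvalue(10, 5, 5, 5)
p_no_requirement = enrichment_pvalue(10, 5, 5, 0)
```

A small p-value means the pathway is over-represented in the gene set relative to the background.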
A Few Interesting Pathways Some pathways are important for burn patients. Although they show no gender difference, they are very different between adult and child patients.
Interpretation of Projection Direction
• The projection direction is gene specific.
• The following 4 genes are from C3 (Burn + Gender). The burn effect is most prominent: at the early stage; at the middle stage; on the average of the two stages; on the change in gene expression between the early and middle stages.
• The projection direction contains temporal information about gene expression: (1) which time points are important; (2) what kinds of patterns (e.g. average or change) are important.
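With two time points, the four patterns listed above correspond to four reference directions in the (early, middle) plane. The sketch below labels an estimated direction by the reference it is most aligned with, ignoring sign. The reference vectors, labels, and function name are assumptions for illustration, not part of the proposed method.

```python
from math import sqrt

# Hypothetical reference directions for p = 2 time points (early, middle).
PATTERNS = {
    "early": (1.0, 0.0),
    "middle": (0.0, 1.0),
    "average": (1 / sqrt(2), 1 / sqrt(2)),
    "change": (-1 / sqrt(2), 1 / sqrt(2)),
}

def closest_pattern(direction):
    """Label a 2-D projection direction by absolute cosine similarity."""
    norm = sqrt(sum(v * v for v in direction))
    unit = [v / norm for v in direction]

    def alignment(name):
        return abs(sum(u * r for u, r in zip(unit, PATTERNS[name])))

    return max(PATTERNS, key=alignment)

labels = [closest_pattern(d) for d in
          [(0.9, 0.1), (0.05, -1.0), (0.6, 0.65), (-0.7, 0.7)]]
```

Grouping genes by such labels is one way to pick out, for example, the genes with strong early-stage signals.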
Temporal Information in Projection Direction
We ran GO analysis on 200 probe sets from C3 (Burn + Gender) that have (1) strong early-stage signals or (2) strong middle-stage signals.
• Enriched in acute response genes: kinase cascade, immune response ……
• Enriched in DNA repair, metabolism, cell cycle genes ……
Temporal Information of Pathways Projection direction contains temporal information about pathways Example 1: T cell receptor signaling pathway ( C3 of Burn + Gender) Most genes cluster together. Projection direction indicates importance in both early and middle stage
Temporal Information of Pathways Example 2: Hematopoietic cell lineage ( C3 of Burn + Gender) Most genes form sub clusters. It might be interesting to analyze these two sub clusters of genes.
Summary
• A new approach to analyzing longitudinal multifactor expression data: (1) it classifies genes into different gene sets based on factor effects, which suits exploratory studies; (2) the projection direction contains temporal information.
• Application to the burn data pointed out some important genes/pathways and their roles in male/female or adult/child burn patients.
References
• Ma et al., Nucleic Acids Res. 2006 Mar 1;34(4):1261-9.
• Storey et al., Proc Natl Acad Sci USA. 2005 Sep 6; 102(36):12837-42.
• Tai et al., Annals of Statistics 34(5), 2387–2412.
• Pavlidis et al., Methods. 2003;31:282–289.
• Gao et al., Bioinformatics 2006 22(12):1486-1494.
• Anderson et al., Ann. Statist. Volume 13, Number 2 (1985).
• Dennis et al., Genome Biology 2003; 4(5):P3.
Acknowledgement • Wing Wong • Weihong Xu, Wenzhong Xiao • Ted Anderson