
Statistical Method for Analyzing Longitudinal Multifactor Expression Data: Application to Time Course Burn Data

This study presents a new statistical method for analyzing longitudinal multifactor expression data, specifically applied to time course burn data. The method identifies genes that respond differently to burn based on gender and age factors. The analysis involves data preprocessing, gene classification, and GO enrichment analysis.


Presentation Transcript


  1. A New Statistical Method for Analyzing Longitudinal Multifactor Expression Data and Its Application to Time Course Burn Data • Baiyu Zhou, Department of Statistics, Stanford University • 10/06/2008

  2. Outline • Data description • Brief review: current statistical methods • Proposed statistical method • Application on Burn data

  3. Data Description • Two data sets: (1) burn + gender: gender effect on burn patients; (2) burn + age: age effect on burn patients • Gene expression from each patient (blood) was measured at different time points after burn • The data sets are longitudinal (time course) and involve multiple factors (burn/control; gender or age)

  4. Brief Review: Current Methods (1) • Time course microarray data analysis • Time course clustering to identify co-expressed genes (Ma et al., Nucleic Acids Res. 2006 Mar 1;34(4):1261-9) • Fit a smooth function and use a gene-specific summary statistic to characterize the significance of change over time or between biological conditions (Storey et al., Proc Natl Acad Sci U S A. 2005 Sep 6; 102(36):12837-42) • Empirical Bayes method to rank differentially expressed genes between biological conditions (Tai et al. Annals of Statistics 34(5), 2387–2412)

  5. Brief Review: Current Methods (2) • Multifactor microarray data analysis • ANOVA for gene selection (Pavlidis et al., Methods. 2003;31:282–289) • Nonparametric ANOVA, but with restrictions on the number of replicates and on the noise distribution (Gao et al., Bioinformatics 2006 22(12):1486-1494) • We have developed a nonparametric ANOVA (NANOVA) method and a gene classification algorithm for microarray data analysis (Zhou et al., in manuscript). The method: • easily handles balanced and unbalanced experimental designs • is free of distributional assumptions • estimates FDR • is robust to outliers • There is no existing method for analyzing longitudinal multifactor expression data!

  6. Methodology • Consider the expression of a gene from an individual measured over p time points (t1, t2, …, tp). Each individual is associated with two factors (e.g. gender; burn). • We want to identify genes that: (1) respond differently in male and female burn patients; (2) respond to burn; …… • Some genes might respond to burn at: • Early stage • Late stage • Which time point should be used? (t1, t2, …, tp, or their average?) • We call (1), (2), … ANOVA structures (interaction effect, main effect). • In p-dimensional space, there is a direction along which the ANOVA structure of interest is most prominent. We first estimate this direction, project the data onto the estimated direction, and then apply the NANOVA analysis and the gene classification algorithm.
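
The slide does not say how the projection direction is estimated. As a rough illustration only, the sketch below uses a Fisher-type stand-in for a 2 x 2 design: the direction proportional to S^{-1} d, where d is the interaction contrast of the four group mean profiles and S is the pooled within-group covariance. The function names, the `ridge` parameter, and the synthetic data are all assumptions made for this example and are not taken from the talk.

```python
import numpy as np

def interaction_direction(groups, ridge=1e-6):
    """Fisher-type direction for the interaction contrast (illustrative stand-in).

    groups: dict keyed by (a, b) with a, b in {0, 1}; each value is an
    (n_individuals, p) array holding one gene's expression over p time points.
    """
    means = {k: g.mean(axis=0) for k, g in groups.items()}
    # Interaction contrast of the four group mean profiles (length p).
    d = means[(1, 1)] - means[(1, 0)] - means[(0, 1)] + means[(0, 0)]
    # Pooled within-group covariance (p x p).
    resid = np.vstack([g - means[k] for k, g in groups.items()])
    dof = resid.shape[0] - len(groups)
    S = resid.T @ resid / dof
    # A small ridge term (assumption) keeps the solve stable for small samples.
    w = np.linalg.solve(S + ridge * np.eye(d.shape[0]), d)
    return w / np.linalg.norm(w)

def project(groups, w):
    """Reduce each individual's p-point profile to one score along w."""
    return {k: g @ w for k, g in groups.items()}

# Synthetic example: p = 2 time points, 20 individuals per group, and an
# interaction that is present only at the first time point.
rng = np.random.default_rng(0)
groups = {(a, b): rng.normal(0.0, 0.5, size=(20, 2)) for a in (0, 1) for b in (0, 1)}
groups[(1, 1)][:, 0] += 10.0
w = interaction_direction(groups)
scores = project(groups, w)
```

The projected scores are one number per individual, so any univariate two-factor analysis (NANOVA in the talk) can then be run on them.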

  7. Gene Classification • We use NANOVA to classify genes into 5 classes by factor effects • C1 (interaction): factor effects are dependent • C2 (additive): both factors have effects, but the factors are independent • C3 (first-factor effect): only the first factor has an effect (burn, in the burn data) • C4 (second-factor effect): only the second factor has an effect (age/gender, in the burn data) • C5: no factor effects
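
The five classes can be read as a simple decision rule over three yes/no calls per gene. The sketch below is only a reading of the class definitions on this slide; the function name, the boolean inputs, and the order of the checks are assumptions, and it does not reproduce the NANOVA classification algorithm itself.

```python
def classify_gene(interaction, effect_a, effect_b):
    """Map three significance calls to the classes C1..C5 (illustrative rule).

    interaction: True if the two factor effects are dependent.
    effect_a:    True if the first factor (e.g. burn) has an effect.
    effect_b:    True if the second factor (e.g. age or gender) has an effect.
    """
    if interaction:
        return "C1"  # factor effects are dependent
    if effect_a and effect_b:
        return "C2"  # both effects present, additive
    if effect_a:
        return "C3"  # only the first factor
    if effect_b:
        return "C4"  # only the second factor
    return "C5"      # no factor effects
```

For example, a gene with a burn effect but no age effect and no interaction would fall into C3 under this rule.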

  8. Burn Data Analysis • Data preprocessing • In our analysis, we used two time points: early and middle stage. Only patients with data at both time points were used. • Filtering probe sets: CV (coefficient of variation) > 0.5; median expression > 50
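
A minimal sketch of the two preprocessing steps above, assuming the CV is the sample standard deviation divided by the mean across samples on the raw intensity scale (the slide does not specify this). The function names and the dictionary layout are hypothetical.

```python
from statistics import mean, median, stdev

def keep_probe(values, cv_min=0.5, median_min=50.0):
    """Return True if a probe set passes both filters from the slide."""
    m = mean(values)
    if m <= 0:
        return False  # CV is not meaningful for a non-positive mean
    cv = stdev(values) / m  # sample standard deviation (assumption)
    return cv > cv_min and median(values) > median_min

def complete_patients(early, middle):
    """Return the sorted ids of patients measured at both time points.

    early, middle: dicts mapping patient id -> expression profile.
    """
    return sorted(set(early) & set(middle))
```

For instance, `keep_probe([100, 300, 50, 350])` passes both thresholds, while a nearly constant probe such as `[100, 101, 99, 100]` is dropped by the CV filter.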

  9. Burn Data Analysis • After applying the proposed method, we classified genes (probes) into different gene sets (FDR = 0.05) • The burn effect dominates • The burn effect is dependent on age for a large set of genes • Gender has a smaller effect than age in burn patients
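
The talk states that NANOVA estimates FDR but does not describe how. To illustrate what selecting genes at FDR = 0.05 means in practice, the sketch below uses the standard Benjamini-Hochberg step-up procedure as a stand-in; it is not the FDR estimator used in the talk, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking p-values selected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    # Select every hypothesis ranked at or below k_max.
    selected = [False] * m
    for i in order[:k_max]:
        selected[i] = True
    return selected
```

With p-values `[0.001, 0.008, 0.039, 0.041, 0.9]` and `alpha=0.05`, only the first two are selected, because 0.039 exceeds its step-up threshold of 0.03 and no larger rank qualifies.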

  10. C1 Genes • Have burn and age/gender effects. The burn effect is dependent on age/gender. • Red: burn; green: control; circle: adult; triangle: children • Each point is a group mean (e.g. burn children)

  11. Top ranking C1 genes : Burn + Gender

  12. Top ranking C1 genes : Burn + Age

  13. C2 Genes • Have burn and age/gender effects. The burn effect is independent of age/gender. • Red: burn; green: control; circle: adult; triangle: children

  14. Top ranking C2 genes : Burn + Gender

  15. Top ranking C2 genes : Burn + Age

  16. C3 Genes • Only have a burn effect. No age/gender effect. • Red: burn; green: control; circle: adult; triangle: children

  17. Top ranking C3 genes : Burn + Gender

  18. Top ranking C3 genes : Burn + Age

  19. C4 Genes • Only have an age/gender effect. No burn effect. • Red: burn; green: control; circle: adult; triangle: children

  20. Top ranking C4 genes : Burn + Gender

  21. Top ranking C4 genes : Burn + Age

  22. GO Enrichment Analysis Top ranking pathways in C3 ( Burn + gender) http://david.abcc.ncifcrf.gov/

  23. GO Enrichment Analysis Top ranking pathways in C3 ( Burn + Age) http://david.abcc.ncifcrf.gov/

  24. GO Enrichment Analysis Top ranking pathways in C2 ( Burn + Gender) Top ranking pathways in C2 ( Burn + Age) http://david.abcc.ncifcrf.gov/

  25. GO Enrichment Analysis Top ranking pathways in C1 ( Burn + Age) http://david.abcc.ncifcrf.gov/

  26. A Few Interesting Pathways • Some pathways are important for burn patients. Although they show no gender difference, they differ markedly between adult and child patients.

  27. Interpretation of Projection Direction • The projection direction is gene specific • The following 4 genes are from C3 (Burn + Gender). For each, the burn effect is most prominent: • at the early stage • at the middle stage • on the average of the two stages • on the change in gene expression between the early and middle stages • The projection direction contains temporal information about gene expression: (1) which time points are important; (2) what kinds of patterns (e.g. average or change) are important
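
With two time points (early, middle), the four cases on this slide correspond to four reference directions in the plane. The sketch below labels an estimated direction by its closest reference, using the absolute cosine because the sign of a projection direction is arbitrary. The reference vectors, the labels, and the function name are assumptions chosen for illustration.

```python
import math

R = 1.0 / math.sqrt(2.0)

# Reference unit directions over (early, middle); an illustrative choice.
PATTERNS = {
    "early stage": (1.0, 0.0),
    "middle stage": (0.0, 1.0),
    "average of the two stages": (R, R),
    "change between stages": (-R, R),
}

def closest_pattern(w):
    """Return the label of the reference direction best aligned with w."""
    norm = math.hypot(w[0], w[1])
    if norm == 0.0:
        raise ValueError("direction must be non-zero")

    def abs_cos(ref):
        # Absolute cosine between w and a unit reference vector.
        return abs(w[0] * ref[0] + w[1] * ref[1]) / norm

    return max(PATTERNS, key=lambda name: abs_cos(PATTERNS[name]))
```

A direction such as `(0.7, 0.7)` would be labelled as the average of the two stages, and `(-1.0, 1.0)` as the change between stages.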

  28. Temporal Information in Projection Direction • We performed GO analysis on 200 probe sets from C3 (Burn + Gender) that have (1) strong early stage signals or (2) strong middle stage signals • (1) Enriched in acute response genes: kinase cascade, immune response …… • (2) Enriched in DNA repair, metabolism, cell cycle genes ……

  29. Temporal Information of Pathways Projection direction contains temporal information about pathways Example 1: T cell receptor signaling pathway ( C3 of Burn + Gender) Most genes cluster together. Projection direction indicates importance in both early and middle stage

  30. Temporal Information of Pathways Example 2: Hematopoietic cell lineage ( C3 of Burn + Gender) Most genes form sub clusters. It might be interesting to analyze these two sub clusters of genes.

  31. Summary • A new approach to analyzing longitudinal multifactor expression data: (1) it classifies genes into different gene sets based on factor effects, which suits exploratory studies; (2) the projection direction contains temporal information • Application to the burn data pointed out some important genes/pathways and their roles in male/female or adult/child burn patients.

  32. References • Ma et al., Nucleic Acids Res. 2006 Mar 1;34(4):1261-9 • Storey et al., Proc Natl Acad Sci USA. 2005 Sep 6; 102(36):12837-42. • Tai et al. Annals of Statistics 34(5), 2387–2412. • Pavlidis et al., Methods. 2003;31:282–289. • Gao et al., Bioinformatics 2006 22(12):1486-1494. • Anderson et al., Ann. Statist. Volume 13, Number 2 (1985) • Dennis et al., Genome Biology 2003; 4(5):P3

  33. Acknowledgement • Wing Wong • Weihong Xu, Wenzhong Xiao • Ted Anderson
