90 likes | 190 Views
Exploratory Data Analysis Software for Systems Biology. explo R ase. Michael Lawrence Co-Major Professors: Di Cook and Eve Wurtele. Outline. Background Exploratory Data Analysis Software Foundation Systems Biology Introduction to explo R ase
E N D
Exploratory Data Analysis Software for Systems Biology exploRase Michael Lawrence Co-Major Professors: Di Cook and Eve Wurtele
Outline • Background • Exploratory Data Analysis • Software Foundation • Systems Biology • Introduction to exploRase • Explore some data (demo) • The Future • Acknowledgments • explo • R • ase
Exploratory Data Analysis • Exploratory Data Analysis (EDA) is the art of being a data detective. • Instead of working to confirm a single hypothesis, we approach the data with a loose set of questions and/or suspicions and remain open to the unexpected. • We look for trends and patterns, as well as deviations from those expected. • Interactive graphics facilitate fluid data exploration and link different views of the data by visual cues.
Software Foundation • R – Platform for statistical computing. Provides flexible data analysis via scripts. • GGobi – Leading open-source interactive graphics software. Mature and actively developed. Based on the GTK+ toolkit. • Rggobi2 – Link between R and GGobi. R's scripted analysis meets GGobi's interactive graphics. Overhaul of previous version. • RGtk2 – Binding between R and GTK (open-source GUI toolkit). Allows GUI's to be written in R and integrated with GGobi's GUI via Rggobi2.
Systems Biology • The highly computational study of biology as a complex (biochemical) system. • Based on the analysis and modeling of metabolic, regulatory, and protein interaction networks. • Networks are often inferred from or analyzed in conjunction with high-throughput data (transcriptomics, proteomics, metabolomics) describing system states.
exploRase is... • An RGtk2-based graphical user interface that makes advanced statistical tools like R and GGobi more accessible to a systems biologist. • Facilitates loading of experimental data into R and GGobi. • Will serve as a front-end to BioConductor (R software for Bioinformatics).
Let's go exploring... • Dataset : Arabidopsis transcriptomic data from 8k Affymetrix chips. • 2 Genotypes: WT and bio1 (deficient in biotin synth) • Half of samples given biotin, total of 4 treatments. • 2 replicates • Data Source: Basil Nikolau's lab
The Future • Integrate with an interactive network visualization tool (for example: Cytoscape) • Add more analysis methods and data aggregation utilities, with emphasis on interactivity • Refine user interface • Context sensitive help and wizards • Integration with other MetNet tools
Acknowledgments • GGobi: Debby Swayne (AT&T), Di Cook, Duncan Temple Lang (UC-Davis), Andreas Buja (Wharton), Heike Hofmann, Michael Lawrence, Hadley Wickham • Rggobi2: Hadley Wickham (R), Michael Lawrence (C), Original: Duncan Temple Lang, Debby Swayne • RGtk2: Michael Lawrence, Original: Duncan Temple • Money: MGET Fellowship, NSF Arabidopsis 2010