Explore the virtual realm of microarray data analysis with BioVLAB, a high-level workflow execution workbench powered by computing clouds. This tool simplifies the complex process of gene expression analysis for biologists and small research labs, offering on-demand resources and a suite of applications within a reconfigurable GUI. Discover the dynamics of cell activity, gene interaction networks, and protein function predictions through this efficient and user-friendly platform.
BioVLAB-Microarray: Microarray Data Analysis in Virtual Environment Youngik Yang, Jong Youl Choi, Kwangmin Choi, Marlon Pierce, Dennis Gannon, and Sun Kim School of Informatics Indiana University
CONTENTS • Introduction • Approach • Related Works • Microarray technology • System Architecture • Experiments • Conclusion • Demo
INTRODUCTION • Analysis of high-throughput microarray experiments • Performing microarray analysis is a demanding task for biologists and small research labs • Computing infrastructure issue • Computationally intensive • Nontrivial to integrate various bioinformatics applications • Exploratory data analysis issue • Multiple tasks in a single batch • Repetitive execution
APPROACH • On-demand computing resources • A suite of microarray analysis applications • Reconfigurable GUI workflow composer can alleviate technical burden • Well defined workflow can be repetitively used • Web portal • Reusable, reconfigurable, high-level workflow execution workbench powered by computing clouds for microarray gene expression analyses
RELATED WORKS • Efficient and user-friendly workflow composers and execution engine • SIBIOS, BioWBI, KDE Bioscience • Distributed and heterogeneous computing resources + Workflow system • Taverna, Triana, Kepler, GNARE, RENCI-Bioportal
MICROARRAY TECHNOLOGY • A cell expresses a subset of its genes in response to environmental changes and its own changing needs • Dynamics of cell activity • Measure gene expression levels of hundreds of thousands of genes within a cell • Usage • Function prediction: Guilt by association • Interaction: Co-expression of genes in transcription networks reveals how they interact • Drug discovery: Identify genes related to certain diseases and assess the effectiveness of new drugs Source: www.liv.ac.uk/lmf/about_microarrays.htm
RESEARCH GOALS • Gene expression analysis • Search for similar patterns of genes • Similar expression patterns may reveal the function of a gene with unknown function • Extraction of differentially expressed genes • Statistical evaluation • Clustering • Protein function prediction • Genes with similar expression may need to be studied as a group • Component analysis • Hidden structure of expression patterns may be revealed • Expression network analysis • Expose hidden structures • Protein-protein interaction (PPI) network analysis • Central issue: key role in understanding how a cellular system works • Modularity in the structure of a network may reflect higher-level functional organization of cellular components
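The "search for similar patterns" goal can be illustrated with a small Python sketch. It ranks genes by Pearson correlation against a query gene. The gene names and expression values are made up for illustration, and the helper names `pearson` and `most_similar` are hypothetical; this is not the BioVLAB implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # A constant profile has no defined correlation; treat it as 0 here (assumption).
    return cov / denom if denom else 0.0

def most_similar(query, profiles, top=2):
    """Return the `top` genes whose profiles correlate best with the query gene."""
    scores = {
        gene: pearson(profiles[query], values)
        for gene, values in profiles.items()
        if gene != query
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Made-up expression profiles over four time points (illustration only).
profiles = {
    "known_gene": [1.0, 2.0, 3.0, 4.0],
    "unknown_1": [2.0, 4.0, 6.0, 8.0],
    "unknown_2": [4.0, 3.0, 2.0, 1.0],
    "unknown_3": [1.0, 3.0, 2.0, 4.0],
}
ranked = most_similar("known_gene", profiles)
```

Under the guilt-by-association idea, a gene of unknown function that ranks at the top of such a list becomes a candidate for sharing the function of the query gene.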
MICROARRAY ANALYSIS COMMON TASKS • Output of one task can be plugged into another task • Repeat the same set of tasks with small changes of parameters
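Both points can be sketched in a few lines of Python: one task's output feeds the next, and the whole chain is rerun while a single parameter changes. The two tasks, the thresholds, and the data are hypothetical stand-ins, not the analyses used in BioVLAB.

```python
from statistics import pvariance

def filter_by_variance(profiles, min_variance):
    """Task 1: keep genes whose expression variance is at least min_variance."""
    return {g: v for g, v in profiles.items() if pvariance(v) >= min_variance}

def rank_by_peak(profiles):
    """Task 2: order the surviving genes by peak expression, highest first."""
    return sorted(profiles, key=lambda g: max(profiles[g]), reverse=True)

def run_pipeline(profiles, min_variance):
    # The output of task 1 is plugged directly into task 2.
    return rank_by_peak(filter_by_variance(profiles, min_variance))

# Made-up expression profiles (illustration only).
profiles = {
    "geneA": [1.0, 1.0, 1.0, 1.0],
    "geneB": [0.0, 2.0, 0.0, 2.0],
    "geneC": [0.0, 4.0, 0.0, 4.0],
}

# Repeat the same set of tasks with small changes of one parameter.
results = {t: run_pipeline(profiles, t) for t in (0.5, 2.0, 5.0)}
```

A workflow system makes this kind of rerun a matter of changing one input rather than re-wiring the tasks by hand.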
SYSTEM ARCHITECTURE • Workflow composer and execution engine • Application services • Web portal [Figure: architecture diagram linking Web Portal, Workflow Composer & Execution, and Application Services; labels: Create, Execute, Manage Data]
WORKFLOW COMPOSER & EXECUTION ENGINE • Introduced in the scientific communities to execute a batch of multiple tasks • Makes repetitive tasks easy to run • A workflow is a directed acyclic graph • Node: application to execute • Starting node: input • End node: output • Edge: a flow of data [Figure: example workflow with nodes Input, Task A, Task B, Task C, Output]
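The directed-acyclic-graph model can be sketched in Python with the standard-library `graphlib` module: each node is a task, each edge carries data, and tasks run only after everything upstream of them has finished. The task names, the graph shape, and the `run_workflow` helper are assumptions for illustration; this is not how XBaya itself is implemented.

```python
from graphlib import TopologicalSorter

def run_workflow(tasks, depends_on, workflow_input):
    """Run each task after all of its upstream nodes; return every node's output."""
    outputs = {"input": workflow_input}
    # static_order() yields nodes so that predecessors always come first,
    # and raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(depends_on).static_order():
        if name == "input":
            continue  # the starting node only holds the workflow input
        upstream = [outputs[dep] for dep in depends_on[name]]
        outputs[name] = tasks[name](*upstream)
    return outputs

# Hypothetical tasks: each takes the outputs of its upstream nodes as arguments.
tasks = {
    "task_a": lambda xs: [x * 2 for x in xs],
    "task_b": lambda xs: [x + 1 for x in xs],
    "task_c": lambda a, b: [p + q for p, q in zip(a, b)],
}
# Edges as "node: list of nodes it receives data from".
depends_on = {
    "task_a": ["input"],
    "task_b": ["input"],
    "task_c": ["task_a", "task_b"],
}
outputs = run_workflow(tasks, depends_on, [1, 2, 3])
```

Here `outputs["task_c"]` plays the role of the end node. Because the graph is declared as data, the same workflow can be rerun on a new input without touching the task code.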
XBaya • GUI workflow composer and execution engine • Developed at IU • Compose workflows by drag-and-drop from the workbench • Monitor the status of workflow execution [Figure: XBaya screenshot showing the Workbench Panel, Workflow Composer Panel, Application Information Panel, and Monitor Panel]
APPLICATION SERVICES • Interoperability among applications can be achieved by Application Services • Generic Service Toolkit (Gfac) • Gfac converts a command-line bioinformatics application into a web service • On-demand computing resources • Amazon Elastic Compute Cloud (EC2) • Remote storage services • Amazon Simple Storage Service (S3) • Microsoft Application-Based Storage
BioVLAB APPLICATION DEVELOPMENT PROCEDURE • User: Develop a command-line app. • Admin: Install the app. in Amazon EC2 • Let the app. store any output to Amazon S3 / Microsoft Application-Based Storage • Make a virtual machine image • Register the app. by using Gfac • User: Instantiate EC2 and run the app. by using XBaya (Gfac user manual) [Figure: Gfac registration form]
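The first step, developing a command-line app, might look like the following Python sketch. The flag names (`--input`, `--output`, `--min-mean`), the CSV layout, and the filtering rule are all hypothetical; the slides do not specify what interface Gfac expects, so this only shows the general shape of a file-in, file-out tool that could be wrapped as a service.

```python
import argparse
import csv

def build_parser():
    parser = argparse.ArgumentParser(description="Filter genes by mean expression.")
    parser.add_argument("--input", required=True, help="CSV rows: gene,value1,value2,...")
    parser.add_argument("--output", required=True, help="CSV file to write")
    parser.add_argument("--min-mean", type=float, default=1.0,
                        help="keep genes whose mean expression is at least this value")
    return parser

def main(argv=None):
    """Read the input CSV, write the rows that pass the filter, return how many were kept."""
    args = build_parser().parse_args(argv)
    kept = 0
    with open(args.input, newline="") as src, open(args.output, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if len(row) < 2:
                continue  # skip blank lines and rows without expression values
            values = [float(v) for v in row[1:]]
            if sum(values) / len(values) >= args.min_mean:
                writer.writerow(row)
                kept += 1
    return kept

# When saved as a script, call main() with no arguments to read sys.argv, e.g.:
#   python filter_genes.py --input in.csv --output out.csv --min-mean 1.0
```

Keeping all inputs and outputs as explicit file arguments is a design choice that suits wrapping: the caller decides where data comes from and where results go, which fits the step of storing output in remote storage.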
WEB PORTAL • Administrator • Management of registered applications by Gfac registry portlet • User management and access control • User • Access to stored data • Built with Open Grid Computing Environments (OGCE)
ANALYSIS RESOURCES • R: statistical learning • Bioconductor: microarray analysis • Data acquisition: NCBI GEO Microarray DB • Similar expression pattern: correlation • Differentially expressed genes: limma package • Clustering: K-means, hierarchical clustering, QT clustering, biclustering, self-organizing map (SOM) • Component analysis: principal component analysis (PCA) and independent component analysis (ICA) • Network: Database of Interacting Proteins (DIP), Perl Graph package and GraphViz
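Of the clustering methods listed, K-means is the simplest to sketch. The Python below is a plain version of Lloyd's algorithm with caller-supplied starting centroids. BioVLAB uses R and Bioconductor for this, so the code is only an illustration of the technique, and the data points and the `kmeans` helper are made up.

```python
import math

def nearest_index(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for members, old in zip(clusters, centroids)
        ]
    labels = [nearest_index(p, centroids) for p in points]
    return labels, centroids

# Made-up two-sample expression profiles for four genes (illustration only).
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(points, centroids=[[0.0, 0.0], [10.0, 10.0]])
```

Passing the starting centroids explicitly keeps the sketch deterministic; practical implementations usually pick them at random and run several restarts.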
EXPERIMENT • Data set: GDS38 • Remotely retrieved from the NCBI GEO database • Time-series gene expression data for observing the cell cycle in the yeast Saccharomyces cerevisiae • 7680 spots in each of 16 samples • One sample was taken every 7 minutes as the cells went through the cell cycle • Expression analysis • PPI network analysis
CONCLUSION • Microarray data analysis in virtual environment • Coupling computing clouds and GUI workflow engine • Effective system design for small research labs
FUTURE WORKS • Integration of more packages and analyses • A system of great flexibility • Integrate various high-throughput data • Microarray, mass spectrometry, massively parallel sequencing, etc. • Integrate various computing resources • Clouds, grids, and multi-core PCs • Integrate various public resources • NCBI, KEGG, PDB, etc.
ACKNOWLEDGEMENT • The work is partially supported by NSF MCB 0731950 and a MetaCyt Microbial Systems Biology grant from Lilly Foundations. • Extreme Computing Group at IU • Suresh Marru, Srinath Perera, and Chathura Herath