Bioinformatics in the Department of Computer Science Lenwood S. Heath Department of Computer Science Blacksburg, VA 24061 College of Engineering Northern Virginia Engineering Showcase March 5, 2004
Bioinformatics Faculty Layne Watson Cliff Shaffer Naren Ramakrishnan Alexey Onufriev Roger Ehrich Eunice Santos Chris North Adrian Sandu Lenny Heath T. M. Murali Joao Setubal, CS and VBI
Relevant Expertise • Algorithms: Heath, Santos, Setubal, Shaffer, Watson • Computational structural biology: Onufriev, Sandu • Computational systems biology: Murali • Data mining: Ramakrishnan • Genomics: Heath, Murali, Ramakrishnan • Human-computer interaction, visualization: North • Image processing: Ehrich, Watson • High performance computing: Sandu, Santos, Watson • Numerical analysis: Onufriev, Watson • Optimization: Watson • Problem solving environments: Ramakrishnan, Shaffer 3/5/2004 Bioinformatics in Computer Science
Selected Collaborations • Virginia Tech: Biochemistry, Biology, Fralin Biotechnology Center, Plant Physiology, Veterinary Medicine, Virginia Bioinformatics Institute (VBI), Wood Science • North Carolina State University: Forest Biotechnology Center • Duke: Biology • University of Illinois: Plant Biology
Selected Funding • NSF IBN-0219322: ITR: Understanding Stress Resistance Mechanisms in Plants: Multimodal Models Integrating Experimental Data, Databases, and the Literature. L. S. Heath, R. Grene, B. I. Chevone, N. Ramakrishnan, L. T. Watson. $499,973. • NSF EIA-01903660: A Microarray Experiment Management System. N. Ramakrishnan, L. S. Heath, L. T. Watson, R. Grene, J. W. Weller (VBI). $600,000. • DARPA N00014-01-1-0852: Dryophile Genes to Engineer Stasis-Recovery of Human Cells. M. Potts, L. S. Heath, R. F. Helm, N. Ramakrishnan, T. O. Sitz, F. Bloom, P. Price (Life Technologies), J. Battista (LSU). $4,532,622. • NSF MCB-0083315: Biocomplexity - Incubation Activity: A Collaborative Problem Solving Environment for Computational Modeling of Eukaryotic Cell Cycle Controls. J. J. Tyson, L. T. Watson, N. Ramakrishnan, C. A. Shaffer, J. C. Sible. $99,965. • NIH 1 R01 GM64339-01: Problem Solving Environment for Modeling the Cell Cycle. J. J. Tyson, J. Sible, K. Chen, L. T. Watson, C. A. Shaffer, N. Ramakrishnan, P. Mendes (VBI). $211,038. • Air Force Research Laboratory F30602-01-2-0572: The Eukaryotic Cell Cycle as a Test Case for Modeling Cellular Regulation in a Collaborative Problem Solving Environment. J. J. Tyson, J. C. Sible, K. C. Chen, L. T. Watson, C. A. Shaffer, N. Ramakrishnan. $1,650,000.
Research Resources • System X: third fastest computer on the planet • Laboratory for Advanced Scientific Computing & Applications (LASCA): parallel algorithms & math software; Anantham Cluster; grid computing • Bioinformatics Research LAN: Linux, Mac OS X, Windows; bioinformatics databases and analysis
JigCell: A PSE for Eukaryotic Cell Cycle Controls Marc Vass, Nick Allen, Jason Zwolak, Dan Moisa, Clifford A. Shaffer, Layne T. Watson, Naren Ramakrishnan, and John J. Tyson Departments of Computer Science and Biology
Computational Molecular Biology [Figure: DNA …TACCCGATGGCGAAATGC... → mRNA …AUGGGCUACCGCUUUACG... → protein …Met - Gly - Tyr - Arg - Phe - Thr... → enzyme (phosphorylation, -P; ATP → ADP) → reaction network (enzymes E1 to E4; species X, Y, Z) → cell physiology]
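The first two steps of the pipeline on this slide (DNA to mRNA to protein) can be checked with a short Python sketch. It assumes, as the slide's alignment suggests, that the DNA shown is the template strand written position-by-position against the mRNA, so transcription is a base-wise complement with no reversal. The codon table is deliberately partial (only the six codons needed here); the function names are illustrative.

```python
# Transcription pairs each template-strand base with its RNA complement.
RNA_COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Partial codon table: only the six codons used in this example
# (the standard genetic code has 64 entries).
CODON_TABLE = {
    "AUG": "Met", "GGC": "Gly", "UAC": "Tyr",
    "CGC": "Arg", "UUU": "Phe", "ACG": "Thr",
}


def transcribe(template_dna: str) -> str:
    """Return the mRNA complementary to a template strand written 3'->5'."""
    return "".join(RNA_COMPLEMENT[base] for base in template_dna)


def translate(mrna: str) -> list[str]:
    """Translate mRNA codon by codon, reading frame starting at position 0."""
    return [CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]


dna = "TACCCGATGGCGAAATGC"
mrna = transcribe(dna)
print(mrna)                          # AUGGGCUACCGCUUUACG
print(" - ".join(translate(mrna)))   # Met - Gly - Tyr - Arg - Phe - Thr
```

Both printed lines reproduce the mRNA and peptide fragments on the slide.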
Cell Cycle of Budding Yeast [Figure: wiring diagram of the budding yeast cell cycle. Components: Cln2, Cln3, Bck2, Clb2, Clb5, CDKs, Sic1 (and phosphorylated Sic1 P), SCF, SBF, MBF, Mcm1, Swi5, APC, Cdc20, Cdh1, Pds1, Esp1, Mad2, Tem1, Lte1, Bub2, Cdc15, Net1 (and Net1P), RENT, Cdc14, PPX. Events and signals: growth, budding, DNA synthesis, unaligned chromosomes, sister chromatid separation.]
JigCell Problem-Solving Environment [Figure: components: Experimental Database, Wiring Diagram, Differential Equations, Parameter Values, Simulation, Analysis, Visualization, Automatic Parameter Estimation]
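To make the flow among these components concrete, here is a deliberately tiny Python sketch of the same loop: a differential equation from a (hypothetical) one-species wiring diagram, a simulation, and automatic estimation of one parameter against observations. This is not JigCell code. The model dX/dt = k_syn - k_deg·X, the function names, and the grid search are illustrative assumptions, and the "observations" are synthetic values generated by the sketch itself, not experimental data.

```python
"""Toy sketch of a simulate-then-fit loop (not JigCell's implementation).

Hypothetical model: dX/dt = k_syn - k_deg * X, e.g. a protein made at a
constant rate and degraded in proportion to its level.
"""


def simulate(k_syn, k_deg, x0=0.0, t_end=10.0, dt=0.01):
    """Integrate dX/dt = k_syn - k_deg * X with classic 4th-order Runge-Kutta."""
    def f(x):
        return k_syn - k_deg * x

    x = x0
    trajectory = [x]
    for _ in range(round(t_end / dt)):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        trajectory.append(x)
    return trajectory


def sse(model, data):
    """Sum of squared errors between simulated and observed trajectories."""
    return sum((m - d) ** 2 for m, d in zip(model, data))


def estimate_k_deg(data, k_syn, candidates):
    """Grid search: return the degradation rate whose simulation fits best."""
    return min(candidates, key=lambda k: sse(simulate(k_syn, k), data))


# Synthetic "observations" produced by the model itself with k_deg = 0.5.
observed = simulate(k_syn=1.0, k_deg=0.5)
grid = [i / 100 for i in range(10, 101)]  # 0.10, 0.11, ..., 1.00
print(estimate_k_deg(observed, k_syn=1.0, candidates=grid))  # 0.5
```

A grid search keeps the sketch dependency-free; a real parameter estimator would use a proper optimizer and fit many parameters against many experiments (for example, mutant phenotypes) at once.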
Why do these calculations? • Is the model “yeast-shaped”? • Bioinformatics role: the model organizes experimental information. • New science: prediction, insight JigCell is part of the DARPA BioSPICE suite of software tools for computational cell biology.
Expresso: A Next Generation Software System for Microarray Experiment Management and Data Analysis
Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis • Integration of design, experimentation, and analysis • Data mining with inductive logic programming (ILP) • Closing the loop: analysis results guide the design of the next experiment • Drought stress experiments with pine trees and Arabidopsis
Scenarios for Effects of Abiotic Stress on Gene Expression in Plants
Data Mining with ILP • ILP (inductive logic programming) is a data mining technique that infers relationships, expressed as logical rules, from data. • ILP groups related data and favors relationships that have short descriptions. • ILP can also flexibly incorporate a priori biological knowledge (e.g., categories and alternate classifications). • Hybrid reasoning: information integration • “Is there a relationship between genes in a given functional category and genes in a particular expression cluster?” • ILP mines this information in a single step
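The kind of question quoted on this slide can be illustrated with a toy rule search in Python. This is not ILP proper and not the system used in Expresso: it enumerates only one-literal rules of the hypothetical form `cluster(G, c) :- category(G, f)`, a crude stand-in for ILP's preference for short descriptions, and scores each rule by how many genes it covers correctly (pos cover) and incorrectly (neg cover). All gene names, categories, and clusters are invented for illustration.

```python
"""Toy rule search in the spirit of ILP (not a real ILP system).

All genes, functional categories, and expression clusters are hypothetical.
"""
from itertools import product

# Hypothetical background knowledge: functional category of each gene.
category = {
    "g1": "stress_response", "g2": "stress_response", "g3": "stress_response",
    "g4": "photosynthesis", "g5": "photosynthesis", "g6": "cell_wall",
}
# Hypothetical expression-cluster assignments from a microarray analysis.
cluster = {
    "g1": "up_in_drought", "g2": "up_in_drought", "g3": "unchanged",
    "g4": "down_in_drought", "g5": "down_in_drought", "g6": "unchanged",
}


def score(cat, clu):
    """(pos cover, neg cover) of the rule cluster(G, clu) :- category(G, cat)."""
    covered = [g for g, c in category.items() if c == cat]
    pos = sum(1 for g in covered if cluster[g] == clu)
    return pos, len(covered) - pos


def mine(min_pos=2, min_conf=0.6):
    """Enumerate one-literal rules and keep those with enough support."""
    rules = []
    cats = sorted(set(category.values()))
    clus = sorted(set(cluster.values()))
    for cat, clu in product(cats, clus):
        pos, neg = score(cat, clu)
        if pos >= min_pos and pos / (pos + neg) >= min_conf:
            rules.append((f"cluster(G,{clu}) :- category(G,{cat}).", pos, neg))
    return rules


for rule, pos, neg in mine():
    print(f"{rule}  [Pos cover = {pos} Neg cover = {neg}]")
```

On this invented data the search keeps two rules: photosynthesis genes fall in the down-in-drought cluster (2 covered, 0 exceptions) and stress-response genes fall in the up-in-drought cluster (2 covered, 1 exception). A real ILP system searches over multi-literal rules with variables and background predicates, which this sketch does not attempt.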
Rule Inference in ILP • Infers rules relating gene expression levels to categories, both within a probe pair and across probe pairs, without explicit direction • Example Rule: • [Rule 142] [Pos cover = 69 Neg cover = 3] • level(A,moist_vs_severe,not positive) :- level(A,moist_vs_mild,positive). • Interpretation: • “If the moist versus mild stress comparison was positive for a clone A, then the moist versus severe comparison was negative or unchanged for A, with a confidence of 95.8%.”
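Assuming confidence here means the fraction of clones covered by the rule's body for which the head also holds, the 95.8% figure for Rule 142 follows directly from its cover counts:

```latex
\text{confidence}
  = \frac{\text{Pos cover}}{\text{Pos cover} + \text{Neg cover}}
  = \frac{69}{69 + 3}
  = \frac{69}{72}
  \approx 0.958 = 95.8\%
```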
ILP in the Expresso Pipeline Expresso is a next generation software system for microarray experiments that provides a database interface to ILP functionality.
Status of Expresso • Capabilities • Data capture and storage • Statistical analysis • Data mining by ILP • Microarray experiment design: GeneSieve • Expresso-assisted experiment composition • Closing the experimental loop • Successful microarray experiment analysis • Pine, Norway spruce, yeast, Deinococcus radiodurans (an extremophile microorganism), human cell lines • Planned microarray experiment analysis • Potato, Arabidopsis thaliana, tomato, rice, corn
Networks in Bioinformatics • Mathematical Model(s) for Biological Networks • Representation: What biological entities and parameters to represent and at what level of granularity? • Operations and Computations: What manipulations and transformations are supported? • Presentation: How can biologists visualize and explore networks?
Reconciling Networks Munnik and Meijer, FEBS Letters, 2001 Shinozaki and Yamaguchi-Shinozaki, Current Opinion in Plant Biology, 2000
Multimodal Networks • Nodes and edges have flexible semantics to represent: • Time • Uncertainty • Cellular decision making; process regulation • Cell topology and compartmentalization • Rate constants • Phylogeny • Hierarchical
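One way to read "nodes and edges have flexible semantics" is as a graph whose nodes and edges carry optional attributes for several of the modes listed above. The Python sketch below is a hypothetical data structure, not the project's actual model: the class names, field names, and the example entities (`stress_signal`, `kinase_K`, `gene_G`) are all illustrative. It covers time windows, uncertainty, regulation type, compartments, rate constants, and hierarchy; phylogeny is omitted for brevity.

```python
"""Hypothetical multimodal network: nodes and edges with optional attributes."""
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    kind: str                         # e.g. "gene", "protein", "metabolite"
    compartment: str = "cytoplasm"    # cell topology / compartmentalization
    parent: Optional[str] = None      # hierarchy: enclosing module or complex


@dataclass
class Edge:
    source: str
    target: str
    relation: str                     # regulation, e.g. "activates", "inhibits"
    confidence: float = 1.0           # uncertainty, from 0 to 1
    rate_constant: Optional[float] = None
    active_from: float = 0.0          # time window in which the edge applies
    active_to: float = float("inf")


@dataclass
class MultimodalNetwork:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, edge: Edge) -> None:
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(edge)

    def snapshot(self, t: float, min_confidence: float = 0.0) -> list:
        """Edges in effect at time t whose confidence meets the threshold."""
        return [e for e in self.edges
                if e.active_from <= t < e.active_to
                and e.confidence >= min_confidence]

    def children(self, parent: str) -> list:
        """Names of nodes grouped under a hierarchical parent."""
        return [n.name for n in self.nodes.values() if n.parent == parent]


# Usage with invented entities.
net = MultimodalNetwork()
net.add_node(Node("stress_signal", "metabolite"))
net.add_node(Node("kinase_K", "protein", parent="signaling_module"))
net.add_node(Node("gene_G", "gene", compartment="nucleus"))
net.add_edge(Edge("stress_signal", "kinase_K", "activates", confidence=0.9))
net.add_edge(Edge("kinase_K", "gene_G", "activates", confidence=0.5,
                  rate_constant=0.2, active_from=1.0))
print([(e.source, e.target) for e in net.snapshot(t=0.5)])
print(len(net.snapshot(t=2.0, min_confidence=0.6)))
```

Queries such as `snapshot` show how one stored network can yield different views (by time, by confidence), which is the kind of operation the "Operations and Computations" question on the networks slide asks about.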
Using Multimodal Networks • Help biologists find new biological knowledge • Visualize and explore • Generate hypotheses and experiments • Predict regulatory phenomena • Predict responses to stress • Incorporate into Expresso as part of closing the loop
Conclusions • Engaged faculty with the right expertise • Numerous life science collaborations • Federal research funding • First-class computational resources • A variety of cutting-edge bioinformatics research projects
Bioinformatics Education • Courses in Computer Science • Courses in the Life Sciences • Bioinformatics Option • Doctoral Program in Genetics, Bioinformatics, and Computational Biology
Doctoral Program in Genetics, Bioinformatics, and Computational Biology Multidisciplinary: biology, biochemistry, crop science, plant physiology, computer science, mathematics, statistics, veterinary medicine
Anantham Cluster • Previous cluster specs • 200 AMD 1 GHz processors • 1 GB RAM per processor • 2 TB disk space • 2.56 Gb/s Myrinet network [Figure: previous 200-processor cluster]