Network Biology Data: Biological, Conceptual and Computational Issues around Network, System, and Pathway Data. The Abstract and The Concrete
Topic Outline
• Lessons from the Genome Program, and abstract ideas for transforming data into information when looking at systems data
• Two examples of concrete tools (ready for use):
  • WebGestalt (for large sets of genes)
  • Ingenuity (for networks)
• A concrete thing: the Bioinformatics Resource Center (under development)
• Other tools under development
Human Genome Project (HGP): Past Lessons and Future Directions in Data… The genome-encoded “parts list” acts as a data integrator: common data elements for genes and their products (transcripts and proteins) enable integration and comparison of data in NEW ways. Diagram: Genome Data; individualized genotype data within populations; phenotype and system data. GeneKeyDB and related work serve as an integrative foundation that can help merge genome data with other data.
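As a minimal sketch of gene-centered integration, the Python below merges several per-gene tables on a shared gene identifier, so each gene becomes one record that collects whatever each source knows about it. The gene IDs, values, and the `integrate` helper are hypothetical illustrations; this is not how GeneKeyDB itself is implemented.

```python
# Hypothetical per-gene tables from different sources, all keyed by the same gene ID.
expression = {"geneA": 2.4, "geneB": 0.8, "geneC": 1.9}  # made-up expression values
genotype = {"geneA": "variant_1", "geneC": "variant_2"}   # made-up variant labels
phenotype = {"geneA": "trait_X", "geneB": "trait_Y"}      # made-up phenotype labels


def integrate(**sources):
    """Merge {gene_id: value} tables into one record per gene.

    A gene missing from a source gets None for that source, so gaps stay visible.
    """
    all_genes = set().union(*(table.keys() for table in sources.values()))
    return {
        gene: {name: table.get(gene) for name, table in sources.items()}
        for gene in sorted(all_genes)
    }


merged = integrate(expression=expression, genotype=genotype, phenotype=phenotype)
for gene, record in merged.items():
    print(gene, record)
```

The shared key (the gene) is what makes the join possible; the same pattern extends to any new data type that can be mapped onto gene IDs.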
The HGP highlighted some ways to succeed or fail with large data sets. Are its lessons applicable to systems biology data sets from expression, proteomics, and genetics? Yes. But are some new approaches needed to understand SYSTEM data? Also yes. Diagram: Genome Data
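Biggest lesson: a biological datum has two questions attached to it, How? and Why? (following Mayr's distinction between proximate "how" causes and ultimate "why" causes). The HGP showed the importance of the why questions in thinking about and organizing data. Diagram: a datum (How? Why?) within Genome Data and other genotype, phenotype, and system data.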
HGP Results and Future Issues for New Data… Genotype + Environment + DEVELOPMENT → Phenotype. 1) Astounding results: network thinking in development and physiology is important if data are to explain phenotype (e.g. PAX6). 2) HGP data approaches have some relevance, but new bioinformatics tools are needed for network data and network thinking.
A way of thinking about data… Bioinformatics: finding the difference (in genotypic and environmental data) that makes the difference (in phenotypic data). Many of the differences that make an interesting difference lie NOT in protein coding but in complex networks. Diagram levels: Δ data in protein coding; Δ data in regulatory networks; Δ data in cellular signaling networks.
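One concrete reading of "Δ data in regulatory networks" is the set of edges gained or lost between two conditions, tissues, or species. The sketch below assumes each network is simply a set of directed (regulator, target) pairs and ignores edge weights; the names `TF1`, `geneA`, etc. and the `network_delta` helper are hypothetical.

```python
# Two hypothetical directed regulatory networks, as sets of (regulator, target) edges.
network_a = {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")}
network_b = {("TF1", "geneA"), ("TF2", "geneC"), ("TF2", "geneB")}


def network_delta(before, after):
    """Return the edges lost, gained, and shared between two edge sets."""
    return {
        "lost": before - after,
        "gained": after - before,
        "shared": before & after,
    }


delta = network_delta(network_a, network_b)
print(sorted(delta["lost"]))    # [('TF1', 'geneB')]
print(sorted(delta["gained"]))  # [('TF2', 'geneB')]
```

Here the protein-coding genes are identical in both networks; the difference lies entirely in who regulates `geneB`.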
What is a “network” way of viewing data? A biological network can be expressed and manipulated in terms of graph theory; combinatorial algorithms are needed to analyze graphs.
• Nodes (vertices) may be:
  • genes
  • gene products
  • hormones, signals
  • metabolites
  • publications
  • functional sequence elements
• Edges (lines) may be:
  • undirected vs. directed
  • weighted vs. unweighted
• Such networks could be:
  • co-expression networks
  • gene regulatory networks
  • cell-cell communication and signal transduction networks
  • phylogenetic relationships among genes, species, and networks: orthology, paralogy, etc. (trees, clades, etc.)
  • Gene Ontology or other directed acyclic graphs
e.g. Alon U. 2003. Science 301: 1866; Barabási AL. 2003. Linked. Plume Books; Barabási AL, Oltvai ZN. 2004. Nat. Rev. Genetics 5: 101
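The node and edge choices above map directly onto a data structure. The sketch below is a hypothetical minimal graph class (not any particular library's API) with typed nodes and edges that can be directed or undirected, weighted or unweighted; an undirected edge is simply stored in both directions.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Minimal graph: typed nodes plus edges that may be directed and/or weighted."""
    directed: bool = True
    node_types: dict = field(default_factory=dict)  # node -> kind, e.g. "gene", "metabolite"
    adjacency: dict = field(default_factory=dict)   # node -> {neighbor: weight}

    def add_node(self, node, kind):
        self.node_types[node] = kind
        self.adjacency.setdefault(node, {})

    def add_edge(self, u, v, weight=1.0):
        # Unweighted graphs simply keep the default weight of 1.0.
        self.adjacency.setdefault(u, {})[v] = weight
        if not self.directed:
            # An undirected edge is stored in both directions.
            self.adjacency.setdefault(v, {})[u] = weight

    def neighbors(self, node):
        return sorted(self.adjacency.get(node, {}))


# Hypothetical example: an undirected, weighted co-expression network.
coexpr = Network(directed=False)
for gene in ("geneA", "geneB", "geneC"):
    coexpr.add_node(gene, "gene")
coexpr.add_edge("geneA", "geneB", weight=0.9)
coexpr.add_edge("geneB", "geneC", weight=0.7)
print(coexpr.neighbors("geneB"))  # ['geneA', 'geneC']
```

A gene regulatory network would use the same class with `directed=True`, so that a regulator points to its targets but not the reverse.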
Edges may also be experimental correlations (which can be undirected) or mechanistic, directed relationships. Tightly connected modules might be found. Such a module might be loosely analogous to a protein sequence module that is conserved, duplicated, and diverged, and similar modules might be seen across different tissues, species, etc.
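The slides do not say which algorithm finds "tightly connected modules". As one simple, swapped-in stand-in, the sketch below computes a k-core (repeatedly peel off nodes with fewer than k neighbors) and then reports the connected components of what remains as candidate modules. The edge list and helper names are hypothetical, and the graph is treated as undirected, as a correlation network would be.

```python
from collections import deque


def build_undirected(edges):
    """Build {node: set(neighbors)} from (u, v) pairs; self-loops are ignored."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def k_core(adjacency, k):
    """Return the k-core: the largest subgraph in which every node has degree >= k."""
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}  # work on a copy
    queue = deque(node for node, nbrs in graph.items() if len(nbrs) < k)
    removed = set()
    while queue:
        node = queue.popleft()
        if node in removed:
            continue
        removed.add(node)
        # Peel the node off; each neighbor loses one degree and may fall below k.
        for nbr in graph[node]:
            graph[nbr].discard(node)
            if nbr not in removed and len(graph[nbr]) < k:
                queue.append(nbr)
        graph[node] = set()
    return {node: nbrs for node, nbrs in graph.items() if node not in removed}


def connected_components(adjacency):
    """Connected components of an undirected graph, each returned as a sorted list."""
    seen, modules = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            node = stack.pop()
            members.append(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        modules.append(sorted(members))
    return modules


# Hypothetical network: two triangles, one with a dangling tail (C-G-H).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "G"), ("G", "H"),
         ("D", "E"), ("E", "F"), ("D", "F")]
core = k_core(build_undirected(edges), k=2)
print(connected_components(core))  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Raising `k` demands denser modules; dedicated community-detection methods would be a more refined choice for real data.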
Need to collaborate, integrate, and COMPARE to find differences in biological NETWORKS: Collaborative, Integrative, and Comparative Bioinformatics.
Diagram components:
• Existing knowledge
• Large molecular data sets (microarray data, proteome, etc.)
• Genetic data and phenotype data (WebQTL, Williams et al., UTHSC; MuTrack)
• Gene-centered data integration (via GeneKeyDB, BioFoundation)
• Comparative, Boolean, and other operations on gene sets and networks (WebGestalt and Ingenuity are two examples)
• Network modules: duplicated, diverged, converged
• Network analysis (CS, stats, bio): sequence and network modularity; comparative, cladistic, phylogenetic analysis; graph algorithms
• Areas: Data Storage & Collaborative Bioinformatics; Integrative Bioinformatics; Genotype & Phenotype Data Sets; Data Visualization & Stats; Comparative Bioinformatics & Data Mining
WebGestalt: Web-based Gene Set Analysis Toolkit http://bioinfo.vanderbilt.edu/webgestalt Bing Zhang
Gene sets can be uploaded based on:
• 1) IDs (e.g. Affymetrix, LocusLink, or protein IDs from a chip, proteome, etc.)
• 2) Genome location
• Or… 3) Gene Ontology (common biological process, molecular function, cellular location)
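Building a gene set from a genome location amounts to an interval-overlap query against gene annotations. The sketch below assumes hypothetical gene coordinates (1-based, inclusive) held in a plain list; it illustrates the idea and is not WebGestalt's own code.

```python
# Hypothetical gene annotations: (gene_id, chromosome, start, end), 1-based inclusive.
GENES = [
    ("geneA", "chr1", 1_000, 5_000),
    ("geneB", "chr1", 20_000, 25_000),
    ("geneC", "chr2", 3_000, 8_000),
]


def genes_in_region(genes, chrom, start, end):
    """Return IDs of genes that overlap the region [start, end] on chromosome chrom."""
    return [
        gene_id
        for gene_id, g_chrom, g_start, g_end in genes
        # Two intervals overlap when each one starts before the other ends.
        if g_chrom == chrom and g_start <= end and g_end >= start
    ]


print(genes_in_region(GENES, "chr1", 4_000, 21_000))  # ['geneA', 'geneB']
```

A linear scan is fine for a sketch; genome-scale annotation sets would normally use a sorted or tree-based interval index.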
Manipulate data as sets of genes or gene products. RNA expression, proteomics, genomics, statistical genetics, etc. all produce lists of genes that may function in a network.
1 of 3 things to do: Boolean operations on multiple sets, or retrieving orthologs.
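Boolean operations on gene sets correspond directly to set intersection (AND), union (OR), and difference (NOT). The sketch below uses Python's built-in sets on two hypothetical gene lists, then maps the overlap to orthologs through a made-up lookup table; real ortholog tables would come from a curated resource.

```python
# Hypothetical gene lists from two experiments (e.g. a microarray and a proteome screen).
array_hits = {"geneA", "geneB", "geneC", "geneD"}
proteome_hits = {"geneB", "geneD", "geneE"}

print(sorted(array_hits & proteome_hits))  # AND: ['geneB', 'geneD']
print(sorted(array_hits | proteome_hits))  # OR: ['geneA', 'geneB', 'geneC', 'geneD', 'geneE']
print(sorted(array_hits - proteome_hits))  # NOT: ['geneA', 'geneC']

# Hypothetical human -> mouse ortholog table; genes without an entry are skipped.
ORTHOLOGS = {"geneA": "mGeneA", "geneB": "mGeneB", "geneD": "mGeneD"}


def to_orthologs(genes, table):
    """Map each gene to its ortholog, dropping genes with no known ortholog."""
    return {table[g] for g in genes if g in table}


print(sorted(to_orthologs(array_hits & proteome_hits, ORTHOLOGS)))  # ['mGeneB', 'mGeneD']
```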
2 of 3 things to do: retrieve data and other IDs.
e.g. Which GO categories (biological processes, molecular functions, and cellular locations) are in the set? Are there any that seem to occur more often than expected…
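"More often than expected" is usually made precise with an over-representation test. A common choice is the one-sided hypergeometric test sketched below; we assume it here for illustration, and it is not necessarily WebGestalt's exact procedure. The counts are hypothetical.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric test: P(X >= k).

    N: genes in the reference (background) set
    K: background genes annotated with the GO term
    n: genes in the user's gene set
    k: genes in the user's set annotated with the term
    """
    upper = min(K, n)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated genes by chance.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 20,000 background genes, 200 annotated with some GO term,
# and a 100-gene list that contains 10 of them.
N, K, n, k = 20_000, 200, 100, 10
print("expected by chance:", n * K / N)  # 1.0
print("p-value:", enrichment_p_value(N, K, n, k))
```

Because many GO terms are tested at once, the resulting p-values would normally need a multiple-testing correction before any term is called enriched.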
Ingenuity
• A commercial tool for manipulating graphs (networks); VU license: http://bioinfo.vanderbilt.edu/wiki/Ingenuity
• (There are also some open-source tools: Cytoscape, GeNetViz, etc.)
Use of the commercial tool Ingenuity by Dr. N. Deanne and Dr. Beauchamp: Pathways (3)
Bioinformatics Resource Center
• Developing a Bioinformatics Resource Center (BRC) that will consist of:
• Training infrastructure and applied workshops
• Support for faculty using existing tools and databases (caBIG, custom statistical packages, NCBI genomics, imaging, molecular structure resources)
• Collaborative IT: establish accessible databases in shared cores and support faculty using these resources. …
• Integrative IT: web sites that integrate information from disparate data sets
• Comparative IT: systems biology, comparing data across multiple platforms to identify new patterns (tissues and cells, molecular pathways, model organisms, toxins, etc.)
(Taken from the VUMC Strategic Plan.)
Other systems…
• Projects under construction that can be further shaped by your needs:
• CollabCore and Lab Blogs
• Genepedia
• GeneKeyDB, BioFoundation
• Extensions to WebGestalt
• TFCAT, GeneCAT, CladeCAT, Pazar
Acknowledgments
Bing Zhang, Stefan Kirov, Leslie Galloway, Barbara Jackson, Betty Lou Alspaugh, Oakley Crawford, Suzanne Baktash, Xinxia Peng, Harold Shanafield, Sam Wang, Adam Tebbe, Shawn Ericson, Jeff Horner
A few collaborators: Bonnie LaFleur, Shawn Levy, Phil Dexheimer, Michael Langston (CS collaborator), Wyeth Wasserman, Dan Goldowitz and the TMGC, Rob Williams et al. (WebQTL, etc.), Erich Baker, Dan Beauchamp, Natasha Deanne, Chad Johnson