Towards the virtual organism. PART I: Databases and tools for biochemical pathways PART II: Relating expression data and pathways PART III: Guided Tour: elucidate organelle-related pathways. Pathway diagram. WIT database. Major contributions of Pathways databases.
Towards the virtual organism • PART I: Databases and tools for biochemical pathways • PART II: Relating expression data and pathways • PART III: Guided Tour: elucidate organelle-related pathways
Pathway diagram WIT database
Major contributions of Pathways databases "Without context and purpose, information is mere data." - Clement Mok • Information Resource - Literature compilation • Gene Ontology • Sequence and Genome Annotation • Relationship between pathways (function) and chromosomal position • Analysis of Gene Expression Arrays • Understanding Cellular Dynamics • Disease Process Modeling
As when a highly connected node in the internet breaks down, the disruption of p53 has severe consequences. Jeong et al. 2001 Nature
Towards the virtual organism Introduce biochemical pathways resources • What Is There (WIT/PUMA/EMP/ERGO) • Kyoto Encyclopedia of Genes and Genomes (KEGG) • Signalling Databases • Pathways Database (PathDB) Focus on • Accessibility • Database contents and models • Query features • Gene/Protein/Pathway analysis • Visualization Why do all these projects seem to do the same thing?
Why do all these projects seem to do the same thing? • Data model is a view of the world • Different database management systems • Tools particular to data model and database management systems • Different content • Analogous to model system approach to biology • E. coli, yeast, C. elegans, Drosophila, mouse, etc. are all used to provide understanding of human biology • No one system does everything, but concepts and data can often be shared "He may have stole that song from me, but I steal from everybody." - Woody Guthrie
WIT/PUMA/EMP System • Argonne National Lab and Integrated Genomics Inc, USA • http://wit.mcs.anl.gov/WIT2/ • Ross Overbeek, Evgeni Selkov, Natalia Maltsev • Team: 7 • WIT is freely downloadable (ftp://ftp.mcs.anl.gov/pub/Genomics/WIT2/)
WIT/PUMA/EMP System Focus on: sequence analysis, annotation of genomes with respect to metabolism • Annotation/Literature database • Blast, PSI-Blast • ClustalW • COG • ProtScale • Transmembrane helices/topology • Prodom • ProSite • Operons (Pairs of close bidirectional best hits)
Ways to go: from genes to pathways Starting from - • Gene/protein sequence • Gene/protein name • Organism/Genome (‘Metabolic reconstruction’) To Pathways of - • Metabolism • DNA • Regulation of metabolism
WIT Pathway Diagrams: Picture links to further information
WIT Detail pages: Enzyme Name, Reaction EC, Description 4788 3304 Specific Activity 6502 Preparative Protocol 6306 Substrates, Coenzymes, Inhibitors, Modification, Kinetics, Genomes ... 6914 39 9500
Kyoto Encyclopedia of Genes and Genomes (KEGG) • Institute for Chemical Research, Kyoto University • http://www.genome.ad.jp/kegg/ • Minoru Kanehisa • System development: 9 • Data entry and curation: 18 • Academic users may freely download the package • ftp://kegg.genome.ad.jp/mirror/
KEGG: Data content and statistics • 3705 EC numbers • 11132 Enzyme names • 3794 Substrates • 5284 Metabolic reactions • 113 Pathways • mostly metabolic • 36 Organisms
KEGG: Query capabilities Focus on: display gene-centric data in the context of predefined pathways • Reconstruct pathway maps using blast • Search and color genes, enzymes and compounds in pathway diagrams and ortholog tables • Sequence: blast and fasta • Genome Maps • Generate reaction paths between compounds
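The last query feature, generating reaction paths between compounds, can be pictured as a shortest-path search over a reaction table. The sketch below is only an illustration of that idea, not KEGG's actual implementation; the reaction ids, the toy table, and the function name `reaction_path` are all assumptions made for this example.

```python
from collections import deque

# Hypothetical toy reaction table: reaction id -> (substrates, products).
# Ids and contents are illustrative placeholders, not KEGG records.
REACTIONS = {
    "R1": ({"glucose"}, {"glucose-6-phosphate"}),
    "R2": ({"glucose-6-phosphate"}, {"fructose-6-phosphate"}),
    "R3": ({"fructose-6-phosphate"}, {"fructose-1,6-bisphosphate"}),
    "R4": ({"glucose-6-phosphate"}, {"6-phosphogluconolactone"}),
}


def reaction_path(start, goal, reactions=REACTIONS):
    """Breadth-first search: shortest list of reaction ids from start to goal.

    Returns None when no path exists. Reactions are treated as one-directional
    (substrate -> product), which is a simplifying assumption.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, path = queue.popleft()
        if compound == goal:
            return path
        for rid, (substrates, products) in reactions.items():
            if compound in substrates:
                for product in products:
                    if product not in seen:
                        seen.add(product)
                        queue.append((product, path + [rid]))
    return None


path = reaction_path("glucose", "fructose-1,6-bisphosphate")
```

Breadth-first search is used here because it returns a path with the fewest reaction steps; a real system would also have to handle reversible reactions and ubiquitous cofactors.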
'State of the Art': static network • manually compiled • manually drawn • textbook knowledge • Example: KEGG picture of the glycolysis genes present in E. coli
Representation of Networks Static network • manually compiled • manually drawn • textbook knowledge versus Dynamic network features • complete knowledge • restriction of content is up to the user • experimental data can be reflected in net structure • include user-owned data
Pathway related projects
Metabolic Pathways • KEGG Metabolic Pathways • EMP - Enzymes and Metabolic Pathways • WIT - Metabolic Reconstruction • UM-BBD - Microbial Biocatalysis/Biodegradation • EcoCyc - E. coli Genes and Metabolism • SoyBase - Soybean Metabolism • Metalgen - Genes and Metabolism • Boehringer Mannheim - Biochemical Pathways • IUBMB-Nicholson Minimaps • PathDB - Plant Metabolic Pathways
Regulatory Pathways • KEGG Regulatory Pathways • SPAD - Signal Transduction • CSNDB - Cell Signaling Networks • Yeast Pathways in MIPS • Interactive Fly - Drosophila Genes • GIF_DB - Drosophila Gene Interactions • FlyNets - Drosophila Molecular Interactions • GeNet - Gene Networks Database • HOX-Pro - Homeobox Genes Database • Wnt Signaling Pathway • TRANSPATH - Gene Regulatory Pathways • GenMAPP - Mostly mouse pathways
Protein-Protein Interactions • BRITE Database for Biomolecular Relations • DIP - Database of Interacting Proteins • BIND - Biomolecular Interaction Network Database
Enzymes, Compounds • LIGAND - Chemical Database for Enzyme Reactions • ENZYME - Enzymes • BRENDA - Comprehensive Enzyme Information System • Worthington Enzyme Manual • Klotho - Biochemical Compounds • ChemFinder - Searching Chemicals • ChemIDplus at NLM • PROMISE - Prosthetic Groups and Metal Ions • GlycoSuiteDB - Glycan Structure Database • CarbBank - Complex Carbohydrate Structure Database • WebElements - Periodic Table
Transcription Factors • TRANSFAC - Transcription Factor Database • RegulonDB - E. coli Transcriptional Regulation • DBTBS - B. subtilis Transcription Factors • DPInteract - DNA binding proteins
Nomenclature - General • IUBMB - Nomenclature • IUPAC - Nomenclature • SWISS-PROT - Documents • GO - Gene Ontology (FlyBase/SGD/MGD/TAIR/WormBase)
Simulation of biochemical reactions and cellular processes • BioKin - Enzyme kinetic software • BioQuest - Metabolic Simulation • BioSpice - still in progress • Bioxml.org - a site collecting together a number of biologically-oriented open-source projects • DBsolve - Software for metabolic, enzymatic and receptor-ligand binding simulation • DMSS - Scalable, Discrete Event Metabolic Simulation System • E-Cell - A simulation platform for the modelling of cells at a molecular level • Electronic Arc - experimental visual simulator • Elementary Modes - has a Java simulation • Gepasi - A software package for modelling systems of biochemical reactions • Jarnac - A language for describing and manipulating cellular system models • StochSim - A general-purpose stochastic simulator of biological reaction networks • Systems Biology Workbench - An XML based integration system • Virtual Cell - A general computational framework for modeling cell biological processes
PathDB • National Center for Genome Resources • http://www.ncgr.org/software/pathdb/ • Jeff Blanchard • Software Development: 5 • Literature Curation: 4 • The software is freely available (Client) • The database server can be installed at the site of cooperation partners
PathDB data model • Compounds • Macromolecules: lipids, polysaccharides • Information molecules: DNA, RNA • States: development, disease, genotype, phenotype, environment • metabolic reactions • protein modifications and interactions • Regulation: transcriptional, translational, posttranslational • Transport • biological hierarchies, ontologies • incomplete and conflicting knowledge
PathDB data model (diagram) • Attributes: Location, BiolProcess, Genotype, Phenotype, Environment • Building Blocks (Construction of Entities): Subunit, Protein, Compound, DNA, RNA • Transition of Entities: Substrate, Step, Product, with a Mediator (Biochemical Entity)
Platform for Network Analysis Focus on: building custom networks, compare to large scale experiments • Relational database for metabolic reactions, regulation and states (disease, genotype, phenotype) • QueryTool • Query the database, e.g. to collect a set of reactions • transform between types: proteins, compounds, steps • restrict to attributes: organism, location, states • PathwayViewer • Visualize the results of the search
Query window showing “Proteins involved in Biological process DNA repair”
Transform to ‘Phenotype’ • Select ‘Caffeine Sensitivity’ and get all Proteins • Do Intersection and get all Steps
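The query sequence above (collect proteins for a biological process, collect proteins for a phenotype, intersect, then transform to steps) is essentially set algebra over annotation tables. The following is a minimal sketch of that logic, assuming simple in-memory dictionaries; it is not PathDB's QueryTool, and every identifier (`P1`, `S1`, the table names, the helper functions) is a hypothetical placeholder.

```python
# Hypothetical annotation tables; all identifiers are placeholders,
# not records from PathDB.
PROTEINS_BY_PROCESS = {"DNA repair": {"P1", "P2", "P3"}}
PROTEINS_BY_PHENOTYPE = {"Caffeine Sensitivity": {"P2", "P3", "P4"}}
STEPS_BY_PROTEIN = {
    "P1": {"S1"},
    "P2": {"S2", "S3"},
    "P3": {"S3"},
    "P4": {"S4"},
}


def proteins_for(table, key):
    """Return a copy of the protein set annotated with `key` (empty if unknown)."""
    return set(table.get(key, set()))


def transform_to_steps(proteins):
    """Transform a protein set into the set of steps those proteins take part in."""
    steps = set()
    for protein in proteins:
        steps |= STEPS_BY_PROTEIN.get(protein, set())
    return steps


repair = proteins_for(PROTEINS_BY_PROCESS, "DNA repair")
sensitive = proteins_for(PROTEINS_BY_PHENOTYPE, "Caffeine Sensitivity")
both = repair & sensitive          # the 'Intersection' operation
steps = transform_to_steps(both)   # the 'get all Steps' transformation
```

In a relational database the same operations would be joins and an `INTERSECT`; plain sets are used here only to keep the example self-contained.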
PathwayViewer • Inspect and manipulate pathways or routes between metabolites. • Alternate topological representations of a pathway: primary and secondary metabolites • Manipulate layout on screen • Control how much data is displayed • Automatically lays out pathways • hierarchical or circular algorithm • Visualization of gene expression and metabolic profiling data
Visualize Steps involved in DNA synthesis and Caffeine sensitivity
Exploring the network neighborhood: build pathways on the fly
What data sources are out there?
Categories shown (under the headings Knowledge and Large-Scale Experiments): Metabolism • Regulation • Ontologies • Sequences • Annotation • Protein-Protein • Protein-SmallMol • Gene expression • Protein expression • Metabolic profiling
Resources shown: CSNdb • KEGG • Medline • aMAZE • BRENDA • BIND • BRITE • WIT • PathDB • DIP • GO • UMLS/MESH • MBO • EcoCyc • MIPS • SW • GenBank • EMBL
Ontology: Bind genes to hierarchies • Translation/Mapping between: Cellular Location, Anatomy, Biological Process, Molecular Function • GO (Gene Ontology, 2000)
Hierarchy of Complexity
Entities or States • macro: disease states, development states, phenotype • micro: organelles, cell types, tissues • molecular: protein, RNA, DNA, compounds
Processes • molecular: metabolic reactions, protein-protein interactions, conformation change • micro: mitosis, apoptosis, transcription • macro: disease, development, environment
PathDB: Complete Wiring Diagram • Reference: Processes/Entities and experimental support • Diagram headings: Knowledge, Large-Scale Experiments • Data types shown: Metabolism, Regulation, Ontologies, Protein-Protein, Annotation, Sequences, Protein-SmallMol, Gene expression, Protein expression, Metabolic profiling
Questions • What is the difference between a normal and a cancer cell? • What is the effect of a knockout mutation on the cellular network? • What "classical" pathways are up or down regulated in my gene expression data? • How well does my set of gene expression arrays support my model of cellular processes? • How does a drug perturb a cellular network as judged through gene expression data? • What experiment promises to distinguish between contradictory hypotheses?
PART II Relating gene expression and pathways
Analysis of Expression Data • Clustering of time courses (Iyer et al., Science, 1999) • "Scatter plot" comparing two experiments (Roberts et al., Cell, 2000)
Using pathways to contextualize gene expression arrays Miki et al. PNAS, 2001
Expression Pattern Clustering J-Express B. Dysvik / I. Jonassen, U.Bergen, Norway
Mapping of J-Express Cluster onto Pathways
• sce00051 Fructose and mannose metabolism: EC 3.1.3.46 (Fructose-2,6-bisphosphate 2-phosphatase; Fructose-2,6-bisphosphatase)
• sce00190 Oxidative phosphorylation: EC 1.9.3.1 (Cytochrome-c oxidase; Cytochrome oxidase; Cytochrome a3; Cytochrome aa3), EC 3.6.1.34 (H+-transporting ATP synthase; H+-transporting ATPase; Mitochondrial ATPase; Coupling factors (F0-F1 and C0-F1); Chloroplast ATPase; Bacterial Ca2+/Mg2+ ATPase), EC 3.6.1.38 (Ca2+-transporting ATPase; Calcium pump)
• sce00251 Glutamate metabolism: EC 2.6.1.19 (4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase)
• sce00252 Alanine and aspartate metabolism: EC 2.6.1.19 (4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase)
• sce00410 beta-Alanine metabolism: EC 2.6.1.19 (4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase)
• sce00640 Propanoate metabolism: EC 2.6.1.19 (4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase)
• sce00650 Butanoate metabolism: EC 2.6.1.19 (4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase)
• sce03110 ATP Synthase: EC 3.6.1.34 (H+-transporting ATP synthase; H+-transporting ATPase; Mitochondrial ATPase; Coupling factors (F0-F1 and C0-F1); Chloroplast ATPase; Bacterial Ca2+/Mg2+ ATPase)
Cluster represents genes of different contexts
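The mapping on this slide can be reproduced as a simple lookup from the EC numbers of the clustered genes to the pathways that contain them. The sketch below transcribes the EC-to-pathway pairs from the slide; the dictionary layout and the function name `map_cluster` are assumptions for illustration and do not reflect how J-Express or KEGG perform the mapping.

```python
from collections import defaultdict

# EC number -> KEGG pathway ids, transcribed from the slide above.
PATHWAYS_BY_EC = {
    "3.1.3.46": ["sce00051"],
    "1.9.3.1": ["sce00190"],
    "3.6.1.34": ["sce00190", "sce03110"],
    "3.6.1.38": ["sce00190"],
    "2.6.1.19": ["sce00251", "sce00252", "sce00410", "sce00640", "sce00650"],
}


def map_cluster(ec_numbers):
    """Group the EC numbers of one expression cluster by pathway id."""
    hits = defaultdict(set)
    for ec in ec_numbers:
        # EC numbers without a known pathway are silently skipped.
        for pathway in PATHWAYS_BY_EC.get(ec, []):
            hits[pathway].add(ec)
    return {pathway: sorted(ecs) for pathway, ecs in sorted(hits.items())}


cluster = ["3.1.3.46", "1.9.3.1", "3.6.1.34", "3.6.1.38", "2.6.1.19"]
mapping = map_cluster(cluster)
for pathway, ecs in mapping.items():
    print(pathway, ", ".join(ecs))
```

Five EC numbers spread over eight pathways, which is the slide's point: a single expression cluster mixes genes from several metabolic contexts.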
Clustering and Incremental Pathway Construction (Fellenberg & Mewes, 99) • Genes mapped to reactions • Dynamically build networks from reaction DB and clustered genes • 24 (out of 54) gene clusters (6153 ORFs, 694 EC-annotated) • A pathway (10 genes) from five clusters with 57 EC-annotated genes • Pathway represents 10 genes out of 500
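One way to read "dynamically build networks from reaction DB and clustered genes" is: take the reactions catalysed by the clustered genes and link any two reactions that share a compound. The sketch below follows that reading under stated assumptions; it is not the method of Fellenberg & Mewes, and the reaction table, gene annotation, and function name `build_networks` are hypothetical.

```python
from itertools import combinations

# Hypothetical reaction database: reaction id -> compounds it touches.
REACTION_COMPOUNDS = {
    "R1": {"A", "B"},
    "R2": {"B", "C"},
    "R3": {"C", "D"},
    "R4": {"X", "Y"},
}
# Hypothetical gene -> reaction annotation (for example via EC numbers).
GENE_REACTION = {"g1": "R1", "g2": "R2", "g3": "R3", "g4": "R4"}


def build_networks(genes):
    """Group the reactions of `genes` into networks linked by shared compounds."""
    reactions = sorted({GENE_REACTION[g] for g in genes if g in GENE_REACTION})
    neighbors = {r: set() for r in reactions}
    for a, b in combinations(reactions, 2):
        if REACTION_COMPOUNDS[a] & REACTION_COMPOUNDS[b]:
            neighbors[a].add(b)
            neighbors[b].add(a)

    networks, seen = [], set()
    for start in reactions:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        networks.append(sorted(component))
    return networks


networks = build_networks(["g1", "g2", "g4"])
```

Adding a gene from another cluster (here `g3`) can merge or extend existing networks, which is the incremental aspect of the construction.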
Principal Component Analysis (PCA) • Eigen Analysis • solve for eigenvalues and eigenvectors of a square symmetric matrix • pure sums of squares and cross products (SSCP) • scaled sums of squares and cross products (Covariance) • sums of squares and cross products of standardized data (Correlation)
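The three candidate matrices and the eigenproblem can be written out explicitly. The notation below is an assumption introduced for this sketch and does not come from the slides: X is an n × p data matrix (n samples, p variables), X_c is X with column means removed, and Z is X_c with each column divided by its standard deviation. Whether the "pure" SSCP matrix is taken from centered or raw data is also an assumption here.

```latex
% Assumed notation (not from the slides): X_c = column-centered data,
% Z = column-standardized data, n = number of samples.
\begin{align}
  S &= X_c^{\top} X_c
    && \text{(SSCP)} \\
  C &= \tfrac{1}{n-1}\, X_c^{\top} X_c
    && \text{(covariance: scaled SSCP)} \\
  R &= \tfrac{1}{n-1}\, Z^{\top} Z
    && \text{(correlation: SSCP of standardized data)} \\
  M v_k &= \lambda_k v_k, \quad M \in \{S, C, R\}
    && \text{(eigenanalysis)} \\
  t_k &= X_c\, v_k
    && \text{(scores on component } k\text{; use } Z \text{ when } M = R\text{)}
\end{align}
```

S and C share eigenvectors and differ only by the factor 1/(n-1) in their eigenvalues, so the practical choice is between covariance and correlation, that is, whether variables are rescaled to unit variance first.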
Principal components and visualization J-Express B. Dysvik / I. Jonassen, U.Bergen, Norway
Data driven vs hypothesis driven approach: the data driven approach to Genome and Expression Analysis • Basic Assumptions (Pathways, Cluster) • Expression time courses for pathways do not necessarily cluster together • Clustered genes do not necessarily form pathways Expression Data and Pathways • Erroneous and noisy expression data • Many genes, measurements • Many spurious hits/clusters of expression patterns • Incomplete data (measurements, kinetic parameters) • Cost of regulation: partially regulated pathways
Outline of a Hypothesis Driven Approach GPE-Score(Pathway) Biological Knowledge