BioPAX: Biological Pathways Data Exchange (www.biopaxwiki.org)
Joanne Luciano, PhD
University of Manchester; Harvard Medical School; BioPathways Consortium; BioPAX Group; Predictive Medicine, Inc.
25 Jan 2006, Cambridge, MA, USA
Pathway Data: Why does HCLS care? (where we fit)
Pathway research has broad impact
• Drug discovery (pathway of target, safety)
• Basic science (identify pathways)
• Disease research (cancer pathways, diabetes, malaria)
• Environmental research (microbial research)
Combine knowledge from multiple sources
• The whole is greater than the sum of its parts
• Biological knowledge is fragmented and isolated
• Databases are needed to manage these resources
What is a Pathway? Depends on who you ask!
[Figure: example pathways (glycolysis, protein-protein, apoptosis, TFs in E. coli) illustrating metabolic pathways, molecular interaction networks, signaling pathways, and gene regulatory networks.]
Integration Nightmare!
[Figure: data sources to integrate: high-throughput experimental methods (genetics, microarray, mass spectrometry, two-hybrid), protein modifications, interaction data, expression, function, existing literature, and multiple pathway databases.]
Slide from Gary Bader
Pathway Databases
There are many pathway databases, each with its own data model, format, and data access method, and each with internal inconsistencies. More than 200 and growing.
Source: Pathway Resource List (http://cbio.mskcc.org/prl/)
Slide from Mike Cary
Closes Gaps in Pathway Data Space
[Figure: exchange languages by domain. Pathway data types, each ranging from low to high detail: metabolic pathways, molecular interactions (Pro:Pro, All:All), interaction networks (molecular and non-molecular; Pro:Pro, TF:Gene, genetic interactions), genetic regulatory pathways, and small molecules. Formats: database exchange formats (BioPAX, PSI-MI 2) and simulation model exchange formats (SBML, CellML). Also labeled: rate formulas, biochemical reactions.]
Slide from Gary Bader
Research Community Need
[Figure: existing pathway databases (WIT, BioCyc, Reactome, aMAZE, KEGG, BIND, DIP, HPRD, MINT, IntAct, PSI format, CSNDB, TRANSPATH, TRANSFAC, INOH, PubGene, GeneWays) spanning metabolic, molecular interaction, cell signaling, and gene regulatory network data, shown alongside an integrated pathway database and distributed pathway databases.]
One Interface
[Figure: applications, databases, and users exchanging pathway data across >200 DBs and tools, without BioPAX and with BioPAX (one interface; one converter per data source or tool).]
A common “computable semantic” enables scientific discovery.
Slide from Gary Bader (adapted)
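The converter argument can be made concrete with simple arithmetic. This is a minimal sketch, assuming a pairwise model: without a shared format, every tool needs its own converter for every data source; with a shared format such as BioPAX, each source and each tool needs exactly one converter to or from it. The split of the >200 resources into 150 sources and 50 tools is hypothetical.

```python
def converters_without_standard(n_sources: int, n_tools: int) -> int:
    """Pairwise model (assumed): every tool needs a converter for every source."""
    return n_sources * n_tools


def converters_with_standard(n_sources: int, n_tools: int) -> int:
    """Shared-format model (assumed): each source exports once, each tool imports once."""
    return n_sources + n_tools


if __name__ == "__main__":
    # Hypothetical split of ">200 DBs and tools" into sources and tools.
    sources, tools = 150, 50
    print(converters_without_standard(sources, tools))  # 7500
    print(converters_with_standard(sources, tools))     # 200
```

The gap grows quadratically: adding one more source costs one converter in the shared-format model but one per tool in the pairwise model.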
Design Goals
• Encapsulation: an entire pathway in one record
• Compatible: use existing standards wherever possible
• Computable: from file reading to logical inference
• Successful: buy-in from the research community
Why OWL DL?
• Expressivity (biology = “complex relationships”)
• W3C standard (use existing and upcoming standards); “Semantic Web enabled”
• OWL has representations in RDF and XML (XML is the exchange language)
• Machine computable: enables full reasoning capability, from file reading to logical inference
  • facilitates integration of knowledge and data, and tool development
  • uncovers inconsistencies and new knowledge
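As a toy illustration of “logical inference” over a class hierarchy, the sketch below computes an RDFS-style subclass closure: an instance asserted to be a protein is also inferred to be a physicalEntity and an entity. This is only a tiny fragment of what an OWL DL reasoner does. The class table is a hand-written, simplified excerpt modeled on BioPAX class names and is an assumption; check it against the released OWL file.

```python
# Hypothetical, simplified excerpt of asserted subClassOf axioms (child -> parent).
SUBCLASS_OF = {
    "protein": "physicalEntity",
    "smallMolecule": "physicalEntity",
    "complex": "physicalEntity",
    "physicalEntity": "entity",
    "complexAssembly": "conversion",
    "conversion": "physicalInteraction",
    "physicalInteraction": "interaction",
    "interaction": "entity",
    "pathway": "entity",
}


def superclasses(cls: str) -> list[str]:
    """Follow subClassOf links upward; stops if a cycle would repeat a class."""
    chain = []
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in chain:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def infer_types(asserted: dict[str, str]) -> dict[str, set[str]]:
    """For each instance, add every class implied by its asserted class."""
    return {inst: {cls, *superclasses(cls)} for inst, cls in asserted.items()}


if __name__ == "__main__":
    print(infer_types({"PdhA": "protein"}))
```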
Different representations of the same pathways
KEGG reference pathway: GLYCOLYSIS (starts at α-D-Glucose 1P)
<!ELEMENT reaction (substrate*,product*)>
<!ATTLIST reaction name %keggid.type; #REQUIRED>
<!ATTLIST reaction type %reaction-type.type; #REQUIRED>
<!ELEMENT substrate EMPTY>
<!ATTLIST substrate name %keggid.type; #REQUIRED>
<!ELEMENT product EMPTY>
<!ATTLIST product name %keggid.type; #REQUIRED>
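A document that follows the DTD fragment on this slide can be read with Python's standard ElementTree. This is a minimal sketch: the `<pathway>` root element, the `irreversible` type value, and all reaction and compound names are hypothetical placeholders, not real KEGG data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped by the DTD fragment (placeholder names only).
SAMPLE = """
<pathway>
  <reaction name="rn:R_EXAMPLE" type="irreversible">
    <substrate name="cpd:C_SUBSTRATE"/>
    <product name="cpd:C_PRODUCT"/>
  </reaction>
</pathway>
"""


def reactions(xml_text):
    """Return one dict per <reaction>, listing its substrate and product names."""
    root = ET.fromstring(xml_text)
    out = []
    for rxn in root.iter("reaction"):
        out.append({
            "name": rxn.get("name"),
            "type": rxn.get("type"),
            "substrates": [s.get("name") for s in rxn.findall("substrate")],
            "products": [p.get("name") for p in rxn.findall("product")],
        })
    return out


if __name__ == "__main__":
    print(reactions(SAMPLE))
```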
Different representations of the same pathways
BioCyc reference pathway: GLYCOLYSIS (starts at β-D-glucose 6-phosphate)
reactions.dat: this file lists all chemical reactions in the PGDB (Pathway/Genome Database).
Attributes: UNIQUE-ID, TYPES, COMMON-NAME, ACTIVATORS, BASAL-TRANSCRIPTION-VALUE, DBLINKS, DELTAG0, DEPRESSORS, EC-LIST, EC-NUMBER, ENZYMATIC-REACTION, EQUILIBRIUM-CONSTANT, IN-PATHWAY, INHIBITORS, LEFT, MOVED-IN, MOVED-OUT, OFFICIAL-EC?, REACTANTS, REQUIREMENTS, RIGHT, SIGNAL, SPECIES, SPONTANEOUS?, STIMULATORS, SYNONYMS
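For contrast with the XML above, a reactions.dat-style file can be read with a small line parser. This is a minimal sketch, assuming an attribute-value layout of one `ATTRIBUTE - value` pair per line, `#` comment lines, `//` ending each record, and continuation lines starting with `/`. Check these assumptions against the header of the file you actually use. Repeated attributes such as LEFT and RIGHT are kept as lists. The sample record IDs are hypothetical.

```python
# Hypothetical excerpt in the assumed attribute-value layout.
SAMPLE = """# comment lines start with '#'
UNIQUE-ID - RXN-EXAMPLE-1
LEFT - CPD-X
RIGHT - CPD-Y
//
UNIQUE-ID - RXN-EXAMPLE-2
LEFT - CPD-A
LEFT - CPD-B
RIGHT - CPD-C
//
"""


def parse_attribute_values(text):
    """Parse 'ATTRIBUTE - value' records separated by '//' lines."""
    records, current, last_key = [], {}, None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "//":
            if current:
                records.append(current)
            current, last_key = {}, None
            continue
        if line.startswith("/") and last_key is not None:
            # Continuation line: extend the most recent value of last_key.
            current[last_key][-1] += " " + line[1:]
            continue
        key, sep, value = line.partition(" - ")
        if not sep:
            continue  # ignore lines that do not match the assumed layout
        current.setdefault(key, []).append(value)
        last_key = key
    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    for rec in parse_attribute_values(SAMPLE):
        print(rec)
```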
BioPAX uses other ontologies
• Use pointers to existing ontologies to provide supplemental annotation where appropriate
  • Cellular location: GO Component
  • Cell type: Cell.obo
  • Organism: NCBI taxon DB
• Incorporate other standards where appropriate
  • Chemical structure: SMILES, CML, InChI
BioPAX Ontology: Overview
[Figure: pathway: a set of interactions & parts; physicalEntity: the parts; interaction: how the parts are known to interact.]
Level 1 v1.0 (July 7th, 2004)
Slide from Gary Bader (adapted)
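The three concepts on this slide (a pathway as a set of interactions and parts, the parts themselves, and how the parts interact) map naturally onto a containment model. This is a minimal Python sketch of that idea. The class names follow the slide's concepts, while the fields `components`, `participants`, `kind`, and the `parts()` helper are hypothetical simplifications, not BioPAX's actual property names.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalEntity:
    """The 'parts': proteins, small molecules, complexes, ..."""
    name: str
    kind: str  # hypothetical field, e.g. "protein" or "smallMolecule"


@dataclass
class Interaction:
    """How the parts are known to interact."""
    name: str
    participants: list[PhysicalEntity] = field(default_factory=list)


@dataclass
class Pathway:
    """A set of interactions and their parts."""
    name: str
    components: list[Interaction] = field(default_factory=list)

    def parts(self) -> set[str]:
        """Names of all physical entities that take part in any interaction."""
        return {p.name for i in self.components for p in i.participants}
```

Because the parts are reached only through interactions, a pathway record carries everything needed to reconstruct which entities it involves, in line with the "entire pathway in one record" design goal.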
[Figure: OWL (semantics) and instances (data).]
SBML annotated with BioPAX
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1"
      xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <model>
    <listOfSpecies>
      <species id="PdhA" metaid="PdhA">
        <annotation>
          <bp:protein rdf:ID="PdhA"/>
        </annotation>
      </species>
      <species id="NADP_plus" metaid="NADP_plus" name="NADP+">
        <annotation>
          <bp:smallMolecule rdf:ID="NADP_plus"/>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="pyruvate_dehydrogenase_cplx">
        <annotation>
          <bp:complexAssembly rdf:ID="pyruvate_dehydrogenase_cplx"/>
        </annotation>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
Callouts: species is protein; protein is PdhA; species is small molecule; small molecule is NADP+.
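Because the BioPAX terms sit in their own namespace inside the SBML `<annotation>`, a consumer can recover "species X is a protein" without understanding the rest of the model. This is a minimal sketch using Python's standard ElementTree on a shortened, self-contained variant of the example on this slide; the SBML Level 2 namespace, the `<model>` wrapper, and the XML-safe id `NADP_plus` are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2"
BP_NS = "http://www.biopax.org/release1/biopax-release1.owl"

# Shortened, self-contained variant of the annotated SBML example.
DOC = f"""<sbml xmlns="{SBML_NS}" xmlns:bp="{BP_NS}"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" level="2" version="1">
  <model>
    <listOfSpecies>
      <species id="PdhA" metaid="PdhA">
        <annotation><bp:protein rdf:ID="PdhA"/></annotation>
      </species>
      <species id="NADP_plus" metaid="NADP_plus" name="NADP+">
        <annotation><bp:smallMolecule rdf:ID="NADP_plus"/></annotation>
      </species>
    </listOfSpecies>
  </model>
</sbml>"""


def biopax_classes(xml_text):
    """Map each SBML species id to the local name of its BioPAX annotation element."""
    root = ET.fromstring(xml_text)
    result = {}
    for sp in root.iter(f"{{{SBML_NS}}}species"):
        ann = sp.find(f"{{{SBML_NS}}}annotation")
        if ann is None:
            continue
        for child in ann:
            if child.tag.startswith(f"{{{BP_NS}}}"):
                result[sp.get("id")] = child.tag.split("}", 1)[1]
    return result


if __name__ == "__main__":
    print(biopax_classes(DOC))
```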
BioPAX: External References
<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:XREF>
        <bp:unificationXref rdf:ID="unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:XREF>
    </bp:smallMolecule>
  </annotation>
</species>
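A unificationXref points to an external record (here LIGAND c00022) that describes the same molecule, so its (DB, ID) pair can serve as a join key across sources. This is a minimal sketch with Python's standard ElementTree; the namespace URI and the uppercase property names `XREF`, `DB`, and `ID` are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release1/biopax-release1.owl"  # assumed namespace URI
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

ANNOTATION = f"""<annotation xmlns:bp="{BP}" xmlns:rdf="{RDF}">
  <bp:smallMolecule rdf:ID="pyruvate">
    <bp:XREF>
      <bp:unificationXref rdf:ID="unificationXref119">
        <bp:DB>LIGAND</bp:DB>
        <bp:ID>c00022</bp:ID>
      </bp:unificationXref>
    </bp:XREF>
  </bp:smallMolecule>
</annotation>"""


def unification_xrefs(xml_text):
    """Return (DB, ID) pairs for every unificationXref in the fragment."""
    root = ET.fromstring(xml_text)
    pairs = []
    for xref in root.iter(f"{{{BP}}}unificationXref"):
        db = xref.findtext(f"{{{BP}}}DB", default="").strip()
        ident = xref.findtext(f"{{{BP}}}ID", default="").strip()
        pairs.append((db, ident))
    return pairs


if __name__ == "__main__":
    print(unification_xrefs(ANNOTATION))
```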
BioPAX: Synonyms
<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
      <bp:SYNONYMS>BTS</bp:SYNONYMS>
      <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
    </bp:smallMolecule>
  </annotation>
</species>
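Synonyms let a consumer recognize the same compound under different names. This is a minimal sketch that builds a case-insensitive name-to-identifier lookup from the annotation; the namespace URI and the lower-casing normalization are assumptions, and real matching would need more careful normalization.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release1/biopax-release1.owl"  # assumed namespace URI
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

ANNOTATION = f"""<annotation xmlns:bp="{BP}" xmlns:rdf="{RDF}">
  <bp:smallMolecule rdf:ID="pyruvate">
    <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
    <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
    <bp:SYNONYMS>BTS</bp:SYNONYMS>
    <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
  </bp:smallMolecule>
</annotation>"""


def synonym_index(xml_text):
    """Map each lower-cased name (rdf:ID or synonym) to the entity's rdf:ID."""
    root = ET.fromstring(xml_text)
    index = {}
    for entity in root:
        ident = entity.get(f"{{{RDF}}}ID")
        if ident is None:
            continue
        index[ident.lower()] = ident
        for syn in entity.findall(f"{{{BP}}}SYNONYMS"):
            if syn.text:
                index[syn.text.strip().lower()] = ident
    return index


if __name__ == "__main__":
    idx = synonym_index(ANNOTATION)
    print(idx["pyruvic acid"])
```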
Tools
• Protégé Ontology Editor
• GKB Editor (SRI)
• SWOOP
• Reasoners: Pellet, Racer, FaCT++
• Pathway Tools
• EditPlus (text editor)
Want more? See Jeremy & Alan
Overlap?
• Integration: combine sources in a meaningful way
• Identity: recognize the same things in different contexts and under different names
• Composition: re-usable representations of composite pathway components, to help us manage, query, and reference them
• Exchange: agreement on
  • what is to be exchanged
  • how to represent it
  • how to interpret it
Want more? See Alan, Jeremy, me
[Figure: Gartner hype graph, from Carole Goble (ISWC2005), labeled with Gene Ontology, Microarray Gene Expression Database, BioDASH, BioPAX, UniProt, and Corporate Semantic Web.]
BioDASH: Bridging Chemistry and Molecular Biology
• Different views have different semantics: Lenses
• When there is a correspondence between objects, a semantic binding is possible
Example: Uniprot:P49841
Apply correspondence rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid then ?target.correspondsTo.?bpx:prot
Slide from Eric Neumann and Dennis Quan
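The correspondence rule on this slide is, in effect, a join on shared LSIDs. This is a minimal sketch of that join in plain Python, not BioDASH's implementation: the input dicts, the object ids, and the `urn:lsid:...` strings are hypothetical, and each object is assumed to carry a set of xref LSIDs.

```python
def apply_correspondence_rule(targets, biopax_proteins):
    """Sketch of: if ?target.xref.lsid == ?bpx:prot.xref.lsid
    then ?target correspondsTo ?bpx:prot.

    targets, biopax_proteins: dicts mapping an object id to a set of xref LSIDs.
    Returns a set of (target_id, "correspondsTo", protein_id) triples.
    """
    # Index proteins by LSID so each target lookup is a dictionary hit.
    by_lsid = {}
    for prot_id, lsids in biopax_proteins.items():
        for lsid in lsids:
            by_lsid.setdefault(lsid, set()).add(prot_id)
    triples = set()
    for target_id, lsids in targets.items():
        for lsid in lsids:
            for prot_id in by_lsid.get(lsid, ()):
                triples.add((target_id, "correspondsTo", prot_id))
    return triples


if __name__ == "__main__":
    # Hypothetical objects sharing the LSID for Uniprot:P49841.
    targets = {"chem:target1": {"urn:lsid:uniprot.org:uniprot:P49841"}}
    prots = {
        "bpx:prot1": {"urn:lsid:uniprot.org:uniprot:P49841"},
        "bpx:prot2": {"urn:lsid:example.org:other:X1"},
    }
    print(apply_correspondence_rule(targets, prots))
```

Indexing one side by LSID keeps the join linear in the number of xrefs rather than comparing every target with every protein.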
Seamark Demonstration: Identification of new drug candidates
1. Distinguish different forms of disease
2. Identify patient subgroups
3. Identify top biomarkers
4. Identify function
5. Identify biological and chemical properties and disease associations of biomarkers
6. Identify documents
7. Identify role in metabolic pathways
8. Identify compounds that interact
9. Identify and compare function in other organisms
10. Identify any prior art
[Figure: linked data sources (GO2Keyword.rdf, Keywords.rdf, ProbeSet.rdf, GO2OMIM.rdf, GO2UniProt.rdf, OMIM.rdf, IntAct.rdf, GO.rdf, GO2Enzyme.rdf, UniProt.rdf, Taxonomy.rdf, Enzymes.rdf, PubMed.xml, KEGG.rdf) connecting Probe, Keyword, Protein, Gene, MIM Id, Enzyme, Organism, Citation, Compound, and Pathway.]
BioPAX Supporting Groups
Databases
• BioCyc (www.biocyc.org)
• BIND (www.bind.ca)
• WIT (wit.mcs.anl.gov/WIT2)
• Reactome (www.reactome.org)
• PharmGKB (www.pharmgkb.org)
• KEGG
Grants
• Department of Energy (Workshop)
Groups
• Memorial Sloan-Kettering Cancer Center: G. Bader, M. Cary, J. Luciano, C. Sander
• SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
• University of Colorado Health Sciences Center: I. Shah
• BioPathways Consortium: J. Luciano, E. Neumann, A. Regev, V. Schachter
• Argonne National Laboratory: N. Maltsev, E. Marland
• Samuel Lunenfeld Research Institute: C. Hogue
• Harvard Medical School: E. Brauner, D. Marks, J. Luciano, A. Regev
• NIST: R. Goldberg
• Stanford: T. Klein
• Columbia: A. Rzhetsky
• Dana Farber Cancer Institute: J. Zucker
• Millennium Pharma: Alan Ruttenberg
• Science Commons: Jonathan Rees
Collaborating Organizations
• Proteomics Standards Initiative (PSI)
• Systems Biology Markup Language (SBML)
• Chemical Markup Language (CML)
The BioPAX Community