How Ontologies Add Value BioPAX: Biological Pathway Data Exchange Ontology. Joanne Luciano BioPAX Workgroup ( biopax.org ) BioPathways Consortium Liaison (biopathways.org) 3 May 2005 KM Pro Forum Bentley College, Waltham MA, USA. Introduction. BioPAX = Biopathway Exchange Language
How Ontologies Add Value
BioPAX: Biological Pathway Data Exchange Ontology
Joanne Luciano
BioPAX Workgroup (biopax.org)
BioPathways Consortium Liaison (biopathways.org)
3 May 2005, KM Pro Forum, Bentley College, Waltham MA, USA
Introduction
BioPAX = Biopathway Exchange Language
Emerged at ISMB
• conceived at ISMB ’01
• born at ISMB ’02
• crawling at ISMB ’03 (Level 0.5)
• walking at ISMB ’04 (Level 1.0)
• now in the “terrible twos”
Ontology Intro
• Natural language does a poor job of conveying complex information without ambiguity
• Ontologies provide a means to give concise meanings to pieces of data from a particular domain
• This facilitates computational operations on the data
• Ontologies are becoming increasingly common in the biological community
• See http://obo.sourceforge.net/obo.htm
Ontology: Components
• Class hierarchy: chemical, protein
• Relations & attributes: fields (slots) on the classes, can be other classes
• Constraints: define allowable values and connections within an ontology
• Objects: instances of classes
• Values: occupy slots
• Controlled vocabularies (CVs)
• BioPAX will use classes, attributes, constraints, values and CVs. Objects are the user's responsibility
* From Peter Karp, “Ontologies: Definitions, Components, Subtypes”, SRI International, presentation available at http://www.biopax.org
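The components listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `chemical`, `protein` and `reaction`, the slot names, and the vocabulary terms are hypothetical and are not taken from the BioPAX schema. It assumes `protein` is a subclass of `chemical`.

```python
from dataclasses import dataclass, field

# Controlled vocabulary (CV): the only values a given slot may take.
# These terms are illustrative, not taken from BioPAX.
DIRECTION_CV = {"reversible", "left-to-right", "right-to-left"}


@dataclass
class OntologyClass:
    """A class in the hierarchy, with per-slot constraints."""
    name: str
    parent: "OntologyClass | None" = None
    # slot name -> predicate returning True for allowable values
    constraints: dict = field(default_factory=dict)

    def all_constraints(self) -> dict:
        """Merge inherited constraints; the nearest class wins."""
        inherited = self.parent.all_constraints() if self.parent else {}
        return {**inherited, **self.constraints}

    def is_a(self, other: "OntologyClass") -> bool:
        """Walk up the hierarchy looking for `other`."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class Instance:
    """An object: an instance of a class whose slots hold values."""
    cls: OntologyClass
    values: dict

    def violations(self) -> list:
        """List every slot value that breaks a constraint."""
        allowed = self.cls.all_constraints()
        problems = []
        for slot, value in self.values.items():
            if slot not in allowed:
                problems.append(f"unknown slot: {slot}")
            elif not allowed[slot](value):
                problems.append(f"bad value for {slot}: {value!r}")
        return problems


# Class hierarchy (assumed): protein is a kind of chemical.
chemical = OntologyClass("chemical", constraints={"name": lambda v: isinstance(v, str)})
protein = OntologyClass("protein", parent=chemical,
                        constraints={"sequence": lambda v: isinstance(v, str) and v.isalpha()})
# A slot may hold a CV term or another object.
reaction = OntologyClass("reaction", constraints={
    "direction": lambda v: v in DIRECTION_CV,
    "substrate": lambda v: isinstance(v, Instance) and v.cls.is_a(chemical),
})

enzyme = Instance(protein, {"name": "example protein", "sequence": "MSGRPR"})
bad_reaction = Instance(reaction, {"direction": "sideways", "substrate": enzyme})
```

Here `enzyme.violations()` is empty, while `bad_reaction.violations()` reports the `direction` slot because "sideways" is not in the controlled vocabulary.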
What is a Pathway? Depends on who you ask!
• Metabolic Pathways (e.g. Glycolysis)
• Molecular Interaction Networks (e.g. Protein-Protein)
• Signaling Pathways (e.g. Apoptosis)
• Gene Regulation (e.g. Lac Operon)
Integration Nightmare!
[Figure: sources to be integrated. High-throughput experimental methods: genetics, microarray, mass spectrometry, two-hybrid. Data: protein modifications, interaction data, expression, function. Existing literature: PubMed. Multiple pathway databases.]
So many pathway databases… Each has its own data model, format, and data access methods
Source: Pathway Resource List (http://cbio.mskcc.org/prl/)
Research Community Needs Semantic Aggregation, Integration, Inference (Pedantic Aggravation, Irritation, and Interference)
Pathway Databases: WIT, BioCyc, Reactome, aMAZE, KEGG, BIND, DIP, HPRD, MINT, IntAct, PSI format, CSNDB, TRANSPATH, TRANSFAC, PubGene, GeneWays
A Common Exchange Language
Promotes collaboration (big science), accessibility
[Figure: applications, databases, and users connected without BioPAX versus with BioPAX]
Over 170 DBs and tools
Common “computable semantic” enables scientific discovery
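One way to read the "without BioPAX" versus "with BioPAX" contrast is to count translators. The sketch below assumes, purely for illustration, that without a shared format every resource needs a direct converter to every other resource in each direction, and that with a shared format each resource needs only one exporter and one importer. The slide itself does not state these counts; the function names are hypothetical.

```python
def pairwise_converters(n: int) -> int:
    """Converters needed when each of n resources translates directly
    to every other resource (one converter per ordered pair)."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Converters needed when each of n resources only reads and
    writes one common exchange format (one exporter + one importer)."""
    return 2 * n


resources = 170  # "Over 170 DBs and tools"
print(pairwise_converters(resources))  # 28730
print(hub_converters(resources))       # 340
```

Under these assumptions the direct approach grows quadratically with the number of resources, while the common-format approach grows linearly.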
Closes Gaps in Pathway Data Space
[Figure: exchange language domains. Database exchange formats: BioPAX, PSI-MI 2. Simulation model exchange formats: SBML, CellML. Regions of the pathway data space: metabolic pathways (low detail to high detail); regulatory pathways (low detail to high detail); small molecules (low detail to high detail); molecular interactions (Pro:Pro, All:All); interaction networks (molecular, non-molecular; Pro:Pro, TF:Gene, genetic); genetic interactions; biochemical reactions; rate formulas.]
Design Goals
• Encapsulation: An entire pathway in one record
• Compatible: Use existing standards wherever possible
• Computable: From file reading to logical inference
• Successful: Buy-in from the research community
Technical Goals
Interoperability
• Integration and exchange of pathway data
• Interchange through a common (standard) representation
• Accommodate existing database representations
• Provide a basis for future databases
• Enable development of tools for searching and reasoning over the database
Development of tools and API to facilitate conversion (libBioPAX)
Technical Goals (cont’d)
Why OWL? Why OWL DL?
Expressivity (biology = “complex relationships”)
• W3C Standard (use existing standards) “Semantic Web enabled”
• XML based (the exchange language in computing)
• Machine Computable
• Facilitate integration of knowledge, data, tool development
• Uncover inconsistencies and new knowledge
• OWL DL
  • Enable full reasoning capability for users from file reading to logical inference
  • Complete: all conclusions are guaranteed to be computed
  • Decidable: all computations will finish in finite time (with OWL Lite, in a short amount of time)
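The phrase "uncover inconsistencies and new knowledge" can be illustrated with a toy example. The sketch below is not a description logic reasoner and does not use any OWL library. It hand-codes two kinds of axiom (subclass and disjointness) over hypothetical class names, and assumes the subclass hierarchy is finite and acyclic so the loop always terminates.

```python
# Toy schema with illustrative names (not the BioPAX class list).
SUBCLASS_OF = {"protein": "physicalEntity", "smallMolecule": "physicalEntity"}
DISJOINT = {frozenset({"protein", "smallMolecule"})}


def ancestors(cls: str) -> set:
    """Return cls plus every superclass reachable through SUBCLASS_OF."""
    found = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def inferred_types(asserted: set) -> set:
    """New knowledge: every type that follows from the asserted ones."""
    types = set()
    for cls in asserted:
        types |= ancestors(cls)
    return types


def is_consistent(asserted: set) -> bool:
    """Inconsistency: an individual typed with two disjoint classes."""
    types = inferred_types(asserted)
    return not any(pair <= types for pair in DISJOINT)
```

For example, `inferred_types({"protein"})` also contains `"physicalEntity"`, and `is_consistent({"protein", "smallMolecule"})` is false because the two classes were declared disjoint.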
Social Logistics
Get organized
Make the decision & commitment
2 or 3 dedicated individuals to be the contact points
Small core group
• Bi-weekly conference calls, bi-monthly F2F
• Commitment & resources
• Participants willing and able to cover their costs
• Outside funding (DOE)
Special interests and needs form subgroup task forces
• Core group member(s)
• Outside experts
International representation & participation (Outreach & Community Building)
• conferences and mailing lists
• follow-up and individual
Collaborate with complementary/competing representations
Social Logistics (cont’d)
How we engendered buy-in from the field, which made life much easier
Take things in steps:
• Pathway Database vision -> Data Exchange Format as 1st step
• Data Exchange Format -> Release in Levels of increasing complexity
Level 1 supports Metabolic pathways, Level 2
Early success leads to early adoption, which increases the probability of overall project success.
Get “buy in” and get involvement; this leads to acceptance later
• Support the existing databases (BioCyc, WIT, BIND, etc.)
• Got database sources to agree to participate in the development to ensure that their DBs will be properly represented
• Got database sources to agree to export in the new format once it is defined
Social Logistics (cont’d)
Get “buy in” (continued)
• Community Involvement and Support
  Core group (represents voice of community, small, committed)
  Mailing List
  User community interaction (BioPAX-Boston)
  Subgroups
• International Meetings and Presentations
  Tool developers
  Modelers
  Users (researchers)
  Ontology developers
  Database providers
  Complementary representations (SBML, CellML)
  Like minds
  General Community
Implementation of BioPAX
Designed using GKB Editor and Protégé
BioPAX uses OWL to define the “Schema”
BioPAX Instances to store the data
Technically, an ontology with instance data is a knowledge base
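The split between schema and instance data can be sketched by writing a tiny RDF/XML document with Python's standard library. Everything under the `http://example.org/pathway#` namespace is hypothetical: the class names, the `name` property and the individual `protein1` are made up for illustration and are not taken from the BioPAX OWL file.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/pathway#"  # hypothetical namespace

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL), ("ex", EX)):
    ET.register_namespace(prefix, uri)


def q(namespace: str, local: str) -> str:
    """Build the '{namespace}local' form ElementTree uses for qualified names."""
    return f"{{{namespace}}}{local}"


root = ET.Element(q(RDF, "RDF"))

# Schema: two class declarations and one subclass link.
ET.SubElement(root, q(OWL, "Class"), {q(RDF, "about"): EX + "physicalEntity"})
protein_cls = ET.SubElement(root, q(OWL, "Class"), {q(RDF, "about"): EX + "protein"})
ET.SubElement(protein_cls, q(RDFS, "subClassOf"), {q(RDF, "resource"): EX + "physicalEntity"})

# Instance data: one individual typed by the class above, with one value.
individual = ET.SubElement(root, q(EX, "protein"), {q(RDF, "about"): EX + "protein1"})
name = ET.SubElement(individual, q(EX, "name"))
name.text = "example protein"

# Schema plus instances in one document: a small knowledge base.
document = ET.tostring(root, encoding="unicode")
print(document)
```

The class declarations play the role of the schema, the typed `ex:protein` element is instance data, and the two together form a small knowledge base in the sense used on the slide.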
BioPAX – Ontology Level 1: Metabolic Pathways
Mapping Pathways to BioPAX
[Figure: a pathway mapped to BioPAX. OWL (schema); instances (individuals) hold the data]
Challenges & Bottlenecks
• Scientific
  • What’s a pathway? Depends on who you ask.
• Technical
  • Each database has its own syntax & semantics
  • Immaturity of tools for data integration
• Social / Logistical
  • Community organization and adoption
• Financial
  • Mostly volunteer effort by stakeholders
  • Dept of Energy
Bridging Chemistry and Molecular Biology
• Different Views have different semantics: Lenses
• When there is a correspondence between objects, a semantic binding is possible
Uniprot:P49841
Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
Source: Eric Neumann
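The correspondence rule above can be read as a join on cross-reference identifiers. The following sketch is one possible reading, not the rule engine used by the author: the `Record` class, its field names, and the function name are hypothetical, and the string `Uniprot:P49841` from the slide stands in for an LSID.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """An object seen through one view (a chemistry target or a pathway protein)."""
    name: str
    xref_lsids: set  # identifiers this object cross-references
    corresponds_to: list = field(default_factory=list)


def apply_correspondence(targets: list, proteins: list) -> list:
    """Bind each target to every protein sharing at least one xref identifier."""
    # Index proteins by identifier so each target needs only dictionary lookups.
    by_lsid = {}
    for prot in proteins:
        for lsid in prot.xref_lsids:
            by_lsid.setdefault(lsid, []).append(prot)

    for target in targets:
        for lsid in sorted(target.xref_lsids):
            for prot in by_lsid.get(lsid, []):
                # Avoid duplicate bindings when several identifiers match.
                if not any(bound is prot for bound in target.corresponds_to):
                    target.corresponds_to.append(prot)
    return targets


target = Record("drug target (hypothetical)", {"Uniprot:P49841"})
pathway_protein = Record("pathway protein (hypothetical)", {"Uniprot:P49841"})
unrelated = Record("unrelated protein (hypothetical)", {"Example:0000"})

apply_correspondence([target], [pathway_protein, unrelated])
```

After the call, `target.corresponds_to` holds only `pathway_protein`, since that is the only record sharing an identifier with the target.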
Enables Computable Biology
BioPAX increases collaboration and accessibility to the field and enables 'big science' because it delivers a scalable solution
Captures the complex relationships inherent in biology
Solves some nasty integration problems
Saves a lot of time and money
BioPAX Supporting Groups
The BioPAX Community
Databases
• BioCyc (www.biocyc.org)
• BIND (www.bind.ca)
• WIT (wit.mcs.anl.gov/WIT2)
• PharmGKB (www.pharmgkb.org)
Grants
• Department of Energy (Workshop)
Groups
• Memorial Sloan-Kettering Cancer Center: G. Bader, M. Cary, J. Luciano, C. Sander
• SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
• University of Colorado Health Sciences Center: I. Shah
• BioPathways Consortium: J. Luciano, E. Neumann, A. Regev, V. Schachter
• Argonne National Laboratory: N. Maltsev, E. Marland
• Samuel Lunenfeld Research Institute: C. Hogue
• Harvard Medical School: E. Brauner, D. Marks, J. Luciano, A. Regev
• NIST: R. Goldberg
• Stanford: T. Klein
• Columbia: A. Rzhetsky
• Dana Farber Cancer Institute: J. Zucker
Collaborating Organizations:
• Proteomics Standards Initiative (PSI)
• Systems Biology Markup Language (SBML)
• CellML
• Chemical Markup Language (CML)