
Linking Multiple Ontologies: The OBO Foundry Approach

Learn about OBO Foundry, Gene Ontology, relations, modularity, identifiers, and more in ontology development for bioinformatics. Explore case studies and principles in this extensive workshop outline.


Presentation Transcript


  1. Linking Multiple Ontologies: The OBO Foundry Approach • Chris Mungall • NIAID Cell Ontology Workshop • May 2008

  2. Outline • Introduction to ontologies • The OBO perspective • Case study in the Gene Ontology • The OBO Foundry: goals and principles • The OBO relation ontology • Organization of ontologies in OBO • Modularity • An example from CL • Linking CL to the OBO Foundry

  3. What is an ontology? • A computable representation of some domain • What kinds of things exist? • What are the relations that hold between them? • [Figure: Heart is_a Cavitated organ; Heart part_of Cardiovascular System; Mitral valve part_of Heart; Aortic valve part_of Heart]

  4. Aspects of an ontology • Identifiers • Uniquely identify a class / term • E.g. CL:0000037 is ID for the term “hematopoietic stem cell” • Identifier metadata • Terminological aspects • Names and synonyms/alternate labels • CL:0000037 has “hemopoietic progenitor cell” as a related synonym and “hemopoietic stem cell” as exact synonym • Logical aspects • Relations • Definitions Provenance
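
The identifier and terminological aspects on this slide can be sketched as a small record type. This is a minimal illustration, not the OBO format or any library's API: the `Term` class, its field names, the `exact_labels` helper, and the `EXACT`/`RELATED` scope strings are assumptions chosen to mirror the slide's example.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One ontology class: a stable identifier plus human-readable labels."""
    id: str                                        # e.g. "CL:0000037"
    name: str                                      # primary label
    synonyms: dict = field(default_factory=dict)   # synonym text -> scope

    @property
    def prefix(self) -> str:
        """The ontology prefix, i.e. the part of the ID before the colon."""
        return self.id.split(":", 1)[0]


def exact_labels(term):
    """Primary name plus every synonym marked as exact."""
    return [term.name] + [s for s, scope in term.synonyms.items() if scope == "EXACT"]


# The example from the slide.
hsc = Term(
    id="CL:0000037",
    name="hematopoietic stem cell",
    synonyms={
        "hemopoietic stem cell": "EXACT",
        "hemopoietic progenitor cell": "RELATED",
    },
)
```

Keeping the identifier separate from the labels is what lets names and synonyms change without breaking annotations that point at the ID.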

  5. Some ontologies and their uses • The Gene Ontology • Annotation of gene products • Analyzing high-throughput datasets • Anatomical ontologies (including CL) • Experimental metadata • Image annotation • Indicating location of gene expression • Creating Phenotypic descriptions • Others • NLP • Annotating information models • Database integration

  6. Origins of OBO: The Gene Ontology (GO) • 3 ontologies for annotating genes and gene products • These ontologies are organised as a collection of related terms, constituting nodes in a graph • Gradually incorporating other logical axioms

  7. Annotation and GO • GO Annotations: • Associations between genes and GO terms, with evidence • Met17 : “methionine metabolism” GO:0006555 • 222,000 genes and gene products have high quality annotations to GO terms • 3.4m including automated predictions • 66,000 publications curated • Variety of analysis tools • http://www.geneontology.org/GO.tools.shtml#micro

  8. GO and high-throughput biology: Over-representation of GO terms for gene sets • GO::TermFinder (Sherlock et al.)
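
Over-representation of a GO term in a gene set is commonly scored with a hypergeometric tail probability. The sketch below shows that calculation only; it is not GO::TermFinder's code, the function name `enrichment_p_value` is hypothetical, and the numbers in the example are invented. It also omits the multiple-testing correction that a real analysis needs.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) under the hypergeometric distribution.

    population: number of genes in the background set
    annotated:  number of background genes annotated to the GO term
    sample:     number of genes in the set of interest
    hits:       number of genes in the set annotated to the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Invented numbers: 10 background genes, 5 annotated, 2 sampled, both annotated.
p = enrichment_p_value(population=10, annotated=5, sample=2, hits=2)
```

A small `p` means the term appears in the gene set more often than random sampling from the background would suggest.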

  9. GO and the need for OBO • GO terms implicitly reference kinds of entities outwith the scope of GO • Methionine biosynthesis • Neural crest cell migration • Cardiac muscle morphogenesis • Regulation of vascular permeability • OBO was born from the need to create source ontologies for GO term ‘cross-products’ • Define composite classes in terms of simpler ones • [Figure labels: chemical, cell, anatomy, quality]

  10. The Open Biomedical Ontologies (OBO) Foundry • A collection of orthogonal reference ontologies in the biological/biomedical domain • The OBO Foundry: Each is committed to an agreed upon set of principles governing best practices in ontology development

  11. Some OBO ontologies • Gene Ontology • ChEBI - chemical entities • OBI - investigations • PATO, MP - phenotypes • CL - cells • ENVO - environment and habitat • DO - Human diseases • CARO - common anatomy • FMA - human anatomy • SO - sequence features • Model organism anatomy • ZFA • Fly_anat • Dicty_anat • Mouse_anat • … • OBO Relation Ontology

  12. OBO Foundry: criteria, v1 • Open • Well-defined exchange format (e.g. OBO or OWL) • Uses identifiers according to OBO ID policy • Ontology life-cycle / versioning • Has clearly specified and delineated content • Has unambiguous definitions • Uses or extends relations in the OBO Relation Ontology • Well documented • Has a plurality of users (and a mail list & issue tracker) • Developed collaboratively • Orthogonal, modular • http://obofoundry.org/

  13. OBO Relation Ontology • Edges can link nodes… • Within ontologies • Across ontologies • The precise meaning of the relation is important • Relations have formal definitions • Rules for composing relations together • http://obofoundry.org/ro/
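
The "rules for composing relations" can be pictured as a lookup table over pairs of relations. The sketch below assumes the usual rules for is_a and part_of (following an is_a edge and then a part_of edge, or the reverse, yields part_of); the `COMPOSE` table and `compose_path` function are hypothetical names, and the table is deliberately incomplete.

```python
# (first relation, second relation) -> relation that holds across both edges.
# Only is_a and part_of are covered; any other pair is treated as "no inference".
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
}


def compose_path(relations):
    """Relation implied by a non-empty chain of edges, or None if no rule applies."""
    result = relations[0]
    for rel in relations[1:]:
        result = COMPOSE.get((result, rel))
        if result is None:
            return None
    return result


# Mitral valve part_of Heart, Heart is_a Cavitated organ
# => Mitral valve part_of (some) Cavitated organ.
implied = compose_path(["part_of", "is_a"])
```

The same table works within one ontology or across ontologies, because it depends only on the relation definitions and not on where the nodes live.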

  14. Is_a • X is_a Y • If something is an instance of X (at time t), then it is also an instance of Y (at t) • Transitive • B1 B cell is_a B cell • B cell is_a lymphocyte • Therefore B1 B cell is_a lymphocyte
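
Because is_a is transitive, all superclasses of a term can be collected by walking the is_a edges upward. A minimal sketch, assuming the edges are held in a plain dictionary from each term to its asserted parents (the `ancestors` function and `is_a` variable are illustrative names):

```python
def ancestors(term, parents):
    """All classes reachable from `term` by following is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Asserted edges from the slide.
is_a = {
    "B1 B cell": ["B cell"],
    "B cell": ["lymphocyte"],
}

inferred = ancestors("B1 B cell", is_a)
```

Only the two direct edges are asserted; "B1 B cell is_a lymphocyte" falls out of the traversal.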

  15. Part_of • Instance level part_of relation is primitive • Between classes: • X part_of Y : • Every instance of X is part_of some instance of Y • Paneth cell part_of intestine : YES • Nucleus part_of Cell : YES • Neuron part_of brain : NO • (there are some neurons that are part of other parts of the nervous system) • Transitive • X part_of Y, Y part_of Z • Therefore, X part_of Z

  16. Has_part • Instance level inverse of part_of • X has_part Y • Every X has some Y as part • Cell has_part nucleus : NO • Nucleate erythrocyte has_part nucleus : YES
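
The class-level readings of part_of and has_part are both "every ... some ..." statements over instances, which is why they are not simple inverses of each other. The sketch below checks both readings against a toy set of instances; all instance names and the two function names are invented for illustration.

```python
def class_part_of(x_instances, y_instances, part_of_pairs):
    """X part_of Y: every instance of X is part of some instance of Y."""
    return all(
        any((x, y) in part_of_pairs for y in y_instances)
        for x in x_instances
    )


def class_has_part(x_instances, y_instances, part_of_pairs):
    """X has_part Y: every instance of X has some instance of Y as a part."""
    return all(
        any((y, x) in part_of_pairs for y in y_instances)
        for x in x_instances
    )


# Toy instance data (hypothetical). Each pair reads (part, whole).
neurons = {"neuron_1", "neuron_2"}
brains = {"brain_1"}
nuclei = {"nucleus_1"}
cells = {"cell_1", "cell_2"}  # cell_2 stands for a cell without a nucleus
part_of_pairs = {
    ("neuron_1", "brain_1"),
    ("neuron_2", "spinal_cord_1"),  # a neuron outside the brain
    ("nucleus_1", "cell_1"),
}

neuron_part_of_brain = class_part_of(neurons, brains, part_of_pairs)
nucleus_part_of_cell = class_part_of(nuclei, cells, part_of_pairs)
cell_has_part_nucleus = class_has_part(cells, nuclei, part_of_pairs)
```

With this data, "Nucleus part_of Cell" holds while "Cell has_part nucleus" does not, matching the YES/NO answers on the two slides.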

  17. Develops_from • X develops_from Y • Every instance of X was once a Y, or inherited a significant portion of its matter from a Y • Example: erythrocyte develops_from reticulocyte • Transitive • erythrocyte develops_from reticulocyte • reticulocyte develops_from orthochromatic erythroblast • => • erythrocyte develops_from orthochromatic erythroblast

  18. Transformation and derivation • Develops_from relation can be refined into two cases: • Transformation_of • X transformation_of Y : • Any instance of X was previously an instance of Y • Example: erythrocyte transformation_of reticulocyte • Derives_from • X derives_from Y : • Holds between distinct instances where X inherits matter from Y • Most OBO ontologies just use the develops_from relation

  19. Other relations • Inherence • Between a quality and an object • E.g. between a specific shape and a cell • Participation • Between a process and an object • E.g. between a B cell and an immune process

  20. Definitions state necessary and sufficient conditions • Links in the ontology graph state necessary conditions for a class • E.g. erythroid progenitor cell develops_from megakaryocyte erythroid progenitor • These characteristics may not be unique • A definition should state necessary and sufficient conditions for a class • The characteristics must be unique to the defined class • E.g. “progenitor cell that is committed to the erythroid lineage” • Definition should be precise and (as far as possible) translated / translatable to logical computable form

  21. Genus differentia definitions • Of the form • An X is a G that D • G should be in the same ontology • D states the discriminating characteristics that differentiate (in the classification sense) Xs from other Gs • Relations to terms in an ontology (the same ontology or a different one) • Example: • A B cell is a lymphocyte that expresses an immunoglobulin complex
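
A genus-differentia definition has a regular shape, so it can be stored as data rather than free text. The sketch below is one hypothetical encoding: the `LogicalDefinition` class and its field names are assumptions, and the differentia is simplified to (relation, target) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalDefinition:
    """An X is a G that D, with D given as (relation, target) pairs."""
    defined: str        # X, the class being defined
    genus: str          # G, a class in the same ontology
    differentia: tuple  # D, e.g. (("expresses", "immunoglobulin complex"),)

    def as_text(self):
        """Render the definition in the 'An X is a G that D' pattern."""
        clauses = " and ".join(f"{rel} {target}" for rel, target in self.differentia)
        return f"A {self.defined} is a {self.genus} that {clauses}"


# The example from the slide; the target may live in a different ontology.
b_cell = LogicalDefinition(
    defined="B cell",
    genus="lymphocyte",
    differentia=(("expresses", "immunoglobulin complex"),),
)
sentence = b_cell.as_text()
```

Holding the genus and differentia as separate fields is what makes the definition usable by a program as well as by a reader.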

  22. Orthogonality of ontologies • No two ontologies should represent the same kind of entity • E.g. “B-cell” should only be represented in one ontology • Related entities should be coordinated across ontologies • GO: “B-cell differentiation” • Exceptions: • The term “cell” connects GO Cellular Component (cell parts) and CL (cells) • Advantages: • Reduces redundancy and work • Easier to make the union consistent
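
A crude first check for orthogonality is to look for the same label appearing in more than one ontology. The sketch below assumes each ontology is reduced to a list of labels keyed by its prefix; `duplicated_labels` is a hypothetical helper and "XX" is a made-up ontology used only to trigger a clash. Matching on labels alone will miss synonyms and will flag homonyms, so this is a starting point rather than a full test.

```python
from collections import defaultdict


def duplicated_labels(ontologies):
    """Map each label found in more than one ontology to the prefixes that define it."""
    owners = defaultdict(set)
    for prefix, labels in ontologies.items():
        for label in labels:
            owners[label.lower()].add(prefix)
    return {label: sorted(prefixes) for label, prefixes in owners.items() if len(prefixes) > 1}


# Toy input; "XX" is an invented ontology that redefines "B cell".
ontologies = {
    "CL": ["B cell", "hepatocyte"],
    "GO": ["B cell differentiation"],
    "XX": ["B cell"],
}
clashes = duplicated_labels(ontologies)
```

"B cell differentiation" in GO is not a clash: it is a related entity that should refer to the CL class rather than redefine it.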

  23. Some OBO terms.. bile liver liver development obesity fat body hepatic artery oenocyte oenocyte differentiation hepatoma hepatocyte insulin increased circulating glucose level carbohydrate metabolism glucose glycogen

  24. [Figure: the terms from slide 23, labeled with their source ontologies: FMA (adult human), MA (mouse), FBbt (fly), MP (mammal phenotype), GO (biological process), DO, CL, PRO, CHEBI]

  25. [Figure: same as slide 24]

  26. [Figure: same as slide 24] • How should we organize this?

  27. Top-level organisation (BFO: Basic Formal Ontology) • Levels of granularity (scale) • Population • Organism • Organ • Cell • Molecule • part_of relations can cross levels • General categories • 3D things (continuants) • Independent • Cells, organs, molecules • Dependent • Shapes, sizes, concentrations, … • 4D things (processes) • Processes • Useful organisational principle for OBO • is_a and part_of should not cross top level categories
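
The rule that is_a and part_of should not cross top-level categories is easy to check mechanically once every term carries a category. A minimal sketch, assuming a hand-assigned `category` table and a list of (child, relation, parent) edges; the function name, the table, and the deliberately wrong edge are all illustrative.

```python
# Relations that must stay inside one top-level category.
CROSS_CHECKED = {"is_a", "part_of"}


def crossing_edges(edges, category):
    """Return the is_a / part_of edges whose two ends sit in different categories."""
    return [
        (child, rel, parent)
        for child, rel, parent in edges
        if rel in CROSS_CHECKED and category[child] != category[parent]
    ]


# Hand-assigned categories for a few terms from the earlier slides.
category = {
    "hepatocyte": "independent continuant",
    "cell": "independent continuant",
    "liver": "independent continuant",
    "liver development": "process",
}

edges = [
    ("hepatocyte", "is_a", "cell"),
    ("hepatocyte", "is_a", "liver development"),        # deliberately wrong
    ("liver development", "has_participant", "liver"),  # allowed to cross
]

violations = crossing_edges(edges, category)
```

Relations such as participation are expected to connect categories, so only is_a and part_of are checked.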

  28. [Figure: the terms from slide 23 with their source ontologies (FMA, MA, FBbt, MP, GO, DO, CL, PRO, CHEBI), arranged under three top-level headings: Objects, Qualities etc., Processes]

  29. The OBO Foundry can help with modular ontology design • Biology is complex • So our ontologies will be complex • Multiple purposes • Multiple means of classifying • Separate out different aspects • Modular approach • Avoid multiple inheritance (>1 is_a parent) • Don’t over-use is_a • Don’t cross aspects with is_a • Make complex descriptions from simpler parts • Polyhierarchies arise from composition

  30. Cysteine biosynthesis (trimmed) GO Tangled polyhierarchy

  31. Cysteine biosynthesis (trimmed) Process axis

  32. Cysteine biosynthesis (trimmed) Chemical structure axis

  33. Cysteine biosynthesis (trimmed) ChEBI (trimmed)

  34. Cysteine biosynthesis (trimmed) ChEBI (trimmed)

  35. Cysteine biosynthesis (trimmed) ChEBI (trimmed)

  36. Cysteine biosynthesis (trimmed) ChEBI (trimmed) We can do more than simply link terms: Cross-products (aka logical definitions, Computable genus-differentia definitions)

  37. Cysteine biosynthesis (GO:0019344) = a biosynthetic process (GO:0009058) [genus] that results_in_creation_of cysteine (CHEBI:13536) [differentia] • [Figure: Cysteine biosynthesis (trimmed); ChEBI (trimmed)]

  38. results_in_change_to • Cysteine biosynthetic process = biosynthetic process that results_in_change_to cysteine

  39. Let the computer do the work • Given cross-products, a reasoner can add all links • Underlying representation is normalized
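
How a reasoner "adds the links" can be illustrated with a very small structural rule: if two classes have cross-product definitions with the same relation, and the genus and target of one are subsumed by those of the other, then the first class is_a the second. The sketch below implements only that rule. It is not a description logic reasoner; the function names are hypothetical, and the "amino acid biosynthesis" definition and the single ChEBI edge are assumptions added for the example.

```python
def subsumed_by(child, parent, is_a):
    """True if `child` equals `parent` or reaches it by following is_a edges."""
    stack, seen = [child], set()
    while stack:
        current = stack.pop()
        if current == parent:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(is_a.get(current, ()))
    return False


def infer_is_a(definitions, is_a):
    """Infer (child, parent) links between classes defined as genus + relation + target."""
    inferred = set()
    for a, (genus_a, rel_a, target_a) in definitions.items():
        for b, (genus_b, rel_b, target_b) in definitions.items():
            if (
                a != b
                and rel_a == rel_b
                and subsumed_by(genus_a, genus_b, is_a)
                and subsumed_by(target_a, target_b, is_a)
            ):
                inferred.add((a, b))
    return inferred


# Asserted edge in the chemical ontology (simplified to one direct link).
is_a = {"cysteine": ["amino acid"]}

# Cross-product definitions: class -> (genus, relation, target).
definitions = {
    "cysteine biosynthesis": ("biosynthetic process", "results_in_creation_of", "cysteine"),
    "amino acid biosynthesis": ("biosynthetic process", "results_in_creation_of", "amino acid"),
}

links = infer_is_a(definitions, is_a)
```

No is_a link between the two processes is asserted anywhere; it is derived from the chemical hierarchy, which is the sense in which the underlying representation is normalized.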

  40. Example of is_a-overloading: OBO Cell Ontology (current) • [Figure: CL]

  41. Try not to assert too many is_a parents • [Figure labels: CL, X]

  42. Reuse existing ontologies • Non-is_a relation • [Figure labels: CL, GO, X, ?, Has function]

  43. How CL can use other OBO ontologies • GO Cellular component • Mononuclear phagocyte • B cell (expresses immunoglobulin complex) • GO Biological process • Photosynthetic cell • PATO Qualities • Spiny neuron • CHEBI Chemical entities • X secreting cell • Anatomy Ontologies • CNS neuron • Molecular function, PRO • CD4 positive cell

  44. How CL is used by other ontologies

  45. Results • Biological process x CL • http://wiki.geneontology.org/index.php?XP:biological_process_xp_cell • Uncovered inconsistencies between GO and CL • Oenocyte differentiation is_a columnar/cuboidal epithelial cell differentiation • MP x CL • http://wiki.geneontology.org/index.php/XP:mammalian_phenotype_xp • Resulted in various fixes to MP

  46. OBD: Ontology Annotation Database

  47. Summary • The cell ontology is a representation of the types of cell that exist • The OBO Foundry provides • Principles • A framework for connecting ontologies • There are many points of coordination between CL and other OBO ontologies • CL could benefit from the gradual introduction of a modular approach
