Linking Multiple Ontologies: The OBO Foundry Approach • Chris Mungall • NIAID Cell Ontology Workshop, May 2008
Outline • Introduction to ontologies • The OBO perspective • Case study in the Gene Ontology • The OBO Foundry: goals and principles • The OBO relation ontology • Organization of ontologies in OBO • Modularity • An example from CL • Linking CL to the OBO Foundry
What is an ontology? • A computable representation of some domain • What kinds of things exist? • What are the relations that hold between them? • [Figure: example graph in which Heart is_a Cavitated organ and Heart part_of Cardiovascular System; Mitral valve part_of Heart and Aortic valve part_of Heart]
Aspects of an ontology • Identifiers • Uniquely identify a class/term • E.g. CL:0000037 is the ID for the term “hematopoietic stem cell” • Identifier metadata • Terminological aspects • Names and synonyms/alternate labels • CL:0000037 has “hemopoietic progenitor cell” as a related synonym and “hemopoietic stem cell” as an exact synonym • Logical aspects • Relations • Definitions • Provenance
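The identifier, name and synonyms on this slide can be written as a term stanza in the OBO flat-file format. The sketch below is a minimal illustration, not a full OBO parser: it reads only the `id`, `name` and `synonym` tags, and the stanza text (including the empty `[]` cross-reference lists) is a hypothetical rendering of the slide's example rather than a copy of the released CL file.

```python
import re

# Hypothetical stanza; the ID, name, synonyms and synonym scopes come from the slide.
STANZA = """\
[Term]
id: CL:0000037
name: hematopoietic stem cell
synonym: "hemopoietic stem cell" EXACT []
synonym: "hemopoietic progenitor cell" RELATED []
"""

# A synonym value: quoted text, then one of the four OBO synonym scopes.
SYNONYM_RE = re.compile(r'^"(?P<text>(?:[^"\\]|\\.)*)"\s+(?P<scope>EXACT|BROAD|NARROW|RELATED)\b')

def parse_term(stanza: str) -> dict:
    """Extract id, name and (synonym, scope) pairs from a single [Term] stanza."""
    term = {"id": None, "name": None, "synonyms": []}
    for line in stanza.splitlines():
        tag, sep, value = line.partition(": ")
        if not sep:
            continue  # skips the [Term] header and blank lines
        if tag in ("id", "name"):
            term[tag] = value.strip()
        elif tag == "synonym":
            m = SYNONYM_RE.match(value)
            if m:
                term["synonyms"].append((m.group("text"), m.group("scope")))
    return term

term = parse_term(STANZA)
print(term["id"], "|", term["name"])
for text, scope in term["synonyms"]:
    print(f"  {scope:8} {text}")
```

Keeping the scope alongside each synonym matters because, as the slide notes, an exact synonym and a related synonym say different things about the term.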
Some ontologies and their uses • The Gene Ontology • Annotation of gene products • Analyzing high-throughput datasets • Anatomical ontologies (including CL) • Experimental metadata • Image annotation • Indicating location of gene expression • Creating phenotypic descriptions • Others • NLP • Annotating information models • Database integration
Origins of OBO: The Gene Ontology (GO) • 3 ontologies for annotating genes and gene products • These ontologies are organised as a collection of related terms, constituting nodes in a graph • Gradually incorporating other logical axioms
Annotation and GO • GO annotations: • Associations between genes and GO terms, with evidence • Met17: “methionine metabolism” (GO:0006555) • 222,000 genes and gene products have high-quality annotations to GO terms • 3.4 million including automated predictions • 66,000 publications curated • A variety of analysis tools • http://www.geneontology.org/GO.tools.shtml#micro
GO and high-throughput biology: over-representation of GO terms for gene sets • [Figure: GO::TermFinder results (Sherlock et al.)]
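Over-representation analysis asks whether a gene set contains more genes annotated to a term than chance would predict. A one-sided hypergeometric test is a common way to do this; the sketch below is a minimal illustration of that test, not GO::TermFinder's implementation. The gene names and the terms `term_A` and `term_B` are hypothetical toy data.

```python
from math import comb

def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of at least k annotated genes in a random study set.

    N = population size, K = population genes annotated to the term,
    n = study-set size, k = study-set genes annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrichment(study: set, population: set, annotations: dict) -> list:
    """Return (term, k, K, p) for every term hit by the study set, smallest p first."""
    N, n = len(population), len(study)
    results = []
    for term, genes in annotations.items():
        annotated = genes & population
        K, k = len(annotated), len(annotated & study)
        if k:
            results.append((term, k, K, hypergeom_sf(k, n, K, N)))
    return sorted(results, key=lambda r: r[3])

# Hypothetical toy data: 20 genes, a study set of 5, two made-up terms.
population = {f"g{i}" for i in range(1, 21)}
study = {"g1", "g2", "g3", "g4", "g5"}
annotations = {
    "term_A": {"g1", "g2", "g3", "g4", "g6"},
    "term_B": {"g5", "g10", "g11", "g12", "g13", "g14"},
}
results = enrichment(study, population, annotations)
for term, k, K, p in results:
    print(f"{term}: {k}/{len(study)} in study vs {K}/{len(population)} overall, p = {p:.4g}")
```

Real tools typically also count genes annotated to descendant terms and correct for testing many terms at once; this sketch does neither.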
GO and the need for OBO • GO terms implicitly reference kinds of entities outside the scope of GO • Methionine biosynthesis • Neural crest cell migration • Cardiac muscle morphogenesis • Regulation of vascular permeability • OBO was born from the need to create source ontologies for GO term ‘cross-products’ • Define composite classes in terms of simpler ones • [Figure: the referenced entities come from chemical, cell, anatomy and quality ontologies]
The Open Biomedical Ontologies (OBO) Foundry • A collection of orthogonal reference ontologies in the biological/biomedical domain • Each ontology in the Foundry is committed to an agreed-upon set of principles governing best practices in ontology development
Some OBO ontologies • Gene Ontology • ChEBI - chemical entities • OBI - investigations • PATO, MP - phenotypes • CL - cells • ENVO - environment and habitat • DO - Human diseases • CARO - common anatomy • FMA - human anatomy • SO - sequence features • Model organism anatomy • ZFA • Fly_anat • Dicty_anat • Mouse_anat • … • OBO Relation Ontology
OBO Foundry: criteria, v1 • Open • Well-defined exchange format (e.g. OBO or OWL) • Uses identifiers according to the OBO ID policy • Ontology life cycle/versioning • Has clearly specified and delineated content • Has unambiguous definitions • Uses or extends relations in the OBO Relation Ontology • Well documented • Has a plurality of users (and a mailing list & issue tracker) • Developed collaboratively • Orthogonal, modular • http://obofoundry.org/
OBO Relation Ontology • Edges can link nodes… • Within ontologies • Across ontologies • The precise meaning of the relation is important • Relations have formal definitions • Rules for composing relations together • http://obofoundry.org/ro/
Is_a • X is_a Y • If something is an instance of X (at time t), then it is also an instance of Y (at t) • Transitive • B1 B cell is_a B cell • B cell is_a lymphocyte • Therefore B1 B cell is_a lymphocyte
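Transitivity means a program can derive every implied is_a parent by walking the graph. A minimal sketch, assuming the ontology is held as a hypothetical dictionary from each term to its asserted parents:

```python
def ancestors(term: str, parents: dict) -> set:
    """Every term reachable from `term` by following parent links (transitive closure)."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen

# Asserted is_a links taken from the slide.
IS_A = {
    "B1 B cell": ["B cell"],
    "B cell": ["lymphocyte"],
}

print(sorted(ancestors("B1 B cell", IS_A)))  # ['B cell', 'lymphocyte']
```

The same walk is valid for any relation the slides declare transitive, such as part_of and develops_from (erythrocyte, reticulocyte, orthochromatic erythroblast), but not for relations that lack that property.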
Part_of • Instance-level part_of relation is primitive • Between classes: • X part_of Y: • Every instance of X is part_of some instance of Y • Paneth cell part_of intestine: YES • Nucleus part_of cell: YES • Neuron part_of brain: NO • (some neurons are part of other parts of the nervous system) • Transitive • X part_of Y, Y part_of Z • Therefore, X part_of Z
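The class-level definition is an "all-some" pattern over instances. The sketch below is a minimal illustration, assuming a small hypothetical set of instances (the IDs such as `paneth_1` and `neuron_2` are invented); it treats instance-level part_of as primitive and builds the class-level check on top of it.

```python
# Hypothetical instance-level facts: instance id -> (class, ids it is directly part of).
INSTANCES = {
    "paneth_1": ("Paneth cell", ["intestine_1"]),
    "paneth_2": ("Paneth cell", ["intestine_2"]),
    "intestine_1": ("intestine", []),
    "intestine_2": ("intestine", []),
    "neuron_1": ("neuron", ["brain_1"]),
    "neuron_2": ("neuron", ["spinal_cord_1"]),  # a neuron outside the brain
    "brain_1": ("brain", []),
    "spinal_cord_1": ("spinal cord", []),
}

def instance_part_of(i: str, j: str) -> bool:
    """Primitive instance-level part_of, followed transitively."""
    stack, seen = list(INSTANCES[i][1]), set()
    while stack:
        w = stack.pop()
        if w == j:
            return True
        if w not in seen:
            seen.add(w)
            stack.extend(INSTANCES[w][1])
    return False

def instances_of(c: str) -> list:
    return [i for i, (cls, _) in INSTANCES.items() if cls == c]

def class_part_of(x: str, y: str) -> bool:
    """X part_of Y: every instance of X is part_of some instance of Y."""
    return all(any(instance_part_of(i, j) for j in instances_of(y))
               for i in instances_of(x))

print(class_part_of("Paneth cell", "intestine"))  # True
print(class_part_of("neuron", "brain"))           # False: neuron_2 is not in a brain
```

A finite sample like this can refute a class-level claim but never prove it, since the claim ranges over all instances; and with no instances of X the check is vacuously true.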
Has_part • Instance level inverse of part_of • X has_part Y • Every X has some Y as part • Cell has_part nucleus : NO • Nucleate erythrocyte has_part nucleus : YES
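Although has_part is the inverse of part_of at the instance level, the class-level statements are not interchangeable: "nucleus part_of cell" can hold while "cell has_part nucleus" fails. A minimal sketch with hypothetical instances (IDs invented; `c3` stands for an anucleate cell such as a mature mammalian red blood cell), checking only direct parts for brevity:

```python
# Hypothetical instances: id -> (set of classes, ids of its direct parts).
INSTANCES = {
    "c1": ({"cell", "nucleate erythrocyte"}, ["n1"]),
    "c2": ({"cell", "nucleate erythrocyte"}, ["n2"]),
    "c3": ({"cell", "anucleate erythrocyte"}, []),  # a cell with no nucleus
    "n1": ({"nucleus"}, []),
    "n2": ({"nucleus"}, []),
}

def of_class(c: str) -> list:
    return [i for i, (types, _) in INSTANCES.items() if c in types]

def has_part(x: str, y: str) -> bool:
    """X has_part Y: every instance of X has some instance of Y as a part."""
    ys = set(of_class(y))
    return all(any(p in ys for p in INSTANCES[i][1]) for i in of_class(x))

def part_of(x: str, y: str) -> bool:
    """X part_of Y: every instance of X is a part of some instance of Y."""
    return all(any(i in INSTANCES[j][1] for j in of_class(y)) for i in of_class(x))

print(part_of("nucleus", "cell"))                   # True
print(has_part("cell", "nucleus"))                  # False: c3 has no nucleus
print(has_part("nucleate erythrocyte", "nucleus"))  # True
```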
Develops_from • X develops_from Y • Every instance of X was once a Y, or inherited a significant portion of its matter from a Y • Example: erythrocyte develops_from reticulocyte • Transitive • erythrocyte develops_from reticulocyte • reticulocyte develops_from orthochromatic erythroblast • => • erythrocyte develops_from orthochromatic erythroblast
Transformation and derivation • The develops_from relation can be refined into two cases: • Transformation_of • X transformation_of Y: • Any instance of X was previously an instance of Y • Example: erythrocyte transformation_of reticulocyte • Derives_from • X derives_from Y: • Holds between distinct instances, where an instance of X inherits matter from an instance of Y • Most OBO ontologies just use the develops_from relation
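The two refinements differ only in whether the same entity persists. A minimal sketch, assuming a hypothetical record per entity that lists the classes it has belonged to (oldest first) and the distinct entities it inherited matter from; the platelet/megakaryocyte pair is toy data for the derivation case, not a statement of how CL models it.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """Hypothetical record of one material entity over its lifetime."""
    ident: str
    history: list  # classes this entity has belonged to, oldest first
    matter_from: list = field(default_factory=list)  # ids of distinct source entities

def instances_of(x: str, db: dict) -> list:
    """Entities whose current (latest) class is x."""
    return [e for e in db.values() if e.history[-1] == x]

def transformation_of(x: str, y: str, db: dict) -> bool:
    """Every X was previously, as the same entity, a Y."""
    return all(y in e.history[:-1] for e in instances_of(x, db))

def derives_from(x: str, y: str, db: dict) -> bool:
    """Every X inherited matter from some distinct entity that was a Y."""
    return all(any(y in db[s].history for s in e.matter_from)
               for e in instances_of(x, db))

def develops_from(x: str, y: str, db: dict) -> bool:
    """Coarser relation: each X was once a Y, or inherited matter from a Y."""
    return all(y in e.history[:-1] or any(y in db[s].history for s in e.matter_from)
               for e in instances_of(x, db))

db = {
    "e1": Entity("e1", ["orthochromatic erythroblast", "reticulocyte", "erythrocyte"]),
    "m1": Entity("m1", ["megakaryocyte"]),
    "p1": Entity("p1", ["platelet"], matter_from=["m1"]),
}
print(transformation_of("erythrocyte", "reticulocyte", db))             # True
print(transformation_of("platelet", "megakaryocyte", db))               # False
print(derives_from("platelet", "megakaryocyte", db))                    # True
print(develops_from("erythrocyte", "orthochromatic erythroblast", db))  # True
```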
Other relations • Inherence • Between a quality and an object • E.g. between a specific shape and a cell • Participation • Between a process and an object • E.g. between a B cell and an immune process
Definitions state necessary and sufficient conditions • Links in the ontology graph state necessary conditions for a class • E.g. erythroid progenitor cell develops_from megakaryocyte erythroid progenitor • These characteristics may not be unique • A definition should state necessary and sufficient conditions for a class • The characteristics must be unique to the defined class • E.g. “progenitor cell that is committed to the erythroid lineage” • Definitions should be precise and (as far as possible) translated or translatable into a logical, computable form
Genus differentia definitions • Of the form • An X is a G that D • G should be in the same ontology • D is the set of discriminating characteristics that differentiate (in the classification sense) Xs from other Gs • Relations to terms in an ontology (the same ontology or a different one) • Example: • A B cell is a lymphocyte that expresses an immunoglobulin complex
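Because a genus-differentia definition is necessary and sufficient, an entity that meets the genus and every differentia can be classified under the defined term. A minimal sketch, assuming a hypothetical in-memory representation (the `IS_A` links and `DEFINITIONS` table are illustrative, not taken from CL):

```python
# Hypothetical is_a links (child -> parents) used to test the genus.
IS_A = {"B cell": ["lymphocyte"], "lymphocyte": ["leukocyte"], "leukocyte": ["cell"]}

# Genus-differentia definition from the slide:
# "A B cell is a lymphocyte that expresses an immunoglobulin complex."
DEFINITIONS = {
    "B cell": ("lymphocyte", {("expresses", "immunoglobulin complex")}),
}

def ancestors_or_self(c: str) -> set:
    out, stack = {c}, [c]
    while stack:
        for p in IS_A.get(stack.pop(), []):
            if p not in out:
                out.add(p)
                stack.append(p)
    return out

def meets_definition(asserted_class: str, relations: set, defined: str) -> bool:
    """True if an entity meets the genus AND every differentia of `defined`.

    Because the definition is necessary and sufficient, meeting it is enough
    to classify the entity under `defined`.
    """
    genus, differentia = DEFINITIONS[defined]
    return genus in ancestors_or_self(asserted_class) and differentia <= relations

expressing = {("expresses", "immunoglobulin complex")}
print(meets_definition("lymphocyte", expressing, "B cell"))  # True
print(meets_definition("lymphocyte", set(), "B cell"))       # False: differentia missing
print(meets_definition("leukocyte", expressing, "B cell"))   # False: genus not established
```

This sketch matches differentia fillers exactly; a real reasoner would also accept subclasses of the filler.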
Orthogonality of ontologies • No two ontologies should represent the same kind of entity • E.g. “B-cell” should only be represented in one ontology • Related entities should be coordinated across ontologies • GO: “B-cell differentiation” • Exceptions: • The term “cell” connects GO Cellular Component (cell parts) and CL (cells) • Advantages: • Reduces redundancy and work • Easier to make the union consistent
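A crude first check for orthogonality is to look for the same label in more than one ontology. The sketch below is a heuristic only: real overlap detection would also need synonyms and definitions. The label sets and the ontology named `LocalAnat` are hypothetical, and "cell" is whitelisted to mirror the exception on this slide.

```python
from collections import defaultdict

# Hypothetical label sets; LocalAnat is an invented ontology that duplicates a CL class.
ONTOLOGIES = {
    "CL": {"B cell", "hepatocyte", "cell"},
    "GO": {"B cell differentiation", "cell"},
    "LocalAnat": {"hepatocyte", "liver"},
}
ALLOWED_SHARED = {"cell"}  # the slide's sanctioned exception

def shared_labels(ontologies: dict, allowed: set) -> dict:
    """Map each label found in more than one ontology to the ontologies using it."""
    owners = defaultdict(set)
    for name, labels in ontologies.items():
        for label in labels:
            owners[label.lower()].add(name)
    return {label: sorted(names) for label, names in owners.items()
            if len(names) > 1 and label not in allowed}

print(shared_labels(ONTOLOGIES, ALLOWED_SHARED))  # {'hepatocyte': ['CL', 'LocalAnat']}
```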
Some OBO terms • bile • liver • liver development • obesity • fat body • hepatic artery • oenocyte • oenocyte differentiation • hepatoma • hepatocyte • insulin • increased circulating glucose level • carbohydrate metabolism • glucose • glycogen
[Figure: the same terms, labelled by source ontology: FMA (adult human), MP (mammal phenotype), GO (biological process), FBbt (fly), MA (mouse), DO, CL, PRO and CHEBI]
How should we organize this? • [Figure: the same terms, labelled by source ontology]
Top-level organisation (BFO: Basic Formal Ontology) • Levels of granularity (scale) • Population • Organism • Organ • Cell • Molecule • part_of relations can cross levels • General categories • 3D things (continuants) • Independent • Cells, organs, molecules • Dependent • Shapes, sizes, concentrations, … • 4D things (processes) • Processes • Useful organisational principle for OBO • is_a and part_of should not cross top-level categories
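The rule that is_a and part_of should not cross top-level categories is mechanically checkable once each term is assigned a category. A minimal lint sketch, assuming a hypothetical term-to-category table and a hypothetical edge list (the third edge is deliberately wrong):

```python
# Hypothetical assignment of a few slide terms to top-level categories.
CATEGORY = {
    "hepatocyte": "independent continuant",
    "liver": "independent continuant",
    "liver development": "process",
    "carbohydrate metabolism": "process",
    "metabolic process": "process",
}

# Hypothetical edges; the last one wrongly makes a process part of an organ.
EDGES = [
    ("hepatocyte", "part_of", "liver"),
    ("carbohydrate metabolism", "is_a", "metabolic process"),
    ("liver development", "part_of", "liver"),
]

def crossing_edges(edges, category, relations=("is_a", "part_of")):
    """Return is_a/part_of edges whose two ends fall in different top-level categories."""
    return [(s, r, o) for s, r, o in edges
            if r in relations and category[s] != category[o]]

print(crossing_edges(EDGES, CATEGORY))  # [('liver development', 'part_of', 'liver')]
```

Note that "hepatocyte part_of liver" crosses granularity levels (cell to organ), which the slide allows, but stays within one category, so it passes.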
[Figure: the same terms and source ontologies, arranged into three columns: Objects, Qualities etc., and Processes]
The OBO Foundry can help with modular ontology design • Biology is complex • So our ontologies will be complex • Multiple purposes • Multiple means of classifying • Separate out different aspects • Modular approach • Avoid multiple inheritance (>1 is_a parent) • Don’t over-use is_a • Don’t cross aspects with is_a • Make complex descriptions from simpler parts • Polyhierarchies arise from composition
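The advice to avoid asserting more than one is_a parent can be turned into a simple check. A minimal sketch over hypothetical asserted links, loosely modelled on the cysteine example that follows (not copied from GO):

```python
from collections import Counter

# Hypothetical asserted is_a links (child, parent).
ASSERTED_IS_A = [
    ("cysteine biosynthetic process", "amino acid biosynthetic process"),
    ("cysteine biosynthetic process", "sulfur compound biosynthetic process"),
    ("amino acid biosynthetic process", "biosynthetic process"),
    ("sulfur compound biosynthetic process", "biosynthetic process"),
]

def multi_parent_terms(links, max_parents=1):
    """Terms with more asserted is_a parents than `max_parents`."""
    counts = Counter(child for child, _parent in links)
    return sorted(term for term, n in counts.items() if n > max_parents)

print(multi_parent_terms(ASSERTED_IS_A))  # ['cysteine biosynthetic process']
```

Flagged terms are candidates for a logical definition from which the extra parents can be inferred rather than asserted.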
[Figures: Cysteine biosynthesis (trimmed), shown first as a tangled GO polyhierarchy, then separated into a process axis and a chemical structure axis, then linked to ChEBI (trimmed)]
We can do more than simply link terms: cross-products (aka logical definitions, or computable genus-differentia definitions) • [Figure: Cysteine biosynthesis (trimmed) alongside ChEBI (trimmed)]
Cysteine biosynthesis GO:0019344 = a biosynthetic process GO:0009058 (genus) that results_in_creation_of cysteine CHEBI:13536 (differentia) • [Figure: Cysteine biosynthesis (trimmed) and ChEBI (trimmed)]
Cysteine biosynthetic process = biosynthetic process that results_in_change_to cysteine
Let the computer do the work • Given cross-products, a reasoner can add all the implied links • The underlying representation is normalized
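For cross-products of the single-differentia form "G that R F", an is_a link between two defined classes follows when their relations match, the first genus is (or is below) the second genus, and the first filler is (or is below) the second filler. The sketch below implements only that structural pattern; it is not a full OWL reasoner. The trimmed hierarchies and the two extra defined classes are hypothetical, not copied from GO or ChEBI.

```python
# Hypothetical, trimmed hierarchies (child -> parents).
GO_IS_A = {"biosynthetic process": []}
CHEBI_IS_A = {
    "cysteine": ["sulfur-containing amino acid"],
    "sulfur-containing amino acid": ["amino acid"],
    "amino acid": [],
}

# Cross-products: defined class -> (genus, relation, differentia filler).
XP = {
    "cysteine biosynthetic process":
        ("biosynthetic process", "results_in_creation_of", "cysteine"),
    "sulfur amino acid biosynthetic process":
        ("biosynthetic process", "results_in_creation_of", "sulfur-containing amino acid"),
    "amino acid biosynthetic process":
        ("biosynthetic process", "results_in_creation_of", "amino acid"),
}

def reflexive_ancestors(term: str, is_a: dict) -> set:
    out, stack = {term}, [term]
    while stack:
        for p in is_a.get(stack.pop(), []):
            if p not in out:
                out.add(p)
                stack.append(p)
    return out

def infer_is_a(xp: dict, genus_is_a: dict, filler_is_a: dict) -> set:
    """A is_a B when relations match, A's genus is_a* B's genus and A's filler is_a* B's filler."""
    inferred = set()
    for a, (ga, ra, fa) in xp.items():
        for b, (gb, rb, fb) in xp.items():
            if (a != b and ra == rb
                    and gb in reflexive_ancestors(ga, genus_is_a)
                    and fb in reflexive_ancestors(fa, filler_is_a)):
                inferred.add((a, b))
    return inferred

for child, parent in sorted(infer_is_a(XP, GO_IS_A, CHEBI_IS_A)):
    print(f"{child} is_a {parent}")
```

None of the process links are asserted here; they all come from the chemical hierarchy plus the definitions, which is the sense in which polyhierarchies arise from composition.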
Try not to assert too many is_a parents • [Figure: CL example, marked X]
Reuse existing ontologies • Non-is_a relation • [Figure: CL and GO terms, with links marked X and ? and a 'has function' link]
How CL can use other OBO ontologies • GO Cellular component • Mononuclear phagocyte • B cell (expresses immunoglobulin complex) • GO Biological process • Photosynthetic cell • PATO Qualities • Spiny neuron • CHEBI Chemical entities • X secreting cell • Anatomy ontologies • CNS neuron • Molecular function, PRO • CD4 positive cell
Results • Biological process x CL • http://wiki.geneontology.org/index.php?XP:biological_process_xp_cell • Uncovered inconsistencies between GO and CL • Oenocyte differentiation is_a columnar/cuboidal epithelial cell differentiation • MP x CL • http://wiki.geneontology.org/index.php/XP:mammalian_phenotype_xp • Resulted in various fixes to MP
Summary • The Cell Ontology is a representation of the types of cell that exist • The OBO Foundry provides • Principles • A framework for connecting ontologies • There are many points of coordination between CL and other OBO ontologies • CL could benefit from the gradual introduction of a modular approach