How to build cross-species interoperable ontologies Chris Mungall, LBNL Melissa Haendel, OHSU
The challenge.. • There are many fun and interesting issues involved in building and using cross-species ontologies • homology • evo-devo • reasoning using ontologies • connecting genomics databases to phenotypes
but… • Unfortunately, there are many more prosaic issues with unsatisfying solutions • multiple ontologies already exist • limited cooperation between the developers of these ontologies • they differ widely in every aspect imaginable • they are heavily embedded in existing databases and applications and slow to change • tools and infrastructure support falls short of what we need • FORTUNATELY, solutions are emerging..
Outline • Anatomy Ontologies: Background • Case studies • GO: A unified cross-species ontology • CL: Cell Ontology: Unifying multiple existing efforts • Building interoperable gross anatomy ontologies • (Melissa)
Ontologies • Computable qualitative representations of some part of the world • Relationships with computable properties • e.g. transitivity • languages and formats like owl and obo have a formal semantics • Entities are grouped into classes • Relationships are statements about all the members of a class • the most common form is the all-some statement
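As a minimal sketch of what a "computable property" such as transitivity buys us, the following Python computes every part_of statement implied by a handful of asserted ones. The function name and the tiny edge list are illustrative assumptions, not part of any ontology toolkit.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return all (child, ancestor) pairs implied by a transitive relation."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    closure = set()
    for start in list(parents):
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            # .get avoids creating empty entries in the defaultdict
            stack.extend(parents.get(node, ()))
    return closure


# Asserted all-some statements, read as "every X is part_of some Y".
PART_OF = [
    ("nucleolus", "nucleus"),
    ("nucleus", "cell"),
]

inferred = transitive_closure(PART_OF)
print(sorted(inferred))
```

The pair ("nucleolus", "cell") is never asserted; it follows only because part_of is declared transitive.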
Ontologies are not smart • Deductive Logic is not flexible • Example • Human knowledge: • chromosomes are found in the nucleus • Naïve ontology encoding: • every chromosome part_of some nucleus • But this is wrong (a mitochondrial chromosome is not part of any nucleus) • Ontologies don’t make exceptions! • Solution: • (1) create location-specific subclasses • nuclear chromosome • mitochondrial chromosome • (2) invert the statement: every nucleus has_part some chromosome
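The chromosome example can be sketched as a check of an all-some statement against a few individuals. This is a toy illustration under stated assumptions: the `Individual` class, the `all_some_holds` helper and the four individuals are hypothetical, and a real reasoner works on class axioms rather than by enumerating instances.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    name: str
    types: set
    part_of: list = field(default_factory=list)  # other Individual objects


def all_some_holds(individuals, subject_cls, target_cls):
    """Check 'every subject_cls part_of some target_cls' over the individuals.

    Returns (holds, names_of_counterexamples).
    """
    counterexamples = [
        ind.name
        for ind in individuals
        if subject_cls in ind.types
        and not any(target_cls in whole.types for whole in ind.part_of)
    ]
    return (not counterexamples), counterexamples


nucleus = Individual("nucleus-1", {"nucleus"})
mitochondrion = Individual("mitochondrion-1", {"mitochondrion"})
chr1 = Individual("chr1", {"chromosome", "nuclear chromosome"}, [nucleus])
mt_chr = Individual(
    "mt-chromosome", {"chromosome", "mitochondrial chromosome"}, [mitochondrion]
)
world = [nucleus, mitochondrion, chr1, mt_chr]

naive = all_some_holds(world, "chromosome", "nucleus")
fixed = all_some_holds(world, "nuclear chromosome", "nucleus")
print(naive, fixed)
```

A single counterexample is enough to falsify the naïve statement, while the statement about the location-specific subclass holds for every individual in the toy world.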
Existing Anatomy Ontologies • Human AOs • Model Organism AOs • Domain specific AOs • Cross-species AOs
FMA: Foundational Model of Anatomy • Domain: adult human • no develops_from relationships, few embryonic structures • Size: large (70k+ classes) • Language: frames • Approach • formal, strict single inheritance, purely structural perspective • No computable definitions • Heavily pre-coordinated • “Trunk of communicating branch of zygomatic branch of right facial nerve with zygomaticofacial branch of right zygomatic nerve” • “Distal epiphysis of distal phalanx of right little toe” • Extensive spatial relationships in selected areas • e.g. veins, arteries • Uses • not designed for one particular use
FMA Example
FMA:62955 ! Anatomical entity
 is_a FMA:61775 ! Physical anatomical entity
  is_a FMA:67165 ! Material anatomical entity
   is_a FMA:67135 ! Anatomical structure
    is_a FMA:67498 ! Organ
     is_a FMA:55670 ! Solid organ
      is_a FMA:55661 ! Parenchymatous organ
       is_a FMA:55662 ! Lobular organ
        is_a FMA:13889 ! Pituitary gland
        is_a FMA:20020 ! Vestibular gland
        is_a FMA:55533 ! Accessory thyroid gland
        is_a FMA:58090 ! Areolar gland
        is_a FMA:59101 ! Lacrimal gland
        is_a FMA:62088 ! Lactiferous gland
        is_a FMA:7195 ! Lung
        is_a FMA:7197 ! Liver
        is_a FMA:7198 ! Pancreas
        is_a FMA:7210 ! Testis
        is_a FMA:76835 ! Accessory pancreas
        is_a FMA:9597 ! Salivary gland
        is_a FMA:9599 ! Bulbo-urethral gland
        is_a FMA:9600 ! Prostate
        is_a FMA:9603 ! Thyroid gland
Model Organism Anatomy Ontologies • Typically species-centric • FBbt: Drosophila melanogaster • WBbt: C. elegans • ZFA: Danio rerio • XAO: Xenopus • MA: Adult Mouse (no develops_from) • EMAP/EMAPA: developing mouse • Uses • primarily gene expression, also phenotype description • others: Virtual Fly Brain, Phenoscape • Approach: • use-case driven • practicality over formality • No computable definitions • (exception: FBbt)
Other anatomy ontologies • Developing human • EHDAA2 • Vectors • TGMA – mosquito • TADS – tick • Upper ontologies • CARO • AEO • Domain-specific anatomy ontologies • NIF_Anatomy, NIF_Cell – neuroscience • Phylogenetic or multi-taxon AOs • HAO – Hymenoptera • PO – plant • TAO – teleost • AAO – amphibian • SPD – spider • … • we will return to these later..
Problem • These AOs are not developed in a coordinated fashion • use of a shared upper ontology does not buy us much • even the 3 mammalian AOs are massively different • Data annotated using these ontologies effectively becomes siloed • There is redundancy of effort in areas of shared biology • Are there lessons from existing ontologies?
Building ontologies that are interoperable across species • Case Studies • GO • Cell Ontology
Gene Ontology • Covers all kingdoms of life • viruses, bacteria, archaea • fungi, metazoans, plants • Covers biology at different scales • Issues • terminological confusion (e.g. “blood”) • large, difficult to maintain
How does GO deal with taxonomic variation? • What GO says: • every nucleus is part_of some cell • What GO does not say: • every cell has_part some nucleus • wrong for bacteria (and mammalian erythrocytes) • Take home: • Logical quantifiers are essential to understanding the ontology • Saying what something is part of is safer than saying what its parts are
Principle: avoidance of taxonomic differentia • Not in GO: • vertebrate eye development • insect eye development • cephalopod eye development • In GO: • eye development • camera-type eye development • compound eye development • Exceptions for usability (no implication of homology): • cell wall • fungal-type cell wall [differentia: cross-linked glycoproteins and carbohydrates, chitin / beta-glucan …] • plant-type cell wall [differentia: cellulose, pectin, …]
The problem of vagueness in GO • “limb development” • “wing development”
Adding taxonomic constraints to GO • GO now includes two additional relations • only_in_taxon • never_in_taxon • See: • Kusnierczyk, W: Taxonomy-based partitioning of the Gene Ontology, JBI 2008 • Deegan et al: Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development, BMC Bioinformatics 2010
Examples • lactation only_in_taxon Mammalia (NCBITaxon:40674) • OWL: lactation in_taxon only Mammalia • odontogenesis never_in_taxon Aves (NCBITaxon:8782) • OWL: odontogenesis in_taxon only not Aves • chloroplast only_in_taxon (Viridiplantae or Euglenozoa) (NCBITaxon:33090 or NCBITaxon:33682)
Uses of taxon relationships • Clarifying meaning of GO terms • Detection of errors in electronic and manual annotation • Automated reasoners • GO previously had chicken genes involved in lactation, slime mold genes involved in fin regeneration… • Providing views over GO • e.g. subset of GO excluding terms that are never in Drosophila
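Error detection with taxon constraints can be sketched as follows. The two constraints come from the examples slide; the drastically simplified taxonomy, the gene names and the `check_annotations` helper are hypothetical. A fuller version would also propagate each constraint down the is_a hierarchy of GO terms, which this sketch omits.

```python
# Toy taxonomy: child taxon -> parent taxon (heavily simplified lineage).
TAXON_PARENT = {
    "Gallus gallus": "Aves",
    "Aves": "Vertebrata",
    "Homo sapiens": "Mammalia",
    "Mammalia": "Vertebrata",
    "Vertebrata": "Metazoa",
}

# Constraints taken from the examples in the slides.
ONLY_IN_TAXON = {"lactation": "Mammalia"}
NEVER_IN_TAXON = {"odontogenesis": "Aves"}


def lineage(taxon):
    """Return the taxon followed by all of its ancestors."""
    out = [taxon]
    while taxon in TAXON_PARENT:
        taxon = TAXON_PARENT[taxon]
        out.append(taxon)
    return out


def check_annotations(annotations):
    """Return (gene, term, taxon, violated_constraint) for each bad annotation."""
    problems = []
    for gene, term, taxon in annotations:
        ancestors = lineage(taxon)
        required = ONLY_IN_TAXON.get(term)
        if required is not None and required not in ancestors:
            problems.append((gene, term, taxon, f"only_in_taxon {required}"))
        excluded = NEVER_IN_TAXON.get(term)
        if excluded is not None and excluded in ancestors:
            problems.append((gene, term, taxon, f"never_in_taxon {excluded}"))
    return problems


# Hypothetical annotations: (gene, GO term name, species).
ANNOTATIONS = [
    ("gene_a", "lactation", "Homo sapiens"),
    ("gene_b", "lactation", "Gallus gallus"),
    ("gene_c", "odontogenesis", "Gallus gallus"),
    ("gene_d", "odontogenesis", "Homo sapiens"),
]

problems = check_annotations(ANNOTATIONS)
for problem in problems:
    print(problem)
```

Only the two chicken annotations are flagged: one breaks the only_in_taxon constraint on lactation, the other the never_in_taxon constraint on odontogenesis.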
Scalability of single-ontology approach: GO • How does GO cope with wide taxonomic diversity? • conservation at molecular level, wide diversity of phenotypes at level of gross anatomical development, physiology, and organismal behavior • GO Development • Focused on model systems • “beak development” added only recently • GO Behavior • Very broad coverage • Some specific terms, e.g. Drosophila courtship
Ontology Views • Ontologies, traditional • independent standalone resources • Ontologies, new • interconnected resources • multiple views possible • Subsetting • Aggregation • Subsetting + Aggregation • views can be manually specified (e.g. go slims) or automatically constructed • Limited re-writing possible • e.g. names
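Subsetting and aggregation can be sketched over ontologies held as plain dictionaries. Everything here is an assumption made for illustration: the dictionary layout, the `aggregate` and `subset` helpers, and the TOY:/FLY: identifiers. The subset step reconnects each kept term to its nearest kept ancestors, which is roughly what a slim needs in order to stay connected.

```python
def aggregate(*ontologies):
    """Merge several {term_id: term} dictionaries into one view."""
    merged = {}
    for onto in ontologies:
        merged.update(onto)
    return merged


def nearest_kept_ancestors(onto, term_id, keep):
    """Walk up is_a links until a kept term is reached on each path."""
    found, seen = set(), set()
    stack = list(onto[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent in seen or parent not in onto:
            continue
        seen.add(parent)
        if parent in keep:
            found.add(parent)
        else:
            stack.extend(onto[parent]["is_a"])
    return found


def subset(onto, keep):
    """Keep only the terms in `keep`, rewiring is_a past dropped terms."""
    return {
        tid: {
            "name": onto[tid]["name"],
            "is_a": sorted(nearest_kept_ancestors(onto, tid, keep)),
        }
        for tid in onto
        if tid in keep
    }


# Two toy ontologies with hypothetical identifiers.
GENERIC = {
    "TOY:cell": {"name": "cell", "is_a": []},
    "TOY:contractile_cell": {"name": "contractile cell", "is_a": ["TOY:cell"]},
    "TOY:muscle_cell": {"name": "muscle cell", "is_a": ["TOY:contractile_cell"]},
}
FLY = {
    "FLY:muscle_cell": {"name": "fly muscle cell", "is_a": ["TOY:muscle_cell"]},
}

view = subset(
    aggregate(GENERIC, FLY),
    keep={"TOY:cell", "TOY:muscle_cell", "FLY:muscle_cell"},
)
print(view)
```

The dropped intermediate term disappears from the view, but "muscle cell" is still reachable from "cell", so the subset remains a usable hierarchy.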
Views [Figure: kinds of view over interconnected ontologies: “slim” subset, aggregate, aggregate+subset, scattered subset, domain/taxon-specific cut]
Subset of GO
vertebrate subset
Outline • Case studies • GO: A unified cross-species ontology • Cell Ontology: Unifying multiple existing efforts • Gross Anatomy
Cell types • GO-Cell Component • cell parts • CL – cell ontology • Anatomical Ontologies • Includes cell types: • FBbt (Drosophila) • WBbt (C. elegans) • ZFA, TAO (Danio rerio, Teleost) • FMA (Human) • PO (Plant) • FAO (Fungi) • Excludes cell types: • MA (adult mouse) • EMAPA (developing mouse) • EHDAA2 (developing human)
Overlap (simplified view) [Figure: overlapping coverage of CL, PO, ZFA, FMA, NIF cell and MA; example terms: brain, plant spore, alveolar macrophage, lung, neuron]
The Problem • Duplicated work • No unified view • Confusion for users • Confusion for annotators
Alternative proposals • LUMP: Combine into one monolithic CL ontology • SPLIT: Taxon-specific cell types in taxon-centric anatomy ontologies (tcAOs), with two variants: • Obsolete generic cell types currently in tcAOs • versus: Taxon-specific subclasses of generic cell types
LUMP [Figure: one ontology of all cells covering plants, fish, human and mouse; example terms: plant spore, alveolar macrophage, neuron]
CL Lumping proposal • Advantages: • one stop shopping for CL • (but this can be done with aggregate views) • Disadvantages • tcAO IDs well-established • Little advantage to lumping plant cells with animal cells • Harder to manage editorially • Cross-granular relationships
(Partial) Splitting proposal • Advantages: • Easier to manage • Sensible subdivision of labor: • Common cell types in shared common cell ontology • e.g. shared definition of “neuron” • Taxon-specific subtypes in taxon-centric ontologies • Disadvantages • Aggregate view is problematic • union of ontologies contains multiple classes labeled “neuron” • Can be solved by obsoleting existing generic cell classes in tcAOs and replacing by CL IDs • problem: cross-granular relationships
Current solution for CL: split and retain IDs • Any cell type shared by two model taxa should be in CL • tcAOs retain both generic and specific cell type classes • Formally connected to CL via subclass relationships • or even stronger: taxon-specific equivalent
Example aggregate view [Figure: CL-metazoa view aggregating FMA, CL and FBbt; nodes: cell, muscle cell, muscle organ, frontal pulsatile organ muscle; edges labeled i and p]
Example aggregate+subset view [Figure: CL-metazoa view over FMA, CL and FBbt restricted to cell types; nodes: cell, muscle cell, frontal pulsatile organ muscle; edges labeled i]
Who maintains the connections and how? • How: • maintained as xrefs for convenience • Who: • either tcAO or CL • Synchronization? • hard • reasoning over aggregate view
Who maintains the connections?

cl.obo (cl’s responsibility)
[Term]
id: CL:0000584
name: enterocyte
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:mhl]
xref: FMA:62122
is_a: CL:0000239 ! brush border epithelial cell

zfa.obo (zfa’s responsibility)
[Term]
id: ZFA:0009269
name: enterocyte
namespace: zebrafish_anatomy
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:curator]
synonym: "enterocytes" EXACT PLURAL []
xref: CL:0000584
xref: TAO:0009269
xref: ZFIN:ZDB-ANAT-070308-209
is_a: ZFA:0009143 ! brush border epithelial cell
relationship: end ZFS:0000044 ! Adult
relationship: part_of ZFA:0005124 ! intestinal epithelium
relationship: start ZFS:0000000 ! Unknown
Issues with aggregate view • duplicate names • lattices = hairballs [Figure: aggregate view over FMA, CL and FBbt in which cell and muscle cell each appear once per ontology, plus frontal pulsatile organ muscle; edges labeled i]
Duplicate names • Searching for “muscle cell” returns • CL:0000187 ! muscle cell • FBbt:00005074 ! muscle cell • FMA:67328 ! muscle cell • ZFA:0009114 ! muscle cell • NIF_Cell:sao519252327 ! Muscle Cell • Proposed solutions • rename in source ontology • yuck • make end-user applications smarter • not practical for n applications • auto-rename in ontology view • best solution
Aggregate view (cl-metazoa.obo)

[Term]
id: CL:0000584
name: enterocyte
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:mhl]
xref: FMA:62122
is_a: CL:0000239 ! brush border epithelial cell

[Term]
id: ZFA:0009269
name: zebrafish enterocyte
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:curator]
synonym: "enterocytes" EXACT PLURAL []
xref: CL:0000584
xref: TAO:0009269
xref: ZFIN:ZDB-ANAT-070308-209
is_a: CL:0000584 ! enterocyte
is_a: ZFA:0009143 ! brush border epithelial cell
relationship: end ZFS:0000044 ! Adult
relationship: part_of ZFA:0005124 ! intestinal epithelium
relationship: start ZFS:0000000 ! Unknown

• name: rewritten name (or added as a synonym: TBD) • is_a: CL:0000584 is generated from the xref • FMA class not shown, but it would also become a subclass • the result is a lattice
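The rewrite that produces this aggregate view can be sketched in a few lines. The identifiers, names and xrefs are copied from the stanzas in the slides; the dictionary layout, the `TAXON_LABEL` prefix table and the `build_aggregate_view` helper are assumptions for illustration and not the tooling actually used to build cl-metazoa.obo.

```python
# Assumed mapping from ID prefix to the label used when renaming.
TAXON_LABEL = {"ZFA": "zebrafish", "FBbt": "Drosophila", "FMA": "human"}

CL_TERMS = {
    "CL:0000584": {
        "name": "enterocyte",
        "is_a": ["CL:0000239"],
        "xrefs": ["FMA:62122"],
    },
}

ZFA_TERMS = {
    "ZFA:0009269": {
        "name": "enterocyte",
        "is_a": ["ZFA:0009143"],
        "xrefs": ["CL:0000584", "TAO:0009269", "ZFIN:ZDB-ANAT-070308-209"],
    },
}


def build_aggregate_view(cl_terms, tcao_terms):
    """Combine CL with a taxon-centric ontology.

    For every taxon-centric term that xrefs a CL class:
      * add an is_a link to that CL class (generated from the xref)
      * prefix the name with a taxon label if it collides with a CL name
    The input dictionaries are left unmodified.
    """
    view = {tid: dict(term) for tid, term in cl_terms.items()}
    cl_names = {term["name"] for term in cl_terms.values()}
    for tid, term in tcao_terms.items():
        prefix = tid.split(":", 1)[0]
        is_a = list(term["is_a"])
        for xref in term.get("xrefs", []):
            if xref in cl_terms and xref not in is_a:
                is_a.insert(0, xref)
        name = term["name"]
        if name in cl_names and prefix in TAXON_LABEL:
            name = f"{TAXON_LABEL[prefix]} {name}"
        view[tid] = dict(term, name=name, is_a=is_a)
    return view


view = build_aggregate_view(CL_TERMS, ZFA_TERMS)
print(view["ZFA:0009269"]["name"], view["ZFA:0009269"]["is_a"])
```

Doing the rename inside the view keeps the source ontologies untouched, which is the reason the slides prefer auto-renaming over editing names at the source or patching every end-user application.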
Summary: taxon variation in CL • Current solution is a compromise • Constraints • integrate with pre-existing tcAOs • these ontologies have links to gross anatomy • tcAOs loosely integrated with CL • plant cell types should be left to PO • Synchronization remains a challenge
Lessons for gross anatomy [Figure: sample of terms arranged by taxonomic level, connected by imports and cross-ontology links. Levels: caro / all, metazoa, mollusca, arthropoda, vertebrata, cephalopod, drosophila, teleost, mammalia, amphibia, mouse, human, zebrafish. Terms shown: cell, tissue, skeleton, nervous system, gut, gonad, appendage, circulatory system, gland, mesoderm, respiratory airway, larva, muscle tissue, skeletal tissue, trachea, bone, mantle, mushroom body, limb, fin, vertebra, tibia, shell, cuticle, vertebral column, foot, antenna, mesonephros, parietal bone, tentacle, neuron types XYZ, weberian ossicle, mammary gland, tibiafibula, brachial lobe, “NO pons”]
Conclusions • Historically anatomy ontologies have been developed by different groups largely in isolation • The Phenotype RCN should coordinate these efforts • Dynamic Views • Explicit taxonomic relationships
Idealized model (M0) • A single ontology for ontology editors and consumers • Different editors have editing rights to different ontology partitions • by taxon • by domain (e.g. neuroscience, skeletal anatomy) • No taxon-specific subtypes • use structure, function etc as differentia • Users obtain dynamic views according to their needs
Example M0 [Figure: a single ontology (small sample of terms and links) with overlapping views drawn over it: user/editor view, mammalian view, mollusc view, neuro view, skeletal view. Terms shown: ventral nerve cord, cell, tissue, mesoderm, gut, circulatory system, gonad, appendage, larva, gland, respiratory airway, muscle tissue, skeletal tissue, nervous system, trachea, bone, mantle, limb, fin, vertebra, tibia, pons, vertebral column, mushroom body, mollusc foot, parietal bone, metencephalon, mesonephros, antenna, mammary gland, weberian ossicle, tentacle, pupal DN3 period neuron, tibiafibula, brachial lobe]
Slightly less idealized model (M1) • Maintain series of ontologies at different taxonomic levels • euk, plant, metazoan, vertebrate, mollusc, arthropod, insect, mammal, human, drosophila • Each ontology imports/MIREOTs relevant subset of ontology “above” it • this is recursive • Subtypes are only introduced as needed • Work together on commonalities at appropriate level above your ontology
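The recursive import in M1 can be sketched as a walk up an import graph. The level names follow the list on the slide; the shape of the import graph, the placement of the sample terms at each level and both helper functions are illustrative assumptions.

```python
# Which ontology each level imports (assumed layering, one parent per level).
IMPORTS = {
    "human": ["mammal"],
    "mammal": ["vertebrate"],
    "vertebrate": ["metazoan"],
    "drosophila": ["insect"],
    "insect": ["arthropod"],
    "arthropod": ["metazoan"],
    "mollusc": ["metazoan"],
    "metazoan": ["euk"],
    "plant": ["euk"],
    "euk": [],
}

# Terms introduced at each level (illustrative placement only).
LOCAL_TERMS = {
    "euk": {"cell"},
    "metazoan": {"gut", "nervous system"},
    "vertebrate": {"bone", "vertebral column"},
    "mammal": {"mammary gland"},
    "arthropod": {"cuticle"},
    "insect": {"antenna"},
    "drosophila": {"mushroom body"},
}


def import_closure(name):
    """Return every ontology imported by `name`, nearest level first."""
    ordered = []

    def visit(current):
        for parent in IMPORTS.get(current, []):
            if parent not in ordered:
                ordered.append(parent)
                visit(parent)

    visit(name)
    return ordered


def visible_terms(name):
    """Terms available to editors of `name`: local terms plus all imports."""
    terms = set(LOCAL_TERMS.get(name, set()))
    for imported in import_closure(name):
        terms |= LOCAL_TERMS.get(imported, set())
    return terms


print(import_closure("human"))
print(sorted(visible_terms("drosophila")))
```

Under this layering a human anatomy editor sees "mammary gland" and "gut" but never "mushroom body", and shared terms such as "gut" are defined once at the metazoan level rather than once per species.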