PATO: An Ontology of Phenotypic Qualities. George Gkoutos, University of Cambridge
Phenotype Information • Literature: qualitative descriptions • Experimental data: qualitative and quantitative descriptions • Various representation methodologies • Complex phenotype data • Need for: “A platform for facilitating mutual understanding and interoperability of phenotype information across species and domains of knowledge amongst people and machines”
Assay Controlled Vocabulary • Abnormality • Relative_to • Ranges of values • Allows the schema to be dynamic • Definition of qualities and their relations • Explicit differences (between laboratories) • Allows labs around the world to “plug in” their assays to the schema [Diagram: one Assay and three Phenotypic Characters]
Phenotypic character representation methodologies • Pre-composition • Examples: • MGI mouse genotype-phenotype annotation (Mammalian Phenotype ontology) • Gramene trait annotation (Plant Trait Ontology) • etc. • Pre-composition often follows the compositional structure occasionally adopted by GO terms: • positive/negative regulation of mitosis = positive/negative + regulation of mitosis (GO:0045839) • increased/decreased angiogenesis = increased/decreased + angiogenesis (GO:0001525) • Advantages: easy for annotation; control; can capture complex phenotypic information • Disadvantages: lack of rigidity; harder ontology management and expansion; poor support for quantitative data
Methodologies (cont.) • Post-composition • The post-composition methodology describes a phenotype by naming the particular affected entity (the bearer), which could be an anatomical structure, a biological process, a particular function, etc., together with the qualities that this entity possesses, described in either qualitative or quantitative terms. • Advantages: easier ontology management; rigidity; expansion; support for quantitative data; advanced queries • Disadvantages: complex phenotypic information is harder to capture; annotation is more difficult; constraints are needed to ensure meaningful annotations
Phenotype And Trait Ontology (PATO) • An ontology of phenotypic qualities that can be shared across different species and domains of knowledge. • Qualities are the basic entities that we can perceive and/or measure: • colors, sizes, masses, lengths, etc. • Qualities inhere in entities: every entity comes with certain qualities, which exist as long as the entity exists. • Qualities belong to a finite set of quality types (e.g. color, size) and inhere in specific individuals. No two individuals can share the same quality instance, and each quality is specifically constantly dependent on the entity it inheres in.
Phenotypic Character = Entity (E), from core ontologies (e.g. anatomy, behaviour, pathology) + Quality (Q), from species-independent PATO → EQ Phenotype Description
Phenotypic Character = entity + quality
• mouse body weight = mouse anatomy: body + PATO: weight
• eye colour = Drosophila anatomy: eye + PATO: colour
• glucose concentration = ChEBI: glucose + PATO: concentration
Simple phenotype descriptions
• increased size hepatocellular carcinoma = hepatocellular carcinoma (MPATH:357) has_quality increased size (PATO:0000586)
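An EQ phenotypic character can be held as a small immutable record. This is a minimal sketch: the class name `EQ` and its fields are hypothetical, while the term labels and the MPATH/PATO IDs are the ones on the slide. Making the record frozen gives value equality and hashability, so EQ statements can later be collected into sets and compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """A phenotypic character: an entity (E) plus a PATO quality (Q)."""

    entity: str   # bearer, from a core ontology (anatomy, ChEBI, pathology, ...)
    quality: str  # quality, from PATO

    def __str__(self) -> str:
        return f"E={self.entity}, Q={self.quality}"


# Phenotypic characters from the slide, written with term labels.
mouse_body_weight = EQ("mouse anatomy: body", "PATO: weight")
eye_colour = EQ("Drosophila anatomy: eye", "PATO: colour")
glucose_concentration = EQ("ChEBI: glucose", "PATO: concentration")

# Simple phenotype description, written with the IDs given on the slide.
increased_size_hcc = EQ("MPATH:357", "PATO:0000586")

print(increased_size_hcc)  # E=MPATH:357, Q=PATO:0000586
```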
Phenotype annotation model [Diagram components: Genetic, Environment, Assertion, Entity, Quality, relationship, Qualifier, Properties, Units, Evidence, Source, Attribution (who makes the assertion; when; what organization)]
Annotation: phenotypes in literature
• Assertion: eya1 influences a phenotype in which the eye disc (E, FBbt:00001768) appears condensed (Q, PATO:0001485)
• Evidence: light microscopy
• Source: PMID:8431945
• Attribution: M. Ashburner, FlyBase, 10/26/2007, version 1
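The annotation model above can be sketched as a pair of records: one for the attribution (who, when, what organization) and one for the assertion with its evidence and source. This is an illustrative sketch only; the class and field names are hypothetical and do not reproduce any FlyBase or PATO schema. The values are those of the eya1 example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Attribution:
    curator: str       # who makes the assertion
    organization: str  # on behalf of which organization
    made_on: date      # when the assertion was made


@dataclass(frozen=True)
class PhenotypeAnnotation:
    subject: str       # genetic element the assertion is about
    relation: str      # how the subject relates to the phenotype
    entity: str        # E: bearer of the quality
    quality: str       # Q: PATO quality
    evidence: str
    source: str
    attribution: Attribution
    version: int = 1


eya1_annotation = PhenotypeAnnotation(
    subject="eya1",
    relation="influences",
    entity="FBbt:00001768",   # eye disc
    quality="PATO:0001485",   # condensed
    evidence="light microscopy",
    source="PMID:8431945",
    attribution=Attribution("M. Ashburner", "FlyBase", date(2007, 10, 26)),
)
```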
Quantitative Data • PATO is part of a representation of qualitative phenotypic information • More often than not, it is also important to record the quantitative information that results from a specific measurement of a quality • Measurements involve units (Phenotypic Character + Unit), e.g. “The tail of my mouse is 2.1 cm”
PATO & measurements • UO: an ontology of units • UO’s top-level division is between primary base units of a particular measure and units that are derived from base units • Mapping between the various scalar qualities (such as weight, height, concentration, etc.) and the corresponding units used to measure those qualities • UO includes 264 terms, all of which are defined • Mailing list (http://sourceforge.net/mailarchive/forum.php?forum_id=50613)
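The "Phenotypic Character + Unit" idea, together with UO's split between base and derived units, can be sketched as a measurement record whose derived unit is converted to its base unit by a factor. The conversion table uses plain SI labels and is a hypothetical stand-in, not UO identifiers; the tail-length example is the one from the slide.

```python
from dataclasses import dataclass

# Hypothetical conversion table: each unit maps to (base unit, factor).
# Labels follow SI usage; they are not UO term IDs.
TO_BASE = {
    "metre": ("metre", 1.0),
    "centimetre": ("metre", 0.01),
    "millimetre": ("metre", 0.001),
    "kilogram": ("kilogram", 1.0),
    "gram": ("kilogram", 0.001),
}


@dataclass(frozen=True)
class Measurement:
    entity: str   # bearer, e.g. "tail"
    quality: str  # scalar quality, e.g. "length"
    value: float
    unit: str

    def in_base_unit(self) -> "Measurement":
        """Express the same measurement in the base unit of its measure."""
        base, factor = TO_BASE[self.unit]
        return Measurement(self.entity, self.quality, self.value * factor, base)


tail = Measurement("tail", "length", 2.1, "centimetre")
tail_in_metres = tail.in_base_unit()
```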
Linking quantitative data to qualitative descriptions • Measurement → qualitative description • Assay: range, normality, necessary & sufficient conditions • EQ descriptor → high-level annotation marking phenodeviance (e.g. MP)
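One way to turn a measurement into a qualitative description is to compare it with the reference range defined by the assay. A minimal sketch, assuming the assay supplies a [low, high] normal range; the function name and the numbers in the example are hypothetical and illustrative only, and "decreased weight (PATO:0000583)" is the quality used elsewhere in this deck.

```python
def qualitative_call(value: float, low: float, high: float,
                     decreased: str, increased: str,
                     normal: str = "normal") -> str:
    """Map a measurement to a qualitative quality label.

    The reference range [low, high] is assumed to come from the assay
    definition, e.g. control animals measured under the same protocol.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if value < low:
        return decreased
    if value > high:
        return increased
    return normal


# Hypothetical reference range for adult body weight in one assay.
call = qualitative_call(18.0, low=20.0, high=30.0,
                        decreased="decreased weight (PATO:0000583)",
                        increased="increased weight")
# Any non-normal call marks the individual as phenodeviant.
is_phenodeviant = call != "normal"
```

A fixed range is the simplest choice; an assay could equally define normality statistically, which would only change how `low` and `high` are obtained.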
Multiple phenotypic characters to describe complex phenotypes [Figure panels: SHH-/+, SHH-/-, shh-/+, shh-/-]
Phenotype (character) = entity + quality
• P1 = eye + hypoteloric
• P2 = midface + hypoplastic
• P3 = kidney + hypertrophied
• Qualities from PATO: hypoteloric, hypoplastic, hypertrophied; entities from ZFIN: eye, midface, kidney
• Phenotype = P1 + P2 + P3 (phenotypic profile) = holoprosencephaly
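Treating a phenotypic profile as a set of (entity, quality) characters makes the "P1 + P2 + P3" composition concrete. This sketch uses exact set containment as a deliberate simplification: the function names are hypothetical, and ontology-aware matching (where a parent term can partially match a child term) would need the similarity scoring described later.

```python
# Each phenotypic character is an (entity, quality) pair.
P1 = ("eye", "hypoteloric")
P2 = ("midface", "hypoplastic")
P3 = ("kidney", "hypertrophied")

# A phenotypic profile is the set of its characters.
holoprosencephaly_profile = frozenset({P1, P2, P3})


def matches_profile(observed: set, profile: frozenset) -> bool:
    """True if every character in the profile was observed."""
    return profile <= observed


def missing_characters(observed: set, profile: frozenset) -> set:
    """Characters of the profile that were not observed."""
    return set(profile - observed)
```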
Assays for complex phenotype data & quantitative data [Diagram: one Assay and three Phenotypic Characters] • necessary • necessary & sufficient • phenodeviance
Linking qualitative descriptions across species • Decomposition of precomposed phenotype ontologies by providing logical definitions based on PATO • Link annotations across different knowledge domains and species • Link phenotypic descriptions of human diseases to animal models
Reconciling pre- and post-composed annotations • Retrospective PATO definitions of pre-coordinated terms in phenotype ontologies • Precomposed ontologies: • Mammalian Phenotype • Plant Trait • Worm Phenotype • etc. • OMIM
EQ definitions: Aristotelian definitions (genus-differentia): a <Q> *which* inheres_in an <E>
[Term]
id: MP:0001262
name: decreased body weight
namespace: mammalian_phenotype_xp
synonym: "low body weight" EXACT []
synonym: "reduced body weight" EXACT []
def: "lower than normal average weight" []
is_a: MP:0001259 ! abnormal body weight
intersection_of: PATO:0000583 ! decreased weight
intersection_of: inheres_in MA:0002405 ! adult mouse
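Because the genus (a PATO quality) and the differentia (inheres_in an entity) sit on `intersection_of` lines, they can be pulled out of a stanza mechanically. This is a minimal sketch that handles only the simple `tag: value ! comment` lines used in this deck, not the full OBO grammar (for instance, quoted strings containing `!` are not handled). The function names are hypothetical, and the stanza is abbreviated from the MP:0001262 definition.

```python
def parse_stanza(text: str) -> dict:
    """Parse one OBO [Term] stanza into a dict of tag -> list of values.

    Trailing '! comment' text is dropped; the '[Term]' header is skipped.
    """
    tags = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue
        tag, _, value = line.partition(":")  # split on the first colon only
        value = value.split("!", 1)[0].strip()
        tags.setdefault(tag.strip(), []).append(value)
    return tags


def genus_differentia(tags: dict):
    """Return (genus, [(relation, filler), ...]) from intersection_of lines."""
    genus, differentia = None, []
    for value in tags.get("intersection_of", []):
        parts = value.split()
        if len(parts) == 1:
            genus = parts[0]  # bare class: the genus (here a PATO quality)
        else:
            differentia.append((parts[0], parts[1]))  # relation + entity
    return genus, differentia


STANZA = """
[Term]
id: MP:0001262
name: decreased body weight
is_a: MP:0001259 ! abnormal body weight
intersection_of: PATO:0000583 ! decreased weight
intersection_of: inheres_in MA:0002405 ! adult mouse
"""

tags = parse_stanza(STANZA)
```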
Phenotypic information captured differently within the same domain (OMIM)
Phenotypic information captured differently across different domains • MP:0001265 – decreased body size • MP:0001255 – decreased body height • WBPhenotype0000229 – small • OMIM %210710 – short stature
Logical definitions allow for cross-species and cross-domain links
[Term]
id: MP:0001265 ! decreased body size
intersection_of: PATO:0000587 ! decreased size
intersection_of: inheres_in MA:0002405 ! adult mouse

[Term]
id: MP:0001255 ! decreased body height
intersection_of: PATO:0000569 ! decreased height
intersection_of: inheres_in MA:0002405 ! adult mouse

[Term]
id: WBPhenotype0000229 ! small
intersection_of: PATO:0000587 ! decreased size
intersection_of: OBO_REL:inheres_in WBls:0000041 ! Adult

[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000587 ! decreased size
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body

[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000569 ! decreased height
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body
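Once terms from different sources carry logical definitions, candidate equivalents can be found by grouping on (quality, bearer). Sharing a PATO quality is not enough on its own: the bearers (MA, WBls, FMA) must also be linked. This sketch assumes a hypothetical mapping that sends each bearer to one shared class; that mapping, the function name, and the OMIM placeholder labels are illustrative, and the IDs are those from the definitions above.

```python
from collections import defaultdict

# Logical definitions from the slide as (term, quality, bearer).
DEFINITIONS = [
    ("MP:0001265", "PATO:0000587", "MA:0002405"),            # decreased body size
    ("MP:0001255", "PATO:0000569", "MA:0002405"),            # decreased body height
    ("WBPhenotype0000229", "PATO:0000587", "WBls:0000041"),  # small
    ("OMIM short stature (as size)", "PATO:0000587", "FMA:20394"),
    ("OMIM short stature (as height)", "PATO:0000569", "FMA:20394"),
]

# Hypothetical: each species-specific bearer mapped to one shared class.
SHARED_ENTITY = {
    "MA:0002405": "whole organism",
    "WBls:0000041": "whole organism",
    "FMA:20394": "whole organism",
}


def group_equivalent_terms(definitions, shared_entity):
    """Group terms whose (quality, mapped bearer) definitions coincide."""
    groups = defaultdict(list)
    for term, quality, entity in definitions:
        key = (quality, shared_entity.get(entity, entity))
        groups[key].append(term)
    return dict(groups)


groups = group_equivalent_terms(DEFINITIONS, SHARED_ENTITY)
```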
Experimental Design • Annotate 11 human disease genes, and their homologs • Develop search algorithm that utilizes the ontologies for comparison • Test search algorithm by asking, “given a set of phenotypic descriptions (EQ statements), can we find…” • alleles of the same gene • homologs in different organisms • members of a pathway (same organism) • members of a pathway (other organisms)
Strategy for Annotation • Leverage OMIM gene and related disease records • Use FMA, CL, GO, EHDAA, ChEBI, PATO ontologies • Annotate 5 (in parallel) to check for curator consistency • Annotate fly & fish orthologs (FB, ZFA) • Import mouse ortholog data (MA, MP)
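The slides do not say how curator consistency was measured. As one assumed option, the agreement between curators who annotated the same record in parallel can be scored with the Jaccard index of their EQ sets. The curator names and annotations below are hypothetical. Exact matching is strict: two curators who pick a parent and a child term score zero for that pair, which an ontology-aware similarity would soften.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index between two annotation sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_agreement(by_curator: dict) -> dict:
    """Jaccard agreement for every pair of curators."""
    return {
        (x, y): jaccard(by_curator[x], by_curator[y])
        for x, y in combinations(sorted(by_curator), 2)
    }


# Hypothetical EQ annotations of the same record by two curators.
by_curator = {
    "curator_A": {("eye", "hypoteloric"), ("midface", "hypoplastic")},
    "curator_B": {("eye", "hypoteloric"), ("kidney", "hypertrophied")},
}
agreement = pairwise_agreement(by_curator)
```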
Testing the methodology Annotated 11 gene-linked human diseases described in OMIM, and their homologs in zebrafish and fruitfly:
Ontology-based similarity scoring • Measure the information content (IC) of any node • Compute ‘similarity’ by finding IC ratios between any genotypes, genes, classes, etc.
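The slide's formula is not preserved, so the following is an assumption: it uses the common definition IC(c) = -log2 p(c), where p(c) is the fraction of annotated items whose ancestor-closed profile contains class c, and it takes the "IC ratio" to be the summed IC of shared classes divided by the summed IC of all classes in either profile. The function names and the small corpus are hypothetical; the class names reuse the sox9 example from the search slide.

```python
import math


def information_content(profiles: dict) -> dict:
    """IC(c) = -log2(p(c)), p(c) = share of items whose profile contains c."""
    n = len(profiles)
    counts = {}
    for classes in profiles.values():
        for c in classes:
            counts[c] = counts.get(c, 0) + 1
    return {c: -math.log2(k / n) for c, k in counts.items()}


def ic_ratio_similarity(a: set, b: set, ic: dict) -> float:
    """Summed IC of shared classes over summed IC of all classes in a or b.

    Returns 0.0 when the union carries no information (e.g. only the root).
    """
    union = sum(ic[c] for c in a | b)
    if union == 0:
        return 0.0
    return sum(ic[c] for c in a & b) / union


# Hypothetical ancestor-closed profiles for four genes.
profiles = {
    "g1": {"morphology", "morphology-of-bone", "curvature-of-tibia"},
    "g2": {"morphology", "morphology-of-bone"},
    "g3": {"morphology", "morphology-of-eye"},
    "g4": {"morphology", "morphology-of-eye"},
}
ic = information_content(profiles)
```

Classes shared by every item (here `morphology`) get IC 0, so matching on them contributes nothing; rarer shared classes dominate the score.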
Ontology-based Search Algorithm • Given a query node q, we try to find hits h1, h2, ... that are of the same type as q and are similar to q in terms of their annotation profile, A(q). • First step: create an annotation profile for the thing to be searched (e.g. a gene). The annotation profile is the set of classes used to annotate that entity, together with their ancestors: c ∈ A(q) iff link(r, q, c) for some relation r. • Example of an inferred link: link(influences, sox9, curvature-of-tibia) → link(influences, sox9, morphology-of-bone) • Annotation profiles are compared using the same IC-based similarity metric
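The profile construction and ranking steps can be sketched as follows. The `PARENTS` hierarchy and the genes other than sox9 are hypothetical; all items are assumed to be of the same type as the query; and plain Jaccard is swapped in as the similarity function for brevity, whereas the slides compare profiles with an IC-based metric (any function with the same signature can be passed in).

```python
# Hypothetical is_a parents; a real run would read these from the ontologies.
PARENTS = {
    "curvature-of-tibia": {"morphology-of-bone"},
    "morphology-of-bone": {"morphology"},
    "morphology-of-eye": {"morphology"},
    "morphology": set(),
}


def ancestors(term, parents):
    """All ancestors of term (excluding term itself)."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def annotation_profile(direct, parents):
    """A(q): directly annotated classes plus all of their ancestors."""
    profile = set()
    for c in direct:
        profile.add(c)
        profile |= ancestors(c, parents)
    return profile


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


def search(query, annotations, parents, similarity=jaccard):
    """Rank all other items by similarity of their profiles to the query's."""
    q = annotation_profile(annotations[query], parents)
    scores = {
        h: similarity(q, annotation_profile(direct, parents))
        for h, direct in annotations.items() if h != query
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


annotations = {
    "sox9": {"curvature-of-tibia"},
    "gene_b": {"morphology-of-bone"},   # hypothetical gene
    "gene_c": {"morphology-of-eye"},    # hypothetical gene
}
hits = search("sox9", annotations, PARENTS)
```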
UBERON: an anatomical linking ontology • Each organism has its own anatomical ontology • To connect annotations across species, we need a way to link the anatomies • We wanted an ontology that incorporated both functional homology and anatomical similarity • Created an ontology linking anatomies from ZFA, FMA, XAO, MA, MIAA, WBbt, FBbt
UBERON connects phenotype entities from separate anatomy ontologies
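Linking anatomies lets EQ annotations from different species be compared on a common footing: each species-specific entity is replaced by the shared class it maps to before the comparison. In this sketch the mapping table uses made-up, illustrative labels rather than real ZFA, MA, FMA or UBERON identifiers, and the function names are hypothetical.

```python
# Hypothetical cross-references from species anatomy terms to shared classes.
# Keys and values are illustrative labels, not real ontology IDs.
TO_UBERON = {
    "ZFA:eye": "uberon:eye",
    "MA:eye": "uberon:eye",
    "FMA:eye": "uberon:eye",
    "ZFA:kidney": "uberon:kidney",
    "MA:kidney": "uberon:kidney",
}


def to_shared_entity(eq):
    """Replace a species-specific entity with its shared class, if known."""
    entity, quality = eq
    return (TO_UBERON.get(entity, entity), quality)


def shared_phenotypes(profile_a, profile_b):
    """EQ pairs that coincide once both profiles use shared anatomy classes."""
    a = {to_shared_entity(eq) for eq in profile_a}
    b = {to_shared_entity(eq) for eq in profile_b}
    return a & b


zebrafish = {("ZFA:eye", "hypoteloric"), ("ZFA:kidney", "hypertrophied")}
human = {("FMA:eye", "hypoteloric")}
common = shared_phenotypes(zebrafish, human)
```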
shha is phenotypically similar to homologous pathway members
Results thus far • Annotate 11 human disease genes, and their homologs • Develop search algorithm that utilizes the ontologies for comparison • Test search algorithm by asking, “given a set of phenotypic descriptions (EQ statements), can we find…” • alleles of the same gene • homologs in different organisms • members of a pathway (same organism) • members of a pathway (other organisms)
Conclusions • Ontologies help • Promising new directions for ontology-based phenotype annotation • Promising ways of identifying novel pathway members, generating hypotheses to test at the bench