Phenotype annotation Chris Mungall Lawrence Berkeley Labs NCBO GO
Outline • Principles of Compositionality • Tour of PATO • Pre vs post composition • Quantitative phenotypes • Next steps
Phenotype annotation: why? • To shed light on the relationships between genes, environment and phenotype • To compare genes and phenotypes across organisms • To improve human health and wellbeing
Difficulties • Phenotypes can be complex • Descriptions are often composite • Encompass relationships between different kinds of entities, at different levels of granularity • Different ways of describing the same thing • Descriptions must be rigorous and unambiguous • Ensures meaningful analyses and comparisons within and between organisms
Compositionality is essential for describing phenotypes • Compositionality is a principle of good ontology design • aka building blocks, cross-products, normalised/modular design • Create complex descriptions (definitions) from simpler ones • Descriptions can be composed at any time • Ontology construction time (pre-composition) • Annotation time (post-composition)
An example of compositionality • Plasma membrane of spermatocyte • Plasma membrane [GO CC] • Spermatocyte [OBO Cell] • Formal means of composition • Genus-differentia: a plasma membrane (genus, from GO-CC) which is part_of (relation, from OBO-REL) a spermatocyte (from Cell); "which is part_of a spermatocyte" is the differentia
Compositionality and ontology tools • Composition supported by: • Phenote • OBO-Edit • Cross-product plugin • Protégé-OWL • SWOOP • …and others
Advantage: Automatic DAG calculation • a membrane which is part_of a germ cell • a plasma membrane which is part_of a spermatocyte
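A minimal sketch of how the placement of a composed term can be computed automatically from its building blocks. The tiny is_a table below is a hypothetical stand-in for the real GO and Cell ontology hierarchies, and the function names are illustrative only; real tools use a reasoner rather than this hand-rolled check.

```python
# Hypothetical toy is_a hierarchy (child -> parent); an assumption for
# illustration, not loaded from the GO or Cell ontology files.
IS_A = {
    "plasma membrane": "membrane",
    "spermatocyte": "germ cell",
}


def ancestors(term):
    """Return the term itself followed by all of its is_a ancestors."""
    seen = [term]
    while term in IS_A:
        term = IS_A[term]
        seen.append(term)
    return seen


def subsumes(general, specific):
    """True if the (genus, relation, filler) triple `general` subsumes `specific`."""
    g_genus, g_rel, g_filler = general
    s_genus, s_rel, s_filler = specific
    return (
        g_rel == s_rel
        and g_genus in ancestors(s_genus)
        and g_filler in ancestors(s_filler)
    )


membrane_of_germ_cell = ("membrane", "part_of", "germ cell")
pm_of_spermatocyte = ("plasma membrane", "part_of", "spermatocyte")

print(subsumes(membrane_of_germ_cell, pm_of_spermatocyte))  # True
print(subsumes(pm_of_spermatocyte, membrane_of_germ_cell))  # False
```

Because both the genus and the filler of the second description are specialisations of those in the first, the second description can be placed under the first without anyone asserting that link by hand.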
The building blocks of phenotype descriptions: EQ • Entities and qualities (EQ) • (Bearer) Entity • E.g: compound eye, spermatocyte, blood, wing growth, scale morphogenesis • Quality (aka property, attribute) • A kind of dependent continuant • Defined in PATO • E.g: green, hot, squamous, rugose, edematous, light-sensitivity, luminescent, ectopic, arrested, decomposed
Formal treatment of EQ • We must be clear about what we mean when we compose an E and a Q • Otherwise we will have incomplete query results and erroneous statistics in annotations • The meaning must be computable • Formally, an EQ description defines: a Quality which inheres_in a bearer entity • Which implicitly refers to: a bearer entity which bears a Quality
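One way to keep the meaning of an EQ pair computable is to hold it as a small record and derive both readings from it. This is only a sketch: the class name `EQ` and its method names are assumptions made for illustration, not part of PATO or Phenote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """A post-composed description: a quality inhering in a bearer entity."""
    entity: str
    quality: str

    def definition(self):
        # The quality-centred reading used in the formal definition.
        return f"a {self.quality} which inheres_in a {self.entity}"

    def bearer_view(self):
        # The implied entity-centred reading.
        return f"a {self.entity} which bears a {self.quality}"


root_length = EQ(entity="root", quality="length")
print(root_length.definition())   # a length which inheres_in a root
print(root_length.bearer_view())  # a root which bears a length
```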
Example normal eya[1]/eya[1]
Kinds of entities which can be bearers of biological qualities • Continuants (3D entities) • Cell parts (GO) • Cells (OBO Cell ontology) • Gross anatomical entities (CARO, FMA, flyAO, MA, zfishAO, …) • Aggregates of organisms (?) • Occurrents (4D entities) • Biological processes (GO)
PATO normal eya[1]/eya[1] GO FlyAO
Tour of PATO • Tour from the top-down • The top level of PATO has been built according to formal ontological principles • This helps us define terms in a consistent and unambiguous way • The top level can be hidden from end-users by means of ontology views (aka slims) • Still subject to change • Feedback welcome!
PATO: Top level division (note: some nodes omitted for brevity) • Quality • Quality of a continuant: a quality which inheres in a continuant • Quality of an occurrent: a quality which inheres in a process or spatiotemporal region • [Diagram nodes: physical quality, cellular quality, morphology, duration, rate, color, density, shape, size, structure, arrested, premature, delayed]
Divisions by granularity • Monadic quality of a continuant … • Physical quality: a quality that exists through action of continuants at the physical level of organisation • color: green, pink, yellow • temperature: hot, cold • mass: large mass, small mass • Cellular quality: a quality that exists at the cellular level of organisation • nucleate quality: anucleate, binucleate • ploidy: diploid, haploid, aneuploid • potency: multipotent, totipotent, oligopotent • …
Monadic vs relational quality of a continuant • Monadic quality of a C: a quality of a C that inheres solely in the bearer and does not require another entity • Physical quality • Cellular quality • Morphology: shape, size, structure • … • Relational quality of a C: a quality of a C that requires another entity apart from its bearer to exist • Sensitivity (to) • Displacement (with) • Connected-ness (to) • …
Example relational quality • Sensitivity • Directed towards some entity type • E.g. • Sensitivity of an eye to red light • The quality inheres_in the eye • With respect to (towards) red light • Pheno-syntax: • E= eye Q= sensitivity E2= red_light
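The `E= … Q= … E2= …` layout above can be read into a dictionary with a few lines of code. This is a hedged sketch that handles only the key/value layout shown on these slides; it is not a parser for the full Pheno-syntax format, whose grammar is not given here, and `parse_eq` is a hypothetical helper name.

```python
import re

# Matches "E=", "Q=" or "E2=" followed by one whitespace-free value.
# "E2" is listed before "E" so that the longer key wins.
FIELD = re.compile(r"\b(E2|E|Q)\s*=\s*(\S+)")


def parse_eq(text):
    """Return a dict of the E, Q and (optional) E2 fields found in `text`."""
    return dict(FIELD.findall(text))


record = parse_eq("E= eye Q= sensitivity E2= red_light")
print(record)  # {'E': 'eye', 'Q': 'sensitivity', 'E2': 'red_light'}

# A monadic quality simply has no E2 field.
print(parse_eq("E= PO:root Q= PATO:length"))  # {'E': 'PO:root', 'Q': 'PATO:length'}
```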
On absence • Annotation patterns for absence and counts are currently under discussion • “spermatocyte devoid of asters” • E= CL:spermatocyte • Inheres in the spermatocyte • Q= PATO:lacks_part • The quality/relation of missing some part or parts • E2= GO-CC:aster • The quality is with respect to the type “aster”
Pre- vs post- composition • When do we build the phenotype description? • In the ontology • During annotation? • Reconciling pre and post composition: An analysis of the plant_trait ontology
When do we build the phenotype description? • Early? • Pre-composed phenotype definitions • MP:0000017 “big ears” • TO:0000227 “root length” • TO:0000029 “chlorine sensitivity” • Late? • Post-composed phenotype definitions • E= MA:ear Q= PATO:big • E= PO:root Q= PATO:length • E= organism Q= PATO:sensitivity E2= CHEBI:chlorine
Is this comparable? MP:0000285 “abnormal cardiac valve morphology” MP:0000287 “heart valve hypoplasia” ? PATO:0000051 “morphology” PATO:0000141 “structure” E= MA:heart_valve Q=PATO:hypoplastic PATO:0000645 “hypoplastic”
Yes: if term is decomposable MP:0000285 “abnormal cardiac valve morphology” MP:0000287 “heart valve hypoplasia” Def: a hypoplasticity which inheres_in a heart valve = PATO:0000051 “morphology” PATO:0000141 “structure” E= MA:heart_valve Q= PATO:hypoplastic PATO:0000645 “hypoplastic”
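A small sketch of why decomposable terms make the two styles comparable: if each pre-composed ID carries an EQ definition, a pre-composed annotation and a post-composed one can be reduced to the same triple and compared directly. The pairings in the table follow the examples on these slides; the table itself and the function names are illustrative assumptions, not a real mapping file.

```python
# Pre-composed term ID -> (E, Q, E2) definition, using the slide examples.
# E2 is None for monadic qualities.
DEFINITIONS = {
    "MP:0000017": ("MA:ear", "PATO:big", None),
    "TO:0000227": ("PO:root", "PATO:length", None),
    "TO:0000029": ("organism", "PATO:sensitivity", "CHEBI:chlorine"),
    "MP:0000287": ("MA:heart_valve", "PATO:hypoplastic", None),
}


def decompose(annotation):
    """Return an (E, Q, E2) triple for a pre-composed ID or a post-composed triple."""
    if isinstance(annotation, str):
        return DEFINITIONS[annotation]
    return annotation


def same_phenotype(a, b):
    """Compare two annotations after reducing both to EQ form."""
    return decompose(a) == decompose(b)


post_composed = ("MA:heart_valve", "PATO:hypoplastic", None)
print(same_phenotype("MP:0000287", post_composed))  # True
print(same_phenotype("MP:0000017", post_composed))  # False
```

Exact equality is the simplest possible comparison; matching a hypoplastic valve against the broader "abnormal cardiac valve morphology" would additionally need the PATO hierarchy.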
Comparing phenotypes • We want to compare and query both within and across species • For gross anatomical phenotypes to be compared across species, descriptions must be decomposed or decomposable to anatomical terms • Anatomical terms must be comparable • Homology links • CARO: Common Anatomy Reference Ontology
Case study: Defining plant traits with PATO • OBO Plant Trait ontology • Pre-composed phenotype terms • Analogous to OBO mammalian_phenotype ontology • Task: Define these terms with PATO • A good test of PATO • Demonstration of compositional approach • Allows meaningful comparison across plant species • Pilot study before applying to metazoans • http://www.bioontology.org/wiki/index.php/PATO:Pre_vs_Post_Coordinating
Methods • Creation of genus-differentia definitions • First pass: Obol • Second pass: manual editing • Ontologies used • PATO • Plant anatomical entities (PO) • Gramene environment (GEO) • Chemical entities of biological interest (CHEBI) • GO
Basic phenotype terms • “root length” (TO:0000227) • E= PO:root Q= PATO:length • Formally: Def: a length which inheres_in a root
Relational qualities involving types of chemical • “Chlorine sensitivity” [TO:0000029] • Directed towards an additional entity type • Q= PATO:sensitivity E2= CHEBI:chlorine • Def: a sensitivity which is directed towards chlorine [ inheres_in organism ]
Relational qualities involving the environment • “drought sensitivity” [TO:0000029] • Directed towards an additional entity type • Q= PATO:sensitivity E2= EO:drought • Def: a sensitivity which is directed towards drought [ inheres_in organism ] • OBO needs a good environment ontology
Complex phenotypes • “Chinsura boro” • “Abortion of microspore development at trinucleate stage” • Def: an arrested which inheres_in ( microspore development which during trinucleate stage )
Results of plant_trait analysis • 252/784 terms provided with genus-differentia definitions so far • Helped find inconsistencies and problems in the ontology • New term suggestions for PATO • proportionality • Approach should work for animal phenotype ontologies
Bacterial phenotypes • Performed similar analysis on bacterial phenotype terms • Provided by Garrity & Hozzein • Results (morphological only): • 26 new terms added to PATO • Rugose, rhizoidal, lobate, filamentous, … • Todo: chemical utilization phenotypes • Required: • Ontologies for aggregates of organisms • Assay ontology
Measurements • Ontologies provide qualitative partitions on the kinds of entities we find in nature • We may also want to record quantitative information • Comes from measurements of qualities • The measurement is not the phenotype • Phenotypes exist independently of our measurements of them
Measurement schema • A measurement record consists of • The quality being measured • E.g. the length of a particular mouse tail • The unit type • From PATO UO • A magnitude • Floating point number • Error measure [optional]
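The schema above maps naturally onto a small record type. The sketch below is an assumption about how such a record might look in code; the class name, field names and sample values are hypothetical and not taken from any PATO or OBD implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """One measurement of a quality; the measurement is not the phenotype itself."""
    quality: str                   # the quality being measured
    unit: str                      # unit type, e.g. a length unit from the unit ontology
    magnitude: float               # floating point value
    error: Optional[float] = None  # optional error measure

    def __str__(self):
        text = f"{self.quality}: {self.magnitude} {self.unit}"
        if self.error is not None:
            text += f" (+/- {self.error})"
        return text


# Hypothetical values for illustration only.
tail = Measurement("length which inheres_in mouse tail #1", "meter", 0.085, 0.002)
print(tail)
```

Keeping the quality as a separate field reflects the point that the same quality could be measured again, with a different unit or error, without changing the phenotype being described.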
Sample of PATO UO • Unit • Base unit • Length unit • Angstrom • meter • Mass unit • Dalton • Gram • Substance unit • Derived unit • Concentration unit • pH • Quality • Morphology • Size length • Physical quality • Mass
Phenotype exchange formats • Genotypes and phenotypes: • Pheno-syntax • Pheno-XML • General purpose • OWL (using canonical EQ encoding) • Also has Obo equivalent • GO annotation files • Works with pre-coordinated terms only
OBD-Phenotype • A database for phenotype associations • Built on OBD framework • Tuned for inference and reasoning • Graph traversal built in from the start • Results • Annotations on data from OMIM, ZFIN and FlyBase • Currently too small a dataset to do analysis
Next steps • Get PATO & Phenote used across multiple organisms and projects • MODs, BIRN, OMIM • Collect annotation data from multiple sources in one repository (OBD) • Both pre + post composed • Demonstrate improved analysis of annotation data using PATO
• filamentous - having thin filamentous extensions at its edge • pleomorphic - a quality inhering in a cell by virtue of its ability to take on two or more different shapes during its life cycle • pulvinate - shaped like a cushion or having a marked convex cushion-like form • umbonate - having a knob or knoblike protuberance • rugose - having many wrinkles or creases on the surface • glistening - emitting or reflecting lots of light • dull - emitting or reflecting little or no light • viscid - covered with a sticky or clammy coating • mucoid - having the consistency of mucus • spiral - a plane curve traced by a point circling about the center but at increasing distances from the center • rhizoidal - having root-like extensions radiating from its center • spiny - having spines, thorns or similar stiff projections on its surface • warty - having a hard rough surface; not smooth • curled - having parallel chains in undulate fashion on the border • fragile - easily damaged or disrupted; brittle • butyraceous - resembling butter in appearance and consistency • undulate - having a wavy, shallow edge • punctiform - small and resembling a point • lobate - a morphological quality in which the bearer has deeply undulated edges forming lobes • erose - having an irregularly toothed edge • raised - a thick colony that appears above the medium surface with terraced edges • convex - a shape that obtains by virtue of having inward facing edges; having a surface or boundary that curves or bulges outward, as the exterior of a sphere
Proportions • “amylose to amylopectin ratio” TO:0000372 • Def: a compositionality which is directed towards amylose relative_to amylopectin [ inheres_in organism ]