Explore the importance of is_a completeness in ontologies and fixing granularity issues in the organization of Gene Ontology. Discover how accurate ontologies lead to accurate queries and the benefits of filling is_a gaps. Learn about relations in GO ontologies and the concept of ontological representation.
Weaving and untangling the GO • is_a completeness ~9 slides • granularity & BP ~3 slides • Linking MF to BP ~15 slides • Sensu ~13 slides • linguistic qualifiers vs relations • Linking GO to other ontologies ~40 slides • GO+Cell
Tangled DAGs and complexity • paths increasing • GO process in general has multiple axes of classification • qualifier -ve +ve • anatomy • structural • spatial • chemical • structural • functional
GO and is_a completeness • Why? • What’s wrong with every term having at least one is_a or part_of parent? • this is the way we’ve always done things
Ontologies should be complete • No errors of omission • is_a completeness is the ontologically correct thing to do • every entity type is a subtype of some other thing • Accurate ontologies = accurate queries • currently a query for “find all kinds of development” does not return “ovarian follicle development” • this is wrong
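The query problem on this slide can be sketched in a few lines of Python. The term names come from the slide; the edges, the intermediate terms and the choice of is_a parent used to fill the gap are illustrative assumptions, not real GO data.

```python
# Toy fragment of BP. Only is_a links are followed by a "find all kinds of X" query.
IS_A = {
    "organ development": ["development"],
    "gonad development": ["organ development"],
}
# The follicle term is reachable through part_of only, so it has no is_a parent.
PART_OF = {
    "ovarian follicle development": ["gonad development"],
}


def is_a_descendants(root, is_a):
    """Return every term reachable from root by following is_a links downward."""
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    found, stack = set(), [root]
    while stack:
        term = stack.pop()
        for child in children.get(term, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


kinds_before = is_a_descendants("development", IS_A)

# Filling the gap: assert an is_a parent (hypothetical choice of parent).
IS_A_FIXED = dict(IS_A)
IS_A_FIXED["ovarian follicle development"] = ["development"]
kinds_after = is_a_descendants("development", IS_A_FIXED)
```

Before the fix the follicle term is missing from the result; after it the same query returns it without any change to the query code.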
Missing is_as hinder common tool use • We should play nicely with the others in the playground • Most (non-GOC) tools expect is_a completeness • GO looks funny when viewed in other tools • the standard is to show only is_a relations in default tree view • missing is_as break reasoners
Filling is_a gaps brings practical benefits • Easier for tools to find inconsistencies in GO • We can start to untangle displays
Example: current displays mix relations • it’s a mess
untangling is_a and part_of • difficult if is_a hierarchy is incomplete • is_a orphans show up at root node in pure is_a display • not everything must have an asserted part_of parent • can infer from is_a parents
The new complete cellular component • Current CC: • 277 is_a orphans / 1688 terms • avg is-a-paths-to-root 1.4 • avg mixed-paths-to-root 6.97 • Jane’s fixed CC: • 0 is_a orphans • avg is-a-paths-to-root 3.36 • avg mixed-paths-to-root 38.6
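The metrics on this slide (is_a orphans, is_a paths to root, mixed paths to root) could be computed roughly as follows. This is a minimal sketch on a made-up component fragment; the edges and the helper names `is_a_orphans` and `count_paths` are assumptions for illustration and do not reproduce the figures above.

```python
from functools import lru_cache

# Toy component fragment; the edges are illustrative assumptions, not real GO data.
ROOT = "cellular_component"
IS_A = {
    "cell": [ROOT],
    "organelle": [ROOT],
    "membrane": [ROOT],
    "nucleus": ["organelle"],
    "nuclear membrane": ["membrane"],
    "nucleolus": [],  # is_a orphan: it only has a part_of parent
}
PART_OF = {
    "nucleus": ["cell"],
    "nuclear membrane": ["nucleus"],
    "nucleolus": ["nucleus"],
}


def is_a_orphans(is_a, root):
    """Terms other than the root that have no asserted is_a parent."""
    return sorted(t for t, parents in is_a.items() if t != root and not parents)


def count_paths(term, root, *relations):
    """Number of distinct paths from term to root over the given relation maps."""
    @lru_cache(maxsize=None)
    def walk(t):
        if t == root:
            return 1
        return sum(walk(p) for rel in relations for p in rel.get(t, ()))
    return walk(term)


orphans = is_a_orphans(IS_A, ROOT)
pure = count_paths("nuclear membrane", ROOT, IS_A)
mixed = count_paths("nuclear membrane", ROOT, IS_A, PART_OF)
```

Averaging `count_paths` over all terms, once with is_a alone and once with both relations, gives the two per-ontology averages; an is_a orphan contributes zero pure is_a paths.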
Fixing the upper levels of BP • The upper portion of any ontology is very important for organisation • Design decisions percolate down • Many users exploring GO top-down see this first • Diamonds are particularly bad in the upper level • significantly increases tangledness
biological process others cellular process physiological process cellular physiological process organismal physiological process
Definitions of the upper-level BP terms • biological process: A phenomenon marked by changes that lead to a particular result, mediated by one or more gene products • cellular process: Processes that are carried out at the cellular level, but are not necessarily restricted to a single cell. For example, cell communication occurs among more than one cell, but occurs at the cellular level • physiological process: Those processes specifically pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms • cellular physiological process: The processes pertinent to the integrated function of a cell • organismal physiological process: The processes pertinent to the function of an organism above the cellular level; includes the integrated processes of tissues and organs
Consider… (long term view) • Making top division by granularity of the process itself • biological process • molecular level process? • cellular level process • (multi-cellular) level process • These types are disjoint • But what about physiological process? • this is not disjoint from the granularity of the process itself
Outline • We focus on MF & BP • biological example from David • the types and relations in reality • maintaining the ALL-SOME definition of relations • how should this be implemented in the GO? • what links should be manifested • retain some level of redundancy, or eliminate it?
GO:0006548 Histidine catabolism • GO:0004397 Histidine ammonia lyase activity • GO:0016153 Urocanate hydratase activity • GO:0050480 Imidazolonepropionase activity • GO:0050416 Formimidoylglutamate deiminase activity • GO:???????? Histidine catabolism to glutamate and formiminotetrahydrofolate • GO:0030409 Glutamate formimidoyltransferase • GO:0050415 Formimidoylglutamase activity • GO:0050129 N-formylglutamate deformylase activity • GO:0019557 Histidine catabolism to glutamate and formate • GO:0019556 Histidine catabolism to glutamate and formamide • Source: Overbeek, et al. The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes. NAR 2005, 33(17):5691-5702
Ontological Representation • I will try and be clear when I am talking about • types in reality • types we wish to manifest as terms in the GO (or in other ontologies) • all GO terms should be types • not all types need to have terms created - we limit for practical reasons
What are the relations in reality? • Between types in the same ontology, different levels of granularity • part_of • Between functions and processes (at the same level of granularity) • functioning_of • Between component and function • has_function • Between process and component • located_in
What are the instances and relations in reality? some gene product instance some multistep process instance part_of has function some molecular functionING instance functioning of some molecular function instance function process
What are the types and type-level relations in reality? some type of gene product some type of multistep process part (direction?) has function some type of molecular function some type of molecular functionING functioning of function process
types example issues: -- ALL-SOME structure coarse histidine catabolism part? functioning of histidine ammonia lyase function histidine ammonia lyase reaction fine function process
What are the types and relations in reality? histidine catabolism to glutamate and formate issues: -- ALL-SOME structure coarse has part? functioning of Formimidoylglutamate deiminase function Formimidoylglutamate deiminase reaction fine function process
We want to capture these real relationships between biological types • Between granular levels • Between orthogonal ontologies • But first we must be clear on the definitions of these types, and which types should be manifested as GO terms
Can we just manifest this in the GO? issues: -- not all function terms have a functionING corresponding term -- even if they do, redundancy is generally to be avoided coarse some type of multistep process has part(?) some type of molecular function some type of molecular functionING functioning of fine function process
We already have some redundancy • function & process redundancy • iron transport (BP) • iron transporter (MF) • function & component redundancy • voltage-gated ion channel function • voltage-gated ion channel complex • If we retain this redundancy, these relations can be trivially added • But we don’t always have this redundancy • not all functions have a corresponding functioning term
Manifest shortcut relationships • one relation standing for two coarse some type of process has part(?) some type of molecular function some type of molecular functionING functioning of fine function process
most functionings are implicit • current paradigm coarse histidine catabolism has part(?) functioning of histidine ammonia lyase function histidine ammonia lyase REACTION fine function process
When do we manifest functions and processes? • Need consistent stable policy • Nothing in function ontology should have activity suffix • even though to a biochemist activity==potential, this is still confusing • Beyond this, do we retain current policy • some redundancy • Or take a more extreme approach • eliminate redundancy • eliminate current ‘activity’ MF terms and manifest corresponding reaction terms in BP (Amelia)
‘purist process’ approach some type of gene product histidine catabolism has function part functioning of histidine ammonia lyase function histidine ammonia lyase reaction function process
When is it safe to eliminate redundancy? • Does functioning always imply function? • iron transport does not imply iron transporter • but we could still extend annotation to allow for specification of functioning-as-function • Reactions and other ‘single-step’ processes involving no helper • function and corresponding functioning imply one another • Redundancy between function and component should be retained • Any obsoletion obviously causes disruption
Difficult functionings • Structural constituents • functioning happens at lower level of granularity than is covered by GO • these will not be linked to process - for now
Implementation • Still need to curate the actual links • trivial links can be computed automatically • Can proceed independently of resolving ontological issues • most likely retain current policy re: manifesting terms • need to maintain 3 kinds of links • granular (part, same ontology) • functioning_of (function and functioning) • ‘diagonal’ • ALL-SOME definition
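One way the ‘diagonal’ shortcut links might be computed automatically is by composing the two underlying links. The sketch below assumes the part link runs from the functioning (reaction) to the coarser process, a direction the slides leave open, and the helper name `diagonal_links` is hypothetical.

```python
# functioning_of: functioning (reaction) type -> the function type it is a functioning of.
FUNCTIONING_OF = {
    "histidine ammonia lyase reaction": "histidine ammonia lyase function",
}
# part_of (direction assumed): functioning type -> coarser process types.
PART_OF = {
    "histidine ammonia lyase reaction": ["histidine catabolism"],
}


def diagonal_links(functioning_of, part_of):
    """Compose functioning_of with part_of to get (function, process) shortcut pairs."""
    links = set()
    for functioning, function in functioning_of.items():
        for process in part_of.get(functioning, ()):
            links.add((function, process))
    return links


shortcuts = diagonal_links(FUNCTIONING_OF, PART_OF)
```

Where the functioning term is not manifested in the GO, there is nothing to compose through and the shortcut link has to be curated by hand.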
Sensu - outline • Original use • A linguistic qualifier • denote differing community usage of a terminological entity (a term) • Perverted use • A type qualifier • Used for when the part_of structure is specific to an organism type • The fix • provide separate mechanisms for each
Terms vs kinds • The term ‘term’ is confusing • Term (sensu GO) • Term (sensu normal usage) • strings, tokens • GO is not a terminology • A GO ID identifies a type of entity • a kind of entity • a universal (as opposed to instance) • more specific than a class • but not a concept
Sensu - original usage • Sometimes the same string refers to different types • nucleus (sensu particle physicist) • nucleus (sensu astrophysicist) • nucleus (sensu biologist) • Canonical GO example: • bud • no longer relevant, terms obsoleted • trichome
Linguistic qualifiers are about language, not biological reality • No ontological requirement for linguistically related terms to be ontologically related • current GO docs are not correct • trichome, sensu plant community • should not state that there is some biological relation between an instance of a trichome and the plant community
The original usage has been conflated • Organism type specificity is a genuine challenge for the GO • ‘contextual’ part_ofs • e.g. X part_of Y in species Z • Sensu has been wrongly recruited to fix this • standard pattern: • X, sensu Z part_of Y • X, sensu Z is_a Z • Two problems • conflation of meaning of sensu • conflation results in lack of precision • “as in, but not restricted to taxon” not rigorous enough
Two problems, two solutions • Retain sensu as a linguistic qualifier only • re-interpret as: sensu S community • no requirement for taxon IDs • no ontology structure requirements • Introduce a new relation for genuine organism-type specific terms • in_organism • standard inference rules can be used • e.g. • X in_organism X’, Y in_organism Y’, X is_a Y <=> X’ is_a Y’
Contextual synonyms
[Term]
name: trichome (sensu insecta)
synonym: EXACT “hair” []
synonym: EXACT “trichome” [] {context=insecta}
def: “a polarized cellular extension that covers much of the insect epidermis”

[Term]
name: trichome (sensu plant)
synonym: EXACT “trichome” [] {context=plant}
def: “An outgrowth from the epidermis. Trichomes vary in size and complexity and include hairs, scales, and other structures and may be glandular. In Arabidopsis, patterning of trichome development is not random but does not appear to be lineage-based like stomata”
Advantages • Lexical qualifiers are dealt with using lexical OBO-Edit tags • No need to be as specific as a taxon • only as specific as is needed to decontextualise • No false reasoning is done over synonyms • cellular component types and cell types should not be siblings • Big user-friendliness win? • Displays customised for particular users may choose to display contextual exact synonyms in place of the wordier sensu name
in_organism • Standard ALL-SOME definition: • Type level definition: • P in_organism O • for all instances p of P, there exists some organism o of type O, and some time t, such that p in_organism o at time t • More specific relation than located_in in OBO relations ontology • Standard logical rules can be applied
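The type-level definition above can be written as a formula. The predicate name inst (instance-of) and the three-place instance-level form of in_organism are notational assumptions, chosen to mirror the wording of the slide.

```latex
P \;\mathit{in\_organism}\; O
\;\equiv\;
\forall p \,\bigl(\, \mathrm{inst}(p, P) \rightarrow
\exists o \,\exists t \,\bigl(\, \mathrm{inst}(o, O) \wedge \mathit{in\_organism}(p, o, t) \,\bigr) \bigr)
```

The universal quantifier over instances of P and the existential quantifier over instances of O are what make this an ALL-SOME definition.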
photosystem I thylakoid is_a is_a part of photosystem I, in cyanobacteria thylakoid, in cyanobacteria in organism in organism cyanobacteria
Open question • Sometimes the relation between two types is largely lexical • e.g. trichome • Sometimes it isn’t so clear • Can we have both a relation to a taxon and a contextual synonym? • Is ‘eye’ an exact contextual synonym for ‘compound eye’ for the arthropod community?
Practical considerations • Use NCBI Taxonomy as our organism ontology • xref or relationship tags? • xrefs are more lightweight • relationship tags are more accurate • relationship tags would be ‘dangling’ unless organism ontology is loaded • See next section…