Real-life ontology development: lessons from the Gene Ontology
What is GO? • Evolution of GO • Mechanisms of updating GO • Tools for ontology development • Lessons learned
Gene Ontology • Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases” • Applicable to all species
Gene Ontology - scope • Three disjoint axes: • molecular function: molecular role, e.g. catalytic activity, binding • biological process: broad biological phenomena, e.g. mitosis, growth, digestion • cellular component: sub-cellular location, e.g. nucleus, ribosome, origin recognition complex
Gene Ontology • Directed acyclic graph (DAG) • Terms connected by two transitive relations (edges): • is_a • part_of
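Because both relations are transitive, every term implicitly inherits all of its ancestors in the graph. A minimal sketch of that traversal follows; the edge list is a toy example chosen for illustration and is not copied from the GO files, and the helper names `build_parents` and `ancestors` are hypothetical.

```python
from collections import defaultdict

# Toy edges for illustration only: (child, relation, parent).
EDGES = [
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("organelle", "is_a", "cellular component"),
]


def build_parents(edges):
    """Map each term to the set of its direct parents (either relation)."""
    parents = defaultdict(set)
    for child, _relation, parent in edges:
        parents[child].add(parent)
    return parents


def ancestors(term, parents):
    """Collect every term reachable by following is_a / part_of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("nucleolus", build_parents(EDGES))))
```

The sketch treats both relation types alike when walking upward; a fuller version would record which relation holds between a term and each ancestor.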
Gene Ontology • Developed by an international consortium • about 50 members • Editorial office, roughly 4 full-time editors • Many other part-time editors at databases • Multiple changes made each day • made live immediately
Gene Ontology • Main ontology format: OBO flat file • Changes are live immediately • no releases • Propagated to GO database • monthly snapshots archived
Evolution of GO • Original GO created in 2000 • Three databases involved: • FlyBase (Drosophila) • MGI (Mouse) • SGD (S. cerevisiae) • Used immediately
Evolution of GO • Later databases: • TAIR (Arabidopsis) • TIGR (microbes including prokaryotes) • SWISS-PROT (several thousand species inc. human) • PSU (P. falciparum) • Recent additions • ZFIN (zebrafish) • PAMGO (plant pathogens)
Evolution of GO • GO development traditionally annotation-driven • development directed by use • Terms added as new species annotated • Terms added on an as-needed basis
Evolution of GO • Resulted in ‘organic’ structure, little formality • Ontological formality added subsequently • philosophical and logical
Modifying the graph: • Before:
Modifying the graph: • But then I need to annotate VW Beetles, pre-1980 • The graph no longer works, because the engine is in the boot
Modifying the graph: • After:
Mechanisms for ontology change • Small incremental changes • Initially all changes to the ontologies made this way
Mechanisms for ontology change • Suggested changes initially submitted by email • Moved to an online tracking system when this became unmanageable
Requesting changes to GO - curator requests tracker • Web-based tracking system hosted at SourceForge.net • Public • Tracker item for each new request or question
Mechanisms for ontology change • Problems: • Larger questions about the higher ontology structure remain unresolved • Makes some items impossible to close • No sense of the ‘big picture’ • Large areas of the ontologies missing or incomplete because no annotations • Massive volume • needed to increase the number of editors
Mechanisms for ontology change • Larger-scale changes: • content meetings • interest groups
Content meetings • Short meetings aimed at developing specific areas of GO ontology content • proposals refined and discussed before meeting • small number of people (10-15) • invited experts • specific topics
Content meetings • Further refinements made following meeting by email • Changes are made once consensus reached • Large number of terms typically added (500+)
Content meetings • Recent meetings: • immunology • interactions between organisms • CNS development
Content meetings • Advantages • Allows a lot of detailed work to be done on a very specific area • Involves external expertise
Content meetings • Problems: • Expensive - everyone has to be in the same location • Only works for very specific topics • Long lag time getting terms into ontologies
Interest groups • Groups of experts for a specific topic • e.g. development, cell cycle, plants • Includes GO curators/annotators and external experts • Don’t typically meet face to face
Interest groups • Communicate via email, desktop sharing etc • Transporters area of the ontology recently revised this way
Interest groups • Advantages • Cheap, no travel required • Allows a lot of detailed work to be done on a very specific area • Involves external expertise
Interest groups • Disadvantages • Harder to reach consensus when not face to face • Projects tend to drag on
Mechanisms for ontology change • Systematic changes via small working groups
Systematic changes • Projects not directly related to biological content • Systematic changes throughout ontology • Small group of GO consortium members • meets regularly by desktop sharing, voice over IP • Experts recruited to meetings as needed
Systematic changes • Changes either • made on a branch of the ontology and merged in later • always have big problems merging branched file into main file • merged directly into live ontology after session • fast, but people get angry
is_a complete • GO contains both is_a and part_of relations • Typically, graphs were a mixture of incomplete is_a and part_of hierarchies • A result of ‘organic’ evolution of GO • All graphs now have complete is_a paths to root
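The "is_a complete" property can be checked mechanically: every term must reach the root by following is_a edges only. The following is a minimal sketch under that reading; the term names are toy data and `terms_missing_is_a_path` is a hypothetical helper, not part of any GO tool.

```python
def terms_missing_is_a_path(terms, is_a_parents, root):
    """Return the terms that cannot reach root via is_a edges alone."""

    def reaches_root(term):
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            if current == root:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(is_a_parents.get(current, ()))
        return False

    return sorted(t for t in terms if not reaches_root(t))


# Toy data: "nucleolus" has only a part_of parent, so it has no is_a parents here.
IS_A_PARENTS = {
    "nucleus": {"organelle"},
    "organelle": {"cellular component"},
}
TERMS = ["cellular component", "organelle", "nucleus", "nucleolus"]

if __name__ == "__main__":
    print(terms_missing_is_a_path(TERMS, IS_A_PARENTS, "cellular component"))
```

A term reported by this check would need an is_a parent added before the graph counts as is_a complete.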
partial disjointness • Biological process terms organised by granularity: • cellular process • multicellular organism process • multi-organism process • To avoid massive increase in number of paths to root, these terms are disjoint • no is_a children in common
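One way to read "no is_a children in common" is that no term may have more than one of the three granularity terms among its is_a ancestors. The sketch below checks that reading; it is an assumption about the intended rule, and the child terms and helper names are hypothetical.

```python
DISJOINT = (
    "cellular process",
    "multicellular organism process",
    "multi-organism process",
)


def is_a_ancestors(term, is_a_parents):
    """All terms reachable from term by following is_a edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a_parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjointness_violations(is_a_parents, disjoint=DISJOINT):
    """Map each offending term to the disjoint terms it descends from."""
    violations = {}
    for term in is_a_parents:
        hits = sorted(is_a_ancestors(term, is_a_parents) & set(disjoint))
        if len(hits) > 1:
            violations[term] = hits
    return violations


# Toy data: term X sits under two disjoint terms, term Y under one.
TOY_IS_A = {
    "hypothetical term X": {"cellular process", "multi-organism process"},
    "hypothetical term Y": {"cellular process"},
}

if __name__ == "__main__":
    print(disjointness_violations(TOY_IS_A))
```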
sensu • sensu (meaning ‘in the sense of’) used to disambiguate, by taxonomic group, terms with identical strings but different meanings • e.g. sporulation (sensu Viridiplantae) vs. sporulation (sensu Bacteria)
sensu • Current project to remove the sensu term strings • Replace with strings that represent the true differentiae • e.g. • cell wall (sensu Bacteria) -> peptidoglycan-based cell wall • cell wall (sensu Fungi) -> chitin- and beta-glucan-containing cell wall
Systematic changes to GO • Advantages • Fast • Efficient • Small number of people required
Systematic changes to GO • Disadvantages • Difficult to obtain wider consensus • Changes sometimes have to be undone
Useful tools for ontology development • WebEx • desktop sharing, can control each other's desktops • wiki • mainly internal • Skype • free international calls! • conference calls • not free
Tracking changes to GO • General tracking • files stored in CVS, all differences trackable (in theory) • far from ideal - frequent discussion of whether to track term history or date-stamp terms
Tracking changes to GO • Obsolete terms • formerly stored within the ontology • in OBO format, now a special kind of deprecated term (tag is_obsolete) • ‘replaced_by’ and ‘consider’ tags to be added soon, pointing to live terms
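To show how the obsolescence tags could be consumed, here is a minimal sketch that reads `[Term]` stanzas of `tag: value` lines and collects the suggested replacements for obsolete terms. The identifiers in `SAMPLE` are invented (the `EX:` prefix is hypothetical), and the parser deliberately ignores parts of the OBO format such as trailing `!` comments and tag modifiers.

```python
SAMPLE = """\
[Term]
id: EX:0000001
name: example obsolete term
is_obsolete: true
replaced_by: EX:0000002
consider: EX:0000003
consider: EX:0000004

[Term]
id: EX:0000002
name: example live term
"""


def parse_terms(text):
    """Parse [Term] stanzas into dicts mapping each tag to a list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}
            terms.append(current)
        elif line.startswith("["):
            current = None  # skip other stanza types
        elif line and current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms


def replacements(terms):
    """For each obsolete term id, list its replaced_by and consider targets."""
    result = {}
    for term in terms:
        if term.get("is_obsolete") == ["true"]:
            result[term["id"][0]] = {
                "replaced_by": term.get("replaced_by", []),
                "consider": term.get("consider", []),
            }
    return result


if __name__ == "__main__":
    print(replacements(parse_terms(SAMPLE)))
```

Keeping every tag as a list reflects that a tag such as `consider` may appear more than once in a stanza.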
Tracking changes to GO • Crediting experts • traditionally no mechanism for doing this • creating abstracts for content meetings, adding tag to term • as yet no mechanism for crediting individuals
Useful tools for ontology development • OBO-Edit • ontology editor originally developed for GO • can be used for any OBO format ontology • developed by group of users
Useful tools for ontology development • Reasoner integrated into OBO-Edit • based on OBOL • detects missing links and redundant links • coming soon: detection of misplaced terms, automatic term creation • Validation system • typographical errors, is_a orphans, duplicate synonyms etc.
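As an illustration of what "redundant link" can mean, the sketch below flags a direct is_a link when the same parent is already reachable through another direct parent. This is a generic graph check written for this example, not OBO-Edit's actual algorithm; the term names and the function `redundant_links` are hypothetical.

```python
def redundant_links(is_a_parents):
    """Return (child, parent) links implied by a longer is_a path."""

    def strict_ancestors(start):
        seen = set()
        stack = [start]
        while stack:
            current = stack.pop()
            for parent in is_a_parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    redundant = []
    for child, parents in is_a_parents.items():
        for parent in parents:
            others = set(parents) - {parent}
            # The link is redundant if another direct parent already leads to it.
            if any(parent in strict_ancestors(other) for other in others):
                redundant.append((child, parent))
    return sorted(redundant)


# Toy data: "c" links to both "b" and "a", but "b" already is_a "a".
TOY_LINKS = {
    "c": {"b", "a"},
    "b": {"a"},
}

if __name__ == "__main__":
    print(redundant_links(TOY_LINKS))
```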
Lessons learned • An ontology doesn’t have to be perfect or complete to be used • For domain ontologies, external experts should be involved • Communication is critical • You will never please everyone