Real-life ontology development: lessons from the Gene Ontology

damienm

Presentation Transcript


  1. Real-life ontology development: lessons from the Gene Ontology

  2. What is GO? • Evolution of GO • Mechanisms of updating GO • Tools for ontology development • Lessons learned

  3. Gene Ontology • Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases” • Applicable to all species

  4. Gene Ontology - scope • Three disjoint axes: • molecular function: molecular role, e.g. catalytic activity, binding • biological process: broad biological phenomena, e.g. mitosis, growth, digestion • cellular component: sub-cellular location, e.g. nucleus, ribosome, origin recognition complex

  5. Gene Ontology • Directed acyclic graph (DAG) • Terms connected by two transitive relations (edges): • is_a • part_of
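Because both relations are transitive, a term's full set of ancestors is everything reachable from it along is_a and part_of edges, and "acyclic" means no such path ever returns to its starting term. A minimal Python sketch of both ideas, assuming a small hypothetical graph (the term names are illustrative and simplified, not taken from a GO release):

```python
from collections import deque

# Hypothetical toy graph: child -> list of (relation, parent) edges.
# Names are illustrative and simplified, not copied from a GO release.
EDGES = {
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular_component")],
    "cellular_component": [],
}


def ancestors(term, edges, relations=("is_a", "part_of")):
    """Return every term reachable from `term` via the given relations.

    Both relations are transitive, so each reachable term is an ancestor.
    """
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for rel, parent in edges.get(current, []):
            if rel in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_acyclic(edges):
    """Return True if no chain of child -> parent edges leads back to itself."""
    state = {}  # term -> "visiting" while on the current path, "done" afterwards

    def visit(term):
        if state.get(term) == "visiting":
            return False  # returned to a term on the current path: a cycle
        if state.get(term) == "done":
            return True
        state[term] = "visiting"
        for _, parent in edges.get(term, []):
            if not visit(parent):
                return False
        state[term] = "done"
        return True

    return all(visit(term) for term in edges)


print(sorted(ancestors("nucleolus", EDGES)))
# ['cellular_component', 'nucleus', 'organelle']
```

Restricting `relations` to `("is_a",)` shows why the relation type matters: in this toy graph `nucleolus` has no is_a ancestors at all, only part_of ones.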

  6. Gene Ontology • Developed by an international consortium • about 50 members • Editorial office with roughly 4 full-time editors • Many other part-time editors at the member databases • Multiple changes made per day • changes made live immediately

  7. Gene Ontology • Main ontology format: OBO flat file • Changes are live immediately • no releases • Propagated to the GO database • monthly snapshots archived
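The OBO flat file is line-oriented: stanzas such as `[Term]` contain `tag: value` lines, and tags like `is_a` may repeat within a stanza. A simplified Python reader, assuming the hypothetical `EX:` identifiers shown and deliberately ignoring escapes, qualifiers and non-`[Term]` stanzas, all of which a full parser would need to handle:

```python
# Hypothetical OBO fragment; the EX: identifiers are invented for illustration.
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: nucleus
is_a: EX:0000003 ! organelle

[Term]
id: EX:0000002
name: nucleolus
relationship: part_of EX:0000001 ! nucleus

[Typedef]
id: part_of
name: part of
is_transitive: true
"""


def parse_obo_terms(text):
    """Collect [Term] stanzas as dicts mapping each tag to a list of values.

    Simplified: skips the header and other stanza types, and ignores escapes
    and trailing qualifiers. Values are lists because tags such as is_a repeat.
    """
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split(" !", 1)[0].strip()  # drop a trailing "! comment"
        current.setdefault(tag.strip(), []).append(value)
    return terms


for term in parse_obo_terms(SAMPLE):
    print(term["id"][0], term["name"][0])
# EX:0000001 nucleus
# EX:0000002 nucleolus
```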

  8. Evolution of GO • Original GO created in 2000 • Three databases involved: • FlyBase (Drosophila) • MGI (Mouse) • SGD (S. cerevisiae) • Used immediately

  9. Evolution of GO • Later databases: • TAIR (Arabidopsis) • TIGR (microbes including prokaryotes) • SWISS-PROT (several thousand species including human) • PSU (P. falciparum) • Recent additions • ZFIN (zebrafish) • PAMGO (plant pathogens)

  10. Evolution of GO • GO development traditionally annotation-driven • development directed by use • Terms added as new species annotated • Terms added on an as-needed basis

  11. Evolution of GO • Resulted in an ‘organic’ structure with little formality • Ontological formality added subsequently • philosophical and logical

  12. Growth of GO

  13. Modifying the graph: • Before:

  14. Modifying the graph: • But then I need to annotate pre-1980 VW Beetles • The graph no longer works, because the engine is in the boot

  15. Modifying the graph: • After:

  16. Mechanisms for ontology change • Small incremental changes • Initially all changes to the ontologies made this way

  17. Mechanisms for ontology change • Suggested changes initially submitted by email • Moved to an online tracking system when this became unmanageable

  18. Requesting changes to GO - curator requests tracker • Web-based tracking system hosted at SourceForge.net • Public • Tracker item for each new request or question

  19. Curator requests tracker

  20. Mechanisms for ontology change • Problems: • Larger questions about the high-level ontology structure remain unresolved • this makes some items impossible to close • No sense of the ‘big picture’ • Large areas of the ontologies missing or incomplete because no annotations required them • Massive volume of requests • needed to increase the number of editors

  21. Mechanisms for ontology change • Larger-scale changes: • content meetings • interest groups

  22. Content meetings • Short meetings aimed at developing specific areas of GO ontology content • proposals refined and discussed before meeting • small number of people (10-15) • invited experts • specific topics

  23. Content meetings • Further refinements made by email following the meeting • Changes are made once consensus is reached • Large number of terms typically added (500+)

  24. Content meetings • Recent meetings: • immunology • interactions between organisms • CNS development

  25. Content meetings • Advantages • Allows a lot of detailed work to be done on a very specific area • Involves external expertise

  26. Content meetings • Problems: • Expensive - everyone has to be in the same location • Only works for very specific topics • Long lag time getting terms into ontologies

  27. Interest groups • Groups of experts for a specific topic • e.g. development, cell cycle, plants • Includes GO curators/annotators and external experts • Don’t typically meet face to face

  28. Interest groups • Communicate via email, desktop sharing, etc. • Transporters area of the ontology recently revised this way

  29. Interest groups • Advantages • Cheap, no travel required • Allows a lot of detailed work to be done on a very specific area • Involves external expertise

  30. Interest groups • Disadvantages • Harder to reach consensus when not face to face • Projects tend to drag on

  31. Mechanisms for ontology change • Systematic changes via small working groups

  32. Systematic changes • Projects not directly related to biological content • Systematic changes throughout ontology • Small group of GO consortium members • meets regularly by desktop sharing, voice over IP • Experts recruited to meetings as needed

  33. Systematic changes • Changes are either • made on a branch of the ontology and merged in later • merging the branched file back into the main file always causes big problems • or merged directly into the live ontology after each session • fast, but people get angry

  34. is_a complete • GO contains both is_a and part_of relations • Graphs were typically a mixture of incomplete is_a and part_of hierarchies • A result of the ‘organic’ evolution of GO • All graphs now have complete is_a paths to the root
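Checking is_a completeness amounts to asking, for every term, whether the root can be reached by following is_a edges alone; part_of edges do not count. A minimal sketch over a hypothetical graph (placeholder names, not real GO terms) in which one term still has only a part_of parent:

```python
# Hypothetical graph: child -> list of (relation, parent) edges.
# "process C" has only a part_of parent, mimicking an incomplete is_a path.
EDGES = {
    "biological_process": [],
    "process A": [("is_a", "biological_process")],
    "process B": [("is_a", "process A")],
    "process C": [("part_of", "process B")],
}


def has_is_a_path(term, edges, root):
    """True if `root` can be reached from `term` using is_a edges only."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parent for rel, parent in edges.get(current, []) if rel == "is_a")
    return False


def is_a_incomplete(edges, root):
    """Terms (other than the root) that lack a complete is_a path to the root."""
    return sorted(t for t in edges if t != root and not has_is_a_path(t, edges, root))


print(is_a_incomplete(EDGES, "biological_process"))  # ['process C']
```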

  35. partial disjointness • Biological process terms organised by granularity: • cellular process • multicellular organism process • multi-organism process • To avoid a massive increase in the number of paths to the root, these terms are disjoint • no is_a children in common
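The disjointness constraint can be checked by collecting the is_a descendants of each grouping term and flagging any term that sits under two of them. A sketch, assuming a hypothetical child-to-parents map: the three grouping terms come from the slide, while `process X` and `process Y` are invented placeholders.

```python
from itertools import combinations

# Hypothetical child -> is_a parents map. The three grouping terms come from
# the slide; "process X" and "process Y" are invented placeholders.
IS_A = {
    "cellular process": ["biological_process"],
    "multicellular organism process": ["biological_process"],
    "multi-organism process": ["biological_process"],
    "process X": ["cellular process"],
    "process Y": ["cellular process", "multicellular organism process"],
}


def is_a_descendants(term, is_a):
    """All terms that reach `term` through one or more is_a edges."""
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def disjointness_violations(groups, is_a):
    """Map each pair of supposedly disjoint terms to their shared is_a descendants."""
    descendants = {g: is_a_descendants(g, is_a) for g in groups}
    violations = {}
    for a, b in combinations(groups, 2):
        shared = descendants[a] & descendants[b]
        if shared:
            violations[(a, b)] = shared
    return violations


GROUPS = ["cellular process", "multicellular organism process", "multi-organism process"]
print(disjointness_violations(GROUPS, IS_A))
# {('cellular process', 'multicellular organism process'): {'process Y'}}
```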

  36. sensu • sensu (meaning ‘in the sense of’) used to disambiguate, by taxonomic group, terms with identical strings but different meanings • e.g. sporulation (sensu Viridiplantae) vs. sporulation (sensu Bacteria)

  37. sensu • Current project to remove the sensu term strings • Replace with strings that represent the true differentiae • e.g. • cell wall (sensu Bacteria) -> peptidoglycan-based cell wall • cell wall (sensu Fungi) -> chitin- and beta-glucan-containing cell wall

  38. Systematic changes to GO • Advantages • Fast • Efficient • Small number of people required

  39. Systematic changes to GO • Disadvantages • Difficult to obtain wider consensus • Changes sometimes have to be undone

  40. Useful tools for ontology development • WebEx • desktop sharing, can control each other's desktops • wiki • mainly internal • Skype • free international calls! • conference calls • not free

  41. Tracking changes to GO • General tracking • files stored in CVS, so all differences are trackable (in theory) • far from ideal: frequent discussion of whether we should track term history or date-stamp terms
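Because CVS history is line-based, one way to see what changed between two archived snapshots is a term-level comparison keyed on IDs. A minimal sketch, assuming each snapshot has already been reduced to a hypothetical `{term_id: name}` dictionary; the example rename echoes the sensu clean-up on slide 37, but the IDs are invented:

```python
# Hypothetical snapshots reduced to {term_id: name}; IDs are invented.
OLD = {"EX:1": "cell wall (sensu Bacteria)", "EX:2": "retired term"}
NEW = {"EX:1": "peptidoglycan-based cell wall", "EX:3": "new term"}


def diff_snapshots(old, new):
    """Compare two {term_id: name} snapshots by ID rather than by text line."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    renamed = sorted(
        (tid, old[tid], new[tid])
        for tid in old.keys() & new.keys()
        if old[tid] != new[tid]
    )
    return {"added": added, "removed": removed, "renamed": renamed}


print(diff_snapshots(OLD, NEW))
```

Keying on IDs means a rename shows up as one change to one term, rather than as an unrelated deleted line and added line in a textual diff.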

  42. Tracking changes to GO • Obsolete terms • formerly stored within the ontology • in OBO format, made a special kind of deprecated term (tag is_obsolete) • Soon to create ‘replaced_by’ and ‘consider’ tags to point to live terms
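A consumer of the ontology could use these tags to migrate old annotations. A sketch, assuming the tags behave as intended here: a single `replaced_by` target can be followed automatically, while `consider` candidates need a curator's judgement. The term IDs and the `resolve` helper are hypothetical:

```python
# Hypothetical term records; tag names follow the slide, IDs are invented.
TERMS = {
    "EX:1": {"is_obsolete": True, "replaced_by": ["EX:2"]},
    "EX:2": {"is_obsolete": False},
    "EX:3": {"is_obsolete": True, "consider": ["EX:2", "EX:4"]},
    "EX:4": {"is_obsolete": False},
}


def resolve(term_id, terms, max_hops=10):
    """Return ("live", [id]) or ("review", candidate_ids) for an annotated term.

    Assumed policy: a single replaced_by target is followed automatically;
    anything else is handed to a curator along with the consider candidates.
    """
    for _ in range(max_hops):
        record = terms[term_id]
        if not record.get("is_obsolete", False):
            return ("live", [term_id])
        replacements = record.get("replaced_by", [])
        if len(replacements) == 1:
            term_id = replacements[0]  # unambiguous successor: keep following
            continue
        return ("review", record.get("consider", []) + replacements)
    raise ValueError("replaced_by chain is too long or cyclic")


for old_id in ["EX:1", "EX:3", "EX:4"]:
    print(old_id, resolve(old_id, TERMS))
# EX:1 ('live', ['EX:2'])
# EX:3 ('review', ['EX:2', 'EX:4'])
# EX:4 ('live', ['EX:4'])
```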

  43. Tracking changes to GO • Crediting experts • traditionally no mechanism for doing this • creating abstracts for content meetings, adding a tag to the term • as yet no mechanism for crediting individuals

  44. Useful tools for ontology development • OBO-Edit • ontology editor originally developed for GO • can be used for any OBO format ontology • developed by a group of users

  45. Useful tools for ontology development • Reasoner integrated into OBO-Edit • based on OBOL • detects missing links and redundant links • soon: misplaced terms, automatic term creation • Validation system • detects typographical errors, is_a orphans, duplicate synonyms, etc.
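One check of the kind listed above is redundant-link detection: a direct is_a edge is redundant when the same parent is already reachable through another of the term's is_a parents. This is a plain graph-reachability sketch over a hypothetical graph with placeholder names, not the OBOL-based reasoner itself:

```python
# Hypothetical child -> is_a parents map; names are placeholders.
IS_A = {
    "A": ["B", "D"],  # A -> D is also implied by A -> B -> C -> D
    "B": ["C"],
    "C": ["D"],
    "D": [],
}


def is_a_ancestors(term, is_a):
    """All terms reachable from `term` by following is_a edges upwards."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def redundant_links(is_a):
    """Direct is_a edges that transitivity already implies via another parent."""
    found = []
    for child, parents in is_a.items():
        for parent in parents:
            others = (p for p in parents if p != parent)
            if any(parent in is_a_ancestors(other, is_a) for other in others):
                found.append((child, parent))
    return found


print(redundant_links(IS_A))  # [('A', 'D')]
```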

  46. Lessons learned • An ontology doesn’t have to be perfect or complete to be used • For domain ontologies, external experts should be involved • Communication is critical • You will never please everyone
