Explore the evolution and mechanisms of updating the Gene Ontology, tools for development, lessons learned, and the ontology's applicability to all species. Discover the ontology's structure, growth, and the challenges faced in modification and change processes.
Real-life ontology development: lessons from the Gene Ontology
What is GO? • Evolution of GO • Mechanisms of updating GO • Tools for ontology development • Lessons learned
Gene Ontology • Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases” • Applicable to all species
Gene Ontology - scope • Three disjoint axes: • molecular function • molecular role e.g. catalytic activity, binding • biological process • broad biological phenomena e.g. mitosis, growth, digestion • cellular component • sub-cellular location e.g. nucleus, ribosome, origin recognition complex
Gene Ontology • Directed acyclic graph (DAG) • Terms connected by two transitive relations (edges): • is_a • part_of
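Because both relations are transitive, every term implicitly inherits all of its ancestors in the graph. A minimal sketch of that idea follows; the edge list is a toy example built partly from terms named on the previous slide ("organelle" and "cell" are added for illustration), not the real GO structure, and the function name `ancestors` is hypothetical.

```python
# Toy edges (child, relation, parent); illustrative only, not the real GO graph.
EDGES = [
    ("origin recognition complex", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("organelle", "part_of", "cell"),
    ("cell", "is_a", "cellular_component"),
]


def ancestors(term, edges):
    """Return every term reachable from `term` by following edges upward.

    Both is_a and part_of are treated as transitive, so the relation
    label is ignored during the traversal.
    """
    parents = {}
    for child, _relation, parent in edges:
        parents.setdefault(child, set()).add(parent)

    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("origin recognition complex", EDGES)))
# prints ['cell', 'cellular_component', 'nucleus', 'organelle']
```

The traversal keeps a `seen` set, so a term with several parents (normal in a DAG) is still visited only once.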
Gene Ontology • Developed by an international consortium • about 50 members • Editorial office, 4 full-time editors (ish) • Many other part-time editors at databases • Multiple changes made each day • made live immediately
Gene Ontology • Main ontology format OBO flat file • Changes are live immediately • no releases • Propagated to GO database • monthly snapshots archived
Evolution of GO • Original GO created in 2000 • Three databases involved: • FlyBase (Drosophila) • MGI (Mouse) • SGD (S. cerevisiae) • Used immediately
Evolution of GO • Later databases: • TAIR (Arabidopsis) • TIGR (microbes including prokaryotes) • SWISS-PROT (several thousand species inc. human) • PSU (P. falciparum) • Recent additions • ZFIN (zebrafish) • PAMGO (plant pathogens)
Evolution of GO • GO development traditionally annotation-driven • development directed by use • Terms added as new species annotated • Terms added on an as-needed basis
Evolution of GO • Resulted in ‘organic’ structure, little formality • Ontological formality added subsequently • philosophical and logical
Modifying the graph: • Before:
Modifying the graph: • But then I need to annotate VW Beetles, pre-1980 • The graph no longer works, because the engine is in the boot
Modifying the graph: • After:
Mechanisms for ontology change • Small incremental changes • Initially all changes to the ontologies made this way
Mechanisms for ontology change • Suggested changes initially submitted by email • Moved to an online tracking system when this became unmanageable
Requesting changes to GO - curator requests tracker • Web-based tracking system hosted at SourceForge.net • Public • Tracker item for each new request or question
Mechanisms for ontology change • Problems: • Larger questions about the higher ontology structure remain unresolved • Makes some items impossible to close • No sense of the ‘big picture’ • Large areas of the ontologies missing or incomplete because no annotations • Massive volume • needed to increase the number of editors
Mechanisms for ontology change • Larger-scale changes: • content meetings • interest groups
Content meetings • Short meetings aimed at developing specific areas of GO ontology content • proposals refined and discussed before meeting • small number of people (10-15) • invited experts • specific topics
Content meetings • Further refinements made following meeting by email • Changes are made once consensus reached • Large number of terms typically added (500+)
Content meetings • Recent meetings: • immunology • interactions between organisms • CNS development
Content meetings • Advantages • Allows a lot of detailed work to be done on a very specific area • Involves external expertise
Content meetings • Problems: • Expensive - everyone has to be in the same location • Only works for very specific topics • Long lag time getting terms into ontologies
Interest groups • Groups of experts for a specific topic • e.g. development, cell cycle, plants • Includes GO curators/annotators and external experts • Don’t typically meet face to face
Interest groups • Communicate via email, desktop sharing etc • Transporters area of the ontology recently revised this way
Interest groups • Advantages • Cheap, no travel required • Allows a lot of detailed work to be done on a very specific area • Involves external expertise
Interest groups • Disadvantages • Harder to reach consensus when not face to face • Projects tend to drag on
Mechanisms for ontology change • Systematic changes via small working groups
Systematic changes • Projects not directly related to biological content • Systematic changes throughout ontology • Small group of GO consortium members • meets regularly by desktop sharing, voice over IP • Experts recruited to meetings as needed
Systematic changes • Changes either • made on a branch of the ontology and merged in later • always have big problems merging branched file into main file • merged directly into live ontology after session • fast, but people get angry
is_a complete • GO contains both is_a and part_of relations • Typically, graphs were a mixture of incomplete is_a and part_of hierarchies • A result of ‘organic’ evolution of GO • All graphs now have complete is_a paths to root
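One way to check the "complete is_a path to root" property is to walk upward from each term using is_a edges only and report the terms that never reach the root. This is a sketch under assumptions: the term set, the edges, and the function name `terms_without_is_a_path` are all illustrative, and this is not how the GO editors actually ran the check.

```python
# Toy data: "nucleolus" is attached only by part_of, so it has no is_a path.
ROOT = "cellular_component"
TERMS = ["cellular_component", "organelle", "nucleus", "nucleolus"]
IS_A = [("organelle", "cellular_component"), ("nucleus", "organelle")]
PART_OF = [("nucleolus", "nucleus")]  # deliberately ignored by the check


def terms_without_is_a_path(terms, is_a_edges, root):
    """Return the terms that cannot reach `root` through is_a edges alone."""
    parents = {}
    for child, parent in is_a_edges:
        parents.setdefault(child, set()).add(parent)

    def reaches_root(term):
        stack, seen = [term], set()
        while stack:
            current = stack.pop()
            if current == root:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(parents.get(current, ()))
        return False

    return sorted(t for t in terms if not reaches_root(t))


print(terms_without_is_a_path(TERMS, IS_A, ROOT))
# prints ['nucleolus']
```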
partial disjointness • Biological process terms organised by granularity: • cellular process • multicellular organism process • multi-organism process • To avoid massive increase in number of paths to root, these terms are disjoint • no is_a children in common
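The disjointness rule can be tested mechanically: collect the is_a descendants of each granularity term and confirm that no two sets overlap. The sketch below reads "children" as all is_a descendants, which is an assumption; the edges and the term "hypothetical term X" are invented to show a violation.

```python
from itertools import combinations

GRANULARITY_TERMS = [
    "cellular process",
    "multicellular organism process",
    "multi-organism process",
]

# Toy is_a edges (child, parent); "hypothetical term X" breaks the rule on purpose.
IS_A = [
    ("mitosis", "cellular process"),
    ("digestion", "multicellular organism process"),
    ("hypothetical term X", "cellular process"),
    ("hypothetical term X", "multi-organism process"),
]


def is_a_descendants(term, is_a_edges):
    """Return all terms below `term` in the is_a hierarchy."""
    children = {}
    for child, parent in is_a_edges:
        children.setdefault(parent, set()).add(child)
    seen, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def shared_descendants(disjoint_terms, is_a_edges):
    """Map each pair of supposedly disjoint terms to the descendants they share."""
    violations = {}
    for first, second in combinations(disjoint_terms, 2):
        common = is_a_descendants(first, is_a_edges) & is_a_descendants(second, is_a_edges)
        if common:
            violations[(first, second)] = common
    return violations


print(shared_descendants(GRANULARITY_TERMS, IS_A))
# prints {('cellular process', 'multi-organism process'): {'hypothetical term X'}}
```

An empty result means the three terms are disjoint in the sense described on the slide.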
sensu • sensu (meaning ‘in the sense of’) used to disambiguate, by taxonomic group, terms with identical strings but different meanings • e.g. sporulation (sensu Viridiplantae) vs. sporulation (sensu Bacteria)
sensu • Current project to remove the sensu term strings • Replace with strings that represent the true differentiae • e.g. • cell wall (sensu Bacteria) -> peptidoglycan-based cell wall • cell wall (sensu Fungi) -> chitin- and beta-glucan-containing cell wall
Systematic changes to GO • Advantages • Fast • Efficient • Small number of people required
Systematic changes to GO • Disadvantages • Difficult to obtain wider consensus • Changes sometimes have to be undone
Useful tools for ontology development • WebEx • desktop sharing, can control each other's desktops • wiki • mainly internal • Skype • free international calls! • conference calls • not free
Tracking changes to GO • General tracking • files stored in CVS, all differences trackable (in theory) • far from ideal - frequent discussion: should we track term history or date-stamp terms?
Tracking changes to GO • Obsolete terms • formerly stored within the ontology • in OBO format, made a special kind of deprecated term (tag is_obsolete) • ‘replaced_by’ and ‘consider’ tags to be created soon to point to live terms
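To show how the three tags could work together, here is a minimal sketch that reads OBO-style term stanzas and suggests live terms for an obsolete one. The tag names come from the slide; the stanza layout follows OBO flat-file conventions, the `EX:` identifiers and term names are invented, and the helpers `parse_terms` and `suggest_live_terms` are hypothetical, not part of any GO tool.

```python
OBO_TEXT = """\
[Term]
id: EX:0000001
name: old term
is_obsolete: true
replaced_by: EX:0000002

[Term]
id: EX:0000002
name: new term

[Term]
id: EX:0000003
name: ambiguous old term
is_obsolete: true
consider: EX:0000002
consider: EX:0000004

[Term]
id: EX:0000004
name: another live term
"""


def parse_terms(text):
    """Parse [Term] stanzas into {id: {tag: [values, ...]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # Only [Term] stanzas are kept; any other stanza type is skipped.
            current = {} if line == "[Term]" else None
            continue
        if current is None:
            continue
        tag, _, value = line.partition(": ")
        current.setdefault(tag, []).append(value)
        if tag == "id":
            terms[value] = current
    return terms


def suggest_live_terms(term_id, terms):
    """Return the term itself if live, else its replaced_by and consider targets."""
    term = terms[term_id]
    if term.get("is_obsolete") != ["true"]:
        return [term_id]
    return term.get("replaced_by", []) + term.get("consider", [])


terms = parse_terms(OBO_TEXT)
print(suggest_live_terms("EX:0000001", terms))  # prints ['EX:0000002']
print(suggest_live_terms("EX:0000003", terms))  # prints ['EX:0000002', 'EX:0000004']
```

In this sketch a `replaced_by` target is treated as a direct substitute, while `consider` targets are candidates that a curator would still need to choose between.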
Tracking changes to GO • Crediting experts • traditionally no mechanism for doing this • creating abstracts for content meetings, adding tag to term • as yet no mechanism for crediting individuals
Useful tools for ontology development • OBO-Edit • ontology editor originally developed for GO • can be used for any OBO format ontology • developed by group of users
Useful tools for ontology development • Reasoner integrated into OBO-Edit • based on OBOL • detects missing links, redundant links • soon: misplaced terms, automatic term creation • Validation system • typographical errors, is_a orphans, duplicate synonyms etc.
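The kinds of checks listed above are simple to express over an in-memory ontology. The following is a rough sketch of three of them (is_a orphans, duplicate synonyms, redundant is_a links); it is not OBO-Edit's implementation, and the data layout, term names, and function names are all assumptions made for illustration.

```python
# Toy ontology: {term: {"is_a": [parents], "synonyms": [strings]}}.
ONTOLOGY = {
    "root": {"is_a": [], "synonyms": []},
    "a": {"is_a": ["root"], "synonyms": ["alpha", "alpha"]},
    "b": {"is_a": ["a", "root"], "synonyms": ["beta"]},
    "c": {"is_a": [], "synonyms": []},
}


def find_is_a_orphans(terms, root):
    """Non-root terms with no is_a parent at all."""
    return sorted(t for t, data in terms.items() if t != root and not data["is_a"])


def find_duplicate_synonyms(terms):
    """Map each term to the synonyms it lists more than once."""
    result = {}
    for term, data in terms.items():
        synonyms = data["synonyms"]
        dupes = sorted({s for s in synonyms if synonyms.count(s) > 1})
        if dupes:
            result[term] = dupes
    return result


def find_redundant_is_a(terms):
    """Direct is_a links whose parent is already implied via another is_a parent."""

    def is_a_ancestors(term):
        seen, stack = set(), list(terms[term]["is_a"])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(terms[current]["is_a"])
        return seen

    redundant = []
    for term, data in terms.items():
        for parent in data["is_a"]:
            others = [p for p in data["is_a"] if p != parent]
            if any(parent in is_a_ancestors(other) for other in others):
                redundant.append((term, parent))
    return sorted(redundant)


print(find_is_a_orphans(ONTOLOGY, "root"))  # prints ['c']
print(find_duplicate_synonyms(ONTOLOGY))    # prints {'a': ['alpha']}
print(find_redundant_is_a(ONTOLOGY))        # prints [('b', 'root')]
```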
Lessons learned • An ontology doesn’t have to be perfect or complete to be used • For domain ontologies, external experts should be involved • Communication is critical • You will never please everyone