The Gene Ontology: a real-life ontology, progress and future. Jane Lomax EMBL-EBI
What is the Gene Ontology? • Controlled vocabulary - GO • Terms and relationships • Bottom-up approach • Annotation of proteins to terms • Gene association files • Software/database development • Freely available
The vocabulary • GO is divided into three sub-vocabularies: • biological process • broad series of events, can either be at the level of the cell or organism e.g. circulation, glycolysis • molecular function • direct activities e.g. catalysis, binding • cellular component • site of action e.g. nucleus, ribosome
The vocabulary • Hierarchical • Directed Acyclic Graph • terms have one or more parents • is-a and part-of relations
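The graph structure on this slide can be sketched in a few lines of Python. This is only an illustration: the edge list is a hypothetical toy fragment rather than an extract from a GO release, and the names `EDGES`, `build_parents` and `ancestors` are invented for the example. It shows a term with more than one parent and how all ancestors are collected across both is-a and part-of edges.

```python
from collections import defaultdict

# Hypothetical toy fragment (illustrative only, not a GO release).
# Each edge is (child, relation, parent).
EDGES = [
    ("nucleus", "is_a", "organelle"),
    ("nucleus", "part_of", "cell"),
    ("organelle", "part_of", "cell"),
    ("nucleolus", "part_of", "nucleus"),
    ("cell", "is_a", "cellular component"),
]


def build_parents(edges):
    """Map each child term to its list of (relation, parent) pairs."""
    parents = defaultdict(list)
    for child, relation, parent in edges:
        parents[child].append((relation, parent))
    return parents


def ancestors(term, parents):
    """Collect every term reachable by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)
```

Because a term may have several parents, the traversal keeps a `seen` set so that a shared ancestor such as "cell" is visited only once.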
How is GO maintained? • Several full-time editors • Requests from community • database curators, researchers, software developers • SourceForge tracker • GO Consortium meetings for large changes • Mailing lists
OBO - Open Biological Ontologies • GO is a member vocabulary of OBO • A repository for biological structured vocabularies • Freely available without license • Common syntax • Orthogonal to existing ontologies http://obo.sourceforge.net/
Future developments • File format • Current GO flat file format • partly redundant • difficult to parse • New format • Extensible e.g. new relationship types can be specified • minimal redundancy, but human readable • easier to parse • Moving toward a database as the primary form of GO
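To make "human readable but easier to parse" concrete, here is a minimal sketch of a parser for a stanza-based tag-value layout. The slide does not give the syntax of the new format, so the layout, the `EX:` identifiers, and the names `SAMPLE` and `parse_stanzas` are all assumptions made for illustration. Storing every tag as a list means a new relationship type needs no parser change.

```python
# Hypothetical sample in an assumed "tag: value" stanza layout.
# The EX: identifiers are invented for this example.
SAMPLE = """\
[Term]
id: EX:0000002
name: nucleus
is_a: EX:0000001
relationship: part_of EX:0000003

[Term]
id: EX:0000001
name: organelle
"""


def parse_stanzas(text):
    """Parse stanzas into dicts mapping each tag to a list of values."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A bracketed header opens a new stanza.
            current = {"_type": [line[1:-1]]}
            stanzas.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            tag, value = line.split(":", 1)
            current.setdefault(tag.strip(), []).append(value.strip())
    return stanzas
```

Calling `parse_stanzas(SAMPLE)` yields two stanzas; the first carries both an `is_a` value and a `relationship` value, each stored once.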
Formalizing GO • Informality is a common criticism of GO • developed by biologists, for biologists • Now beginning work ‘decomposing’ GO using Prolog • Terms broken down into constituent parts e.g. regulation of heart development • New terms could be created from orthogonal ontologies e.g. anatomical • Work translating GO into description logic (DL), reasoning across the ontologies
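The idea of breaking a term name into constituent parts can be illustrated with a small pattern-based sketch. The slide describes work done in Prolog; the Python below is not that work, only an analogue under the assumption that names follow simple templates such as "regulation of X" and "X development". The names `PATTERNS` and `decompose` are hypothetical.

```python
import re

# Assumed name templates; real term names would need many more patterns.
PATTERNS = [
    (re.compile(r"^regulation of (?P<target>.+)$"), "regulation"),
    (re.compile(r"^(?P<target>.+) development$"), "development"),
]


def decompose(name):
    """Recursively peel known templates off a term name.

    Returns the list of parts, outermost first, ending with the
    residue that matched no template.
    """
    for pattern, label in PATTERNS:
        match = pattern.match(name)
        if match:
            return [label] + decompose(match.group("target"))
    return [name]
```

With these two templates, `decompose("regulation of heart development")` peels off "regulation", then "development", and leaves "heart", a residue that could in principle be looked up in an anatomical ontology.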
GO into UMLS • GO now released as part of the NLM’s Unified Medical Language System Metathesaurus • Links biomedical vocabularies including MeSH and SNOMED. • The process of including GO in UMLS highlighted problems in both systems
GO synonyms • Text strings associated with GO terms • Often do not have identical meaning to term • Reduces utility in e.g. semantic matching • Developed relationships between terms and synonyms • soon to be fully implemented in GO
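A sketch of how typed synonyms could help semantic matching: if each synonym records its relationship to the term, a matcher can restrict itself to the synonyms that mean the same thing. The scope labels ("exact", "broad", "narrow", "related"), the sample data, and the names `Synonym`, `TERM_SYNONYMS` and `match_terms` are assumptions for illustration; the slide does not list the relationship types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synonym:
    text: str
    scope: str  # assumed labels: "exact", "broad", "narrow", "related"


# Hypothetical data, invented for this example.
TERM_SYNONYMS = {
    "glycolysis": [
        Synonym("glycolytic process", "exact"),
        Synonym("carbohydrate breakdown", "broad"),
    ],
}


def match_terms(query, synonyms_by_term, allowed_scopes=("exact",)):
    """Return terms whose name, or a synonym in an allowed scope, equals the query."""
    wanted = query.lower()
    hits = []
    for term, synonyms in synonyms_by_term.items():
        name_hit = term.lower() == wanted
        synonym_hit = any(
            s.text.lower() == wanted and s.scope in allowed_scopes
            for s in synonyms
        )
        if name_hit or synonym_hit:
            hits.append(term)
    return hits
```

By default only exact synonyms match, so a broad string such as "carbohydrate breakdown" is ignored unless the caller passes `allowed_scopes=("exact", "broad")`.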
www.geneontology.org • FlyBase & Berkeley Drosophila Genome Project • Saccharomyces Genome Database • PomBase (Sanger Institute) • Rat Genome Database • Genome Knowledge Base (CSHL) • The Institute for Genomic Research • Compugen, Inc • The Arabidopsis Information Resource • WormBase • DictyBase • Mouse Genome Informatics • Swiss-Prot/TrEMBL/InterPro • Pathogen Sequencing Unit (Sanger Institute)
The Gene Ontology Consortium is supported by an R01 grant from the National Human Genome Research Institute (NHGRI) [grant HG02273]. SGD is supported by a P41, National Resources, grant from the NHGRI [grant HG01315]; MGD by a P41 from the NHGRI [grant HG00330]; GXD by the National Institute of Child Health and Human Development [grant HD33745]; FlyBase by a P41 from the NHGRI [grant HG00739] and by the Medical Research Council, London. TAIR is supported by the National Science Foundation [grant DBI-9978564]. WormBase is supported by a P41, National Resources, grant from the NHGRI [grant HG02223]; RGD is supported by an R01 grant from the NHLBI [grant HL64541]; DictyBase is supported by an R01 grant from the NIGMS [grant GM064426].