Introduction to the Gene Ontology and GO Annotation Resources. EBI Bioinformatics Roadshow 15 March 2011 Düsseldorf, Germany Rebecca Foulger.
OUTLINE OF TUTORIAL:
PART I: Ontologies and the Gene Ontology (GO)
PART II: GO Annotations • How to access GO annotations • How scientists use GO annotations
Q: What is a cell? A: It really depends who you ask!
The same thing can be described by different names: Glucose synthesis Glucose biosynthesis Glucose formation Glucose anabolism Gluconeogenesis
Inconsistency in naming of biological concepts • Same name for different concepts • Different names for the same concept • Comparison is difficult, in particular across species or across databases • Just one reason why the Gene Ontology (GO) is needed…
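The naming problem above can be sketched in a few lines of Python. This is only an illustration: the identifier `EXAMPLE:0000001` and the names `LABEL_TO_CONCEPT` and `normalise` are made up for this sketch and do not come from GO itself.

```python
from typing import Optional

# Hypothetical identifier, used only for this illustration.
CONCEPT_ID = "EXAMPLE:0000001"

# Every label from the slide points at the same concept.
LABEL_TO_CONCEPT = {
    "glucose synthesis": CONCEPT_ID,
    "glucose biosynthesis": CONCEPT_ID,
    "glucose formation": CONCEPT_ID,
    "glucose anabolism": CONCEPT_ID,
    "gluconeogenesis": CONCEPT_ID,
}


def normalise(label: str) -> Optional[str]:
    """Return the shared identifier for a label, or None if it is unknown."""
    return LABEL_TO_CONCEPT.get(label.strip().lower())


# Two databases using different names now compare as equal.
same_concept = normalise("Gluconeogenesis") == normalise(" glucose anabolism ")
```

Once every label resolves to one identifier, comparisons across species or databases no longer depend on which name a curator happened to use.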
Why do we need GO? • Inconsistency in naming of biological concepts • Increasing amounts of biological data available • Large datasets need to be interpreted quickly • Increasing amounts of biological data to come
Increasing amounts of biological data available Search on mesoderm development…. you get 9441 results! Expansion of sequence information
What is an ontology? Dictionary: • A branch of metaphysics concerned with the nature and relations of being (philosophy) • A formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts (computer science) Barry Smith: • The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.
What is an ontology? More usefully: • An ontology is the representation of something we know about. “Ontologies” consist of a representation of things that are detectable or directly observable, and the relationships between those things.
What is an ontology? An ontology is more than just a list of terms (a controlled vocabulary) • A vocabulary of terms • Definitions for those terms • *** Defined logical relationships between the terms ***
What is the Gene Ontology (GO)? A way to capture biological knowledge in a written and computable form Describes attributes of gene products (RNA and protein)
The scope of GO • What information might we want to capture about a gene product? • What does the gene product do? • Where does it act? • How does it act?
Biological Process: what does a gene product do? A commonly recognised series of events • transcription • cell division
Cellular Component: where is a gene product located? • plasma membrane • mitochondrion • mitochondrial membrane • mitochondrial matrix • mitochondrial lumen • ribosome • large ribosomal subunit • small ribosomal subunit
Molecular Function: how does a gene product act? • glucose-6-phosphate isomerase activity • insulin binding • insulin receptor activity
Three separate ontologies or one large one? GO was originally three completely independent hierarchies, with no relationships between them. As of 2009, GO editors have started adding relationships between biological process and molecular function terms in the live ontology.
[Diagram: a Process term linked to Function terms]
GO is: • species independent • a description of normal processes GO does NOT cover: • pathological/disease processes • experimental conditions • evolutionary relationships • gene products GO is NOT a nomenclature system.
Aims of the GO project Compile the ontologies Annotate gene products using ontology terms Provide a public resource of data and tools
Anatomy of a GO term Unique identifier Term name Definition Synonyms Cross-references
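The five parts of a term listed above map naturally onto a small record type. The sketch below is an assumption about how one might model a term, not the GO Consortium's own data model; the class name `GOTerm` and the placeholder definition, synonym and cross-reference values are invented for illustration. The identifier check relies on GO IDs having the form `GO:` followed by seven digits.

```python
import re
from dataclasses import dataclass, field
from typing import List

# GO identifiers are "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


@dataclass
class GOTerm:
    go_id: str                                          # unique identifier
    name: str                                           # term name
    definition: str                                     # definition
    synonyms: List[str] = field(default_factory=list)   # synonyms
    xrefs: List[str] = field(default_factory=list)      # cross-references

    def __post_init__(self) -> None:
        if not GO_ID_PATTERN.fullmatch(self.go_id):
            raise ValueError(f"malformed GO identifier: {self.go_id!r}")


# Identifier and name as given later in the slides; other fields are placeholders.
term = GOTerm(
    go_id="GO:0004069",
    name="aspartate transaminase activity",
    definition="(placeholder: the curated definition text goes here)",
    synonyms=["(placeholder synonym)"],
    xrefs=["(placeholder cross-reference)"],
)
```

Validating the identifier at construction time catches truncated IDs, such as one with only six digits, before they reach downstream tools.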
Ontology structure • GO is structured as a hierarchical directed acyclic graph (DAG) • Terms can have more than one parent and zero, one or more children • Terms are linked by relationships, which add to the meaning of the term • Nodes = terms in the ontology • Edges = relationships between the concepts
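A minimal sketch of this structure, assuming a plain dictionary that maps each term to its (relationship, parent) edges. The edges below are illustrative and not copied from the live ontology; `is_acyclic` is a hypothetical helper that checks the "acyclic" part of DAG with a depth-first search.

```python
from typing import Dict, List, Tuple

Edges = Dict[str, List[Tuple[str, str]]]

# Illustrative edges only: term -> list of (relationship, parent term).
EDGES: Edges = {
    "mitochondrial membrane": [
        ("is_a", "organelle membrane"),   # first parent
        ("part_of", "mitochondrion"),     # second parent
    ],
    "mitochondrion": [("is_a", "organelle")],
    "organelle membrane": [("part_of", "organelle")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],             # root: no parents
}


def is_acyclic(edges: Edges) -> bool:
    """Return True if following parent edges never leads back to the start."""
    in_progress, done = 1, 2
    state: Dict[str, int] = {}

    def visit(term: str) -> bool:
        if state.get(term) == in_progress:
            return False                  # found a cycle
        if state.get(term) == done:
            return True
        state[term] = in_progress
        for _relationship, parent in edges.get(term, []):
            if not visit(parent):
                return False
        state[term] = done
        return True

    return all(visit(term) for term in edges)
```

Here "mitochondrial membrane" has two parents, which a strict tree could not represent, while the graph as a whole still contains no cycles.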
Relationships between GO terms • is_a • part_of • regulates • positively regulates • negatively regulates • has_part
is_a • If A is a B, then A is a subtype of B • mitotic cell cycle is a cell cycle • lyase activity is a catalytic activity • Transitive relationship: can infer up the graph
part_of • Necessarily part of • If B is part_of A: wherever B exists, it is as part of A. But A does not necessarily have B as a part. • Transitive relationship (can infer up the graph)
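Because is_a and part_of are both transitive, relationships can be inferred up the graph by composing them edge by edge. The sketch below assumes the usual composition rules (is_a followed by part_of gives part_of, and so on); the edges are illustrative and the names `COMPOSE` and `inferred` are made up for this example.

```python
from typing import Dict, List, Set, Tuple

# Following relationship r1 and then r2 yields the inferred relationship.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
}

# Illustrative edges only: term -> list of (relationship, parent term).
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "mitochondrial matrix": [("part_of", "mitochondrion")],
    "mitochondrion": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular component")],
}


def inferred(term: str, edges: Dict[str, List[Tuple[str, str]]]) -> Set[Tuple[str, str]]:
    """Return every (relationship, ancestor) pair that holds for the term."""
    found: Set[Tuple[str, str]] = set()
    stack = list(edges.get(term, []))
    while stack:
        relationship, parent = stack.pop()
        if (relationship, parent) in found:
            continue
        found.add((relationship, parent))
        for next_relationship, grandparent in edges.get(parent, []):
            combined = COMPOSE[(relationship, next_relationship)]
            stack.append((combined, grandparent))
    return found
```

With these edges, "mitochondrial matrix" is inferred to be part_of "organelle" even though no such edge was asserted directly.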
regulates • One process directly affects another process or quality • Necessarily regulates: if both A and B are present, B always regulates A, but A may not always be regulated by B
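has_part • Runs in the opposite direction to is_a and part_of: the relationship points from the whole to its part • Necessarily has part: if A has_part B, every A has a B as a part, but B can exist without being part of A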
is_a complete For all terms in the ontology, you have to be able to reach the root through a complete path of is_a relationships: • we call this being is_a complete • important for reasoning over the ontology, and ontology development
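One way to test for is_a completeness is to walk only the is_a edges from each term and see whether the root is reached. This is a sketch under the same dictionary representation as an assumption; the edges are illustrative and `missing_is_a_path` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

Edges = Dict[str, List[Tuple[str, str]]]

# Illustrative edges only. "mitochondrial matrix" deliberately lacks an is_a parent.
EDGES: Edges = {
    "mitochondrion": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular component")],
    "mitochondrial matrix": [("part_of", "mitochondrion")],
    "cellular component": [],
}


def reaches_root_by_is_a(term: str, edges: Edges, root: str) -> bool:
    """Follow is_a edges only and report whether the root is reachable."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parent for rel, parent in edges.get(current, []) if rel == "is_a")
    return False


def missing_is_a_path(edges: Edges, root: str) -> List[str]:
    """List the terms that break is_a completeness."""
    return sorted(t for t in edges if not reaches_root_by_is_a(t, edges, root))


problems = missing_is_a_path(EDGES, "cellular component")
```

In this toy graph the only term reported is "mitochondrial matrix", because its single parent edge is part_of rather than is_a.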
True path rule • Child terms inherit the meaning of all their parent terms.
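A practical consequence of the true path rule is that an annotation to a child term implies annotation to all of its parents. The sketch below illustrates that propagation; the parent links are illustrative, and `PROTEIN_X`, `with_ancestors` and `propagate` are hypothetical names.

```python
from typing import Dict, List, Set

# Illustrative parent links only: term -> parent terms.
PARENTS: Dict[str, List[str]] = {
    "mitotic cell cycle": ["cell cycle"],
    "cell cycle": ["cellular process"],
    "cellular process": ["biological process"],
    "biological process": [],
}


def with_ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with every term above it in the graph."""
    implied: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in implied:
            implied.add(current)
            stack.extend(parents.get(current, []))
    return implied


def propagate(direct: Dict[str, List[str]], parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Expand each gene product's direct annotations to all implied terms."""
    return {
        gene_product: set().union(*(with_ancestors(t, parents) for t in terms))
        for gene_product, terms in direct.items()
    }


# Hypothetical gene product with one direct annotation.
implied = propagate({"PROTEIN_X": ["mitotic cell cycle"]}, PARENTS)
```

If any path from a term to the root would be false for a gene product, the annotation (or the ontology structure) needs revisiting.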
How is GO maintained? GO editors and annotators work with experts to remodel specific areas of the ontology • Signaling • Kidney development • Transcription • Pathogenesis • Cell cycle Deal with requests from the community • database curators, researchers, software developers • Some simple requests can be dealt with automatically GO Consortium meetings for large changes Mailing lists, conference calls, content workshops
Requesting changes to the ontology Public SourceForge (SF) tracker for term-related issues https://sourceforge.net/projects/geneontology/
Why modify the GO? GO reflects current knowledge of biology Information from new organisms can make existing terms and arrangements incorrect Not everything was perfect from the outset • Improving definitions • Adding in synonyms and extra relationships
Ensuring Stability in a Dynamic Ontology • Terms become obsolete when they are removed or redefined • GO IDs are never deleted • For each term, a comment is added to explain why the term is now obsolete • Alternative GO terms are suggested to replace an obsoleted term
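Since obsolete IDs are kept rather than deleted, a tool can always look an old ID up and follow the suggested alternatives. The sketch below assumes each record carries `is_obsolete`, `comment`, `replaced_by` and `consider` fields, modelled on the tags of the same names in OBO files; the three term IDs and their contents are hypothetical.

```python
from typing import Dict, List

# Hypothetical records; none of these IDs is a real GO term.
TERMS: Dict[str, dict] = {
    "GO:9000001": {
        "name": "old term",
        "is_obsolete": True,
        "comment": "Obsoleted because the term was redefined.",
        "replaced_by": ["GO:9000002"],
        "consider": [],
    },
    "GO:9000002": {"name": "current term", "is_obsolete": False},
    "GO:9000003": {
        "name": "ambiguous old term",
        "is_obsolete": True,
        "comment": "Obsoleted because it covered two concepts.",
        "replaced_by": [],
        "consider": ["GO:9000002", "GO:9000001"],
    },
}


def resolve(go_id: str, terms: Dict[str, dict]) -> List[str]:
    """Map an ID to current term IDs, following suggestions on obsolete terms.

    Assumes the suggestions never form a loop.
    """
    record = terms[go_id]  # the lookup works because IDs are never deleted
    if not record.get("is_obsolete"):
        return [go_id]
    resolved: List[str] = []
    for suggestion in record.get("replaced_by", []) + record.get("consider", []):
        for current in resolve(suggestion, terms):
            if current not in resolved:
                resolved.append(current)
    return resolved
```

An annotation pipeline could use such a lookup to flag annotations that still point at obsolete terms and propose replacements for a curator to review.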
Searching for GO terms http://www.ebi.ac.uk/QuickGO/ http://amigo.geneontology.org … there are more browsers available on the GO Tools page: http://www.geneontology.org/GO.tools.browsers.shtml The latest OBO Gene Ontology file can be downloaded from: http://www.geneontology.org/ontology/gene_ontology.obo
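For readers who download the OBO file, the sketch below shows roughly how its `[Term]` stanzas can be read with plain Python. It is a simplified reader, not a full OBO parser: it ignores escaping, trailing modifiers and non-term stanzas, and the embedded sample is hand-written for illustration rather than copied from the real file. `parse_obo` is a hypothetical function name.

```python
from typing import Dict, List

SAMPLE = """
format-version: 1.2

[Term]
id: GO:0007049
name: cell cycle
namespace: biological_process

[Term]
id: GO:0000278
name: mitotic cell cycle
namespace: biological_process
is_a: GO:0007049 ! cell cycle

[Typedef]
id: part_of
name: part of
"""


def parse_obo(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Collect tag/value pairs from each [Term] stanza, keyed by term ID."""
    stanzas: List[Dict[str, List[str]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Start of a stanza; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None or ": " not in line:
            continue  # header lines, blank lines, or a skipped stanza
        tag, value = line.split(": ", 1)
        value = value.split(" ! ", 1)[0].strip()  # drop the trailing "! comment"
        current.setdefault(tag, []).append(value)
    return {s["id"][0]: s for s in stanzas if "id" in s}


terms = parse_obo(SAMPLE)
```

For real work, a maintained OBO library is a safer choice than this sketch, since the full format has more tags and escaping rules than are handled here.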
Exercise • Browsing the Gene Ontology using QuickGO • Exercise 1
E. coli hub http://www.geneontology.org Reactome
A GO annotation is… A statement that a gene product:
1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component
2. as determined by a particular method
3. as described in a particular reference
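The three parts of that statement can be captured in a small record. This is a sketch, not the official annotation file layout: the class name `Annotation`, the gene product `PROTEIN_X` and the reference `PMID:0000000` are hypothetical, and the single-letter aspect codes F, P and C are assumed here as shorthand for the three ontologies.

```python
from dataclasses import dataclass

# Wording for each aspect: F = function, P = process, C = component.
ASPECT_PHRASE = {
    "F": "has the molecular function",
    "P": "is involved in the biological process",
    "C": "is located within the cellular component",
}


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # what is being annotated
    go_id: str         # the GO term
    aspect: str        # "F", "P" or "C"
    evidence: str      # the method, as an evidence code
    reference: str     # where the finding is described

    def sentence(self) -> str:
        """Spell the annotation out as the statement it represents."""
        return (
            f"{self.gene_product} {ASPECT_PHRASE[self.aspect]} {self.go_id}, "
            f"as determined by {self.evidence}, "
            f"as described in {self.reference}"
        )


# Hypothetical gene product and reference; the GO ID appears later in the slides.
example = Annotation("PROTEIN_X", "GO:0004069", "F", "IDA", "PMID:0000000")
text = example.sentence()
```

Keeping the evidence code and reference on every record is what lets users later filter annotations by how they were made.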
Evidence codes http://www.geneontology.org/GO.evidence.shtml
• IDA (Inferred from Direct Assay): e.g. enzyme assay
• IPI (Inferred from Physical Interaction): e.g. Y2H
• BLASTs, orthology comparison, HMMs
• subcategories of ISS
• review papers
Gene Ontology Annotation (GOA) The GOA database at the EBI is: The largest open-source contributor of annotations to GO Member of the GO Consortium since 2001 Provides annotation for 321,998 species (February 2011 release) GOA’s priority is to annotate the human proteome GOA is responsible for human, chicken and bovine annotations in the GO consortium
GOA makes annotations using two methods • Electronic • Quick way of producing large numbers of annotations • Annotations are less detailed • Manual • Time-consuming process producing lower numbers of annotations • Annotations are very detailed and accurate
Electronic annotation by GOA 1. Mapping of external concepts to GO terms • InterPro2GO (protein domains) • SPKW2GO (UniProt/Swiss-Prot keywords) • HAMAP2GO (Microbial protein annotation) • EC2GO (Enzyme Commission numbers) • SPSL2GO (Swiss-Prot subcellular locations) 2. Automatic transfer of annotations to orthologs (Ensembl Compara: Macaque, Chimpanzee, Guinea Pig, Rat, Mouse, Dog, Chicken, Cow)
Mappings of concepts from UniProtKB files • Aspartate transaminase activity ; GO:0004069 • lipid transport ; GO:0006869
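A mapping from external concepts to GO terms can be read into a lookup table. The sketch below assumes a line layout of `external concept > GO:term name ; GO:id`; that layout, the external key `EXAMPLE-KW:lipid transport` and the function name `parse_mapping` are assumptions for illustration, so check the actual mapping files before relying on it.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed layout: "<external concept> > GO:<term name> ; GO:<seven digits>"
LINE = re.compile(r"^(?P<ext>.+?)\s*>\s*GO:(?P<name>.+?)\s*;\s*(?P<go_id>GO:\d{7})$")

SAMPLE_LINES = [
    "! lines starting with '!' are treated as comments in this sketch",
    "EC:2.6.1.1 > GO:aspartate transaminase activity ; GO:0004069",
    "EXAMPLE-KW:lipid transport > GO:lipid transport ; GO:0006869",
    "EXAMPLE-KW:broken > GO:lipid transport ; GO:006869",  # six digits: rejected
]


def parse_mapping(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Build external concept -> [(GO ID, GO term name), ...]."""
    mapping: Dict[str, List[Tuple[str, str]]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        match = LINE.match(line)
        if match:  # silently skip lines that do not fit the assumed layout
            mapping.setdefault(match["ext"], []).append((match["go_id"], match["name"]))
    return mapping


mapping = parse_mapping(SAMPLE_LINES)
```

Any protein carrying the external concept can then be given the mapped GO term electronically, which is why this route produces many annotations quickly.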
Automatic transfer of annotations to orthologs (Ensembl Compara) • Homologies between different species calculated • GO terms projected from MANUAL annotation only (IDA, IEP, IGI, IMP, IPI) • One-to-one orthologies used Currently provides 479,961 GO annotations for 60,515 proteins from 49 species (February 2011 release) [Diagram: annotations projected between Human, Mouse, Rat, Zebrafish, Xenopus, Macaque, Chimpanzee, Guinea Pig, Tetraodon, Cow, Dog, Chicken and Fugu]
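The projection rules above can be sketched as a filter: keep only annotations with one of the listed manual evidence codes, and transfer them only across one-to-one ortholog pairs. This is an illustration of the rules as stated on the slide, not the Ensembl Compara pipeline itself; the protein names and function names are hypothetical, and labelling the projected annotations with the electronic evidence code IEA is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Evidence codes the slide lists as eligible for projection.
MANUAL_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}


def one_to_one(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Keep ortholog pairs where each side has exactly one partner."""
    sources = Counter(source for source, _ in pairs)
    targets = Counter(target for _, target in pairs)
    return {s: t for s, t in pairs if sources[s] == 1 and targets[t] == 1}


def project(
    annotations: List[Tuple[str, str, str]],
    pairs: List[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Transfer eligible (protein, GO ID, evidence) annotations to orthologs."""
    orthologs = one_to_one(pairs)
    projected = []
    for protein, go_id, evidence in annotations:
        if evidence in MANUAL_CODES and protein in orthologs:
            # Assumption: projected annotations are marked as electronic (IEA).
            projected.append((orthologs[protein], go_id, "IEA"))
    return projected


# Hypothetical proteins: HUMAN_B has two mouse orthologs, so it is skipped.
PAIRS = [("HUMAN_A", "MOUSE_A"), ("HUMAN_B", "MOUSE_B1"), ("HUMAN_B", "MOUSE_B2")]
ANNOTATIONS = [
    ("HUMAN_A", "GO:0004069", "IDA"),   # manual and one-to-one: projected
    ("HUMAN_A", "GO:0006869", "IEA"),   # electronic: not projected
    ("HUMAN_B", "GO:0004069", "IMP"),   # manual but one-to-many: not projected
]
result = project(ANNOTATIONS, PAIRS)
```

Restricting the source to manual annotations avoids stacking one electronic inference on top of another.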
Manual annotation by GOA High-quality, specific annotations using: • Peer-reviewed papers • A range of evidence codes to categorize the types of evidence found in a paper www.ebi.ac.uk/GOA