1 / 51

Introduction to Gene Ontology annotation resources

Introduction to Gene Ontology annotation resources. Rachael Huntley UniProt -GOA. A Practical Overview of Biomedical Ontologies 17-18 April 2013. Talk Overview. Intro to GO and GO terms. Exercise . Annotating to GO. Accessing GO annotations. Exercise. Practical use of GO.

ziya
Download Presentation

Introduction to Gene Ontology annotation resources

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Gene Ontology annotation resources Rachael Huntley UniProt-GOA A Practical Overview of Biomedical Ontologies 17-18 April 2013

  2. Talk Overview • Intro to GO and GO terms • Exercise • Annotating to GO • Accessing GO annotations • Exercise • Practical use of GO • Exercise • Precautions

  3. What is GO?

  4. The Gene Ontology Less specific concepts • A way to capture biological knowledge for individual gene products in a written and computable form • A set of concepts • and their relationships • to each other arranged • as a hierarchy More specific concepts www.ebi.ac.uk/QuickGO

  5. The Concepts in GO 1. Molecular Function • protein kinase activity • insulin receptor activity An elemental activity or task or job 2. Biological Process A commonly recognised series of events • cell division • mitochondrion • mitochondrial matrix • mitochondrial inner membrane 3. Cellular Component Where a gene product is located

  6. Anatomy of a GO term Unique identifier Term name Definition Synonyms Cross-references

  7. Ontology structure • Directed acyclic graph Terms can have more than one parent • Terms are linked by • relationships is_a part_of regulates (and +/- regulates) has_part occurs_in www.ebi.ac.uk/QuickGO These relationships allow for complex analysis of large datasets

  8. Searching for GO terms Search GO terms or proteins http://www.ebi.ac.uk/QuickGO

  9. Exercise Search for a GO term Exercise 1 (pg.15)

  10. Why do we need GO? 10

  11. Reasons for the Gene Ontology • Inconsistency in English language www.geneontology.org

  12. Inconsistency in English languauge • Same name for different concepts Cell or ??

  13. Different names for the same concept Eggplant Brinjal Aubergine Melongene Same for biological concepts • Comparison is difficult – in particular across species or across databases

  14. Reasons for the Gene Ontology • Inconsistency in English language • Increasing amounts of biological data available • Increasing amounts of biological data to come www.geneontology.org

  15. Increasing amounts of biological data available Search on ‘DNA repair’... get over 70,000 results Expansion of sequence information

  16. Reasons for the Gene Ontology • Inconsistency in English language • Increasing amounts of biological data available • Increasing amounts of biological data to come • Large datasets need to be interpreted quickly www.geneontology.org

  17. Aims of the GO project • Compile the ontologies - currently over 38,000 terms - constantly increasing and improving • Annotate gene products using ontology terms - around 30 groups provide annotations • Provide a public resource of data and tools - regular releases of annotations - tools for browsing/querying annotations and editing the ontology

  18. Reactome

  19. GO Annotation

  20. UniProt-Gene Ontology Annotation (UniProt-GOA) project at the EBI • Largest open-source contributor of annotations to GO • Provides annotation for more than 390,000 species • Our priority is to annotate the human proteome

  21. Accession Name GO ID GO term name Reference Evidence code P00505 GOT2 GO:0004069 aspartate transaminase activity PMID:2731362 IDA A GO annotation is … …a statement that a gene product; 1. has a particular molecular function oris involved in a particular biological process oris located within a certain cellular component 2. as determined by a particular method 3. as described in a particular reference

  22. UniProt-GOA incorporates annotations made using two methods Electronic Annotation • Quick way of producing large numbers of annotations • Annotations use less-specific GO terms • Only source of annotation for many non-model organism species Manual Annotation • Time-consuming process producing lower numbers • of annotations • Annotations tend to use very specific GO terms

  23. Electronic annotation methods • Mapping of external concepts to GO terms GO:0005634: Nucleus GO:0009734: Auxin mediated signaling pathway GO:0004707: MAP kinase activity

  24. Electronic annotation methods 2. Automatic transfer of manual annotations to orthologs Macaque Chimpanzee Ensembl compara Guinea Pig Rat ...and more e.g. Human Mouse Dog Chicken Cow Arabidopsis Ensembl compara …and more Brachypodium Poplar Maize Grape Rice Annotations are high-quality and have an explanation of the method (GO_REF) http://www.geneontology.org/cgi-bin/references.cgi

  25. Manual annotation by UniProt-GOA • High–quality, specific annotations made using: • Full text peer-reviewed papers • A range of evidence codes to categorise the types of evidence found in a paper • e.g. IDA, IMP, IPI http://www.ebi.ac.uk/GOA

  26. Number of annotations in UniProt-GOA database Electronic annotations 139,757,414 1,259,994 Manual annotations* April 2013 Statistics * Includes manual annotations integrated from external model organism and specialist groups

  27. How to access and use GO annotation data

  28. Where can you find annotations? UniProtKB Ensembl Entrez gene

  29. UniProt vs. QuickGO annotation display UniProt QuickGO

  30. Gene Association Files 17 column files containing all information for each annotation GO Consortium website UniProt-GOA website Numerous species-specific files http://www.ebi.ac.uk/GOA/downloads.html

  31. GO browsers

  32. The EBI's QuickGO browser Search GO terms or proteins Find sets of GO annotations http://www.ebi.ac.uk/QuickGO

  33. Exercise Find annotations to a protein Exercise 2 (pg.15) Find annotations to a list of proteins Exercise 1 and 2 (pg.20) Find protein list at: ftp://ftp.ebi.ac.uk/pub/contrib/goa/Tutorial_Data

  34. How scientists use the GO • Access gene product functional information • Analyse high-throughput genomic or proteomic datasets • Validation of experimental techniques • Get a broad overview of a proteome • Obtain functional informationfor novel gene products Some examples…

  35. Term enrichment • Most popular type of GO analysis • Determines which GO terms are more often associated with a specified list of genes/proteins compared with a control list or rest of genome • Many tools available to do this analysis • User must decide which is best for their analysis

  36. time Defense response Immune response Response to stimulus Toll regulated genes JAK-STAT regulated genes Puparial adhesion Molting cycle Hemocyanin Amino acid catabolism Lipid metobolism Peptidase activity Protein catabolism Immune response Immune response Toll regulated genes control attacked Analysis of high-throughput genomic datasets MicroArray data analysis Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

  37. Biological Process GO enrichment of 88 human peroxisome proteins before focused annotation… Mutowo-Meullenet, Huntley, et al. DATABASE 2013

  38. …and after focused annotation • More terms • Greater specificity • New processes

  39. Analysis using GO annotations GO Galaxy http://galaxy.berkeleybop.org/ go-helpdesk@ebi.ac.uk

  40. Analysis using GO annotations Many more listed at: http://neurolex.org/wiki/Category:Resource:Gene_Ontology_Tools

  41. Annotating novel sequences • Can use BLAST queries to find similar sequences with • GO annotation which can be transferred to the new sequence • Two tools currently available; • AmiGO BLAST – searches the GO Consortium database • BLAST2GO – searches the NCBI database

  42. Annotating novel sequences • Can use InterProScan • to find GO annotation that is • attributed to protein signatures • in a submitted protein sequence

  43. Using the GO to provide a functional overview for a large dataset • Many GO analysis tools use GO slims to give a broad • overview of the dataset • GO slims are cut-down versions of the GO and • contain a subset of the terms in the whole GO • GO slims usually contain less-specialised GO terms

  44. Slimming the GO using the ‘true path rule’ Many gene products are associated with a large number of descriptive, leaf GO nodes:

  45. Slimming the GO using the ‘true path rule’ …however annotations can be mapped up to a smaller set of parent GO terms:

  46. GO slims Custom slims are available for download; http://www.geneontology.org/GO.slims.shtml or you can make your own using; • QuickGO http://www.ebi.ac.uk/QuickGO • AmiGO's GO slimmer http://amigo.geneontology.org/cgi-bin/amigo/slimmer

  47. The EBI's QuickGO browser Search GO terms or proteins Find sets of GO annotations Map-up annotations with GO slims www.ebi.ac.uk/QuickGO

  48. Exercise Using GO slims in QuickGO Exercise 1 (pg.27) Find protein list at: ftp://ftp.ebi.ac.uk/pub/contrib/goa/Tutorial_Data

  49. Precautions when using GO annotations for analysis • The Gene Ontology is always changing and GO annotations are • continually being created - always use a current version of both - if publishing your analyses please report the versions/dates you used http://www.geneontology.org/GO.cite.shtml • Recommended that ‘NOT’ annotations are removed before analysis - only ~7000 out of 141 million annotations are ‘NOT’ - can confuse the analysis

  50. Precautions when using GO annotations for analysis • Unannotated is not unknown - where there is no evidence in the literature for a process, function or location the gene product is annotated to the appropriate ontology’s root node with an ‘ND’ evidence code (no biological data), thereby distinguishing between unannotated and unknown • Pay attention to under-represented GO terms - a strong under-representation of a pathway may mean that normal functioning of that pathway is necessary for the given condition

More Related