
This presentation is designed to show the features of four ‘third-party’


Presentation Transcript


  1. Introduction This presentation is designed to show the features of four ‘third-party’ GO analysis tools. These tools, and others listed on http://www.geneontology.org/GO.tools.shtml#micro, can be used in proteomics studies to view the GO terms associated with a list of proteins obtained from high-throughput experiments, and the statistical significance of those terms compared with a reference set of proteins.* Each presentation was prepared by the developers of the tool, using for the analysis a list of human cardiovascular-related protein accessions (or, in the case of Blast2GO, the equivalent bovine protein sequences). *All of these tools have been created outside of the GO Consortium. The article's authors do not intend to recommend any tool, merely to demonstrate how GO analysis of proteome sets could be performed using some of these tools. We advise researchers to try several different tools to find one that suits their needs.

  2. Contents Blast2GO Slide 4 FatiGO Slide 13 Onto-Express Slide 20 Ontologizer Slide 27 Accession list I Slide 35 Accession list II Slide 36

  3. Blast2GO in Babelomics http://babelomics.bioinfo.cipf.es Bioinformatics Department Centro de Investigación Príncipe Felipe (CIPF) babelomics@cipf.es Conesa, A., Götz, S., García-Gómez, J.M., Terol, J., Talón, M. & Robles, M. (2005). Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674-3676 Functional Annotation: First, a BLAST step obtains homologous sequences for the query sequences. Second, the actual GO annotation is made by applying the Blast2GO method, which transfers the most confident and appropriate GO annotations to the novel sequences. Statistical charts help to understand and interpret the annotation results. Visualization: This step gives users an overall idea of the GO annotations assigned to the sequence dataset, making use of GO's graph structure.

  4. Functional Annotation with Blast2GO Annotation is the process of assigning functional categories to genes or gene products. In Blast2GO this assignment is performed for each sequence based on the information available for the homologous sequences retrieved by BLAST. Blast2GO annotation proceeds through a two-step strategy: 1. All GO terms for the BLAST hit sequences are collected. For the first step, BLAST results are parsed, and the identifiers of the BLAST hits are found and used to query the Gene Ontology database to recover associated functional terms. The evidence code of each particular annotation is also recovered. The evidence codes indicate how the functional assignment in the Gene Ontology database was obtained. 2. GO terms are selected from this original pool to extract the most reliable annotation. Once all this information is gathered, an annotation score is computed for each {GO, Query Sequence} pair. Only the most specific GO term within a branch of the GO is assigned to the query sequence, and this assignment depends on whether the annotation score reaches a threshold preset by the user. The annotation score is computed as: Annotation score{GO, Seq} = (max.sim × ECw) + ((#GO − 1) × GOw) where: • max.sim is the maximal value of similarity between the query and the hit sequences that have the given GO annotation • ECw is the weight given to the Evidence Code of the original annotation. Blast2GO has defined values for these weights, which can also be modified by the user. In general, ECw = 1 for experimental evidence codes and ECw < 1 for non-experimental evidence codes. • #GO is the number of annotated children terms • GOw is the weight given to the contribution of annotated children terms to a given term
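The annotation score described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the evidence-code weights, the fallback weight, and the default GO weight below are made-up values, not Blast2GO's defaults. The sketch also reads the first term as the maximum of similarity × ECw over the hits, and the second term as (#GO − 1) × GOw. The names `HitAnnotation` and `annotation_score` are hypothetical.

```python
from dataclasses import dataclass

# Illustrative evidence-code weights (assumed values, not Blast2GO's defaults).
EC_WEIGHTS = {"IDA": 1.0, "IMP": 1.0, "ISS": 0.8, "IEA": 0.7}
FALLBACK_EC_WEIGHT = 0.5  # assumed weight for codes missing from the table


@dataclass
class HitAnnotation:
    similarity: float    # percent similarity between the query and a BLAST hit
    evidence_code: str   # evidence code of the hit's annotation to the GO term


def annotation_score(hits, n_go, go_weight=5.0):
    """Score one {GO term, query sequence} pair.

    hits: HitAnnotation objects for the hits that carry the GO term.
    n_go: the '#GO' count from the slide (annotated children terms).
    go_weight: the 'GOw' weight from the slide (assumed default).
    """
    # First term: best similarity, weighted by the evidence code.
    direct = max(
        h.similarity * EC_WEIGHTS.get(h.evidence_code, FALLBACK_EC_WEIGHT)
        for h in hits
    )
    # Second term: contribution of annotated children terms.
    children = (n_go - 1) * go_weight
    return direct + children


hits = [HitAnnotation(90.0, "IEA"), HitAnnotation(70.0, "IDA")]
score = annotation_score(hits, n_go=3)
```

In this toy input the experimentally supported hit at 70% similarity outweighs the electronically inferred hit at 90%, which is the effect the evidence-code weights are meant to have. The resulting score would then be compared with the user-defined threshold.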

  5. The BLAST Step (1/2) Upload your sequence file in FASTA format, choose the appropriate BLAST parameters and database (blastp for protein sequences) and press RUN. In this tab you can see the current status of your job and, for big datasets, come back later to retrieve the results. The homology search is the first and most time-consuming step when attempting to transfer functional information from similar sequences to uncharacterized sequence data. This simple tool gives you the option to perform high-throughput BLAST searches against several protein databases, keep processes running until they are finished while monitoring their status, and save the generated alignments as an XML file. These XML files can then be used as input data for the Blast2GO annotation method.

  6. The BLAST Step (2/2) Open the results with this link. Save your results as an XML file.

  7. The Annotation Step Upload and parse your BLAST results in NCBI's XML format, applying several filters. Annotation rule parameters: • e-value cut-off as a minimal quality criterion • annotation rule cut-off (coverage vs. exactness) • GO weight (more general vs. more specific terms) • a minimal alignment length allowed for function transfer. Evidence code weights can be set to increase or decrease the influence of different kinds of annotation evidence, e.g. automatically generated source annotation. Start the annotation assignment.
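As a rough illustration of the filtering described on this slide, the following Python sketch parses BLAST results in NCBI's XML format and keeps only hits that pass an e-value cut-off and a minimal alignment length. It is not the Blast2GO implementation. The function name `filter_blast_hits` and the default thresholds are assumptions chosen for the example; the element names (`Iteration`, `Hit_accession`, `Hsp_evalue`, `Hsp_align-len`) are those of the NCBI BLAST XML output.

```python
import xml.etree.ElementTree as ET


def filter_blast_hits(xml_text, max_evalue=1e-3, min_align_len=33):
    """Return {query: [hit accession, ...]} for hits passing both filters.

    The default thresholds are illustrative assumptions.
    """
    root = ET.fromstring(xml_text)
    kept = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            accession = hit.findtext("Hit_accession")
            for hsp in hit.iter("Hsp"):
                evalue = float(hsp.findtext("Hsp_evalue"))
                align_len = int(hsp.findtext("Hsp_align-len"))
                if evalue <= max_evalue and align_len >= min_align_len:
                    kept.setdefault(query, []).append(accession)
                    break  # one passing HSP is enough to keep the hit
    return kept
```

Only hits that survive these filters would go on to contribute GO terms to the annotation step.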

  8. The Blast2GO web tool generates a multitude of statistical charts to help understand the underlying dataset and better interpret the generated annotation results. Review, browse, export: • The result table to browse and export the generated annotations • A chart showing the e-value distribution of the BLAST results • A chart showing the source databases from which the transferred GO terms originally came

  9. • A chart showing the distribution of the different evidence codes throughout the GO terms per BLAST hit • A chart showing the distribution of the different species from which the BLAST hits originate • A chart showing how many GO terms were assigned to how many sequences • A chart showing the distribution of the different evidence codes throughout the GO terms per sequence • A chart showing the most frequent GO terms throughout the dataset • A chart showing the success of the annotation process, giving the number of successfully ‘BLASTed’, GO-mapped and annotated sequences • A chart showing the number of sequences annotated at a certain GO level and category • A chart showing the distribution of BLAST sequence similarities

  10. Saving and exporting results Browse the generated annotations in the result table. Open and save the results in a tabular format for further use in the GO-Graph-Viewer, or download the data in Blast2GO project format for direct import into Blast2GO. Blast2GO annotations are exported in a tabular format: SeqId<tab>GOterm<tab>SeqDesc
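A file in the three-column tabular format named on this slide (SeqId, GO term, sequence description, separated by tabs) can be read with Python's standard `csv` module. The sketch below assumes one GO term per row and no header line. The function name `read_annotations` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_annotations(handle):
    """Map each sequence ID to the set of GO terms assigned to it."""
    go_by_seq = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        seq_id, go_term = row[0], row[1]
        go_by_seq[seq_id].add(go_term)
    return dict(go_by_seq)


# Usage with made-up rows; a real export would be opened with open(path, newline="").
example = io.StringIO(
    "seqA\tGO:0006936\texample protein A\n"
    "seqA\tGO:0007049\texample protein A\n"
    "seqB\tGO:0006936\texample protein B\n"
)
annotations = read_annotations(example)
```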

  11. Visualization: The GO-Graph-Viewer The DAG viewer tool generates joined Gene Ontology graphs (DAGs) to create overviews of the functional context of groups of sequences. Interactive graph visualization allows navigation of the large and unwieldy graphs often generated when biologically exploring large sets of sequence annotations. Zoom and graph navigation are provided through the DAG viewer Java Web Start tool. Upload your Blast2GO-generated annotations. Define graph filtering parameters for more dense and informative graphs. Start the interactive graph visualization tool with Java Web Start. Save parts of your graphs as high-resolution images to better communicate your results.

  12. FatiGO Functional enrichment analysis Bioinformatics Department Centro de Investigación Príncipe Felipe (CIPF) http://www.fatigo.org http://www.babelomics.org babelomics@cipf.es Al-Shahrour, F., et al. (2005), Babelomics: a suite of web-tools for functional annotation and analysis of groups of genes in high-throughput experiments, Nucleic Acids Research, 33, W460-W464 Al-Shahrour, F., et al. (2004), FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes, Bioinformatics, 20, 578-580

  13. Enter your list or file of genes/proteins.* Select your organism. Select the database(s) you want to query. Click options to filter the database (optional). In this example, list #1 is a list of BHF-UCL annotated cardiovascular-related proteins (see Slide 35) and list #2 is the “Rest of genome”. *Several types of identifier are acceptable, such as UniProtKB, Ensembl IDs, HGNC symbols, RefSeq, Entrez Gene etc.

  14. Filter Tool Use the level of the DAG and the evidence code as filtering criteria. Select subsets of annotations based on keywords and on the size of the gene module. • Babelomics allows sub-selection of the gene annotations on which gene modules are based, in order to test hypotheses in a more focused and sensitive manner. • Removing modules whose testing is unnecessary from the analysis increases the power of the tests in the multiple-testing adjustment step.
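The keyword and module-size filters can be pictured with a small Python sketch. It is a toy illustration, not the Babelomics code: the function name `filter_modules`, the size bounds and the example modules are all assumptions. The point is that every module removed here is one fewer hypothesis to correct for later.

```python
def filter_modules(modules, min_size=2, max_size=500, keyword=None):
    """Keep gene modules within a size range and, optionally, matching a keyword.

    modules: {GO term name: set of gene/protein IDs annotated to it}
    """
    kept = {}
    for name, genes in modules.items():
        if not (min_size <= len(genes) <= max_size):
            continue  # module too small or too large to test
        if keyword is not None and keyword.lower() not in name.lower():
            continue  # module name does not contain the keyword
        kept[name] = genes
    return kept


modules = {
    "muscle contraction": {"p1", "p2", "p3"},
    "cell cycle": {"p1"},
    "heart development": {"p1", "p2"},
}
testable = filter_modules(modules)               # drops the single-gene module
heart_only = filter_modules(modules, keyword="heart")
```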

  15. Results of GO analysis Level 3 contains less granular terms; level 9 contains more granular terms. The number of annotated proteins per GO level is displayed.

  16. FatiGO returns a list of GO terms which are over-represented in the list of interest, in this case the BHF-UCL list. For Biological Process terms at level 3 of the ontology, the terms that are over-represented in the BHF-UCL list include muscle contraction, cell cycle and anatomical structure development. Low p-value = more significant. The proteins from your query set that are annotated to each GO term are listed.
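The slide does not state which test produces these p-values. As an illustration of how over-representation of one GO term in a list of interest can be scored, the sketch below computes a one-sided hypergeometric tail probability, which is equivalent to a one-sided Fisher's exact test on the 2×2 table of list membership against term annotation. The function name `enrichment_p_value` and the counts used are assumptions for the example.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided p-value P(X >= k) for over-representation of a GO term.

    N: proteins in the whole reference (list of interest plus rest of genome)
    K: proteins in the reference annotated to the term
    n: proteins in the list of interest
    k: proteins in the list of interest annotated to the term
    """
    upper = min(n, K)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated proteins.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy counts: 10 proteins in total, 4 annotated to the term,
# 3 in the list of interest, 2 of which are annotated.
p = enrichment_p_value(k=2, n=3, K=4, N=10)
```

A smaller value means the term is annotated to more proteins in the list than chance would explain, which matches the slide's note that a low p-value is more significant.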

  17. Best p-value FatiGO shows terms deeper in the ontology, at level 6, which are over-represented in the BHF-UCL list (but not necessarily significantly – compare p-values) such as regulation of progression through cell cycle, heart development and cholesterol absorption. These are all processes you would expect cardiovascular-related proteins to be involved in.

  18. GO-Graph-Viewer Tool You can upload your FatiGO results to the interactive graph visualization tool The DAG viewer tool allows visualization of the significant GO terms as a GO graph. The GO term names are displayed together with the annotation score.

  19. Onto-Express Features at a Glance http://vortex.cs.wayne.edu/projects.htm#Onto-Express Purvesh Khatri (purvesh@cs.wayne.edu) Sorin Draghici (sod@cs.wayne.edu) Intelligent Systems and Bioinformatics Lab Department of Computer Science Wayne State University

  20. Input interface Select type of IDs in input file. Select organism. Choose from more than 300 microarrays; if an array of choice is not available, use your own reference. Choose a statistical distribution from: 1. hypergeometric, 2. binomial, 3. chi-square. Choose a correction for multiple hypotheses from: 1. Bonferroni, 2. FDR, 3. Holm, 4. Sidak. • Supported input types are GenBank accession numbers, UniGene cluster IDs, Entrez Gene IDs, gene symbols, Affymetrix probe IDs and any of the IDs used in the GO database.
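The four corrections for multiple hypotheses named on this slide can be sketched as adjusted p-value calculations in plain Python. These are textbook formulations, not Onto-Express code, and the sketch assumes that "FDR" refers to the Benjamini-Hochberg procedure, which the slide does not specify. All function names are illustrative.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def sidak(pvals):
    """Sidak adjustment: 1 - (1 - p)^m."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def holm(pvals):
    """Holm step-down adjustment, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max  # enforce monotone adjusted values
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR adjustment, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p-value down
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
corrected = {f.__name__: f(raw) for f in (bonferroni, sidak, holm, benjamini_hochberg)}
```

Bonferroni, Sidak and Holm control the chance of any false positive and are therefore stricter; the FDR approach controls the expected proportion of false positives among the reported terms and usually retains more of them.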

  21. Results – Flat view

  22. Results – tree view • Choose a level to expand the GO tree and click the “Expand” button. • Only the GO terms with at least one input gene are displayed in the tree.

  23. Results – chromosome view • Chromosome information is supported for human, mouse and rat. The view displays the number of genes on each chromosome and their positions. • Clicking on “NCBI Genome view” links out to NCBI Mapviewer.

  24. Results – single gene view • Selecting “show in gene view” in the tree view displays the annotations for the selected gene, within the GO hierarchy, in the single gene view.

  25. References • Purvesh Khatri, Sorin Draghici, G. Charles Ostermeier, Stephen A. Krawetz. Profiling Gene Expression Using Onto-Express. Genomics, 79(2):266-270, February 2002. • Sorin Draghici, Purvesh Khatri, Rui P. Martins, G. Charles Ostermeier and Stephen A. Krawetz. Global functional profiling of gene expression. Genomics, 81(2):98-104, February 2003. • Purvesh Khatri and Sorin Draghici. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics, 21(18):3587-95, September 2005. • http://vortex.cs.wayne.edu/projects.htm

  26. Ontologizer Ontologizer Open Source Team http://compbio.charite.de/ontologizer/ located at Institute for Medical Genetics Charité Universitätsmedizin Berlin Grossman S., Bauer S., Robinson P.N., Vingron M. Improved detection of overrepresentation of Gene Ontology annotations with parent child analysis. Bioinformatics. 2007 Nov 15;23(22):3024-31. Robinson P.N., Wollstein A., Böhme U., Beattie B. Ontologizing gene-expression microarray data: characterizing clusters with Gene Ontology. Bioinformatics. 2004 Apr 12;20(6):979-81.

  27. Ontologizer – Setting up a Project There are several predefined entries for various settings… …or you may specify the fields manually. Inputs: • Ontology, defines the GO structure • Annotations, map genes to GO terms

  28. Ontologizer – Editing Sets of Identifiers Annotated identifiers are highlighted on the fly; one identifier in the example is marked “No annotation for this one”. Mouse hovering reveals direct annotations. The induced graph of these terms can be displayed.
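An induced graph here means the directly annotated terms together with every ancestor reachable from them in the ontology. A minimal Python sketch of that traversal follows. It is not Ontologizer's code: the function name `induced_terms` and the toy DAG are assumptions, and real GO data would supply the parent relations.

```python
def induced_terms(annotated_terms, parents):
    """Return the annotated terms plus all of their ancestors.

    parents: {term: iterable of its direct parent terms}
    """
    seen = set()
    stack = list(annotated_terms)
    while stack:
        term = stack.pop()
        if term in seen:
            continue  # a DAG node can be reached along several paths
        seen.add(term)
        stack.extend(parents.get(term, ()))
    return seen


# Toy DAG: C has two parents (A and B); D has one parent (B).
toy_parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["B"]}
subgraph_nodes = induced_terms({"C"}, toy_parents)
```

Because GO is a directed acyclic graph rather than a tree, a term may be reached through more than one parent, which is why the sketch tracks the terms it has already visited.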

  29. Ontologizer – Overview Of interest here are two lists of identifiers – study and population.* Multiple projects may reside in the workspace. Choose the analysis method: parent-child takes account of the ontology structure, whereas term-for-term treats each term independently. *In this example the study list is a list of BHF-UCL annotated cardiovascular-related proteins (see Slide 35) and the population list is a random list of human UniProtKB accessions.
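The difference between the two analysis methods can be sketched in Python. This is a simplified illustration, not Ontologizer's implementation. It assumes the study set is a subset of the population, that every gene annotated to a term is also annotated to its parents, and that the parent-child test uses the union of genes annotated to the term's parents as its background. All names and the toy sets are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def term_for_term(term_genes, study, population):
    """Test the term against the whole population, ignoring the ontology."""
    return hypergeom_tail(
        len(term_genes & study), len(study),
        len(term_genes & population), len(population),
    )


def parent_child(term_genes, parent_genes, study, population):
    """Test the term only among genes annotated to its parent term(s)."""
    pop_parent = parent_genes & population
    study_parent = pop_parent & study
    return hypergeom_tail(
        len(term_genes & study_parent), len(study_parent),
        len(term_genes & pop_parent), len(pop_parent),
    )


population = set(range(20))
parent_genes = set(range(8))      # genes annotated to the parent term
term_genes = {0, 1, 2, 3}         # genes annotated to the child term
study = {0, 1, 2, 4, 5}           # study set, all annotated to the parent

p_tft = term_for_term(term_genes, study, population)
p_pc = parent_child(term_genes, parent_genes, study, population)
```

In this toy case the child term looks enriched term-for-term (p ≈ 0.032), but the parent-child test gives p = 0.5, because the study set is already concentrated in the parent term and the child explains nothing beyond that.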

  30. Ontologizer – Results A list of terms is displayed. The shading indicates significance – darker shading is more significant. Click on a term to display its position in the ontology, definition and the proteins annotated to it and its parents.

  31. Ontologizer – Graphical View of Results The term highlighted in the table will also be highlighted red in the graph. Yellow = Molecular Function Pink = Cellular Component Green = Biological Process

  32. Ontologizer – What Else? • Can be easily invoked from the Web. • Input files can be located remotely. • Several procedures of multiple testing correction are supported. • Results can be filtered and stored in a tabular as well as in a graphical fashion. • A command line version is available.

  33. Acknowledgments The authors wish to thank the developers of the tools for preparing these presentations, as follows: • Blast2GO: Stefan Götz • FatiGO: Fatima Al-Shahrour • Onto-Express: Sorin Draghici and Purvesh Khatri • Ontologizer: Sebastian Bauer and Peter Robinson

  34. List of human UniProtKB accessions used in FatiGO, Onto-Express and Ontologizer analyses

  35. List of bovine UniProtKB accessions used in Blast2GO analysis
