This article discusses the use of the Blast2GO suite for high-throughput functional annotation and analysis of large amounts of sequence data in functional genomics studies. The suite is versatile, easy to set up and run, and allows for the annotation and function analysis of genes using Gene Ontology-based methods. It also provides tools for validation and visualization of the results.
High-throughput functional annotation and analysis with the Blast2GO suite • Ana Conesa, Bioinformatics Department, Centro de Investigaciones Príncipe Felipe • aconesa@cipf.es
Credits • Blast2GO Development: Bioinformatics Department CIPF, Valencia: Ana Conesa, Stefan Goetz • Centro de Genómica IVIA, Valencia: Javier Terol, Manuel Talón • Biomedical Informatics UPV, Valencia: Juan Miguel Gómez, Montserrat Robles • Blast2GO special thanks to: ANNEX: Simen Myhre, Henrik Tveit (NTNU) • GOSSIP: Nils Blüthgen (MicroDiscovery GmbH) • ZVTM: Emmanuel Pietriga (INRIA) • goslim.tair.obo: Suparna Mundodi (TAIR)
Motivation • Numerous EST/genome projects • Large amounts of NEW sequence data • Functional Genomics Studies • Need for Functional Annotation • Which kind of tool? • Easy to set up & run • Versatile & Universal • High-throughput & interactive • Combine annotation & function analysis • www.blast2go.org
Gene Ontology based annotation • Ontologies: Molecular Function, Biological Process, Cellular Component • Terms range from more general to more specific • Mappings: GO2EC, IP2GO
Concepts of automatic annotation • Similarity between sequences • Consistency of assigned annotation • Precision vs. "recall" • Resolution level in GO hierarchy • Selection of recovered annotation data • Quality of existing annotation • B2G Annotation Rule
Blast2GO Annotation Rule • Quality of source annotation (Evidence Codes) • Possibility of abstraction • Lowest term satisfying the requirements • Similarity requirement • Recall vs. Precision
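The ingredients listed on this slide can be combined in a small sketch. This is an illustration only, not Blast2GO's actual formula: the evidence-code weights, the threshold of 55, the toy three-term hierarchy and the function names are all assumptions made for the example. Each hit contributes its similarity scaled by an evidence-code weight, scores are passed up to ancestor terms (abstraction), and the deepest term that still meets the threshold is kept.

```python
# Hypothetical evidence-code weights (assumed values, not Blast2GO defaults).
EC_WEIGHT = {"IDA": 1.0, "TAS": 0.9, "ISS": 0.8, "IEA": 0.7}

# Toy single-parent hierarchy (child -> parent), for illustration only.
PARENT = {
    "kinase activity": "catalytic activity",
    "catalytic activity": "molecular_function",
}


def ancestors(term):
    """Return the chain of ancestors of a term, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def term_scores(hits):
    """Score each term by its best weighted hit, propagated to ancestors.

    hits: list of (go_term, similarity_percent, evidence_code).
    """
    scores = {}
    for term, similarity, code in hits:
        weighted = similarity * EC_WEIGHT.get(code, 0.5)
        for t in [term] + ancestors(term):
            scores[t] = max(scores.get(t, 0.0), weighted)
    return scores


def annotate(hits, threshold=55.0):
    """Return the most specific term whose score meets the threshold."""
    passing = [t for t, s in term_scores(hits).items() if s >= threshold]
    if not passing:
        return None
    return max(passing, key=lambda t: len(ancestors(t)))


hits = [("kinase activity", 60, "IEA"), ("catalytic activity", 80, "ISS")]
print(annotate(hits))
```

Raising the threshold trades recall for precision: fewer sequences get annotated, and those that do tend to end up at more general terms.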
Main functions within Blast2GO • BLAST • MAPPING • ANNOT. RULE • GO Second Layer • GO-Slim • Manual Curation • Annotation (GO, IPR, EC) • Validation • Statistics • InterProScan • Graph Visualization • KEGG maps • Additional Features: Enrichment, Batch Mode, Pipeline, customDB, GeneIDs, localB2GDB, Compare
Blast2GO use: species • Citrus, Nicotiana, maize, soybean, tomato, grape… • Streptococcus, Trichoderma, Schistosoma, Cyanobacteria… • European flounder, pig, fiddler crab, rat, honeybee, human… • Metagenome projects…
Where to find Blast2GO • Web: http://www.blast2go.org • http://blast2go.bioinfo.cipf.es • http://www.geneontology.org • http://groups.google.com/group/Blast2GO • More info: Bioinformatics 2005 21: 3674-3676 • Blast2GO tutorial: http://www.blast2go.org
Blast2GO Guided Tour • Ana Conesa, Bioinformatics Department, Centro de Investigaciones Príncipe Felipe • aconesa@cipf.es
Start Blast2GO www.blast2go.org • Desktop application • Java webstart technology • Internet connection
Exercise 1 • Launch Blast2GO • Open FASTA file (unzip examples.zip) • Browse number of sequences and sequence length • Unselect all sequences • Select 5 sequences • Run Blast against NCBI nr (change parameters if desired)
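Outside the GUI, the "number of sequences and sequence length" step of Exercise 1 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name and the inline example records are made up for illustration and are not part of the Blast2GO example files.

```python
def fasta_lengths(lines):
    """Map each FASTA header (text after '>') to its sequence length.

    Records with identical headers overwrite one another in this sketch.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Invented two-record example; sequences may span several lines.
example = ">seq1 demo\nACGT\nAC\n>seq2\nGGG\n".splitlines()
lengths = fasta_lengths(example)
print(len(lengths), "sequences")
for header, length in lengths.items():
    print(header, length)
```

To run it on a real file, pass an open file handle instead of the list, for example `fasta_lengths(open(path))` with `path` pointing at the FASTA file.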
Exercise 2 • Open blast_example.dat • Examine Distribution charts
Resources of mapping (diagram labels: EC, Hit ACC/GI, Mapping Resources, GO-Terms, sim %) • GO mapping resources: • Full Gene Ontology DB • NCBI flat files: gene2accession (4 079 414 entries), gene_info (1 635 614 entries) • PIR Non-Redundant Reference Protein Database, including PSD, UniProt, Swiss-Prot, TrEMBL, RefSeq, GenPept and PDB
Annotation GO DAG Validation Annex GOSlim EC/KEGG InterPro
Exercise 3 • Select the first 10 sequences • Run Mapping and Annotation • Select non-annotated sequences and re-annotate with milder parameters • Load annotation_example.dat file • Visualize Results on Mapping/Annotation Charts
Modulation of annotation • Change annotation manually • Summarize annotation by "GoSlim" • Extend annotation by the GO "Second Layer": Molecular Function "is involved in" Biological Process, Molecular Function "acts in" Cellular Component (Myhre et al., Bioinformatics 2006) • Figure labels: GO-Term, ACC, EC-Codes, OBO GO-Slim File, Seq. Description
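The idea behind summarizing by GO-Slim can be sketched as follows: each annotated term is replaced by its nearest ancestor(s) that belong to the slim subset. The tiny hierarchy, the slim set and the function name below are hypothetical and chosen only for illustration; a real run would read the terms from an OBO GO-Slim file.

```python
# Toy GO fragment (term -> list of parents); illustrative only.
PARENTS = {
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["transferase activity"],
    "transferase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def to_slim(term, slim, parents=PARENTS):
    """Return the nearest slim term(s) at or above the given term."""
    if term in slim:
        return {term}
    found = set()
    for parent in parents.get(term, []):
        found |= to_slim(parent, slim, parents)
    return found


# Hypothetical slim subset.
slim_terms = {"transferase activity", "molecular_function"}
print(to_slim("protein kinase activity", slim_terms))
```

Because the walk stops at the first slim term on each path, a specific annotation is summarized by its closest slim ancestor rather than by the root.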
Exercise 4 • Browse Blast results (select one sequence and use the sequence menu) to: draw annotation graph, view annotations, edit/change annotation • Select a few sequences to run GoSlim • Run Annex
Enzyme annotation and KEGG maps • GO • Enzyme Codes • KEGG maps
Exercise 5 • Select a few sequences to run InterProScan • Change terms view GO ID/term, IPS/GO • Merge IPS results with Blast Annotations • Load annot_interpro_annex_example.dat • Export GO Annotations • Export IPS Annotations • Save Project and Sequence Table
Visualization • GO Graph Visualization as tool to explore data • Interactive and "zoomable" graphs • Color graphs highlighting areas of interest • Node Score of Annotation Content (figure: example graph with node scores)
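One plausible way to compute a node score of annotation content is sketched below. It assumes that every sequence annotated to a term counts fully at that term and contributes to each ancestor with a weight that shrinks with distance. The damping factor `alpha`, the toy hierarchy and the function name are assumptions for illustration; the slide itself does not give the formula.

```python
# Toy single-parent hierarchy (child -> parent); illustrative only.
PARENT = {
    "kinase activity": "transferase activity",
    "transferase activity": "catalytic activity",
}


def node_scores(direct_counts, alpha=0.5, parent=PARENT):
    """Accumulate distance-damped sequence counts at each node.

    direct_counts: term -> number of sequences annotated directly to it.
    """
    scores = {}
    for term, n_seqs in direct_counts.items():
        node, distance = term, 0
        while node is not None:
            scores[node] = scores.get(node, 0.0) + n_seqs * alpha ** distance
            node = parent.get(node)
            distance += 1
    return scores


scores = node_scores({"kinase activity": 2, "transferase activity": 1})
for term, score in scores.items():
    print(term, score)
```

A score filter, as used in Exercise 6, would then simply hide nodes whose score falls below the chosen cutoff.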
Visualization: Pies • Level and Multilevel Charts
Exercise 6 • Select some sequences using select by names function (use test.example.txt) • Create a GO Combined Graph • Create Pies at level 4 and Multilevel Pie at score 3 • Play with filters to simplify the graph (set score filter to 3) • Export GO Graph data as table and visualize
Functional analysis with B2G • Enrichment Analysis (Fisher's exact test)
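For a single GO term, the test behind the enrichment analysis can be written out directly. The sketch below computes a one-sided Fisher's exact test p-value (over-representation in the test set) from the hypergeometric distribution using only the standard library. It assumes the test and reference sets do not overlap, applies no multiple-testing correction, and is not the GOSSIP implementation credited earlier; the function name and the counts are invented for the example.

```python
from math import comb


def fisher_enrichment(test_with, test_total, ref_with, ref_total):
    """One-sided p-value that a GO term is over-represented in the test set.

    test_with / ref_with: sequences carrying the term in each set.
    test_total / ref_total: sizes of the (disjoint) test and reference sets.
    """
    annotated = test_with + ref_with      # sequences with the term overall
    population = test_total + ref_total   # all sequences
    upper = min(test_total, annotated)
    # Sum hypergeometric probabilities for outcomes at least as extreme.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, test_total - i)
        for i in range(test_with, upper + 1)
    )
    return favourable / comb(population, test_total)


# Invented counts: 8 of 20 test sequences vs 10 of 200 reference sequences.
print(fisher_enrichment(8, 20, 10, 200))
```

In a real analysis this is repeated for every GO term, so the resulting p-values need a multiple-testing correction before terms are reported as enriched.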
Exercise 7 • Run Enrichment Analysis using test and reference set files • Create Bar Chart • Create Enriched Graph and modulate number of nodes • Export results