This workshop provides an overview of GO-based data analysis tools and materials available online at the AgBase database. Topics covered include GOanna, GOSlimViewer, Gene Ontology enrichment analysis, annotation clustering, and comparison of enrichment analysis tools. Supported by USDA CSREES grant.
GO based data analysis Iowa State Workshop 11 June 2009
• All tools and materials from this workshop are available online at the AgBase database Educational Resources link. • For continuing support and assistance, please contact agbase@cse.msstate.edu • This workshop is supported by USDA CSREES grant number MISV-329140.
[Figure: AgBase protein annotation process. Labels: Protein identifiers or FASTA format; GORetriever; Annotated proteins; Proteins with no annotations; GOanna; GOSlimViewer]
Hypothesis generating • Gene Ontology enrichment analysis: finds GO terms that are statistically (Fisher’s exact test) over- or underrepresented in a set of genes • Annotation clustering: groups similar annotations based on the hypothesis that they should have similar gene members
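The enrichment test named above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by any of the workshop tools: it computes one-sided Fisher's exact p-values from the hypergeometric distribution, and the function names and the gene counts are hypothetical values chosen for the example.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k annotated genes) when n genes are drawn from N,
    of which K carry the GO term."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_p_values(k, N, K, n):
    """One-sided Fisher's exact p-values for one GO term.

    k: genes in the study set annotated to the term
    N: genes in the background, K: background genes annotated to the term
    n: genes in the study set
    Returns (p_over, p_under): P(X >= k) and P(X <= k).
    """
    lowest = max(0, n - (N - K))   # smallest possible overlap
    highest = min(K, n)            # largest possible overlap
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return p_over, p_under

# Hypothetical counts: 1000 background genes, 50 annotated to the term,
# 60 genes in the study set, 10 of them annotated to the term.
p_over, p_under = enrichment_p_values(k=10, N=1000, K=50, n=60)
print(f"over-representation p = {p_over:.3g}")
print(f"under-representation p = {p_under:.3g}")
```

In practice the test is repeated for every GO term, so the resulting p-values would normally be corrected for multiple testing before interpretation.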
Some resources
• DAVID: http://david.abcc.ncifcrf.gov/
• GOStat: http://gostat.wehi.edu.au/
• EasyGO: http://bioinformatics.cau.edu.cn/easygo/
• AmiGO: http://amigo.geneontology.org/cgi-bin/amigo/term_enrichment (does not use IEA)
• Onto-Express & OE2GO: http://vortex.cs.wayne.edu/projects.htm
• GOEAST: http://omicslab.genetics.ac.cn/GOEAST
• http://www.geneontology.org/GO.tools.shtml
• Comparison of enrichment analysis tools: Nucleic Acids Research, 2009, Vol. 37, No. 1, 1–13 (Tool_Comparison_09.pdf)
DAVID and EasyGO analysis is included in DAVID&EasyGo.ppt
Database for Annotation, Visualization and Integrated Discovery
http://vortex.cs.wayne.edu/ontoexpress Onto-Express analysis instructions are available in onto-express.ppt
Comparison • Onto-Express, EasyGO, GOStat and DAVID • Test set: 60 randomly selected chicken genes • Used AgBase GO annotations as baseline annotations Vandenberg et al. (BMC Bioinformatics, in review)
Networks & Pathways Iowa State Workshop 11 June 2009
Multiple data analysis platforms Proteomics LIST Transcriptomics ESTs
Our original aim… understand biological phenomena… • We have bits and pieces of information • We do not have the full picture • How do we get back to BIOLOGY in this digital information landscape?
Francis Crick, 1958 What do we know about biological systems …. • biological systems are dynamic, not static • how molecules interact is key to understanding complex systems
Types of interactions • protein (enzyme) – metabolite (ligand): metabolic pathways • protein – protein: cell signaling pathways, protein complexes • protein – gene: genetic networks
STRING Database example: Sod1, Mus musculus (http://string.embl.de/)
Database: URL/FTP
• DIP: http://dip.doe-mbi.ucla.edu
• BIND: http://bind.ca
• MPact/MIPS: http://mips.gsf.de/services/ppi
• STRING: http://string.embl.de
• MINT: http://mint.bio.uniroma2.it/mint
• IntAct: http://www.ebi.ac.uk/intact
• BioGRID: http://www.thebiogrid.org
• HPRD: http://www.hprd.org
• ProtCom: http://www.ces.clemson.edu/compbio/ProtCom
• 3did, Interprets: http://gatealoy.pcb.ub.es/3did/
• Pibase, Modbase: http://alto.compbio.ucsf.edu/pibase
• CBM: ftp://ftp.ncbi.nlm.nih.gov/pub/cbm
• SCOPPI: http://www.scoppi.org/
• iPfam: http://www.sanger.ac.uk/Software/Pfam/iPfam
• InterDom: http://interdom.lit.org.sg
• DIMA: http://mips.gsf.de/genre/proj/dima/index.html
• Prolinks: http://prolinks.doe-mbi.ucla.edu/cgibin/functionator/pronav/
• Predictome: http://predictome.bu.edu/
Source: PLoS Computational Biology March 2007, Volume 3 e42
Pathways & Networks • A network is a collection of interactions • Pathways are a subset of networks: networks of interacting proteins that carry out biological functions such as metabolism and signal transduction • All pathways are networks of interactions • NOT ALL NETWORKS ARE PATHWAYS
Biological Networks • Networks are often represented as graphs • Nodes represent proteins or genes that code for proteins • Edges represent the functional links between nodes (e.g. regulation) • Small changes in a graph’s topology/architecture can result in the emergence of novel properties
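The graph view of a network can be illustrated with a short Python sketch. It assumes an undirected graph stored as an adjacency dictionary; the protein names and edges are hypothetical, and the helper names are invented for this example. It shows node degree (how many partners each protein has) and how removing a single edge changes the topology by splitting the network.

```python
from collections import deque

def build_graph(nodes, edges):
    """Undirected graph as {node: set of neighbours}."""
    graph = {node: set() for node in nodes}
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degrees(graph):
    """Number of interaction partners per node."""
    return {node: len(nbrs) for node, nbrs in graph.items()}

def connected_components(graph):
    """Groups of nodes that are reachable from one another (breadth-first search)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components

# Hypothetical proteins and interactions.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]

graph = build_graph(nodes, edges)
deg = degrees(graph)
hub = max(deg, key=deg.get)          # most connected node
print(hub, deg[hub])

# Removing one edge (A-D) disconnects D from the rest of the network.
pruned = build_graph(nodes, [e for e in edges if e != ("A", "D")])
print(len(connected_components(graph)), len(connected_components(pruned)))
```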
Yeast Protein-Protein Interaction Map (H. Jeong et al., Nature 411, 2001)
Some resources
• KEGG: http://www.genome.jp/kegg/pathway.html/
• BioCyc: http://www.biocyc.org/
• Reactome: http://www.reactome.org/
• GenMAPP: http://www.genmapp.org/
• BioCarta: http://www.biocarta.com/
• Pathguide – the pathway resource list: http://www.pathguide.org/
Pathguide statistics: Gallus gallus is missing
Systems Biology Workflow Nanduri & McCarthy CAB reviews, 2008
Systems Biology Workflow: for a given species of interest, what type of data is available?
Retrieval of interaction datasets • Check PPI resources such as Predictome and Prolinks to see whether they cover the species of interest • If they do not, find orthologous proteins in related species that have known interactions
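The ortholog-based fallback above can be sketched as follows. This is a simplified illustration under stated assumptions: the ortholog table and the reference-species interactions are hypothetical, the identifiers are made up, and the function name is invented for this example. An interaction is predicted in the species of interest only when both partners of a known interaction have orthologs there.

```python
def transfer_interactions(orthologs, known_interactions):
    """Predict interactions in the species of interest from a related species.

    orthologs: {protein in species of interest: ortholog in reference species}
    known_interactions: iterable of (a, b) pairs in the reference species
    Returns a set of sorted (protein, protein) pairs in the species of interest.
    """
    # Invert the table: one reference protein may map to several proteins.
    ref_to_target = {}
    for target, ref in orthologs.items():
        ref_to_target.setdefault(ref, []).append(target)

    predicted = set()
    for a, b in known_interactions:
        for ta in ref_to_target.get(a, []):
            for tb in ref_to_target.get(b, []):
                if ta != tb:
                    predicted.add(tuple(sorted((ta, tb))))
    return predicted

# Hypothetical identifiers: chicken proteins mapped to mouse orthologs.
orthologs = {"chk1": "mus1", "chk2": "mus2", "chk3": "mus3"}
mouse_interactions = [("mus1", "mus2"), ("mus2", "mus9")]

predicted = transfer_interactions(orthologs, mouse_interactions)
print(predicted)
```

Interactions transferred this way are predictions, so they would still need the quality evaluation discussed next.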
I have interactions, what next? • Evaluate the quality of the interactions, i.e. the type of method used to identify them. What exactly are these methods? Example: STRING Database
PPI Identification
Computational: • Phylogenetic profile • Gene cluster • Sequence coevolution • Rosetta stone method • Text mining
Experimental: • Yeast two hybrid (Y2H) • TAP assays • Gene coexpression • Protein arrays
Source: PLoS Computational Biology March 2007, Volume 3 e42
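One simple way to act on the identification method is to keep only interactions with at least one experimental line of evidence. The sketch below assumes that approach. The grouping of methods into experimental and computational follows one reading of the list above and is an assumption, as are the record layout, the method spellings, and the protein names.

```python
# Assumed grouping of the methods listed above (lower-case labels).
EXPERIMENTAL = {"yeast two hybrid", "tap assay", "gene coexpression", "protein array"}
COMPUTATIONAL = {"phylogenetic profile", "gene cluster", "sequence coevolution",
                 "rosetta stone", "text mining"}

def method_class(method):
    """Classify a method label as experimental, computational, or unknown."""
    label = method.strip().lower()
    if label in EXPERIMENTAL:
        return "experimental"
    if label in COMPUTATIONAL:
        return "computational"
    return "unknown"

def experimentally_supported(interactions):
    """Keep interactions backed by at least one experimental method."""
    return [rec for rec in interactions
            if any(method_class(m) == "experimental" for m in rec["methods"])]

# Hypothetical interaction records.
interactions = [
    {"pair": ("P1", "P2"), "methods": ["Yeast two hybrid", "Text mining"]},
    {"pair": ("P1", "P3"), "methods": ["Phylogenetic profile"]},
    {"pair": ("P2", "P4"), "methods": ["TAP assay"]},
]

kept = experimentally_supported(interactions)
print([rec["pair"] for rec in kept])
```

A stricter filter could require two or more independent methods per interaction, at the cost of discarding more of the dataset.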
PPI database comparisons Proteins: Structure, Function and Bioinformatics 63:490-500 2006
I have interactions, what next? • Evaluate the quality of the interactions, i.e. the type of method used to identify them. What exactly are these methods? • Visualize these interactions as a network and analyze it. What are the available tools?