Modeling Functional Genomics Datasets CVM8890-101. Lesson 3, 13 June 2007. Fiona McCarthy
Lesson 3: Tools for functional annotation. Accessing functional data; computational strategies to obtain more complete functional annotation; the AgBase GO annotation pipeline.
Lesson 3 Outline
• Review: Functional Annotation
• Tools for functional annotation
• Accessing functional data
• Computational strategies to obtain more functional data
• Example: The AgBase GO annotation pipeline
• Other GO annotation tools
Review: Functional Annotation
• Biologists use "annotation" for both the annotation of the genome and the functional annotation of gene products: “structural” AND “functional” annotation
• Functional annotation is required to make biological sense of high-throughput datasets, e.g. genomics, arrays, proteomics
• Examples: COGs, KOGs, GO
Tools for Functional Annotation
• Need to be able to access functional annotation for your dataset
  • Breadth and depth
  • Date updated
  • No annotation vs. function unknown
• Need to be able to add more annotation
• Need to be able to use the annotations to model your data
  • Depth or detail
  • Compatibility with other programs (e.g. pathway analysis)
  • Comparative data?
Tools for Functional Annotation
• Clusters of Orthologous Groups (COGs)
• euKaryotic Orthologous Groups (KOGs)
• UniProt Knowledgebase (UniProtKB)
• Bioinformatic Harvester
• FANTOM
• PUMA
• Gene Ontology (GO)
COGs & KOGs
• Accessible at http://www.ncbi.nlm.nih.gov/COG/
  • FTP download
• Available for many prokaryotes and 7 eukaryotes
• Add more annotation using the KOGinator?
• Modeling:
  • Has breadth but not always depth
  • Good for prokaryote comparative analysis?
COGs & KOGs http://www.ncbi.nlm.nih.gov/COG/ Automated tools for large numbers of comparisons?
UniProtKB
• Accessible at http://www.pir.uniprot.org/
  • FTP download, plus sophisticated search and download capabilities
• Available for > 132,000 species
• Annotation drawn from both the literature (for selected species) and biological databases
• Modeling:
  • Has breadth but not always depth; many proteins are not represented in UniProtKB
  • Proteins that are represented have a detailed summary of function from a range of sources
  • Rapid help and feedback from the database help desk
UniProtKB http://www.pir.uniprot.org/
Bioinformatic Harvester
• Accessible at http://harvester.fzk.de/harvester/
  • No download
• Available for 6 model species
• Integrates data from multiple sources
• Modeling:
  • Has breadth and depth; not useful for large datasets
  • Updates?
Bioinformatic Harvester http://harvester.fzk.de/harvester/
FANTOM http://www.gsc.riken.go.jp/e/FANTOM/ (mouse only)
PUMA http://compbio.mcs.anl.gov/puma2/
Gene Ontology
• Accessible at http://www.geneontology.org/
  • Updated downloads for 34 species, plus downloads for UniProtKB species (>130,000)
  • UniProtKB species annotation: some depth, less breadth
  • GO data mapped from other databases
• Modeling:
  • Many tools available for modeling using the GO
  • Can use computational or manual curation to add annotations
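The GO downloads mentioned above are distributed as tab-delimited gene association files. As a rough illustration of what working with such a download involves, the following minimal sketch reads one record, assuming the standard GO annotation file column order (15 columns in the older layout, 17 in GAF 2.x). The `GoAnnotation` type, the `parse_gaf_line` helper and the example record are made up for illustration and are not part of any GO or AgBase tool.

```python
from typing import NamedTuple, Optional


class GoAnnotation(NamedTuple):
    """A few fields from one gene association record (hypothetical helper type)."""
    db: str
    object_id: str
    symbol: str
    qualifier: str
    go_id: str
    reference: str
    evidence: str
    aspect: str
    taxon: str

    @property
    def is_negated(self) -> bool:
        # The qualifier column may hold several values separated by "|".
        return "NOT" in self.qualifier.split("|")


def parse_gaf_line(line: str) -> Optional[GoAnnotation]:
    """Parse one line; return None for blank lines and '!' comment lines."""
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected at least 15 columns, got {len(cols)}")
    # Column positions follow the assumed standard layout (0-based indices).
    return GoAnnotation(
        db=cols[0], object_id=cols[1], symbol=cols[2], qualifier=cols[3],
        go_id=cols[4], reference=cols[5], evidence=cols[6],
        aspect=cols[8], taxon=cols[12],
    )


# Made-up record for illustration only; not a real annotation.
example = "\t".join([
    "UniProtKB", "EXAMPLE1", "GENE1", "", "GO:0005634", "PMID:0000000",
    "IDA", "", "C", "Example protein", "", "protein", "taxon:9031",
    "20070613", "AgBase",
])

record = parse_gaf_line(example)
print(record.go_id, record.evidence, record.is_negated)
```

Keeping the evidence code and qualifier alongside each GO term matters later, because both affect how an annotation should be used in modeling.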
Gene Ontology http://www.geneontology.org/
EBI-GOA Project http://www.ebi.ac.uk/GOA/
The AgBase GO Annotation Pipeline
• Accessible at http://www.agbase.msstate.edu/
• Access available annotations for agriculturally important species
• Provide your own GO annotations
• Model GO for your dataset
Coming soon: GOModeler, quantitative hypothesis-driven modeling using GO
Other GO Annotation Tools http://www.geneontology.org/GO.tools.shtml
Other GO Annotation Tools
• Evaluate:
  • Can I run it from my computer?
  • Does it include my species of interest?
  • When was it last updated?
  • Does it display evidence codes?
  • Does it display IEA annotations?
  • What are the inputs it accepts?
  • Does it do batch searches?
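The questions about evidence codes and IEA (Inferred from Electronic Annotation) matter because a tool that hides or drops IEA annotations can change how many genes carry any annotation at all. A minimal sketch of that check follows; the gene names and (gene, GO term, evidence code) triples are hypothetical, and `evidence_summary` and `genes_annotated` are illustrative helpers rather than functions from any GO tool.

```python
from collections import Counter

# Hypothetical (gene, GO term, evidence code) triples for illustration.
annotations = [
    ("geneA", "GO:0005634", "IDA"),
    ("geneA", "GO:0003677", "IEA"),
    ("geneB", "GO:0005634", "IEA"),
    ("geneC", "GO:0006355", "ISS"),
    ("geneD", "GO:0005737", "IEA"),
]


def evidence_summary(rows):
    """Count annotations per evidence code."""
    return Counter(evidence for _, _, evidence in rows)


def genes_annotated(rows, exclude=frozenset()):
    """Return genes with at least one annotation whose code is not excluded."""
    return {gene for gene, _, evidence in rows if evidence not in exclude}


counts = evidence_summary(annotations)
with_iea = genes_annotated(annotations)
without_iea = genes_annotated(annotations, exclude={"IEA"})

print(dict(counts))
# Genes that lose all annotation when IEA is filtered out.
print(sorted(with_iea - without_iea))
```

Running the same comparison on a real dataset shows how much of its annotation coverage depends on electronic inference alone.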
Using GO to Analyze Array Data
• Evaluate:
  • Does it include my species of interest?
  • When were the annotations last updated?
  • Can I add my own annotations?
  • Does it tell me how many of my genes are used for the analysis?
  • Does it account for “NOT” annotations?
  • Does it display IEA annotations?
  • What are the input IDs it accepts?
  • Does it analyze both over- and under-represented terms?
  • What statistics does it use for the analysis?
  • Does it do a graphical representation?
• ANY tool will only be as good as the annotations.
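To make the questions about statistics and over- versus under-representation concrete, here is a minimal sketch of one commonly used approach, a hypergeometric test on a single GO term. It is not the method of any particular tool named in this lesson, and the gene counts are invented for illustration. Note that only genes that actually have annotations are counted, which is why it matters whether a tool reports how many of your genes were used.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k term-annotated genes in a sample of n drawn from N, K of which carry the term)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_values(k, N, K, n):
    """Return (p_over, p_under) for one GO term.

    k: study genes annotated to the term
    N: annotated genes in the background (e.g. all annotated genes on the array)
    K: background genes annotated to the term
    n: annotated genes in the study set
    """
    lowest = max(0, n - (N - K))   # fewest term genes the sample can contain
    highest = min(K, n)            # most term genes the sample can contain
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return p_over, p_under


# Invented counts: 1000 annotated background genes, 50 carry the term,
# 100 annotated study genes, 12 of which carry the term.
p_over, p_under = enrichment_p_values(k=12, N=1000, K=50, n=100)
print(f"over-represented p = {p_over:.4g}, under-represented p = {p_under:.4g}")
```

In practice many terms are tested at once, so the p-values would also need a multiple-testing correction; which correction a given tool applies is one more thing to check.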