Making GO Annotations For Fungal Genomes

Making GO Annotations For Fungal Genomes: a brief overview. Outline of topics: Intro; Overview of Overall Annotation Pipeline; Introduction to the Gene Ontologies (GO); Making GO Annotations; Submitting GO Annotations; GO Tool - AmiGO.

Presentation Transcript


  1. Making GO Annotations For Fungal Genomes A brief overview

  2. Outline of Topics • Intro • Overview of Overall Annotation Pipeline • Introduction to the Gene Ontologies (GO) • Making GO Annotations • Submitting GO Annotations • GO Tool - AmiGO

  3. Intro & Overview of Overall Sequence and Annotation Pipeline • Karen Christie, Saccharomyces Genome Database, Stanford University

  4. Total Nucleotides at GenBank/EMBL/DDBJ, including Whole Genome Shotgun (chart) • Total as of Dec 2006: 1.52E+11 nucleotides • Growth in 2006: 3.50 x 10^10 nucleotides (23.2% of total) • Events marked on the chart: NCBI created by Congress; WGS section started; EBI created at Hinxton • Genomes marked on the chart: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Haemophilus influenzae

  5. Fungal Genomes being sequenced at Broad Institute

  6. Fungal Genomes being sequenced by JGI

  7. Published Literature • PubMed: over 15 million citations • Basic search: secondary metabolism → 109580 • Limit search: secondary metabolism (published in the last 1 year) → 5440 • Boolean operators: secondary metabolism AND Aspergillus → 479 • Numbers as of 3/21/2007

  8. Gene Ontology Objectives • GO represents categories used to classify specific parts of our biological knowledge: • Biological Process • Molecular Function • Cellular Component • GO develops a common language applicable to any organism • GO terms can be used to annotate gene products from any species, allowing comparison of information across species

  9. My genome is sequenced! ATGTCTTTTTTAAGTGCATCGATGTCCTGGGGGCTTAGTATAATGCTCCCCGAGCTTCCTAG CGCTTAGTGCATTAGACTAGGGCCAAAATGACTACTGTTCTTAAAGTACTAGTACTTACTAC GCCCTGTTTCTTTCTTCTTCTAAAAGACTAACTAAGTGCTAGTCTAGATCTACTATTACTAC CCTACCTACTATACTAGACTAATTACCAACCCCTAGGGTACTAAATTTGCCTAGTTTACGTA GCGTTCTTAAAACGTACTAGATTACCGTACTAGGGACGTACTAAGGTACTAG… What do I do now?

  10. Overview of Sequencing/Annotation Pipeline • Sequence of genes/genome • Primary Annotation - the location and structure of genes • Secondary Annotation - the functions of the genes • Example sequence: ATGCTTCCTGATTTTGCCCTGGACTTCGCTTGTATAAATTCATTGCACC… • Example annotations: GO process: terrequinone A biosynthesis; GO function: methyltransferase activity; alcohol dehydrogenase; Enzyme Commission: 2.1.1.-

  11. Who will be annotating? • Just you? • A single group? • A consortium of groups? The number of people and groups participating and the funding will affect some decisions on whether to set up a database or use flatfiles.

  12. (Flowchart) • Questions, each with yes/no branches: Do you (or your group) have gene calls for your sequence? Are the protein predictions submitted to GenBank/DDBJ/EMBL? Resources to make functional annotations? • Steps and outcomes shown: Make automated or manual gene calls (TIGR’s Eukaryotic Annotation course is very useful); Submit gene/protein calls to GenBank/DDBJ/EMBL; Contact GO Consortium for advice, training, help with coordination, etc.; UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL; Decide who will collate all GO annotations into one file; Set up pipeline for any automated annotations not being done by GOA; Manual GO annotations from literature, or from sequence similarity methods; GOA will make GO annotations (IEA) using automated methods; GOA will collect all GO annotations and submit them to GOC; You (or your group) collect all GO annotations and submit them to GOC; GOA will maintain annotation file; You (or your group) maintain annotation file

  13. Automated Eukaryotic Gene Annotation (flowchart, based on TIGR course) • Inputs: genome sequence; EST database • Repeat masker; repeat-masked sequence • Develop a training set • Database comparisons: AAT_aa, AAT_na, tRNA Scan, GMAP, Sim4, etc. • Gene finders: Twinscan, GeneZilla, glimmerHMM, Augustus, Fgenesh, etc. • Gene predictions • Genome alignments • Combined consensus prediction • EST-based refinement (adjust exons, UTRs, alternative splicing) • Automated gene annotation

  14. Manual Gene Annotation? 1st Question - Is it in the budget? Manual annotation can be a lot better than automated, but is a lot more expensive and time consuming! Based on TIGR Eukaryotic Annotation course

  15. Manual Gene Annotation Tools • Viewer only • Gbrowse • Editors • Apollo (requires a database) • Manatee (requires a database) • Artemis (runs on flat files) Based on TIGR Eukaryotic Annotation course

  16. Eukaryotic Gene Annotation At the end of the procedure, you’ll have: • Gene calls • Protein predictions • Unique IDs for your genes This last is important. Gene IDs are unambiguous. Gene names are frequently ambiguous. You’ll also need IDs in order to submit GO annotations. Example: • Gene Name: SP1 → 19242 hits in Entrez nucleotide • Gene ID: NM_138473 → 1 hit
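
The point about names versus IDs can be sketched in a few lines of Python. This is only an illustration: the name SP1 and the ID NM_138473 come from the slide, while the other two IDs are made-up placeholders, and `index_by` is a hypothetical helper, not part of any annotation tool.

```python
from collections import defaultdict

# Illustrative records only. "SP1" and "NM_138473" come from the slide;
# the EXAMPLE_* IDs are made-up placeholders, not real accessions.
records = [
    {"gene_id": "NM_138473", "name": "SP1"},
    {"gene_id": "EXAMPLE_0002", "name": "SP1"},
    {"gene_id": "EXAMPLE_0003", "name": "SP1"},
]


def index_by(field, rows):
    """Group records by the value of one field (hypothetical helper)."""
    index = defaultdict(list)
    for row in rows:
        index[row[field]].append(row)
    return dict(index)


by_name = index_by("name", records)
by_id = index_by("gene_id", records)

# A key that maps to more than one record is ambiguous.
ambiguous_names = sorted(k for k, rows in by_name.items() if len(rows) > 1)
duplicate_ids = sorted(k for k, rows in by_id.items() if len(rows) > 1)

print("ambiguous names:", ambiguous_names)
print("duplicate IDs:", duplicate_ids)
```

Looking a gene up by name can return several records, while an ID returns exactly one, which is why annotation submissions are keyed on IDs.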

  17. Ready to make Functional Annotations! • Questions • What’s your budget? • How much literature is available? • Automated annotations • Faster, cheaper • Often less specific • Manual annotations • Time consuming & more expensive • Precise and accurate

  18. (Flowchart, repeated from slide 12) • Questions, each with yes/no branches: Do you (or your group) have gene calls for your sequence? Are the protein predictions submitted to GenBank/DDBJ/EMBL? Resources to make functional annotations? • Steps and outcomes shown: Make automated or manual gene calls (TIGR’s Eukaryotic Annotation course is very useful); Submit gene/protein calls to GenBank/DDBJ/EMBL; Contact GO Consortium for advice, training, help with coordination, etc.; UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL; Decide who will collate all GO annotations into one file; Set up pipeline for any automated annotations not being done by GOA; Manual GO annotations from literature, or from sequence similarity methods; GOA will make GO annotations (IEA) using automated methods

  19. Introduction to GO Rama Balakrishnan Saccharomyces Genome Database Stanford University, CA

  20. The Gene Ontologies A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!

  21. http://www.geneontology.org/

  22. What’s in a name? • What is a cell?

  23. Cell

  24. Cell

  25. Cell

  26. Cell

  27. Cell Image from http://microscopy.fsu.edu

  28. What’s in a name? • The same name can be used to describe different concepts

  29. What’s in a name?

  30. What’s in a name? • Glucose synthesis • Glucose biosynthesis • Glucose formation • Glucose anabolism • Gluconeogenesis • All refer to the process of making glucose from simpler components

  31. What’s in a name? • The same name can be used to describe different concepts • A concept can be described using different names • Comparison is difficult, in particular across species or across databases

  32. What’s in a name? • Rad54 (S. cerevisiae) • Okra (D. melanogaster) • Rhp54 (S. pombe) • What do these gene products have in common? ATP-dependent helicase involved in DNA recombination and repair

  33. What is the Gene Ontology? A (part of the) solution: • A controlled vocabulary that can be applied to all organisms • Used to describe gene products - proteins and RNA - in any organism

  34. What is Ontology? • Dictionary: A branch of metaphysics concerned with the nature and relations of being. • Barry Smith: The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality. 1606 1700s

  35. So what does that mean? From a practical view, ontology is the representation of something we know about. “Ontologies” consist of a representation of things that are detectable or directly observable, and the relationships between those things. Example relationship: is part of

  36. Ontology Includes: • A vocabulary of terms (names for concepts) • Definitions • Defined logical relationships to each other

  37. How does GO work? • What does the gene product do? • Molecular Function • Why does it perform these activities? • Process • Where does it act? • Location in the cell, cellular component What information might we want to capture about a gene product?

  38. The 3 Gene Ontologies • Molecular Function = elemental activity/task • the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity • Biological Process = biological goal or objective • broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions • Cellular Component = location or complex • subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

  39. Cellular Component: where a gene product acts

  40. Molecular Function: activities or “jobs” of a gene product • insulin binding • insulin receptor activity • glucose-6-phosphate isomerase activity • drug transporter activity

  41. Molecular Function • A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product. • Sets of functions make up a biological process.

  42. Biological Process • cell division • transcription • limb development • courtship behavior

  43. Example: Gene Product = hammer • Function (what) → Process (why) • Drive nail (into wood) → Carpentry • Drive stake (into soil) → Gardening • Smash roach → Pest Control • Clown’s juggling object → Entertainment
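
As a rough sketch, the hammer analogy can be written down as (function, process) pairs attached to one gene product. The pairs are copied from the slide; the names `HAMMER_ANNOTATIONS` and `functions_by_process` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# (function, process) pairs copied from the hammer analogy on the slide.
HAMMER_ANNOTATIONS = [
    ("drive nail (into wood)", "carpentry"),
    ("drive stake (into soil)", "gardening"),
    ("smash roach", "pest control"),
    ("clown's juggling object", "entertainment"),
]


def functions_by_process(annotations):
    """Group the functions of one gene product by the process they serve."""
    grouped = defaultdict(list)
    for function, process in annotations:
        grouped[process].append(function)
    return dict(grouped)


grouped = functions_by_process(HAMMER_ANNOTATIONS)
distinct_functions = {function for function, _ in HAMMER_ANNOTATIONS}

for process, functions in sorted(grouped.items()):
    print(f"{process}: {', '.join(functions)}")
```

One gene product carries several independent function terms, and each function is recorded separately from the process it contributes to.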

  44. What’s in a GO term? • term: gluconeogenesis • id: GO:0006094 • definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. • synonym: glucose biosynthesis
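
The four fields of a GO term map naturally onto a small record type. The sketch below uses the gluconeogenesis values from the slide; the `GOTerm` class and the `matches` helper are assumptions made for illustration and are not part of any GO software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOTerm:
    """Minimal record with the fields shown on the slide (hypothetical class)."""
    go_id: str
    name: str
    definition: str
    synonyms: tuple = ()


gluconeogenesis = GOTerm(
    go_id="GO:0006094",
    name="gluconeogenesis",
    definition=(
        "The formation of glucose from noncarbohydrate precursors, "
        "such as pyruvate, amino acids and glycerol."
    ),
    synonyms=("glucose biosynthesis",),
)


def matches(term, label):
    """Return True if a free-text label equals the term name or a synonym."""
    wanted = label.strip().lower()
    return wanted == term.name.lower() or wanted in {
        synonym.lower() for synonym in term.synonyms
    }


print(matches(gluconeogenesis, "Glucose biosynthesis"))
print(matches(gluconeogenesis, "glycolysis"))
```

Different labels resolve to the same stable ID, which is what lets annotations from different databases be compared.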

  45. No GO Areas • GO covers ‘normal’ functions and processes • No pathological processes • No experimental conditions • NO evolutionary relationships • NO gene products • NOT a system of nomenclature for genes

  46. Ontology Structure • The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG) • Terms can have more than one parent and zero, one or more children • Terms are linked by two relationships • is-a • part-of

  47. Parent-Child Relationships • Terms shown: Chromosome, Cytoplasmic chromosome, Mitochondrial chromosome, Nuclear chromosome, Plastid chromosome • A child is a subset or an instance of a parent’s elements

  48. Parent-Child Relationships • One-to-many parental relationship: each child has only one parent • Many-to-many parental relationship (DAG: Directed Acyclic Graph): each child may have one or more parents
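
A DAG of this kind can be stored as a mapping from each term to its (relationship, parent) pairs. The sketch below is a minimal illustration: the term names come from the slides, but the edges are simplified assumptions rather than a copy of the real ontology, and `is_acyclic` is a hypothetical helper.

```python
# Each term maps to a list of (relationship, parent) pairs.
# Edges are simplified assumptions for illustration, not copied from GO files.
PARENTS = {
    "chromosome": [],
    "nucleus": [],
    "cytoplasmic chromosome": [("is_a", "chromosome")],
    "nuclear chromosome": [("is_a", "chromosome"), ("part_of", "nucleus")],
    "mitochondrial chromosome": [("is_a", "cytoplasmic chromosome")],
}


def is_acyclic(parents):
    """Depth-first search that fails if a term is reached while still in progress."""
    state = {}

    def visit(term):
        if state.get(term) == "done":
            return True
        if state.get(term) == "visiting":
            return False  # walked back into a term on the current path: a cycle
        state[term] = "visiting"
        for _relationship, parent in parents.get(term, []):
            if not visit(parent):
                return False
        state[term] = "done"
        return True

    return all(visit(term) for term in parents)


# Terms with more than one parent are what make this a DAG rather than a tree.
multi_parent_terms = [term for term, links in PARENTS.items() if len(links) > 1]

print(is_acyclic(PARENTS))
print(multi_parent_terms)
```

In a strict tree, every list in `PARENTS` would hold at most one entry; allowing several parents while forbidding cycles is the defining property of the graph.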

  49. A Sample DAG cellular_component cell part is_a part_of Intracellular organelle chromosome nucleus [other organelles] [Other types of chromosomes] mitochondrial chromosome nuclear chromosome

  50. True Path Rule • The path from a child term all the way up to its top-level parent(s) must always be true • Terms in the example: cell, cytoplasm, chromosome, nuclear chromosome, cytoplasmic chromosome, mitochondrial chromosome, nucleus, nuclear chromosome • Relationship types: is-a, part-of
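
One way to see what the true path rule demands is to compute every ancestor of a term: an annotation to the term implies an annotation to each of them. The sketch below assumes a simplified set of edges between the terms named on the slide (the real ontology may differ), and `ancestors` and `implied_terms` are hypothetical helpers.

```python
# Simplified, assumed edges between the terms named on the slide.
# Each term maps to a list of (relationship, parent) pairs.
PARENTS = {
    "cell": [],
    "nucleus": [("part_of", "cell")],
    "cytoplasm": [("part_of", "cell")],
    "chromosome": [("part_of", "cell")],
    "nuclear chromosome": [("is_a", "chromosome"), ("part_of", "nucleus")],
    "cytoplasmic chromosome": [("is_a", "chromosome"), ("part_of", "cytoplasm")],
    "mitochondrial chromosome": [("is_a", "cytoplasmic chromosome")],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def implied_terms(term, parents=PARENTS):
    """An annotation to a term also holds for all of its ancestors."""
    return {term} | ancestors(term, parents)


print(sorted(implied_terms("nuclear chromosome")))
print(sorted(implied_terms("mitochondrial chromosome")))
```

If "chromosome" had been placed under "nucleus" in this sketch, a mitochondrial chromosome would be implied to be part of the nucleus, which is false; that is the kind of placement the rule forbids.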
