Learn about the gene ontology (GO) annotations for fungal genomes, the annotation pipeline, tools like AmiGO, and how to submit GO annotations. Explore the importance of GO terms and the process of making functional annotations.
Making GO Annotations for Fungal Genomes: A Brief Overview
Outline of Topics • Intro • Overview of Overall Annotation Pipeline • Introduction to the Gene Ontologies (GO) • Making GO Annotations • Submitting GO Annotations • GO Tool - AmiGO
Intro & Overview of Overall Sequence and Annotation Pipeline • Karen Christie, Saccharomyces Genome Database, Stanford University
Total Nucleotides at GenBank/EMBL/DDBJ, including Whole Genome Shotgun (WGS) • Total as of Dec 2006: 1.52 x 10^11 nucleotides • Growth in 2006: 3.50 x 10^10 nucleotides (23.2% of total) • Milestones marked on the plot: NCBI created by Congress; WGS section started; EBI created at Hinxton • Genomes marked on the plot: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Haemophilus influenzae
Published Literature • PubMed: over 15 million citations • Basic search: secondary metabolism → 109580 • Limit search: secondary metabolism (published in the last year) → 5440 • Boolean operators: secondary metabolism AND Aspergillus → 479 Numbers as of 3/21/2007
Gene Ontology Objectives • GO represents categories used to classify specific parts of our biological knowledge: • Biological Process • Molecular Function • Cellular Component • GO develops a common language applicable to any organism • GO terms can be used to annotate gene products from any species, allowing comparison of information across species
My genome is sequenced! ATGTCTTTTTTAAGTGCATCGATGTCCTGGGGGCTTAGTATAATGCTCCCCGAGCTTCCTAG CGCTTAGTGCATTAGACTAGGGCCAAAATGACTACTGTTCTTAAAGTACTAGTACTTACTAC GCCCTGTTTCTTTCTTCTTCTAAAAGACTAACTAAGTGCTAGTCTAGATCTACTATTACTAC CCTACCTACTATACTAGACTAATTACCAACCCCTAGGGTACTAAATTTGCCTAGTTTACGTA GCGTTCTTAAAACGTACTAGATTACCGTACTAGGGACGTACTAAGGTACTAG… What do I do now?
Overview of Sequencing/Annotation Pipeline • Sequence of genes/genome • Primary Annotation - the location and structure of genes • Secondary Annotation - the functions of the genes Example: ATGCTTCCTGATTTTGCCCTGGACTTCGCTTGTATAAATTCATTGCACC… • GO process: terrequinone A biosynthesis • GO function: methyltransferase activity • alcohol dehydrogenase • Enzyme Commission: 2.1.1.-
Who will be annotating? • Just you? • A single group? • A consortium of groups? The number of people and groups participating, and the available funding, will affect decisions such as whether to set up a database or use flat files.
Do you (or your group) have gene calls for your sequence? • No: make automated or manual gene calls (TIGR’s Eukaryotic Annotation course is very useful) • Yes: are the protein predictions submitted to GenBank/DDBJ/EMBL? • No: submit gene/protein calls to GenBank/DDBJ/EMBL • Yes: UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL Do you have resources to make functional annotations? • No: GOA will make GO annotations (IEA) using automated methods; GOA will collect all GO annotations and submit them to GOC; GOA will maintain the annotation file • Yes: contact the GO Consortium for advice, training, help with coordination, etc.; decide who will collate all GO annotations into one file; set up a pipeline for any automated annotations not being done by GOA; make manual GO annotations from literature or from sequence similarity methods; you (or your group) collect all GO annotations and submit them to GOC; you (or your group) maintain the annotation file
Automated Eukaryotic Gene Annotation • Inputs: genome sequence, EST database • Repeat masker → repeat-masked sequence • Develop a training set • Database comparisons (AAT_aa, AAT_na, tRNA Scan, GMAP, Sim4, etc.) → genome alignments • Gene finders (Twinscan, GeneZilla, glimmerHMM, Augustus, Fgenesh, etc.) → gene predictions • Combined consensus prediction • EST-based refinement (adjust exons, UTRs, alternative splicing) • Automated gene annotation Based on TIGR course
Manual Gene Annotation? 1st Question - Is it in the budget? Manual annotation can be a lot better than automated, but is a lot more expensive and time consuming! Based on TIGR Eukaryotic Annotation course
Manual Gene Annotation Tools • Viewer only: GBrowse • Editors: Apollo (requires a database), Manatee (requires a database), Artemis (runs on flat files) Based on TIGR Eukaryotic Annotation course
Eukaryotic Gene Annotation At the end of the procedure, you’ll have: • Gene calls • Protein predictions • Unique IDs for your genes The last item is important: gene IDs are unambiguous, whereas gene names are frequently ambiguous. You’ll also need IDs in order to submit GO annotations. Example: in Entrez nucleotide, the gene name SP1 returns 19242 hits; the gene ID NM_138473 returns 1 hit.
Ready to make Functional Annotations! • Questions: What’s your budget? How much literature is available? • Automated annotations: faster and cheaper, but often less specific • Manual annotations: time-consuming and more expensive, but precise and accurate
Introduction to GO Rama Balakrishnan Saccharomyces Genome Database Stanford University, CA
The Gene Ontologies A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!
What’s in a name? • What is a cell?
Cell Image from http://microscopy.fsu.edu
What’s in a name? • The same name can be used to describe different concepts
What’s in a name? • Glucose synthesis • Glucose biosynthesis • Glucose formation • Glucose anabolism • Gluconeogenesis • All refer to the process of making glucose from simpler components
What’s in a name? • The same name can be used to describe different concepts • A concept can be described using different names Comparison is difficult – in particular across species or across databases
What’s in a name? • Rad54 (S. cerevisiae) • Okra (D. melanogaster) • Rhp54 (S. pombe) What do these gene products have in common? Each is an ATP-dependent helicase involved in DNA recombination and repair.
What is the Gene Ontology? A (part of the) solution: • A controlled vocabulary that can be applied to all organisms • Used to describe gene products - proteins and RNA - in any organism
What is Ontology? • Dictionary: A branch of metaphysics concerned with the nature and relations of being. • Barry Smith: The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality. 1606 1700s
So what does that mean? From a practical view, an ontology is a representation of something we know about. “Ontologies” consist of a representation of things that are detectable or directly observable, and the relationships between those things (for example, “is part of”).
Ontology Includes: • A vocabulary of terms (names for concepts) • Definitions • Defined logical relationships to each other
How does GO work? What information might we want to capture about a gene product? • What does the gene product do? Molecular Function • Why does it perform these activities? Biological Process • Where does it act? Cellular Component (location in the cell)
The 3 Gene Ontologies • Molecular Function = elemental activity/task: the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity • Biological Process = biological goal or objective: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions • Cellular Component = location or complex: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
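To make the three-way split concrete, here is a minimal Python sketch of how annotations for one gene product might be grouped by ontology. The gene product name `geneX` and the `Annotation` record are hypothetical, invented for illustration; the term names are the examples from the slide, and the single-letter codes F, P and C follow the aspect codes used in GO annotation files.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, List, NamedTuple


class Aspect(Enum):
    """The three ontologies, keyed by their one-letter aspect codes."""
    MOLECULAR_FUNCTION = "F"
    BIOLOGICAL_PROCESS = "P"
    CELLULAR_COMPONENT = "C"


class Annotation(NamedTuple):
    """Hypothetical record type: one GO term attached to one gene product."""
    gene_product: str
    aspect: Aspect
    term_name: str


def group_by_aspect(annotations: List[Annotation]) -> Dict[Aspect, List[str]]:
    """Group term names by the ontology they belong to."""
    grouped: Dict[Aspect, List[str]] = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.aspect].append(annotation.term_name)
    return dict(grouped)


# "geneX" is a made-up gene product; the terms are the slide's examples.
EXAMPLE = [
    Annotation("geneX", Aspect.MOLECULAR_FUNCTION, "ATPase activity"),
    Annotation("geneX", Aspect.BIOLOGICAL_PROCESS, "mitosis"),
    Annotation("geneX", Aspect.CELLULAR_COMPONENT, "nucleus"),
]

GROUPED = group_by_aspect(EXAMPLE)
```

A gene product can carry several terms in each aspect, which is why each aspect maps to a list rather than to a single term.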
Molecular Function: activities or “jobs” of a gene product • insulin binding • insulin receptor activity • glucose-6-phosphate isomerase activity • drug transporter activity
Molecular Function • A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product. • Sets of functions make up a biological process.
Biological Process • cell division • transcription • limb development • courtship behavior
Example: Gene Product = hammer. Function (what) → Process (why): • Drive nail (into wood) → Carpentry • Drive stake (into soil) → Gardening • Smash roach → Pest control • Clown’s juggling object → Entertainment
What’s in a GO term? • term: gluconeogenesis • id: GO:0006094 • definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. • synonym: glucose biosynthesis
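As a rough illustration of the fields above, a GO term could be held in a small record and looked up by either its name or a synonym. This is only a sketch: the `GOTerm` class and `build_name_index` helper are hypothetical names, not part of any GO tool, and the data is limited to the single term shown on the slide.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical container for the four fields shown on the slide."""
    term_id: str
    name: str
    definition: str
    synonyms: Tuple[str, ...] = ()


GLUCONEOGENESIS = GOTerm(
    term_id="GO:0006094",
    name="gluconeogenesis",
    definition=(
        "The formation of glucose from noncarbohydrate precursors, "
        "such as pyruvate, amino acids and glycerol."
    ),
    synonyms=("glucose biosynthesis",),
)


def build_name_index(terms: Iterable[GOTerm]) -> Dict[str, str]:
    """Map every name and synonym (lower-cased) to its stable term ID."""
    index: Dict[str, str] = {}
    for term in terms:
        for label in (term.name, *term.synonyms):
            index[label.lower()] = term.term_id
    return index


INDEX = build_name_index([GLUCONEOGENESIS])
```

Different labels for the same concept resolve to one ID, so comparisons across databases can be made on the ID rather than on free text.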
No GO Areas • GO covers ‘normal’ functions and processes • No pathological processes • No experimental conditions • No evolutionary relationships • No gene products • Not a system of nomenclature for genes
Ontology Structure • The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG) • Terms can have more than one parent and zero, one or more children • Terms are linked by two relationships • is-a • part-of
Parent-Child Relationships • Chromosome • Cytoplasmic chromosome • Mitochondrial chromosome • Nuclear chromosome • Plastid chromosome A child is a subset or an instance of its parent’s elements.
Parent-Child Relationships • One-to-many parental relationship: each child has only one parent • Many-to-many parental relationship (DAG: Directed Acyclic Graph): each child may have one or more parents
A Sample DAG cellular_component cell part is_a part_of Intracellular organelle chromosome nucleus [other organelles] [Other types of chromosomes] mitochondrial chromosome nuclear chromosome
True Path Rule • The path from a child term all the way up to its top-level parent(s) must always be true. Example hierarchy: cell • cytoplasm • chromosome • nuclear chromosome • cytoplasmic chromosome • mitochondrial chromosome • nucleus • nuclear chromosome Relationship types shown: is-a, part-of
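One practical consequence of the true path rule is that an annotation to a child term implies every term above it in the graph. The Python sketch below walks upward through a small DAG to collect those implied terms. The edge list is an assumption made for illustration, loosely based on the terms named on the slide; it is not copied from the actual GO files, and the names `PARENTS`, `ancestors` and `implied_terms` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed edges for illustration only: child -> [(relationship, parent), ...].
# Note that a term may have more than one parent, as in any DAG.
PARENTS: Dict[str, List[Tuple[str, str]]] = {
    "nuclear chromosome": [("is_a", "chromosome"), ("part_of", "nucleus")],
    "mitochondrial chromosome": [("is_a", "cytoplasmic chromosome")],
    "cytoplasmic chromosome": [("is_a", "chromosome"), ("part_of", "cytoplasm")],
    "chromosome": [("part_of", "cell")],
    "nucleus": [("part_of", "cell")],
    "cytoplasm": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term: str) -> Set[str]:
    """Collect every term reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def implied_terms(term: str) -> Set[str]:
    """An annotation to a term implies the term plus all of its ancestors."""
    return {term} | ancestors(term)
```

With these assumed edges, `ancestors("mitochondrial chromosome")` never reaches `nucleus`, which is the kind of check the rule calls for: if every chromosome were placed under nucleus, the path from a mitochondrial chromosome upward would assert something false.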