
Lecture 6: Gene ontology and Gene Annotation

Lecture 6: Gene ontology and Gene Annotation. June 19, 2014. What is gene annotation? The process of assigning descriptions to a known gene that represent its assigned gene name and its molecular function, process and cellular location.

dallon


Presentation Transcript


  1. Lecture 6: Gene ontology and Gene Annotation, June 19, 2014

  2. What is gene annotation? • The process of assigning descriptions to a known gene that represent: • the assigned gene name • molecular function, biological process and cellular location • protein features: domains and functional elements such as nuclear localization signals

  3. What is the Gene Ontology? • A set of standard biological phrases (terms) that are applied to genes/proteins, e.g.: • protein kinase • apoptosis • membrane • Its goal is to standardize the representation of gene and gene product attributes across species and databases

  4. Who annotates the genes? • Curators at the major databases • NCBI, EBI, MGI, model organism databases • UniProt • Protein domain databases (Pfam, SMART, InterPro) • Older sources (Swiss-Prot, PIR) • Gene Ontology groups

  5. Why use gene ontology? • Allows biologists to query large numbers of genes at once without researching each one individually • e.g., find all the PI3 kinases in a given genome, or all proteins involved in the oxidative stress response, without prior knowledge of every gene

  6. From the Exercise 1 gene list • vha-6 • C. elegans vacuolar H+ ATPase gene • What is its role in the cell? • GO biological process: • body morphogenesis, determination of adult lifespan, lipid storage • GO molecular function: • hydrogen ion transmembrane transporter activity • GO cellular component: • apical plasma membrane, vacuolar ATPase complex

  7. A long list of genes... how do you make sense of them? By using gene ontology • Example categories: asparagine utilization, lysine biosynthesis, cell wall catabolism, oxidative stress response, glucose repression, aging, ribose metabolism, protein folding, ubiquinone biosynthesis • Eisen, Michael B. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868

  8. GO structure • GO isn’t just a flat list of biological terms • Terms are related within a hierarchy: • DNA binding is_a nucleic acid binding (DNA binding is a type of nucleic acid binding) • nucleic acid binding is_a binding (nucleic acid binding is a type of binding)

  9. GO structure • A gene associated with a particular term is automatically annotated to all of that term's parent terms, all the way up the hierarchy
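The propagation rule above (GO calls it the true path rule) amounts to walking every parent link upward from each directly annotated term. This is a minimal sketch assuming a hypothetical toy slice of the hierarchy built from slide 8's terms and a hypothetical "gene A"; the real GO graph has far more terms and edges.

```python
from collections import deque

# Hypothetical toy slice of GO: each term maps to its is_a parents (slide 8's example chain).
IS_A_PARENTS = {
    "DNA binding": ["nucleic acid binding"],
    "nucleic acid binding": ["binding"],
    "binding": ["molecular_function"],
    "molecular_function": [],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward (excluding the term itself)."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(parents.get(parent, []))
    return seen


def propagate(direct, parents):
    """Expand each gene's direct annotations to include all ancestor terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in direct.items()
    }


direct_annotations = {"gene A": {"DNA binding"}}  # hypothetical gene with one direct annotation
full = propagate(direct_annotations, IS_A_PARENTS)
print(sorted(full["gene A"]))
# ['DNA binding', 'binding', 'molecular_function', 'nucleic acid binding']
```

Only one annotation is stored, yet a query for "binding" now finds gene A, which is what lets GO answer broad questions without every curator annotating every level.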

  10. GO structure • This means genes can be grouped at a user-chosen level of the hierarchy • Allows a broad overview of a gene set or genome
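Grouping at a user-chosen level can be sketched by picking a few high-level terms and binning each gene under whichever of them sit above its specific annotations (the same idea behind GO's cut-down "GO slim" term sets). The parent links, gene names and annotations below are hypothetical illustrations, not taken from GO.

```python
# Hypothetical child -> parent links; illustrative only, not copied from GO.
PARENTS = {
    "mitotic spindle assembly": ["mitotic cell cycle process"],
    "mitotic cell cycle process": ["cell cycle process"],
    "cell cycle process": ["biological_process"],
    "glucose import": ["carbohydrate transport"],
    "carbohydrate transport": ["transport"],
    "transport": ["biological_process"],
    "biological_process": [],
}


def ancestors_or_self(term, parents):
    """Collect a term plus everything above it in the hierarchy."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def group_at_level(gene_terms, level_terms, parents):
    """Place each gene under every chosen high-level term lying above one of its terms."""
    wanted = set(level_terms)
    groups = {term: set() for term in level_terms}
    for gene, terms in gene_terms.items():
        for term in terms:
            for hit in ancestors_or_self(term, parents) & wanted:
                groups[hit].add(gene)
    return {term: sorted(genes) for term, genes in groups.items()}


genes = {  # hypothetical genes and annotations
    "g1": ["mitotic spindle assembly"],
    "g2": ["glucose import"],
    "g3": ["transport"],
}
print(group_at_level(genes, ["cell cycle process", "transport"], PARENTS))
# {'cell cycle process': ['g1'], 'transport': ['g2', 'g3']}
```

Choosing higher terms gives fewer, broader bins; choosing lower ones gives more specific groups.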

  11. How does GO work? What information might we want to capture about a gene product? • What does the gene product do? • Where does it act? • Why does it perform these activities?

  12. GO structure • GO terms are divided into three parts: • cellular component • molecular function • biological process

  13. Cellular Component • where a gene product acts • e.g., mitochondria

  14. Cellular Component • The cellular components of a virus are different from those of a cell

  15. Cellular Component • Enzyme complexes in the component ontology refer to places, not activities.

  16. Molecular Function • activities or “jobs” of a gene product • e.g., glucose-6-phosphate isomerase activity

  17. Molecular Function • e.g., insulin binding, insulin receptor activity

  18. Molecular Function • A gene product may have several functions • Sets of functions make up a biological process.

  19. Biological Process • a commonly recognized series of events • e.g., cell division

  20. Biological Process transcription

  21. Biological Process regulation of gluconeogenesis

  22. Biological Process limb development

  23. Biological Process courtship behavior

  24. Ontology Structure • Terms are linked by two relationships: • is-a • part-of

  25. Ontology Structure • [Figure: example hierarchy linking the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane by is-a and part-of relationships]

  26. GO terms • Each concept has: • a name • an ID number • a definition • e.g., term: transcription initiation; id: GO:0006352; definition: Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template resulting in the subsequent synthesis of RNA from that promoter.
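GO distributes its terms in several formats; one is the OBO flat file (go.obo), where each term is a `[Term]` stanza of `tag: value` lines. The stanza below is assembled from the slide's example (the current GO release may word the name or definition differently), and the parser is a minimal sketch, not a full OBO reader: it ignores escaped quotes and keeps only the last value of repeated tags such as `is_a`.

```python
# One term stanza in OBO style, built from the slide's example (real go.obo text may differ).
OBO_STANZA = """[Term]
id: GO:0006352
name: transcription initiation
def: "Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template resulting in the subsequent synthesis of RNA from that promoter." []
"""


def parse_term(stanza):
    """Read 'tag: value' lines into a dict; later duplicate tags overwrite earlier ones."""
    term = {}
    for line in stanza.splitlines():
        if line.startswith("[") or ":" not in line:
            continue  # skip the [Term] header and blank lines
        tag, value = line.split(":", 1)  # split once so 'GO:0006352' stays intact
        term[tag.strip()] = value.strip()
    if "def" in term:
        # def holds a quoted definition followed by a [dbxref] list; keep only the quoted text
        term["def"] = term["def"].split('"')[1]
    return term


term = parse_term(OBO_STANZA)
print(term["id"], "|", term["name"])  # GO:0006352 | transcription initiation
```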

  27. GO terms • Where do GO terms come from? • GO terms are added by editors at EBI and at annotating databases • New terms are usually added only when annotators request them • GO editors work with domain experts on major ontology developments, e.g.: • metabolism • pathogenesis • cell cycle

  28. Species coverage • All major eukaryotic model organism species • Human, via the Gene Ontology Annotation (GOA) group at UniProt • Several bacterial and parasite species, through TIGR and GeneDB at Sanger • ~80 species in the Gene Ontology database

  29. Anatomy of a GO annotation • Three key parts: • gene name/id • GO term(s) • evidence for association
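The three key parts map directly onto columns of the GO Annotation File (GAF), the tab-separated format GO uses to distribute annotations. This sketch reads only the columns needed for gene, term and evidence (plus the aspect letter); the row is entirely hypothetical, with placeholder identifiers rather than a real annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_id: str   # which gene product
    symbol: str
    go_id: str     # which GO term
    evidence: str  # evidence code supporting the association
    aspect: str    # F = molecular function, P = biological process, C = cellular component


def parse_gaf_line(line):
    """Extract gene, GO term and evidence from one tab-separated GAF row."""
    cols = line.rstrip("\n").split("\t")
    # 0-based indices: 1 = DB object ID, 2 = symbol, 4 = GO ID, 6 = evidence code, 8 = aspect
    return Annotation(gene_id=cols[1], symbol=cols[2], go_id=cols[4],
                      evidence=cols[6], aspect=cols[8])


# Hypothetical row: every identifier below is a placeholder, not a real annotation.
row = "\t".join([
    "DB", "GENE0001", "geneX", "", "GO:0000000", "PMID:0000000", "IDA", "",
    "F", "", "", "protein", "taxon:0000", "20140619", "DB",
])
print(parse_gaf_line(row))
# Annotation(gene_id='GENE0001', symbol='geneX', go_id='GO:0000000', evidence='IDA', aspect='F')
```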

  30. Example annotation • Human BRCA1 protein: molecular function GO terms

  31. Types of evidence codes • Experimental codes • Computational codes • Other evidence codes

  32. Manual annotation • Example abstract: "In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…" • Molecular function: serine/threonine kinase activity • Cellular component: plasma membrane • Biological process: wound response

  33. Electronic Annotation • Annotation derived without human validation • Mapping files, e.g., interpro2go, ec2go • BLAST search ‘hits’ • Lower ‘quality’ than manual codes • Used for non-model organisms
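Because electronic annotations are lower quality, analyses sometimes drop them. In GO, the evidence code for electronic annotation is IEA (Inferred from Electronic Annotation). A minimal sketch, assuming annotations are held as hypothetical (gene, GO ID, evidence code) triples with placeholder IDs:

```python
# Hypothetical (gene, GO ID, evidence code) triples; IDs are placeholders, not real annotations.
annotations = [
    ("geneA", "GO:0000001", "IDA"),  # IDA: Inferred from Direct Assay (experimental)
    ("geneA", "GO:0000002", "IEA"),  # IEA: Inferred from Electronic Annotation
    ("geneB", "GO:0000001", "ISS"),  # ISS: Inferred from Sequence or structural Similarity
    ("geneC", "GO:0000003", "IEA"),
]


def drop_electronic(rows, excluded=frozenset({"IEA"})):
    """Keep only annotations whose evidence code is not in `excluded`."""
    return [row for row in rows if row[2] not in excluded]


kept = drop_electronic(annotations)
print(kept)  # [('geneA', 'GO:0000001', 'IDA'), ('geneB', 'GO:0000001', 'ISS')]
```

Note that geneC loses all of its annotations; for non-model organisms, where most annotations are electronic, this kind of filter can leave little to analyze.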

  34. GO & microarray analysis • Many tools use GO to find common biological functions in a list of genes • GoMiner, GOstat, Onto-Express, FatiGO and GSEA, to name a few • We’ll use the DAVID Bioinformatics Resource

  35. GO tools • Input a gene list • Show which GO categories have the most genes from the list associated with them, or are “enriched” (over-represented compared with chance) • Provide a statistical measure of whether the enrichment is significant

  36. Traditional analysis • Gene 1: apoptosis, cell-cell signaling, protein phosphorylation, mitosis, … • Gene 2: growth control, mitosis, oncogenesis, protein phosphorylation, … • Gene 3: growth control, mitosis, oncogenesis, protein phosphorylation, … • Gene 4: nervous system, pregnancy, oncogenesis, mitosis, … • … • Gene 100: positive control of cell proliferation, mitosis, oncogenesis, glucose transport, …

  37. Using GO annotations • But by using GO annotations, this work has already been done for you! • e.g., GO:0006915: apoptosis

  38. Grouping by process • Mitosis: Gene 2, Gene 5, Gene 45, Gene 7, Gene 35, … • Glucose transport: Gene 7, Gene 3, Gene 6, … • Apoptosis: Gene 1, Gene 53 • Positive control of cell proliferation: Gene 7, Gene 3, Gene 12, … • Growth: Gene 5, Gene 2, Gene 6, …
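Going from a per-gene view (slide 36) to a per-process view (slide 38) is an inversion of a gene-to-terms mapping. This sketch uses the terms listed for Genes 1 to 4 on slide 36 (the slide truncates each list with "…", so these lists are incomplete) and ranks processes by how many genes they collect.

```python
from collections import defaultdict

# Gene -> process lists for Genes 1-4 on slide 36 (each list is truncated on the slide).
gene_to_processes = {
    "Gene 1": ["Apoptosis", "Cell-cell signaling", "Protein phosphorylation", "Mitosis"],
    "Gene 2": ["Growth control", "Mitosis", "Oncogenesis", "Protein phosphorylation"],
    "Gene 3": ["Growth control", "Mitosis", "Oncogenesis", "Protein phosphorylation"],
    "Gene 4": ["Nervous system", "Pregnancy", "Oncogenesis", "Mitosis"],
}


def group_by_process(gene_terms):
    """Invert gene -> processes into process -> genes."""
    groups = defaultdict(list)
    for gene, processes in gene_terms.items():
        for process in processes:
            groups[process].append(gene)
    return dict(groups)


by_process = group_by_process(gene_to_processes)
# List the largest groups first.
for process, genes in sorted(by_process.items(), key=lambda kv: -len(kv[1])):
    print(f"{process}: {', '.join(genes)}")
```

A large group is not automatically interesting: whether it is enriched depends on how common the process is among all genes measured.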

  39. GO for microarray analysis • Annotations give a ‘function’ label to genes • Ask meaningful questions of microarray data: • Do the genes involved in the same process have the same or different expression patterns?

  40. Using GO in practice • Experiment: microarray of 1000 genes, of which 100 are differentially regulated • Categories among the 100 genes: mitosis 80/100, apoptosis 40/100, cell proliferation 30/100, glucose transport 20/100 • Statistical measure: how likely it is that your differentially regulated genes fall into that category by chance

  41. Using GO in practice • However, when you look at the distribution of all genes on the microarray:
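One standard statistical measure is the hypergeometric upper tail (equivalent to a one-sided Fisher's exact test): the chance that 100 genes picked at random from the array contain at least k genes from the category. Tools such as DAVID use variants of this test. The array size (1000) and list size (100) come from slide 40; the background counts K are HYPOTHETICAL, since the transcript does not give the distribution of all genes on the array.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric tail P(X >= k).

    N genes on the array, K of them in the category; n genes were picked
    (differentially regulated), k of those are in the category.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(k, n, K, N):
    """Category fraction in the gene list divided by its fraction on the whole array."""
    return (k * N) / (n * K)


N, n = 1000, 100  # from the slide: 1000 genes on the array, 100 differentially regulated
# (category, k from the slide, K = HYPOTHETICAL number of array genes in the category)
categories = [("mitosis", 80, 800), ("glucose transport", 20, 50)]
for name, k, K in categories:
    print(f"{name}: fold={fold_enrichment(k, n, K, N):.1f}, "
          f"p={enrichment_p_value(k, n, K, N):.2g}")
```

With these made-up backgrounds, mitosis covers 80% of the list but also 80% of the array (fold 1.0, not significant), while glucose transport, though only 20/100, is four times over-represented and very unlikely by chance.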

  42. Other sources of annotation • UniProt (Swiss-Prot) keywords • Protein domain databases • Pfam, PANTHER, PDB, PROSITE, etc. • GeneDB summaries from NCBI • Protein-protein interaction databases • Pathway databases • KEGG, BioCarta, BBID, Reactome • DAVID incorporates annotations from all of these sources and clusters the redundant terms
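Terms from different sources are redundant when they cover largely the same genes. DAVID documents its own kappa-based similarity for this; the sketch below swaps in the simpler Jaccard index and a greedy grouping, so it illustrates the idea rather than reproducing DAVID. The term labels, gene sets and the 0.5 threshold are all hypothetical.

```python
def jaccard(a, b):
    """Shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_redundant_terms(term_genes, threshold=0.5):
    """Greedy grouping: each term joins the first group holding a term it overlaps at >= threshold."""
    groups = []
    for term, genes in term_genes.items():
        for group in groups:
            if any(jaccard(genes, term_genes[other]) >= threshold for other in group):
                group.append(term)
                break
        else:
            groups.append([term])  # no sufficiently similar group: start a new one
    return groups


# Hypothetical terms from different sources and their (made-up) gene sets.
term_genes = {
    "GO: protein kinase activity": {"g1", "g2", "g3", "g4"},
    "domain: protein kinase": {"g1", "g2", "g3"},
    "pathway: hypothetical pathway X": {"g7", "g8"},
}
print(group_redundant_terms(term_genes))
# [['GO: protein kinase activity', 'domain: protein kinase'], ['pathway: hypothetical pathway X']]
```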

  43. Limitations of GO analysis • ~40% of the predicted C. neoformans proteins are similar only to other C. neoformans proteins and have no identifiable protein domain • Enrichment analysis is difficult when only ~60% of the encoded proteins can be annotated

  44. Today in the computer lab • Tutorial on using DAVID for GO enrichment analysis • Analyze the gene lists from Exercises 1 and 2 • Create a sub-list that you will use in Exercise 7
