Gene Ontology Annotation of immune system genes
Evelyn Camon, on behalf of the EBI, Cambridge, UK
Located near Moira, Craigavon, Northern Ireland
13 September 2007, Dublin City University, ISI
What is the EMBL-EBI?
• Non-profit organization
• Part of the European Molecular Biology Laboratory (EMBL)
• EMBL is a basic research institute funded by public research monies from 19 member states
• EBI is based on the Wellcome Trust Genome Campus near Cambridge, UK
Databases: molecules to systems
• Nucleotide sequence: EMBL-Bank
• Genomes: Ensembl, Integr8
• Proteomes: UniProt, GOA
• Gene expression: ArrayExpress
• Protein structure: MSD
• Protein families, motifs and domains: InterPro
• Chemical entities: ChEBI
• Protein interactions: IntAct
• Pathways: Reactome
• Systems: BioModels
European node for globally coordinated data collection and dissemination projects
Presentation agenda
• Brief introduction to Gene Ontology (GO)
• How we annotate GO terms (link proteins to terms)
• Immune System GO Annotation: tackling an ‘unmet need’
• Plan of Action (gene lists, wiki, workshops)
• How you can contribute (wiki, mailing list, volunteer)
• Dissemination
The Gene Ontology www.geneontology.org
Why do we need ontologies?
- The use of high-throughput (HTP) technology in innate and adaptive immune research is gaining momentum.
- Gene-set enrichment analysis helps us spot the patterns BUT often relies on annotation.
- Dissection of immune system responses is hampered by the lack of annotation of many of the key gene products involved.
[Figure: GO classification of proteomics and microarray data analysis results. Larkin JE et al., Physiol Genomics, 2004]
Why do we need ontologies? We have the information
• Literature search for human interleukin 2: >18000 papers
• How do you find the paper with useful knowledge, summarise it and link it to your dataset?
[Image: http://www.teamtechnology.co.uk/f-scientist.jpg]
Why do we need ontologies?
• Need to organise, analyse and share knowledge
• BUT the English language is not precise
• Same name for different concepts, e.g. bud initiation
• Different names for the same concept, e.g. T cell homeostatic proliferation, resting T cell proliferation
• Ontologies/standardised vocabularies try to solve this problem by providing a way to convert knowledge into a machine-readable format.
Who founded the Gene Ontology?
• GO was founded in 1998 by top biological database managers:
• FlyBase, Prof. Michael Ashburner
• MGI, Dr. Judith Blake
• SGD, Dr. Mike Cherry
• BDGP, Dr. Suzi Lewis, supported the creation of GO editor
• First paid annotator was appointed: Dr. Midori Harris
The Gene Ontology • A (part of the) solution: • The Gene Ontology: “a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing” • A controlled vocabulary to describe gene products - proteins and RNA - in any organism.
The Gene Ontology Consortium 1998/99 2001 2007
The Three Gene Ontologies
Provide a standard, species-neutral way of representing biology, for community use in gene and gene product annotation:
• Molecular Function: elemental activity or task, e.g. interleukin-6 binding
• Biological Process: broad objective or goal, e.g. antigen processing and presentation
• Cellular Component: location or complex, e.g. extracellular, mitochondrion
GO and other Ontologies http://obo.sourceforge.net/
Directed Acyclic Graph (DAG)
[Figure: the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]
GO can be up to 67 nodes deep.
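The is-a and part-of relationships of the DAG can be sketched in a few lines of Python. The term names come from the slide; the specific edges between them are an assumption made for illustration and are not copied from the GO files. The point of the sketch is that a term may have more than one parent, which is what distinguishes a DAG from a simple tree.

```python
from collections import deque

# Edges are (child, relationship, parent).
# Term names are from the slide; the edges are ASSUMED for illustration.
EDGES = [
    ("membrane", "part_of", "cell"),
    ("chloroplast", "part_of", "cell"),
    ("mitochondrial membrane", "is_a", "membrane"),
    ("chloroplast membrane", "is_a", "membrane"),
    ("chloroplast membrane", "part_of", "chloroplast"),
]


def parents(term):
    """Return the direct (relationship, parent) pairs of a term."""
    return [(rel, parent) for child, rel, parent in EDGES if child == term]


def ancestors(term):
    """Return every term reachable by following is_a/part_of edges upwards."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _rel, parent in parents(current):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # "chloroplast membrane" has two direct parents, so it sits on two paths.
    print(parents("chloroplast membrane"))
    print(sorted(ancestors("chloroplast membrane")))
```

Under these assumed edges, anything annotated to "chloroplast membrane" is implicitly also annotated to all of its ancestors, which is why a query on a broad term can retrieve annotations made to its more specific children.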
Anatomy of a GO term http://www.ebi.ac.uk/quickgo Denormalised tree view..
Anatomy of a GO term Graphical view..
GO is a work in progress
• 2001: 6861 terms
• 2004: 16362 terms
• 2007: 23053 terms
723 immune system terms added Nov 2006 (Diehl et al., 2007, PMID:17267433)
A search of GOPubMed revealed that 93 immunological researchers have used GO annotation to aid interpretation of their data sets. http://www.gopubmed.org/
What is GO Annotation? www.ebi.ac.uk/goa
What is GO annotation?
An annotation is a statement that a gene product …
• …has a particular molecular function
• …is involved in a particular biological process
• …is located within a certain cellular component
(each of the above is recorded as a GO term ID)
• …as determined by a particular method (recorded as an evidence code)
• …as described in a particular reference (recorded as a reference)
Example: Smith et al. determined by a direct assay that Abc2 has protein kinase activity, is involved in the process of protein phosphorylation, and is located in the cytoplasm.
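The Abc2 example can be written down as data. The sketch below is only an illustration: the class name, field names and the `sentence` helper are hypothetical, and the evidence code IDA (Inferred from Direct Assay) is assumed from the phrase "determined by a direct assay". GO term names are used instead of GO IDs to keep the example independent of any particular ontology release.

```python
from dataclasses import dataclass

# Phrases for the three ontologies (aspects): Function, Process, Component.
ASPECT_PHRASES = {
    "F": "has the molecular function",
    "P": "is involved in the biological process",
    "C": "is located in the cellular component",
}


@dataclass(frozen=True)
class Annotation:
    """One GO annotation: gene product + GO term + evidence + reference."""
    gene_product: str
    go_term: str        # term name; a real record would hold a GO ID
    aspect: str         # "F", "P" or "C"
    evidence_code: str  # e.g. "IDA" (assumed here for "direct assay")
    reference: str

    def sentence(self) -> str:
        phrase = ASPECT_PHRASES[self.aspect]
        return (f"{self.gene_product} {phrase} '{self.go_term}' "
                f"({self.evidence_code}, {self.reference})")


# The single statement on the slide becomes three separate annotations.
abc2_annotations = [
    Annotation("Abc2", "protein kinase activity", "F", "IDA", "Smith et al."),
    Annotation("Abc2", "protein phosphorylation", "P", "IDA", "Smith et al."),
    Annotation("Abc2", "cytoplasm", "C", "IDA", "Smith et al."),
]

if __name__ == "__main__":
    for annotation in abc2_annotations:
        print(annotation.sentence())
```

Splitting the sentence into one record per GO term mirrors the idea that each term-protein association carries its own evidence and reference.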
How does GOA annotate to the GO?
• Electronic annotation
• Manual annotation
All annotations must:
• be attributed to a source
• indicate what evidence was found to support the GO term-gene/protein association
Manual Annotation
• High-quality, specific gene/gene product associations made using:
• Peer-reviewed papers
• Evidence codes to grade evidence
BUT: it's more time-consuming and requires trained biologists and community input.
GO Evidence Codes • An evidence code indicates how annotation to a particular term is supported in the cited paper
Gene Association File: 15-column, tab-delimited file

Columns 1-9:
DB       DB_Object_ID  DB_Object_Symbol  Qualifier  GOid        DB:Reference   Evidence  With                Aspect
UniProt  O00110        O00110_HUMAN                 GO:0008083  GOA:interpro   IEA       INTERPRO:IPR007087  F
UniProt  O75976        CBPD_HUMAN                   GO:0008472  GOA:spec       IEA       EC:3.4.17.22        F
UniProt  P06730        IF4E_HUMAN                   GO:0005515  PMID:15247416  IPI       UniProt:Q04743      F
UniProt  Q9Y265        RUVB1_HUMAN       NOT        GO:0016877  PMID:10966108  IDA                           F

Columns 10-15 (same row order):
DB_Object_Name                     DB_Object_Synonym  DB_Object_Type  taxon       Date      Assigned by
-                                  -                  protein         taxon:9606  20040426  UniProt
Carboxypeptidase D precursor       IPI00027078        protein         taxon:9606  20060125  UniProt
Eukaryotic translation initiation  IPI00027463        protein         taxon:9606  20060125  IntAct
RuvB-like 1                        IPI00012323        protein         taxon:9606  20030721  Roslin

Downloads:
EBI: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
GO: http://www.geneontology.org/GO.current.annotations.shtml
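Because the gene association file is plain tab-delimited text, it can be read with a few lines of Python. The sketch below assumes the 15-column layout shown on the slide; the column keys, the function names and the rule "anything other than IEA counts as manual" are illustrative choices, not part of the GOA tooling. The two sample lines are rebuilt from the slide rows.

```python
# Column keys follow the slide; the exact spellings are an illustrative choice.
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence", "With", "Aspect", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon", "Date", "Assigned_by",
]


def parse_gaf_line(line):
    """Split one tab-delimited annotation line into a dict keyed by column."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
    return dict(zip(GAF_COLUMNS, fields))


def is_manual(record):
    """Assumption: every evidence code except IEA denotes manual curation."""
    return record["Evidence"] != "IEA"


# Sample lines rebuilt from the slide (empty strings are empty columns).
ELECTRONIC = "\t".join([
    "UniProt", "O75976", "CBPD_HUMAN", "", "GO:0008472", "GOA:spec", "IEA",
    "EC:3.4.17.22", "F", "Carboxypeptidase D precursor", "IPI00027078",
    "protein", "taxon:9606", "20060125", "UniProt",
])
MANUAL = "\t".join([
    "UniProt", "Q9Y265", "RUVB1_HUMAN", "NOT", "GO:0016877", "PMID:10966108",
    "IDA", "", "F", "RuvB-like 1", "IPI00012323",
    "protein", "taxon:9606", "20030721", "Roslin",
])

if __name__ == "__main__":
    for line in (ELECTRONIC, MANUAL):
        record = parse_gaf_line(line)
        print(record["DB_Object_Symbol"], record["GO_ID"],
              record["Evidence"], "manual" if is_manual(record) else "electronic")
```

Note the NOT qualifier on the RUVB1_HUMAN row: a consumer of the file has to check the Qualifier column, because such a row states that the protein does not have the annotated function.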
Manual GO annotation
Workflow steps shown on the slide: Choose immune gene; Find paper; Annotate GO term to protein; Find GO term; Find evidence, choose code; Save to database
Data flow: Oracle RDBMS; GOA association file; GO and EBI ftp sites; UniProtKB, SRS, QuickGO, Gene, Ensembl, MOD, 100 GO tools
Search one function across all species As well as benefiting HTP it would benefit Comparative Proteomics
Immunology GO annotation project http://www.geneontology.org/immunlogy.shtml
Immunology GO annotation Project
Mouse Genome Informatics (MGI), the European Bioinformatics Institute (EBI), University College London (UCL) and Trinity College, Dublin (TCD) plan to tackle this annotation deficit with assistance from the immunological community.
Plan of Action
• Created a dedicated project website at: http://www.geneontology.org/GO.immunology.shtml
• Email list: immunology@genome.stanford.edu
• See: Case Study, Microarray tools
Plan of Action
• A working group of professional GO curators with a background in immunology has been organised:
• Evelyn Camon (EBI/TCD): Bovine, Porcine & Human GO annotator
• Ruth Lovering (UCL): Human GO annotator
• Alexander Diehl (MGI): Mouse GO annotator, Ontology Developer
• Jennifer Deegan (EBI): GO Outreach Coordinator, Ontology Developer
• Software Engineer (EBI): Database Releases and Community Tool Development
Plan of Action
• Create a list of immunologically related genes
• Four curated lists:
• ImmPort: 2974 protein-coding genes (PMID:17238789, University of Texas, Richard Scheuermann)
• IRIS: 1548 genes (PMID:15780753, University of Cambridge, John Trowsdale and Bernard de Bono)
• Immunome: 847 genes (PMID:17434156, University of Tampere, Finland, Ortutay C, Vihinen M.) http://bioinf.uta.fi/Immunome/
• The “MapK” list: 447 genes (Simon Fraser University, Canada, Fiona Roche, Fiona Brinkmann)
• The combined list has 3691 genes.
Plan of Action
• Prioritise the list to focus our annotation effort
• Four additional lists used to rank genes:
• Genes shown by microarray to be upregulated in inflammation (PMID:16136080, Dr. Steve Calvano)
• Genes associated with OMIM records
• Genes on the MGI Top 1000 requested genes
• Most queried UniProtKB records already annotated to the GO “immune response” term
We have also calculated the number of publications per gene per species.
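The combine-then-rank idea can be sketched as follows. This is an illustration only: the slides do not state how the priority score is computed, so the scoring rule used here (one point per ranking list that contains the gene) is an assumption, as are the function names. The gene symbols are real, but their list memberships below are invented toy data.

```python
def combine_lists(*gene_lists):
    """Union of several curated lists; genes shared between lists count once."""
    combined = set()
    for genes in gene_lists:
        combined.update(genes)
    return combined


def priority_scores(candidates, ranking_lists):
    """ASSUMED rule: score = number of ranking lists that contain the gene."""
    return {
        gene: sum(gene in ranking for ranking in ranking_lists)
        for gene in candidates
    }


def rank(scores):
    """Highest score first; ties broken alphabetically for a stable order."""
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Toy data: memberships are INVENTED for illustration.
curated_a = {"TLR4", "IL10", "CD24"}
curated_b = {"TLR4", "TNF"}
inflammation = {"TLR4", "TNF"}
omim = {"TLR4", "IL10"}
most_requested = {"TLR4"}

candidates = combine_lists(curated_a, curated_b)
scores = priority_scores(candidates, [inflammation, omim, most_requested])

if __name__ == "__main__":
    for gene in rank(scores):
        print(gene, scores[gene])
```

The union step also explains why the combined list (3691 genes) is smaller than the sum of the four curated lists: genes that appear in more than one list are counted once.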
TLR4 • IL10 • TNF • CCL2 • IL6 • MAPK14 • IFNB1 • IL2 • IL4 • IL6ST • LCK • NFKB1 • PTPRC • SPP1 • TLR2 • ZAP70 • B2M • C3 • CD14 • CXCL12 • FYN • HLA-DQB1 • IFNG • IKBKG • IL12B • IL1R1 • IL2RA • STAT3 • ADA • BCL2 • C5 • CASP3 • CASP8 • CCR7 • CD28 • CD4 • CD8A • CHUK • CLU • CSF1R • CTLA4 • CXCR3 • CXCR4 • DPP4 • FAS • FCGR2B • HLA-DRA • ICAM1 • IKBKAP • IL1B • IL2RG • IL7 • ITGAM • ITGB2 • MYD88 • NFKBIA • RELA • TGFB1 • TNFRSF1A • AIRE • APP • C1QBP • C4A • C4B • CASP1 • CASP10 • CASP2 • CCL5 • CCR2 • CCRL2 • CD1D • CD24 Top ranking 1300 genes
How you can contribute
Community GO Annotation wiki for Immunology: http://wiki.geneontology.org
• Gene pages for 1300 of the top genes in our prioritised list are found in the wiki.
• The pages contain basic information about the gene name and synonyms, and link to current GO annotation for a number of species.
• Users may edit the pages to note additional biological facts to guide our annotation efforts.
Community GO Annotation wiki http://wiki.geneontology.org Immune Related Gene List Immune System Processes Example Gene Page
Community GO Annotation wiki http://wiki.geneontology.org Alphabetical list, ‘I’ View by Priority Score Suggest new genes
Community GO Annotation wiki http://wiki.geneontology.org Click on hyperlink to gene page
Community GO Annotation wiki http://wiki.geneontology.org Synonyms View Gene Details Find Orthologs
Inferred from Electronic Annotation (IEA) Community GO Annotation wiki http://wiki.geneontology.org Existing GO Annotations See Comparative Graph
Community GO Annotation wiki http://wiki.geneontology.org Literature Search Tools IL1A and Mouse = 202 publications Summary Experimental evidence Which Paper? PMID? Email address
Edit the wiki http://wiki.geneontology.org Existing GO Annotations
Example: Manually Curated Gene Links to Publication Experimental Evidence Code
GOA data dissemination www.ebi.ac.uk/goa
Search GO term/accession GOA home page Click on Downloads View statistics View proteome sets About GOA file format Cross reference file Download mappings ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
GO home page EBI file downloads http://www.geneontology.org/GO.current.annotations.shtml
AmiGO Browser Query by manual evidence code Query by species Query by data source http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
UniProtKB Ensembl Entrez Gene
In addition to:
• Array products and data analysis: Affymetrix, Spotfire, Almac
• Numerous third-party tools (http://www.geneontology.org/GO.tools.shtml)
• All GO Consortium Model Organism Databases integrate GOA data to ensure a comprehensive set of annotations