Update
Susan Bridges, Fiona McCarthy, Shane Burgess
NRI 2006-04846
Some of what we’ve been doing:
1. Confirmation of predicted/hypothetical proteins in chicken.
2. Something of more interest to almost everyone in here, for analyzing your data.
Educate researchers who need to use GO.
• University of Delaware, 12-13 November 2007.
• …… currently working with researchers from the Universities of Delaware and Maryland to provide the GO annotations needed to facilitate publication of array data.
• First residential workshop at MSU, 20-22 May 2008.
Avian Genome Conference: 18-20 May 2008
GO Annotation Jamboree: 21-22 May 2008
agbase@cse.msstate.edu
“Hypothetical” and “predicted” proteins
• Samples: naive and activated purified CD4+ T cells; transformed CD4+ T cells; spleen; brain tissues; bursal B and stromal cells; muscle; and serum.
• Database of all predicted proteins from chicken build 2.1; proteins identified using DFF-2D LC MS2 and our computational pipeline.
• Experimentally confirmed 7,809 chicken predicted proteins; 52% were expressed in more than one tissue.
• 6,027 (77%) of these proteins mapped to human and mouse orthologs, and we assigned standardized nomenclature to 5,326 (64%).
• 8,213 GO associations were made to 21% of the identified chicken proteins, using the ISS evidence code to transfer function between human-chicken and human-mouse orthologs. This increased the current chicken GO annotations by 8% and doubled the number of manually curated chicken annotations.
• The data are in the PRIDE and NCBI databases and are being used at NCBI to promote XP (computational model) accessions to NP (confirmed product) accessions, i.e. the words “hypothetical” and “predicted” are removed.
• We also add experimentally derived cell component GO annotations.
[Figure: Tissue distribution of expressed ‘predicted’ proteins. Bar chart: number of proteins (0 to 6,000) by tissue type (T cells, brain, serum, spleen, UA01, B cells, muscle, stroma), split into tissue-specific proteins and proteins identified in other tissues. Pie chart: proteins found in one tissue through all eight tissues; segments, largest to smallest: 48% (3,779), 26% (2,020), 14% (1,073), 7% (561), 4% (313), 1% (61), 0% (2), 0% (0).]
[Figure: Venn diagram of chicken proteins with 1:1 human/mouse orthologs. No human or mouse orthologs: 1,784. The mouse ortholog and human ortholog circles have region counts of 236, 5,685 and 106.]
[Figure: Cumulative external visits to AgBase, by month, July 2005 to December 2007. Y-axis: 0 to 10,000 visits.]
Summary of GO annotations for last 12 months
11,716 GO annotations for chicken & cow:
• 214 cow gene products GO annotated (1,521 GO annotations)
• 1,762 chicken gene products GO annotated (10,194 GO annotations)
• in addition, orthology with human and mouse genes used to GO annotate 7,809 computationally ‘predicted’ chicken proteins (8,213 GO annotations)
[Figure: Database distribution of AgBase GO Annotations. Two charts, Chicken Dec '07 and Cow Dec '07, each split between the AgBase Community file and the GO Consortium file.]
Genomic Annotation
• Structural annotation, including Sequence Ontology
• Functional annotation using Gene Ontology
• Nomenclature (species’ genome nomenclature committees)
• Other annotations using other bio-ontologies, e.g. Anatomy Ontology
Quality improvement of annotations: pre-annotation vs. re-annotation
GO annotation of arrays
1. Structural mapping: array IDs to gene product IDs (‘known’ genes from public databases; ‘predicted’ genes from genome sequencing).
2. Is functional literature available?
   YES: GO annotation of literature.
   NO: go to step 3.
3. Are strict mammalian orthologs available?
   YES: GO annotation from orthologs (ISO).
   NO: electronic GO annotation using InterPro data (IEA).
4. Collate GO annotations.
5. Submit to EBI-GOA and GOC; link to array IDs (updateable).
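The decision flow on this slide can be sketched in a few lines of Python. This is only an illustration of the routing logic, not AgBase's actual pipeline code: the `GeneProduct` class, its two boolean fields, and the function names are hypothetical, and in practice the answers to both questions come from curators and orthology resources rather than from stored flags.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GeneProduct:
    """Hypothetical record for a gene product that an array ID maps to."""
    product_id: str
    has_functional_literature: bool
    has_strict_mammalian_ortholog: bool


def choose_annotation_route(gp: GeneProduct) -> str:
    """Return the annotation route suggested by the slide's two questions."""
    if gp.has_functional_literature:
        return "literature"   # manual GO annotation of literature
    if gp.has_strict_mammalian_ortholog:
        return "ISO"          # GO annotation from orthologs
    return "IEA"              # electronic annotation using InterPro data


def route_array(array_to_product: Dict[str, str],
                products: Dict[str, GeneProduct]) -> Dict[str, Optional[str]]:
    """Map each array ID to a route; None means no gene product was mapped."""
    routes: Dict[str, Optional[str]] = {}
    for array_id, product_id in array_to_product.items():
        gp = products.get(product_id)
        routes[array_id] = choose_annotation_route(gp) if gp else None
    return routes
```

Keeping the array ID to gene product mapping separate from the routing step mirrors the "updateable" link on the slide: annotations can be refreshed per gene product without redoing the structural mapping.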
AgBase: annotating arrays
1. Del-Mar 14K Chicken Integrated Systems microarray (GPL1731)
• 14,053 chicken genes represented
• 9,587 contigs GO annotated (CC: 3,514; MF: 6,640; BP: 4,623)
• 3,101 singletons GO annotated (CC: 487; MF: 881; BP: 646)
• many singletons map to chicken ESTs with no associated GO
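Per-ontology tallies like the CC/MF/BP figures above can be produced with a short script. The sketch below assumes the counts are distinct array IDs per ontology (the slide does not say whether they are IDs or annotations), and the input format, a list of `(array_id, go_id, aspect)` tuples, is hypothetical. The single-letter aspect codes C, F and P are the ones used in GO annotation files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# GO annotation files code the three ontologies as single letters.
ASPECT_NAMES = {"C": "CC", "F": "MF", "P": "BP"}


def count_annotated_ids(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, int]:
    """Count distinct array IDs with at least one annotation per ontology.

    rows: (array_id, go_id, aspect) tuples; this layout is an assumption
    made for the example, not a real AgBase file format.
    """
    ids_by_aspect: Dict[str, Set[str]] = defaultdict(set)
    for array_id, _go_id, aspect in rows:
        ids_by_aspect[ASPECT_NAMES[aspect]].add(array_id)
    return {name: len(ids) for name, ids in ids_by_aspect.items()}


# Illustrative rows with made-up array IDs.
example_rows = [
    ("contig_1", "GO:0005634", "C"),  # nucleus
    ("contig_1", "GO:0005737", "C"),  # cytoplasm (same ID, counted once)
    ("contig_1", "GO:0003677", "F"),  # DNA binding
    ("contig_2", "GO:0006955", "P"),  # immune response
]
counts = count_annotated_ids(example_rows)
```

Because one array ID can carry annotations in several ontologies, the three counts need not sum to the number of annotated IDs.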
[Figure 1A: Biological Process associated with Del-Mar 14K array. Categories: metabolic process, transport, cell communication, development, immune response, cell death, cell differentiation, response to stress, sensory perception, cell motility, regulation of biological process, cellular organization and biogenesis, behavior, response to chemical stimulus, process unknown.]
[Figure: Relative amount of GO BP associated with Del-Mar 14K array compared to total chicken GO. Y-axis: Array GO/total chicken GO (-6.0 to 6.0). X-axis: GO Biological Processes (cellular organization and biogenesis, response to stimulus, cell communication, metabolic process, behavior, transport, secretion, cell death, cell motility, development, process unknown, cell differentiation, immune response, response to stress, sensory perception, response to chemical stimulus, regulation of biological process).]
AgBase: annotating arrays
2. TAMU Agilent 44K chicken array
• approx. 44,000 chicken genes represented
• added GO annotation for 8,731 chicken gene products
• many of the array IDs with no associated GO annotation map to chicken EST sequences
AgBase: annotating arrays
3. FHCRC Chicken 13K v2.0 (GPL1836)
• 13,007 chicken genes represented
• 2,491 array IDs mapped to chicken gene products & GO annotated
• 628 mapped to chicken gene products with no GO
• approx. 2,000 array IDs mapped to human or mouse gene products with GO annotation
GO Annotation Quality Score: “GAQ”
GAQ inputs: no. of annotations; DAG depth; GO evidence code
• calculate overall GAQ score for any dataset (e.g. an array)
• calculate GAQ for subsets (e.g. biological processes studied using arrays)
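The slide names the three inputs to GAQ but not how they are combined. As a minimal sketch, one way to combine them is to multiply each annotation's DAG depth by a rank for its evidence code and sum over all annotations in the dataset. Both the combination rule and the rank values below are assumptions made for illustration, not the published scoring scheme, and the depths in the example are made up.

```python
from typing import Iterable, NamedTuple

# HYPOTHETICAL evidence-code ranks: experimental codes rank above
# sequence-similarity and electronic codes. Not the authors' values.
EVIDENCE_RANK = {"IEA": 1, "ISS": 2, "TAS": 4, "IDA": 5}


class Annotation(NamedTuple):
    gene_product: str
    go_term: str
    dag_depth: int       # depth of the GO term in the ontology DAG
    evidence_code: str


def gaq_score(annotations: Iterable[Annotation]) -> int:
    """Sum of depth x evidence rank over all annotations (assumed rule)."""
    return sum(a.dag_depth * EVIDENCE_RANK[a.evidence_code] for a in annotations)


def mean_gaq(annotations: Iterable[Annotation]) -> float:
    """Total score divided by the number of distinct gene products."""
    annotations = list(annotations)
    products = {a.gene_product for a in annotations}
    return gaq_score(annotations) / len(products) if products else 0.0


# Illustrative dataset; depths are invented for the example.
dataset = [
    Annotation("G1", "GO:0006955", 4, "IDA"),
    Annotation("G1", "GO:0008150", 1, "IEA"),
    Annotation("G2", "GO:0006915", 3, "ISS"),
]
total = gaq_score(dataset)       # 4*5 + 1*1 + 3*2 = 27
per_product = mean_gaq(dataset)  # 27 / 2 = 13.5
```

Scoring a subset, such as only the biological process annotations on an array, amounts to filtering the list before calling `gaq_score`.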
“Gene Ontology” “Biological Process”
Your Favorite Gene: low GAQ score
Your NEW Favorite Gene: high GAQ score
Evidence codes:
IEA: inferred from electronic annotation
ISS: inferred from sequence similarity
IMP: inferred from mutant phenotype
IGI: inferred from genetic interaction
IPI: inferred from physical interaction
IDA: inferred from direct assay
IEP: inferred from expression pattern
TAS: traceable author statement
NAS: non-traceable author statement
ND: no biological data available
RCA: inferred from reviewed computational analysis
IC: inferred by curator
Quantification of re-annotation
Metrics:
• Granularity: # previous annotations; # re-annotations
• Specificity: # chicken annotations; # human/mouse annotations
• Quality: Gene Annotation Quality (GAQ) score
[Figure: Number of annotations (0 to 4,500), pre-annotation vs. re-annotation, in two panels, GRANULARITY and SPECIFICITY. X-axis (annotation type): whole array, chicken, human/mouse. Increases marked on the chart: 300%, 700%, 50%.]
• 13% of previous annotations to other species were corrected to chicken specific annotations
Bart van den Berg, CVM MSU/ Sue Lamont and Huaijun Zhu
GAQ score summary

                              Pre-annotation   Re-annotation   Fold difference
Total # proteins (Breadth)               886           4,240               4.8
Depth                                 87,250         231,184               2.7
Total GAQ score                      207,869         579,599               2.8
Confidence score total                39,355         108,537               2.8
Quality improvement of annotations: pre-annotation vs. re-annotation
[Figure: GO biological process annotations. Y-axis: relative difference, microarray GO / total chicken GO (-6 to 6); plotted values range from 5.12 down to -4.88. X-axis (GO term): cell death; cell motility; regulation of biological process; macromolecule metabolic process; transport; cell differentiation; biological_process; response to stimulus; multicellular organismal development; catabolic process; nucleobase, nucleoside, nucleotide and nucleic acid metabolic process; metabolic process; cell communication.]
Modeling using the GO
• Gene Ontology: derived physiology (= Cellular Component + Biological Process + Molecular Function)
• Network modeling: implied physiology (interactions)
• Outcome: functional understanding
Hypothesis-driven GO-based data interrogation
Buza, J. J. and S. C. Burgess. Modeling the proteome of a Marek's disease transformed cell line: a natural animal model for CD30 over-expressing lymphomas. Proteomics, 2007. 7:1316-26.