
Rice Proteins: Data Acquisition, Curation, Resources

Rice Protein and Ontology Database. Rice proteins: data acquisition, curation, resources. Development and integration of controlled vocabulary: Gene Ontology, Trait Ontology, Plant Ontology. www.gramene.org.

tanner

Presentation Transcript


  1. Rice Protein and Ontology Database
     • Rice Proteins
        • Data acquisition
        • Curation
        • Resources
     • Development and integration of controlled vocabulary
        • Gene Ontology
        • Trait Ontology
        • Plant Ontology
     www.gramene.org

  2. Objectives
     • Annotation of rice proteins using Gene Ontology (GO) concepts of Molecular Function, Biological Process and Cellular Localization
        • 4,000 rice genes annotated during project
        • Leading to presentation of Rice Protein Database (RPD) (http://www.gramene.org/perl/protein_search)
     • Ontology
        • Contribute GO terms for monocot plants
        • Develop and curate vocabulary for
           • plant anatomy and developmental stages (PO, Plant Ontology)
           • phenotypes or traits (TO, Trait Ontology)

  3. Gene mining using the controlled vocabulary
     [Diagram. Recoverable labels, grouped by vocabulary:]
     • Gene, Transcript, Protein
     • Traits (TO): agronomic, development, morphology
     • Anatomy or histology, localization (PO): organ (root, shoot, seed), tissue (meristematic, vascular, ground), cell (cell type), each with sub-components
     • GO: sub-cellular (cell components), molecular function (enzyme, others), biological process (reactions, pathways, other roles)
     • Internal CVO: molecule (organic, inorganic); fats/carbohydrates/proteins/mutagens/others

  4. Rice Protein database (RPD)
     [Diagram. Recoverable labels: sequence entry from GenBank, SWISSPROT, EMBL/DDBJ and other databases; link back to sequence DBXrefs; germplasm bank; Gramene modules: BLAT, Plant Ontology (anatomy & growth stages), Gene Ontology (molecular function, biological process, cellular localization), EnsEMBL Genome Browser (features on peptide map).]
     IEA and ISS codes
     • Electronic curation information
     • Sequence similarity
     • Clustal / BLAST
     • Traceable author statement
     • Predictions/identification
     • Gene Ontology mapping
     • Gramene & Interpro (EBI)
     • Pfam
     • PROSITE
     • PROTOMAP
     • Transmembrane helices
     • Cellular localization
     • Predictions based on HMM
     • Physicochemical properties
     • ProDom
     • 3D-structural alignments
     • DBXref / References
     Non IEA code
     • Experimental evidence
     • Direct enzyme assay
     • Expression
     • Mutant/phenotype
     • Physical interaction
     • Complementation
     • Genetic interaction
     • Localization
     • Electronic-prediction
     • Citation
     • Sequence similarity
     Non IEA code: published report (PubMed, BIOSIS, others)

  5. GenBank/SWISSPROT ENTRY: Get information on Protein page
     • Name(s): Shows all the different names by which the molecule is represented in various databases and in the scientific literature.
     • E.C. Number(s): Shows the designated Enzyme Commission (E.C.) number. The EC numbers link to GenomeNet, Japan, from where further links to biochemical pathways and ligands are accessible.
     • Gene name(s): Lists all the gene names by which the molecule is called, as designated by the Commission on Plant Gene Nomenclature. If none is available, consider using a systematic name given to the ORF/gene.
     Courtesy KEGG database

  6. GenBank/SWISSPROT ENTRY: Get information on Protein page
     • Accession number: The Swissprot accession number, similar to the "AC" field of a SWALL (EMBL) record and the "ACCESSION" field of a GenBank record for the respective protein entry. It links the protein entry to other databases, namely the GenBank protein database, SWALL from EMBL, and SWISS-PROT.
     • Organism: Represents the taxonomic information on the organism from which the protein sequence was derived.
     • Species: Shows the species of the genus Oryza (presently represents 23 of 25 species).
     • Subspecies: The subspecies, indica or japonica, of the rice species Oryza sativa.
     • Cultivar: The variety/cultivar name from which the sequence was derived; it will link to a germplasm bank (GRIN/IRIS) for further information.

  7. GenBank/SWISSPROT ENTRY: Sequence
     • Sequence: Use it for performing analyses to identify features such as Pfam / Prosite domains, and to generate predictions for trans-membrane helices, coiled-coil regions and cellular component localization.
     • Alignment: Perform a "BLAT" alignment of the rice protein sequences from SWISSPROT against the translated peptides from the Ensembl rice genome sequence database at Gramene. The cut-off score used is 99% identity. The curator should validate the result.
     • Validation: Based on available CDS features and gene indices/ESTs.
     • Protein page, map with features: Add the features to the protein structure, a map showing protein domains (e.g. Pfam) and protein features (trans-membrane, low-complexity and coil regions), on the Ensembl peptide report page.
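The 99% identity cut-off on this slide can be illustrated with a minimal Python sketch. It is not the Gramene pipeline: the `Hit` record, the identifiers and the definition of identity as identical residues divided by aligned length are all assumptions made for illustration, and real BLAT output would need its own parser.

```python
from dataclasses import dataclass

IDENTITY_CUTOFF = 99.0  # percent identity, the cut-off stated on the slide


@dataclass(frozen=True)
class Hit:
    """One protein-to-translation alignment (hypothetical record layout)."""
    protein_id: str       # SWISS-PROT/TrEMBL entry (placeholder values below)
    translation_id: str   # Ensembl translated peptide (placeholder values below)
    matches: int          # identical aligned residues
    aligned_length: int   # length of the aligned region

    @property
    def percent_identity(self) -> float:
        # Assumption: identity = identical residues / aligned length.
        return 100.0 * self.matches / self.aligned_length


def candidate_correspondences(hits):
    """Keep hits at or above the cut-off; a curator still has to validate them."""
    return [
        h for h in hits
        if h.aligned_length > 0 and h.percent_identity >= IDENTITY_CUTOFF
    ]


# Made-up alignments, not data from the Rice Protein Database.
hits = [
    Hit("PROTEIN_A", "TRANSLATION_1", 260, 260),  # 100% identity, kept
    Hit("PROTEIN_A", "TRANSLATION_2", 250, 260),  # about 96%, dropped
    Hit("PROTEIN_B", "TRANSLATION_3", 398, 400),  # 99.5%, kept
]
kept = candidate_correspondences(hits)
for h in kept:
    print(h.protein_id, h.translation_id, f"{h.percent_identity:.1f}")
```

A protein may keep more than one translation after filtering, which is consistent with the one-to-many correspondences reported in the statistics slides.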

  8. Various tools used by Gramene in annotation of rice gene products
     • Prosite members in RPD
     • Pfam members in RPD
     • ftp://www.gramene.org/pub/gramene/protein/feature/Oryza_TMHMM_result.txt

  9. Rice Functional Information
     After identifying a number of features, the curator proceeds to annotate the gene product(s) in the Rice Protein Database:
     • Annotate rice gene function using the Gene Ontology (GO) system
     • Provide literature citations as evidence for each assertion and classify them using the evidence codes
     Gene Ontology is a controlled vocabulary that defines the following concepts for a gene product:
     • Molecular function: GO term(s) defining the molecular function of the gene product
     • Biological process: GO term(s) defining the biological process
     • Cellular component: GO term(s) identifying the localization of the protein in a cell

  10. Gene Ontology (GO) Associations
      EVIDENCE CODES APPLIED IN RICE PROTEIN DATABASE
      • IDA, inferred from direct assay: enzyme assays / in vitro reconstitution / immunofluorescence / cell fractionation / binding assay
      • IEA, inferred from electronic annotation: feature search / Interpro / Pfam / Prosite / annotations from database records
      • IEP, inferred from expression pattern: Northerns / microarray data / western blots
      • IMP, inferred from mutant phenotype: gene mutation / deletion or disruption / over-expression / ectopic expression / anti-sense experiments / RNAi experiments / specific protein inhibitors
      • NR, not recorded: very old annotation
      • IGI, inferred from genetic interaction: suppressor screens / synthetic lethal / functional complementation / rescue experiments
      • IPI, inferred from physical interaction: 2-hybrid interactions / 3-hybrid interactions / co-purification / co-immunoprecipitation / affinity interaction
      • ISS, inferred from sequence or structural similarity: sequence similarity / recognized domains / structural similarity / Southern blotting
      • NAS, non-traceable author statement: no citation / non-traceable by curator
      • TAS, traceable author statement: review article / text book / dictionary / website / database
      A complete list is available at http://www.gramene.org/plant_ontology/evidence_codes.html
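The evidence codes above lend themselves to a simple lookup table. The Python sketch below is illustrative only: the codes and their meanings come from the slide, while the function name, the tuple layout and the example associations are hypothetical. It splits associations into the IEA and non-IEA groups that the statistics slides report separately.

```python
from collections import Counter

# Codes and meanings as listed on the slide.
EVIDENCE_CODES = {
    "IDA": "inferred from direct assay",
    "IEA": "inferred from electronic annotation",
    "IEP": "inferred from expression pattern",
    "IMP": "inferred from mutant phenotype",
    "NR": "not recorded",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence or structural similarity",
    "NAS": "non-traceable author statement",
    "TAS": "traceable author statement",
}


def curation_class(code: str) -> str:
    """Return 'IEA' for electronic annotation, 'non-IEA' for any other listed code."""
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code!r}")
    return "IEA" if code == "IEA" else "non-IEA"


# Hypothetical associations: (protein, GO term placeholder, evidence code).
associations = [
    ("PROTEIN_A", "GO-term-1", "IEA"),
    ("PROTEIN_A", "GO-term-2", "TAS"),
    ("PROTEIN_B", "GO-term-3", "IDA"),
]
counts = Counter(curation_class(code) for _, _, code in associations)
print(dict(counts))
```

Rejecting unknown codes outright, rather than silently counting them as non-IEA, keeps a typo in a curated record from inflating the manually curated total.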

  11. Gene Ontology (GO) Associations
      [Screenshots: Protein page; Gramene Ontology Database]
      The association of protein 1433_ORYSA with the GO term

  12. Gene Ontology (GO) Associations
      [Screenshots: Protein page; Gramene Literature Database]
      The association of protein 1433_ORYSA with the literature citation (EVIDENCE for molecular function)

  13. Gene Ontology (GO) Associations
      [Screenshot: Protein page]
      The association of protein 1433_ORYSA with the literature citation and EVIDENCE CODES

  14. Rice Protein Database (RPD) statistics-1
      • Total number of proteins: 8985
         • Number of proteins from SWISSPROT: 397
         • Number of proteins from TrEMBL: 8588
      GO mappings are based on Interpro-EBI and Gramene curation
      [Figure panels: Molecular function, Biological process]
      • Total number of evidences: 21170
         • Total number of IEA evidences: 20593
         • Total number of non-IEA evidences: 577
      • Total number of references as evidences: 74
      • Total number of associations: 9866 (3321 gene products associated with 781 GO terms)
         • Biological Process: 242 terms, 2881 associations
         • Molecular Function: 449 terms, 5599 associations
         • Cellular Component: 90 terms, 1386 associations
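The totals on this slide are internally consistent, as a short arithmetic check shows. The numbers are copied from the slide; the variable names are illustrative, and the IEA share is a derived figure that the slide does not state.

```python
# (GO terms, associations) per aspect, as reported on the slide.
per_aspect = {
    "Biological Process": (242, 2881),
    "Molecular Function": (449, 5599),
    "Cellular Component": (90, 1386),
}
total_terms = sum(terms for terms, _ in per_aspect.values())
total_associations = sum(assoc for _, assoc in per_aspect.values())

swissprot, trembl = 397, 8588
iea, non_iea = 20593, 577
total_evidences = iea + non_iea

# Derived figure, not stated on the slide: share of evidences that are electronic.
iea_share = round(100 * iea / total_evidences, 1)

assert total_terms == 781
assert total_associations == 9866
assert swissprot + trembl == 8985
assert total_evidences == 21170
print(f"IEA share of evidences: {iea_share}%")
```

The check makes the balance explicit: about 97% of the evidences are electronic annotations, and the remaining 577 come from other evidence codes.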

  15. Rice Protein Database (RPD) statistics-2
      • Total number of proteins in RPD: 8985
         • Number of proteins from SWISS-PROT: 397
         • Number of proteins from TrEMBL: 8588
      • Total number of correspondences between proteins and translations: 7960 (6912 proteins correspond to 7957 translations)
         • Proteins with only one corresponding translation: 5911
         • Proteins with two corresponding translations: 959
         • Proteins with three corresponding translations: 37
         • Proteins with four corresponding translations: 5
      • Gene products associated with 781 GO terms: 3321 (refer to previous slide)
      • Number of Pfam entries: 874
      • Total number of proteins that have mappings to Pfam: 3663
      • Number of Prosite entries: 556
      • Total number of proteins that have mappings to Prosite: 3201
      • Total number of proteins that have mappings to trans-membrane features: 1583
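The relationship between the 6912 proteins and the 7960 correspondences follows from the per-protein breakdown on this slide. A small worked check in Python, using only the slide's numbers (the variable names are illustrative):

```python
# Number of proteins having exactly k corresponding translations (from the slide).
proteins_by_translation_count = {1: 5911, 2: 959, 3: 37, 4: 5}

# Each protein is counted once, whatever its number of translations.
proteins_with_translation = sum(proteins_by_translation_count.values())

# Each protein contributes one correspondence per translation it maps to.
correspondences = sum(k * n for k, n in proteins_by_translation_count.items())

assert proteins_with_translation == 6912
assert correspondences == 7960
print(proteins_with_translation, correspondences)
```

The slide also reports 7957 translations against 7960 correspondences. One reading, which the slide does not confirm, is that a few translations correspond to more than one protein.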

  16. Trait Ontology (TO) to describe mutants/phenotypes in rice

  17. PLANT ONTOLOGY resources will be available soon: www.plantontology.org

  18. Future plans
      • Continue annotation of rice proteins.
      • Identify the resources and tools needed to improve the annotation of rice proteins, using HMMs, structure predictions and other tools.
      • Develop tools to simplify gene mining with Gramene and other databases by building combination search tools that use controlled vocabulary and feature tables.
      • Start building a resource for creating a protein interaction map for the complete rice genome, based on association in a biochemical pathway, assembly in a functional complex / interacting partners, proximity on the genome and common regulation mechanisms (a possible collaboration).
      • Contribute / share the controlled vocabulary for monocots with other databases.
      • Develop the necessary tools and host the resource pages for the Plant Ontology Consortium.
      • Collaborate with the Gene Ontology Consortium on various aspects of ontology development and curation.
