
Ph.D. Candidate Steve Johnson Committee Members

Computer Science Ph.D. Seminar: Gene Ontology (GO) Based Search for Protein Structure Similarity Clustering Metrics. Ph.D. Candidate: Steve Johnson. Committee Members: Dr. Debasis Mitra, Dr. Philip Bernhard, Dr. Walter Bond, Dr. Julia Grimwade. Date: September 12, 2011.




  1. Computer Science Ph.D. Seminar: Gene Ontology (GO) Based Search for Protein Structure Similarity Clustering Metrics • Ph.D. Candidate: Steve Johnson • Committee Members: Dr. Debasis Mitra, Dr. Philip Bernhard, Dr. Walter Bond, Dr. Julia Grimwade • Date: September 12, 2011

  2. Gene Ontology (GO) Based Search for Protein Structure Similarity Clustering Metrics • GO Background • GO Subontologies • GO Annotations • GO Relationships • GO Tools • GO Research • Research Direction

  3. Gene Ontology Background The Gene Ontology (GO), http://www.geneontology.org/, provides a consistent vocabulary for describing the attributes of proteins, specifically molecular function, biological process and the cellular component where the protein is found.

  4. Gene Ontology Background: GO Consortium • Berkeley Bioinformatics Open Source Project (BBOP) • British Heart Foundation • EcoliWiki • FlyBase • GeneDB • UniProtKB-GOA • Univ. of Maryland - IGS • Mouse Genome Informatics (MGI) • Rat Genome Database (RGD) • Saccharomyces Genome Database (SGD) • The Arabidopsis Information Resource (TAIR) • WormBase

  5. Gene Ontology Background: GO Consortium • GO terms • A set of integer IDs (used to identify GO terms) is assigned to each member of the GO Consortium • GO Consortium members • provide annotations • attend all meetings • receive funding for supported databases

  6. Gene Ontology Project Facts • Started in 1998 • Primary Goals • Structured Vocabulary • Use to annotate genes and gene products • 3 Model Organisms • FlyBase (Drosophila) • Saccharomyces Genome Database (SGD) • Mouse Genome Informatics (MGI) project

  7. Gene Subontologies Three Ontology Structure • Biological Process • Molecular Function • Cellular component

  8. Gene Subontologies: Biological Process • A biological process is a series of steps, that is, an ordered sequence of molecular functions. Examples of biological processes include the following. • Metabolic Process • Photosynthetic Process • Biosynthetic Process

  9. Gene Subontologies: Molecular Function • Molecular function describes the purpose of the gene product and, unlike biological process, refers to a single function. Examples of molecular functions include the following. • Binding Activity • Transport Activity • Receptor Activity

  10. Gene Subontologies: Cellular Component • Cellular component identifies the location of the gene product within the structure of the cell. Examples of cellular components include the following. • Organelle Part • Cell Body Membrane • Apical Complex

  11. GO Annotations: GO Annotation Terms • Example • Term: Glucose Biosynthetic Process • ID: GO:0006094 • Definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
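The identifier on this slide follows the usual GO accession layout: the prefix "GO:" followed by seven digits. A minimal sketch of validating and converting such identifiers is shown here; the function names `parse_go_id` and `format_go_id` are illustrative and not part of any GO tool.

```python
import re

# GO accessions look like "GO:0006094": the prefix "GO:" plus seven digits.
GO_ID_PATTERN = re.compile(r"^GO:(\d{7})$")


def parse_go_id(text):
    """Return the integer part of a GO accession, or raise ValueError."""
    match = GO_ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a GO identifier: {text!r}")
    return int(match.group(1))


def format_go_id(number):
    """Format an integer as a zero-padded GO accession."""
    return f"GO:{number:07d}"


if __name__ == "__main__":
    number = parse_go_id("GO:0006094")
    print(number, format_go_id(number))
```

Keeping the integer form around can be convenient for database keys, while the zero-padded string is what annotation files carry.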

  12. GO Annotations: GO Annotation Term Statistics (as of September 2009) • Molecular Function: 8,637 terms • Biological Process: 17,069 terms • Cellular Component: 2,432 terms • Total: 28,138 terms

  13. GO Annotations: GO Annotation Methods • Electronic Annotation • Manual Annotation • All annotations include • Source • Supporting evidence

  14. GO Annotations: GO Annotation Methods • Manual Annotation • Primary source is published literature • Curators perform sequence similarity analyses to transfer annotations between highly similar gene products (BLAST, protein domain analysis)

  15. GO Annotations: GO Annotation Methods • Electronic Annotation • Database entries • Manual mapping of GO terms to concepts external to GO ("translation tables") • Proteins are then electronically annotated with the relevant GO term(s) • Automatic sequence similarity analyses to transfer annotations between highly similar gene products

  16. GO Annotations: GO Annotation Example • Cellular Component: Mitochondria, GO:0005739 • Biological Process: Ethanol Catabolic Process, GO:0006068 • Molecular Function: Oxidoreductase Activity • Protein: 1A71, Liver Alcohol Dehydrogenase

  17. GO Annotations: Sample Annotations • GO Consortium members provide gene annotation data based on information obtained from research-quality articles. • The information extracted from the articles is described as "Annotation Sets" • Sample Annotation Sets

  18. GO Annotations: File Format • The Gene Ontology website represents the annotation data in graphical format. It is part of the Open Biomedical Ontologies (OBO), http://obo.sourceforge.net/. • Current Species/Database Annotations • Annotation File Format (GAF 2.0)
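A GAF 2.0 annotation file is tab-separated text with 17 columns per record, and lines starting with "!" are comments. The sketch below reads one record into a dictionary. The column keys are shorthand chosen for this example, and the sample record (database name, object ID, reference, date) is invented for illustration rather than taken from a real annotation set.

```python
# Shorthand keys for the 17 GAF 2.0 columns, in file order.
GAF2_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]

# Invented example record; empty strings stand for optional empty columns.
SAMPLE_LINE = "\t".join([
    "ExampleDB", "X00001", "adh1", "", "GO:0006068",
    "PMID:0000000", "IDA", "", "P",
    "alcohol dehydrogenase (example)", "", "protein", "taxon:0000",
    "20110912", "ExampleDB", "", "",
])


def parse_gaf_line(line):
    """Return a dict for one GAF 2.0 record, or None for comments/blank lines."""
    if line.startswith("!") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF2_COLUMNS):
        raise ValueError(f"expected {len(GAF2_COLUMNS)} columns, got {len(fields)}")
    return dict(zip(GAF2_COLUMNS, fields))


if __name__ == "__main__":
    record = parse_gaf_line(SAMPLE_LINE)
    print(record["go_id"], record["evidence_code"], record["aspect"])
```

The aspect column holds a single letter: P for biological process, F for molecular function, C for cellular component.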

  19. GO Annotations: Evidence Code Categories • The annotation file includes evidence information, which serves as a source to validate the annotation. • Experimental Evidence Codes • Computational Analysis Evidence Codes • Author Statement Evidence Codes • Curator Statement Evidence Codes
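The four categories on this slide can be pictured as a lookup from evidence code to category. The sketch below lists commonly used codes for each category; it is not guaranteed to be exhaustive. IEA, the code used for electronic annotation, is kept in a separate group because it falls outside the four categories listed. The function name `evidence_category` is illustrative.

```python
# Commonly used GO evidence codes, grouped by category (not exhaustive).
EVIDENCE_CATEGORIES = {
    "Experimental": ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP"],
    "Computational Analysis": ["ISS", "ISO", "ISA", "ISM", "IGC", "RCA"],
    "Author Statement": ["TAS", "NAS"],
    "Curator Statement": ["IC", "ND"],
    # Not one of the four slide categories: assigned without curator review.
    "Electronic Annotation": ["IEA"],
}

# Invert the grouping so a single code can be looked up directly.
CODE_TO_CATEGORY = {
    code: category
    for category, codes in EVIDENCE_CATEGORIES.items()
    for code in codes
}


def evidence_category(code):
    """Return the category for an evidence code, or 'Unknown'."""
    return CODE_TO_CATEGORY.get(code.strip().upper(), "Unknown")


if __name__ == "__main__":
    for code in ("IDA", "ISS", "TAS", "IC", "IEA"):
        print(code, "->", evidence_category(code))
```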

  20. GO Annotations: GO Slims • GO Slims are subsets of GO annotation information that provide a broader classification of terms. • GO Slim Application Example

  21. GO Relationships • A graph structure is used to establish relationships among the terms for molecular function, biological process, and cellular component. • Primary Ontology Relations • is a • part of • regulates
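One way to picture the graph structure is as a list of typed edges from a child term to a parent term, where finding all ancestors of a term is a breadth-first walk over selected relation types. The sketch below uses a tiny hand-written hierarchy that is simplified for illustration and does not reproduce the real GO graph; the function name `ancestors` is an assumption of this example.

```python
from collections import deque

# Toy edges (child, relation, parent), hand-written for illustration only.
EDGES = [
    ("mitochondrial membrane", "part_of", "mitochondrion"),
    ("mitochondrion", "is_a", "organelle"),
    ("organelle", "is_a", "cellular component"),
]


def ancestors(term, edges, relations=("is_a", "part_of")):
    """Collect every term reachable from `term` via the given relation types."""
    parents = {}
    for child, relation, parent in edges:
        if relation in relations:
            parents.setdefault(child, []).append(parent)

    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("mitochondrial membrane", EDGES)))
    print(sorted(ancestors("mitochondrial membrane", EDGES, relations=("is_a",))))
```

Restricting the walk to "is a" edges changes the result, which is why tools that propagate annotations up the graph need to state which relations they follow.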

  22. Gene Ontology Background: GO Mappings to EC Numbers • Enzyme Commission (EC) numbers are used to specify categories of enzymes based on the chemical reactions catalyzed. The UniProtKB-GOA EC2GO mapping provides GO molecular function IDs for each classification. • EC1 - Oxidoreductases • EC2 - Transferases • EC3 - Hydrolases • EC4 - Lyases • EC5 - Isomerases • EC6 - Ligases
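The first field of an EC number selects one of the six top-level classes listed on this slide. A minimal sketch of that lookup is shown here; the function name `ec_class` and the accepted input spellings ("1.1.1.1", "EC:1.1.1.1", "EC 1.1.1.1") are assumptions of this example, and it does not read the EC2GO mapping file itself.

```python
# Top-level EC classes as listed on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number):
    """Return the top-level class name for an EC number such as '1.1.1.1'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        # Drop an optional "EC", "EC:" or "EC " prefix.
        text = text[2:].lstrip(": ")
    first = text.split(".")[0]
    if not first.isdigit() or int(first) not in EC_CLASSES:
        raise ValueError(f"unrecognized EC number: {ec_number!r}")
    return EC_CLASSES[int(first)]


if __name__ == "__main__":
    for value in ("1.1.1.1", "EC:3.4.21.4", "EC 6.3.1.2"):
        print(value, "->", ec_class(value))
```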

  23. GO Tools • AmiGO • OBO-Edit • QuickGO • GOanna • agriGO

  24. Gene Ontology Database • MySQL • Querying GO MySQL • SQL • Perl • GHOUL

  25. Gene Ontology Interesting Research • GO Annotation Consistency • Automated Annotation • BioCreAtIvE • CLUGO • Similarity Prediction Method • Automated Protein Function Predictions • Search for Genes with Similar Function • Semantic Similarity

  26. Dissertation Research Hypothesis • There exist protein alignment metrics/algorithms that can be used as clustering indexes for proteins with matching GO molecular function IDs.
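One possible way to probe such a hypothesis, offered only as a sketch and not as the candidate's method, is to check how often protein pairs whose alignment score reaches a threshold also share a GO molecular function ID. The protein names, scores, and threshold below are made up for illustration, and `pair_agreement` is a hypothetical helper.

```python
from itertools import combinations


def pair_agreement(scores, go_functions, threshold):
    """Count protein pairs by (score >= threshold) versus (shared GO function ID)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, b in combinations(sorted(go_functions), 2):
        similar = scores[frozenset((a, b))] >= threshold
        shared = bool(go_functions[a] & go_functions[b])
        if similar and shared:
            counts["tp"] += 1
        elif similar:
            counts["fp"] += 1
        elif shared:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Made-up data: molecular function IDs per protein and pairwise alignment scores.
GO_FUNCTIONS = {
    "protA": {"GO:0004022"},
    "protB": {"GO:0004022"},
    "protC": {"GO:0005215"},
}
SCORES = {
    frozenset(("protA", "protB")): 0.9,
    frozenset(("protA", "protC")): 0.2,
    frozenset(("protB", "protC")): 0.3,
}

if __name__ == "__main__":
    print(pair_agreement(SCORES, GO_FUNCTIONS, threshold=0.5))
```

A metric that works well as a clustering index would yield few "fp" and "fn" pairs across a range of thresholds; any real evaluation would need actual alignment scores and curated annotations.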

  27. Gene Ontology References • Evelyn B. Camon, Daniel G. Barrell, Emily C. Dimmer, Vivian Lee, Michele Magrane, John Maslen, David Binns and Rolf Apweiler; An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics, 2005. 6 (Supplement 1): S17. • Mary E. Dolan, Li Ni, Evelyn Camon and Judith A. Blake; A procedure for assessing GO annotation consistency. Bioinformatics, 2005. 21 (Supplement 1): i136 - i143. • In-Yee Lee, Jan-Ming Ho, Ming-Syan Chen; CLUGO: A Clustering Algorithm for Automated Functional Annotations Based on Gene Ontology. Proceedings of the 5th IEEE International Conference on Data Mining (ICDM '05): i136 - i143. • Gene Ontology Consortium; The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Research, 2009. • Evelyn Camon, Michele Magrane, Daniel Barrell, Vivian Lee, Emily Dimmer, John Maslen, David Binns, Nicola Harte, Rodrigo Lopez and Rolf Apweiler; The Gene Ontology Annotation (GOA) Database: sharing knowledge in UniProt with Gene Ontology. Nucleic Acids Research, 2004 (32).

  28. Gene Ontology References • Gene Ontology Consortium; The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research, 2004 (32). • Seth Carbon, Amelia Ireland, Christopher J. Mungall, ShengQiang Shu, Brad Marshall, Suzanna Lewis; AmiGO: online access to ontology and annotation data. Bioinformatics Application Note. 22 (2), 2009: 288 - 289.
