1 / 27

GlycO: An Ontology for Glycomics Research

GlycO is a comprehensive ontology for the study of glycomics, including the biosynthesis, metabolism, and biological relevance of complex glycans. It provides a framework for modeling complex carbohydrate structures and enables the analysis of glycoproteomics data. The ontology is populated using diverse data sources, ensuring quality and avoiding redundancy. GlycO is an essential tool for researchers in the field of bioinformatics and biomedical ontologies.

janettel
Download Presentation

GlycO: An Ontology for Glycomics Research

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Bioinformatics Research Overview

  2. Outline • Biomedical Ontologies • GlycO • EnzyO • ProPreO • Scientific Workflow for analysis of Proteomics Data • Framework for Semantic Provenance Annotation • Biological Services Registry • Demo of User Interface – T.cruzi Knowledge Base

  3. GlycO • is a focused ontology for the description of glycomics • models the biosynthesis, metabolism, and biological relevance of complex glycans • models complex carbohydrates as sets of simpler structures that are connected with rich relationships • An ontology for structure and function of Glycopeptides • Published through the National Center for Biomedical Ontology (NCBO) and Open Biomedical Ontologies (OBO) • See:GlycoDoc, GlycO

  4. Challenge – model hundreds of thousands of complex carbohydrate entities But, the differences between the entities are small (E.g. just one component) How to model all the concepts but preclude redundancy → ensure maintainability, scalability GlycO

  5. GlycO population • Assumption: with a large body of background knowledge, learning and extraction techniques can be used to assert facts. • Asserted facts are compositions of individual building blocks • Because the building blocks are richly described, the extracted larger structures will be of high quality

  6. GlycO Population • Multiple data sources used in populating the ontology • KEGG - Kyoto Encyclopedia of Genes and Genomes • SWEETDB • CARBBANK Database • Each data source has a different schema for storing data • There is significant overlap of instances in the data sources • Hence, entity disambiguation and a common representational format are needed

  7. Diverse Data From Multiple Sources Assures Quality • Democratic principle • Some sources can be wrong, but not all will be • More likely to have homogeneity in correct data than in erroneous data

  8. Ontology population workflow

  9. Ontology population workflow [][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] {[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc]{}}[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}}

  10. Ontology population workflow <Glycan> <aglycon name="Asn"/> <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"> <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"> <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="Man" > <residue link="3" anomeric_carbon="1" anomer="a" chirality="D" monosaccharide="Man" > <residue link="2" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc" > </residue> <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc" > </residue> </residue> <residue link="6" anomeric_carbon="1" anomer="a" chirality="D" monosaccharide="Man" > <residue link="2" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"> </residue> </residue> </residue> </residue> </residue> </Glycan>

  11. Ontology population workflow

  12. Diverse Data From Multiple Sources Assures Quality • Holds only, when the data in each source is independent • In the case of GlycO, the sources that were meant to assure quality were not diverse. • One original source (Carbbank) was copied by several Databases without curation • Errors in the original propagated • Errors in KEGG and Carbbank are the same • Cannot use these sources for comparison • Needs curation by the expert community ?

  13. b-D-GlcpNAc -(1-2)- b-D-GlcpNAc -(1-2)+ b-D-GlcpNAc -(1-4)- a-D-Manp -(1-6)+ b-D-Manp -(1-4)- b-D-GlcpNAc -(1-4)- b-D-GlcpNAc a-D-Manp -(1-3)+ GlycoTree N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15: 235-251

  14. Pathway Steps - Glycan Abundance of this glycan in three experiments Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia

  15. ProPreO ontology • An ontology for capturing process and lifecycle information related to proteomic experiments • Two aspects of glycoproteomics: What is it?→ identification How much of it is there? → quantification • Heterogeneity in data generation process, instrumental parameters, formats • Need data and process provenance→ ontology-mediated provenance • Hence, ProPreO models both the glycoproteomics experimental process and attendant data • Published through the National Center for Biomedical Ontology (NCBO) and Open Biomedical Ontologies (OBO)

  16. N-GlycosylationProcess (NGP) Cell Culture extract Glycoprotein Fraction proteolysis Glycopeptides Fraction 1 Separation technique I n Glycopeptides Fraction PNGase n Peptide Fraction Separation technique II n*m Peptide Fraction Mass spectrometry ms data ms/ms data Data reduction Data reduction ms peaklist ms/ms peaklist binning Peptide identification Glycopeptide identification and quantification N-dimensional array Peptide list Data correlation Signal integration

  17. Workflow based on Web Services = Web Process

  18. ProPreO: Ontology-mediated provenance parent ion charge 830.9570 194.9604 2 580.2985 0.3592 688.3214 0.2526 779.4759 38.4939 784.3607 21.7736 1543.7476 1.3822 1544.7595 2.9977 1562.8113 37.4790 1660.7776 476.5043 parent ion m/z parent ionabundance fragment ion m/z fragment ionabundance ms/ms peaklist data Mass Spectrometry (MS) Data

  19. ProPreO: Ontology-mediated provenance • <ms-ms_peak_list> • <parameter instrument=“micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer” • mode=“ms-ms”/> • <parent_ion m-z=“830.9570” abundance=“194.9604” z=“2”/> • <fragment_ion m-z=“580.2985” abundance=“0.3592”/> • <fragment_ion m-z=“688.3214” abundance=“0.2526”/> • <fragment_ion m-z=“779.4759” abundance=“38.4939”/> • <fragment_ion m-z=“784.3607” abundance=“21.7736”/> • <fragment_ion m-z=“1543.7476” abundance=“1.3822”/> • <fragment_ion m-z=“1544.7595” abundance=“2.9977”/> • <fragment_ion m-z=“1562.8113” abundance=“37.4790”/> • <fragment_ion m-z=“1660.7776” abundance=“476.5043”/> • </ms-ms_peak_list> OntologicalConcepts Semantically Annotated MS Data

  20. ISiS – Integrated Semantic Information and Knowledge System Semantic Web Process to incorporate provenance Biological Sample Analysis by MS/MS Agent Raw Data to Standard Format Agent Data Pre- process Agent DB Search (Mascot/Sequest) Agent Results Post-process (ProValt) O I O I O I O I O Storage Raw Data Standard Format Data Filtered Data Search Results Final Output Biological Information Semantic Annotation Applications

  21. Integrated Semantic Information and knowledge System (Isis) Have I performed an error? Give me all result files from a similar organism, cell, preparation, mass spectrometric conditions and compare results. SPARQL query-based User Interface ProPreO ontology Is the result erroneous? Give me all result files from a similar organism, cell, preparation, mass spectrometric conditions and compare results. Experimental Data Semantic Annotation Metadata File Semantic Metadata Registry PROTEOMECOMMONS EXPERIMENTAL DATA ProVault result MACOT result mzXML Pkl pSplit Raw Raw2mzXML mzXML2Pkl Pkl2pSplit MASCOT Search ProVault PROTEOMICS WORKFLOW

  22. Semantic Annotation Facilitates Complex Queries • Evaluate the specific effects of changing a biological parameter: Retrieve abundance data for a given protein expressed by three different cell types of a specific organism. • Retrieve raw data supporting a structural assignment: Find all the raw ms data files that contain the spectrum of a given peptide sequence having a specific modification and charge state. • Detect errors: Find and compare all peptide lists identified in Mascot output filesobtained using a similar organism, cell-type, sample preparation protocol, and mass spectrometry conditions. A Web ServiceMust Be Invoked ProPreO concepts highlighted in red

  23. Browsing & Querying Data Generation Annotation Knowledge Base (ProPreO and GlycO Ontology) Data API GlycoVault WWW ISiS

  24. Semantic Biological Web Services Registry

  25. Data, ontologies, more publications at Biomedical Glycomics project web site: http://knoesis.wright.edu/research/bioinformatics/index.html Thank You

More Related