Glycomics project overview
Life Science Ontologies
• Glyco
  • An ontology for the structure and function of glycopeptides
  • 573 classes, 113 relationships
  • Published through the National Center for Biomedical Ontology (NCBO)
• ProPreO
  • An ontology for capturing process and lifecycle information related to proteomic experiments
  • 398 classes, 32 relationships
  • 3.1 million instances
  • Published through the National Center for Biomedical Ontology (NCBO) and Open Biomedical Ontologies (OBO)
ProPreO ontology
• Two aspects of glycoproteomics:
  • What is it? → identification
  • How much of it is there? → quantification
• Heterogeneity in the data generation process, instrumental parameters, and formats
• Need for data and process provenance → ontology-mediated provenance
• Hence, ProPreO models both the glycoproteomics experimental process and the attendant data
ProPreO population: transformation to RDF
Scientific Data → Computational Methods → Ontology instances
[Figure: two paths from scientific data, through computational methods, to RDF instances]
• Protein path: protein data (key, amino-acid sequence) → Determine N-glycosylation Consensus, Calculate Chemical Mass, Calculate Monoisotopic Mass → "Protein RDF" (amino-acid sequence, N-glycosylation consensus, chemical mass, monoisotopic mass)
• Peptide path: Extract Peptide Amino-acid Sequence from Protein Amino-acid Sequence → Determine N-glycosylation Consensus, Calculate Chemical Mass, Calculate Monoisotopic Mass → "Peptide RDF" (amino-acid sequence, parent protein, N-glycosylation consensus, chemical mass, monoisotopic mass)
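The computational methods named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: the residue mass tables are standard reference values, the N-glycosylation consensus is assumed to be the usual N-X-S/T sequon (X not proline), and the predicate names in `peptide_triples` are hypothetical placeholders rather than real ProPreO terms.

```python
import re

# Standard monoisotopic residue masses (Da).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
# Standard average ("chemical") residue masses (Da).
AVERAGE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MONO = 18.01056
WATER_AVG = 18.01528


def peptide_mass(sequence, residue_masses, water):
    """Sum the residue masses and add one water for the free termini."""
    return sum(residue_masses[aa] for aa in sequence) + water


def sequon_positions(sequence):
    """0-based positions of Asn in an N-X-S/T motif where X is not Pro."""
    return [m.start() for m in re.finditer(r"N(?=[^P][ST])", sequence)]


def peptide_triples(peptide_id, sequence, parent_protein_id):
    """Build (subject, predicate, object) tuples; predicates are hypothetical."""
    mono = round(peptide_mass(sequence, MONOISOTOPIC, WATER_MONO), 5)
    chem = round(peptide_mass(sequence, AVERAGE, WATER_AVG), 4)
    return [
        (peptide_id, "has_amino_acid_sequence", sequence),
        (peptide_id, "has_parent_protein", parent_protein_id),
        (peptide_id, "has_n_glycosylation_consensus", bool(sequon_positions(sequence))),
        (peptide_id, "has_chemical_mass", chem),
        (peptide_id, "has_monoisotopic_mass", mono),
    ]


triples = peptide_triples("peptide_1", "GNGS", "protein_1")
for triple in triples:
    print(triple)
```

Serializing the tuples to real RDF would need the actual ProPreO class and property URIs, which the slides do not list.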
ProPreO: Ontology-mediated provenance
Mass Spectrometry (MS) Data: ms/ms peaklist data

parent ion m/z    parent ion abundance    parent ion charge
830.9570          194.9604                2

fragment ion m/z    fragment ion abundance
580.2985            0.3592
688.3214            0.2526
779.4759            38.4939
784.3607            21.7736
1543.7476           1.3822
1544.7595           2.9977
1562.8113           37.4790
1660.7776           476.5043
ProPreO: Ontology-mediated provenance
Semantically Annotated MS Data (element and attribute names correspond to ontological concepts)

<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"
             mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="784.3607" abundance="21.7736"/>
  <fragment_ion m-z="1543.7476" abundance="1.3822"/>
  <fragment_ion m-z="1544.7595" abundance="2.9977"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
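Because the annotated peak list is well-formed XML, it can be read with a standard parser. The sketch below uses Python's `xml.etree.ElementTree` on an abbreviated copy of the slide's data; the variable names are illustrative only, and nothing here is taken from the project's own tooling.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the annotated peak list shown on the slide.
PEAK_LIST = """<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"
             mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>"""

root = ET.fromstring(PEAK_LIST)

# Instrument settings live on the <parameter> element.
mode = root.find("parameter").get("mode")

# The parent ion carries m/z, abundance and charge state as attributes.
parent = root.find("parent_ion")
parent_mz = float(parent.get("m-z"))
parent_charge = int(parent.get("z"))

# Collect fragment ions as (m/z, abundance) pairs.
fragments = [
    (float(ion.get("m-z")), float(ion.get("abundance")))
    for ion in root.findall("fragment_ion")
]

# The base peak is the fragment with the highest abundance.
base_peak = max(fragments, key=lambda peak: peak[1])

print(mode, parent_mz, parent_charge)
print("base peak:", base_peak)
```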
Semantic annotation of Scientific Data
Annotated ms/ms peaklist data (written with "ms-ms" and "m-z" in names, since "/" is not allowed in XML element or attribute names)

<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"
             mode="ms-ms"/>
  <parent_ion_mass>830.9570</parent_ion_mass>
  <total_abundance>194.9604</total_abundance>
  <z>2</z>
  <mass_spec_peak m-z="580.2985" abundance="0.3592"/>
  <mass_spec_peak m-z="688.3214" abundance="0.2526"/>
  <mass_spec_peak m-z="779.4759" abundance="38.4939"/>
  <mass_spec_peak m-z="784.3607" abundance="21.7736"/>
  <mass_spec_peak m-z="1543.7476" abundance="1.3822"/>
  <mass_spec_peak m-z="1544.7595" abundance="2.9977"/>
  <mass_spec_peak m-z="1562.8113" abundance="37.4790"/>
  <mass_spec_peak m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
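N-Glycosylation Process (NGP)
[Figure: experimental and data-processing workflow]
• Cell Culture → (extract) → Glycoprotein Fraction
• Glycoprotein Fraction → (proteolysis) → Glycopeptides Fraction (1)
• Glycopeptides Fraction (1) → (Separation technique I) → Glycopeptides Fraction (n)
• Glycopeptides Fraction (n) → (PNGase) → Peptide Fraction (n)
• Peptide Fraction (n) → (Separation technique II) → Peptide Fraction (n*m)
• Peptide Fraction (n*m) → (Mass spectrometry) → ms data, ms/ms data
• ms data → (Data reduction) → ms peaklist → (binning) → N-dimensional array → Signal integration
• ms/ms data → (Data reduction) → ms/ms peaklist → (Peptide identification) → Peptide list
• Data correlation → Glycopeptide identification and quantification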
Semantic Web Process to incorporate provenance
[Figure: chain of agents, each with an input (I) and output (O), with intermediate data kept in Storage]
• Biological Sample Analysis by MS/MS → Raw Data
• Agent: Raw Data to Standard Format → Standard Format Data
• Agent: Data Pre-process → Filtered Data
• Agent: DB Search (Mascot/Sequest) → Search Results
• Agent: Results Post-process (ProValt) → Final Output
• Final Output → Biological Information
• Semantic Annotation Applications
Integrated Semantic Information and knowledge System (Isis)
• Example question: "Have I made an error? Is the result erroneous? Give me all result files from a similar organism, cell, preparation, and mass spectrometric conditions, and compare the results."
• Components: SPARQL query-based User Interface, ProPreO ontology, Semantic Metadata Registry, Semantic Annotation Metadata File, Experimental Data
• ProteomeCommons experimental data: Raw, mzXML, Pkl, pSplit, MASCOT result, ProVault result
• Proteomics workflow: Raw2mzXML, mzXML2Pkl, Pkl2pSplit, MASCOT Search, ProVault
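The kind of provenance question Isis is meant to answer can be illustrated with a small lineage trace over the workflow steps named on this slide. This is a toy sketch in plain Python, not the SPARQL interface itself; the pairing of each step with its input and output is inferred from the step names (for example, Raw2mzXML turning Raw into mzXML) and is an assumption.

```python
# Each entry maps an output artifact to (process that produced it, input artifact).
# The pairing is inferred from the step names on the slide, not stated there.
DERIVED_FROM = {
    "mzXML": ("Raw2mzXML", "Raw"),
    "Pkl": ("mzXML2Pkl", "mzXML"),
    "pSplit": ("Pkl2pSplit", "Pkl"),
    "MASCOT result": ("MASCOT Search", "pSplit"),
    "ProVault result": ("ProVault", "MASCOT result"),
}


def trace(artifact):
    """Walk back from an artifact to the raw data, listing every step."""
    steps = []
    while artifact in DERIVED_FROM:
        process, source = DERIVED_FROM[artifact]
        steps.append((artifact, process, source))
        artifact = source
    return steps


lineage = trace("ProVault result")
for output, process, source in lineage:
    print(f"{output} was produced by {process} from {source}")
```

In a real deployment the same relationships would be stored as ontology instances and retrieved with a SPARQL query rather than a dictionary lookup.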
Semantic Biological Web Service Registry Semantic Web Service
GLYDE-CT: GLYcan Data Exchange Based on a Connection Table Format
[Figure: moiety_2 (Man3GlcNAc2, numbered residues) linked to moiety_1 (peptide Gly|Asn|Gly|Ser)]

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE GlydeCT SYSTEM "http://glycomics.ccrc.uga.edu/GLYDE-CT/GLYDE-CT_v2.11.DTD">
<GlydeCT xmlns:GlydeCT="http://glycomics.ccrc.uga.edu/GLYDE-CT/GLYDE-CT_v2.11">
  <structure type="molecule" id="molecule_1" name="GP1">
    <part type="moiety" id="moiety_1" ref="some_file#GNGS" name="GNGS"/>
    <part type="moiety" id="moiety_2" ref="some_file#Man3" name="Man3GlcNAc2"/>
    <link from="moiety_2" to="moiety_1">
      <link from="residue_1" to="residue_2">
        <link from="C1" to="N4"/>
      </link>
    </link>
  </structure>
</GlydeCT>
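The nested link elements describe one connection at three levels of detail: moiety, residue, and atom. A short Python sketch, using only the standard library and a copy of the slide's example without the XML declaration and DOCTYPE, shows how that nesting can be walked. The helper name `walk_links` is illustrative and not part of GLYDE-CT.

```python
import xml.etree.ElementTree as ET

# Copy of the slide's example, minus the XML declaration and DOCTYPE.
GLYDE = """<GlydeCT>
  <structure type="molecule" id="molecule_1" name="GP1">
    <part type="moiety" id="moiety_1" ref="some_file#GNGS" name="GNGS"/>
    <part type="moiety" id="moiety_2" ref="some_file#Man3" name="Man3GlcNAc2"/>
    <link from="moiety_2" to="moiety_1">
      <link from="residue_1" to="residue_2">
        <link from="C1" to="N4"/>
      </link>
    </link>
  </structure>
</GlydeCT>"""


def walk_links(element, depth=0):
    """Yield (depth, from, to) for each nested <link>, outermost first."""
    for link in element.findall("link"):
        yield depth, link.get("from"), link.get("to")
        yield from walk_links(link, depth + 1)


root = ET.fromstring(GLYDE)
structure = root.find("structure")

# Map each part id to its human-readable name.
parts = {part.get("id"): part.get("name") for part in structure.findall("part")}

links = list(walk_links(structure))
for depth, source, target in links:
    print("  " * depth + f"{source} -> {target}")
```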
Data, ontologies, more publications at Biomedical Glycomics project web site: http://knoesis.wright.edu/research/bioinformatics/index.html Thank You