
INTRODUCTION



Presentation Transcript


[Figure: Ontology-based Workflow Designer]
• Ontology Assistant: browsing, querying
• WF Editor: composition, browsing, selection, visualization
• Web services (WS1, WS2) reached over the network: Spectra Preprocessing Services, Spectra Preparation Services, Spectra Management Services, Spectra Visualization Services

[Figure: per-sample panels, titled "1: S0 (26)" through "21: S20 (27)", each drawn on a 0.0 to 1.0 vertical scale against n1 (ticks at 1, 5, 50).]

[Figure: ATE plotted against the number of features n (ticks from 1 to 50).]

Three Internet Web Services remotely integrate the two main system components. The MS-Analyzer workflow invokes the BioDCV component through a Web Service client (biodcv-ws-client) in the UniCZ network: data and metadata are copied to an FTP repository, then the data URL and a notification email address are sent to the BioDCV Web Service (biodcv-ws) in a DMZ area of the ITC-irst network. This service is run directly by Apache with mod_python and the Zolera SOAP Infrastructure (ZSI). The incoming data are transferred to the internal front-end server (server-cz-tn.py) within the firewalled area. The front-end first launches the feature extraction module and then a full complete validation process using the BioDCV component. The system outputs are then formatted as graphs and tables by R and PHP scripts. The front-end publishes the results on the DMZ server, and the user is notified by email.
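The hand-off described above can be sketched as a request that carries only a pointer to the data plus a notification address, followed by a fixed sequence of stages. This is a minimal illustration, not the poster's implementation: the class names, the stage labels, the validation rules and the example URL are all hypothetical, and no SOAP or FTP call is made.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class AnalysisRequest:
    """What the client sends to the service: a pointer to the data, not the data."""
    data_url: str      # location of data and metadata in the FTP repository
    notify_email: str  # address notified once results are published


@dataclass
class JobRecord:
    request: AnalysisRequest
    completed: list = field(default_factory=list)  # stage names, in execution order


# Stage order as described in the text; the names are illustrative labels only.
STAGES = (
    "transfer_to_front_end",
    "feature_extraction",
    "complete_validation",
    "format_results",
    "publish_on_dmz",
    "notify_by_email",
)


def validate(request: AnalysisRequest) -> None:
    """Reject requests that cannot be served before any work is scheduled."""
    if urlparse(request.data_url).scheme != "ftp":
        raise ValueError("data_url must point to the FTP repository")
    if "@" not in request.notify_email:
        raise ValueError("notify_email does not look like an email address")


def run_job(request: AnalysisRequest, handlers: dict) -> JobRecord:
    """Run every stage in order; each handler receives the request."""
    validate(request)
    job = JobRecord(request)
    for stage in STAGES:
        handlers[stage](request)
        job.completed.append(stage)
    return job


if __name__ == "__main__":
    # Hypothetical request; the handlers here do nothing.
    req = AnalysisRequest("ftp://repository.example.org/study/data.zip",
                          "user@example.org")
    job = run_job(req, {stage: (lambda r: None) for stage in STAGES})
    print(job.completed)
```

Passing a URL rather than the spectra themselves keeps the service call small; the bulk transfer happens separately through the repository.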
THE BIODCV SYSTEM: EGEE BIOMED VO

Commands:
• grid-url-copy/lcg-cp db from local to SE
• edg-job-submit BioDCV.jdl
• grid-url-copy/lcg-cp db from SE to local

[Figure: data transfers by scp and grid-ftp; labelled sizes 2-50 MB and 50-400 MB.]

Workflows, ontologies and standards for unbiased prediction in high-throughput proteomics

Cannataro M*, Barla A**, Gallo A*, Paoli S**, Jurman G**, Merler S**, Veltri P*, Furlanello C**.
*University Magna Graecia of Catanzaro, Italy; **ITC-irst, Trento, Italy
MGED 9, September 7-10, 2006, Seattle, WA, U.S.A.

REFERENCES

[1] M. Cannataro, P. Guzzi, T. Mazza, G. Tradigo, P. Veltri. Using ontologies for preprocessing and mining spectra data on the Grid. FGCS, 2006, in press. http://dx.doi.org/10.1016/j.future.2006.04.011
[2] M. Cannataro, P.H. Guzzi, T. Mazza, G. Tradigo, P. Veltri. Preprocessing of Mass Spectrometry Proteomics Data on the Grid. IEEE CBMS 2005: 549-554.
[3] C. Furlanello, M. Serafini, S. Merler, G. Jurman. Semi-supervised learning for molecular profiling. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(2):110-118, 2005.
[4] A. Barla, B. Irler, S. Merler, G. Jurman, S. Paoli, C. Furlanello. Proteome profiling without selection bias. IEEE CBMS 2006: 941-946.
[5] R. Tibshirani, T. Hastie, B. Narasimhan, S. Soltys, G. Shi, A. Kong, Q. Le. Sample classification from protein mass spectrometry, by "peak probability contrasts". Bioinformatics, 20(17):3034-3044, 2004.

INTRODUCTION

We connect, in a complete pipeline, an ontology-based environment for proteomics spectra management with a distributed complete validation platform for predictive analysis. We build on two existing software platforms (MS-Analyzer and BioDCV) and on emerging proteomics standards. In this set-up, BioDCV is accessed from the MS-Analyzer workflow as a service, thus providing a complete pipeline for proteomics data analysis. Predictive classification studies on MALDI-TOF data based on this pipeline are presented.
[Figure: D2 mean spectra for groups A and B, each with a .95 Student bootstrap CI; intensity against m/z from 9100 to 9200, with a peak marked at 9133.17 Da.]

[Figure: per-sample panels continued, "22: S21 (25)" through "25: S24 (23)", plotted against n1. Legend: error rate (tumour tissue), error rate (non-tumoural tissue), no-information error rate.]

DATASETS

• D1. MALDI-TOF Ovarian Cancer Dataset, from www-stat.stanford.edu/~tibs/PPC/Rdist [5]
  • 49 samples (24 diseased + 25 controls)
  • Each raw sample has 56384 m/z measurements (892 KB)
  • Each preprocessed sample has 564 m/z measurements (19 KB)
  • Preprocessing: normalization, binning
  • Biomarker identification: baseline subtraction, peak alignment and clustering; 67 features identified
• D2. Lab calibration sample
  • MALDI-TOF, human serum, 20 technical replicates: 10 control samples, 10 with 2 proteins
  • 34671 measurements, 347 m/z after preprocessing
  • Predictive discrimination with 7 peaks

WEB SERVICES ARCHITECTURE

[Diagram labels]
• MS-Analyzer side: M-WS (data, metadata), FTP repository, Ontologies, Ontology manager, WF Schema (abstract, concrete WF), Resource Discovery Services (UDDI/MDS), WF Translator, WF Scheduler, WF Monitor, Metadata WSDL, SpecDB APIs, WS; repositories RSR (raw spectra), PSR (pre-processed spectra), PPSR (prepared spectra)
• Exchanged items: repository URL and email; biomarkers data; report
• BioDCV side: BioDCV WS on the DMZ server (Apache, mod_python, ZSI module); front-end server; feature extraction (within sample, across sample); complete validation; R scripts (visualization: ATE, sample tracking); PHP (biomarker lists, HTML publication)

ACKNOWLEDGMENTS

• ITC-irst: R. Flor, D. Albanese, B. Irler
• UniCZ: G. Cuda, M. Gaspari, P.H. Guzzi, T. Mazza

MS-ANALYZER

MS-Analyzer [1] is a platform for the integrated management and processing of proteomics spectra data. It supports the ontology-based design of "in silico" proteomics studies: ontologies are used to model software tools and spectra data, while workflows are used to model applications.
MS-Analyzer uses a specialized spectra database and provides a set of pre-processing services:
• Interface to heterogeneous mass spectrometer formats such as MALDI-TOF, SELDI-TOF and ICAT-based LC-MS/MS. Formats are unified into mzData, in compliance with the HUPO-PSI proteomics standardization initiative.
• Acquisition, storage and management of MS data with the SpecDB database. Spectra are stored in their different stages (raw, pre-processed, prepared). Single spectra, multiple spectra or portions of spectra can be queried (in-database preprocessing).
• Preprocessing of MS data (smoothing, baseline subtraction, normalization, binning, peak alignment), as well as spectra preparation for further data mining (spectra to ARFF conversion) [2].
• Sharing of experiment data, workflows and knowledge.

BIODCV

The predictive modeling portion of the proposed system is provided by BioDCV, the ITC-irst platform for machine learning in high-throughput functional genomics. BioDCV fully supports complete validation in order to control selection bias effects. To cope with the intensive data throughput, BioDCV uses E-RFE, an entropy-based acceleration of the SVM-RFE feature ranking procedure [3]. For proteomics, it includes methods for baseline subtraction, spectra alignment, peak clustering and peak assignment that were adapted from existing R packages and chained to the complete validation system. BioDCV is also a grid application and has been used in production within the EGEE Biomed VO [4].
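The idea behind complete validation, as used above to control selection bias, is that feature ranking must be repeated inside every resampling split, using training samples only, so that test samples never influence which features are chosen. The sketch below illustrates that principle under loud assumptions: it swaps in a mean-difference ranker and a nearest-centroid classifier in place of E-RFE and SVM, runs on synthetic data, and every function name is hypothetical rather than part of BioDCV.

```python
import random
from statistics import mean


def rank_features(X, y):
    """Rank feature indices by |mean(class 1) - mean(class 0)|, best first.

    A deliberately simple stand-in for the ranking step (not E-RFE).
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def fit_centroids(X, y, features):
    """Nearest-centroid model: one mean vector per class over the chosen features."""
    model = {}
    for label in (0, 1):
        rows = [row for row, l in zip(X, y) if l == label]
        model[label] = [mean(row[j] for row in rows) for j in features]
    return model


def predict(model, row, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(features, model[label]))
    return min(model, key=dist)


def complete_validation(X, y, n_selected, n_splits=20, test_fraction=0.25, seed=0):
    """Average test error over stratified random splits.

    Feature ranking is repeated inside every split, on training samples only.
    Assumes binary labels 0/1 with at least two samples per class.
    """
    rng = random.Random(seed)
    by_class = {c: [i for i, l in enumerate(y) if l == c] for c in (0, 1)}
    errors = []
    for _ in range(n_splits):
        test = []
        for members in by_class.values():
            rng.shuffle(members)
            test += members[:max(1, int(len(members) * test_fraction))]
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        features = rank_features(X_tr, y_tr)[:n_selected]  # training data only
        model = fit_centroids(X_tr, y_tr, features)
        wrong = sum(predict(model, X[i], features) != y[i] for i in test)
        errors.append(wrong / len(test))
    return mean(errors)


def make_data(n_per_class, n_features, informative, seed=1):
    """Synthetic data: feature 0 separates the classes only if `informative`."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.random() for _ in range(n_features)]
            if informative:
                row[0] += 10 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data(20, 50, informative=True)
    print("separable data:", complete_validation(X, y, n_selected=1))
    X, y = make_data(20, 200, informative=False)
    print("pure noise:", complete_validation(X, y, n_selected=5))
```

Ranking the features once on the full dataset and only then cross-validating the classifier would let test samples leak into the selection step; on pure-noise data that shortcut tends to report an optimistic error, whereas the procedure above should stay near the no-information rate.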
