
Data Access & Integration in the ISPIDER Proteomics Grid

Explore the ISPIDER project focusing on accessing and integrating proteomics resources, challenges faced, middleware solutions like OGSA-DAI and AutoMed, data integration approaches, proteomics repositories, and global schema. Understand the system architecture, query processing, and future work in ISPIDER. Discover the benefits of an integrated platform, overcoming challenges of data evolution, and the importance of schema evolution support. Engage with various tools like PEDRo, gpmDB, PepSeeker, and PRIDE in the realm of proteomics research.

Presentation Transcript


  1. Data Access & Integration in the ISPIDER Proteomics Grid L. Zamboulis, H. Fan, K. Belhajjame, J. Siepen, A. Jones, N. Martin, A. Poulovassilis, S. Hubbard, S. M. Embury, N. W. Paton

  2. Overview • The ISPIDER project • Data Access & Integration of Proteomics Resources • Challenges • Middleware • Proteomics resources & global schema • System architecture & query processing • Future Work

  3. ISPIDER • Project Goals: • Build an integrated platform of proteomic resources • Use existing resources – produce new ones • Create clients for querying, visualisation, etc.

  4. ISPIDER • Objective: develop an integrated platform of proteome-related resources, using existing standards • Benefits: • Access to increased breadth of information • More reliable analyses • Integration brings added value

  5. Challenges • Proteomics repositories in disparate locations → need for distributed solution: • common access, distributed query processing → need for integration: • overlapping data, different representations • Data/schemas constantly updated/evolve → need virtual or hybrid integration → need schema evolution support

  6. Middleware (1/2) • OGSA-DAI: middleware exposing data sources on Grids via web services • open-source and extensible • uniform access to relational & XML data sources • supports a variety of operations, e.g. querying/updating, data transformation, data delivery • OGSA-DQP: service-based distributed query processor • supports querying of relational OGSA-DAI data sources • offers implicit parallelism for data-intensive requests

  7. Middleware (2/2) • AutoMed: heterogeneous data transformation and integration system • subsumes traditional data integration approaches • handles various data models – easily extensible • virtual/materialised/hybrid integration • schema evolution • data warehousing tools

  8. Data Integration Approaches • Global-As-View (GAV) approach: describe GS constructs with view definitions over LSi constructs • Local-As-View (LAV) approach: describe LSi constructs with view definitions over GS constructs
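As a rough illustration of the two approaches on slide 8, the sketch below uses plain Python rather than AutoMed. The source names (`pedro_peptides`, `gpmdb_peptides`), the field names and the global construct `Peptide` are all hypothetical. Under GAV the global construct is defined as a view over the local sources; under LAV each local source is described as a view (here, a simple filter) over the global construct.

```python
# Hypothetical local sources (LS1, LS2) holding peptide identifications.
pedro_peptides = [{"sequence": "AAGK", "score": 41.0}]
gpmdb_peptides = [{"seq": "LLDR", "expect": 0.001}]


# GAV: the global-schema construct Peptide is a view over the local sources.
def gav_peptide():
    rows = [{"sequence": r["sequence"], "source": "PEDRo"} for r in pedro_peptides]
    rows += [{"sequence": r["seq"], "source": "gpmDB"} for r in gpmdb_peptides]
    return rows


# LAV: each local construct is described as a view over the global construct.
def lav_pedro(global_peptides):
    return [r for r in global_peptides if r["source"] == "PEDRo"]


def lav_gpmdb(global_peptides):
    return [r for r in global_peptides if r["source"] == "gpmDB"]


global_peptides = gav_peptide()
print(global_peptides)
print(lav_pedro(global_peptides))
```

With GAV, adding a new source means editing the global view definition; with LAV, a new source only needs its own description, at the cost of harder query rewriting.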

  9. Both-As-View (BAV) Approach • Schema transformation approach • For each pair (LSi,GS): incrementally modify LSi/GS to match GS/LSi

  10. BAV Example • Transformation pathway consists of primitive transformations • Pathway contains both GAV & LAV definitions • Transformations are automatically reversible • Metadata in AutoMed Repository
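The reversibility claim on slide 10 can be sketched in a few lines of Python. This is a simplified model, not the AutoMed API: it assumes the primitive transformations pair up as add/delete, extend/contract and rename/rename, and that a pathway is reversed by inverting each step in reverse order. The `Step` class, the construct names and the query string are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    op: str          # "add", "delete", "extend", "contract" or "rename"
    construct: str   # schema construct the step applies to
    arg: str = ""    # defining query, or the new name for a rename


# Each primitive has an inverse primitive (rename is handled separately).
INVERSE = {"add": "delete", "delete": "add",
           "extend": "contract", "contract": "extend"}


def reverse_step(step):
    if step.op == "rename":
        # rename(a, b) is undone by rename(b, a)
        return Step("rename", step.arg, step.construct)
    return Step(INVERSE[step.op], step.construct, step.arg)


def reverse_pathway(pathway):
    # Invert every step and apply them in the opposite order.
    return [reverse_step(s) for s in reversed(pathway)]


# Hypothetical pathway from a local schema to the global schema.
ls_to_gs = [
    Step("rename", "pep", "Peptide"),
    Step("add", "PeptideHit", "[x | x <- Peptide]"),
    Step("contract", "raw_spectrum"),
]
gs_to_ls = reverse_pathway(ls_to_gs)
for step in gs_to_ls:
    print(step)
```

Because every step carries enough information to be undone, a single pathway yields mappings in both directions, which is how one pathway can provide both GAV-style and LAV-style definitions.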

  11. Proteomics Resources • PEDRo • collection of descriptions of experimental data sets in proteomics • has been used as a format for exchanging proteomics data • gpmDB • contains a large number of proteins and peptide identifications • initially designed to assist in the validation of peptide MS/MS spectra and protein coverage patterns • PepSeeker • developed as part of the ISPIDER project • comprehensive resource of peptide/protein identifications • PRIDE • centralised, standards compliant, public proteomics repository • contains protein/peptide identifications + evidence supporting them

  12. Global Schema • Trade-off between: • being able to answer specific user queries • a full integration • Properties: • Based on PEDRo’s peptide/protein identification section and … • expanded with information unique to other resources • Entities identified by LSIDs
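For readers unfamiliar with LSIDs (Life Science Identifiers), the sketch below parses the general form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The slides do not say how ISPIDER assigns its identifiers, so the authority `example.org` and the namespace `peptide` are placeholders, and the parser is deliberately minimal.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split an LSID URN into its components; the revision is optional."""
    parts = text.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


# Placeholder identifier; not taken from any real ISPIDER resource.
example = parse_lsid("urn:lsid:example.org:peptide:12345")
print(example)
```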

  13. System Architecture • Sources wrapped with OGSA-DAI • AutoMed toolkit wraps OGSA-DAI resources • Integration of OGSA-DAI resources • Queries submitted to AutoMed QP are evaluated with the help of OGSA-DQP

  18. Query Processing • Query is submitted to AutoMed’s GQP: • Reformulated • Optimised • AutoMed-DQP Wrapper: • IQL → OQL • OGSA-DQP evaluates OQL queries • OQL result → IQL result
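The order of the stages in this slide can be shown with a toy pipeline. Nothing here is AutoMed or OGSA-DQP code: the view table `VIEWS`, the query strings and the three callables passed to `run_query` are hypothetical stand-ins, and the "optimiser" only normalises whitespace. The point is just the flow: reformulate, optimise, translate to OQL, evaluate, translate the result back.

```python
from typing import Callable, Dict

# Hypothetical view definitions: global construct -> expression over sources.
VIEWS: Dict[str, str] = {"Peptide": "(pedro_peptide ++ gpmdb_peptide)"}


def reformulate(query: str) -> str:
    """Unfold global-schema names into expressions over the wrapped sources."""
    for name, definition in VIEWS.items():
        query = query.replace(name, definition)
    return query


def optimise(query: str) -> str:
    """Placeholder for optimisation; here it only normalises whitespace."""
    return " ".join(query.split())


def run_query(query: str,
              to_oql: Callable[[str], str],
              evaluate: Callable[[str], list],
              to_iql_result: Callable[[list], dict]) -> dict:
    plan = optimise(reformulate(query))   # global query processor stages
    oql = to_oql(plan)                    # wrapper: IQL -> OQL
    rows = evaluate(oql)                  # distributed evaluation
    return to_iql_result(rows)            # wrapper: OQL result -> IQL result


result = run_query(
    "[s | s <-  Peptide]",
    to_oql=lambda plan: f"OQL<{plan}>",
    evaluate=lambda oql: [oql],
    to_iql_result=lambda rows: {"rows": rows},
)
print(result)
```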

  20. Summary • Proteomics repositories in disparate locations → need for distributed solution → need for integration • Data/schemas constantly updated/evolve → need virtual or hybrid integration → support schema evolution

  21. Future Work • Schema evolution • Evaluation of AutoMed advantage • Expose AutoMed functionality to the Grid • AutoMed and Taverna integration

  22. Future Work • Taverna: tool for Web Service orchestration in workflows • Related services may be incompatible • Current solution involves writing custom code for every pair of WS • Use AutoMed toolkit for semi-automatic integration of XML Web Services • mappings from WS to ontologies • automatic integration

  23. ISPIDER Project Members • Birkbeck College: Nigel Martin, Alex Poulovassilis, Lucas Zamboulis (R.A.), Hao Fan (former R.A.) • European Bioinformatics Institute: Rolf Apweiler, Henning Hermjakob, Weimin Zhu, Chris Taylor, Phil Jones, Nisha Vinod • University of Manchester: Simon Hubbard, Steve Oliver, Suzanne Embury, Norman Paton, Carol Goble, Robert Stevens, Khalid Belhajjame (R.A.), Jennifer Siepen (R.A.) • U.C.L.: David Jones, Christine Orengo, Melissa Pentony (R.A.)
