
Supporting Heterogeneous Data Access in Genomics

This presentation describes a system architecture for supporting heterogeneous data access in genomics research. It covers the motivation, the key components of the system, and its implementation status. The goal is to provide a single, intuitive, and effective interface for accessing and managing data from multiple genomics sources.

Presentation Transcript


  1. Supporting Heterogeneous Data Access in Genomics. Terence Critchlow (LLNL); Ling Liu, Calton Pu (GT); Reagan Moore, Bertram Ludaescher, Amarnath Gupta (SDSC); Mladen Vouk (NCSU); Tom Potok (ORNL); Matt Coleman (LLNL). September 2002. UCRL-PRES-???????

  2. Outline • Motivation • System architecture • Status

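  3. Motivated by the current state of the art in genomics data access. The user is required to perform all data management tasks: accessing the data, parsing each source-specific schema, mapping similar concepts, and transforming data formats for the input/output of user applications. Different users end up doing the same thing. [Diagram: sources PDB, SWISS-PROT, SCOP, and dbEST feeding user applications, with the tasks "Access the data", "Parse", "Map similar concepts", "Transform data format", and "Input/output".]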
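To make the repeated per-user effort concrete, here is a minimal sketch of the glue code each user ends up writing: parse a source-specific format, map similarly named fields onto shared concept names, and transform the result into a tool input format (FASTA). The field names in `FIELD_MAPS` and the helper names are hypothetical illustrations, not the actual PDB or SWISS-PROT schemas.

```python
"""Sketch of the manual chores from slide 3: parse a source-specific
format, map similar concepts to shared names, transform for a tool.
All field names below are hypothetical, not real source schemas."""

from typing import Dict

# Hypothetical per-source field maps: the same concept under different names.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"ID": "accession", "DE": "description", "SQ": "sequence"},
    "source_b": {"acc_num": "accession", "title": "description", "seq": "sequence"},
}


def parse_key_value(text: str, sep: str) -> Dict[str, str]:
    """Parse one record written as 'key<sep>value' lines."""
    record: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(sep)
        record[key.strip()] = value.strip()
    return record


def map_concepts(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename source-specific fields to shared names; drop unmapped fields."""
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def to_fasta(record: Dict[str, str]) -> str:
    """Transform a mapped record into FASTA text."""
    return f">{record['accession']} {record['description']}\n{record['sequence']}"


# Two differently formatted sources describing the same entry.
raw_a = "ID: P1\nDE: example protein\nSQ: MKTAYIAK"
raw_b = "acc_num=P1\ntitle=example protein\nseq=MKTAYIAK"
rec_a = map_concepts("source_a", parse_key_value(raw_a, ":"))
rec_b = map_concepts("source_b", parse_key_value(raw_b, "="))
print(to_fasta(rec_a) == to_fasta(rec_b))  # True
```

Every new source needs another parser and field map, and every user writes their own copy; that duplication is what the shared infrastructure is meant to remove.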

  4. What is a realistic environment? A single location that provides effective access to data and tools from many sources through an intuitive and useful interface. [Diagram: the same tasks ("Access the data", "Parse", "Map similar concepts", "Transform data format", "Input/output") between the data sources and the user applications.]

  5. Motivating use case: identifying model sequences. [Diagram: Matt's workflow. Components: gene name/accession #, Clusfavor, GenBank, sequence, BLAST against HTGS, homologs, filter, accession #, sequence, subsequence to 2000 bp, Transfac, Modelbuilder, model sequence. Annotation: "Hundreds of sequences". Example sequence shown: MILLAFSSGRRLDFVHRSGVFFFQTLLWILCATVCGTEQYFN]
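The use case is a chain of steps that Matt currently runs by hand over hundreds of sequences. The sketch below shows how such a chain can be expressed as composable steps that a workflow engine could run. Only the general shape (look up sequences, filter, take a subsequence of at most 2000 bp) comes from the slide; `FAKE_GENBANK`, the length-based filter, and keeping the leading bases are hypothetical stand-ins, and no real GenBank, BLAST, or Transfac service is called.

```python
"""Sketch: part of the model-sequence use case as a chain of steps.
All step bodies are hypothetical stand-ins for real services."""

from functools import reduce
from typing import Callable, Dict, List

Step = Callable[[List[str]], List[str]]

# Hypothetical in-memory stand-in for a GenBank lookup: accession -> sequence.
FAKE_GENBANK: Dict[str, str] = {
    "ACC1": "ACGT" * 600,  # 2400 bp
    "ACC2": "ACGT" * 100,  # 400 bp
}


def fetch_sequences(accessions: List[str]) -> List[str]:
    """Resolve accession numbers to sequences, skipping unknown ones."""
    return [FAKE_GENBANK[a] for a in accessions if a in FAKE_GENBANK]


def filter_min_length(min_len: int) -> Step:
    """Build a filter step; a length threshold is an assumed criterion."""
    return lambda seqs: [s for s in seqs if len(s) >= min_len]


def subseq(max_len: int = 2000) -> Step:
    """Trim each sequence to at most max_len bases (keeps the leading bases,
    which is an assumption)."""
    return lambda seqs: [s[:max_len] for s in seqs]


def run_workflow(inputs: List[str], steps: List[Step]) -> List[str]:
    """Feed each step's output into the next step, left to right."""
    return reduce(lambda data, step: step(data), steps, inputs)


result = run_workflow(
    ["ACC1", "ACC2", "MISSING"],
    [fetch_sequences, filter_min_length(1000), subseq(2000)],
)
print([len(s) for s in result])  # [2000]
```

Because every step has the same list-in, list-out signature, steps can be reordered or swapped for real service calls without changing the runner.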

  6. SDM Center Data Integration Infrastructure. [Diagram: the user (Matt) reaches multiple data sources, both databases (DB) and programs, through a data source interface.]

  7. SDM Center Data Integration Infrastructure. [Diagram: adds data integration agent(s), service registry and brokering, a workflow agent, and a communication protocol gateway to the user (Matt), data source interface, and data sources of slide 6.]
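The "service registry and brokering" component suggests that agents find data sources by capability rather than by hard-coded address. A minimal sketch of that idea, assuming a registry keyed by capability name and a broker that falls back to the next registered provider when one fails; `ServiceRegistry`, `register`, `broker`, and the capability names are hypothetical, not the SDM Center's actual interfaces.

```python
"""Sketch of capability-based service registry and brokering.
All class, method, and capability names are hypothetical."""

from dataclasses import dataclass, field
from typing import Callable, Dict, List

Service = Callable[[str], str]


@dataclass
class ServiceRegistry:
    """Maps a capability name to the providers registered for it."""
    providers: Dict[str, List[Service]] = field(default_factory=dict)

    def register(self, capability: str, service: Service) -> None:
        self.providers.setdefault(capability, []).append(service)

    def broker(self, capability: str, request: str) -> str:
        """Try providers in registration order; fall back to the next on failure."""
        errors: List[str] = []
        for service in self.providers.get(capability, []):
            try:
                return service(request)
            except Exception as exc:  # a real broker would be more selective
                errors.append(f"{service.__name__}: {exc}")
        raise LookupError(f"no provider could serve {capability!r}: {errors}")


def mirror_down(accession: str) -> str:
    """Hypothetical provider that is currently unreachable."""
    raise ConnectionError("timeout")


def primary(accession: str) -> str:
    """Hypothetical provider that answers the request."""
    return f"sequence for {accession}"


reg = ServiceRegistry()
reg.register("sequence_lookup", mirror_down)
reg.register("sequence_lookup", primary)
print(reg.broker("sequence_lookup", "ACC1"))  # sequence for ACC1
```

Keeping the lookup behind a capability name lets a workflow agent name what it needs while the broker decides which source answers.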

  8. SDM Center Data Integration Infrastructure. [Diagram: adds database access, program interfacing, other I/O agents, and other agents (e.g., VIPAR) to the components of slide 7.]

  9. SDM Center Data Integration Infrastructure. [Diagram: adds XML wrappers and wrapper-based agents at the data sources, plus the wrapper-generation components XWRAP (GUI, code generator), extraction rules, and human knowledge.]
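An XML wrapper turns a source's semi-structured output into XML that other components can consume, and the slide shows such wrappers being generated from extraction rules. The sketch below illustrates the general idea only: rules written as regular expressions, applied to a text page, each producing one XML element. The `RULES` dictionary, the `wrap` function, and the sample page are hypothetical; XWRAP's own rule format and generated code are not shown in the slides and are not reproduced here.

```python
"""Sketch of a rule-driven XML wrapper (not XWRAP's actual rule format).
The rules and sample page are hypothetical."""

import re
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical extraction rules: XML element name -> regex with one group.
RULES: Dict[str, str] = {
    "accession": r"Accession:\s*(\S+)",
    "organism": r"Organism:\s*(.+)",
    "length": r"Length:\s*(\d+)",
}


def wrap(page: str, rules: Dict[str, str], root_tag: str = "entry") -> ET.Element:
    """Apply each rule to the page; emit one child element per match."""
    root = ET.Element(root_tag)
    for tag, pattern in rules.items():
        match = re.search(pattern, page)
        if match:  # fields the page lacks are simply omitted
            ET.SubElement(root, tag).text = match.group(1).strip()
    return root


page = """Accession: U12345
Organism: Homo sapiens
Length: 1520 bp"""
print(ET.tostring(wrap(page, RULES), encoding="unicode"))
# <entry><accession>U12345</accession><organism>Homo sapiens</organism><length>1520</length></entry>
```

Separating the rules from the wrapping code is what makes wrapper generation possible: a person supplies the rules, and the same machinery produces a wrapper for each source.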

  10. SDM Center Data Integration Infrastructure. [Diagram: adds an executable workflow plan ("Matt's WF"), XML wrappers on each data source, and data mediation to the components of slide 9.]
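Once every source is behind an XML wrapper, data mediation can combine their outputs into a single view. A minimal sketch, assuming each wrapper emits `<entry>` elements and that records from different sources can be joined on a shared `accession` field; the element names, the join key, and the "first source wins" merge policy are all assumptions for illustration.

```python
"""Sketch of mediating XML wrapper outputs into one view.
Element names, join key, and merge policy are assumptions."""

import xml.etree.ElementTree as ET
from typing import Dict, List

Record = Dict[str, str]


def records(xml_text: str) -> List[Record]:
    """Flatten each <entry> element's children into a dict."""
    root = ET.fromstring(xml_text)
    return [{child.tag: (child.text or "") for child in entry}
            for entry in root.iter("entry")]


def mediate(*sources: List[Record], key: str = "accession") -> Dict[str, Record]:
    """Merge records on a shared key; a later source only fills fields
    that earlier sources did not supply."""
    merged: Dict[str, Record] = {}
    for source in sources:
        for rec in source:
            if key not in rec:
                continue  # cannot be joined; skipped in this sketch
            target = merged.setdefault(rec[key], {})
            for name, value in rec.items():
                target.setdefault(name, value)
    return merged


seq_xml = "<results><entry><accession>U1</accession><sequence>ACGT</sequence></entry></results>"
ann_xml = "<results><entry><accession>U1</accession><organism>Homo sapiens</organism></entry></results>"
view = mediate(records(seq_xml), records(ann_xml))
print(view)  # {'U1': {'accession': 'U1', 'sequence': 'ACGT', 'organism': 'Homo sapiens'}}
```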

  11. SDM Center Data Integration Infrastructure. [Diagram: adds the workflow planning layer. A user agent submits a Parameterized Workflow Specification (PWS) together with user constraints and parameters to the Workflow Resolution Service (WRS), which consults source capabilities (SC), binding patterns, and a domain map/ontology. If the workflow is infeasible, the WRS reports the reason; if it is feasible, the Workflow Instantiation Service (WIS) produces the executable workflow plan ("Matt's WF"). Data registration services handle registration. The components of slide 10 appear below this layer.]
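The Workflow Resolution Service decides whether a parameterized workflow is feasible given source capabilities and binding patterns, and reports a reason when it is not. A minimal sketch of one way such a check can work, assuming each step declares which attributes must already be bound on input and which it binds on output, and that steps run in the given order; `StepCapability`, `check_feasible`, and the attribute names are hypothetical, and the sketch does not model step reordering or the domain map/ontology.

```python
"""Sketch of a binding-pattern feasibility check for a workflow.
Names are hypothetical; steps are checked in the order given."""

from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class StepCapability:
    """A source's binding pattern: inputs that must be bound, outputs it binds."""
    name: str
    requires: FrozenSet[str]
    produces: FrozenSet[str]


def check_feasible(steps: List[StepCapability],
                   bound: Set[str]) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if each step's inputs are bound when it runs,
    otherwise (False, reason) naming the first blocked step."""
    available = set(bound)  # attributes supplied by the user's parameters
    for step in steps:
        missing = step.requires - available
        if missing:
            return False, f"{step.name}: unbound input(s) {sorted(missing)}"
        available |= step.produces
    return True, None


wf = [
    StepCapability("sequence_lookup", frozenset({"accession"}), frozenset({"sequence"})),
    StepCapability("homolog_search", frozenset({"sequence"}), frozenset({"homologs"})),
    StepCapability("motif_search", frozenset({"sequence", "matrix_set"}), frozenset({"motifs"})),
]
print(check_feasible(wf, {"accession"}))
# (False, "motif_search: unbound input(s) ['matrix_set']")
print(check_feasible(wf, {"accession", "matrix_set"}))
# (True, None)
```

Returning the first blocked step and its missing inputs mirrors the slide's "WF infeasible: report reason" branch, so the user can see which parameter to supply.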

  12. Status • Focus has been on developing a prototype of Matt's workflow • Demonstrate basic infrastructure functionality • Provide a useful tool for Matt to use in his research efforts • Fleshed out the details of the architecture • Interconnections between components better defined • We have a prototype of that system in place • Wrappers generated with XWRAP by GT • Combined into a coherent workflow by SDSC • Workflow-based interface completed by NCSU • The following presentations will go into more detail about what has been accomplished and what our current tasks are

  13. Questions?

  14. People • LLNL: Terence Critchlow (lead) • Georgia Tech: Calton Pu, Ling Liu, David Buttler, Dan Rocco, Henrique Paques, Wei Han • SDSC: Bertram Ludaescher, Amarnath Gupta, Ilkay Altintas • Agent Technology: Tom Potok (ORNL), Joel Reid (ORNL), Mladen Vouk (NCSU), Munindar Singh (NCSU), Sandeep Chandra (NCSU), Zhengang Cheng (NCSU), Sangeeta Bhagwanani (NCSU) • Target Users: Matt Coleman (LLNL), Allen Christian (LLNL), Phil Bourne (PDB)

  15. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48.
