This paper presents a system architecture for supporting heterogeneous data access in genomics research, focusing on the motivation, implementation status, and key components of the system. The goal is to provide an intuitive and effective interface for accessing and managing data from multiple sources in genomics research.
Supporting Heterogeneous Data Access in Genomics
Terence Critchlow; Ling Liu, Calton Pu (GT); Reagan Moore, Bertram Ludaescher, Amarnath Gupta (SDSC); Mladen Vouk (NCSU); Tom Potok (ORNL); Matt Coleman (LLNL)
September 2002 UCRL-PRES-???????
Outline • Motivation • System architecture • Status
Motivated by the current state of the art in genomics data access
• The user is required to perform all data management tasks.
• Different users end up doing the same thing.
[Figure: the sources PDB, SWISS-PROT, SCoP, and dbEST, each with a source-specific schema, feed user applications. Tasks shown: access the data, parse input/output, transform data format, map similar concepts.]
What is a realistic environment?
A single location that provides effective access to data and tools from many sources through an intuitive and useful interface.
[Figure: user applications, with the tasks access the data, parse input/output, transform data format, and map similar concepts.]
Motivating use case: Identifying model sequences
[Figure: workflow diagram for Matt. Labels: Gene name / accession #; Genbank; Sequence; Blast against HTGS; Homologs; Filter; Hundreds of sequences; Subseq to 2000bp; Transfac; Clusfavor; Modelbuilder; Model sequence. Example sequence shown: MILLAFSSGRRLDFVHRSGVFFFQTLLWILCATVCGTEQYFN]
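The use case above chains several processing steps over a list of sequences. A minimal sketch of that chaining idea follows. It makes no claim about how the actual tools work: `keep_if`, `truncate_to`, and `run_workflow` are hypothetical names, the minimum-length filter is an invented criterion, and treating "Subseq to 2000bp" as "keep the first 2000 characters" is only one possible reading of the figure label.

```python
from typing import Callable, List

# A step takes a list of sequences and returns a new list of sequences.
Step = Callable[[List[str]], List[str]]


def keep_if(predicate: Callable[[str], bool]) -> Step:
    """Build a filter step that keeps only sequences satisfying predicate."""
    def step(seqs: List[str]) -> List[str]:
        return [s for s in seqs if predicate(s)]
    return step


def truncate_to(max_len: int) -> Step:
    """Build a step that keeps at most the first max_len characters of each sequence."""
    def step(seqs: List[str]) -> List[str]:
        return [s[:max_len] for s in seqs]
    return step


def run_workflow(steps: List[Step], seqs: List[str]) -> List[str]:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        seqs = step(seqs)
    return seqs


# Hypothetical workflow: drop very short sequences, then cut to 2000 characters.
workflow = [keep_if(lambda s: len(s) >= 10), truncate_to(2000)]
result = run_workflow(workflow, ["ACGT" * 600, "ACG"])
```

Keeping every step behind the same list-in, list-out signature is what lets steps be reordered or replaced without touching the rest of the chain.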
SDM Center Data Integration Infrastructure
[Figure: architecture diagram, first build. Labels: User (Matt); Data Source Interface; Data Sources (several data sources, a DB, a program).]
SDM Center Data Integration Infrastructure
[Figure: architecture diagram, second build. Labels: User (Matt); Workflow Agent; Data Integration Agent(s); Service registry and brokering; Communication Protocol Gateway; Data Source Interface; Data Sources (several data sources, a DB, a program).]
SDM Center Data Integration Infrastructure
[Figure: architecture diagram, third build. Labels: User (Matt); Workflow Agent; Data Integration Agent(s); Service registry and brokering; Communication Protocol Gateway; Database Access; Program Interfacing; Other I/O Agents; Other Agents (e.g., VIPAR); Data Source Interface; Data Sources (several data sources, a DB, a program).]
SDM Center Data Integration Infrastructure
[Figure: architecture diagram, fourth build. Labels: User (Matt); Workflow Agent; Data Integration Agent(s); Service registry and brokering; Communication Protocol Gateway; Database Access; Program Interfacing; Other I/O Agents; Other Agents (e.g., VIPAR); Wrapper based Agents; XML Wrappers; XWRAP GUI; Code Generator; Extraction Rules; Human Knowledge; Data Source Interface; Data Sources (several data sources, a DB, a program).]
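The diagram pairs each data source with an XML wrapper built from extraction rules. As a rough illustration of that idea, the sketch below applies a small set of rules to raw text and emits an XML record. It is not XWRAP's rule format or generated code: the `RULES` dictionary, the `wrap` function, the regular expressions, and the sample record are all made up for this example.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical extraction rules: field name -> regex with one capture group.
RULES: Dict[str, str] = {
    "accession": r"ACCESSION\s+(\S+)",
    "organism": r"ORGANISM\s+(.+)",
}


def wrap(raw_text: str, rules: Dict[str, str] = RULES) -> ET.Element:
    """Apply each extraction rule to raw_text and collect the matches as XML."""
    record = ET.Element("record")
    for field, pattern in rules.items():
        match = re.search(pattern, raw_text)
        if match:  # fields whose rule does not match are simply omitted
            ET.SubElement(record, field).text = match.group(1).strip()
    return record


# Invented sample input, loosely shaped like a flat-file database entry.
sample = "ACCESSION   X00001\n  ORGANISM  Homo sapiens\n"
xml_string = ET.tostring(wrap(sample), encoding="unicode")
```

Because every wrapper emits the same kind of XML record, downstream components can consume any source without knowing its native format.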
SDM Center Data Integration Infrastructure
[Figure: architecture diagram, fifth build. Labels: Executable Workflow Plan: “Matt’s WF”; Data Mediation; User (Matt); Workflow Agent; Data Integration Agent(s); Service registry and brokering; Communication Protocol Gateway; Database Access; Program Interfacing; Other I/O Agents; Other Agents (e.g., VIPAR); Wrapper based Agents; XML Wrappers; XWRAP GUI; Code Generator; Extraction Rules; Human Knowledge; Data Source Interface; Data Sources (several data sources, a DB, a program).]
SDM Center Data Integration Infrastructure
[Figure: architecture diagram, final build. Workflow planning labels: User Agent; Parameterized Workflow Specification (PWS); User constraints & parameters; Source Capabilities (SC); Binding Patterns; Domain Map/Ontology; Workflow Resolution Service (WRS), with outcomes "WF infeasible: report reason" and "WF feasible"; Workflow Instantiation Service (WIS); Data Registration Services; Registration; DB; Executable Workflow Plan: “Matt’s WF”. Execution labels: Data Mediation; User (Matt); Workflow Agent; Data Integration Agent(s); Service registry and brokering; Communication Protocol Gateway; Database Access; Program Interfacing; Other I/O Agents; Other Agents (e.g., VIPAR); Wrapper based Agents; XML Wrappers; XWRAP GUI; Code Generator; Extraction Rules; Human Knowledge; Data Source Interface; Data Sources (several data sources, a DB, a program).]
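The final diagram shows a Workflow Resolution Service that takes source capabilities and binding patterns and either declares the workflow feasible or reports why it is not. A minimal sketch of such a check is given below, under the assumption that a binding pattern can be modelled as "inputs that must already be bound" and "outputs the source then provides". The `SourceCapability` class, the `resolve` function, and the specific patterns given to Genbank and Blast are illustrative assumptions, not the project's actual design.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class SourceCapability:
    """Hypothetical binding pattern: inputs a source needs and outputs it yields."""
    name: str
    requires: FrozenSet[str]
    provides: FrozenSet[str]


def resolve(steps: List[SourceCapability],
            user_params: Set[str]) -> Tuple[bool, Optional[str]]:
    """Check that each step's required inputs are bound before the step runs.

    Returns (True, None) if the workflow is feasible, otherwise
    (False, reason) naming the first step with unbound inputs.
    """
    bound = set(user_params)
    for step in steps:
        missing = step.requires - bound
        if missing:
            return False, f"{step.name} needs unbound inputs: {sorted(missing)}"
        bound |= step.provides  # outputs become available to later steps
    return True, None


# Assumed binding patterns, for illustration only.
genbank = SourceCapability("Genbank", frozenset({"accession"}), frozenset({"sequence"}))
blast = SourceCapability("Blast", frozenset({"sequence"}), frozenset({"homologs"}))

feasible = resolve([genbank, blast], {"accession"})
infeasible = resolve([blast], {"accession"})
```

Here `feasible` reports success because Genbank supplies the sequence that Blast requires, while `infeasible` carries a reason string that could be passed back to the user.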
Status
• Focus has been on developing a prototype of Matt’s workflow
• Demonstrate basic infrastructure functionality
• Provide a useful tool for Matt to use in his research efforts
• Fleshed out the details of the architecture
• Interconnections between components better defined
• We have a prototype of that system in place
• Wrappers generated from XWRAP by GT
• Combined into a coherent workflow by SDSC
• Workflow-based interface completed by NCSU
• The following presentations will go into more detail about what has been accomplished and what our current tasks are
People
• LLNL: Terence Critchlow (lead)
• Georgia Tech: Calton Pu, Ling Liu, David Buttler, Dan Rocco, Henrique Paques, Wei Han
• SDSC: Bertram Ludaescher, Amarnath Gupta, Ilkay Altintas
• Agent Technology: Tom Potok (ORNL), Joel Reid (ORNL), Mladen Vouk (NCSU), Munindar Singh (NCSU), Sandeep Chandra (NCSU), Zhengang Cheng (NCSU), Sangeeta Bhagwanani (NCSU)
• Target Users: Matt Coleman (LLNL), Allen Christian (LLNL), Phil Bourne (PDB)
This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48.