Supporting Heterogeneous Data Access in Genomics
Terence Critchlow
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
March 2002
Outline
• Motivation
• Approach
• Specific use cases
• Introduction to others
Motivated by current state of the art in genomics data access
• The user is required to perform all data management tasks.
• Different users end up doing the same thing.
[Figure: sources with source-specific schemas (PDB, SWISS-PROT, SCoP, dbEST) feeding user applications, which must access the data, parse input/output, transform data format, and map similar concepts]
What is the ideal environment?
A single location that provides effective access to a consistent view of data and tools from many sources through an intuitive and useful interface.
[Figure: user applications and the data management tasks: access the data, parse input/output, transform data format, map similar concepts]
What is a realistic environment?
A single location that provides effective access to a consistent view of data and tools from many sources through an intuitive and useful interface.
[Figure: user applications and the data management tasks: access the data, parse input/output, transform data format, map similar concepts]
SDM Center Data Integration Infrastructure
[Figure: architecture diagram. Labels: user (Matt); GUI; External Tools; Query Dispatch and Collection (QDaC); Model-Based Mediator; Metadata Registry; XWrap; Semantic Wrappers; XPath Wrappers; VIPAR Wrapper; sources including PDB, DF, and Medline]
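The wrapper and mediator roles named in the diagram can be illustrated with a minimal sketch. This is not the SDM Center implementation: the `Record` schema, the `SourceWrapper` and `Mediator` classes, and the sample rows are all hypothetical, and the sources are in-memory lists rather than real databases. It only shows the general pattern in which each wrapper hides a source-specific layout behind a common view and a mediator dispatches one query to every wrapper and collects the results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    """Common view of a hit, shared by all wrappers (hypothetical schema)."""
    source: str
    accession: str
    description: str


class SourceWrapper:
    """Hides one source's native layout behind a common query interface."""

    def __init__(self, name: str, rows: List[dict],
                 to_record: Callable[[dict], Record]) -> None:
        self.name = name
        self._rows = rows            # in-memory stand-in for a remote source
        self._to_record = to_record  # source-specific row -> common Record

    def query(self, keyword: str) -> List[Record]:
        records = [self._to_record(row) for row in self._rows]
        return [r for r in records if keyword.lower() in r.description.lower()]


class Mediator:
    """Dispatches one query to every registered wrapper and collects results."""

    def __init__(self) -> None:
        self._wrappers: Dict[str, SourceWrapper] = {}

    def register(self, wrapper: SourceWrapper) -> None:
        self._wrappers[wrapper.name] = wrapper

    def query(self, keyword: str) -> List[Record]:
        results: List[Record] = []
        for wrapper in self._wrappers.values():
            results.extend(wrapper.query(keyword))
        return results


# Made-up rows in two different source-specific layouts.
pdb_rows = [{"id": "X1", "title": "Example kinase structure"}]
medline_rows = [
    {"pmid": "M1", "abstract": "A study of kinase signalling"},
    {"pmid": "M2", "abstract": "Unrelated topic"},
]

mediator = Mediator()
mediator.register(SourceWrapper(
    "PDB", pdb_rows, lambda r: Record("PDB", r["id"], r["title"])))
mediator.register(SourceWrapper(
    "Medline", medline_rows,
    lambda r: Record("Medline", r["pmid"], r["abstract"])))

hits = mediator.query("kinase")
for hit in hits:
    print(hit.source, hit.accession, hit.description)
```

Adding a source in this sketch means writing one more row-to-`Record` mapping and registering it; the caller's query does not change.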
Use case 1: Find everything related to a sequence
MILLAFSSGRRLDFVHRSGVFFFQTLLWILCATVCGTEQYFN
• The more sources queried, the more valuable the results
• Unfortunately, Matt cannot query all of the relevant data sources
• Provide access to many more sources than Matt currently has
[Figure: Matt querying Blast and other sources]
Use case 1: Find everything related to a sequence
Additional Desired Capabilities
• Handle hundreds of sequences
• Search using other tools
• Preprocess sequence(s)
• Use results as input to other tools and queries
[Figure: Matt querying Blast and other sources]
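The desired capabilities above (batches of sequences, preprocessing, and feeding one tool's results into the next) amount to composing steps into a pipeline. The sketch below is one way to express that idea, not the project's design: `compose`, `preprocess`, `fake_search`, and `deduplicate` are hypothetical names, and `fake_search` is a placeholder that fabricates hit IDs instead of calling Blast or any real tool.

```python
from typing import Callable, Iterable, List

# A step takes a batch of strings and returns a batch of strings.
Step = Callable[[List[str]], List[str]]


def compose(steps: Iterable[Step]) -> Step:
    """Build one callable that feeds each step's output into the next step."""
    step_list = list(steps)

    def run(items: List[str]) -> List[str]:
        for step in step_list:
            items = step(items)
        return items

    return run


def preprocess(sequences: List[str]) -> List[str]:
    """Normalise input: strip whitespace, upper-case, drop empty entries."""
    cleaned = [s.strip().upper() for s in sequences]
    return [s for s in cleaned if s]


def fake_search(sequences: List[str]) -> List[str]:
    """Placeholder for a search tool: one made-up hit ID per sequence."""
    return [f"hit-for-{s[:8]}" for s in sequences]


def deduplicate(hits: List[str]) -> List[str]:
    """Keep the first occurrence of each hit, preserving order."""
    return list(dict.fromkeys(hits))


pipeline = compose([preprocess, fake_search, deduplicate])
batch = [" millafss ", "MILLAFSS", "", "GTEQYFN"]
results = pipeline(batch)
print(results)
```

Because every step shares the same batch-in, batch-out signature, swapping the search tool or adding a further query only changes the list passed to `compose`.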
Use case 2: Identifying model sequences
MILLAFSSGRRLDFVHRSGVFFFQTLLWILCATVCGTEQYFN
• Hundreds of sequences
[Figure: workflow diagram for Matt. Labels: Clusfavor; Gene name / accession #; Genbank; Sequence; Blast against HTGS; Homologs; Filter; Accession #; Subseq to 2000bp; Transfac; Modelbuilder; Model sequence]
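One step in this workflow is labelled "Subseq to 2000bp". The slide does not say how the subsequence is chosen, so the sketch below assumes the simplest reading: keep at most the first 2000 bases of each sequence. The function names, accession keys, and sample sequences are hypothetical.

```python
from typing import Dict

MAX_BASES = 2000  # from the "Subseq to 2000bp" label; the rule itself is assumed


def truncate(sequence: str, max_bases: int = MAX_BASES) -> str:
    """Return at most the first max_bases characters of a sequence."""
    return sequence[:max_bases]


def truncate_all(sequences: Dict[str, str],
                 max_bases: int = MAX_BASES) -> Dict[str, str]:
    """Apply truncate to every sequence, keyed by accession."""
    return {acc: truncate(seq, max_bases) for acc, seq in sequences.items()}


# Made-up input: one sequence longer than the limit, one shorter.
sequences = {"acc-1": "ACGT" * 600, "acc-2": "ACGT" * 10}
trimmed = truncate_all(sequences)
print({acc: len(seq) for acc, seq in trimmed.items()})
```

If the real step selects a region relative to a feature rather than the start of the sequence, only `truncate` would need to change.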
Summary
• Matt’s current research objectives focus on Use Case 2
• That is our current target
• Details of current status in following talks
  • Context-sensitive Service Composition for Support of Scientific Workflows (Mladen A. Vouk)
  • XWRAPComposer: A wrapper generation system for Integrating Bioinformatics Data Sources (Ling Liu)
  • Constructing Workflows by Integrating Interactive Information Sources (Amarnath Gupta)
People
• LLNL: Terence Critchlow (lead)
• Georgia Tech: Calton Pu, Ling Liu, David Buttler, Dan Rocco, Henrique Paques, Wei Han
• SDSC: Bertram Ludaescher, Amarnath Gupta, Ilkay Altintas
• Agent Technology: Tom Potok (ORNL), Mladen Vouk (NCSU)
• Target Users: Matt Coleman (LLNL), Allen Christian (LLNL), Phil Bourne (PDB)
This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48.