
New (Applications of) Compiler Techniques for Data Grids

Gagan Agrawal. Outline: Automatic Data Virtualization (SQL Implementation, XML/XQuery); Automatic Wrapper Generation (Data Integration in Bioinformatics); Compiling XML Query Language XQuery (Issues with streaming data).


Presentation Transcript


  1. New (Applications of) Compiler Techniques for Data Grids
     Gagan Agrawal

  2. Outline
     • Automatic Data Virtualization
     • SQL Implementation
     • XML/XQuery
     • Automatic Wrapper Generation
     • Data Integration in Bioinformatics
     • Compiling XML Query Language XQuery
     • Issues with streaming data

  3. Data Virtualization
     [Diagram labels: dataset, Data Virtualization, Data Service, "An abstract view of data"]
     -- Scientific data being shared on Web/Grids
     -- Low-level layouts
     -- Need for efficient storage and processing

  4. Our Approach: Automatic Data Virtualization
     • Automatically create data services
     • A new application of compiler technology
     • A meta-data descriptor describes the layout of data in a repository
     • An abstract view is exposed to the users
     • Two implementations:
       • Relational/SQL-based (HPDC 2004, LCPC 2004)
       • XML/XQuery-based (ICS 2003, LCPC 2003)

  5. SQL/Relational Implementation
     SELECT <Data Elements>
     FROM <Dataset Name>
     WHERE … AND Filter(<Data Element>);
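To make the query template above concrete, here is a minimal Python sketch of what a relational view over a low-level layout could look like. It is not the generated data service from the talk: the record layout (three 32-bit floats), the field names, and the `scan` and `select` helpers are all assumptions made for illustration.

```python
import struct

# Hypothetical layout: each record is three little-endian 32-bit floats.
FIELDS = ("x", "y", "pressure")
RECORD = struct.Struct("<3f")

def scan(raw):
    """Expose the raw bytes as a stream of named rows (the abstract view)."""
    for values in RECORD.iter_unpack(raw):
        yield dict(zip(FIELDS, values))

def select(raw, columns, where):
    """Roughly: SELECT columns FROM dataset WHERE where(row)."""
    return [tuple(row[c] for c in columns) for row in scan(raw) if where(row)]

# Build a small in-memory "file" so the sketch is self-contained.
raw = b"".join(
    RECORD.pack(*r) for r in [(0.0, 0.0, 1.5), (1.0, 0.0, 3.0), (2.0, 1.0, 4.5)]
)

# The lambda plays the role of the Filter(...) predicate in the template.
rows = select(raw, ["x", "pressure"], lambda r: r["pressure"] > 2.0)
print(rows)
```

The user of `select` never sees byte offsets or record sizes, which is the point of the abstract view; only `scan` knows the layout.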

  6. XML/XQuery Implementation
     [Diagram labels: XQuery, ???, XML; underlying formats: HDF5, NetCDF, TEXT, RMDB, …]

  7. Approach / Contributions
     • Use of XML Schemas to provide high-level abstractions on complex datasets
     • Using XQuery with these Schemas to specify processing
     • Issues in translation
       • High-level to low-level code
       • Data-centric transformations for locality in low-level codes
     • Issues specific to XQuery
       • Recognizing recursive reductions
       • Type inferencing and translation
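As an illustration of the "recursive reductions" bullet: a functional language such as XQuery typically expresses an aggregate as a recursive function, and a compiler that recognizes the pattern can replace it with a single accumulating loop. The Python sketch below only shows the two equivalent forms; the function names are hypothetical, and it does not reproduce the recognition algorithm from the talk.

```python
def total_recursive(values):
    """Functional style: combine the head with the reduction of the tail."""
    if not values:
        return 0
    return values[0] + total_recursive(values[1:])

def total_iterative(values):
    """What a compiler could emit once it recognizes the reduction:
    one pass over the data with a single accumulator."""
    acc = 0
    for v in values:
        acc = acc + v
    return acc

data = [1, 2, 3, 4]
print(total_recursive(data), total_iterative(data))
```

The rewrite is safe here because integer addition is associative; a reduction with a non-associative combining operation would need more care.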

  8. Wrappers
     • Goal: to provide the integration system with transparent access to data sources
     • Challenges
       • Development cost
       • Performance
         • Scripting languages can be slow
       • Updates
         • Data formats can change frequently

  9. Our Approach
     • Machine-interpretable metadata
     • A layout descriptor associated with each dataset
     • Wrappers generated on the fly
     • Applied to several bioinformatics examples

  10. Layout Descriptor
      DATASET “FASTAData” {
        DATATYPE {FASTA}
        DATASPACE LINESIZE=80 {
          LOOP ENTRY 1:EOF:1 {
            “>” ID “ “ DESCRIPTION
            < “\n” SEQ > “\n” | EOF
          }
        }
        DATA {osu/fasta}
      }
      [Callouts on the descriptor: dataset name, schema name, file layout, file location]

      Sample file:
      >Example1 envelope protein
      ELRLRYCAPAGFALLKCNDA DYDGFKTNCSNVSVVHCTNL
      MNTTVTTGLLLNGSYSENRT QIWQKHRTSNDSALILLNKH
      >Example2 synthetic peptide
      HITREPLKHIPKERYRGTNDT…
      [Callouts on the sample: ID, DESCRIPTION, SEQ]
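The layout above (a ">" marker, an ID, a space, a DESCRIPTION, then SEQ lines until the next entry or EOF) corresponds to a reader along the following lines. This is a hand-written Python sketch of the kind of logic a wrapper would need, not the code the system generates; the function name `read_fasta` and the toy input are assumptions.

```python
def read_fasta(lines):
    """Yield (id, description, [sequence lines]) for each entry."""
    entry_id = None
    description = ""
    seqs = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if entry_id is not None:
                yield entry_id, description, seqs
            # Header is ">" ID " " DESCRIPTION: split at the first space.
            entry_id, _, description = line[1:].partition(" ")
            seqs = []
        elif line and entry_id is not None:
            seqs.append(line)
    # EOF closes the last entry.
    if entry_id is not None:
        yield entry_id, description, seqs

# Toy input (made up for this sketch, not taken from the slide).
sample = ">seq1 first toy entry\nACGT\nTTGA\n>seq2 second toy entry\nGGCC\n"
entries = list(read_fasta(sample.splitlines()))
print(entries)
```

Writing such a reader by hand for every format is the development cost the wrapper slides refer to; deriving it from the descriptor is what makes format changes cheap to absorb.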

  11. XQuery on Streaming Data
      • Infinite data streams
      • All processing must be single pass
      • Interesting compiler questions:
        • How can code be transformed to execute in a single pass?
        • How can we tell whether it can be executed correctly in a single pass?
      • Addressed this problem for XML streams and the XML query language XQuery
      • Appears in VLDB 2005
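To show what "single pass" means in practice, the sketch below computes an average over XML elements while seeing each element exactly once and keeping only two running values. It uses Python's standard `xml.etree.ElementTree.iterparse`; the element name `value`, the helper `streaming_average`, and the small finite document standing in for a stream are assumptions, and this is not the compilation technique from the VLDB 2005 paper.

```python
import io
import xml.etree.ElementTree as ET

def streaming_average(source, tag="value"):
    """Average the numeric text of every <tag> element in one pass."""
    total = 0.0
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            total += float(elem.text)
            count += 1
        # Drop the element's text and children once it has been consumed.
        elem.clear()
    return total / count if count else None

# A finite document stands in for the stream in this sketch.
doc = b"<readings><value>2.0</value><value>4.0</value><value>9.0</value></readings>"
result = streaming_average(io.BytesIO(doc))
print(result)
```

An average works in one pass because it decomposes into a running sum and count. A query that needs to revisit earlier elements (for example, comparing every element against a value that only appears at the end) does not, which is the kind of distinction the second compiler question is about.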
