
Data formats in e-Science



Presentation Transcript


  1. Data formats in e-Science
     • Two key requirements:
       • Interoperability and scalability
     • XML is flexible, but verbose
     • Binary formats are compact, but specific
       • e.g. the VOTable vs FITS issue in the VO
     • Two possible solutions:
       • BinX, from the edikt project at NeSC
       • VX, from the School of Informatics

  2. BinX: Binary in XML
     • A language:
       • Uses XML to describe the data types and structures in a binary data file
     • A library:
       • For manipulating XML and binary files
     • BinX files:
       • SchemaBinX: XML descriptor of a binary file
       • DataBinX: SchemaBinX + data values
     • BinX will allow you to interact with a binary file as if it were XML, e.g. to run XPath queries
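The idea on this slide, an XML descriptor that tells a library how to read a binary file, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `<record>`/`<field>` descriptor, the type names, and the helper functions are hypothetical and are not real SchemaBinX syntax or the BinX library API.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical, simplified descriptor. This is NOT real SchemaBinX syntax.
SCHEMA = """
<record byteOrder="little">
  <field name="id" type="int32"/>
  <field name="ra" type="float64"/>
  <field name="dec" type="float64"/>
</record>
"""

# Map the descriptor's (made-up) type names to struct format codes.
TYPE_CODES = {"int32": "i", "float32": "f", "float64": "d"}


def build_struct(schema_xml):
    """Turn the XML descriptor into a struct.Struct plus the field names."""
    root = ET.fromstring(schema_xml)
    prefix = "<" if root.get("byteOrder") == "little" else ">"
    fields = root.findall("field")
    fmt = prefix + "".join(TYPE_CODES[f.get("type")] for f in fields)
    return struct.Struct(fmt), [f.get("name") for f in fields]


def binary_to_xml(schema_xml, payload):
    """Wrap each fixed-size binary record in XML elements named by the descriptor."""
    record, names = build_struct(schema_xml)
    dataset = ET.Element("dataset")
    for values in record.iter_unpack(payload):
        row = ET.SubElement(dataset, "row")
        for name, value in zip(names, values):
            ET.SubElement(row, name).text = repr(value)
    return dataset


# Two sample records packed as raw bytes (made-up values).
payload = struct.pack("<idd", 1, 10.5, -3.25) + struct.pack("<idd", 2, 200.0, 45.0)

dataset = binary_to_xml(SCHEMA, payload)

# Once the data is exposed as XML, a path query works on it.
ra_values = [float(e.text) for e in dataset.findall("./row/ra")]
```

Here the binary payload never needs a format-specific reader: changing the descriptor is enough to read a different record layout, which is the interoperability argument the slide makes.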

  3. Astronomical testbed: format conversion with BinX

  4. …and back again
     • This way is harder, due to the ASCII text in the FITS header.

  5. More about BinX
     • Download the BinX code & play with it
       • See www.edikt.org/binx
     • After BinX comes DFDL ("daffodil")
       • Data Format Description Language
       • Developing through a GGF Working Group
       • BinX might morph into a DFDL implementation
     • Basic idea behind DFDL:
       • We can't have a single data format, but we can have a single way of describing data formats

  6. VX: Vectorizing XML
     • XML seems especially verbose for data files with simple, repeating structures
       • e.g. VOTable: lots of <TR>s and <TD>s
     • Vectorize it:
       • Decompose the XML document into a skeleton describing the structure and vectors containing the data values
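The decomposition described above can be illustrated with a minimal Python sketch. It is not the VX implementation: the functions `vectorize` and `rebuild`, the choice to key vectors by tag path, and the sample bibliography data are all assumptions made for illustration. The sketch also assumes data-oriented XML with no mixed content, and it makes no attempt to compress the skeleton.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample document with a simple, repeating structure.
DOC = """
<bib>
  <book year="1999"><title>Title A</title><author>Author A</author></book>
  <book year="2001"><title>Title B</title><author>Author B</author></book>
</bib>
"""


def vectorize(root):
    """Split a tree into a text-free skeleton and one value vector per leaf path."""
    vectors = defaultdict(list)

    def walk(node, path):
        path = f"{path}/{node.tag}"
        skeleton = ET.Element(node.tag, dict(node.attrib))
        if len(node) == 0:
            # Leaf element: move its text out of the tree into the vector.
            vectors[path].append((node.text or "").strip())
        for child in node:
            skeleton.append(walk(child, path))
        return skeleton

    return walk(root, ""), dict(vectors)


def rebuild(skeleton, vectors):
    """Refill a skeleton from its vectors, consuming values in document order."""
    cursors = {path: iter(values) for path, values in vectors.items()}

    def walk(node, path):
        path = f"{path}/{node.tag}"
        out = ET.Element(node.tag, dict(node.attrib))
        if len(node) == 0:
            out.text = next(cursors[path])
        for child in node:
            out.append(walk(child, path))
        return out

    return walk(skeleton, "")


skeleton, vectors = vectorize(ET.fromstring(DOC))
rebuilt = rebuild(skeleton, vectors)
```

After the split, `vectors` holds plain lists of values (for example all titles under `/bib/book/title`), while `skeleton` keeps only tags and attributes. Because both functions walk the tree in the same order, `rebuild` recovers the leaf text, so nothing is lost by the decomposition.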

  7. Example: bibliography in XML

  8. Astronomical VX application
     • Export the PhotoObj table from the SDSS EDR database into VOTable
       • 360 columns and 10,000,000 rows
       • The skeleton here is trivial
     • Querying the decomposed XML version can be as fast as querying the SkyServer database
       • This holds for queries where SkyServer doesn't make heavy use of indexes, which VX does not have yet
     • More: http://homepages.inf.ed.ac.uk/v1bchoi/paper.pdf
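To make the querying point concrete, here is a small Python sketch of an index-free selection over per-column vectors. It is an illustration only: the `select` helper, the three column names, and the values are assumptions, and the sketch says nothing about how VX actually stores or scans its vectors.

```python
from array import array

# Hypothetical per-column vectors for a tiny table (made-up values).
vectors = {
    "objID": array("q", [1, 2, 3, 4]),
    "ra": array("d", [10.5, 200.0, 7.25, 180.0]),
    "dec": array("d", [-3.25, 45.0, 12.0, -60.0]),
}


def select(vectors, where_col, predicate, out_col):
    """Scan one vector for matches, then fetch the same positions from another."""
    hits = [i for i, value in enumerate(vectors[where_col]) if predicate(value)]
    return [vectors[out_col][i] for i in hits]


# "Which objects have ra > 100?" touches only the ra and objID vectors.
ids = select(vectors, "ra", lambda ra: ra > 100.0, "objID")
```

The query is a full scan of one vector, with no index involved, and it never reads the `dec` vector. With a table of 360 columns, a scan that touches only the columns a query mentions is one plausible reason a vectorized layout can keep up with a database on queries that are not index-driven.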
