Data Format Description Language (DFDL) WG



  1. Data Format Description Language (DFDL) WG Martin Westhead EPCC, University of Edinburgh M.Westhead@epcc.ed.ac.uk Alan Chappell PNNL chappella@battelle.org

  2. Agenda
     • Introduction and welcome - Martin Westhead, 10 mins
     • Binary Format Description Language (BFD) - Alan Chappell, 10 mins
     • Binary XML (BinX) - Stephen Rutherford, 10 mins
     • DFDL - Martin Westhead, 15 mins
       • Big picture
       • Structural Description Language
     • Charter (20 mins discussion)
     • Examples repository - Alan Chappell, 10 mins
     • Bruce Barkstrom: Examples at NASA (15 mins discussion)

  3. Motivation
     • There will never be a standard data format
       • E.g. XML – verbose, tree-based, explicit structure
       • Legacy formats
       • Application-specific formats
     • One size will never fit all
     • But could we provide a language for describing formats?
       • Transparency of physical representation
       • Automatic format conversion
       • Unambiguous description of data

  4. There’s more…
     Explicit structure enables:
     • Standard transformation to/from XML representation
       • Could allow an application to read/write XML
       • But provide an efficient underlying binary representation
     • Data stream/file becomes a database
       • Point to parts of the structure
       • Extract parts of the structure
       • Modify parts of the structure
       • Integrate parts of different structures

  5. And more…
     • Generic tools possible
       • Browsing
       • Conversion and transformation
     • Annotation of data
       • E.g. identify the bits that depict a hurricane in an image
     • Enables general semantic labels; many ontologies could be developed, e.g.:
       • S.I. units, SQL types, Time
       • Community-specific labels, “starClass = whiteDwarf”
       • Application-specific labels, “nodeColour = green”
     • Could lead to a standard transformation language

  6. Not fairy tales
     • Based on implemented work
       • BinX (http://www.epcc.ed.ac.uk/gridserve/WP5/Binx/)
       • BFD, part of the Scientific Annotation Middleware project (http://www.scidac.org/SAM/)
     • Generalized and extended a little
       • Formal semantics
       • Foundation for extensibility

  7. Approach
     • Separate out structure and semantics
     • General structural language
       • Repetition
       • Pointers
       • References to data
       • New structures can be built (compositionality)
     • Semantics
       • Hard to express so…we don’t
       • General labeling
       • Label semantics defined elsewhere (ontologies)
       • Labels can be added (extensibility)

  8. Structure – arbitrary labels

  9. Structure – example labels

  10. Structural language
      • Formal semantics
      • Structured binary sequence
        • Defines a hierarchical structure over an underlying sequence of binary values
      • Language for describing hierarchical structure
        • Repetition
          • Explicit number of repeats
          • Termination characters
        • Data reference
        • Conditionals
        • Data size
        • Pointers
        • Scope
      • As general as possible, but
        • Must be concise and implementable
      • Draft language definition on the web page (www.epcc.ed.ac.uk/dfdl)
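The constructs listed on this slide (an explicit repeat count, a termination character, a pointer) can be illustrated with a small hand-written reader. This is only a sketch in Python, not DFDL syntax. The function names, the big-endian byte order, the field widths and the example buffer layout are all assumptions made for illustration.

```python
import struct


def read_counted(buf: bytes, pos: int, fmt: str, count: int):
    """Repetition with an explicit number of repeats.

    Reads `count` values of struct format `fmt` starting at `pos` and
    returns (values, position after the last value).
    """
    size = struct.calcsize(fmt)
    values = [struct.unpack_from(fmt, buf, pos + i * size)[0] for i in range(count)]
    return values, pos + count * size


def read_terminated(buf: bytes, pos: int, terminator: bytes = b"\x00"):
    """Repetition ended by a termination character.

    Returns (bytes before the terminator, position after the terminator).
    Raises ValueError if the terminator never appears.
    """
    end = buf.index(terminator, pos)
    return buf[pos:end], end + len(terminator)


def read_pointer(buf: bytes, pos: int):
    """Pointer: a 4-byte big-endian offset from the start of the buffer.

    Follows the offset, reads a terminated value there, and returns
    (value, position after the pointer itself).
    """
    (offset,) = struct.unpack_from(">I", buf, pos)
    target, _ = read_terminated(buf, offset)
    return target, pos + 4


# Hypothetical layout: 1-byte count, that many uint16 values,
# a NUL-terminated name, then a pointer back to the name.
buf = (
    struct.pack(">B", 3)
    + struct.pack(">3H", 10, 20, 30)
    + b"abc\x00"
    + struct.pack(">I", 7)
)

(count,) = struct.unpack_from(">B", buf, 0)   # data reference: count is read from the data
values, pos = read_counted(buf, 1, ">H", count)
name, pos = read_terminated(buf, pos)
pointed, pos = read_pointer(buf, pos)
```

Each reader returns the new position along with the value. A description language would have to track the same state implicitly as it walks the underlying binary sequence.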

  11. CSV file example
      char:=byte
      data:=[(char - [',']).*]
      field:=[data; [',']]
      finalField:=[data; ['\n']]
      row:=[field.*] :: [finalField]
      table:=[row.*]
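To make the CSV description concrete, here is a minimal hand-written parser in Python that follows the same rule structure (`data`, `field`/`finalField`, `row`, `table`). It is a sketch, not a DFDL implementation. The function names are hypothetical. It also makes one assumption the slide does not: `data` stops at a newline as well as at a comma, so that each row ends unambiguously.

```python
COMMA = ord(",")
NEWLINE = ord("\n")


def parse_data(raw: bytes, pos: int):
    """data: zero or more bytes that are not ',' (assumed: and not a newline)."""
    start = pos
    while pos < len(raw) and raw[pos] not in (COMMA, NEWLINE):
        pos += 1
    return raw[start:pos], pos


def parse_row(raw: bytes, pos: int):
    """row: zero or more comma-terminated fields, then one newline-terminated field."""
    fields = []
    while True:
        data, pos = parse_data(raw, pos)
        fields.append(data)
        if pos >= len(raw):
            raise ValueError("row is not terminated by a newline")
        terminator = raw[pos]
        pos += 1
        if terminator == NEWLINE:   # this was the finalField
            return fields, pos
        # otherwise the terminator was ',', so another field follows


def parse_table(raw: bytes):
    """table: zero or more rows, consumed until the input is exhausted."""
    rows = []
    pos = 0
    while pos < len(raw):
        row, pos = parse_row(raw, pos)
        rows.append(row)
    return rows


table = parse_table(b"a,b,c\n1,2,3\n")
```

The result is a nested list of field values, one inner list per row, with the commas and newlines of the physical format no longer present.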

  12. Semantic labels
      • Many ontologies possible
      • Initial scope probably:
        • Basic types (floating point, integer, character)
        • Simple structures (structs, arrays, tables)
      • Obvious extensions:
        • SQL types
        • XML Schema types
      • Key WG goal:
        • Define form and requirements of new ontologies

  13. What is an Ontology?
      • XML Schema for new types
      • Structural description of new types
      • Definition of core API behaviour on new type
      • API extensions
      • Relationships to other types

  14. WG goals
      • Formal language for DFDL data structure
      • Standard representation of this language in XML
      • Requirements for DFDL ontology
      • Basic types ontology
      • Basic structures ontology

  15. Currently under discussion
      • Abstraction from the underlying binary
        • Compression, encoding, encryption
        • Physical vs. conceptual binary sequence
      • Abstraction of description
        • complex:=[foo; foo]
        • Instantiate “foo:= float” or “foo:= double” at use time
      • Filtering of results
        • Getting to the data model and leaving the format behind
        • CSV -> [[value; value; value]; [value; value; value]]
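The "abstraction of description" idea, where `complex:=[foo; foo]` is defined once and `foo` is bound to `float` or `double` only at use time, can be sketched in Python with the standard `struct` module. The names `FOO_FORMATS` and `read_complex` are hypothetical, and little-endian IEEE 754 encoding is an assumption, not something the slide specifies.

```python
import struct

# Candidate bindings for the abstract element "foo" (struct format codes).
FOO_FORMATS = {"float": "f", "double": "d"}


def read_complex(buf: bytes, foo: str, offset: int = 0) -> complex:
    """Read complex:=[foo; foo], with `foo` chosen by the caller at use time."""
    code = FOO_FORMATS[foo]
    real, imag = struct.unpack_from("<" + code * 2, buf, offset)
    return complex(real, imag)


# The same description reads two different physical layouts.
as_float = read_complex(struct.pack("<2f", 1.5, -2.0), "float")     # 8 bytes
as_double = read_complex(struct.pack("<2d", 1.5, -2.0), "double")   # 16 bytes
```

Both calls yield the same conceptual value even though the byte sequences differ in length, which is the separation between description and physical representation that this slide raises.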

  16. DFDL in the VO
      • Generic tools
      • Metadata possibilities
      • Ontologies can define relationships between types
        • E.g. polar to Cartesian
      • Standard classes over data objects
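One way to picture an ontology "defining relationships between types" is a registry of conversions keyed by a pair of labels. The Python sketch below uses the polar-to-Cartesian example from this slide. The registry, the decorator and the label strings are illustrative assumptions and not part of any DFDL proposal.

```python
import math

# Hypothetical registry: (source label, target label) -> conversion function.
CONVERSIONS = {}


def relationship(source: str, target: str):
    """Register a conversion between two labelled types."""
    def register(fn):
        CONVERSIONS[(source, target)] = fn
        return fn
    return register


@relationship("polar", "cartesian")
def polar_to_cartesian(point):
    r, theta = point
    return (r * math.cos(theta), r * math.sin(theta))


@relationship("cartesian", "polar")
def cartesian_to_polar(point):
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))


def convert(value, source: str, target: str):
    """Look up the registered relationship and apply it to the value."""
    return CONVERSIONS[(source, target)](value)


x, y = convert((2.0, math.pi / 2), "polar", "cartesian")
r, theta = convert((x, y), "cartesian", "polar")
```

A generic tool could use the labels attached to the data to find and apply such a conversion without knowing anything about the application that produced the data.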

  17. Getting involved
      • Web pages: http://www.epcc.ed.ac.uk/dfdl
      • Mailing list: dfdl@gridforum.org
      • My address: M.Westhead@epcc.ed.ac.uk
