Data Format Description Language (DFDL) WG Martin Westhead EPCC, University of Edinburgh M.Westhead@epcc.ed.ac.uk Alan Chappell PNNL chappella@battelle.org
Agenda • Introduction and welcome - Martin Westhead 10mins • Binary Format Description Language (BFD) - Alan Chappell 10mins • Binary XML (BinX) - Stephen Rutherford 10mins • DFDL - Martin Westhead 15mins • Big picture • Structural Description Language • Charter (20 mins Discussion) • Examples repository - Alan Chappell 10mins • Bruce Barkstrom Examples at NASA (15mins Discussion)
Motivation • There will never be a standard data format • E.g. XML – verbose, tree-based, explicit structure • Legacy formats • Application specific formats • One size will never fit all • But could we provide a language for describing formats? • Transparency of physical representation • Automatic format conversion • Unambiguous description of data
There’s more… Explicit structure enables: • Standard transformation to/from XML representation • Could allow application to read/write XML • But provide underlying efficient binary representation • Data stream/file becomes database • Point to parts of the structure • Extract parts of the structure • Modify parts of the structure • Integrate parts of different structures
And more… • Generic tools possible • Browsing • Conversion and transformation • Annotation of data • E.g. identify bits that depict hurricane in an image • Enables general semantic labels, many ontologies could be developed e.g.: • S.I. units, SQL types, Time • Community specific labels, “starClass = whiteDwarf” • Application specific labels, “nodeColour = green” • Could lead to a standard transformation language
Not fairy tales • Based on implemented work • BinX http://www.epcc.ed.ac.uk/gridserve/WP5/Binx/ • BFD part of the Scientific Annotation Middleware project (http://www.scidac.org/SAM/) • Generalized and extended a little • Formal semantics • Foundation for extensibility
Approach • Separate out structure and semantics • General structural language • Repetition • Pointers • References to data • New structures can be built (compositionality) • Semantics • Hard to express so…we don’t • General labeling • Label semantics defined elsewhere (ontologies) • Labels can be added (extensibility)
Structural language • Formal semantics • Structured binary sequence • Defines hierarchical structure over underlying sequence of binary values • Language for describing hierarchical structure • Repetition • Explicit number of repeats • Termination characters • Data reference • Conditionals • Data size • Pointers • Scope • As general as possible but • Must be concise and implementable • Draft language definition on web page (www.epcc.ed.ac.uk/dfdl)
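The two forms of repetition listed above (a repeat count taken from the data itself, and a termination character) can be illustrated with a minimal Python sketch. This is not DFDL syntax or any BinX/BFD API. The function names, the little-endian layout, and the uint32 count followed by int32 values are assumptions chosen only for illustration.

```python
import struct


def read_counted(buf: bytes, offset: int = 0):
    """Repetition with a data reference: a uint32 count n, then n int32 values.

    The field widths and little-endian byte order are assumptions.
    Returns the values and the offset just past them.
    """
    (n,) = struct.unpack_from("<I", buf, offset)
    values = struct.unpack_from(f"<{n}i", buf, offset + 4)
    return list(values), offset + 4 + 4 * n


def read_terminated(buf: bytes, offset: int = 0, terminator: int = 0):
    """Repetition ended by a termination byte (NUL by default).

    Returns the bytes before the terminator and the offset just past it.
    """
    end = buf.index(terminator, offset)
    return buf[offset:end], end + 1
```

A reader built this way returns the next offset each time, so descriptions can be composed in sequence: the offset returned by `read_counted` is where `read_terminated` starts.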
CSV file example
char:=byte
data:=[(char - [',']).*]
field:=[data; [',']]
finalField:=[data; ['\n']]
row:=[field.*] :: [finalField]
table:=[row.*]
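To make the CSV description concrete, here is a hedged Python sketch of a hand-written reader that follows the same shape: a table is a sequence of rows, and a row is a run of comma-terminated fields followed by a newline-terminated final field. The function name `parse_table` is hypothetical. One assumption goes beyond the slide: `data` is taken to stop at a newline as well as at a comma, since otherwise the final field could not be recognised.

```python
COMMA = 0x2C    # ','
NEWLINE = 0x0A  # '\n'


def parse_table(raw: bytes) -> list[list[bytes]]:
    """Split raw bytes into rows of fields, mirroring table:=[row.*]."""
    table = []
    pos = 0
    while pos < len(raw):
        row = []
        while True:
            # data: consume bytes up to the next ',' or '\n'
            # (stopping at '\n' is an assumption, see the lead-in).
            start = pos
            while pos < len(raw) and raw[pos] not in (COMMA, NEWLINE):
                pos += 1
            row.append(raw[start:pos])
            if pos >= len(raw):
                raise ValueError("row is not terminated by a newline")
            terminator = raw[pos]
            pos += 1
            if terminator == NEWLINE:
                break  # finalField closes the row
        table.append(row)
    return table
```

For example, `parse_table(b"a,b,c\n1,2,3\n")` yields two rows of three byte-string fields each. A generic DFDL tool would derive this behaviour from the description rather than from hand-written code.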
Semantic labels • Many ontologies possible • Initial scope probably: • Basic types (floating point, integer, character) • Simple structures (structs, arrays, tables) • Obvious extensions: • SQL types • XML Schema types • Key WG goal: • Define form and requirements of new ontologies
What is an Ontology? • XML Schema for new types • Structural description of new types • Definition of core API behaviour on new type • API extensions • Relationships to other types
WG goals • Formal language for DFDL data structure • Standard representation of this language in XML • Requirements for DFDL ontology • Basic types ontology • Basic structures ontology
Currently under discussion • Abstraction from the underlying binary • Compression, encoding, encryption • Physical vs. conceptual binary sequence • Abstraction of description • complex:=[foo; foo] • Instantiate “foo:= float” or “foo:= double” at use time • Filtering of results • Getting to the data model and leaving the format behind • CSV -> [[value; value; value]; [value; value; value]]
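The "abstraction of description" idea, where `complex:=[foo; foo]` is written once and `foo` is bound to float or double at use time, can be sketched in Python with the standard `struct` module. This is only an analogy, not proposed DFDL behaviour. The function name `complex_layout` and the little-endian byte order are assumptions.

```python
import struct


def complex_layout(foo_code: str) -> struct.Struct:
    """Build the layout for complex:=[foo; foo] with foo bound at use time.

    foo_code is a struct format character: "f" for a 4-byte float,
    "d" for an 8-byte double. Little-endian order is an assumption.
    """
    return struct.Struct("<" + foo_code * 2)


# The same abstract description, instantiated two ways.
complex_float = complex_layout("f")
complex_double = complex_layout("d")
```

Usage: `complex_double.unpack(buf)` reads two doubles from a 16-byte buffer, while `complex_float.unpack(buf)` reads two floats from an 8-byte buffer. The description of `complex` itself does not change.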
DFDL in the VO • Generic tools • Metadata possibilities • Ontologies can define relationships between types • E.g. polar to Cartesian • Standard classes over data objects
Getting involved • Webpages: http://www.epcc.ed.ac.uk/dfdl • Mailing list (dfdl@gridforum.org) • My address: M.Westhead@epcc.ed.ac.uk