Data Format Description Language (DFDL) WG
Martin Westhead
EPCC, University of Edinburgh
M.Westhead@epcc.ed.ac.uk

Overview
• Background
• Motivation
• Approach
• Current status

Motivation
• There will never be a standard data format
  • E.g. XML – verbose, tree-based, explicit structure
  • Legacy formats
  • Application-specific formats
• One size will never fit all
• But could we provide a language for describing formats?
  • Transparency of physical representation
  • Automatic format conversion
  • Unambiguous description of data

There’s more…
Explicit structure enables:
• Standard transformation to/from an XML representation
  • Applications could read/write XML
  • But keep an efficient binary representation underneath
• Data stream/file becomes a database
  • Point to parts of the structure
  • Extract parts of the structure
  • Modify parts of the structure
  • Integrate parts of different structures

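The "data stream/file becomes a database" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LAYOUT` table, the field names, and the helper functions `extract`, `modify`, and `to_xml` are all hypothetical and are not DFDL syntax or any DFDL API. The point is that once the layout is explicit, one description supports an XML view as well as pointing to, extracting, and modifying individual parts of the binary record.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description of one fixed-size, big-endian record:
# (field name, struct format code). This is not DFDL syntax.
LAYOUT = [("id", "i"), ("temperature", "d"), ("flags", "H")]


def field_offsets():
    """Map each field name to (byte offset, struct format) within the record."""
    offsets, position = {}, 0
    for name, code in LAYOUT:
        fmt = ">" + code  # ">" means big-endian with no padding
        offsets[name] = (position, fmt)
        position += struct.calcsize(fmt)
    return offsets


OFFSETS = field_offsets()


def extract(record, name):
    """Read a single field straight from the bytes."""
    position, fmt = OFFSETS[name]
    return struct.unpack_from(fmt, record, position)[0]


def modify(record, name, value):
    """Overwrite a single field in place; record must be a bytearray."""
    position, fmt = OFFSETS[name]
    struct.pack_into(fmt, record, position, value)


def to_xml(record):
    """Build an XML view of the whole record, one child element per field."""
    root = ET.Element("record")
    for name, _ in LAYOUT:
        ET.SubElement(root, name).text = str(extract(record, name))
    return root


record = bytearray(struct.pack(">idH", 7, 21.5, 3))
modify(record, "flags", 5)
print(ET.tostring(to_xml(record), encoding="unicode"))
```

The binary record stays the stored form throughout; the XML tree is generated on demand from the same description.
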
And more…
• Generic tools possible
  • Browsing
  • Conversion and transformation
• Annotation of data
  • E.g. identify the bits that depict a hurricane in an image
  • Enables general semantic labels; many ontologies could be developed, e.g.:
    • S.I. units, SQL types, time
    • Community-specific labels, “starClass = whiteDwarf”
    • Application-specific labels, “nodeColour = green”
• Could lead to a standard transformation language

Not fairy tales
• Based on implemented work
  • BinX http://www.edikt.org/binx/
  • BFD, part of the Scientific Annotation Middleware project (http://www.scidac.org/SAM/)
  • ESML http://esml.itsc.uah.edu/
• Generalized and extended a little
  • Clear semantics
  • Foundation for extensibility

Layers
[Layer diagram. Labels: Fortran, C/C++, Java; API; Data Model (Structure, Primitives); Transformations; Binary file, Text file, Data stream]

Approach
• Data model
  • XML infoset
  • Obvious way to describe it: XSD
• API
  • DOM/SAX
  • Extended to provide non-string value access
• Transformations
  • Ontology of predefined transformations (extensible)
  • XML language for:
    • Composition
    • Attaching to file contents
    • Populating the model

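One way to picture an extensible set of predefined transformations that can be composed is a small named registry. The sketch below is an assumption-laden illustration: the registry `TRANSFORMS`, the transformation names, and the `register` and `compose` helpers are invented here, and the slides propose an XML language for composition rather than Python calls.

```python
from functools import reduce

# Hypothetical registry of predefined transformations, keyed by name.
TRANSFORMS = {
    "reverse_bytes": lambda raw: raw[::-1],
    "as_unsigned_int": lambda raw: int.from_bytes(raw, "big"),
}


def register(name, func):
    """Extend the registry with a user-defined transformation."""
    TRANSFORMS[name] = func


def compose(names):
    """Chain named transformations left to right into one callable."""
    steps = [TRANSFORMS[name] for name in names]
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# A community- or application-specific addition (hypothetical).
register("tenths_to_units", lambda n: n / 10)

# Little-endian bytes -> integer -> scaled value that populates the model.
decode = compose(["reverse_bytes", "as_unsigned_int", "tenths_to_units"])
print(decode(b"\x2c\x01"))
```

Keeping transformations as named, composable units is what would let a description attach a chain of them to a region of a file and use the result as a typed value in the data model.
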
Or to put it another way…
• XSD defines models for XML documents
• DFDL extends XSD to define models for data in different formats
• Efficient read/write access to binary and text data sources using DOM/SAX

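To make "XSD as the model for non-XML data" concrete, here is a minimal sketch, assuming a tiny schema and a fixed physical mapping. The schema is ordinary XSD; the `PHYSICAL` table (big-endian, fixed-width encodings) and the functions `fields_from_schema` and `read` are assumptions made for this example. A real description language would need to state that physical information itself, which plain XSD does not.

```python
import struct
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Ordinary XSD describing the logical model of one record.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="sample">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="count" type="xs:int"/>
        <xs:element name="mean" type="xs:double"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Assumed physical mapping: XSD type -> struct code (big-endian, fixed width).
# The "xs:" prefix is matched as plain text, which works for this schema only.
PHYSICAL = {"xs:int": "i", "xs:double": "d"}


def fields_from_schema(schema_text):
    """List (name, struct code) for every typed element, in document order."""
    root = ET.fromstring(schema_text)
    return [
        (el.get("name"), PHYSICAL[el.get("type")])
        for el in root.iter(XS + "element")
        if el.get("type") is not None
    ]


def read(data, schema_text=SCHEMA):
    """Decode binary data into native (non-string) values named by the schema."""
    fields = fields_from_schema(schema_text)
    fmt = ">" + "".join(code for _, code in fields)
    values = struct.unpack(fmt, data)
    return dict(zip((name for name, _ in fields), values))


print(read(struct.pack(">id", 3, 2.5)))
```

The values come back as an `int` and a `float` rather than strings, which is the kind of non-string value access an extended DOM/SAX-style API would offer.
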
Current status
• WG status
  • Formed 1 year ago
  • 6 months spent on a false start
  • First draft expected at GGF11
• Key discussion:
  • Mapping/transformation language
  • Linking mechanisms
  • XML representation
  • Flexibility

Getting involved
• Webpages: http://forge.gridforum.org/projects/dfdl-wg/
• Mailing list: dfdl-wg@gridforum.org
• My address: M.Westhead@epcc.ed.ac.uk