BinX – A Tool for Binary File Access
eDIKT project team
Ted Wen, tedwen@edikt.org
Robert Carroll, robert.carroll@edikt.org

Agenda
• About the BinX project
• Introduction to the BinX language
• Introduction to the BinX library
• Example application
• Overview of the BinX API
• Discussion

The problem
• Most scientific data are stored in binary files
• Binary data file formats are not all standardized
• Binary data files are platform-dependent
• XML is useful for representing metadata
• Scientific datasets can become too large when encoded entirely in XML

What is BinX?
• Binary in XML
• Annotation language
  • Using XML
  • Descriptive
  • Low-level
• Software components
  • BinX library
  • Generic utilities
  • API

How and Why BinX is used
[Figure: binary data streams, a BinX document (<dataset> … … </dataset>), the BinX Library, several application programs, and one special application program.]

The BinX Language
Annotating a binary data stream
• Mark up data types
• Mark up sequences
• Mark up arrays
• Complex structures

Data elements
• Primitive data elements: byte, character, integer, real
• Complex data elements: arrays, struct, union
• User-defined data elements

Primitive Data Types
• Character
  • <character-8>
  • <string> (fixed length, variable length and delimited)
• Integer
  • <byte-8>
  • <short-16>, <unsignedShort-16>
  • <integer-32>, <unsignedInteger-32>
  • <long-64>, <unsignedLong-64>
• Real
  • <float-32>
  • <double-64>
  • <quadruple-128>

Primitive Data Types
• Mark up data types
    Bytes in the stream, in four groups:
      1: FF 7F
      2: 7F FF FF FF
      3: 00 00 C8 42
      4: 42 C8 00 00
    1: <short-16 byteOrder="littleEndian">32767</short-16>
    2: <integer-32 byteOrder="bigEndian">2147483647</integer-32>
    3: <float-32 byteOrder="littleEndian">100.0</float-32>
    4: <float-32 byteOrder="bigEndian">100.0</float-32>
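
The four byte groups on this slide can be checked with any binary packing facility. The sketch below uses Python's standard `struct` module rather than the BinX library; `hexbytes` is a helper name made up for this illustration.

```python
import struct

def hexbytes(data):
    """Render bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

# "<" selects little-endian, ">" big-endian.
# h = 16-bit integer, i = 32-bit integer, f = 32-bit IEEE float.
short_le = struct.pack("<h", 32767)
int_be = struct.pack(">i", 2147483647)
float_le = struct.pack("<f", 100.0)
float_be = struct.pack(">f", 100.0)

print(hexbytes(short_le))  # FF 7F
print(hexbytes(int_be))    # 7F FF FF FF
print(hexbytes(float_le))  # 00 00 C8 42
print(hexbytes(float_be))  # 42 C8 00 00
```

The same value therefore has two possible byte layouts, which is why each element carries a byteOrder attribute.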

Abstract "struct" types
• Mark up a sequence
    Screen descriptor in GIF:
      Screen width: unsigned short
      Screen height: unsigned short
      Packed field: byte
      Background colour index: byte
      Pixel aspect ratio: byte

    <struct>
      <unsignedShort-16 />
      <unsignedShort-16 />
      <byte-8 />
      <byte-8 />
      <byte-8 />
    </struct>
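
To make the correspondence between the struct markup and the bytes concrete, here is a minimal sketch that reads the same five fields with Python's `struct` module. It relies on two facts about GIF that the slide does not state: multi-byte integers are little-endian, and the screen descriptor follows a 6-byte signature. The function and dictionary key names are illustrative only.

```python
import struct

# Two unsigned 16-bit integers followed by three bytes, little-endian,
# mirroring the five children of the <struct> element.
SCREEN_DESCRIPTOR = struct.Struct("<HHBBB")

def parse_screen_descriptor(buf, offset=6):
    """Decode the screen descriptor; offset 6 skips the 'GIF87a'/'GIF89a' signature."""
    width, height, packed, background, aspect = SCREEN_DESCRIPTOR.unpack_from(buf, offset)
    return {
        "width": width,
        "height": height,
        "packed": packed,
        "background_colour_index": background,
        "pixel_aspect_ratio": aspect,
    }

# Synthetic bytes standing in for the start of a GIF file.
sample = b"GIF89a" + struct.pack("<HHBBB", 640, 480, 0xF7, 0, 0)
info = parse_screen_descriptor(sample)
print(info["width"], info["height"])  # 640 480
```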

Abstract "array" types
• Mark up an array
    A 2-dimensional, 10-by-100 array of 32-bit integers

    <arrayFixed>
      <integer-32 />
      <dim indexTo="99">
        <dim indexTo="9" />
      </dim>
    </arrayFixed>
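
As a rough illustration of what such an array occupies on disk, the sketch below builds 1000 packed 32-bit integers and indexes them as a 10-by-100 grid. The little-endian byte order and the choice of which dimension varies fastest are assumptions made for this example; the slide does not specify either, and `element` is a hypothetical helper.

```python
import struct

ROWS, COLS = 10, 100   # assumed shape: 10 rows of 100 values
COUNT = ROWS * COLS

# Stand-in for the binary file: 1000 little-endian 32-bit integers, 0..999.
raw = struct.pack(f"<{COUNT}i", *range(COUNT))
values = struct.unpack(f"<{COUNT}i", raw)

def element(row, col):
    """Look up one element, assuming the last index varies fastest."""
    return values[row * COLS + col]

print(len(raw))       # 4000
print(element(3, 7))  # 307
```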

Embedded abstract types
• Complex structures
    <struct>
      <short-16 />
      <arrayFixed>
        <byte-8 />
        <dim indexTo="7" />
      </arrayFixed>
      <struct>
        <integer-32 />
        <float-32 />
        <double-64 />
      </struct>
    </struct>

User-defined metadata
• Label the data types and structures
    <struct varName="Data Sample">
      <short-16 varName="ID" />
      <arrayFixed varName="List of 10 complex numbers">
        <struct varName="Complex">
          <float-32 varName="Real" />
          <float-32 varName="Imaginary" />
        </struct>
        <dim indexTo="9" />
      </arrayFixed>
    </struct>
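
The labels give each value a name that a program can use. A minimal sketch of reading the "Data Sample" layout into named fields follows; it assumes little-endian data packed with no padding, neither of which the slide states, and `Complex`, `SAMPLE` and `parse_data_sample` are names invented for the example.

```python
import struct
from collections import namedtuple

Complex = namedtuple("Complex", ["real", "imaginary"])

# One 16-bit ID followed by 10 x (Real, Imaginary) 32-bit floats.
SAMPLE = struct.Struct("<h20f")

def parse_data_sample(buf):
    """Split the packed record into the ID and a list of 10 complex numbers."""
    fields = SAMPLE.unpack(buf)
    sample_id, floats = fields[0], fields[1:]
    numbers = [Complex(floats[i], floats[i + 1]) for i in range(0, 20, 2)]
    return sample_id, numbers

# Synthetic record: ID 7, then the floats 0.0 .. 19.0.
raw = SAMPLE.pack(7, *[float(n) for n in range(20)])
sample_id, numbers = parse_data_sample(raw)
print(sample_id, len(numbers), numbers[9])
```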

Reusable type definitions
• Define macros for reuse
    <definitions>
      <defineType typeName="FourCC">
        <arrayFixed>
          <character-8 />
          <dim count="4" />
        </arrayFixed>
      </defineType>
    </definitions>

    <struct varName="Wave_Header">
      <useType typeName="FourCC" varName="Keyword" />
      <integer-32 varName="Chunk_Size" />
    </struct>
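
For comparison, the Wave_Header struct (a four-character keyword followed by a 32-bit chunk size) can be read by hand as sketched below. The little-endian byte order is an assumption based on the WAV/RIFF format, not on the slide, and `read_chunk_header` is a hypothetical helper.

```python
import struct

# Four 8-bit characters (the FourCC) followed by a signed 32-bit integer,
# matching <useType typeName="FourCC"/> and <integer-32/> in the struct.
CHUNK_HEADER = struct.Struct("<4si")

def read_chunk_header(buf, offset=0):
    """Return (Keyword, Chunk_Size) for the header starting at offset."""
    keyword, chunk_size = CHUNK_HEADER.unpack_from(buf, offset)
    return keyword.decode("ascii"), chunk_size

# Synthetic bytes standing in for the start of a WAV file.
sample = b"RIFF" + struct.pack("<i", 36) + b"WAVE"
print(read_chunk_header(sample))  # ('RIFF', 36)
```

Defining FourCC once and reusing it plays the same role in BinX that the shared `CHUNK_HEADER` format plays here.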

Linking to binary data
• Reference the binary data file
    <definitions>
      <defineType typeName="Header">… …</defineType>
      <defineType typeName="Format_Chunk">… …</defineType>
      <defineType typeName="Data_Chunk">… …</defineType>
    </definitions>

    <dataset src="myfile.wav">
      <useType typeName="Header" />
      <useType typeName="Format_Chunk" />
      <useType typeName="Data_Chunk" />
    </dataset>

The BinX document
    <?xml version="1.0"?>
    <binx xmlns="http://www.edikt.org/binx">
      <dataset src="binary.bin" byteOrder="littleEndian">
        <short-16/>
        <integer-32/>
        <double-64/>
      </dataset>
    </binx>
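
One way to see how such a document drives the reading of a binary file is a toy interpreter. The sketch below is not the BinX library: it handles only the three element types in this document, assumes the values are packed back to back with no padding, and reads from an in-memory byte string instead of binary.bin. `CODES`, `ORDER` and `decode` are names invented for the example.

```python
import struct
import xml.etree.ElementTree as ET

BINX_DOC = """<?xml version="1.0"?>
<binx xmlns="http://www.edikt.org/binx">
  <dataset src="binary.bin" byteOrder="littleEndian">
    <short-16/>
    <integer-32/>
    <double-64/>
  </dataset>
</binx>"""

NS = "{http://www.edikt.org/binx}"
CODES = {"short-16": "h", "integer-32": "i", "double-64": "d"}  # BinX tag -> struct code
ORDER = {"littleEndian": "<", "bigEndian": ">"}

def decode(binx_text, data):
    """Build a struct format from the dataset's children and unpack the data."""
    dataset = ET.fromstring(binx_text).find(NS + "dataset")
    fmt = ORDER[dataset.get("byteOrder")]
    fmt += "".join(CODES[child.tag[len(NS):]] for child in dataset)
    return struct.unpack(fmt, data)

# Synthetic stand-in for the contents of binary.bin.
data = struct.pack("<hid", 100, 1000, 5.257)
print(decode(BINX_DOC, data))  # (100, 1000, 5.257)
```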

A BinX document
    <binx byteOrder="bigEndian">              <!-- root element -->
      <definitions>                           <!-- data class section -->
        <defineType typeName="myTyp">         <!-- abstract data type -->
          <arrayFixed>
            <character-8/>
            <dim indexTo="9"/>
          </arrayFixed>
        </defineType>
      </definitions>
      <dataset src="myfile.bin">              <!-- data instance section -->
        <useType typeName="myTyp"/>
        <integer-32 varName="X" />
      </dataset>
    </binx>

DataBinX
DataBinX = BinX with Data

  BinX document referring to a binary file:
    <dataset src="myfile.bin">
      <struct>
        <short-16 />
        <long-64 />
        <double-64 />
      </struct>
      <arrayFixed>
        <integer-32 />
        <dim count="2" />
      </arrayFixed>
    </dataset>

  Corresponding DataBinX document with the values inline:
    <dataset>
      <struct>
        <short-16>100</short-16>
        <long-64>1000</long-64>
        <double-64>5.257</double-64>
      </struct>
      <arrayFixed>
        <dim> <integer-32>1</integer-32> </dim>
        <dim> <integer-32>2</integer-32> </dim>
      </arrayFixed>
    </dataset>
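
The step from a BinX description plus binary data to DataBinX can be sketched for the struct part of this example. This is an illustration written against Python's standard library, not the BinX utility: the `fields` list, the little-endian default and the function name `to_databinx` are all assumptions of the example.

```python
import struct
import xml.etree.ElementTree as ET

def to_databinx(fields, data, order="<"):
    """fields: (BinX tag, struct code) pairs describing one packed struct."""
    values = struct.unpack(order + "".join(code for _, code in fields), data)
    dataset = ET.Element("dataset")
    record = ET.SubElement(dataset, "struct")
    for (tag, _), value in zip(fields, values):
        # Each value becomes the text content of its type element.
        ET.SubElement(record, tag).text = str(value)
    return ET.tostring(dataset, encoding="unicode")

fields = [("short-16", "h"), ("long-64", "q"), ("double-64", "d")]
raw = struct.pack("<hqd", 100, 1000, 5.257)  # stand-in for myfile.bin
xml_text = to_databinx(fields, raw)
print(xml_text)
```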

The BinX Library
• Core library
• Utilities
• Applications

Output from the library
• DataBinX: combined data and BinX document
• SchemaBinX
• Binary data stream
DataBinX = SchemaBinX + Binary data

BinX Components
• The library has core functionality to support generic utilities and applications
• BinX library core: core functionality
  • Parse/generate BinX documents
  • Read/write binary data
  • Parse/generate DataBinX
• Utilities: generic tools
  • DataBinX pack/unpack
  • Extractor
• Applications: domain-specific

BinX application models
• Data manipulation model
• Data transportation model
• Data service model
• Data query model
• Data catalogue model

Data manipulation model
• Extraction
  • Subset of a dataset
• Combination
  • Merge several datasets
• Transformation
  • Conversion of data types
  • Change of sequence order
  • Transposition of array dimensions
• Transparency
  • Automatic change of byte order
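
Two of the transformations listed here, byte-order change and array transposition, are easy to picture on raw values. The following sketch uses only Python's standard library and made-up helper names (`swap_int32_byte_order`, `transpose`); it illustrates the operations themselves, not how the BinX library implements them.

```python
import struct

def swap_int32_byte_order(raw):
    """Re-encode little-endian 32-bit integers as big-endian."""
    count = len(raw) // 4
    values = struct.unpack(f"<{count}i", raw)
    return struct.pack(f">{count}i", *values)

def transpose(flat, rows, cols):
    """Transpose a rows-by-cols array stored row by row in a flat list."""
    return [flat[r * cols + c] for c in range(cols) for r in range(rows)]

print(swap_int32_byte_order(b"\x01\x00\x00\x00"))  # b'\x00\x00\x00\x01'
print(transpose([1, 2, 3, 4, 5, 6], 2, 3))          # [1, 4, 2, 5, 3, 6]
```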

Data transportation model
• DataBinX as interlingua
[Figure: a sending side and a receiving side, each with XSLT, BinX utilities and a ZIP tool. Labels: BinX + Binary, Schema BinX, XML document, DataBinX, ZIP (MIME), Send, Receive.]

Data service model
• Publishing logical datasets in BinX
[Figure: a client reaches BinX documents over the Grid. Labels: dataset from one binary file, dataset from several binary files, dataset from multiple data sources (including a DB).]

Data query model
• Create DataBinX
  • From binary and BinX
• Query DataBinX
  • Use XPath
• Create new DataBinX
  • Results from query
• Parse DataBinX
  • Create new binary and BinX
[Figure: BinX + Binary → DataBinX → XPath → New DataBinX → BinX + Binary.]
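
The query step can be sketched with the XPath subset supported by Python's `xml.etree.ElementTree`. The DataBinX text is taken from the earlier DataBinX slide; the choice of query and the construction of the result document are assumptions of this example, not the behaviour of a BinX tool.

```python
import xml.etree.ElementTree as ET

DATABINX = """<dataset>
  <struct>
    <short-16>100</short-16>
    <long-64>1000</long-64>
    <double-64>5.257</double-64>
  </struct>
  <arrayFixed>
    <dim><integer-32>1</integer-32></dim>
    <dim><integer-32>2</integer-32></dim>
  </arrayFixed>
</dataset>"""

root = ET.fromstring(DATABINX)

# Select every integer element inside the array.
hits = root.findall("./arrayFixed/dim/integer-32")
values = [int(element.text) for element in hits]

# Collect the query results under a new dataset element.
result = ET.Element("dataset")
for element in hits:
    result.append(element)

print(values)                    # [1, 2]
print(len(list(result)))         # 2
```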

Data catalogue model
• Primary storage: binary data files
• Metadata: syntactic annotation, semantic annotation
• Classification: domain-specific
• Cross-reference: XLink
[Figure: a hierarchy of BinX metadata documents from abstract to detailed (BinX 1; BinX 1.1, BinX 1.2; BinX 1.2.1, BinX 1.2.2, BinX 1.2.3) above the binary data files.]

Application in Astronomy
Case study: data conversion between FITS and VOTable
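
Application in astronomy
• FITS and VOTable conversion
[Figure: a FITS file (text header "SIMPLE = T … … END" followed by binary data) and a VOTable document (<?xml version=… <VOTABLE> … … </VOTABLE>), connected through the DataBinX utility and the BinX library core.]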

FITS file
[Figure: file layout. Primary HDU (header, data) followed by an extension (header, data); header lines are marked as spanning columns 0 to 79.]
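
The header part of this layout is plain text in fixed 80-character records, which is what the 0 to 79 column ruler indicates. The sketch below reads such records from a synthetic header; it relies on the FITS convention that the keyword occupies the first 8 columns and a value is introduced by "= " in columns 9 and 10. `parse_fits_header` is a hypothetical helper, and the comment handling is deliberately naive.

```python
CARD_LENGTH = 80  # one header record spans columns 0..79

def parse_fits_header(block):
    """Collect keyword/value pairs from 80-character records up to END."""
    cards = {}
    for start in range(0, len(block), CARD_LENGTH):
        card = block[start:start + CARD_LENGTH].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            # Naive: drops anything after "/" (the comment), so it would
            # mangle string values that themselves contain a slash.
            cards[keyword] = card[10:].split("/", 1)[0].strip()
    return cards

# Synthetic header made of four records padded to 80 characters.
header = b"".join(
    line.ljust(CARD_LENGTH).encode("ascii")
    for line in [
        "SIMPLE  =                    T",
        "BITPIX  =                   16",
        "NAXIS   =                    0",
        "END",
    ]
)
print(parse_fits_header(header))
```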

VOTable
    <VOTABLE>
      <RESOURCE>
        <PARAM name="Obs" value="Bob"/>
        <TABLE name="Stars">
          <FIELD name="Star-name" datatype="char" arraysize="10" />
          <FIELD name="RA" datatype="float" />
          <FIELD name="Dec" datatype="float" />
          <FIELD name="Counts" datatype="int" arraysize="2x3x*" />
          <DATA>
            <TABLEDATA>
              <TR>
                <TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
                <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
              </TR>
            </TABLEDATA>
          </DATA>
        </TABLE>
      </RESOURCE>
    </VOTABLE>
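
To show how the FIELD declarations line up with the TD cells, here is a minimal sketch that reads this table with Python's `xml.etree.ElementTree`. It is an illustration of the document structure only; the variable names are invented, and a real converter would also honour the datatype and arraysize attributes.

```python
import xml.etree.ElementTree as ET

VOTABLE = """<VOTABLE>
  <RESOURCE>
    <PARAM name="Obs" value="Bob"/>
    <TABLE name="Stars">
      <FIELD name="Star-name" datatype="char" arraysize="10" />
      <FIELD name="RA" datatype="float" />
      <FIELD name="Dec" datatype="float" />
      <FIELD name="Counts" datatype="int" arraysize="2x3x*" />
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
            <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

table = ET.fromstring(VOTABLE).find("./RESOURCE/TABLE")

# Column names come from the FIELD elements, in order.
names = [field.get("name") for field in table.findall("FIELD")]

# Each TR is one row; its TD cells pair up with the column names.
rows = [[cell.text for cell in row.findall("TD")] for row in table.iter("TR")]
record = dict(zip(names, rows[0]))
counts = [int(token) for token in record["Counts"].split()]

print(record["Star-name"], record["RA"], record["Dec"])
print(len(counts))  # 12 values, i.e. 2 x 3 x 2
```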

FITS → DataBinX → VOTable
• FITS to VOTable conversion
[Figure: pipeline with the labels FITS, Preprocessor, Schema BinX, DataBinX Utility, DataBinX, XSLT, XSLT transformer, VOTable.]

VOTable → DataBinX → FITS
• VOTable to FITS conversion
[Figure: pipeline with the labels VOTable, XSLT, XSLT transformer, DataBinX, DataBinX Utility, Schema BinX, Binary Data, FITS Header, Post processor, FITS.]

Support
• Information and software download: http://www.edikt.org/binx
• Questions: support@edikt.org
• Requirements and suggestions: tedwen@edikt.org, robertc@edikt.org

Parsing a BinX document
    BxBinxFile* pReader = new BxBinxFile();
    BxDataset* pDataset = NULL;
    if (pReader->parse("mybinx.xml")) {
        // Only fetch the dataset when parsing succeeded.
        pDataset = pReader->getDataset();
    }

Reading a BinX document
    // Look the array up by position ...
    BxArrayFixed* pArray = pDataset->getArray(0);
    // ... or by its variable name:
    // BxArrayFixed* pArray = pDataset->getArray("fixed");
• Get an array object
    BxDataset* pStruct = pArray->get(0, 0);
• Get a struct from the array

Reading a BinX document
    BxFloat32* pReal = pStruct->getFloat("Real");
    float real = pReal->getFloat();
• Get the data value

Creating a BinX document
    BxBinxFileWriter* pWriter = new BxBinxFileWriter();
• Create an object to write out the document
    BxDataset* pData = new BxDataset();
• Create a new dataset (in-memory BinX document)
    BxShort16* i16 = new BxShort16(100);
    pData->addDataObject(i16);
• Add a 16-bit integer with the value 100 to the dataset

Creating a BinX document
    BxBinaryFile* pbf = new BxBinaryFile();
• Create a new binary file
    pbf->setDatasetPointer(pData);
• Create a link to the BinX document
    pWriter->setBinaryFilePtr(pbf);
    pWriter->save("TestDataset.xml");
• Save the BinX document

Merge binary data
    // Open the two BinX documents and get their datasets.
    BxBinxFileReader* pFile1 = new BxBinxFileReader("file1.xml");
    BxBinxFileReader* pFile2 = new BxBinxFileReader("file2.xml");
    BxDataset* pDataset1 = pFile1->getDataset();
    BxDataset* pDataset2 = pFile2->getDataset();

    // Take the first array of each dataset and read one element from each.
    BxArray* pArray1 = pDataset1->getArray(0);
    BxArray* pArray2 = pDataset2->getArray(0);
    BxDataObject* pData1 = pArray1->getNext();
    BxDataObject* pData2 = pArray2->getNext();

    // Write both elements, one after the other, to a single binary file.
    FILE* fo = fopen("output.dat", "wb");
    pData1->toStreamBinary(fo);
    pData2->toStreamBinary(fo);
    fclose(fo);

Summary
• One BinX document can describe many binary files
• BinX documents can be generated from code
• Easy-to-use interfaces
• Flexible