ILDG File Format

Presentation Transcript


  1. ILDG File Format
     Chip Watson, for Middleware & MetaData Working Groups

  2. Outline
     • The (Real) Requirements
     • Soft Requirements
     • Issues
     • Options
     • Status
     • Proposal
     ILDG 5 Workshop, Chip Watson

  3. The Real File Format Requirements
     • Must be able to share configuration files
        • Find and retrieve the files
           • Addressed by the meta data catalog (MDC) and middleware components
        • Consume (use) foreign files
           • Potential implications for how files & meta data are produced
     • Must have a (recommended) way to keep the correspondence between binary data in files and the full meta data in the MDC
     • Must not keep mutable (changeable) meta data within the binary files
        • Otherwise maintenance is too painful

  4. Soft Requirements
     Making foreign files usable: the format should…
     • Adapt easily to variability in binary data type:
        • single / double precision
        • byte ordering (consensus seems to be big endian)
        • 3x3 or 3x2 link matrices (consensus seems to be 3x3)
     • Support data integrity checks
        • CRC, plaquette
     • Allow additional (collaboration-specific) data to be included
     • Make it easy to skip over uninteresting pieces
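
To make the precision, byte-order and 3x3/3x2 options concrete, here is a minimal Python sketch of decoding one stored link matrix. It assumes a row-major layout with real and imaginary parts interleaved, and that the "3x2" form stores the first two rows; the third row is then recovered from the SU(3) condition (row 3 is the complex conjugate of the cross product of rows 1 and 2). The function names are hypothetical and not part of any ILDG library.

```python
import struct
from typing import List

Row = List[complex]


def third_row(u: Row, v: Row) -> Row:
    """Rebuild row 3 of an SU(3) matrix from rows 1 and 2: w = conj(u x v)."""
    return [
        (u[1] * v[2] - u[2] * v[1]).conjugate(),
        (u[2] * v[0] - u[0] * v[2]).conjugate(),
        (u[0] * v[1] - u[1] * v[0]).conjugate(),
    ]


def decode_link(buf: bytes, precision: int = 32, big_endian: bool = True,
                stored_rows: int = 3) -> List[Row]:
    """Decode one link matrix stored as `stored_rows` x 3 complex numbers.

    Assumed layout: row-major, (re, im) interleaved, IEEE floats.
    """
    if stored_rows not in (2, 3):
        raise ValueError("stored_rows must be 2 (3x2 form) or 3 (3x3 form)")
    code = {32: "f", 64: "d"}[precision]      # single or double precision
    order = ">" if big_endian else "<"         # byte ordering
    count = stored_rows * 3 * 2                # rows x columns x (re, im)
    nbytes = count * precision // 8
    if len(buf) < nbytes:
        raise ValueError("buffer too short for one link matrix")
    vals = struct.unpack(f"{order}{count}{code}", buf[:nbytes])
    entries = [complex(vals[2 * i], vals[2 * i + 1]) for i in range(count // 2)]
    rows = [entries[3 * r:3 * r + 3] for r in range(stored_rows)]
    if stored_rows == 2:
        # 3x2 storage: reconstruct the missing row from unitarity, det = 1
        rows.append(third_row(rows[0], rows[1]))
    return rows
```

For example, an identity link written as two rows of big-endian single-precision floats decodes back to the full 3x3 identity. Storing two rows saves a third of the space at the cost of a few multiplications per link when reading.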

  5. Issues
     • How to incorporate legacy data?
        • Convert & re-store?
        • Provide a conversion utility (convert at use)?
     • How to include collaboration-specific preferences or standards?
        • Certainly want to avoid storing data twice (collaboration-specific format and ILDG format)
     • Simplicity vs. flexibility…
        • Flexibility (to address everyone’s desires) comes at a price; can the price be kept low enough?

  6. General Approaches
     • Virtual shared format (different formats, common way to read, hide actual storage format)
        • binX as universal reader
           • Collaborations provide binX description
        OR
        • C code as reader
           • Collaborations provide C code
           • Need to develop a common calling convention (API)
     • Physical shared format
        • Data retrieved within ILDG is in this format
        • May require double storage, or conversion on the fly
        OR
        • Translation tools are provided by each group
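
The "C code as reader" variant of the virtual shared format hinges on a common calling convention. The sketch below shows the idea in Python rather than C: each collaboration registers a reader under a format name, and users call one entry point that hides the storage format. The names `register_reader` and `read_config`, the "flat list of floats" return convention, and the two toy formats are all hypothetical; the actual API was still to be developed.

```python
import struct
from typing import Callable, Dict, List

# Hypothetical common calling convention: a reader takes the raw bytes of a
# file in its native format and returns the values as a flat list of floats
# in one agreed (canonical) order.
Reader = Callable[[bytes], List[float]]

_READERS: Dict[str, Reader] = {}


def register_reader(fmt: str) -> Callable[[Reader], Reader]:
    """Decorator a collaboration would use to plug in its own reader."""
    def wrap(fn: Reader) -> Reader:
        _READERS[fmt] = fn
        return fn
    return wrap


def read_config(fmt: str, data: bytes) -> List[float]:
    """Single entry point: callers never see the actual storage format."""
    try:
        reader = _READERS[fmt]
    except KeyError:
        raise ValueError(f"no reader registered for format {fmt!r}") from None
    return reader(data)


# Two toy formats standing in for collaboration-specific layouts.
@register_reader("example-be32")
def _read_be32(data: bytes) -> List[float]:
    n = len(data) // 4
    return list(struct.unpack(f">{n}f", data[:4 * n]))


@register_reader("example-le64")
def _read_le64(data: bytes) -> List[float]:
    n = len(data) // 8
    return list(struct.unpack(f"<{n}d", data[:8 * n]))
```

The trade-off named on the slide shows up directly: nothing is stored twice, but every collaboration must supply and maintain a conforming reader.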

  7. Option 1: Binary-only Files
     Implications:
     • Meta data exists only in the MDC
     • Users must keep the correspondence between the file copy and the meta data, via either
        • file naming conventions (Global File Name, GFN), OR
        • a local database tracking the file-to-GFN correspondence
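
For the local-database route, a small SQLite table is enough to map each local file copy to its GFN, which then serves as the key for fetching the full meta data from the MDC. This is a minimal sketch assuming a single table `copies` and illustrative `lfn://...` GFNs; neither the schema nor the GFN syntax comes from the ILDG proposal.

```python
import sqlite3
from typing import List, Optional


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local table mapping file copies to GFNs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS copies ("
        " local_path TEXT PRIMARY KEY,"
        " gfn TEXT NOT NULL)"
    )
    return db


def record_copy(db: sqlite3.Connection, local_path: str, gfn: str) -> None:
    """Remember that `local_path` is a copy of the file named `gfn`."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO copies (local_path, gfn) VALUES (?, ?)",
            (local_path, gfn),
        )


def gfn_for(db: sqlite3.Connection, local_path: str) -> Optional[str]:
    """GFN of a local copy, i.e. the key for its meta data in the MDC."""
    row = db.execute(
        "SELECT gfn FROM copies WHERE local_path = ?", (local_path,)
    ).fetchone()
    return row[0] if row else None


def copies_of(db: sqlite3.Connection, gfn: str) -> List[str]:
    """All local copies known for one GFN, sorted by path."""
    rows = db.execute(
        "SELECT local_path FROM copies WHERE gfn = ? ORDER BY local_path",
        (gfn,),
    )
    return [r[0] for r in rows]
```

The weak point of this option is visible here: if a file is moved or renamed without updating the table, the link to its meta data is lost.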

  8. Option 2: NERSC style
     Meaning:
     • ASCII header containing essential meta data
     Implications:
     • Develop a new standard for the header
     • Can include the GFN, to allow retrieval of other meta data from the MDC
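
A NERSC-style header is plain `KEY = VALUE` text between `BEGIN_HEADER` and `END_HEADER` lines, followed directly by the binary payload. The sketch below parses such a header and returns the byte offset where the binary data starts. The keys in the example (`DATATYPE`, `DIMENSION_1`, `FLOATING_POINT`) follow the existing NERSC archive header; the `GFN` key is a hypothetical addition illustrating the "can include the GFN" point, not part of any agreed standard.

```python
from typing import Dict, Tuple


def parse_ascii_header(data: bytes) -> Tuple[Dict[str, str], int]:
    """Parse a BEGIN_HEADER ... END_HEADER block.

    Returns (fields, payload_offset): the header fields as strings and the
    byte offset at which the binary payload begins.
    """
    fields: Dict[str, str] = {}
    offset = 0
    in_header = False
    while True:
        end = data.find(b"\n", offset)
        if end < 0:
            raise ValueError("header not terminated by END_HEADER")
        line = data[offset:end].decode("ascii").strip()
        offset = end + 1  # next line (or the payload) starts after the newline
        if not in_header:
            if line != "BEGIN_HEADER":
                raise ValueError("file does not start with BEGIN_HEADER")
            in_header = True
        elif line == "END_HEADER":
            return fields, offset
        elif "=" in line:
            key, value = line.split("=", 1)  # split on the first '=' only
            fields[key.strip()] = value.strip()
```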

  9. Option 3: Structured File Format
     • Goal: encapsulate, in an extensible way, binary data and meta data within a single file
     • Good candidate: LIME / SciDAC-derived format
        • The SciDAC software committee considered several possibilities for encapsulation (including tar, cpio)
        • DIME (Microsoft Direct Internet Message Encapsulation), similar in approach to MIME (used for e-mail attachments), was considered a good fit
          http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnservice/html/service01152002.asp
        • LIME == LQCD modification of DIME, to be a bit simpler and to support 64-bit record sizes
        • Software implementation (library) exists

  10. Option 3 (cont): LIME
      Details:
      • A file has multiple messages; messages have multiple records
      • Record format:
         • 32 bits: 3 flags, id-length (13), type-format (3), type-length (13)
         • record id (variable length, rounded up to a 4-byte multiple)
         • record type (variable length, rounded up)
         • data length (64 bits; DIME used 32)
         • payload (rounded up)
      • SciDAC records contain either XML meta data (string) or binary data
      • Possible records:
         • ILDG meta data (XML)
         • binX descriptor for binary layout
         • Collaboration-specific extensions
         • Binary data (stored using NERSC conventions)
      • ILDG meta data record options:
         • Existing configuration schema (subset, non-mutable)
         • OR, a new, simpler (flat) schema
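
The record layout listed above can be written and read back in a few lines of Python. This sketches the layout exactly as described on this slide, not the released LIME library, whose header may differ. The bit order inside the 32-bit word (flags in the top bits), big-endian byte order, zero padding, and the record ids and types in the example are all assumptions. In DIME the three flags mark message begin, message end and chunking; here they are simply passed through.

```python
import struct
from typing import Dict, Iterator, Tuple


def _pad4(b: bytes) -> bytes:
    """Round a field up to a multiple of 4 bytes with zero padding."""
    return b + b"\x00" * (-len(b) % 4)


def pack_record(rec_id: bytes, rec_type: bytes, payload: bytes,
                flags: int = 0, type_format: int = 0) -> bytes:
    if len(rec_id) > 0x1FFF or len(rec_type) > 0x1FFF:
        raise ValueError("id and type lengths must fit in 13 bits")
    # Assumed bit order, most significant first:
    # flags(3) | id-length(13) | type-format(3) | type-length(13)
    word = (((flags & 0x7) << 29) | (len(rec_id) << 16)
            | ((type_format & 0x7) << 13) | len(rec_type))
    return (struct.pack(">I", word) + _pad4(rec_id) + _pad4(rec_type)
            + struct.pack(">Q", len(payload))  # 64-bit data length
            + _pad4(payload))


def unpack_record(buf: bytes, pos: int = 0) -> Tuple[Dict[str, object], int]:
    """Decode one record starting at `pos`; return it and the next offset."""
    (word,) = struct.unpack_from(">I", buf, pos)
    id_len = (word >> 16) & 0x1FFF
    type_len = word & 0x1FFF
    pos += 4
    rec_id = buf[pos:pos + id_len]
    pos += id_len + (-id_len % 4)            # skip id plus its padding
    rec_type = buf[pos:pos + type_len]
    pos += type_len + (-type_len % 4)        # skip type plus its padding
    (data_len,) = struct.unpack_from(">Q", buf, pos)
    pos += 8
    payload = buf[pos:pos + data_len]
    pos += data_len + (-data_len % 4)        # skip payload plus its padding
    record = {"flags": word >> 29, "type_format": (word >> 13) & 0x7,
              "id": rec_id, "type": rec_type, "payload": payload}
    return record, pos


def iter_records(buf: bytes) -> Iterator[Dict[str, object]]:
    """Walk every record in a buffer; uninteresting ones are cheap to skip."""
    pos = 0
    while pos < len(buf):
        rec, pos = unpack_record(buf, pos)
        yield rec
```

Because every record carries its own lengths, a reader can jump over records it does not understand, which is the "easy to skip over uninteresting pieces" requirement from the soft-requirements slide.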

  11. ILDG record idea (minimalist, from Carlton):
      <?xml version="1.0" encoding="UTF-8"?>
      <ildgFormat>
        <version> 1.0 </version>
        <endian> big </endian>
        <precision> 32 </precision>
        <lx> 20 </lx>
        <ly> 20 </ly>
        <lz> 20</lz>
        <lt> 64 </lt>
      </ildgFormat>
      This is a bit more verbose than the NERSC ASCII header, but it is completely extensible (add new fields without breaking old applications), and the string can be parsed by standard XML libraries (which are already planned to be used for ILDG meta data).
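
To illustrate the point that the record can be read with standard XML libraries, here is a sketch using Python's built-in `xml.etree.ElementTree`. It extracts the fields shown above and ignores any element it does not know, which is what allows new fields to be added without breaking old readers. The function name and the returned dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

REQUIRED = ("version", "endian", "precision", "lx", "ly", "lz", "lt")


def parse_ildg_format(xml_text: Union[str, bytes]) -> Dict[str, object]:
    """Read the minimalist <ildgFormat> record; unknown elements are ignored."""
    if isinstance(xml_text, str):
        xml_text = xml_text.encode("utf-8")  # match the UTF-8 declaration
    root = ET.fromstring(xml_text)
    if root.tag != "ildgFormat":
        raise ValueError(f"unexpected root element {root.tag!r}")
    # Collect child elements by tag; newer, unknown elements are simply
    # never looked up, so old readers keep working when fields are added.
    fields = {child.tag: (child.text or "").strip() for child in root}
    missing = [k for k in REQUIRED if k not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {
        "version": fields["version"],
        "big_endian": fields["endian"] == "big",
        "precision": int(fields["precision"]),
        "dims": tuple(int(fields[k]) for k in ("lx", "ly", "lz", "lt")),
    }
```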

  12. Current Status
      • The ILDG board mandated that a solution to file formats be found prior to this workshop (missed goal)
      • There is a wide range of opinions on the best path forward (XML, NERSC format, pure binary)
      • There may be a current movement towards accepting XML and LIME

  13. Proposal
      • Current ad-hoc committee to work out the implications of adopting LIME (January 2005)
         • Standardize the ILDG record XML schema
         • Produce documentation and simple test codes to show usage
      • Compare with the virtual file format, pure binary and NERSC-like approaches (pros and cons), and select a path forward (January 2005)
      • Refine the selected approach, reaching version 1.0 by the end of February 2005
         • Documentation of schema, code
         • C library (if appropriate), test codes available for download
