
Metadata Working Group Report

Metadata Working Group Report. Members (fixed in mid-January) G.Andronico INFN,Italy P.Coddington Adelaide,Australia R.Edwards Jlab,USA C.Maynard Edinburgh,UK D.Pleiter DESY,Germany J.Simone FNAL,USA T.Yoshie Tsukuba,Japan B.Joo (observer) Edinburgh,UK


Presentation Transcript


  1. Metadata Working Group Report • Members (fixed in mid-January): G.Andronico (INFN, Italy), P.Coddington (Adelaide, Australia), R.Edwards (Jlab, USA), C.Maynard (Edinburgh, UK), D.Pleiter (DESY, Germany), J.Simone (FNAL, USA), T.Yoshie (Tsukuba, Japan), B.Joo (observer; Edinburgh, UK) • Mailing list: qcdml@rccp.tsukuba.ac.jp • About 80 mails circulated • QCDML (QCD Markup Language) for ILDG

  2. 0. Introduction • QCDML: Strategy and Standard Configuration Format (T.Yoshie) • QCDML: Physics (C.Maynard) • QCDML: Machine and Management (D.Pleiter) • My proposal for QCDML: it will not be used in my talk, but may be useful for discussions

  3. Strategy • QCDML: XML schema for ILDG • write a QCDML document for each configuration • store QCDML documents in one or more databases • search/retrieve configurations • design QCDML so that developing applications is easy • QCDML defines a minimal set of XML tags • tags necessary for exchanging configurations • tags which will be searched • tags researchers are usually interested in • required: physics parameters (beta, mq) • not included: random number seed

  4. Strategy (cont.) • Each collaboration can extend QCDML and use it for its own purposes • Every collaboration is asked to provide values for all relevant QCDML tags

  5. Category of QCDML • 0. Standard configuration format (SCF) • 1. Physics and parameters • 2. Algorithm and status • 3. Code • 4. Machine • 5. Management • 6. Miscellaneous • Status • 0: finalized • 4, 5: almost finalized • 1: discussions ongoing (different opinions)

  6. SCF: Strategy • The standard format is an abstract (reference) format for exchanging configurations • collaborations submitting configurations to ILDG do not have to convert archived files • some groups have already archived many configurations in their own format • each format is chosen for convenience • Conversions will be done on the user side • two methods to convert the format of configurations • from a given format to the standard one via a C-library • from one format to another using BinX technology (without referring to the standard format)

  7. SCF: Format • Definition of gauge configuration • i,j=1,2,3: color indices • mu=1,2,3,4 (x,y,z,t) • employ NERSC (Gauge Connection) format • a sequence of 8-byte double precision real numbers • coded in 32-bit IEEE numerical format • byte order (endianness) is not specified
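Because the byte order is left open, a reader has to know the byte order of the file and convert to the host order. The sketch below shows one way this could be done for 8-byte doubles. The function names are hypothetical and not part of any ILDG library, and the flag convention (1 = big endian, 0 = little endian) is borrowed from the `endian` argument shown later in the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host, 0 on a little-endian host. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Reverse the byte order of one 8-byte value in place. */
static void swap8(unsigned char *p)
{
    for (int i = 0; i < 4; i++) {
        unsigned char tmp = p[i];
        p[i] = p[7 - i];
        p[7 - i] = tmp;
    }
}

/* Convert n doubles, read verbatim from a file whose byte order is
   file_big_endian (1 = big, 0 = little), to the host byte order.
   Hypothetical helper: assumes 8-byte IEEE doubles on the host. */
static void to_host_order(double *data, size_t n, int file_big_endian)
{
    assert(sizeof(double) == 8);
    if ((file_big_endian != 0) == host_is_big_endian())
        return;                       /* orders already agree */
    for (size_t i = 0; i < n; i++)
        swap8((unsigned char *)&data[i]);
}
```

A reader would call `to_host_order` once on the buffer right after reading it, passing the byte order recorded for the file.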

  8. SCF: Format (cont.) • In a C program, the last index runs fastest and indices run from 0 • re=0 (real part), re=1 (imaginary part) • Store the first two rows (2x3) of the 3x3 link matrix • U11, U12, U13, U21, U22, U23 • mu=1,2,3,4 • x=0,1,2,...,NX-1; y=0,1,2,...,NY-1; likewise z, t • element type: double • [Figure labels: Row-Column, Complex*16, Column-Row]
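The layout described above can be sketched in C as follows. The index order [t][z][y][x][mu][row][column][re] is an assumption taken from the array declaration and the BinX description elsewhere in this talk, and both function names are hypothetical. The second function uses the standard SU(3) identity that the third row is the complex conjugate of the cross product of the first two, which is why storing two rows is sufficient.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, counted in doubles, of one real number in the flat file.
   Assumed order: [t][z][y][x][mu][row][col][re], last index fastest.
   All indices are zero-based: mu 0..3, row 0..1, col 0..2, re 0..1. */
static size_t scf_offset(size_t nx, size_t ny, size_t nz,
                         size_t x, size_t y, size_t z, size_t t,
                         size_t mu, size_t row, size_t col, size_t re)
{
    assert(x < nx && y < ny && z < nz);
    assert(mu < 4 && row < 2 && col < 3 && re < 2);
    size_t site = ((t * nz + z) * ny + y) * nx + x;
    return (((site * 4 + mu) * 2 + row) * 3 + col) * 2 + re;
}

/* out = conj(a*b - c*d), complex numbers stored as {re, im}. */
static void conj_diff_of_products(const double a[2], const double b[2],
                                  const double c[2], const double d[2],
                                  double out[2])
{
    double re = (a[0] * b[0] - a[1] * b[1]) - (c[0] * d[0] - c[1] * d[1]);
    double im = (a[0] * b[1] + a[1] * b[0]) - (c[0] * d[1] + c[1] * d[0]);
    out[0] = re;
    out[1] = -im;
}

/* Rebuild the third row of an SU(3) matrix from the two stored rows:
   r3 = conj(r1 x r2). */
static void su3_third_row(double r1[3][2], double r2[3][2], double r3[3][2])
{
    conj_diff_of_products(r1[1], r2[2], r1[2], r2[1], r3[0]);
    conj_diff_of_products(r1[2], r2[0], r1[0], r2[2], r3[1]);
    conj_diff_of_products(r1[0], r2[1], r1[1], r2[0], r3[2]);
}
```

Each site carries 4 x 2 x 3 x 2 = 48 doubles, so moving one step in x advances the offset by 48.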

  9. SCF: C-library • Each collaboration submitting configurations to ILDG prepares a C-library to read its configurations in the standard format • a pointer to the C-library is stored in the QCDML document • read a hyper-cubic region (ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1) of the (0:NX-1)*(0:NY-1)*(0:NZ-1)*(0:NT-1) lattice • void ILDG_read_conf(file, NX, ix0, ix1, NY, iy0, iy1, NZ, iz0, iz1, NT, it0, it1, endian, config)

  10. SCF: C-library (cont.)
      int main(void)
      {
          int NX = 8, NY = 8, NZ = 8, NT = 16;
          int endian = 1;  /* big endian, =0 for little endian */
          /* buffer for the region only: 16 sites in t, 4 each in z, y, x */
          double U[16][4][4][4][4][2][3][2];
          /* ILDG_read_conf is provided by the collaboration's C-library */
          ILDG_read_conf("test-file", NX,0,3, NY,4,7, NZ,4,7, NT,0,15, endian, U);
          return 0;
      }
      The region (0-3)*(4-7)*(4-7)*(0-15) of the whole lattice (0-7)*(0-7)*(0-7)*(0-15) will be read in big endian format and stored in U[16][4][4][4][4][2][3][2].
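What a region read has to do can be sketched as a copy out of a full configuration that is already in memory. `copy_region` below is a hypothetical helper, not the ILDG_read_conf implementation; it assumes the site order [t][z][y][x] with 48 doubles per site, and it takes the same inclusive region bounds as the interface above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DOUBLES_PER_SITE = 4 * 2 * 3 * 2 };  /* mu, row, col, re/im */

/* Copy the inclusive region (ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1)
   from a full NX*NY*NZ*NT configuration into a packed output buffer.
   Sites that are adjacent in x are contiguous, so each x-run is
   copied with a single memcpy. */
static void copy_region(const double *full, double *out,
                        int NX, int ix0, int ix1,
                        int NY, int iy0, int iy1,
                        int NZ, int iz0, int iz1,
                        int NT, int it0, int it1)
{
    assert(0 <= ix0 && ix0 <= ix1 && ix1 < NX);
    assert(0 <= iy0 && iy0 <= iy1 && iy1 < NY);
    assert(0 <= iz0 && iz0 <= iz1 && iz1 < NZ);
    assert(0 <= it0 && it0 <= it1 && it1 < NT);

    size_t run = (size_t)(ix1 - ix0 + 1) * DOUBLES_PER_SITE;
    size_t k = 0;  /* write position in out, in doubles */
    for (int t = it0; t <= it1; t++)
        for (int z = iz0; z <= iz1; z++)
            for (int y = iy0; y <= iy1; y++) {
                size_t site = (((size_t)t * NZ + z) * NY + y) * NX + ix0;
                memcpy(out + k, full + site * DOUBLES_PER_SITE,
                       run * sizeof(double));
                k += run;
            }
}
```

A file-based reader would follow the same loop structure but replace each `memcpy` with a seek and a read of `run` doubles, which avoids holding the whole configuration in memory.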

  11. SCF: C-library (cont.) • In general, the conversion program requires a large amount of memory, 1-2 times the configuration size: the memory bottleneck cannot be avoided • We propose the above interface: • simple • mainly for full QCD configurations • a 32^3 x Nt lattice, typical for the next several years, can be handled by a high-end PC with 2 GB of memory • some extension might be necessary in the future
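The memory estimate can be checked with a short calculation. The function below is an illustrative sketch (the name is hypothetical); it counts 4 links per site, 2 x 3 complex numbers per link, and 8 bytes per real number, as in the format described earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one configuration in the standard format:
   sites * 4 links * (2 rows * 3 columns * 2 reals) * 8 bytes. */
static uint64_t scf_config_bytes(uint64_t nx, uint64_t ny,
                                 uint64_t nz, uint64_t nt)
{
    const uint64_t links_per_site = 4;
    const uint64_t reals_per_link = 2 * 3 * 2;
    const uint64_t bytes_per_real = 8;
    assert(nx > 0 && ny > 0 && nz > 0 && nt > 0);
    return nx * ny * nz * nt * links_per_site * reals_per_link * bytes_per_real;
}
```

Taking Nt = 64 as an assumed example (the slide leaves Nt open), `scf_config_bytes(32, 32, 32, 64)` gives 805,306,368 bytes, which is 768 MiB. Two such configurations need about 1.5 GiB, consistent with the 2 GB figure above.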

  12. SCF: BinX • BinX • an XML schema to describe the format of a binary file • developed by the edikt project (a part of OGSA-DAI) http://www.edikt.org/ • software to convert one binary format to another will be available in May 2003 • enables us to convert configurations without referring to the standard format • Each collaboration submitting configurations to the ILDG describes its own format in BinX • Users may describe their favorite format in BinX

  13. SCF: BinX (cont.)
      <dataset>
        <definitions>
          <typeDef typeName="complexDouble">
            <struct>
              <ieeeDouble-32 varName="Real"/>
              <ieeeDouble-32 varName="Imaginary"/>
            </struct>
          </typeDef>
          <typeDef typeName="matrix2x3">
            <arrayFixed>
              <defType typeName="complexDouble"/>
              <dim name="row" indexFrom="0" indexTo="1"/>
              <dim name="column" indexFrom="0" indexTo="2"/>
            </arrayFixed>
          </typeDef>
        </definitions>

  14. SCF: BinX (cont.)
        <file src="sample.configuration" byteOrder="bigEndian">
          <arrayFixed varName="StandardGaugeConfig">
            <defType typeName="matrix2x3"/>
            <dim name="t" indexFrom="0" indexTo="31"/>
            <dim name="z" indexFrom="0" indexTo="15"/>
            <dim name="y" indexFrom="0" indexTo="15"/>
            <dim name="x" indexFrom="0" indexTo="15"/>
            <dim name="mu" indexFrom="0" indexTo="3"/>
          </arrayFixed>
        </file>
      </dataset>
      • Mechanism for describing an array split across several files
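For readers more used to C than XML, the BinX description above corresponds roughly to the following type definitions. This is an illustrative mapping, not output of the BinX tools; the type names are copied from the XML, while the enum constants and the function name are hypothetical. It assumes each real number occupies 8 bytes, as stated for the standard format.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors <typeDef typeName="complexDouble">. */
typedef struct {
    double Real;
    double Imaginary;
} complexDouble;

/* Mirrors <typeDef typeName="matrix2x3">: row 0..1, column 0..2. */
typedef complexDouble matrix2x3[2][3];

/* Extents from the <dim> elements (indexTo + 1). */
enum { DIM_T = 32, DIM_Z = 16, DIM_Y = 16, DIM_X = 16, DIM_MU = 4 };

/* Expected size of sample.configuration if it holds exactly the
   StandardGaugeConfig array and nothing else. */
static size_t standard_gauge_config_bytes(void)
{
    assert(sizeof(complexDouble) == 16);  /* two 8-byte doubles, no padding */
    return (size_t)DIM_T * DIM_Z * DIM_Y * DIM_X * DIM_MU * sizeof(matrix2x3);
}
```

With the dimensions given in the XML this evaluates to 50,331,648 bytes (48 MiB), which a reader could compare against the actual file size as a quick sanity check.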

  15. Distribution • SCF defines only the format of the binary configuration • no parameters (size, coupling, ...) • no management info (checksums, collaboration name, ...) • all of them are described in a QCDML document • Keeping the identification of a configuration • encapsulate the configuration and the QCDML document into one file • distribute it via ILDG • (need opinions and help from the middleware working group)

  16. Distribution (cont.) • Candidate : DIME (Direct Internet Message Encapsulation) • format is fixed (different from MIME) • header (fixed bytes) • length (fixed bytes) • body of data (QCDML document) • length (fixed bytes) • body of data (QCDML-BinX document) • length (fixed bytes) • body of data (configuration itself) • footer (fixed bytes)

  17. Distribution (cont.) • Merits • don't have to unpack files before reading • file size is not increased (cf. MIME: factor 3/2 incl.) • Discussions: • prepare a tool to extract the QCDML document • the C-library has to seek within the file to the origin (the first byte) of the binary configuration • compatibility with BinX
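Locating the binary configuration inside such a container amounts to walking the length-prefixed records. The sketch below is not an implementation of the DIME specification: the 8-byte header and the 8-byte big-endian length fields are assumptions made for illustration, and `find_record` is a hypothetical name. It only follows the record order listed above (QCDML document, QCDML-BinX document, configuration).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HEADER_BYTES = 8, LENGTH_BYTES = 8 };  /* assumed widths */

/* Decode an 8-byte big-endian unsigned integer. */
static uint64_t read_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < LENGTH_BYTES; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Find record number `index` (0 = QCDML document, 1 = QCDML-BinX
   document, 2 = configuration) in a container held in memory.
   On success returns 0 and stores the body offset and length;
   returns -1 if the buffer ends before the record is complete. */
static int find_record(const unsigned char *buf, size_t size, int index,
                       size_t *offset, uint64_t *length)
{
    size_t pos = HEADER_BYTES;
    assert(index >= 0);
    for (int i = 0; ; i++) {
        if (size < LENGTH_BYTES || pos > size - LENGTH_BYTES)
            return -1;
        uint64_t len = read_be64(buf + pos);
        pos += LENGTH_BYTES;
        if (len > size - pos)
            return -1;
        if (i == index) {
            *offset = pos;
            *length = len;
            return 0;
        }
        pos += (size_t)len;  /* skip this body, move to next length */
    }
}
```

A C-library reading from disk would do the same walk with file seeks, then start reading the configuration at the returned offset. The same routine with index 0 would serve as the tool to extract the QCDML document.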

  18. My opinion for QCDML • [Legend: my opinion/proposal; agreed by working group] • Physics • actions, physics parameters, lattice size • Simulation • algorithm, machine, code, series, trajectory • Management • revision, crc, reference, collaboration, project, action • Pointers • site, file, C-library

  19. Action • a human readable document for each action • XML schema is powerful, but cannot completely describe the action • Three versions • UKQCD Schema v0.5 • A compromise proposal • My very simple version • Problems in the UKQCD schema • too complicated • Action consists of operators • Operators consist of couplings and fields • Action and operator names are XML tags

  20. Action (cont.) • My very simple version • just lists coupling names and values • A compromise version: http://www.rccp.tsukuba.ac.jp/people/yoshie/QCDML-my-sample2.xml • fields for each operator are removed • names of actions and operators are described by values • action is divided into gluon and quark sections • enables us to include boundary conditions

  21. Simulation • Algorithm section: • we may have to prepare a human readable document • simple version is sufficient • Machine • Code • Series • several runs with the same parameter sets • distinguishes them • Trajectory_or_Sweep

  22. Management • Action • Checksums • CRC32 or MD5 • for binary configuration with original format • Collaboration name and Project Name • Useful tags to search configuration • Reference • some information not suitable to include into QCDML • auto-correlation time • do not have to include all references • Revision • To check whether the QCDML document is changed
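As an illustration of the checksum item, the following is a minimal bitwise CRC-32. The slides do not say which CRC-32 variant is meant; this sketch assumes the common reflected polynomial 0xEDB88320 used by zlib and Ethernet, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Update a running CRC-32 with n more bytes. Start with crc = 0 and
   feed the file in chunks; the return value after the last chunk is
   the checksum. Variant: reflected polynomial 0xEDB88320. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t n)
{
    assert(data != NULL || n == 0);
    crc = ~crc;
    for (size_t i = 0; i < n; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Because the checksum is taken over the binary configuration in its original format, it can be computed chunk by chunk while the file is read, without extra memory. A table-driven version would be faster for large files; the bitwise form is kept here for brevity.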
