Metadata Working Group Report • Members (fixed in mid-January): G.Andronico (INFN, Italy), P.Coddington (Adelaide, Australia), R.Edwards (Jlab, USA), C.Maynard (Edinburgh, UK), D.Pleiter (DESY, Germany), J.Simone (FNAL, USA), T.Yoshie (Tsukuba, Japan), B.Joo (observer; Edinburgh, UK) • Mailing list: qcdml@rccp.tsukuba.ac.jp • About 80 mails circulated • QCDML (QCD Markup Language) for ILDG
0. Introduction • QCDML: Strategy and Standard Configuration Format (T.Yoshie) • QCDML: Physics (C.Maynard) • QCDML: Machine and Management (D.Pleiter) • My proposal for QCDML will not be used in my talk, but may be useful for discussions
Strategy • QCDML: XML schema for ILDG • write a QCDML document for each configuration • store QCDML documents in (a) database(s) • search/retrieve configurations • design QCDML so that developing applications is easy • QCDML defines a minimal set of XML tags: • tags necessary for exchanging configurations • tags which will be searched, i.e. those researchers are usually interested in • required: physics parameters (beta, mq) • not included: random number seed
Strategy (cont.) • Each collaboration can extend QCDML and use it for its own purposes • Every collaboration is asked to provide values for all relevant QCDML tags
Category of QCDML • Standard configuration format (SCF) • 1. Physics and parameters • 2. Algorithm and status • 3. Code • 4. Machine • 5. Management • 6. Miscellaneous • finalized • 4,5: almost finalized • 1: discussions ongoing (different opinions)
SCF: Strategy • The standard format is an abstract (reference) format for exchanging configurations • collaborations submitting configurations to ILDG do not have to convert archived files • some groups have already archived many configurations in their own format • each format was chosen for convenience • Conversion will be done on the user side • two methods to convert the format of configurations: • from a given format to the standard one via a C-library • from one format to another using BinX technology (without referring to the standard format)
SCF: Format • Definition of gauge configuration: link variables U_mu(x)_ij • i,j=1,2,3: color indices; mu=1,2,3,4: directions (x,y,z,t) • employ the NERSC (Gauge Connection) format • a sequence of 8-byte double precision real numbers • coded in 64-bit IEEE numerical format • endian is not specified
SCF: Format (cont.) • In a C program, the last index runs fastest and indices run from 0 • re=0 (real part), re=1 (imaginary part) • Store the first two rows (2x3) of the 3x3 link matrix: U11,U12,U13,U21,U22,U23 • mu=1,2,3,4 • x=0,1,2,...,NX-1; y=0,1,2,...,NY-1; likewise z, t • element type: double • [Figure: storage order diagram; labels: Row-Column, Complex*16, Column-Row]
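The index rules above can be sketched in C as follows. This is only an illustration under stated assumptions: the array order [t][z][y][x][mu][row][col][re] is taken from the C and BinX examples later in the talk, mu is counted from 0 here (the slide lists mu=1,2,3,4), and the function names scf_index and su3_third_row are hypothetical. The third-row reconstruction uses the standard SU(3) property that the third row is the complex conjugate of the cross product of the first two; the slides themselves only say that two rows are stored.

```c
#include <assert.h>
#include <stddef.h>

/* Position (counted in doubles) of one real number in the assumed layout
   U[t][z][y][x][mu][row][col][re], last index running fastest.
   All indices are zero-based: mu 0..3, row 0..1, col 0..2, re 0..1. */
size_t scf_index(int NX, int NY, int NZ,
                 int t, int z, int y, int x,
                 int mu, int row, int col, int re)
{
    assert(mu >= 0 && mu < 4 && row >= 0 && row < 2);
    assert(col >= 0 && col < 3 && re >= 0 && re < 2);
    size_t site = (((size_t)t * NZ + z) * NY + y) * NX + x;
    return (((site * 4 + mu) * 2 + row) * 3 + col) * 2 + re;
}

/* Rebuild the third row c of an SU(3) matrix from the stored rows a and b.
   Each row holds 3 complex numbers as (re, im) pairs, i.e. 6 doubles.
   c = conj(a x b), the complex-conjugated cross product. */
void su3_third_row(const double *a, const double *b, double *c)
{
    for (int k = 0; k < 3; k++) {
        int i = (k + 1) % 3;
        int j = (k + 2) % 3;
        /* p = a_i * b_j - a_j * b_i, component k of the cross product */
        double pr = (a[2*i] * b[2*j]     - a[2*i + 1] * b[2*j + 1])
                  - (a[2*j] * b[2*i]     - a[2*j + 1] * b[2*i + 1]);
        double pi = (a[2*i] * b[2*j + 1] + a[2*i + 1] * b[2*j])
                  - (a[2*j] * b[2*i + 1] + a[2*j + 1] * b[2*i]);
        c[2*k]     = pr;    /* conjugation keeps the real part */
        c[2*k + 1] = -pi;   /* and flips the sign of the imaginary part */
    }
}
```

With this layout one site occupies 4 x 2 x 3 x 2 = 48 doubles, so neighbouring x sites are 48 doubles apart.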
SCF: C-library • Each collaboration submitting configurations to ILDG prepares a C-library that reads its configurations into the standard format • a pointer to the C-library is stored in the QCDML document • the library reads a hyper-cubic region (ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1) of the (0:NX-1)*(0:NY-1)*(0:NZ-1)*(0:NT-1) lattice • void ILDG_read_conf(file, NX, ix0,ix1, NY, iy0,iy1, NZ, iz0,iz1, NT, it0,it1, endian, config)
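SCF: C-library (cont.)
/* supplied by the collaboration's C-library; the argument types are assumed here */
void ILDG_read_conf(const char *file, int NX, int ix0, int ix1, int NY, int iy0, int iy1, int NZ, int iz0, int iz1, int NT, int it0, int it1, int endian, void *config);
int main(void) {
  int NX=8, NY=8, NZ=8, NT=16;
  int endian=1; /* big endian, =0 for little endian */
  double U[16][4][4][4][4][2][3][2]; /* [t][z][y][x][mu][row][col][re] of the region read */
  ILDG_read_conf("test-file", NX,0,3, NY,4,7, NZ,4,7, NT,0,15, endian, U);
  return 0;
}
• the region (0-3)*(4-7)*(4-7)*(0-15) of the whole lattice (0-7)*(0-7)*(0-7)*(0-15) will be read in big endian format and stored in U[16][4][4][4][4][2][3][2].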
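How such a reader might work for a file that is already in the standard layout can be sketched as below. This is not the ILDG library: read_region_sketch is a hypothetical stand-in, and it assumes the file contains nothing but the raw doubles in the order [t][z][y][x][mu][row][col][re], that sizeof(double) is 8, and that long is wide enough for the file offsets. The argument order follows the ILDG_read_conf call on the slide.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SITE_DOUBLES 48   /* 4 directions x 2 rows x 3 columns x (re, im) */

/* Return 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Reverse the 8 bytes of each double in place. */
static void swap_doubles(double *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char b[8], r[8];
        memcpy(b, &v[i], 8);
        for (int k = 0; k < 8; k++)
            r[k] = b[7 - k];
        memcpy(&v[i], r, 8);
    }
}

/* Read the region (ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1) into config,
   packed in the same index order. endian: 1 = file is big endian,
   0 = little endian. Returns 0 on success, -1 on any error. */
int read_region_sketch(const char *file,
                       int NX, int ix0, int ix1,
                       int NY, int iy0, int iy1,
                       int NZ, int iz0, int iz1,
                       int NT, int it0, int it1,
                       int endian, double *config)
{
    if (ix0 < 0 || ix0 > ix1 || ix1 >= NX || iy0 < 0 || iy0 > iy1 || iy1 >= NY ||
        iz0 < 0 || iz0 > iz1 || iz1 >= NZ || it0 < 0 || it0 > it1 || it1 >= NT)
        return -1;

    FILE *fp = fopen(file, "rb");
    if (fp == NULL)
        return -1;

    /* Sites with consecutive x are contiguous, so read one x-run at a time. */
    size_t run = (size_t)(ix1 - ix0 + 1) * SITE_DOUBLES;
    double *out = config;
    int status = 0;

    for (int t = it0; t <= it1 && status == 0; t++)
        for (int z = iz0; z <= iz1 && status == 0; z++)
            for (int y = iy0; y <= iy1 && status == 0; y++) {
                long site = (((long)t * NZ + z) * NY + y) * NX + ix0;
                long offset = site * SITE_DOUBLES * (long)sizeof(double);
                if (fseek(fp, offset, SEEK_SET) != 0 ||
                    fread(out, sizeof(double), run, fp) != run)
                    status = -1;
                else
                    out += run;
            }
    fclose(fp);

    /* Convert to host byte order if the file's byte order differs. */
    if (status == 0 && (endian != 0) != host_is_big_endian())
        swap_doubles(config, (size_t)(out - config));
    return status;
}
```

Reading one x-run per seek keeps the extra memory at the size of the requested region rather than the whole configuration, which is one way to ease the memory concern raised on the next slide.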
SCF: C-library (cont.) • in general, the conversion program requires memory for 1-2 full configurations: the memory bottleneck cannot be avoided • We propose the above interface: • simple • mainly for full QCD configurations • a 32^3 x Nt lattice, the relevant size for the next several years, can be handled by a high-end PC with 2 GB of memory • some extension might be necessary in the future
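The memory estimate can be checked with a few lines of arithmetic. The sketch below assumes the standard format described earlier (4 directions, 2 stored rows, 3 columns, real and imaginary parts, 8 bytes each, so 384 bytes per site); the function name scf_bytes is hypothetical, and Nt = 64 in the usage note is an assumed value, since the slide leaves Nt open.

```c
#include <stdint.h>

/* Bytes needed for one configuration in the standard format:
   4 directions x 2 rows x 3 columns x 2 (re, im) x 8 bytes = 384 per site. */
uint64_t scf_bytes(uint64_t NX, uint64_t NY, uint64_t NZ, uint64_t NT)
{
    return NX * NY * NZ * NT * 4 * 2 * 3 * 2 * 8;
}
```

For example, scf_bytes(32, 32, 32, 64) gives 805306368 bytes (768 MiB), so two such configurations in memory take 1.5 GiB, which is in line with the 2 GB figure on the slide.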
SCF: BinX • BinX: an XML schema to describe the format of a binary file, developed by the edikt project (a part of OGSA-DAI) http://www.edikt.org/ • software to convert one binary format to another will be available in May 2003 • enables us to convert configurations without referring to the standard format • Each collaboration submitting configurations to the ILDG describes its own format in BinX • Users may describe their favorite format in BinX
SCF: BinX (Cont.)
<dataset>
  <definitions>
    <typeDef typeName="complexDouble">
      <struct>
        <ieeeDouble-32 varName="Real"/>
        <ieeeDouble-32 varName="Imaginary"/>
      </struct>
    </typeDef>
    <typeDef typeName="matrix2x3">
      <arrayFixed>
        <defType typeName="complexDouble"/>
        <dim name="row" indexFrom="0" indexTo="1"/>
        <dim name="column" indexFrom="0" indexTo="2"/>
      </arrayFixed>
    </typeDef>
  </definitions>
SCF: BinX (Cont.)
  <file src="sample.configuration" byteOrder="bigEndian">
    <arrayFixed varName="StandardGaugeConfig">
      <defType typeName="matrix2x3"/>
      <dim name="t" indexFrom="0" indexTo="31"/>
      <dim name="z" indexFrom="0" indexTo="15"/>
      <dim name="y" indexFrom="0" indexTo="15"/>
      <dim name="x" indexFrom="0" indexTo="15"/>
      <dim name="mu" indexFrom="0" indexTo="3"/>
    </arrayFixed>
  </file>
</dataset>
• Mechanism for describing an array split across several files
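For readers more used to C than to BinX, the description above corresponds roughly to the following types. This is a reading aid, not BinX software: it assumes that the dim elements are listed slowest-varying first (as the C layout on the earlier slide suggests), that each real number occupies 8 bytes, and that the compiler adds no padding to the struct. The names sample_file_bytes and link_byte_offset are hypothetical.

```c
#include <stddef.h>

/* Mirrors <typeDef typeName="complexDouble">: two 8-byte reals. */
typedef struct {
    double Real;
    double Imaginary;
} complexDouble;

/* Mirrors <typeDef typeName="matrix2x3">: row 0..1, column 0..2. */
typedef complexDouble matrix2x3[2][3];

/* Extents taken from the <dim> elements of the sample file. */
enum { DIM_T = 32, DIM_Z = 16, DIM_Y = 16, DIM_X = 16, DIM_MU = 4 };

/* Total size of the array StandardGaugeConfig[t][z][y][x][mu] in bytes. */
size_t sample_file_bytes(void)
{
    return (size_t)DIM_T * DIM_Z * DIM_Y * DIM_X * DIM_MU * sizeof(matrix2x3);
}

/* Byte offset of link (t, z, y, x, mu), assuming t is slowest and mu fastest. */
size_t link_byte_offset(int t, int z, int y, int x, int mu)
{
    size_t link = ((((size_t)t * DIM_Z + z) * DIM_Y + y) * DIM_X + x) * DIM_MU + mu;
    return link * sizeof(matrix2x3);
}
```

Under these assumptions one 2x3 matrix takes 96 bytes and the 32 x 16^3 sample file takes 50331648 bytes (48 MiB).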
Distribution • SCF defines only the format of the binary configuration • no parameters (size, coupling, ...) • no management info (checksums, collaboration name, ...) • all of them are described in a QCDML document • Keeping the configuration identifiable • encapsulate the configuration and the QCDML document into one file • distribute it via ILDG • (need opinions and help from the middleware working group)
Distribution (cont.) • Candidate : DIME (Direct Internet Message Encapsulation) • format is fixed (different from MIME) • header (fixed bytes) • length (fixed bytes) • body of data (QCDML document) • length (fixed bytes) • body of data (QCDML-BinX document) • length (fixed bytes) • body of data (configuration itself) • footer (fixed bytes)
Distribution (cont.) • Merits • files do not have to be unpacked before reading • file size is not increased (cf. MIME: factor 3/2 increase) • Discussions: • prepare a tool to extract the QCDML document • the C-library has to seek within the file to the origin (the first byte) of the binary configuration • Compatibility with BinX
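The seek mentioned under Discussions can be sketched for the simplified layout listed on the previous slide (fixed header, then three length-prefixed bodies, then a footer). This is not an implementation of the actual DIME specification, whose record headers are more elaborate. The 16-byte header, the 8-byte big-endian length fields, and the name seek_to_body are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define HEADER_BYTES 16L   /* assumed size of the fixed header (hypothetical) */
#define LENGTH_BYTES 8     /* assumed width of each length field (hypothetical) */

/* Read one unsigned big-endian length field from the current position. */
static int read_length(FILE *fp, uint64_t *len)
{
    unsigned char b[LENGTH_BYTES];
    if (fread(b, 1, LENGTH_BYTES, fp) != LENGTH_BYTES)
        return -1;
    *len = 0;
    for (int i = 0; i < LENGTH_BYTES; i++)
        *len = (*len << 8) | b[i];
    return 0;
}

/* Position fp at the first byte of body number `which`:
   0 = QCDML document, 1 = QCDML-BinX document, 2 = configuration.
   The body length is stored in *len. Returns 0 on success, -1 on error. */
int seek_to_body(FILE *fp, int which, uint64_t *len)
{
    if (which < 0 || which > 2)
        return -1;
    if (fseek(fp, HEADER_BYTES, SEEK_SET) != 0)
        return -1;
    for (int i = 0; ; i++) {
        if (read_length(fp, len) != 0)
            return -1;
        if (i == which)
            return 0;   /* fp now points at the first byte of the body */
        /* skip the body we are not interested in (assumes it fits in a long) */
        if (fseek(fp, (long)*len, SEEK_CUR) != 0)
            return -1;
    }
}
```

A reading library would call seek_to_body(fp, 2, &len) and then treat the current file position as the origin of the binary configuration; a tool that extracts the QCDML document would use body 0 in the same way.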
My opinion for QCDML (slide legend: my opinion/proposal; agreed by working group) • Physics: actions, physics parameters, lattice size • Simulation: algorithm, machine, code, series, trajectory • Management: revision, crc, reference, collaboration, project, action • Pointers: site, file, C-library
Action • a human-readable document for each action • XML schema is powerful, but cannot describe the action completely • Three versions • UKQCD Schema v0.5 • A compromise proposal • My very simple version • Problems in the UKQCD schema • too complicated • Action consists of operators • Operators consist of couplings and fields • Action and operator names are XML tags
Action (cont.) • My very simple version • just lists coupling names and values • A compromise version: http://www.rccp.tsukuba.ac.jp/people/yoshie/QCDML-my-sample2.xml • fields for each operator are removed • names of actions and operators are given as values (not as XML tags) • action is divided into gluon and quark sections • enables us to include boundary conditions
Simulation • Algorithm section: • we may have to prepare a human-readable document • a simple version is sufficient • Machine • Code • Series • distinguishes several runs with the same parameter set • Trajectory_or_Sweep
Management • Action • Checksums • CRC32 or MD5 • for the binary configuration in its original format • Collaboration name and Project name • useful tags for searching configurations • Reference • information not suitable for inclusion in QCDML, e.g. auto-correlation time • does not have to include all references • Revision • to check whether the QCDML document has been changed
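As an illustration of the checksum tag, here is a minimal bitwise CRC-32 in C. The slides do not say which CRC32 variant is meant; this sketch assumes the common one used by zlib and PNG (reflected polynomial 0xEDB88320, initial value and final XOR of all ones). The function name crc32_update is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Update a running CRC-32 with len bytes from buf.
   Start with crc = 0; the result of one call can be fed into the next,
   so a large configuration file can be processed block by block. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            /* shift right; XOR in the polynomial if the dropped bit was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

The standard check value for this variant is 0xCBF43926 for the nine ASCII bytes "123456789", which gives a quick way to confirm that two sites compute the same checksum.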