Looking for a (standard) Common Format for (Quantum) Computational Chemistry
A WG activity within COST action 23 (WG D23/0006/01)
Outline: Motivation, Vocabulary, Wrappers
• Elda Rossi, Andrew Emerson - CINECA
• Gian Luigi Bendazzoli, Antonio Monari - Università di Bologna
• Renzo Cimiraglia, Celestino Angeli, Stefano Borini - Università di Ferrara
• Daniel Maynau, Stefano Evangelisti - IRSAMC, Toulouse
• José Sanchez-Marin - Universitat de Valencia
• Peter Szalay - Eötvös Loránd University
• Rosa Caballol - Universitat Rovira i Virgili, Tarragona
Motivation for the work
To build a meta-system supporting research collaboration in the field of “Localised Orbitals in post-SCF methods … Linear Scaling methods in a Multi-Reference context”
The scenario
• Different laboratories need to collaborate
• Different “home-made” codes need to be used together, since they give different views of the same problem
• General-purpose “basic” codes are needed to pre-compute data in a sort of pipeline
• Programs should remain on their original sites, under the responsibility of their authors
• Different platforms
• Network connections (grid architecture)
• Workflow
The need for a Common Format
The first problem we faced: how can different codes (on different platforms) communicate?
We need a Common Format for (at least) Quantum Chemistry codes
Preliminary steps
• Looking around …
• CML has been available for a long time
• XML is used by Accelrys for internal files
• XML is used by ArgusLab for internal files
None of them is completely suited to computational chemistry: they cover mainly structural chemistry, with no Quantum Chemistry properties
• XML seems the best technology, so we decided to try another XML-based format
• HDF5 looked nice for storing the large binary data typical of QC
How the engine should work
• Leaves the program unchanged
• One wrapper for each program
– If a code is added, only one wrapper has to be written
[Diagram labels: Data Repository (XML/HDF), IN-wrapper, IN-files, Program, OUT-files, OUT-wrapper]
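The wrapper idea on this slide can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: `legacy_program`, its `NELEC=`/`RESULT=` text format, and the doubling it performs are all hypothetical stand-ins for a real chemistry code, which is treated as a black box.

```python
import xml.etree.ElementTree as ET


def legacy_program(input_text):
    """Hypothetical stand-in for an unchanged code: native text in, native text out."""
    n = int(input_text.split("=")[1])
    return f"RESULT={n * 2}"  # toy computation, not chemistry


def in_wrapper(repo):
    """Translate repository data into the program's native input."""
    n = repo.find("molecule").get("nElectrons")
    return f"NELEC={n}"


def out_wrapper(repo, output_text):
    """Translate the program's native output back into the repository."""
    value = output_text.split("=")[1]
    computed = ET.SubElement(repo, "computedData")
    ET.SubElement(computed, "property", {"value": value})


# The repository is the only thing the wrappers share; the program never sees XML.
repo = ET.fromstring('<system><molecule nElectrons="2"/></system>')
out_wrapper(repo, legacy_program(in_wrapper(repo)))
```

Adding another code to the meta-system would mean writing new wrappers for it, leaving both the repository format and the existing programs untouched.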
QCML: an XML format for QC
• In order to be as general as possible, we need to write down a hierarchical schema of Quantum Chemistry quantities
• As a first approximation, three domains can be identified:
• Base FACTS: initial data describing the physics of the system
• DERIVED: quantities computed from FACTS using QC algorithms (energies, properties, integrals, coefficients, …)
• W-FLOW: which codes are in the pipeline, specific input parameters, …
• A base fact is a fact that is a given in the world and is remembered (stored) in the system.
• A derived fact is created by an inference or a mathematical calculation from terms, facts, other derivations, or even action assertions.
FACT: molecule
<system title date program author>
  <molecule nElectrons charge spinMultiplicity spaceSymmetry>
    <symmetry groupName/>
    <geometry type unit numAtoms symmetryRef>
      <atom symbol isotope x3 y3 z3/>
    <basis name type numOrbitals>
      <atomBase angularMomMAX symbol>
        <angularMom value symbol numOrbitals>
          <orbital id numPrimitives>
            <exps/>
            <coeffs/>
• Symmetry: group name and other symmetry data
• Geometry: Cartesian only; full, or symmetry-unique atoms
• Basis: by name or fully defined
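As a rough illustration of how such a FACT record might be produced, the sketch below builds a small document with Python's standard `xml.etree.ElementTree`. Element and attribute names follow the slide; the nesting, the H2 geometry, and every attribute value are illustrative assumptions, not taken from the actual QCML schema.

```python
import xml.etree.ElementTree as ET

# Attribute values are placeholders chosen for illustration only.
system = ET.Element("system", {"title": "H2 example", "author": "anonymous"})
molecule = ET.SubElement(system, "molecule", {
    "nElectrons": "2",
    "charge": "0",
    "spinMultiplicity": "1",
    "spaceSymmetry": "Ag",
})
ET.SubElement(molecule, "symmetry", {"groupName": "D2h"})

# Geometry: Cartesian coordinates, one <atom> element per nucleus.
geometry = ET.SubElement(molecule, "geometry", {
    "type": "cartesian",
    "unit": "bohr",
    "numAtoms": "2",
})
for z in (-0.7, 0.7):
    ET.SubElement(geometry, "atom", {
        "symbol": "H", "x3": "0.0", "y3": "0.0", "z3": str(z),
    })

xml_text = ET.tostring(system, encoding="unicode")
```

Keeping every quantity in attributes, as the slide does, makes the record easy to query by path (for example `molecule/geometry/atom`).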
DERIVED data: computedData
<system …>
  <computedData>
    <energy unit levelOfTheory quality value>
      <state spaceSymmetry spinMultiplicity excitationLevel/>
    <property unit levelOfTheory quality value>
      <state “bra” spaceSymmetry spinMultiplicity excitationLevel/>
      <state “ket” spaceSymmetry spinMultiplicity excitationLevel/>
      <operator ordername/>
    <file address URL/>
A “schema” has been written for QCML
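A reader for the DERIVED section could look roughly like the following sketch. It assumes the attribute names shown on the slide; the energy value, the unit string, and the file name `integrals.h5` are placeholders, not computed results or real paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical computedData fragment; the value is a placeholder, not a result.
QCML_FRAGMENT = """
<computedData>
  <energy unit="hartree" levelOfTheory="FCI" quality="converged" value="-1.0">
    <state spaceSymmetry="Ag" spinMultiplicity="1" excitationLevel="0"/>
  </energy>
  <file address="integrals.h5"/>
</computedData>
"""


def read_energies(xml_text):
    """Return (levelOfTheory, value) pairs for every <energy> element."""
    root = ET.fromstring(xml_text)
    return [(e.get("levelOfTheory"), float(e.get("value")))
            for e in root.findall("energy")]


def read_file_addresses(xml_text):
    """Return the addresses of external data files referenced by the record."""
    root = ET.fromstring(xml_text)
    return [f.get("address") for f in root.findall("file")]


energies = read_energies(QCML_FRAGMENT)
addresses = read_file_addresses(QCML_FRAGMENT)
```

Only small scalar results live in the XML itself; large arrays are reached through the `file` reference.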
DERIVED: computedData/file
Two possible strategies:
• Leave data in their native format and translate them only when needed; maintain different versions (formats) of the same data
• Define a “standard” format for binary data and convert them anyway
Problem with large binary datasets:
• Include the reference, not the actual data
The second strategy was the solution of choice
• HDF5 appears to be a good solution
HDF Mission
To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.
• Format and software for scientific data
• Stores images, multidimensional arrays, tables, etc.
• Emphasis on storage and I/O efficiency
• Free and commercial software support
• Emphasis on standards
• Users from many engineering and scientific fields
Example HDF5 file
[Figure: tree of an example HDF5 file. Groups shown: “/” (root), “/AO”, “/MO”, “/mono”, “/bi”, “/coefficients”. Dataset labels: Property, Overlap, Repulsion, Kinetic, Kinetic+Repulsion; one dataset is a 4-D array, another a table:]
Orb | occ | energy
----|-----|-------
1   | 0   | 0.35
2   | 0.5 | 0.26
3   | 2.  | 0.69
HDF file structure for QC
[Diagram of the HDF5 tree. Header labels: Norb, Name, QCML_ref, Norb]
• Root
• AO: <i/j>, <i/T/j>, <i/Vnuc/j>, <i/T/j>+<i/Vnuc/j>, <ij/kl>
• MO: <i/T/j>, <i/V/j>, <i/T/j>+<i/Vnuc/j>, <ij/kl>, coeff(i,j)
• Property: <i/p/j>
• Spin Polar.: a=b, a, b
• Orb Classif: Core, Active, Virtual
• Orb Energies
• Orb Symm
• [1-order]
+ format metadata (integer, binary, endianness, …)
QCML processing: wrappers
• One pair of wrappers for each code in the meta-system
• They should be written and maintained by the authors of the chemical codes
• XML processing (DOM) can be used, but … in what language?
• Fortran: no easy and stable DOM available
• Scripting languages (Perl/Python/Java): not known by chemists
• We tried both ways (Fortran and Python)
Fortran DOM: drawbacks
• The only problem is the Fortran binding
• It does not exist (at least as of last year …)
• DOM is object-oriented and Fortran is not
• A C binding exists (Gdome2)
• Gdome2 was installed, with very hard work, on a mainframe platform (it was conceived for Linux)
• We are currently converting it to Fortran, adopting the DOM recommendations (simplified …)
Why Fortran
GOOD
• Users don't need to learn a new language
• Homogeneous environment
BAD
• Tricky: needs an external library (f77xml) built on top of gdome2
• Porting problems for gdome2/libxml2 may arise
F77xml library
• Still in development
• v0.4 is out (experimental, with limited features)
• v1.0 upcoming, API changed to be nearly DOM2 compliant
• Written in C on top of gdome2: http://gdome2.cs.unibo.it/index.html
• Designed for interfacing to F77 (also F90 soon)
• Reduced namespace pollution
Cons:
• F77 syntax is difficult (DOM2 + tricks)
• F90 syntax is simpler
• A pre-processor will convert F90 syntax to F77
http://freshmeat.net/projects/f77xml
F77xml library - V1.0 example
Gdome2 (C):
GdomeNode* gdome_el_firstChild (GdomeElement *self, GdomeException *exc);
F90:
Call f77xml_el_firstChild(nodeCode, elemCode, exc)
• First position: return value
• nodeCode, elemCode, exc mapped to INTEGER
F77:
Func='el_firstChild'
Call xp3t1(nodeCode,func,elemCode,exc)
Multiplexer function xp3t1:
• p3: 3 parameters (+ function name)
• t1: type 1 parameter schema (code/code/error)
Why Python
GOOD
• Very easy object-oriented language
• Works well with strings
• Simple and efficient DOM interface for XML
• Present in almost all UNIX/Linux distributions
BAD
• Users do need to learn a new language
• Maybe less powerful than Perl
• Usually not used by chemists
Python Wrapper
At present, a prototype works with the MOLPRO-FCI chain:
• It takes information from the XML repository
• It writes the proper MOLPRO and FCI input
• It starts the two programs
With a different XML file, users only need to specify the file name and some simple parameters (orbital guess for FCI)
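An IN-wrapper of this kind might be sketched as follows with the standard `xml.dom.minidom` module. To avoid guessing at real MOLPRO or FCI input syntax, the sketch writes a generic XYZ-style geometry block instead; the QCML fragment, its values, and the function name `qcml_to_xyz` are hypothetical.

```python
from xml.dom import minidom

# Hypothetical repository content, using attribute names from the FACT slide.
QCML = """<system title="H2 example">
  <molecule nElectrons="2" charge="0" spinMultiplicity="1">
    <geometry type="cartesian" unit="angstrom" numAtoms="2">
      <atom symbol="H" x3="0.0" y3="0.0" z3="-0.37"/>
      <atom symbol="H" x3="0.0" y3="0.0" z3="0.37"/>
    </geometry>
  </molecule>
</system>"""


def qcml_to_xyz(xml_text):
    """Read atoms through the DOM and emit XYZ text: count, title, one atom per line."""
    doc = minidom.parseString(xml_text)
    title = doc.documentElement.getAttribute("title")
    atoms = doc.getElementsByTagName("atom")
    lines = [str(len(atoms)), title]
    for atom in atoms:
        coords = " ".join(atom.getAttribute(k) for k in ("x3", "y3", "z3"))
        lines.append(f"{atom.getAttribute('symbol')} {coords}")
    return "\n".join(lines) + "\n"


xyz_text = qcml_to_xyz(QCML)
```

A real wrapper would replace the XYZ writer with the target program's own input format and then launch the program.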
Python or not
• Python is very simple to learn and works very efficiently with XML
• Scripts written in Python (at least for prototypes) are quite clear, linear and easy to maintain or upgrade
• The possibility of a GUI could make our project much more user-friendly
What we have done …
• Single platform: IBM SP4
• Two code chains:
• MolPro to FCI
• MolPro to CasDI
[Workflow diagram. Labels: Start here, QCML Repository, HDF5 Repository, IN-wrapper, MolPro IN-file, MolPro, OUT-wrapper, FCIDUMP, IN-wrapper, Bin file for FCI, IN-wrapper, FCI IN-file, FCI, Stop here]
In conclusion …
Two important hints on data:
• Use some XML dialect for describing simple structured data
• Use HDF5 for storing large arrays and binary data
Open issues:
• Need for a good and easy API to XML and HDF
• How to manage the workflow
• How to manage the grid connection