Collaboratory for Multi-scale Chemical Sciences (CMCS) – New Informatics Capabilities for CHEMKIN users David Leahy and Larry Rahn Sandia National Laboratories Combustion Symposium 2004 University of Illinois at Chicago July 25, 2004 http://cmcs.org
Outline • Multi-Scale, Multi-Domain Science Challenges • The Collaboratory for Multi-scale Chemical Sciences (CMCS) • An adaptive informatics infrastructure • Data and Metadata Services • Examples • Related projects • Conclusions and Future Work
Combustion is a Multi-scale Chemical Science Challenge • Science relies upon validated information shared across physical scales • New knowledge is assimilated from different data, tools, and disciplines at each scale • Critical science lies at scale interfaces • Impact through industrial application is mostly at larger scales • Multi-scale scientific collaboration faces barriers • Normal publication route is slow and excludes much important data • Multi-scale information is complex and its pedigree (associated metadata) matters • New approaches to developing and sharing ‘trustworthy’ data are needed • Community resources are highly distributed • Complexity of multi-scale science can lead to unnecessary duplication and impede investment
Challenge: Multi-scale science takes too long Industrial researcher Peer Reviewed Publication Autoignition not predicted by chemical mechanism NIST review Publish in data base Evaluation: Need Thermochemistry for new radical New Mechanism developed, validated Read paper Peer Reviewed Publication Conference Presentation Evaluation: Need computational data, collaborate with Quantum Chemist Peer Reviewed Publication of new radical thermochemistry Peer Reviewed Publication of computation ~ 1 year Time
Shared repository speeds multi-scale communication Industrial researcher accesses new mechanism Peer Reviewed Publication NIST review, Publish in data base Annotation: autoignition not predicted by chemical mechanism Notification: New Mechanism developed, validated Notification: Results in decision to develop new mechanism Peer Reviewed Publication Publicly Shared Data Repository Parsers Translators Annotators NIST Repository Conference Presentation Peer Reviewed Publication of new radical thermochemistry Evaluation: Need computational data, collaborate with Quantum Chemist Peer Reviewed Publication of computation ~ 1 year Time
Collaboratory for Multi-scale Chemical Science (CMCS) • A collaboration of 8 national labs and universities • Chemical scientists spanning the scales from electronic structure of molecules to simulations of reacting flow • Computer and information scientists expert in emerging web-based technologies • Funded by DOE/SC MICS office • Part of the National Collaboratory Program • Pilot project within DOE combustion research community • In our third year, renewed through 2008 • Targets Chemical Science Community and BES SciDAC projects with much broader goals in the longer term
Multi-disciplinary CMCS Team SNL - Larry Rahn*, Christine Yang, Carmen Pancerella, David Leahy, Darrian Hale PNL - Brett Didier*, James D. Myers, Karen Schuchardt, Theresa Windus, Carina Lansing ANL - Al Wagner*, Branko Ruscic, Gregor von Laszewski, Reinhardt Pinzon, Kaizar Amin LLNL-William Pitz* LANL- David Montoya*, Rick Knight NIST-Thomas C. Allison* MIT - William H. Green, Jr. *, Luwi Oluwole UCB - Michael Frenklach* *denotes Institutional Point of Contact CMCS Development Partnerships SAM National Collaboratory Program
Goal of CMCS Enhance chemical science research by providing an adaptive informatics infrastructure with an integrated set of collaboration tools, data management tools, and chemistry-specific applications and data resources. New forms of data sharing, pedigree annotation New Paradigms for collaborative research Increased access to state-of-the-art research knowledge More rapid and efficient multi-scale scientific progress
CMCS Approach • Pilot in combustion science to enable data-centric collaboration knowledge grid • Develop Portal supporting collaboration, community evaluation, knowledge management, and research tools • Innovate approaches to capture and present metadata, annotation, and semantic information • Enable data translation and data interoperability • Emphasize lightweight just-in-time integration, aspect-oriented design, open source, & Web/grid standards, technologies
Reacting Flow Modeler Thermo-chemist Thermodynamics Application CHEMKIN Application XML-based Web technologies enable data interoperability, metadata capture, annotation Thermodynamics Data base Thermo data Kinetics data Parsers Annotators Translators Parsers Annotators Shared Data Repository Distributed Authoring and Versioning (WebDAV) protocol Annotation Annotation Annotation … XML Thermodynamics Data Set XML Kinetics Data Set XML Transport Data Set
CMCS Informatics Infrastructure Capabilities • Collaboration • Data/metadata management • Annotation • Translation • Visualization • Notification • Search • Security
CMCS Pilot User Groups • HCCI University Consortium – Bill Pitz (LLNL) • DNS Feature Tracking & Detection – David Leahy and Larry Rahn (SNL) • Reduced Chemical Mechanisms – Bill Green (MIT) • PrIMe – led by Michael Frenklach (UCB) • NIST/PrIMe Data Warehouse • PrIMe Library of on-demand chemistry models • IUPAC – led by Branko Ruscic (ANL) • Develop and publish validated thermochemical data • Real Fuels Project– Wing Tsang and Tom Allison (NIST) • Lead real fuels chemistry at NIST • Quantum Chemistry – Theresa Windus (PNNL)
CMCS Pilot Databases, Applications • LLNL Chemistry Database – Bill Pitz (LLNL) • Computational Result Database – David Feller (PNNL) • RIOT – Reduced Chemical Mechanisms – Bill Green (MIT) • ReactionLab – Michael Frenklach (UCB) • Development and publishing chemical reaction models, interfaced with NIST/PrIMe Data Warehouse • ATcT – Active Thermochemistry Tables – Branko Ruscic (ANL) • Optimizes networks of thermochemical data • Chemical Kinetics and Thermochemistry Database for High-Temperature Materials Synthesis – Mark Allendorf (SNL)
CMCS Portal Shared Data Repository SAM Grid Fabric Integration of Applications Enabled by Flexible Infrastructure Browser Active Table Command line applications Portlet API Web service Web service CMCS/DAV API XML/SOAP Java Parser of ASCII data Web service XML/SOAP
Translations in CMCS • Extensible Stylesheet Language Transformations (XSLT) • XML → HTML for web viewing • XML → HTML for interactive applet tools • XML → ASCII formats for other programs • Web Service Interface • Command line web services, e.g. OpenBabel for Geometry translations • Java interfaces for parsing ASCII or binary files, e.g. Chemkin → XML
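The XML-to-ASCII direction listed above can be illustrated with a minimal sketch. It uses Python's standard library in place of XSLT, and the element and attribute names (`quantity`, `species`, `units`, `value`, `uncertainty`) are hypothetical, not the CMCS schema. The numeric value is the methylhydroperoxide atomization enthalpy quoted elsewhere in this deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML data set; tag and attribute names are illustrative
# assumptions, not the actual CMCS schema.
SAMPLE_XML = """
<dataset>
  <quantity species="CH3OOH" units="kcal/mol">
    <value>522.09</value>
    <uncertainty>2.02</uncertainty>
  </quantity>
</dataset>
"""


def xml_to_ascii(xml_text: str) -> str:
    """Translate each <quantity> element into one fixed-width ASCII row."""
    root = ET.fromstring(xml_text)
    rows = []
    for q in root.findall("quantity"):
        value = float(q.findtext("value"))
        unc = float(q.findtext("uncertainty"))
        rows.append(
            f"{q.get('species'):<10}{value:>10.2f} +/- {unc:.2f} {q.get('units')}"
        )
    return "\n".join(rows)


print(xml_to_ascii(SAMPLE_XML))
```

An XSLT stylesheet would express the same mapping declaratively. The point of the sketch is only that one XML source can feed several output formats.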
Application Integration in CMCS • Portlets interfaced to web services via XML/SOAP • Active ThermoChemistry Tables (ATcT) – Branko Ruscic (ANL) • Range Identification Optimization Tool (RIOT) – Bill Green (MIT)
Shared Applications for Collaborative Data Analysis Thermochemical Active Tables (ATcT) functionality available as a Web service accessible from an enabled Project Team workspace in the CMCS Portal.
RIOT -- Reduced Kinetic Models Significantly Reduce Cost Of Reacting Flow Simulations 11 reduced models plus the full model (model 0) cover the 12,000 finite volumes 4x speed-up in 2-d laminar methane flame simulation without loss of accuracy (Lu, Bhattacharjee, Barton, & Green, 2003)
‘Disk-like’ Access to Data Using Desktop Clients Example: LabVIEW Application on Windows Desktop. LabVIEW writes to the WebDrive DAV client (http://www.southrivertech.com/), which deposits data directly in the CMCS archive.
Summary • CMCS provides a public data sharing collaborative workspace for chemists • Modern XML technologies provide better ways for scientists to share knowledge • Web-based interfaces for data and applications • Metadata management • Translations • Visualization • CMCS Pilot Groups are providing valuable feedback to the CMCS’ iterative development cycle
CMCS Data/Metadata Philosophy • Scientific metadata has meaning across chemical science domains • Scientific data is generally opaque and can be somewhat meaning-free outside of a discipline • Metadata must be understood and manipulated and formatted in a machine-comprehensible way • We are not enforcing standards • There is no schema that spans the scales the CMCS addresses • Enforcing standards across multiple chemistry communities would not be pragmatic • Enforcing standards would alienate scientists • When and if standards exist … • CMCS provides a technological framework for standard adoption • We encourage the community to develop, review and adopt standards • We can map our scientific content to and from standards, as needed
Chemical Science Data and Metadata ΔH°atomiz,0(CH3OOH) = 522.09 ± 2.02 kcal/mol [calculated, G3//B3LYP, T. Windus, more at http://...] • data: value and uncertainty • units: kcal/mol • quantity: enthalpy of atomization • species: methylhydroperoxide, CAS# 3031-73-0 • temperature: 0 K • calculated: G3//B3LYP • creator: T. Windus using Ecce • more info: http://avatar.emsl.pnl.gov:8080/Ecce/.../CH3OOH/.../GxEnergy
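One way to picture how the value and its metadata travel together is a small record type. The sketch below is illustrative only: the class name `AnnotatedValue` and its field names are assumptions, while the field contents are the ones listed on this slide. The truncated "more info" URL is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedValue:
    """A numeric result bundled with the metadata needed to interpret it.

    Hypothetical structure for illustration; not a CMCS data type.
    """
    value: float
    uncertainty: float
    units: str
    quantity: str
    species: str
    cas: str
    temperature_k: float
    method: str
    creator: str

    def summary(self) -> str:
        return (
            f"{self.quantity} of {self.species} at {self.temperature_k:g} K = "
            f"{self.value} ± {self.uncertainty} {self.units} "
            f"[{self.method}, {self.creator}]"
        )


entry = AnnotatedValue(
    value=522.09,
    uncertainty=2.02,
    units="kcal/mol",
    quantity="enthalpy of atomization",
    species="methylhydroperoxide",
    cas="3031-73-0",
    temperature_k=0.0,
    method="G3//B3LYP",
    creator="T. Windus using Ecce",
)
print(entry.summary())
```

Without the metadata fields, the bare pair 522.09 and 2.02 would be meaningless to someone outside the discipline, which is the point the slide makes.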
Scientific Data Provenance • Data provenance (or data pedigree) -- where a piece of data came from and the process by which it arrived in the data repository -- is essential for the sharing of scientific or technical data • Data provenance is the metadata that describes the data’s context and provides a traceable path to its origin • Provenance captures the identification of the data, the traceability of the data, possibly across scales and/or domains, as well as information about accuracy and sensitivity • Provenance metadata is associated with CMCS resources (WebDAV protocol, XML annotation standards), and is browsable and searchable from the CMCS portal • Pedigree may include the series of steps necessary to reproduce the data (generalized workflow development, or virtual data) • Data is linked to projects, references, inputs, and outputs
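The "traceable path to its origin" can be sketched as a walk over has-inputs links. This is a minimal illustration, assuming provenance is available as a simple mapping from each resource to its inputs. The resource names and the `trace_inputs` helper are hypothetical and are not part of CMCS.

```python
from typing import Dict, List

# Hypothetical provenance records: each resource lists its has-inputs links.
HAS_INPUTS: Dict[str, List[str]] = {
    "flame_simulation.out": ["reduced_mechanism.xml"],
    "reduced_mechanism.xml": ["full_mechanism.xml", "thermo_table.xml"],
    "full_mechanism.xml": ["thermo_table.xml"],
    "thermo_table.xml": [],
}


def trace_inputs(resource: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return every upstream resource reachable via has-inputs links.

    Depth-first order; each resource is reported once, so shared inputs
    and accidental cycles do not cause repeats or infinite loops.
    """
    seen: List[str] = []
    stack = list(reversed(graph.get(resource, [])))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        stack.extend(reversed(graph.get(node, [])))
    return seen


print(trace_inputs("flame_simulation.out", HAS_INPUTS))
```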
Metadata • Title: Active Tables Thermochemistry Data Table for Methyl peroxy • Contributors: Reinhardt Pinzon, Albert F. Wagner, Melita L. Morton, Gregor von Laszewski, Sandra Bittner, Sandeep Nijsure, Kaizar Amin, Baoshan Wang • Creation Date: 2003-11-10 • Creator: Branko Ruscic • Keywords: Thermodynamics, molecule, species • MIME Type: text/xml-activetables-thermochemistry • … Annotations • Text • Whiteboard • Sound • Equations [Figure: Data Provenance Relationships as Graph. Edge labels: references, hastranslations, hasinputs, issanctionedby. Node labels: CH3OOqueryResult.xml, O Atom Reference – NASA7ElementsLexicon in MainLibrary 0.004, Plot View (text/html), JANAF format (text/plain), Active Tables Bibliography in Main Library (0.001), PolyatomicRRHOLexicon, NetworkEncyclopedia, pitzNotesBibliography, FixedEnthalpiesCompendium, SpeciesDictionary, IUPAC]
Key Data/Metadata Management Components • WebDAV -- Web-based Distributed Authoring and Versioning • Extension to HTTP for file management and collaborative file sharing on remote web servers • Files/collections and properties (data about data) • Methods: GET, POST, PUT, COPY, MOVE, DELETE, MKCOL, PROPFIND, PROPPATCH, LOCK/UNLOCK, ACLs • DASL (DAV Searching and Locating) -- search extension for DAV • http://www.webdav.org • SAM -- Scientific Annotation Middleware • Built on top of WebDAV, in particular Jakarta Slide • Automatic annotation and translation services • Notifications (tied to email daemon in CMCS) • Supports multiple perspectives, workflows, goals • Different users and different applications • Event enabled data/metadata repository • Jim Myers, PNNL, P.I. http://www.scidac.org/SAM
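As an illustration of the PROPFIND method listed above, the following sketch builds a standard WebDAV request body and shows how it might be sent with Python's `http.client`. The host and path passed to `propfind` would be supplied by the caller; nothing here is taken from the CMCS or SAM client code, and the helper names are assumptions.

```python
import http.client
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # the standard WebDAV XML namespace


def build_propfind_body(prop_names):
    """Build a PROPFIND body requesting the named properties in the DAV: namespace."""
    ET.register_namespace("D", DAV_NS)
    propfind = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(propfind, f"{{{DAV_NS}}}prop")
    for name in prop_names:
        ET.SubElement(prop, f"{{{DAV_NS}}}{name}")
    return ET.tostring(propfind, encoding="utf-8", xml_declaration=True)


def propfind(host, path, body, depth="1"):
    """Send the request; returns (status, raw response bytes).

    Not called in this sketch, since it needs a reachable WebDAV server.
    A successful reply is normally 207 Multi-Status with an XML body.
    """
    conn = http.client.HTTPConnection(host)
    try:
        conn.request(
            "PROPFIND",
            path,
            body=body,
            headers={"Depth": depth, "Content-Type": 'application/xml; charset="utf-8"'},
        )
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


body = build_propfind_body(["displayname", "getlastmodified"])
print(body.decode("utf-8"))
```

The `Depth: 1` header asks for the collection and its immediate members, which is how a client would list a folder together with its properties.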
Sample Metadata from XML Data file

Dublin Core Metadata Elements and Terms:

<dc:title>Active Tables Elements Cookbook</dc:title>
<dc:description>This document contains standard thermochemical reference states for elements/isotopes.</dc:description>
<dc:creator>
  <rdf:Bag><rdf:li>Branko Ruscic</rdf:li></rdf:Bag>
</dc:creator>
<dcterms:created>2003-04-06</dcterms:created>

CMCS Metadata Properties:

<cmcs:ispartofproject>
  <rdf:Bag><rdf:li>
    <cmcs:href xlink:type="simple" xlink:title="methylperoxyNotes (0.001)" xlink:href="/slide/files/projects/primeThermo/methylperoxyNotes"/>
  </rdf:li></rdf:Bag>
</cmcs:ispartofproject>
<cmcs:hasinputs>
  <rdf:Bag><rdf:li>
    <cmcs:href xlink:type="simple" xlink:title="Active Tables Bibliography in Main Library (0.001)" xlink:href="/slide/files/public/ActiveTables/MainLibrary/"/>
  </rdf:li></rdf:Bag>
</cmcs:hasinputs>
<cmcs:speciescas>
  <rdf:Bag><rdf:li>183748-02-9</rdf:li></rdf:Bag>
</cmcs:speciescas>
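A short sketch of how such metadata could be read programmatically, using Python's standard library. The Dublin Core and RDF namespace URIs are the published ones; the `cmcs` namespace URI and the wrapping `metadata` root element are placeholders invented for this example, since the slide does not show them.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "cmcs": "urn:example:cmcs",  # hypothetical; the real CMCS URI is not shown
}

# Excerpt of the sample above, wrapped in an assumed root element that
# declares the namespaces so the fragment parses on its own.
SAMPLE = """
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:dcterms="http://purl.org/dc/terms/"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:cmcs="urn:example:cmcs">
  <dc:title>Active Tables Elements Cookbook</dc:title>
  <dc:creator>
    <rdf:Bag><rdf:li>Branko Ruscic</rdf:li></rdf:Bag>
  </dc:creator>
  <dcterms:created>2003-04-06</dcterms:created>
  <cmcs:speciescas>
    <rdf:Bag><rdf:li>183748-02-9</rdf:li></rdf:Bag>
  </cmcs:speciescas>
</metadata>
"""


def bag_items(root, path):
    """Return the text of every rdf:li inside the rdf:Bag under `path`."""
    return [li.text for li in root.findall(f"{path}/rdf:Bag/rdf:li", NS)]


def read_metadata(xml_text):
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("dc:title", namespaces=NS),
        "creators": bag_items(root, "dc:creator"),
        "created": root.findtext("dcterms:created", namespaces=NS),
        "cas": bag_items(root, "cmcs:speciescas"),
    }


meta = read_metadata(SAMPLE)
print(meta)
```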
Related Projects: Expanding CMCS • Reacting Flows: Feature Tracking in Numerical Simulation Datasets • Feature detection and tracking is a data mining approach with the motivation to extract further scientific understanding from valuable DNS data sets. CMCS is working with BES/SciDAC projects here at the CRF towards adopting new standard formats for feature data/metadata.
Related Projects: Leveraging CMCS • DART Metadata • CMCS team’s experience with metadata management was relevant to the DTA team successfully reaching its Material Transparency Milestone. • DHS Data Integration • An integrated Rad/Nuc Countermeasures System requires the well-organized and efficient flow of data. • C-MS3D (Outstanding NIH Proposal) • Structure and function of biological macromolecules is a central problem in biology. MS3D, an emerging approach, uses intra-molecular chemical cross-linking followed by mass spectrometric analysis to gain insights into the structure of these macromolecules. C-MS3D would be a data-centric collaboration infrastructure for the leaders of the MS3D research community.
Metadata at Work: Data Viewer Registered With SAM Data translations provided automatically by SAM for this file type.
CMCS Metadata Stored as WebDAV Properties A DAV property is a name/value pair: the name is a namespace-qualified tag (namespace:tag) and the value is well-formed XML.
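To make the name/value structure concrete, the sketch below builds a standard WebDAV PROPPATCH body that sets one property. The `propertyupdate`, `set`, and `prop` elements come from the WebDAV specification; the `cmcs` namespace URI is a placeholder, and storing the CAS number as plain text (rather than an `rdf:Bag`) is a simplification for illustration.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"                # standard WebDAV namespace
CMCS_NS = "urn:example:cmcs"   # hypothetical; the real CMCS namespace is not shown


def build_proppatch_body(namespace, tag, value):
    """Build a PROPPATCH body that sets the property {namespace}tag to `value`."""
    ET.register_namespace("D", DAV_NS)
    update = ET.Element(f"{{{DAV_NS}}}propertyupdate")
    set_el = ET.SubElement(update, f"{{{DAV_NS}}}set")
    prop = ET.SubElement(set_el, f"{{{DAV_NS}}}prop")
    item = ET.SubElement(prop, f"{{{namespace}}}{tag}")
    item.text = value
    return ET.tostring(update, encoding="unicode")


proppatch_body = build_proppatch_body(CMCS_NS, "speciescas", "3031-73-0")
print(proppatch_body)
```

Because the property name carries its own namespace, metadata from different communities can coexist on the same resource without name clashes.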
CMCS Metadata Use • Metadata provides identification and documentation for scientific data. • Example: Attaching an owner, creation date, abstract, type to data. • Example: Tracking data to program versions, and possibly bugs for that version. • Metadata documents the context and value of the data. • Example: The theoretical atomization energy of methylhydroperoxide (and its uncertainty) from Ecce (used as input to ATcT) contains information identifying the species and the quantity, units, the theoretical method used, vibrational frequencies and geometry, reference to source file, creator, etc. • Metadata facilitates cross-scale transfer of data. • Example: Can show a chain of inputs, including input parameters and configuration files, across scales. • Example: Can retrieve literature references which describe this data. • Metadata allows users to comment on the data and its quality. • Example: CMCS infrastructure can be used for scientific peer review of data. • Metadata is necessary for effective collaboration. • Example: Scientific data becomes more usable to others when it is documented. Metadata, also referred to as data annotation, converts scientific data into knowledge.
CMCS Metadata Elements • Using Dublin Core for some basic pedigree properties: creator, dates, publisher, is-referenced-by, references, replaces, is-replaced-by, has-version, etc. • Dublin Core Element Set and Qualified Dublin Core • Use of both XML and RDF to encode metadata values • Use of XLink to express values of hyperlinks • CMCS properties for chemical science to enable searching: species name, CAS, chemical properties, and chemical formula. • CMCS properties for defining scientific data: has-inputs, has-outputs, and is-part-of-project. • CMCS properties for scientific publication and peer review annotations: is-sanctioned-by. • 36 elements are currently defined in the core CMCS pedigree. • Flexible infrastructure for addition of new metadata. As new metadata is added to the infrastructure, current apps will not break! CMCS metadata is strongly encouraged, though not required, for all CMCS data, and CMCS metadata is extensible.