Reliable High-Throughput Computation in Computational Chemistry and Robots

Computational Chemistry and Robots, ACS, Sep 2005
J. A. Townsend, P. Murray-Rust, S. M. Tyrrell, Y. Zhang (jat45@cam.ac.uk)

Can high-throughput computation provide a reliable “experimental” resource for molecular properties?
• Can protocols be automated?
• Can we believe the results?

Aspects of complete automation
• Humans must validate protocols rather than individual data
• Low rates of error must be addressed
• Users should know the rates of error and the degree of conformance

Approaches to conformance
• Explore the limits of job behaviour (times, convergence, etc.)
• Analyse reproducibility
• Vary parameters and algorithms and analyse their effects
• Compare output with other “measurements” of the same quantity

The overall view
[Figure: molecules → computation → dissemination, with a step to check results]

Components of the system
• Workflow for management of jobs (Taverna)
• Parsing of outputs based on natural language processing (JUMBOMarker)
• Pairwise comparison of data sets (R)
• Analysis of mean and variance
• Detection and analysis of outliers
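The pairwise comparison step can be sketched as follows. The talk used R; this is a Python sketch under the assumption that each data set maps a (molecule id, bond label) key to a bond length in Å. The key layout and the names `pairwise_differences`, `mopac` and `gamess` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical key layout: (molecule_id, bond_label) -> bond length in Angstroms.
Key = Tuple[str, str]


def pairwise_differences(set_a: Dict[Key, float], set_b: Dict[Key, float]) -> Dict[Key, float]:
    """Return a - b for every key present in both data sets.

    Keys found in only one set (for example, a job that failed under one
    method) are skipped, so differences are only taken between matching bonds.
    """
    shared = set_a.keys() & set_b.keys()
    return {key: set_a[key] - set_b[key] for key in sorted(shared)}


if __name__ == "__main__":
    mopac = {("mol1", "C1-C2"): 1.530, ("mol1", "C2-O3"): 1.420, ("mol2", "N1-N2"): 1.250}
    gamess = {("mol1", "C1-C2"): 1.525, ("mol1", "C2-O3"): 1.430}
    for key, dr in pairwise_differences(mopac, gamess).items():
        print(key, round(dr, 3))
```

Joining on shared keys keeps failed or missing jobs from silently shifting the mean difference.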

Computing the NCI database with MOPAC PM5(a)
(a) MOPAC PM5: collaboration with J. J. P. Stewart

[Figure: protocol flowchart. Nodes: Protocol, Log Files, Parse, Analysis, Statistics, Disseminate Results. Problem categories: Unsuitable Data, Program Crashes, System Crashes, Pathological Behaviour, Science Errors, Other Science. Action: Inform Developer]

Taverna
• Workflow programs allow a series of small tasks to be linked together to build more complex tasks
• Open Source
• myGRID, e-Science
• European Bioinformatics Institute
• University of Manchester
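The idea of linking small tasks into a larger one can be illustrated in a few lines of Python. This is not Taverna's API (Taverna workflows are built graphically); it is a minimal sketch in which each task is a function whose output feeds the next. The task functions `read_log`, `parse_to_record` and `check_record` are hypothetical stand-ins.

```python
from functools import reduce
from typing import Any, Callable, List

Task = Callable[[Any], Any]


def link(tasks: List[Task]) -> Task:
    """Compose small tasks into one larger task; each output is the next input."""
    return lambda data: reduce(lambda acc, task: task(acc), tasks, data)


# Hypothetical stand-ins for real workflow steps (job submission, parsing, checks).
def read_log(path: str) -> str:
    return f"log text from {path}"


def parse_to_record(text: str) -> dict:
    return {"source": text, "parsed": True}


def check_record(record: dict) -> dict:
    record["checked"] = record.get("parsed", False)
    return record


pipeline = link([read_log, parse_to_record, check_record])
print(pipeline("job001.out"))
```

An empty task list returns its input unchanged, so partial workflows can be tested step by step.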

An Example Taverna Workflow

Parsing computational chemistry log files to CML
Extracted quantities: coordinates, calculation type, molecular formula, point group, total energy, dipole
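As a toy illustration of turning log-file lines into CML, the sketch below assumes a simplified, hypothetical log format (`ATOM`, `TOTAL ENERGY`, `FORMULA`, `POINT GROUP` lines); real MOPAC or GAMESS output is much less regular, which is why the system used an NLP-based parser (JUMBOMarker). The CML namespace and the `molecule`, `atomArray`, `atom` (`elementType`, `x3`, `y3`, `z3`), `propertyList`, `property` and `scalar` elements follow CML conventions; the `example:` dictRef values and the unit strings are placeholders.

```python
import re
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)

# Hypothetical, simplified log format for illustration only.
ATOM_RE = re.compile(r"^ATOM\s+([A-Z][a-z]?)\s+(\S+)\s+(\S+)\s+(\S+)$")
ENERGY_RE = re.compile(r"^TOTAL ENERGY\s*=\s*(\S+)\s+(\w+)$")
FIELD_RE = re.compile(r"^(FORMULA|POINT GROUP):\s*(\S+)$")

# Placeholder dictionary references for the extracted properties.
DICT_REFS = {"FORMULA": "example:formula", "POINT GROUP": "example:pointGroup"}


def cml(tag: str) -> str:
    """Qualify a tag name with the CML namespace."""
    return f"{{{CML_NS}}}{tag}"


def add_property(props: ET.Element, dict_ref: str, value: str, units: str = "") -> None:
    prop = ET.SubElement(props, cml("property"), dictRef=dict_ref)
    scalar = ET.SubElement(prop, cml("scalar"))
    if units:
        scalar.set("units", units)  # placeholder unit string taken from the log
    scalar.text = value


def log_to_cml(log_text: str) -> ET.Element:
    """Convert the hypothetical log format into a CML molecule element."""
    molecule = ET.Element(cml("molecule"))
    atoms = ET.SubElement(molecule, cml("atomArray"))
    props = ET.SubElement(molecule, cml("propertyList"))
    for line in (raw.strip() for raw in log_text.splitlines()):
        if m := ATOM_RE.match(line):
            element, x, y, z = m.groups()
            # len(atoms) counts atoms already added, giving ids a1, a2, ...
            ET.SubElement(atoms, cml("atom"), id=f"a{len(atoms) + 1}",
                          elementType=element, x3=x, y3=y, z3=z)
        elif m := ENERGY_RE.match(line):
            add_property(props, "example:totalEnergy", m.group(1), m.group(2).lower())
        elif m := FIELD_RE.match(line):
            add_property(props, DICT_REFS[m.group(1)], m.group(2))
    return molecule
```

`print(ET.tostring(log_to_cml(text), encoding="unicode"))` shows the resulting CML.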

Parsers
[Figure: sections of CompChem output (Input/jobControl, General, Coordinates, Energy Levels, Vibrations) are written to a CML file using the CMLCore, CMLComp and CMLSpect components (elements shown: Coordinates, Energy Level, Vibration)]

Dissemination of results
[Figure: log file → JUMBOMarker (NLP-based log file parser) → CML file → human display; results reach the outside world via the WWMM* server and DSpace]
* WWMM: World Wide Molecular Matrix

InChI: IUPAC International Chemical Identifier
• A non-proprietary unique identifier for the representation of chemical structures
• A normalised, canonicalised and serialised form of a chemical connection table
• InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq/
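Because InChI is a serialised string made of slash-separated layers, its structure is easy to inspect. The sketch below splits an InChI into version, formula and prefixed layers, using the standard InChI for ethanol (`InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3`; the `1S` "standard" prefix postdates this talk). It is a simplification and does not validate layer contents. Since identical connection tables give identical strings, the whole string could presumably also serve as a key for spotting duplicate structures.

```python
from typing import List, Tuple


def inchi_layers(inchi: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Split an InChI into (version, formula, [(layer prefix, layer content), ...]).

    Layers are kept in order as a list because a prefix such as 'h' can
    legitimately appear more than once (e.g. after a fixed-H '/f' layer).
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    return version, formula, [(part[0], part[1:]) for part in rest if part]


print(inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```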

Proteus molecules*
[Figure: a JUNK structure cured by the MOPAC calculation]
* Proteus was a shape-changing ocean deity

Proteus molecules
[Figure: input structure, calculation and JUNK structure]
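The talk does not say how Proteus molecules were found. One simple approach, offered here only as a sketch, is to derive bonds from coordinates before and after the calculation and flag any change in connectivity. The covalent radii are approximate single-bond values for a few elements only, and the 1.2 tolerance factor is a hypothetical choice.

```python
import math
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z) in Angstroms

# Approximate single-bond covalent radii (Angstroms); illustrative subset only.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66,
                  "F": 0.57, "P": 1.07, "S": 1.05, "Cl": 1.02}
TOLERANCE = 1.2  # hypothetical scaling factor on the sum of radii


def bonds(atoms: List[Atom]) -> Set[FrozenSet[int]]:
    """Return atom-index pairs closer than TOLERANCE x (sum of covalent radii)."""
    found = set()
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        cutoff = TOLERANCE * (COVALENT_RADII[a[0]] + COVALENT_RADII[b[0]])
        if math.dist(a[1:], b[1:]) <= cutoff:
            found.add(frozenset((i, j)))
    return found


def changed_connectivity(before: List[Atom], after: List[Atom]) -> bool:
    """True if the bonding pattern differs between input and output geometries."""
    if [a[0] for a in before] != [a[0] for a in after]:
        raise ValueError("atom lists must match element by element")
    return bonds(before) != bonds(after)
```

Elements missing from the radii table raise `KeyError`, which at least makes unsupported input visible rather than silently accepted.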

How do we know our results are valid?
[Figure: results compared between Computational Method 1, Computational Method 2 and Experiment]

J. J. P. Stewart’s example: calculated ΔHf − experimental ΔHf

GAMESS(a)
[Figure: MOPAC results → GAMESS(a), 6-31G* B3LYP → log files]
(a) Project with Kim Baldridge and Wibke Sudholt


Repeat runs, different methods
• Multiple runs give the same final structure from the same input
• Changing the memory allocation does not make a difference
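A reproducibility check of this kind can be automated by comparing final geometries from repeat runs. The sketch compares all interatomic distances, which makes the test independent of overall rotation and translation (it also cannot tell mirror images apart). It assumes atoms are listed in the same order in both runs; the 0.001 Å tolerance is a hypothetical choice.

```python
import math
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[float, float, float]


def distance_profile(coords: List[Coord]) -> List[float]:
    """All interatomic distances, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(coords, 2)]


def same_structure(run_a: List[Coord], run_b: List[Coord], tol: float = 1e-3) -> bool:
    """True if every interatomic distance agrees to within tol (Angstroms)."""
    if len(run_a) != len(run_b):
        return False
    return all(abs(d1 - d2) <= tol
               for d1, d2 in zip(distance_profile(run_a), distance_profile(run_b)))
```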

Pathological behaviour: early detection
[Figure: run times for divinyl ether and trans-crotonaldehyde under different set-ups (6-31G* B3LYP, Z-matrix input); times shown range from 15 min to 10080 min (one week)]

Times to run jobs
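Early detection of pathological jobs can be as simple as a run-time limit derived from earlier jobs. The rule below (flag anything running longer than ten times the median past run time) is a hypothetical example, not the protocol used in the talk. The median is chosen because a few week-long pathological runs barely move it, whereas they would inflate a mean-based limit.

```python
import statistics
from typing import Dict, List


def runtime_limit(past_minutes: List[float], factor: float = 10.0) -> float:
    """Hypothetical rule: allow up to `factor` times the median past run time."""
    return factor * statistics.median(past_minutes)


def flag_long_jobs(running: Dict[str, float], past_minutes: List[float],
                   factor: float = 10.0) -> List[str]:
    """Return ids of running jobs whose elapsed minutes exceed the limit."""
    limit = runtime_limit(past_minutes, factor)
    return sorted(job for job, elapsed in running.items() if elapsed > limit)
```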

Analysis of different computational methods
• Mean: overall difference
• Normality: distribution of values
• Outliers: unusual molecules?
• Variance: spread of the data (standard deviation); depends on both distributions
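The mean and spread of the differences take one line each; outliers need more care, because a plain z-score uses a standard deviation that the outliers themselves inflate. The talk does not say which outlier test was used. As a swapped-in example, the sketch uses the Iglewicz and Hoaglin modified z-score, 0.6745 (x − median) / MAD, with the conventional 3.5 cut-off.

```python
import statistics
from typing import List, Tuple


def summarise(diffs: List[float]) -> Tuple[float, float]:
    """Mean (overall difference) and sample standard deviation (spread)."""
    return statistics.mean(diffs), statistics.stdev(diffs)


def modified_z_outliers(diffs: List[float], threshold: float = 3.5) -> List[int]:
    """Indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = statistics.median(diffs)
    mad = statistics.median(abs(x - med) for x in diffs)
    if mad == 0:
        return []  # more than half the values are identical; score undefined
    return [i for i, x in enumerate(diffs) if abs(0.6745 * (x - med) / mad) > threshold]
```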

Probability plot (normal Q-Q plot)
Annotations: S.D. 0.020 Å; mean of distribution approx. −0.03 Å; range over which the sample distribution is approximately normal; outliers
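A normal Q-Q plot pairs each sorted sample value with the standard-normal quantile expected at its rank. For normally distributed data the points lie on a line whose slope approximates the standard deviation and whose intercept approximates the mean, which is how the S.D. and mean annotations can be read off the plot. This sketch uses plotting positions (k − 0.5)/n; R's `qqnorm` uses a slightly different formula for small n.

```python
from statistics import NormalDist
from typing import List, Tuple


def normal_qq_points(sample: List[float]) -> List[Tuple[float, float]]:
    """Pairs of (theoretical standard-normal quantile, sorted sample value)."""
    n = len(sample)
    std_normal = NormalDist()
    # (i + 0.5) / n with 0-based i equals (k - 0.5) / n with 1-based rank k.
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(sorted(sample))]
```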

All bonds*: Δr (MOPAC − GAMESS) / Å
Good agreement; S.D. 0.005 Å; nearly normal; outliers
* Excludes bonds to hydrogen

Bad molecules and data usually cause outliers
[Figure: example structure containing H, O, P and Na atoms with a 2− charge]

[Table: mean Δr (M − G) / Å and standard error of the mean / Å; all values given to 3 significant figures]
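The standard error of the mean is the sample standard deviation divided by √n, and values are reported to 3 significant figures. A small sketch of both steps follows; the function names are illustrative.

```python
import math
import statistics
from typing import List


def standard_error(values: List[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def to_sig_figs(x: float, figures: int = 3) -> float:
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    return round(x, figures - 1 - math.floor(math.log10(abs(x))))
```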

Δr C–C bonds (M − G) / Å
Good agreement; S.D. 0.013 Å; nearly normal; outliers (JUNK)

Selection of molecules with C–C Δr (M − G) > 0.05 Å
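The selection is a simple threshold filter. The sketch assumes per-molecule lists of C–C differences (a hypothetical layout) and applies the criterion as stated, to the signed difference M − G; an absolute-value test would also catch large negative deviations.

```python
from typing import Dict, List

THRESHOLD = 0.05  # Angstroms, as in the selection criterion


def select_molecules(cc_diffs: Dict[str, List[float]], threshold: float = THRESHOLD) -> List[str]:
    """Molecules with at least one C-C bond whose signed Δr (M - G) exceeds threshold."""
    return sorted(mol for mol, diffs in cc_diffs.items() if any(d > threshold for d in diffs))
```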

Non-aromatic C–C bonds adjacent to CFn: Y = 0.0277X − 0.0061
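The slide reports a fitted straight line but the transcript does not define X and Y (Y is presumably Δr). An ordinary least-squares fit of that form can be sketched as below; the slope is Sxy/Sxx and the intercept is ȳ − slope·x̄.

```python
from typing import List, Tuple


def least_squares_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```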

Δr N–N bonds (M − G) / Å
Good agreement; S.D. 0.022 Å; nearly normal; kink

Density plot of Δr N–N bonds (M − G) / Å
[Figure: the distribution is divided into a LEFT and a RIGHT set]
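A density plot is typically a kernel density estimate: a Gaussian is placed on every data point and the results are averaged, so two sub-populations appear as two humps where a Q-Q plot shows a kink. This is a minimal sketch; the default bandwidth uses the normal-reference rule 1.06 σ n^(−1/5), which may differ from whatever the original plotting tool used.

```python
import math
import statistics
from typing import List, Optional


def gaussian_kde(sample: List[float], grid: List[float],
                 bandwidth: Optional[float] = None) -> List[float]:
    """Gaussian kernel density estimate of `sample`, evaluated at each grid point."""
    n = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb: 1.06 * sigma * n^(-1/5).
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in sample)
            for g in grid]
```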

Most common fragments found in the Left set but not in the Right set
[Figure: fragment diagrams; atom types shown: N(ar), S(sp2), C(sp2), C(sp3)]
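Finding fragments that occur in one set of molecules but never in the other reduces to counting. In the sketch below, each molecule is represented by a list of fragment labels (a hypothetical encoding); each fragment is counted at most once per molecule, and the most frequent Left-only fragments are returned.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def fragments_only_in_left(left: Iterable[List[str]], right: Iterable[List[str]],
                           top: int = 5) -> List[Tuple[str, int]]:
    """Most common fragments in `left` molecules that never occur in `right`."""
    left_counts = Counter(f for frags in left for f in set(frags))
    right_seen = {f for frags in right for f in frags}
    exclusive = Counter({f: c for f, c in left_counts.items() if f not in right_seen})
    return exclusive.most_common(top)
```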

Comparison of theory and experiment
[Figure: CIF* files → CIF2CML → GAMESS → log files]
* CIF: Crystallographic Information File

Reading Acta Crystallographica Section E
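The slides show CIFs being converted with CIF2CML; the sketch below is not that tool. It only illustrates the simplest part of CIF syntax: single-line `_tag value` data items, plus numeric values carrying a standard uncertainty in parentheses, such as `10.2345(12)`. Loops and multi-line semicolon-delimited fields are ignored. The tags in the usage test are standard CIF core names.

```python
from typing import Dict


def read_cif_items(text: str) -> Dict[str, str]:
    """Collect single-line `_tag value` items; loop_ blocks and text fields are skipped."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) == 2:  # loop headers have a tag but no value
                items[parts[0]] = parts[1].strip().strip("'\"")
    return items


def cif_number(value: str) -> float:
    """Parse a CIF number, dropping any standard uncertainty, e.g. '10.2345(12)'."""
    return float(value.split("(", 1)[0])
```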

All bonds*: Δr (Cryst. − GAMESS) / Å; single molecules, no disorder
S.D. 0.014 Å; mean Δr −0.011 Å; nearly normal; outliers
* Excludes bonds to hydrogen

Δr C–C bonds (C − G) / Å
Mean Δr −0.01 Å; S.D. 0.009 Å; nearly normal

Δr C–O bonds (C − G) / Å
Good agreement; S.D. 0.011 Å; nearly normal; outliers?

Chemistry can cause outliers
[Figure: an outlier with Δr = +0.08 Å, annotated "H movement"]

Conclusions
• Protocols can be automated
• Machines can highlight unusual behaviour, geometries and distributions of results for humans to consider
• Computational programs can provide high-quality “experimental” molecular properties
