Development of Molecular Geometry Knowledge Bases from the Cambridge Structural Database
Stephanie Harris, Crystal Grid Workshop, Southampton, 17th September 2004
Cambridge Structural Database
• Stored geometric information for ~300,000 structures
• Search using Conquest
  • Substructure search, user input required
Molecular Geometry Knowledge Bases
• Library of chemically well-defined geometric information
• Limited user input
• Rapid retrieval of statistical data
Molecular Geometry Knowledge Base: Mogul
• Bond lengths, valence angles and torsion angles
• Compiled from the CSD
• Applications:
  • Model building
  • Refinement restraints
  • Structure validation
  • Comparative values
Published bond length tables:
• Organic and metal-containing structures
• Published in the late 1980s
• Compiled from a CSD of ~50,000 structures
• Cannot be accessed by computer programs
Mogul 1.0
• Whole-molecule input
• Graphical (CIF, SHELX, mol2 files) or command-line interface
• Integration with client applications, e.g. Crystals
• Quick, automatic retrieval of statistical data, histogram distributions and CSD structures
Search Algorithm
• All non-metal fragments in the CSD are coded
• A set of keys codes the chemical environments
• Fragments with identical keys are chemically identical
• A hierarchical search tree is used
• Generalised searching is used if there are insufficient hits
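The idea that "fragments with identical keys are chemically identical" can be illustrated with a minimal Python sketch. The key used here (element symbols, bond order, and the sorted element symbols of each atom's other neighbours) is an assumption made for illustration and is far cruder than Mogul's real keys; `fragment_key` and `build_index` are hypothetical names, and the bond lengths are toy values rather than CSD data.

```python
from collections import defaultdict


def fragment_key(atom_a, atom_b, bond_order, neighbours_a, neighbours_b):
    """Return a hashable key for a bond and its immediate environment.

    Hypothetical encoding: each end is (element, sorted neighbour elements);
    the two ends are put in canonical order so A-B and B-A give the same key.
    """
    end_a = (atom_a, tuple(sorted(neighbours_a)))
    end_b = (atom_b, tuple(sorted(neighbours_b)))
    first, second = sorted([end_a, end_b])
    return (first, bond_order, second)


def build_index(fragments):
    """Group bond lengths by key: fragments with identical keys share one bin."""
    index = defaultdict(list)
    for atom_a, atom_b, order, nbrs_a, nbrs_b, length in fragments:
        index[fragment_key(atom_a, atom_b, order, nbrs_a, nbrs_b)].append(length)
    return index


# Toy input (not CSD data): two C-O single bonds in the same environment,
# listed in opposite atom order, and one C=O double bond.
fragments = [
    ("C", "O", 1, ["C", "H", "H"], ["H"], 1.426),
    ("O", "C", 1, ["H"], ["H", "C", "H"], 1.421),
    ("C", "O", 2, ["C", "C"], [], 1.210),
]
index = build_index(fragments)
for key, lengths in index.items():
    print(key, lengths)
```

Canonicalising the atom order inside the key is what lets the same bond, written either way round, land in the same bin.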
[Figure: Mogul search screenshot (atom labels S1, C7)]
Metal-Ligand Bond Lengths
Co-O bond length? To be considered:
• Ligand type: carboxylate
• Metal oxidation state: Co(II)
• Metal coordination number: 6
• Ligand trans to O: an oxygen ligand
• Spin state?
Method
• Analyse M-L bond lengths.
• For a range of metal and ligand types, identify the factors that influence M-L bond lengths and evaluate their importance.
• For a defined metal-ligand group, subdivide the bond length distribution to produce ‘chemically meaningful’ datasets:
  • Unimodal distributions.
  • ‘Reasonably small’ sample standard deviations (SSDs).
• From hand-crafted examples, develop an algorithm to produce a molecular geometry knowledge base for metal complexes.
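One way to test whether a candidate division "sharpens" a distribution is to compare the SSD of each bin with the SSD of the undivided data. The minimal sketch below assumes each fragment is a `(bond_length, attributes)` pair; `subdivide`, `sharpens` and the attribute names are hypothetical, the values are toy numbers rather than CSD data, and only the SSD criterion is checked (unimodality is not tested).

```python
from collections import defaultdict
from statistics import stdev


def subdivide(records, criterion):
    """Split (bond_length, attributes) records into bins on one attribute."""
    bins = defaultdict(list)
    for length, attrs in records:
        bins[attrs[criterion]].append(length)
    return dict(bins)


def sharpens(records, criterion):
    """True if every bin with at least two members has a smaller sample
    standard deviation (SSD) than the undivided distribution."""
    pooled_ssd = stdev([length for length, _ in records])
    bins = subdivide(records, criterion)
    return all(stdev(v) < pooled_ssd for v in bins.values() if len(v) > 1)


# Toy values (not CSD data).
records = [
    (2.05, {"oxidation_state": 2, "cn": 6}),
    (2.09, {"oxidation_state": 2, "cn": 4}),
    (1.90, {"oxidation_state": 3, "cn": 6}),
    (1.92, {"oxidation_state": 3, "cn": 4}),
]
print(sharpens(records, "oxidation_state"))  # True
print(sharpens(records, "cn"))               # False
```

Here the division on oxidation state separates the two clusters of values, while the division on `cn` mixes them and widens each bin.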
[Figure: Data tree. A metal-ligand group is split into successive levels of bins (A1-A2, B1-B4, C1-C2); each division gives sharpened distributions with smaller sample standard deviations.]
Criteria Influencing M-L Bond Lengths
• Ligand, L
• Coordination mode of the ligand
• Effective metal coordination number
• Metal oxidation state
• Metal clusters and cages
• Spin state
• Jahn-Teller effect
• Metal coordination geometry
• Ligand trans to L
Ligand Template Library
Ligand
• A non-metal atom or fragment bonded to a metal.
• Two ligands are the same if they have the same connectivity (topology) and stereochemistry.
Method
• Classify all ligands in the CSD.
• Classify according to the contact atom coordinated to the metal.
• Ligands with multiple contact atoms can be present in more than one ligand group, e.g. SCN-.
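Grouping by contact atom, with a multi-contact ligand such as SCN- appearing in more than one group, can be sketched as an inverted index. The tiny `CONTACT_ATOMS` table and the function name are hypothetical illustrations, not the actual template library.

```python
from collections import defaultdict

# Hypothetical toy library: ligand name -> atoms that can bond to a metal.
CONTACT_ATOMS = {
    "carboxylate": {"O"},
    "pyridine": {"N"},
    "chloride": {"Cl"},
    "thiocyanate (SCN-)": {"N", "S"},
}


def group_by_contact_atom(library):
    """Invert the library: contact atom -> set of ligands that can use it."""
    groups = defaultdict(set)
    for ligand, contacts in library.items():
        for atom in contacts:
            groups[atom].add(ligand)
    return dict(groups)


groups = group_by_contact_atom(CONTACT_ATOMS)
for atom in sorted(groups):
    print(atom, sorted(groups[atom]))
```

Because the index is built per contact atom, thiocyanate is listed under both N and S without any special-case code.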
Cambridge Structural Database
• Approximately 22,000 formulae
• Approximately 780,000 ligands
Ligand Template Hierarchy
• Exact ligand templates (724)
• R-substituted templates (H atoms replaced by ‘innocent’ R groups)
• Generic templates (all ligands classified)
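The three template levels can be read as a most-specific-first lookup. The minimal sketch below assumes a ligand can be described by its contact atom, a donor-group label and a substituent label; the `Ligand` class, both template tables and the labels are hypothetical, and the sketch does not check that a substituent is ‘innocent’.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ligand:
    contact_atom: str   # element bonded to the metal, e.g. "O"
    donor_group: str    # hypothetical label for the donor group, e.g. "CO2"
    substituent: str    # hypothetical label for the rest of the ligand


# Tiny, hypothetical stand-ins for the exact and R-substituted levels.
EXACT_TEMPLATES = {("CO2", "CH3"): "acetate", ("CO2", "H"): "formate"}
R_TEMPLATES = {"CO2": "carboxylate (RCO2-)"}


def classify(ligand):
    """Return (level, template) for the most specific matching template."""
    exact = EXACT_TEMPLATES.get((ligand.donor_group, ligand.substituent))
    if exact is not None:
        return "exact", exact
    r_template = R_TEMPLATES.get(ligand.donor_group)
    if r_template is not None:
        return "R-substituted", r_template
    # Generic level: every ligand is classified, here simply by contact atom.
    return "generic", f"{ligand.contact_atom}-bonded ligand"


print(classify(Ligand("O", "CO2", "CH3")))   # ('exact', 'acetate')
print(classify(Ligand("O", "CO2", "C6H5")))  # ('R-substituted', 'carboxylate (RCO2-)')
print(classify(Ligand("O", "NO3", "")))      # ('generic', 'O-bonded ligand')
```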
Cobalt Carboxylate Bond Lengths
[Figure: histogram of number of fragments against Co-O (Å); 619 fragments, Co-O 1.929(62) Å]
Co(II): 2.049(58) Å; Co(III): 1.904(20) Å (all fragments: 1.929(62) Å). Further subdivision gives 2.073(42), 1.904(20), 1.910(15), 2.074(32) and 1.895(17) Å.
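Values such as 1.929(62) Å quote the mean with the sample standard deviation in parentheses, in units of the last quoted digit (so the SSD here is 0.062 Å). Assuming that usual crystallographic reading, the notation can be produced as follows; `format_mean_ssd` is a hypothetical helper and the raw lengths are toy values, not CSD data.

```python
from statistics import mean, stdev


def format_mean_ssd(m, ssd, decimals=3):
    """Write a mean and SSD as m(ssd), with the SSD in units of the last decimal place."""
    ssd_digits = round(ssd * 10 ** decimals)
    return f"{m:.{decimals}f}({ssd_digits})"


print(format_mean_ssd(1.929, 0.062))  # 1.929(62)

# From raw data (toy values, not CSD data):
lengths = [2.05, 2.09, 1.90, 1.92]
print(format_mean_ssd(mean(lengths), stdev(lengths)))
```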
• Chlorides: Fe-Cl 2.242(68) Å, 2.189(24) Å
• Pyridines, e.g. Fe (spin state): Fe(II)L5py, high spin: 2.166(84) Å, 2.225(29) Å
• Copper complexes (Jahn-Teller effect): standardisation of Cu connectivity; Cu(II)-OH2 2.232(225) Å
• Tertiary phosphines, carbon ligands
Metal-Ligand Knowledge Base
1. CSD data adjustment:
  • Standardisation of metal connections
  • Assignment of metals that are part of a metal cluster
  • Assignment of metal oxidation states
2. Classification of ligands using the ligand template library
3. Application of the algorithm to all possible M-L fragments to produce the knowledge base
Algorithm
Metal-Ligand Group: taken from the ligand template library, at the generic or a more specific level, e.g. carboxylates.
Division steps (diagram): Metal-Ligand Group → ‘Metal clusters’ → division on oxidation state → division on metal effective coordination number → division on spin state and Jahn-Teller effect.
Spin-state and Jahn-Teller effects:
• Occur only for particular metals, oxidation states and coordination numbers.
• Are not found for all ligand types.
• Are not searchable in the CSD.
• Users are flagged instead; the effects are evident from a bimodal histogram, a high SSD or outliers.
Next: division on metal coordination geometry, e.g. for 4-coordinate metals: tetrahedral, square planar or disphenoidal.
Final steps: division on the ligand trans to L, then a final ligand division into more specific ligands, e.g. alkyl carboxylate.
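The sequence of divisions can be sketched as a recursive split of `(bond_length, attributes)` records, one criterion per tree level, with each node keeping its count, mean and SSD. The attribute names in `CRITERIA` are hypothetical labels for the steps on the slides; spin state and Jahn-Teller effects are left out because the slides treat them as flags for the user rather than searchable divisions. The records are toy values, not CSD data.

```python
from statistics import mean, stdev

# Division order taken from the slides; the attribute names are hypothetical.
CRITERIA = ["metal_cluster", "oxidation_state", "coordination_number",
            "geometry", "trans_ligand", "ligand_subtype"]


def build_tree(records, criteria=CRITERIA):
    """Recursively split (bond_length, attributes) records on each criterion.

    Each node stores the number of fragments, their mean and sample standard
    deviation (None for a single fragment), and child nodes keyed by the
    value of the next criterion.
    """
    lengths = [length for length, _ in records]
    node = {
        "n": len(lengths),
        "mean": mean(lengths),
        "ssd": stdev(lengths) if len(lengths) > 1 else None,
        "children": {},
    }
    if criteria:
        head, rest = criteria[0], criteria[1:]
        groups = {}
        for length, attrs in records:
            groups.setdefault(attrs[head], []).append((length, attrs))
        for value, subset in groups.items():
            node["children"][value] = build_tree(subset, rest)
    return node


# Toy records (not CSD data), divided on two criteria only.
records = [
    (2.05, {"oxidation_state": 2, "coordination_number": 6}),
    (2.09, {"oxidation_state": 2, "coordination_number": 6}),
    (1.90, {"oxidation_state": 3, "coordination_number": 6}),
    (1.92, {"oxidation_state": 3, "coordination_number": 4}),
]
tree = build_tree(records, ["oxidation_state", "coordination_number"])
for ox, child in tree["children"].items():
    print(ox, child["n"], round(child["mean"], 3))
```

Passing the criteria as a list makes it easy to experiment with different division orders, which is one of the open questions on the generalised searching slide.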
Generalised Searching
• Used when there are no hits or an insufficient number of hits.
• Allows the retrieval of data on related fragments.
• Uses the hierarchical search tree structure: move up to a higher, less specific level of the data tree.
• The order of the algorithm is important:
  • Should the order of criteria be changed?
  • Should the order depend on the M-L group? E.g. should oxidation state always be the first main division?
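Moving up the tree when hits are insufficient can be sketched as follows, assuming nodes shaped as `{"n": ..., "children": {...}}`. Descending from the root and stopping at the first bin that is missing or too small returns the same node as starting at the most specific bin and moving up, because counts can only shrink further down. `generalised_search`, `min_hits` and the toy counts are hypothetical.

```python
def generalised_search(tree, query, criteria, min_hits=5):
    """Follow the query down the data tree and return (node, depth) for the
    most specific node that still has at least min_hits fragments.

    If the exact bin is missing or too small, the search stops and falls back
    to the closest, less specific ancestor (depth 0 is the whole M-L group).
    """
    node, depth = tree, 0
    for criterion in criteria:
        child = node["children"].get(query.get(criterion))
        if child is None or child["n"] < min_hits:
            break  # moving further down can only reduce the number of hits
        node, depth = child, depth + 1
    return node, depth


# Toy data tree (counts are invented for illustration).
tree = {"n": 40, "children": {
    "II": {"n": 12, "children": {6: {"n": 3, "children": {}},
                                 4: {"n": 9, "children": {}}}},
    "III": {"n": 28, "children": {}},
}}
criteria = ["oxidation_state", "coordination_number"]
node, depth = generalised_search(
    tree, {"oxidation_state": "II", "coordination_number": 6}, criteria)
print(depth, node["n"])  # 1 12: only 3 hits in the exact bin, so use its parent
```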
Conclusions
• Pre-processing of structural data from the CSD to construct molecular geometry knowledge bases.
• Knowledge bases to contain chemically well-defined datasets.
• Limited user input required.
• Quick, automatic retrieval of statistical data and distributions.
• Efficient analysis of a large number of chemical fragments.
• Outliers, high SSD? Further analysis: computational chemistry.
• Further development to include extra chemical information, e.g. computational data.
Acknowledgements
Bristol University: Guy Orpen, Natalie Fey, X-Ray Crystallography Group
Cambridge Crystallographic Data Centre: Robin Taylor, Frank Allen, Ian Bruno, Greg Shields