Explore advanced approaches in crystallography to understand intermolecular interactions beyond conventional wisdom. Develop tailor-made compound families using a statistical and holistic methodology. Discover the significance of F-substitution possibilities for shaping interactions. Enhance crystallographic research through systematic evaluation of similarity and statistics.
Crystallographic Informatics Similarity and Statistics Simon Coles Associate Professor Director, UK National Crystallography Service Dr Graham Tizzard (UK National Crystallography Service) (Dr) Philip Adler (now Haverford College, PA) ACS Spring Meeting 2016, San Diego
Two Approaches • Beyond the Molecule • Conventional wisdom: ‘it’s all about H-bonding’ • Hasn’t ‘shape’ been forgotten in all this hype? • Intermolecular interactions are about so much more than directed ‘bonding’ • Chemical, and hence solid-state, space is sparsely populated… • Build tailor-made families of homologous compounds • Adopt a more holistic, statistical approach
Engineering interactions • Substitute systematically with F. Simple synthesis. Simple shape - stacking dimers? • Direct complementary H…F overlap • Variety of complementary H…F overlap • Frustrated or clashing F…F overlap
F-Substitution Possibilities 100% complementary H…F overlap 100% clashing F…F overlap Varying degree of clashing F…F or overlapping H…F
Some expectations come true • 100% complementary H…F overlap • 100% clashing F…F overlap • Varying clashing and overlap • Structures shown: 0-23456, 345-26, 25-25, 35-35, 236-24, 245-25
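The overlap categories above can be illustrated with a small sketch. It assumes that each label lists the F-substituted ring positions of the two rings (with 0 meaning no F), and that position i of one ring sits over position i of the other in a stacked dimer. Both are illustrative assumptions, as the slides do not state the overlay geometry, and the function names are hypothetical.

```python
# Illustrative sketch only: the label convention and the position-over-position
# overlay are assumptions, not taken from the slides.
RING_POSITIONS = (2, 3, 4, 5, 6)


def parse_label(label):
    """Split a label such as '345-26' into two sets of F-substituted positions."""
    def to_positions(text):
        # '0' is read as "no fluorine on this ring"
        return {int(ch) for ch in text if ch != "0"}

    ring_a, ring_b = label.split("-")
    return to_positions(ring_a), to_positions(ring_b)


def classify_overlap(label):
    """Count H...F (complementary), F...F (clashing) and H...H contacts."""
    ring_a, ring_b = parse_label(label)
    counts = {"H...F": 0, "F...F": 0, "H...H": 0}
    for pos in RING_POSITIONS:
        in_a, in_b = pos in ring_a, pos in ring_b
        if in_a and in_b:
            counts["F...F"] += 1   # both rings carry F here: clash
        elif in_a or in_b:
            counts["H...F"] += 1   # F on one ring faces H on the other
        else:
            counts["H...H"] += 1   # neither ring is substituted here
    return counts


for label in ("0-23456", "345-26", "25-25", "35-35", "236-24", "245-25"):
    print(label, classify_overlap(label))
```

Under these assumptions, 0-23456 and 345-26 give only H…F contacts, 25-25 and 35-35 give F…F contacts with no H…F, and 236-24 and 245-25 give a mixture.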
Avoidance tactics? Tapes - short “side on” interactions (ca. 20 observations) Threads - short “end on” interactions (25 observations)
Isostructurality • 35-246, 4-2356
Manual comparisons • Unpredicted motifs found - painfully • Some predominant themes occur simultaneously in the same structure • Need a way to automate and scale the process • Independent of ‘traditional’ synthon-type approaches
High Throughput Systematics X Y Cryst. Eng. Comm., 2005
Evaluating Similarity - XPac • Similarity – supramolecular constructs (SCs) • 3D: isostructurality • 2D: sheets • 1D: chains, tapes • 0D: discrete (e.g. dimer) • [Figure: similarity diagram relating substituent pairs X~I, X~CF3, CH3~CF3 (X = CF3, I, Br, Cl, F, H), Br~Br (i)–(iii), I~Br (i)–(ii), CF3~Cl, I~Cl, I~I (iii), CN~Br (i), CN~CN, with constructs C1, C2, C3 and I-Dimer]
Substituted Mandelic Acids X = H F Cl Br I CF3 Me OMe • Simple chiral molecule – 2 x H-bond donors, 3 x acceptors • Substituted at 2, 3 and 4 positions • Substituents • No H-bond donors • Sterically undemanding • Mono substituted Cryst. Growth & Des., 2014 (x2)
The Bigger Picture • Part of larger project • Quasiracemates, diastereoisomers and racemate/enantiomers • ~2000 structures
A-type Constructs (1) • Based on COOH dimer • Exhibited in 11 structures • 3 x 1D constructs • 1 x 2D construct
B-type Constructs (1) • Based on C=O and chain OH dimer • More prevalent than A-type – 20 structures • 2 x 1D constructs
B-type Constructs (2) • B11 basis for 2 x 2D constructs – B21 & B22 • B21 + B22 → B32, largest isostructural group • B21→B31, B22→B33 • B12 basis for 4 x 2D constructs B23 & B24 and hybrid AB21 & AB22
AB-type Constructs • 2 x 2D constructs combining A11 & B12 stacks • AB21 – A11 & B12 H-bond via available OH → ABAB… bilayer • AB22 – stacks of A11 + 2 x B12 BAB bilayers linked by Hal-Hal interactions
Polymorphs • 9 substituents yielded polymorphs (so far!) • 7 have no relationship or a common dimer only • 2Cl – common A13 1D dimer chain • 3Br and 3Cl – common B22 2D sheet • 3Cl-1 and 3Cl-2 – isostructural!
3Cl – Isostructural Polymorphs • Phase transition T dependent • Reversible but subject to hysteresis effect • Bond lengths and angles and lattice overlay near identical
Isostructural 2F and 3F • Isostructural within B32 group • Swapping between ortho and meta gives same structure • Why is isostructurality between 2- and 3-substituted heteroaromatics not more common?
Further Observations • Prevalence of structures based on 10-membered rings vs 8-membered rings • Identify ‘missing’ structures to target with cross seeding • [Figure: 3Cl ‘seed’, 3Me new polymorph]
Acylanilides Y:p-X Y:m-X Y:o-X
Crystallisation Trends • Synthesis, Crystallisation • 220 acylanilides • 400 reactions • 300 crystalline samples • 260 XRD data sets • 40 side products • [Figure: ‘Increasing Time’ axis with substituents CH3, C(Me)3, CF3, NH2, H, OC2H5, OCH3, C2H5, C3H7] Organic Process Research & Development, 2009
Statistics to the Rescue? • XPac relies on n pairwise comparisons of structures • Can we extract, generate or find features from any/all structures independently? • Build statistical models • Look for correlations • Rationalise sets we already have • Predict what might happen • Correlate features (structure–property, different aspects of structure, etc.) • QSAR for crystal structures?
What does Crystallographic Descriptor Space Look Like? • 1000s of potential descriptors • Develop appropriate descriptors • Molecular (well explored) • Ordinary: a single value • Spectral: calculated a priori and cannot be directly compared
What does Crystallographic Descriptor Space Look Like? • Crystallographers use ‘quantities’ - these are not necessarily statistical descriptors! • Response descriptors must be invariant • OK: Energy calculations, Specific geometry • Dubious: Graph sets, Space groups • Therefore we invented: Graph-based descriptor
Correlations - n pairwise comparisons, a Big Data problem? • Correlation?! • 1000s of structures
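To give a feel for the scale of the pairwise problem, the sketch below counts unordered pairs, assuming every structure is compared once with every other. The function name is hypothetical, and the figure of 2000 is taken from the ‘~2000 structures’ quoted for the larger project.

```python
from itertools import combinations


def n_pairwise(n):
    """Number of unordered pairs among n structures: n(n-1)/2."""
    return n * (n - 1) // 2


# Small check against explicit enumeration (placeholder structure names).
structures = [f"structure_{i}" for i in range(5)]
pairs = list(combinations(structures, 2))
print(len(pairs), n_pairwise(5))

# Growth with data set size: the count rises roughly as n squared.
for n in (10, 100, 2000):
    print(n, n_pairwise(n))
```

For 2000 structures this comes to 1,999,000 comparisons, which is why descriptors that can be computed for each structure independently are attractive.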
Cambridge - we have a problem… • Solid-state descriptor space is really quite big • Chemical complexity space is really, really big • We have only just dipped into this space. • There is a vast wide open space - draw a line between SAN and NYC: how much is populated?! • Simply too much uncovered territory for statistics to be meaningful – publishing “negative results” would help (a bit) • Look at ‘constrained’ space (regions we have control over)
Something more tractable? • Trials with F-anil compounds vs melting point • Reasonable model fit, but not 100% conclusive – still a complex problem • So what question are we asking? • Let’s try a yes/no problem instead
Statistical Prediction • Pharmaceutical co-crystal formation – screen design
Getting somewhere? • Training set of co-crystals (CSD) used to find descriptors that discriminate for formation • 3D geometry, complexity of bonding, LogP, shape • Decision tree
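As a rough illustration of how a decision tree separates a yes/no outcome, the sketch below fits a single split (a depth-one tree, or ‘stump’) to a toy data set. Everything here is hypothetical: the descriptor names `delta_logp` and `shape_mismatch`, the values, and the labels are made up for illustration and are not the descriptors or data used in the work described.

```python
def best_split(samples):
    """Find the single descriptor/threshold split with the highest accuracy.

    samples: list of (descriptor_dict, formed) pairs, formed being True/False.
    Returns (descriptor, threshold, label_at_or_below_threshold, accuracy).
    """
    best = None
    for name in samples[0][0]:
        values = sorted({desc[name] for desc, _ in samples})
        # Candidate thresholds lie midway between consecutive observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for label_below in (True, False):
                correct = sum(
                    (label_below if desc[name] <= threshold else not label_below)
                    == formed
                    for desc, formed in samples
                )
                accuracy = correct / len(samples)
                if best is None or accuracy > best[3]:
                    best = (name, threshold, label_below, accuracy)
    return best


# Made-up training set: (descriptors, co-crystal formed?)
toy_set = [
    ({"delta_logp": 0.2, "shape_mismatch": 0.9}, True),
    ({"delta_logp": 0.5, "shape_mismatch": 0.1}, True),
    ({"delta_logp": 0.8, "shape_mismatch": 0.7}, True),
    ({"delta_logp": 2.1, "shape_mismatch": 0.3}, False),
    ({"delta_logp": 2.6, "shape_mismatch": 0.8}, False),
    ({"delta_logp": 3.0, "shape_mismatch": 0.2}, False),
]

descriptor, threshold, label_below, accuracy = best_split(toy_set)
print(descriptor, round(threshold, 2), label_below, accuracy)
```

A full decision tree repeats this split search recursively on each resulting subset. In this toy set only `delta_logp` separates the two classes cleanly, which is the sense in which a descriptor ‘discriminates for formation’.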
Conclusions • From experiments, not everything is what it might seem – unexpected isostructurality between family members • Similarity in unexpected places • Can use an understanding of similarity in solid-state space to engineer that which doesn’t readily form (seeding etc.) • For stats to be meaningful we need better sampling of chemical space • Constrained problems are beginning to give meaningful answers