Prediction of NMR Chemical Shifts. A Chemometrical Approach

Prediction of NMR Chemical Shifts. A Chemometrical Approach. K.A. Blinov, Y.D. Smurnyy, T.S. Churanova, M.E. Elyashberg. Advanced Chemistry Development (ACD).

Presentation Transcript


  1. Prediction of NMR Chemical Shifts. A Chemometrical Approach. K.A. Blinov, Y.D. Smurnyy, T.S. Churanova, M.E. Elyashberg. Advanced Chemistry Development (ACD)

  2. Structure and its spectral data [Figure: spectra and the corresponding structure]

  3. Sometimes the solution is not obvious • In many cases several structures are consistent with the spectral data. • We then need a method to rank the structures. • The most powerful method: compare experimental and predicted 13C NMR spectra

  4. 13C NMR spectral data [Figure: experimental and predicted spectra; labeled values 2.00 and 9.62]

  5. How to find the best structure? • In most cases the predicted spectrum of the “correct structure” fits the experimental spectrum best • In practice the “correct structure” has an average deviation between predicted and experimental spectra of 2-3 ppm
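
The ranking idea on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function names, the candidate names, and the shift values are hypothetical, and pairing shifts by sorted order is a simplifying assumption (real assignments match each predicted shift to a specific atom).

```python
def mean_abs_deviation(experimental, predicted):
    """Average absolute deviation (ppm) between two shift lists.

    Assumption: shifts are paired after sorting, which only
    approximates a real atom-by-atom assignment.
    """
    if len(experimental) != len(predicted):
        raise ValueError("spectra must contain the same number of shifts")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)


def rank_candidates(experimental, candidates):
    """Return (deviation, name) tuples, best-fitting candidate first."""
    scored = [
        (mean_abs_deviation(experimental, shifts), name)
        for name, shifts in candidates.items()
    ]
    return sorted(scored)


# Hypothetical data: one experimental spectrum, two candidate structures.
experimental = [20.0, 130.0, 170.0]
candidates = {
    "candidate_A": [22.0, 128.0, 171.0],
    "candidate_B": [30.0, 120.0, 160.0],
}
ranking = rank_candidates(experimental, candidates)
best_deviation, best_name = ranking[0]
```

Under these made-up numbers `candidate_A` ranks first, with a deviation below the 2-3 ppm range quoted on the slide.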

  6. The role of spectra prediction • Real-world task: an unknown structure with molecular formula C29H32N2O5 and spectral data (1D and 2D NMR). • 20 min to generate all structures (> 12 000) • 24 hours to predict the 13C NMR spectra of all the generated structures • The speed of spectra prediction should be increased
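
A quick back-of-the-envelope check of these figures, as a small sketch. The function name is hypothetical; since the slide gives "> 12 000" structures, the per-structure time computed here is an upper bound.

```python
def seconds_per_structure(total_hours, n_structures):
    """Average prediction time per structure, in seconds."""
    return total_hours * 3600 / n_structures


# 24 hours for (at least) 12 000 structures, as stated on the slide.
per_structure = seconds_per_structure(24, 12_000)
# Structure generation takes 20 minutes, so prediction dominates the run.
prediction_to_generation_ratio = (24 * 60) / 20
```

So each prediction takes roughly 7 seconds at most, and spectrum prediction takes about 72 times longer than structure generation.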

  7. Methods of NMR spectra prediction • Quantum mechanics: extremely slow • Database approach (HOSE codes, maximum common substructure): accurate but slow • Rule-based (additive scheme, neural networks): fast but inaccurate • Our choice: improve the accuracy of a fast method

  8. Additive scheme $\delta = \sum_i a_i x_i$ • Example: $\delta = 153.71 - 1.85 - 4.49 - 1.39 - 2.79 + 1.43 + 0.52 + 0.52 - 1.35 = 144.31$ • Main problem: find the correct values of the atom increments
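
The worked sum on this slide can be reproduced directly. In this sketch the base value and increments are the numbers from the slide; treating 153.71 as a constant base term and writing the repeated 0.52 increment as a count of 2 are assumptions about how the slide's values map onto the formula.

```python
def additive_shift(base, terms):
    """Chemical shift as base value plus sum of increment * count."""
    return base + sum(increment * count for increment, count in terms)


base = 153.71  # base value from the slide (ppm)
# (increment a_i, count x_i) pairs; 0.52 appears twice on the slide.
terms = [
    (-1.85, 1),
    (-4.49, 1),
    (-1.39, 1),
    (-2.79, 1),
    (1.43, 1),
    (0.52, 2),
    (-1.35, 1),
]
shift = additive_shift(base, terms)
print(round(shift, 2))  # 144.31
```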

  9. Available data • We have a database of 1.5 million 13C chemical shifts. • We can try to obtain correct values!

  10. How to encode atom environment [Figure: the input variables are the numbers of atoms of each type (CH3, CH2, CH, C, O, …) in the 1st and 2nd spheres around the atom]
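
One way to picture this encoding is a breadth-first walk over the molecular graph that counts atom types per sphere. This is a minimal sketch under stated assumptions: the graph representation, the function name, and the example molecule are hypothetical, and the authors' actual descriptor set is richer than a bare type label.

```python
from collections import Counter, deque


def encode_environment(atom_types, bonds, center, n_spheres=2):
    """Count atom types in each sphere around `center`.

    atom_types: list of type labels, one per atom (e.g. "CH3").
    bonds: list of (i, j) index pairs.
    Returns {(sphere, type): count}; sphere = number of bonds away.
    """
    neighbors = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Breadth-first search gives the topological distance to each atom.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        current = queue.popleft()
        if distance[current] == n_spheres:
            continue  # do not expand beyond the outermost sphere
        for nb in neighbors[current]:
            if nb not in distance:
                distance[nb] = distance[current] + 1
                queue.append(nb)

    counts = Counter()
    for atom, d in distance.items():
        if d >= 1:  # skip the central atom itself
            counts[(d, atom_types[atom])] += 1
    return dict(counts)


# Hypothetical chain CH3-CH2-CH2-O, encoding the first CH2 (index 1).
atom_types = ["CH3", "CH2", "CH2", "O"]
bonds = [(0, 1), (1, 2), (2, 3)]
environment = encode_environment(atom_types, bonds, center=1)
```

Each key of the resulting dictionary corresponds to one input variable (one column of the regression matrix), and the value is the count placed in that column.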

  11. Data for PLS regression [Figure: matrix X holds the atom environment encoding and Y holds the chemical shifts; each row is a sample]
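
To make the X/Y setup concrete, here is a textbook single-response PLS (PLS1, NIPALS-style deflation) in plain Python. It is a sketch, not the authors' implementation: the function names, the toy count matrix, and the increments used to generate the shifts are all hypothetical. In practice a library routine such as scikit-learn's `PLSRegression` would normally be used instead.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1 by successive deflation; returns (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y that X can explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one sample with a fitted PLS1 model."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat


# Toy data: rows are atoms (samples), columns are environment counts.
X = [[2, 1, 0], [1, 0, 1], [0, 2, 1], [1, 1, 1], [3, 0, 0], [0, 1, 2]]
true_increments = [-1.85, -4.49, 1.43]  # hypothetical, for illustration
y = [150.0 + sum(a * x for a, x in zip(true_increments, row)) for row in X]

model = pls1_fit(X, y, n_components=3)
predicted = pls1_predict(model, [1, 0, 2])
```

With as many components as independent columns, PLS1 coincides with ordinary least squares, so on this noise-free toy data the prediction for the new atom recovers 150 - 1.85 + 2 × 1.43. Fewer components trade some fit for stability, which matters once the number of variables grows.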

  12. Find the best structure encoding • Initially the best scheme of structure representation is not evident • We should find the scheme with the best accuracy • We should optimize • the substituent coding scheme • the number of “spheres” used

  13. Data used • 210 K chemical shifts were used as the training set. • 170 K chemical shifts from recent literature were used as an external validation set.

  14. How to describe atom type • Atom type (C, O, etc.): 7 (N) • Hybridization (sp3, sp2, etc.): 1 (sp3) • Valence: 3 • Number of neighboring H: 2 • Charge: 0 • Distance to the “central” atom (bonds): 3 [Figure: a “central” atom and a “substituent”; the values are those of the example substituent]
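
The six descriptors can be held in a small record type. This is only an illustrative sketch: the class and field names are hypothetical, and the integer codes simply follow the example values shown on the slide (7 for N, 1 for sp3).

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class SubstituentDescriptor:
    """One substituent atom as seen from the central atom."""
    atom_type: int      # atomic number, e.g. 7 for N
    hybridization: int  # code as on the slide, e.g. 1 for sp3
    valence: int
    n_hydrogens: int    # number of attached H atoms
    charge: int
    distance: int       # bonds between substituent and central atom


# The example from the slide: an sp3 nitrogen with two H, three bonds away.
example = SubstituentDescriptor(
    atom_type=7, hybridization=1, valence=3,
    n_hydrogens=2, charge=0, distance=3,
)
key = astuple(example)
```

Because the dataclass is frozen it is hashable, so identical descriptors can serve as dictionary keys when counting how often each substituent type occurs.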

  15. Results for different atom encodings

  16. Results for the number of spheres

  17. Is it the best possible accuracy? • The best average deviation obtained so far is 3.5 ppm. • We need less than 3 ppm (2 ppm is preferable). • Should we use additional variables? • We should be very careful when adding variables.

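  18. Substituent interference (cross effect) [Figure: structures annotated with chemical shifts 125.38, 134.16, 138.30, 125.90, 141.48, 122.90, 136.64, 127.86 and 145.42 ppm; substituent increments +11.26 and +2.48; deviations Δ −1.94, Δ +1.34 and Δ −3.94]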

  19. Enhanced structure encoding [Figure: the input variables now comprise atoms and pairs of atoms (crosses); for each atom pair type (e.g. CH2 and CH, C and O, …) the variable is the number of such pairs]
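
Pair variables of this kind can be generated by counting unordered pairs of substituent types. The sketch below is a simplification under stated assumptions: the function name and example types are hypothetical, and every pair in the environment is counted, whereas the deck also varies the allowed distance between the two atoms of a cross.

```python
from collections import Counter
from itertools import combinations


def count_crosses(substituent_types):
    """Count unordered pairs of substituent types around one atom."""
    crosses = Counter()
    for a, b in combinations(substituent_types, 2):
        # Sort so that ("CH2", "CH") and ("CH", "CH2") share one key.
        crosses[tuple(sorted((a, b)))] += 1
    return dict(crosses)


# Hypothetical environment with three substituents.
crosses = count_crosses(["CH2", "CH", "CH2"])
```

Each distinct pair becomes an extra column next to the single-atom counts, so the number of variables grows roughly quadratically with the number of atom types.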

  20. Results for atom pairs (crosses) [Figure: mean error (ppm) as a function of the distance between atoms within a cross and the number of spheres]

  21. More enhancements? • The accuracy is now good enough (2.3 ppm) • But it is still poor in some cases • Unfortunately these cases are very important • These “special” cases should be taken into account

  22. Stereo effects: double bonds • We use “topological” distance • Sometimes equal topological distances correspond to different “real” distances [Figure: two atoms at the same topological distance, labeled 25.7 at 3.9 Å and 17.6 at 2.9 Å]

  23. Modified structure encoding [Figure: the variables now comprise atoms, pairs of atoms (crosses) and “stereo” effects]

  24. Prediction of spectra by different methods (mean error, ppm)

  25. Size of training set • We have 1.5 million chemical shifts • We should try to use all available data • The only problem is matrix size • In many cases the matrix size exceeds 2 GB
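
A rough size estimate shows how quickly the 2 GB limit is reached. The sketch assumes a dense matrix of 8-byte floating-point values and reads "2 GB" as 2 × 1024³ bytes; the slides state neither the storage format nor the number of variables, so both are assumptions, as are the function names.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory needed for a dense n_rows x n_cols matrix."""
    return n_rows * n_cols * bytes_per_value


def max_columns(n_rows, limit_bytes=2 * 1024**3, bytes_per_value=8):
    """Largest column count that still fits within limit_bytes."""
    return limit_bytes // (n_rows * bytes_per_value)


n_shifts = 1_500_000  # one row per chemical shift
columns_that_fit = max_columns(n_shifts)
size_at_limit = dense_matrix_bytes(n_shifts, columns_that_fit)
```

Under these assumptions only 178 variables fit before a matrix with 1.5 million rows exceeds 2 GB, which is easily surpassed once atom types are counted per sphere and pair variables are added.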

  26. Bigger dataset – smaller mean error!

  27. The final results • Faster by 3 orders of magnitude!

  28. Prediction time: the past and present [Figure: prediction times for C29H32N2O5]

  29. Conclusions • Combining a “new” method with an old, well-known algorithm can produce a very good (and unexpected) result
