Chemical descriptors and molecular graphs

Presentation Transcript


  1. Problems and approaches in computational chemistry: Chemical descriptors and molecular graphs. Alessandra Roncaglioni, IRFMN, aroncaglioni@marionegri.it

  2. Outline • Descriptors definition • Structure → descriptors • Descriptors classification (bi- or tri-dimensional) • Pros & cons • Overview of common descriptor classes (mainly 2D) • Applications • Software resources • Further reading. Problems and approaches in computational chemistry – 21 April 2008 – DEI – Milano

  3. Introduction • Molecular descriptors are numerical values that characterize properties of molecules • Examples: physicochemical properties (empirical); values computed by algorithms, such as 2D fingerprints • Descriptors vary in the complexity of the encoded information and in computation time

  4. Theoretical descriptors • “A molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment” (www.moleculardescriptors.eu)

  5. Desirable descriptor characteristics • Invariance with respect to the labelling and numbering of the atoms • Invariance with respect to roto-translation of the molecule • An unambiguous, computable definition • Values in a suitable numerical range • Allows structural interpretation • No trivial correlation with other molecular descriptors • Gradual change in value with gradual changes in the molecular structure • Widely applicable • Preferably, allows reversible decoding (back from the descriptor value to the structure)

  7. From chemical compounds to descriptors • CAS RN: 145131-25-5 • Name: N-(2,6-Bis(1-methylethyl)phenyl)-N'-((1-(1-methyl-1H-indol-3-yl)cyclohexyl)methyl)urea • SMILES: CC(C)C1=CC=CC(C(C)C)=C1NC(=O)NCC2(CCCCC2)C3=CN(C)C4=C3C=CC=C4
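
As a first, deliberately crude step from structure to descriptor, the heavy-atom composition can be read straight off the SMILES string on this slide. The sketch below is an illustration under stated assumptions, not a SMILES parser: the function name `count_heavy_atoms` is hypothetical, and the regular expression only handles unbracketed organic-subset symbols (B, C, N, O, P, S, F, Cl, Br, I and the aromatic lowercase forms).

```python
import re
from collections import Counter

# Organic-subset atom symbols; two-letter symbols must be tried first.
ATOM_TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

def count_heavy_atoms(smiles: str) -> Counter:
    """Count heavy atoms per element in a bracket-free SMILES string."""
    if "[" in smiles:
        raise ValueError("bracket atoms are outside the scope of this sketch")
    return Counter(tok.capitalize() for tok in ATOM_TOKEN.findall(smiles))

# SMILES copied from the slide (CAS RN 145131-25-5).
smiles = "CC(C)C1=CC=CC(C(C)C)=C1NC(=O)NCC2(CCCCC2)C3=CN(C)C4=C3C=CC=C4"
counts = count_heavy_atoms(smiles)
print(dict(counts), "heavy atoms:", sum(counts.values()))
# -> {'C': 29, 'N': 3, 'O': 1} heavy atoms: 33
```

Hydrogens are implicit in this notation, so hydrogen counts (and anything that depends on them, such as molecular weight) would need a real cheminformatics toolkit rather than a token count.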

  8. Descriptors classification, depending on the structural dimensionality (in order of increasing complexity) • Up to 2D (0D-2D): derived from the atomic composition and connectivity of molecules • 3D: encode energetic and spatial information • Molecular interaction fields (MIF): encode electrostatic and steric variation

  9. 2D Descriptors (I) • Many groups, accounting for different characteristics • May require explicit hydrogens (check the file format) • Fast to calculate (almost all expert systems rely on 2D descriptors) • More reproducible (no 3D structure is required) but ... • May focus on local contributions, neglecting intramolecular interactions • Ignore conformational flexibility

  10. 2D Descriptors (II) but ... • Ignore stereo configuration • Not invariant to tautomerism

  11. 3D Descriptors (I) • Invariant to roto-translational changes • Require a conformational search (sampling) • Followed by QM/MM optimization (minimization)

  12. 3D Descriptors (II) • More complete and realistic description of relevant molecular characteristics • Can discriminate among isomers and provide hints for selecting the most stable tautomer but ... • Computationally more demanding • Involve stochastic steps, so the result is non-deterministic • Results depend on the QM/MM theory used for the optimization • Reference structure: the minimum-energy conformation in vacuum is not necessarily the bioactive one

  13. MIF (I) • Require 3D conformations aligned in Euclidean space • Relate variation in the field to variation in the activity (3D-QSAR) • Data table: one row per molecule (Mol 1 ... Mol n); the columns hold the steric field values St1 ... Stm followed by the electrostatic field values El1 ... Elm

  14. MIF (II) • Probes: N3+ (sp3 amine NH3 cation); N2+ (sp3 amine NH2 cation); N2: (sp3 NH2 with lone pair); N2= (sp2 amine NH2 cation); N2 (neutral flat NH2, e.g. amide); N1+ (sp3 amine NH cation); N1: (sp3 NH with lone pair); N1= (sp2 amine NH cation); N1 (neutral flat NH, e.g. amide); NH= (sp2 NH with lone pair); N1# (sp NH with one hydrogen); N: (sp3 N with lone pair); N:= (sp2 N with lone pair); N:# (sp N with lone pair); N-: (anionic tetrazole N); NM3 (trimethylammonium cation); O (sp2 carbonyl oxygen); O:: (sp2 carboxy oxygen atom); O- (sp2 phenolate oxygen); O= (O of SO4 or sulfonamide); OH (phenol or carboxy OH); O1 (alkyl hydroxy OH group); OC2 (ether oxygen); OES (sp3 ester oxygen atom); ON (oxygen of nitro group); OS (O of sulfone / sulfoxide); OH2 (water); OFU (furan oxygen atom); C3 (methyl CH3 group); C1= (sp2 CH, aromatic or vinyl); ...; BOTH (the amphipathic probe); DRY (the hydrophobic probe) • Interaction terms: steric interaction (van der Waals energy calculated by a Lennard-Jones function); electrostatic interaction (calculated by a Coulomb-type function); ...; hydrogen bonding energy; solvation energy • Contour map: green = steric +; yellow = steric -; red = charge -; blue = charge +

  15. MIF (III) • More biologically plausible (receptor interactions) • Identify the areas responsible for the variation of the activity but ... • Very sensitive to conformation selection and to the chosen alignment • Require proper selection of force fields • Large number of grid point contributions • QSAR modelling complexity

  17. Types of descriptors (* = 3D descriptors) • Constitutional descriptors • Topological descriptors (topological indexes, connectivity indexes, information contents) • Atom-centred fragments • Functional groups • Fingerprints • Electrostatic descriptors (*) (charge descriptors) • Geometric descriptors * • Physico-chemical properties • Quantum-chemical descriptors * • Thermodynamic descriptors (*) • Pharmacophores • WHIM & GETAWAY * • BCUT (or Burden eigenvalues) • Autocorrelation descriptors • EVA descriptors *

  18. Constitutional descriptors • The simplest and most commonly used descriptors • Reflect the molecular composition of a compound without any information about its molecular geometry • Examples: molecular weight; count of atoms and bonds; count of rings
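
Constitutional descriptors need nothing beyond the composition, so a minimal sketch can start from a molecular formula. In the example below the helper names (`parse_formula`, `molecular_weight`) are hypothetical, the atomic weights are abridged standard values for four elements only, and tyrosine (C9H11NO3) is used purely as an illustrative input.

```python
import re

# Abridged standard atomic weights (g/mol) for a few common elements.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def parse_formula(formula: str) -> dict:
    """Turn a simple formula such as 'C9H11NO3' into {'C': 9, 'H': 11, ...}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts

def molecular_weight(counts: dict) -> float:
    """Sum of atomic weights over all atoms in the composition."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items())

atoms = parse_formula("C9H11NO3")        # tyrosine, used only as an example
mw = molecular_weight(atoms)
n_atoms = sum(atoms.values())            # all atoms, hydrogens included
n_heavy = n_atoms - atoms.get("H", 0)    # non-hydrogen atoms
print(atoms, round(mw, 2), n_atoms, n_heavy)
```

Counts of bonds and rings additionally require the connectivity, which a bare formula does not carry.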

  19. Molecular graph • A molecular graph or chemical graph is a representation of the structural formula of a chemical compound in terms of graph theory. • It is a convenient and natural way of representing the relationships between objects: objects are represented by vertices and the relationships between them by edges. (Figure: example graph with a labelled vertex and edge.)

  20. Topological descriptors • Calculated from the 2D graph of the molecule on the basis of connection tables or closely related formats, e.g. the distance matrix: an N x N table showing the distance (in bonds) between each pair of atoms • Obtained by operations on the distance matrix; their values are independent of vertex numbering or labelling (graph invariants) • Characterize structures according to size, degree of branching, overall shape, symmetry and cyclicity

  21. Connection table (per atom: number, element, attached hydrogens, then neighbour atom and bond order for each bond)
      Atom  Element  H  Bonds (neighbour:order)
      1     O        1  2:1
      2     C        0  1:1  3:2  4:1
      3     O        0  2:2
      4     C        1  2:1  5:1  6:1
      5     N        2  4:1
      6     C        2  4:1  7:1
      7     C        0  6:1  8:2  12:1
      8     C        1  7:2  9:1
      9     C        1  8:1  10:2
      10    C        0  9:2  11:1  13:1
      11    C        1  10:1  12:2
      12    C        1  11:2  7:1
      13    O        1  10:1

  22. Distance matrix
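
For the connection table on slide 21, the topological distance matrix can be computed with a breadth-first search started from each atom. The sketch below is one possible approach, not necessarily the one used for the slide: the bond list is transcribed from that table (heavy atoms only, bond orders ignored), and the function names are hypothetical.

```python
from collections import deque

# Heavy-atom bonds transcribed from the connection table on slide 21
# (atom numbers as on the slide; bond orders are irrelevant for distances).
BONDS = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (6, 7), (7, 8),
         (7, 12), (8, 9), (9, 10), (10, 11), (10, 13), (11, 12)]

def adjacency(bonds):
    """Build a neighbour list for every atom that appears in a bond."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj

def distance_matrix(bonds):
    """Return D where D[i][j] is the number of bonds on the shortest path."""
    adj = adjacency(bonds)
    atoms = sorted(adj)
    index = {atom: k for k, atom in enumerate(atoms)}
    dist = [[0] * len(atoms) for _ in atoms]
    for start in atoms:
        seen = {start: 0}
        queue = deque([start])
        while queue:                      # breadth-first search from `start`
            current = queue.popleft()
            for nb in adj[current]:
                if nb not in seen:
                    seen[nb] = seen[current] + 1
                    queue.append(nb)
        for atom, d in seen.items():
            dist[index[start]][index[atom]] = d
    return dist

D = distance_matrix(BONDS)
for row in D:
    print(" ".join(f"{d:2d}" for d in row))
```

Row and column k of `D` correspond to atom k + 1 of the connection table; the matrix is symmetric with a zero diagonal.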

  23. Wiener index • Counts the number of bonds between pairs of atoms and sums the distances between all pairs • Add up all the off-diagonal elements of the distance matrix and divide by 2 (because the matrix is symmetrical) • Example: W = 268
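
The recipe on this slide can be sketched in a few lines. The example below assumes the heavy-atom bond list transcribed from the connection table on slide 21 and uses the Floyd-Warshall algorithm for the shortest paths (a choice made here for brevity, not something the slides prescribe); `wiener_index` is a hypothetical name. For that graph it reproduces the W = 268 quoted on the slide.

```python
from itertools import product

# Heavy-atom bonds transcribed from the connection table on slide 21.
BONDS = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (6, 7), (7, 8),
         (7, 12), (8, 9), (9, 10), (10, 11), (10, 13), (11, 12)]

def wiener_index(bonds):
    """Half the sum of all off-diagonal topological distances."""
    atoms = sorted({a for bond in bonds for a in bond})
    inf = float("inf")
    d = {(i, j): (0 if i == j else inf) for i in atoms for j in atoms}
    for a, b in bonds:
        d[a, b] = d[b, a] = 1
    # Floyd-Warshall: allow each atom k in turn as an intermediate stop
    # (k is the slowest-varying index of the product, as the method requires).
    for k, i, j in product(atoms, repeat=3):
        if d[i, k] + d[k, j] < d[i, j]:
            d[i, j] = d[i, k] + d[k, j]
    # Sum every off-diagonal element and halve (the matrix is symmetric).
    return sum(d[i, j] for i in atoms for j in atoms if i != j) // 2

print(wiener_index(BONDS))   # -> 268
```

Renumbering the atoms leaves the result unchanged, which is the graph-invariance property asked of topological descriptors.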

  24. Molecular connectivity indexes • A whole series of indexes, developed by Kier & Hall in the late 1970s, following earlier work by Randić • Identify all possible subgraphs of different sizes in the molecule • The size of the subgraph determines the order of the index: a 0-bond subgraph gives a zero-order index; a 1-bond subgraph gives a 1st-order index; a 2-bond subgraph gives a 2nd-order index; a 3-bond subgraph gives a 3rd-order index; ...

  25. Randić index • Calculated from the H-depleted molecular graph, where each vertex is weighted by the vertex degree, i.e. the number of connected non-hydrogen atoms • Bond values are the products of the vertex degrees at the two ends of each bond; edge terms are the reciprocals of the square roots of the bond values; the Randić index is the sum of the edge terms • Example (seven vertices with degrees 3, 3, 2, 1, 1, 1, 1 and six bonds): bond values 3, 3, 3, 9, 6, 2 give edge terms 0.577, 0.577, 0.577, 0.333, 0.408, 0.707; Randić index = sum of edge terms = 3.179
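
The three steps on this slide (vertex degrees, bond values, edge terms) can be sketched as follows. The drawing for the slide's example is not part of the transcript, so the graph used here is an assumption: the hydrogen-depleted skeleton of 2,3-dimethylpentane, which yields exactly the bond values 3, 9, 6, 2, 3, 3 listed in the example. `randic_index` is a hypothetical name.

```python
from math import sqrt

# Hydrogen-depleted graph of 2,3-dimethylpentane (atom labels are arbitrary).
# Assumed to be the slide's example because it gives the same bond values.
BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]

def randic_index(bonds):
    """Sum over bonds of 1 / sqrt(degree_i * degree_j)."""
    degree = {}
    for a, b in bonds:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1 / sqrt(degree[a] * degree[b]) for a, b in bonds)

print(round(randic_index(BONDS), 3))
```

At full precision the sum is about 3.181; the slide's 3.179 results from rounding each edge term to three decimals before adding.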

  26. Kier & Hall indexes • Chi indexes introduce valence values to encode sigma, pi, and lone-pair electrons • δi and δj (i ≠ j) = values of the atomic connectivity • The atomic connectivity δi is calculated as $\delta_i = (Z_i^{\upsilon} - H_i) / (Z_i - Z_i^{\upsilon} - 1)$, where $Z_i$ = total number of electrons in the i-th atom, $Z_i^{\upsilon}$ = number of valence electrons, $H_i$ = number of hydrogens attached to the i-th atom
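
The atomic connectivity value can be illustrated with the standard Kier-Hall valence delta, (Zv - H) / (Z - Zv - 1), built from the three quantities defined on this slide. The sketch below assumes that form; the function name `valence_delta` and the choice of example atoms are illustrative.

```python
def valence_delta(z: int, z_valence: int, n_hydrogens: int) -> float:
    """Kier-Hall valence delta: (Zv - H) / (Z - Zv - 1)."""
    return (z_valence - n_hydrogens) / (z - z_valence - 1)

# (total electrons Z, valence electrons Zv, attached hydrogens H)
examples = {
    "CH3 carbon": (6, 4, 3),
    "carbonyl O": (8, 6, 0),
    "hydroxyl O": (8, 6, 1),
    "NH2 nitrogen": (7, 5, 2),
    "chlorine": (17, 7, 0),
}
for label, args in examples.items():
    print(f"{label:14s} {valence_delta(*args):.3f}")
```

For second-row atoms the denominator is 1, so the value reduces to Zv - H; the denominator only matters for heavier atoms such as chlorine, where it scales the value down.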

  27. Kier shape indexes • Characterize aspects of molecular shape • Compare the molecule with the “extreme shapes” possible for that number of atoms • Based on the number of atoms (N) and the number of paths of m bonds (${}^mP$; ${}^1P$ is the number of bonds) in the graph: • ${}^1\kappa = N (N-1)^2 / ({}^1P)^2$ • ${}^2\kappa = (N-1)(N-2)^2 / ({}^2P)^2$ • ${}^3\kappa = (N-1)(N-3)^2 / ({}^3P)^2$ (if N is odd) • ${}^3\kappa = (N-3)(N-2)^2 / ({}^3P)^2$ (if N is even) • Alpha-modified kappa indexes can be generated by taking into account the sizes of atoms relative to the C sp3 atom • A molecular flexibility index is derived from these: $\Phi = {}^1\kappa \, {}^2\kappa / N$
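
A minimal sketch of the unmodified kappa indexes follows, assuming that P in each formula is the number of paths with one, two or three bonds in the hydrogen-depleted graph. The function names are hypothetical, the path counter is a plain depth-first enumeration suitable only for small graphs, and the alpha modification is not included.

```python
def count_paths(adj, length):
    """Count simple paths with `length` bonds (each path counted once)."""
    def walk(node, visited, remaining):
        if remaining == 0:
            return 1
        return sum(walk(nb, visited | {nb}, remaining - 1)
                   for nb in adj[node] if nb not in visited)
    # Every path is found from both of its ends, hence the division by 2.
    return sum(walk(v, {v}, length) for v in adj) // 2

def kappa_indexes(bonds):
    """Return (kappa1, kappa2, kappa3); assumes paths of each length exist."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n = len(adj)
    p1, p2, p3 = (count_paths(adj, m) for m in (1, 2, 3))
    k1 = n * (n - 1) ** 2 / p1 ** 2
    k2 = (n - 1) * (n - 2) ** 2 / p2 ** 2
    if n % 2:                                   # N odd
        k3 = (n - 1) * (n - 3) ** 2 / p3 ** 2
    else:                                       # N even
        k3 = (n - 3) * (n - 2) ** 2 / p3 ** 2
    return k1, k2, k3

pentane = [(1, 2), (2, 3), (3, 4), (4, 5)]
isopentane = [(1, 2), (2, 3), (3, 4), (2, 5)]
print(kappa_indexes(pentane))      # -> (5.0, 4.0, 4.0)
print(kappa_indexes(isopentane))   # -> (5.0, 2.25, 4.0)
```

The two isomers share the first-order value because they have the same number of atoms and bonds; branching shows up in the second-order index, which drops from 4.0 to 2.25.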

  28. Information content indexes • Defined on the basis of Shannon information theory: $IC = -\sum_i (n_i/n) \log_2 (n_i/n)$, where $n_i$ = number of atoms in the i-th class and $n$ = total number of atoms in the molecule • Classes are determined by the coordination sphere taken into account, leading to indexes of different order k • Other information content indices: SIC (structural IC) $= IC / \log_2 n$; CIC (complementary IC) $= \log_2 n - IC$; BIC (bonding IC) $= IC / \log_2 q$, where $q$ = number of edges
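
As a worked illustration, the zero-order case groups atoms by element only. The sketch below assumes the usual Shannon form for IC and the common definitions SIC = IC / log2(n), CIC = log2(n) - IC and BIC = IC / log2(q); ethanol and the function name `information_content` are choices made here for the example.

```python
from collections import Counter
from math import log2

def information_content(classes):
    """Shannon information content (bits per atom) of a class partition."""
    n = sum(classes.values())
    return -sum((ni / n) * log2(ni / n) for ni in classes.values())

# Zero-order classes for ethanol, C2H6O: atoms grouped by element only.
classes = Counter({"C": 2, "H": 6, "O": 1})
n = sum(classes.values())      # 9 atoms
q = 8                          # bonds (edges) in ethanol, hydrogens included

ic = information_content(classes)
sic = ic / log2(n)             # structural IC
cic = log2(n) - ic             # complementary IC
bic = ic / log2(q)             # bonding IC
print(round(ic, 4), round(sic, 4), round(cic, 4), round(bic, 4))
```

Higher orders refine the classes by also looking at each atom's neighbours, so the same function applies once the partition has been recomputed.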

  29. Considerations about topological descriptors • Frequently used, easily calculated • It is often difficult to interpret the chemical meaning of higher-order indexes • Topological indexes effectively encode the same information as fingerprint fragments, in a less obvious way, but can be processed numerically

  30. Atom-centred fragments & functional groups • Atom-centred fragments: number of specific atom types in a molecule, calculated from the molecular composition and atom connectivities • Functional groups: number of specific functional groups in a molecule, calculated from the molecular composition and atom connectivities

  31. 2D Fingerprints • Two types: • Based on a fragment dictionary: each bit position corresponds to a specific substructure fragment; fragments that occur infrequently may be more useful • Based on hashed methods: not dependent on a pre-defined dictionary; any fragment can be encoded • Originally designed for substructure searching, not as molecular descriptors

  32. Fragment dictionaries • Example bit strings (differing in a single bit position): • 000101000101000100000000011010100110101000000101000000001000 • 000101000101000100000000011010100110101000000001000000001000
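
The difference between the two fingerprint types can be sketched with toy data. Everything in the example below is hypothetical: the six-entry fragment dictionary, the fragment strings, the 16-bit length and the use of CRC32 as the hash are assumptions for illustration, and the fragment set is taken as given (perceiving substructures is a separate step that is not shown).

```python
import zlib

# Hypothetical fragment dictionary: one bit per predefined substructure.
DICTIONARY = ["C=O", "O-H", "N-H", "c1ccccc1", "C-Cl", "C#N"]

def dictionary_fingerprint(fragments):
    """Bit k is set when DICTIONARY[k] occurs; unknown fragments are lost."""
    return [1 if frag in fragments else 0 for frag in DICTIONARY]

def hashed_fingerprint(fragments, n_bits=16):
    """Each fragment string is hashed to a bit position; collisions can occur."""
    bits = [0] * n_bits
    for frag in fragments:
        bits[zlib.crc32(frag.encode()) % n_bits] = 1
    return bits

# Fragment set assumed to come from a separate substructure perception step.
mol_fragments = {"C=O", "O-H", "N-H", "c1ccccc1", "C-S"}
print(dictionary_fingerprint(mol_fragments))
print(hashed_fingerprint(mol_fragments))
```

The "C-S" fragment is silently dropped by the dictionary version because it has no assigned bit, whereas the hashed version encodes it at the cost of no longer knowing which fragment set a given bit.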

  33. Pharmacophores • Used in drug design • Based on atoms or substructures thought to be relevant for receptor binding: a specification of the spatial arrangement of a small number of atoms or functional groups • Typically include H-bond donors and acceptors, charged centers, aromatic ring centers and hydrophobic centers • With the model in hand, databases are searched for molecules that fit this spatial environment • Might be 3D

  34. Creating a Pharmacophore

  35. Physico-chemical properties • Covered in more detail in the QSPR lesson • The key descriptor, widespread in QSAR, is hydrophobicity • LogP: the logarithm of the partition coefficient between n-octanol and water • LogD: LogP corrected on the basis of the dissociated fraction of the compound • Experimentally assessed with the shake-flask method or reversed-phase HPLC • It is often useful to be able to calculate a physico-chemical property for a compound from its structure
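
The LogP to LogD correction can be sketched for the simplest case. The example assumes a monoprotic acid or base and that only the neutral species partitions into octanol, which gives the familiar expression logD = logP - log10(1 + 10^(pH - pKa)) for an acid (with the exponent sign flipped for a base). The function name `log_d` and the numerical inputs are illustrative, not measured data.

```python
from math import log10

def log_d(log_p: float, pka: float, ph: float, acid: bool = True) -> float:
    """logD of a monoprotic compound, assuming only the neutral form
    partitions into octanol."""
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - log10(1 + 10 ** exponent)

# Hypothetical acid: logP = 3.0, pKa = 4.4.
for ph in (2.0, 4.4, 7.4):
    print(ph, round(log_d(3.0, 4.4, ph), 2))
```

Well below the pKa the acid is essentially neutral and logD approaches logP; each pH unit above the pKa lowers logD by roughly one log unit.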

  36. LogP calculation • Many methods have been proposed for calculating a good estimate of LogP • Fragment-based methods (ClogP) • pioneered by Corwin Hansch and Al Leo (Pomona College) • identify large fragments whose contribution to the logP value is known from their occurrence in other compounds with measured logP • large “training set” of compounds with accurately measured logP (the “Starlist”) • works very well if the test compound has the right fragments • problems arise if the test compound contains fragments that are “missing” from the training set

  37. LogP calculation • Atom-based methods (AlogP, XlogP, SlogP) • pioneered by Gordon Crippen (Univ. Michigan) • based on identifying a series of “atom types” in the molecule • essentially, small atom-centred fragments • usually 60-200 such fragments are involved • each atom type is assigned a numerical value • logP is obtained by adding the values for the atom types present in the test molecule • atom-type values are obtained by regression analysis, based on a set of compounds with measured logP • sometimes extra correction factors are used too
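
The additive scheme behind atom-based methods can be sketched as a table lookup plus a sum. The atom-type names and numerical contributions below are placeholders invented for this illustration; they are not the published AlogP, XlogP or SlogP parameters, and the atom typing itself is assumed to have been done beforehand.

```python
# Placeholder contributions per atom type: made-up numbers for illustration,
# NOT the published AlogP/XlogP/SlogP parameters.
CONTRIBUTION = {
    "C.aliphatic": 0.30,
    "C.aromatic": 0.25,
    "O.hydroxyl": -0.40,
    "N.amine": -0.60,
}

def additive_logp(atom_types, correction=0.0):
    """Sum the contribution of every typed atom, plus optional corrections."""
    return sum(CONTRIBUTION[t] for t in atom_types) + correction

# A phenol-like atom typing: six aromatic carbons and one hydroxyl oxygen.
phenol_like = ["C.aromatic"] * 6 + ["O.hydroxyl"]
print(round(additive_logp(phenol_like), 2))
```

In a real method the table would hold the regression-fitted values for all atom types, and an atom type missing from the table raises the same "missing fragment" problem described for fragment-based methods.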

  38. Summary • Source: Rognan D., British Journal of Pharmacology (2007) 152, 38–52

  40. Quantitative Structure-Activity Relationships • Tomorrow ... • Lessons 4 & 5

  41. Chemoinformatics • Molecular database management • Reverse engineering • Chemical similarity assessment

  42. Molecular similarity • The descriptors of a molecule can be considered a vector of attributes (properties). • The attributes may be real numbers (continuous variables) or binary in nature (binary variables).
      Coefficient                      | For binary variables                      | For continuous variables
      Tanimoto similarity coefficient  | $c / (a + b - c)$, range 0 to 1           | $\sum x_A x_B / (\sum x_A^2 + \sum x_B^2 - \sum x_A x_B)$, range -0.333 to +1
      Hodgkin index                    | $2c / (a + b)$, range 0 to 1              | $2 \sum x_A x_B / (\sum x_A^2 + \sum x_B^2)$, range -1 to +1
      Euclidean distance               | $\sqrt{a + b - 2c}$, range 0 to N         | $\sqrt{\sum (x_A - x_B)^2}$, range 0 to ∞
      a = number of bits on for A; b = number of bits on for B; c = number of bits on for both A and B; $x_A$ and $x_B$ are the descriptor vectors of A and B
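
The three coefficients named on this slide can be sketched for both kinds of variables. The formulas below are the commonly used forms expressed with the slide's a, b, c counts (binary case) and with sums over vector components (continuous case); they are assumed here rather than copied from the slide, and the function names are hypothetical. The two bit strings are those shown on the "Fragment dictionaries" slide.

```python
from math import sqrt

def binary_similarity(fp_a: str, fp_b: str):
    """Tanimoto, Hodgkin and Euclidean measures for two bit strings."""
    a = fp_a.count("1")                                   # bits on in A
    b = fp_b.count("1")                                   # bits on in B
    c = sum(x == y == "1" for x, y in zip(fp_a, fp_b))    # bits on in both
    return {"tanimoto": c / (a + b - c),
            "hodgkin": 2 * c / (a + b),
            "euclidean": sqrt(a + b - 2 * c)}

def continuous_similarity(x, y):
    """The same three measures for real-valued descriptor vectors."""
    xy = sum(p * q for p, q in zip(x, y))
    xx = sum(p * p for p in x)
    yy = sum(q * q for q in y)
    return {"tanimoto": xy / (xx + yy - xy),
            "hodgkin": 2 * xy / (xx + yy),
            "euclidean": sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))}

# The two bit strings from the "Fragment dictionaries" slide.
fp1 = "000101000101000100000000011010100110101000000101000000001000"
fp2 = "000101000101000100000000011010100110101000000001000000001000"
print(binary_similarity(fp1, fp2))
print(continuous_similarity([1.0, 2.0, 0.5], [0.8, 2.1, 0.0]))
```

The two fingerprints have 16 and 15 bits set with 15 in common, so their Tanimoto coefficient is 15/16 = 0.9375 and their Euclidean distance is 1.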

  43. Drug design • High-throughput virtual screening

  45. Software resources • Database of calculated descriptors • MOLE db http://michem.disat.unimib.it/mole_db/ • Commercial software • CODESSA, Dragon, MDL, TSAR, ... • Free software • Virtual Computational Chemistry Laboratory www.vvclab.org • MODEL (Molecular Descriptor Lab) http://jing.cz3.nus.edu.sg/cgi-bin/model/model.cgi • Open-source software/libraries • Chemistry Development Kit (CDK) http://almost.cubic.uni-koeln.de/cdk/cdk_top • Linux4Chemistry http://www.redbrick.dcu.ie/~noel/linux4chemistry/

  46. Further reading • Web • www.moleculardescriptors.eu • Book • Todeschini, R. and Consonni, V. (2000). Handbook of Molecular Descriptors. Wiley-VCH. • Papers • Estrada, E., Molina, E. and Perdomo-López, I. (2001). Can 3D Structural Parameters Be Predicted from 2D (Topological) Molecular Descriptors? J. Chem. Inf. Comput. Sci., 41, 1015-1021. • Katritzky, A.R. and Gordeeva, E.V. (1993). Traditional Topological Indices vs Electronic, Geometrical, and Combined Molecular Descriptors in QSAR/QSPR Research. J. Chem. Inf. Comput. Sci., 33, 835-857. • Randić, M. (1990). The Nature of the Chemical Structure. J. Math. Chem., 4, 157-184. • Tetko, I.V. (2003). The WWW as a Tool to Obtain Molecular Parameters. Mini Reviews in Medicinal Chemistry, 3, 809-820.

  47. Concluding remarks • Depending on the application, define the preferred complexity level for the chemical description • Avoid using meaningless numbers: all descriptor types have advantages and limitations, but easily interpretable descriptors might be preferred • Examples

  48. Tautomers (I)

  49. Tautomers (II) (Figure panels: predicted values for the logBCF model; lipophilicity descriptor variation)

  50. 3D descriptors variability (I) (Figure: LUMO energy; panels: intra-laboratory, inter-laboratory (PM3), inter-laboratory (AM1))
